The SeqWare Pipeline sub-project is the heart of the overall SeqWare project. It provides the core functionality of SeqWare: a workflow development environment and a series of tools for installing, running, and monitoring workflows.
We currently support one workflow language (Java) and four workflow engines (oozie, oozie-sge, whitestar, and whitestar-sge).
(Previously, we also supported Pegasus/Condor/Globus as a workflow engine.)
Our currently recommended combination is Java workflows with the oozie-sge engine.
We highly recommend that you go through the User, Developer, and Admin tutorials first, since the documentation below assumes you have already done so.
SeqWare Pipeline has several key features that distinguish it from other open source and private workflow solutions. See About for more information.
Workflows define a series of steps and how they relate to each other. Typically, these encode a series of calls to command line tools that operate on files read from and written to a shared filesystem. Individual steps usually run on a randomly chosen cluster node.
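To make the idea of "steps and how they relate to each other" concrete, here is a minimal sketch of a workflow as named steps that wrap shell commands, with parent/child ordering between them. It is a toy model under stated assumptions, not the SeqWare workflow API: the class `ToyWorkflow`, its methods, the step names, and the commands are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: ToyWorkflow and its methods are hypothetical, not SeqWare classes.
class ToyWorkflow {
    // Step name -> shell command, kept in declaration order.
    private final Map<String, String> commands = new LinkedHashMap<>();
    // Step name -> names of the steps that must finish before it starts.
    private final Map<String, Set<String>> parents = new LinkedHashMap<>();

    ToyWorkflow addStep(String name, String command) {
        if (commands.containsKey(name)) {
            throw new IllegalArgumentException("Duplicate step: " + name);
        }
        commands.put(name, command);
        parents.put(name, new LinkedHashSet<>());
        return this;
    }

    ToyWorkflow addParent(String child, String parent) {
        if (!commands.containsKey(child) || !commands.containsKey(parent)) {
            throw new IllegalArgumentException("Unknown step: " + parent + " -> " + child);
        }
        parents.get(child).add(parent);
        return this;
    }

    String commandFor(String step) {
        return commands.get(step);
    }

    // Returns the steps in an order where every step comes after all of its parents.
    List<String> executionOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        while (done.size() < commands.size()) {
            boolean progressed = false;
            for (String step : commands.keySet()) {
                if (!done.contains(step) && done.containsAll(parents.get(step))) {
                    done.add(step);
                    order.add(step);
                    progressed = true;
                }
            }
            if (!progressed) {
                // No step became runnable in a full pass, so the dependencies form a cycle.
                throw new IllegalStateException("Cycle detected among workflow steps");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical three-step workflow communicating through files.
        ToyWorkflow wf = new ToyWorkflow()
            .addStep("align", "echo aligned > aligned.txt")
            .addStep("count", "wc -l aligned.txt > counts.txt")
            .addStep("report", "cat counts.txt > report.txt")
            .addParent("count", "align")
            .addParent("report", "count");
        for (String step : wf.executionOrder()) {
            System.out.println(step + ": " + wf.commandFor(step));
        }
    }
}
```

In this sketch the steps exchange data only through files, which mirrors the shared-filesystem assumption described above: because any step may land on a different cluster node, nothing is passed in memory between steps.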
Modules are optional for those interested in workflow development, since most workflows simply call command line tools bundled inside the workflow. For those interested in extending the underlying SeqWare system, Modules provide a way to define new step types. They can be useful for writing custom steps that interact with databases, trigger analysis in other frameworks (Pig/Hive/MapReduce), make calls to web services, and so on. We use Modules to provide core services in SeqWare, such as file provisioning and bash shell execution. Again, Modules are mainly targeted at core SeqWare developers, not general workflow developers.
The Deciders framework allows for the automatic parameterization and calling of workflows in SeqWare Pipeline. It allows you to easily encode the parent workflow and file types that, when present, enable a subsequent workflow to be launched.
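The matching rule described above (a parent workflow plus a set of file types that together enable the next workflow) can be sketched as a simple filter. This is an illustrative toy, not the Deciders framework itself: `ToyDecider`, `OutputFile`, the accession numbers, and the file-type strings are hypothetical names and values chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: ToyDecider and OutputFile are hypothetical, not SeqWare classes.
class ToyDecider {
    // One output file recorded for an earlier workflow run.
    static final class OutputFile {
        final int parentWorkflowAccession;
        final String metaType;
        final String path;

        OutputFile(int parentWorkflowAccession, String metaType, String path) {
            this.parentWorkflowAccession = parentWorkflowAccession;
            this.metaType = metaType;
            this.path = path;
        }
    }

    private final Set<Integer> parentAccessions;
    private final Set<String> metaTypes;

    ToyDecider(Set<Integer> parentAccessions, Set<String> metaTypes) {
        this.parentAccessions = parentAccessions;
        this.metaTypes = metaTypes;
    }

    // Keeps only files produced by an accepted parent workflow with an accepted file type.
    List<OutputFile> selectInputs(List<OutputFile> candidates) {
        List<OutputFile> selected = new ArrayList<>();
        for (OutputFile file : candidates) {
            if (parentAccessions.contains(file.parentWorkflowAccession)
                    && metaTypes.contains(file.metaType)) {
                selected.add(file);
            }
        }
        return selected;
    }

    // The follow-on workflow is launched only when at least one suitable input exists.
    boolean shouldLaunch(List<OutputFile> candidates) {
        return !selectInputs(candidates).isEmpty();
    }
}
```

A real decider would also turn the selected files into workflow parameters and avoid relaunching on inputs it has already processed; both are left out of this sketch.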
A major focus of the SeqWare Web Service is providing reporting resources. These are command line tools that are particularly useful for generating reports for SeqWare entities such as workflow runs and their outputs.
Other useful tools are available for import, export, and annotation of results.
We have provided a new, simplified command line interface. The best way to learn its features is to simply add --help.
$ seqware --help

Usage: seqware [<flag>]
       seqware <command> [--help]

Commands:
  annotate      Add arbitrary key/value pairs to seqware objects
  bundle        Interact with a workflow bundle during development/admin
  copy          Copy files between local and remote file systems
  create        Create new seqware objects (e.g., study)
  files         Extract information about workflow output files
  study         Extract information about studies
  workflow      Interact with workflows
  workflow-run  Interact with workflow runs
  checkdb       Check the seqware database for convention errors
  check         Check the seqware environment for configuration issues

Flags:
  --help        Print help out
  --version     Print Seqware's version

$ seqware workflow --help

Usage: seqware workflow [--help]
       seqware workflow <sub-command> [--help]

Description:
  Interact with workflows.

Sub-commands:
  ini           Generate an ini file for a workflow
  list          List all installed workflows
  report        List the details of all runs of a given workflow
  schedule      Schedule a workflow to be run

Most commands will print the help if no arguments are provided.
The old command line still exists. Its documentation is auto-generated and covers Plugins (utility tools used outside of workflows) and Modules (which model custom steps in workflows and know how to integrate with the SeqWare MetaDB for metadata writeback).