  1. Introduction to SeqWare
    1. What is SeqWare?
    2. Why use SeqWare?
    3. Why not use SeqWare?
    4. Who uses SeqWare?
    5. How to use SeqWare?
    6. Similar Projects
  2. Installation
  3. Getting Started
  4. SeqWare Pipeline
  5. SeqWare MetaDB
  6. SeqWare Portal
  7. SeqWare Web Service
  8. SeqWare Query Engine
  9. Glossary
  10. Frequently Asked Questions
  11. APIs
  12. Source Code
  13. Plugins
  14. Modules
  15. Advanced Topics

Introduction to SeqWare

What is SeqWare?

SeqWare is an open-source bioinformatics workflow deployment and management system. Its core features include:

  • A centralized metadata database that tracks samples, annotations and analyses, along with a web application to visualize them
  • A workflow bundle specification and execution engine that lets you package computational tools and use them to build and run complex analytical workflows
  • Support for running workflows irrespective of the underlying cluster environment
  • An advanced query engine that lets you store and search the variants and annotations produced by your workflows, using a highly scalable, distributed database backend

Why use SeqWare?

Use SeqWare to build workflows and processes that automate large volumes of next-generation sequencing (NGS) analysis, that track analytical events in a database (provenance), and that link analysis to wet lab entities such as samples and studies. SeqWare is an especially good fit if you need to do all of this in both local and cloud-based environments, or if you have many different types of clusters to submit jobs to.

Why not use SeqWare?

SeqWare is not a good fit if you have a small number of NGS samples to analyze, if you need to interactively explore tools and settings for your project, or if you want pre-built workflows that will analyze data “out-of-the-box”. In those cases, we recommend looking at commercial solutions such as Nimbus Informatics, DNAnexus, or BaseSpace for pre-built workflows ready to run on the cloud, or at Galaxy for interactive analysis either locally or on the cloud. Remember, SeqWare is an infrastructure toolkit, not an analysis pipeline for particular NGS experimental designs. You use SeqWare to build the workflows you need.

Who uses SeqWare?

The SeqWare project targets users who have massive amounts of NGS data to analyze (TBase to PBase), have specific, custom analytical workflows in mind that are typically more complex than the generic workflows offered by other projects, want to automate and track the analysis of their data, and need to run on a local cluster or the cloud.

How to use SeqWare?

There are currently three ways to work with SeqWare:

  • A local VM (that may or may not be connected to a local cluster)
  • A VM on Amazon’s cloud
  • SeqWare hosted on the cloud by Nimbus Informatics, our commercial partner (a SeqWare Platform as a Service, PaaS)

Similar Projects

Several workflow engines can be used for NGS data analysis, including Ergatis, Galaxy, Pegasus and Taverna. Each has its own strengths and weaknesses, trading off how easy it is to work with against how much data it can process. Some of these provide ready-made workflows for analyzing NGS data, while others require you to build your own. There are also commercial solutions on the cloud that typically follow a one-size-fits-all model, offering a collection of standardized workflows for the user to choose from. Examples include BaseSpace by Illumina, DNAnexus and Samsung SDS Bioinformatics Service. These commercial services aim to accomplish much the same things as a SeqWare install, but they offer the analysis itself as a service, whereas SeqWare lets you create local or cloud-based infrastructure similar to what powers these commercial offerings.