Admin Tutorial

Note: This guide assumes you have already installed SeqWare. If you have not, install it by either downloading the VirtualBox VM or launching the AMI on the Amazon cloud; see Installation for directions. We also recommend following the User Tutorial and Developer Tutorial before this guide.

This guide is intended for a SeqWare administrator. Currently, it covers the tools required to install workflows, monitor workflows globally, and launch scheduled jobs. We also cover tools that are required for cancelling workflows that have started and restarting workflows.

By the End of These Tutorials

By the end of these tutorials you will:

  • install workflows
  • monitor workflows
  • see how to connect a local VM to a local cluster for running large-scale workflows
  • see how to launch a cluster on Amazon’s cloud for running large-scale workflows

How to Install a Workflow

When a workflow developer provides you with a tested workflow bundle, the next step is to install it. Installation inserts the workflow into the MetaDB via a running web service. During this process, the bundle is copied into your released-bundles directory and provisioned into your provisioned-bundles directory, which is where running workflows access their files.

Here is an example showing how this process works on the VM and what is happening in the database and your released-bundles directory as you do this.

See the Developer Tutorial for how to make the zipped workflow bundle. After the zip bundle is created, it can be given to the admin for installation as shown below.

$ seqware bundle install --zip ~/packaged-bundles/Workflow_Bundle_MyHelloWorld_1.0_SeqWare_1.1.0.zip 
Now transferring /home/seqware/packaged-bundles/Workflow_Bundle_MyHelloWorld_1.0_SeqWare_1.1.0.zip to the directory: /home/seqware/released-bundles Please be aware, this process can take hours if the bundle is many GB in size.
Processing input: /home/seqware/packaged-bundles/Workflow_Bundle_MyHelloWorld_1.0_SeqWare_1.1.0.zip
  output-dir: /home/seqware/released-bundles

WORKFLOW_ACCESSION: 16
Bundle Has Been Installed to the MetaDB and Provisioned to /home/seqware/packaged-bundles/Workflow_Bundle_MyHelloWorld_1.0_SeqWare_1.1.0.zip!

What happens here is that Workflow_Bundle_MyHelloWorld_1.0_SeqWare_1.1.0.zip is copied to your released-bundles directory and unzipped into your provisioned-bundles directory. The metadata about the workflow is then saved to the database.
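
When installs are scripted, the WORKFLOW_ACCESSION value printed in the output is usually the piece needed for later commands such as scheduling. The following is a minimal sketch of capturing it, assuming the output keeps the `WORKFLOW_ACCESSION: <number>` line format shown in the example. The function name parse_workflow_accession is hypothetical and not part of SeqWare.

```shell
#!/bin/bash

# Read installer output on stdin and print the value that follows
# "WORKFLOW_ACCESSION:". Returns non-zero when no such line is present.
# Hypothetical helper; it only relies on the line format shown in this guide.
parse_workflow_accession() {
  awk '$1 == "WORKFLOW_ACCESSION:" { print $2; found = 1; exit }
       END { exit !found }'
}

# Possible use (the bundle path is whatever you were given by the developer):
#   accession=$(seqware bundle install --zip "$bundle_zip" | parse_workflow_accession)
```

Returning a non-zero status when the line is missing lets a calling script stop early instead of scheduling with an empty accession.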

How to Launch

In our reference SeqWare environment, we typically schedule jobs and then launch them asynchronously via a cron job.

A user schedules workflow launches using a command similar to the one below:

$ seqware workflow schedule --accession 1 --parent-accession 99 --ini workflow.ini --host `hostname --long`

Then, in a cron job, we use the following command to launch scheduled jobs:

$ seqware workflow-run launch-scheduled

Note that the first command allows jobs to be scheduled on a specific host. When scheduled workflows are launched, this value is checked to determine whether a given scheduled workflow should be launched on the current host. Although we normally use a fully qualified hostname, any unique string can be used to designate a host for launching (for example on Amazon S3).
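
The host check described above can be pictured as a plain string comparison between the label given at scheduling time and the label of the machine doing the launching. The sketch below is only an illustration under that assumption: should_launch_here is a hypothetical helper, and SeqWare's own matching logic may differ.

```shell
#!/bin/bash

# Return 0 when a run scheduled for host label "$1" should be launched on
# the machine whose label is "$2". An empty label never matches.
# Illustrative only; this is not SeqWare code.
should_launch_here() {
  local scheduled_host=$1 this_host=$2
  [ -n "$scheduled_host" ] && [ "$scheduled_host" = "$this_host" ]
}

# Possible use, comparing a scheduled label with this machine's hostname:
#   should_launch_here "$scheduled_label" "$(hostname --long)" && echo "launch here"
```

The practical point is that the value passed to --host at scheduling time has to be exactly the string the launching machine identifies itself with.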

How to Monitor

Since the engine that executes the workflow is separate from the SeqWare MetaDB, a separate process is used to propagate statuses between the workflow engine and MetaDB:

$ seqware workflow-run propagate-statuses

Once this is executed, workflow-run reports will reflect the updated status.

Cron Jobs

The SeqWare VM performs both of the above functions via a cron job:

$ crontab -l
* * * * * /home/seqware/crons/status.cron >> /home/seqware/logs/status.log

$ cat /home/seqware/crons/status.cron

#!/bin/bash

source /home/seqware/.bash_profile

seqware workflow-run launch-scheduled
seqware workflow-run propagate-statuses --threads 10

This script runs every minute. The first command launches workflows that have been previously scheduled, and the second checks the status of launched workflows.
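
Because the script is started every minute, a slow run could overlap with the next one. Whether SeqWare tolerates that is not stated here, so the following is an optional guard rather than part of the reference setup: a sketch that uses an atomic mkdir as a lock. The function name run_exclusive and the lock path are assumptions.

```shell
#!/bin/bash

# Run a command only if no other run currently holds the lock directory "$1".
# mkdir either creates the directory or fails, so two overlapping runs
# cannot both acquire the lock.
# Returns 75 when the lock is already held, otherwise the command's status.
run_exclusive() {
  local lock_dir=$1
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "previous run still active, skipping" >&2
    return 75
  fi
  local status=0
  "$@" || status=$?
  rmdir "$lock_dir"
  return "$status"
}

# Possible use inside status.cron (the lock path is an assumption):
#   run_exclusive /tmp/seqware-launch.lock seqware workflow-run launch-scheduled
```

A lock directory left behind by a killed run has to be removed by hand, which is the main trade-off of this simple approach.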

For more information see the Monitor Configuration documentation.

See Also

Note: Before proceeding further, be aware that the SeqWare MetaDB should be backed up regularly. On our deployment, a cron script calls the Files Report and pg_dump nightly to perform the backup.
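
The following is a minimal sketch of the pg_dump half of such a nightly backup, assuming the MetaDB lives in a PostgreSQL database reachable with the calling user's default connection settings. The database name seqware_meta_db, the backup directory, and both function names are placeholders to adapt, and the Files Report step is not shown.

```shell
#!/bin/bash

# Build a dated dump path of the form <dir>/<db>-<YYYY-MM-DD>.dump.
backup_path() {
  local dir=$1 db=$2 day=$3
  printf '%s/%s-%s.dump\n' "$dir" "$db" "$day"
}

# Dump database "$2" into directory "$1" using pg_dump's custom archive
# format, which can later be restored with pg_restore.
nightly_backup() {
  local dir=$1 db=$2
  pg_dump --format=custom \
          --file="$(backup_path "$dir" "$db" "$(date +%F)")" \
          "$db"
}

# Possible use from a nightly cron script (both values are placeholders):
#   nightly_backup /var/backups/seqware seqware_meta_db
```

Putting the date in the file name keeps one dump per night, so older dumps need to be pruned separately.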

As an admin, your next steps are to explore the various sub-project guides in this documentation. Also take a look at the guide for creating a SeqWare VM, which provides low-level technical details on how to install the components of the SeqWare software stack.