Last edit August 08, 2024

Creating a Samtools Instance

This tutorial shows how to create a Samtools instance on DiPhyx.

Samtools is a suite of programs for interacting with high-throughput sequencing data. It is widely used for tasks such as sorting, indexing, and manipulating SAM/BAM/CRAM files. Running Samtools on DiPhyx gives your project access to the platform's compute resources and collaboration tools. This tutorial guides you through a Samtools project on DiPhyx, from initial setup to execution.

Prerequisites

Before you begin, make sure you have the following:

  • ex1.fa (3.23 kB)
  • ex1.sam.gz (115 kB)
  • run_script.sh (814 B)

How to use these files is explained later in this guide.

Steps

Creating a Samtools Instance

1. Search for Samtools

Navigate to the DiPhyx Dashboard and click "Software Packages".

Find the Samtools software package using the search field.

2. Click "View Details"

3. Select Compute Unit

On the Samtools software package page, select a compute unit from the "Select compute unit" menu. The menu lists the available compute units that are in the READY state.

Tip: If you don't have an available compute unit, or need to create a new one for this project instance, follow our guide How to Create a Compute Unit on DiPhyx for step-by-step instructions.

After selecting a suitable compute unit, click "Create Instance".

4. Fill in the Fields

In the "Create new project" dialog:

  • The Project name must be unique within the compute unit to avoid conflicts.
  • The Compute unit Volume specifies the directory on the compute unit that is mounted into the Samtools instance. You don't need to change the default path.
  • The Project Volume refers to the project volume.
  • The Run script name is the file name of the run script in the working directory, for example run_script.sh from the Prerequisites section.
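To give a sense of what a run script for this project could do, here is a minimal sketch. It is not the contents of the provided run_script.sh, which may differ. It assumes a POSIX shell, that `samtools` is on the PATH inside the instance, and that the input files sit in the working directory. The output prefix `ex1` and the `run_pipeline` function name are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a Samtools run script; the real run_script.sh may differ.
set -eu

# Input names match the prerequisite files; the output prefix is an assumption.
REF="${REF:-ex1.fa}"
SAM="${SAM:-ex1.sam.gz}"
OUT="${OUT:-ex1}"

run_pipeline() {
  # Index the reference; this writes "$REF.fai".
  samtools faidx "$REF"
  # Convert SAM to BAM. -t supplies reference names and lengths from the
  # .fai file, which helps if the SAM file carries no header.
  samtools view -b -t "$REF.fai" -o "$OUT.bam" "$SAM"
  # Sort by coordinate, then index the sorted BAM.
  samtools sort -o "$OUT.sorted.bam" "$OUT.bam"
  samtools index "$OUT.sorted.bam"
  # Print summary statistics so they appear in the instance logs.
  samtools flagstat "$OUT.sorted.bam"
}

# Run only if samtools and both inputs are available in the working directory.
if command -v samtools >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$SAM" ]; then
  run_pipeline
else
  echo "samtools or input files not found; nothing was run" >&2
fi
```

With a script along these lines, the sorted BAM file and its index would be written next to the inputs, and the `flagstat` summary would go to standard output.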

5. Upload Files to the Compute Unit Volume

For testing, you can use the files listed in the Prerequisites section at the beginning of this tutorial, or download them from this link.

Click "Upload file".

Click "Browse files".

Select the files you want to upload; for this tutorial, these are the files listed in the Prerequisites section.

Click "Upload".

Wait for a few seconds for the files to upload, then click "Done".

Make sure the uploaded files are in the correct path.
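The same check can also be scripted, for example at the top of the run script. The sketch below assumes the run script is a POSIX shell script and that the three prerequisite files are expected in one directory. The `check_inputs` function name is hypothetical.

```shell
#!/bin/sh
# check_inputs DIR: report which of the expected input files exist in DIR.
# Returns 0 if all files are present and 1 if any is missing.
check_inputs() {
  dir="$1"
  missing=0
  for f in ex1.fa ex1.sam.gz run_script.sh; do
    if [ -f "$dir/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_inputs . || exit 1` before any Samtools command would stop the run early and leave a readable message in the logs if a file was uploaded to the wrong place.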

When you're finished uploading the input files, click "Done".

6. Click "Create"

When you have finished filling out the form fields, click "Create" to submit the request to create the Samtools instance.

7. Wait and Click "View projects"

Wait for a few seconds while the project is being created.

After the project is created, click "View projects" to open the project list of the compute unit you selected.

Starting the Samtools Instance

1. Navigate to the instance page

When your Samtools project is created, click on the "View projects" button.

Click the "Start" button on the instance's entry in the list.

2. Confirm "Start"

Monitoring Logs & Viewing the Instance Output

1. Navigate to the instance page

Click "Instance".

2.1 View Logs

View and monitor the project's logs in the last section of the Samtools instance page.

2.2 View Software Output Files

To view the outputs of the software, click the "Browse" button under the "Volumes" section of the instance page.

The output files are written here.

3. Download any file

You can download any file: open the more options menu of the file and click "Download".

Click "Download".

Terminating the Samtools Instance

1. Navigate to the compute unit page

To terminate the instance, navigate to the compute unit page.

2. Click "Terminate"

Under Projects, click the more options button of the instance.

Click "Terminate".

3. Confirm the termination

Confirm by clicking the "Terminate" button.

Note that once you terminate a project, its files may be removed, depending on the path where they are stored. Files stored under a path starting with "/volume", as in this test case, remain accessible as long as the compute unit is running. When you terminate the compute unit, that data is lost as well. To avoid losing the output files of your projects, back them up by taking at least one of these actions:

  1. Download the required file to your device,
  2. Transfer the files to your Bucket Storage, either AWS-S3 or GCP-Bucket.
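Either option is simpler when the outputs are bundled into a single archive first. The sketch below assumes a POSIX shell with `tar` available, for instance as a final step in the run script. The `archive_outputs` function name and the paths passed to it are hypothetical.

```shell
#!/bin/sh
# archive_outputs SRC_DIR ARCHIVE: bundle everything under SRC_DIR into a
# gzip-compressed tar file at ARCHIVE.
archive_outputs() {
  src="$1"
  archive="$2"
  # -C changes into SRC_DIR first, so paths inside the archive are relative.
  tar -czf "$archive" -C "$src" .
}
```

Write the archive to a location outside the directory being archived, so that `tar` does not try to include the archive in itself. The resulting single file can then be downloaded or transferred like any other file.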