Constellation Classifier

Scorpio

Serious Constellations Of Reoccurring Phylogenetically-Independent Origin

A Python-based command-line tool for rapid classification of SARS-CoV-2 sequences using mutation constellations. It is used to detect variants of concern (VOCs) and to track key mutation patterns.

Latest Release: v0.3.19
GitHub Stars: 39
License: GPL-3.0
Core Features

Four Primary Commands

Scorpio provides specialized commands for different analysis workflows.

classify

Evaluates sequences against lineage-defining mutation patterns (constellations) and reports matches.

$ scorpio classify -i sequences.fasta -o output.csv

haplotype

Generates haplotype representations as strings or tabular data for analysis.

$ scorpio haplotype -i sequences.fasta --output-format table
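
To make the idea of a haplotype string versus a haplotype table concrete, the sketch below encodes one call per site. The `CALL_SYMBOLS` mapping and the helpers `haplotype_string` and `haplotype_row` are hypothetical names chosen for illustration; Scorpio's actual output encoding may well differ.

```python
# Illustrative only: the symbols below are an assumed encoding, not Scorpio's.
CALL_SYMBOLS = {"ref": "0", "alt": "1", "ambig": "N"}


def haplotype_string(sites, calls):
    """Return one character per site, in the order the sites are listed.

    `calls` maps a site code to "ref", "alt", or "ambig"; sites with
    no recorded call are treated as ambiguous.
    """
    return "".join(CALL_SYMBOLS[calls.get(site, "ambig")] for site in sites)


def haplotype_row(name, sites, calls):
    """Tabular form: one column per site, keyed by site code."""
    row = {"query": name}
    row.update({site: calls.get(site, "ambig") for site in sites})
    return row


sites = ["s:K417N", "s:E484A", "s:N501Y"]
calls = {"s:K417N": "alt", "s:E484A": "ref"}
print(haplotype_string(sites, calls))  # prints 10N
```

The string form is compact and easy to group by, while the row form keeps each site addressable by name.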

list

Outputs constellation metadata including lineage names and classification rules.

$ scorpio list -c constellations/

define

Extracts shared mutations from grouped sequences with optional outgroup comparison.

$ scorpio define -i sequences.fasta --groups metadata.csv
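
The idea behind extracting shared mutations with an outgroup can be sketched with plain set operations. This is a minimal illustration, not Scorpio's implementation: the function name `shared_mutations`, the `min_fraction` parameter, and the representation of each sequence as a set of mutation codes are all assumptions.

```python
from collections import Counter


def shared_mutations(group, outgroup=(), min_fraction=1.0):
    """Mutations present in at least `min_fraction` of `group` sequences
    and absent from every sequence in `outgroup`.

    Each sequence is represented as a set of mutation codes.
    """
    group = list(group)
    if not group:
        return []
    # Count in how many group sequences each mutation occurs.
    counts = Counter(m for muts in group for m in set(muts))
    threshold = min_fraction * len(group)
    common = {m for m, n in counts.items() if n >= threshold}
    # Anything seen in the outgroup is not specific to the group.
    seen_in_outgroup = set().union(*outgroup) if outgroup else set()
    return sorted(common - seen_in_outgroup)


delta_like = [
    {"s:L452R", "s:T478K", "s:D614G"},
    {"s:L452R", "s:T478K", "s:D614G", "s:P681R"},
]
outgroup = [{"s:D614G"}]
print(shared_mutations(delta_like, outgroup))  # ['s:L452R', 's:T478K']
```

Lowering `min_fraction` tolerates mutations that are missing from a few sequences in the group, for example because of low coverage.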
Getting Started

Installation

Scorpio can be installed via Bioconda (recommended) or from the GitHub repository. It is designed to work alongside Pangolin for lineage analysis.

1. Install via Bioconda: the easiest method, with all dependencies managed.
2. Install Constellations: download the SARS-CoV-2-specific constellation definitions.
3. Run Classification: classify your sequences against known variant patterns.

Terminal
# Install via conda (recommended)
$ conda install -c bioconda scorpio
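# Or install from source
$ git clone https://github.com/cov-lineages/scorpio.git
$ cd scorpio
$ conda env create -f environment.yml
# Activate the new environment before installing (name as set in environment.yml)
$ conda activate scorpio
$ pip install .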
# Classify sequences
$ scorpio classify -i sequences.fasta -o results.csv
# Example output
taxon,constellation,support,conflict
seq1,Omicron (BA.2-like),0.95,0.02
seq2,Delta (AY.4-like),0.98,0.01
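
A results file in the shape of the example output above can be post-processed with the standard library. The following is a minimal sketch that assumes the column names shown in that example (`taxon`, `constellation`, `support`, `conflict`); the helper `confident_calls` and its thresholds are hypothetical and should be adjusted to the real output.

```python
import csv
import io

# Sample rows copied from the example output; for a real run, replace
# the StringIO with open("results.csv", newline="").
EXAMPLE = """taxon,constellation,support,conflict
seq1,Omicron (BA.2-like),0.95,0.02
seq2,Delta (AY.4-like),0.98,0.01
"""


def confident_calls(handle, min_support=0.9, max_conflict=0.05):
    """Yield (taxon, constellation) for rows passing both thresholds."""
    for row in csv.DictReader(handle):
        support = float(row["support"])
        conflict = float(row["conflict"])
        if support >= min_support and conflict <= max_conflict:
            yield row["taxon"], row["constellation"]


calls = list(confident_calls(io.StringIO(EXAMPLE)))
print(calls)
```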
Mutation Patterns

Understanding Constellations

Constellations are JSON-formatted files defining mutation patterns that characterize specific variants. Each constellation specifies sites to check and classification rules.

Sites: mutation codes in the format gene:[ref]position[alt] (e.g., s:N501Y).

Rules: thresholds such as minimum or maximum counts of reference, alternate, or ambiguous calls.

Metadata: name, description, WHO label, and citations for each constellation.
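
The site code format described above, gene:[ref]position[alt], can be split into its parts with a regular expression. This is a sketch under stated assumptions rather than Scorpio's own parser: `SITE_RE`, `Site`, and `parse_site` are hypothetical names, and only substitution and deletion codes are covered.

```python
import re
from typing import NamedTuple, Optional

# Covers substitution and deletion codes such as s:N501Y or s:H69-.
# Insertion codes (e.g. s:ins214EPE) are deliberately not handled here.
SITE_RE = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+):(?P<ref>[A-Z*]?)(?P<pos>\d+)(?P<alt>[A-Z*-]?)$"
)


class Site(NamedTuple):
    gene: str
    ref: Optional[str]
    position: int
    alt: Optional[str]


def parse_site(code: str) -> Site:
    """Split a site code into gene, reference, position, and alternate."""
    match = SITE_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognised site code: {code!r}")
    return Site(
        gene=match["gene"],
        ref=match["ref"] or None,      # empty string means "not given"
        position=int(match["pos"]),
        alt=match["alt"] or None,
    )


print(parse_site("s:N501Y"))
```

Raising on unrecognised codes, rather than skipping them silently, makes it obvious when a constellation uses a form the sketch does not cover.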

constellation.json
{
  "name": "Omicron (BA.1-like)",
  "description": "Omicron BA.1 variant",
  "citation": "WHO designation",
  "sites": [
    "s:A67V",
    "s:H69-",
    "s:V70-",
    "s:T95I",
    "s:G142D",
    "s:N211-",
    "s:ins214EPE",
    "s:G339D",
    "s:S371L",
    "s:S373P",
    "s:S375F",
    "s:K417N",
    "s:N440K",
    "s:G446S",
    "s:S477N",
    "s:T478K",
    "s:E484A",
    "s:Q493R",
    "s:G496S",
    "s:Q498R",
    "s:N501Y",
    "s:Y505H"
  ],
  "rules": {
    "min_alt": 20,
    "max_ref": 2
  }
}
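
To show how `min_alt` and `max_ref` rules like those in the file above could be applied, the sketch below counts calls per site and checks them against the thresholds. It is an illustration, not Scorpio's implementation: the `passes_rules` helper, the `"ref"`/`"alt"` call labels, and the shortened three-site constellation are assumptions made for the example.

```python
import json

# A shortened constellation with the same structure as constellation.json.
CONSTELLATION = json.loads("""
{
  "name": "Example",
  "sites": ["s:K417N", "s:E484A", "s:N501Y"],
  "rules": {"min_alt": 2, "max_ref": 1}
}
""")


def passes_rules(constellation, calls):
    """Return True if the per-site calls satisfy the constellation's rules.

    `calls` maps a site code to "ref" or "alt"; any other value (or a
    missing site) counts as neither.
    """
    sites = constellation["sites"]
    rules = constellation.get("rules", {})
    n_alt = sum(calls.get(site) == "alt" for site in sites)
    n_ref = sum(calls.get(site) == "ref" for site in sites)
    if n_alt < rules.get("min_alt", 0):
        return False
    if "max_ref" in rules and n_ref > rules["max_ref"]:
        return False
    return True


calls = {"s:K417N": "alt", "s:E484A": "alt", "s:N501Y": "ref"}
print(passes_rules(CONSTELLATION, calls))  # True
```

With the full file above, `min_alt` of 20 out of 22 sites means up to two sites may be reference or ambiguous before the match is rejected.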
