COSMIC Database Toolkit

A skill for accessing COSMIC cancer mutation data. Includes structured workflows, validation checks, and reusable patterns for scientific computing.

Skill · Cliptics · scientific · v1.0.0 · MIT

A scientific computing skill for querying COSMIC (Catalogue of Somatic Mutations in Cancer) — the world's largest database of somatic mutations in human cancer, maintained by the Wellcome Sanger Institute. COSMIC Database Toolkit helps you search for cancer-associated mutations, retrieve mutation frequency data, and analyze mutational signatures across cancer types.

When to Use This Skill

Choose COSMIC Database Toolkit when:

  • Searching for somatic mutations associated with specific cancer types
  • Looking up mutation frequencies for known cancer genes (TP53, KRAS, EGFR)
  • Retrieving cancer gene census data for oncogenes and tumor suppressors
  • Analyzing mutational patterns and signatures in specific tumor types

Consider alternatives when:

  • You need germline variant data (use ClinVar or gnomAD)
  • You need clinical trial data for cancer drugs (use ClinicalTrials.gov)
  • You need gene expression in cancer (use TCGA via GDC)
  • You need drug sensitivity data (use GDSC or DepMap)

Quick Start

    claude "Find the most common TP53 mutations in lung cancer from COSMIC"

    import requests

    # COSMIC API requires authentication
    # Register at cancer.sanger.ac.uk for API access
    headers = {
        "Authorization": "Bearer YOUR_API_TOKEN"
    }

    # Search for TP53 mutations in lung cancer
    url = "https://cancer.sanger.ac.uk/cosmic/api/v1/mutations"
    params = {
        "gene": "TP53",
        "tumour_site": "lung",
        "sort": "count:desc",
        "limit": 10
    }

    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()  # stop on 401/403/429 instead of parsing an error body
    mutations = response.json()

    for mut in mutations:
        print(f"Mutation: {mut['mutation_cds']}")
        print(f"  Protein: {mut['mutation_aa']}")
        print(f"  Count: {mut['count']} samples")
        print(f"  Type: {mut['mutation_description']}")

Core Concepts

COSMIC Data Categories

| Category | Description | Example |
|---|---|---|
| Cancer Gene Census | Curated list of cancer genes | TP53, KRAS, BRCA1 |
| Somatic Mutations | Point mutations, indels | TP53 R248W |
| Copy Number | Amplifications, deletions | ERBB2 amplification |
| Gene Fusions | Translocation-derived fusions | BCR-ABL1 |
| Mutational Signatures | Patterns of base changes | SBS1 (clock-like), SBS4 (smoking) |
| Drug Resistance | Mutations conferring resistance | EGFR T790M |

Cancer Gene Census

    import requests

    # The Cancer Gene Census: curated cancer genes
    def get_cancer_gene_census(headers):
        """Retrieve the COSMIC Cancer Gene Census"""
        url = "https://cancer.sanger.ac.uk/cosmic/api/v1/cancer-gene-census"
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        census = response.json()

        # A gene annotated with both roles is counted in both lists
        oncogenes = [g for g in census if "oncogene" in g.get("role", "").lower()]
        tsgs = [g for g in census if "tsg" in g.get("role", "").lower()]

        print(f"Total cancer genes: {len(census)}")
        print(f"Oncogenes: {len(oncogenes)}")
        print(f"Tumor suppressors: {len(tsgs)}")
        return census

    # Lookup specific gene
    def gene_mutation_profile(gene, headers):
        """Get mutation profile for a cancer gene"""
        url = f"https://cancer.sanger.ac.uk/cosmic/api/v1/gene/{gene}"
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        return response.json()

Mutational Signatures

    # COSMIC Mutational Signatures (SBS, DBS, ID)
    signatures = {
        "SBS1": "Spontaneous deamination (age-related, clock-like)",
        "SBS2": "APOBEC activity",
        "SBS4": "Tobacco smoking",
        "SBS6": "Defective DNA mismatch repair (MSI)",
        "SBS7a": "UV light exposure (melanoma)",
        "SBS10a": "POLE proofreading deficiency",
        "SBS13": "APOBEC activity (alternative)",
        "SBS22": "Aristolochic acid exposure",
    }

    # Dominant signatures by cancer type
    cancer_signatures = {
        "melanoma": ["SBS7a", "SBS7b"],
        "lung_squamous": ["SBS4", "SBS2"],
        "colorectal_MSI": ["SBS6", "SBS15"],
        "breast": ["SBS1", "SBS2", "SBS13"],
    }
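SBS signatures are defined over 96 channels: six pyrimidine-centred substitution classes, each with 16 possible flanking-base contexts. The sketch below shows one way to turn a single substitution into its channel label. The function name `sbs_channel` is illustrative and not part of any COSMIC tooling. It assumes you already have the reference trinucleotide around the mutated base.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_channel(context: str, alt: str) -> str:
    """Return an SBS-96 style label such as 'A[C>T]G' for one substitution.

    context: reference trinucleotide with the mutated base in the middle.
    alt:     the alternate base observed at the middle position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alt")
    if any(base not in COMPLEMENT for base in context + alt):
        raise ValueError("only A, C, G and T are supported")
    ref = context[1]
    if ref == alt:
        raise ValueError("ref and alt are identical")
    # If the reference base is a purine (A or G), report the change on the
    # opposite strand so the mutated base is always a pyrimidine (C or T).
    if ref in "AG":
        context = "".join(COMPLEMENT[base] for base in reversed(context))
        alt = COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


print(sbs_channel("ACG", "T"))  # A[C>T]G
print(sbs_channel("CGT", "A"))  # A[C>T]G (same event seen from the other strand)
```

Counting these labels across all substitutions in a sample gives the 96-channel profile that signature fitting tools take as input.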

Configuration

| Parameter | Description | Default |
|---|---|---|
| api_token | COSMIC API authentication token | Required |
| genome_build | GRCh37 or GRCh38 | GRCh38 |
| result_limit | Max results per query | 100 |
| tumour_site | Filter by cancer type | None (all) |
| mutation_type | SNV, insertion, deletion, complex | None (all) |
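One way to carry these settings in code is a small validated container. This is a minimal sketch: the class name `CosmicConfig`, the `query_params` helper, and the query-string keys `genome_build` and `mutation_type` are assumptions made for illustration, so check them against the API you are calling.

```python
from dataclasses import dataclass
from typing import Optional

VALID_BUILDS = {"GRCh37", "GRCh38"}
VALID_MUTATION_TYPES = {"SNV", "insertion", "deletion", "complex"}


@dataclass(frozen=True)
class CosmicConfig:
    """Hypothetical settings holder mirroring the table above."""

    api_token: str
    genome_build: str = "GRCh38"
    result_limit: int = 100
    tumour_site: Optional[str] = None    # None means all sites
    mutation_type: Optional[str] = None  # None means all types

    def __post_init__(self):
        if not self.api_token:
            raise ValueError("api_token is required")
        if self.genome_build not in VALID_BUILDS:
            raise ValueError(f"genome_build must be one of {sorted(VALID_BUILDS)}")
        if self.result_limit < 1:
            raise ValueError("result_limit must be at least 1")
        if (self.mutation_type is not None
                and self.mutation_type not in VALID_MUTATION_TYPES):
            raise ValueError(f"unknown mutation_type: {self.mutation_type}")

    def query_params(self) -> dict:
        """Build request parameters, omitting filters left at None."""
        params = {"genome_build": self.genome_build, "limit": self.result_limit}
        if self.tumour_site is not None:
            params["tumour_site"] = self.tumour_site
        if self.mutation_type is not None:
            params["mutation_type"] = self.mutation_type
        return params


config = CosmicConfig(api_token="YOUR_API_TOKEN", tumour_site="lung")
print(config.query_params())
```

Validating at construction time means a typo such as `hg19` fails immediately rather than after a request has been sent.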

Best Practices

  1. Use the Cancer Gene Census as your starting gene list. The CGC is expertly curated — start with known cancer genes rather than searching the full COSMIC database. It distinguishes oncogenes from tumor suppressors, guiding interpretation.

  2. Filter by primary tissue type. A mutation's significance varies by cancer type. KRAS G12D is common in pancreatic cancer but rare in melanoma. Always contextualize mutation frequencies by the specific cancer type under investigation.

  3. Check sample count, not just mutation presence. COSMIC reports how many samples carry each mutation. A mutation found in 1 of 50,000 screened samples is very different from one found in 1,000 of them. Use frequency data (mutated samples divided by samples screened) to prioritize recurrent mutations.

  4. Combine COSMIC with functional databases. COSMIC catalogues mutations but doesn't always assess their functional impact. Cross-reference top mutations with functional data from DepMap (essentiality), OncoKB (clinical actionability), or ClinVar (pathogenicity).

  5. Account for detection bias. Highly studied genes (TP53, KRAS) have more data than rarely studied genes. High mutation counts may reflect high screening frequency rather than biological importance. Consider the number of samples screened for each gene.
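Practices 3 and 5 both come down to dividing the mutation count by the number of samples screened before comparing mutations. The sketch below illustrates that. The record keys `mutation`, `count` and `screened` are hypothetical, and the numbers are made up for illustration, not taken from COSMIC.

```python
def mutation_frequency(mutated_samples: int, samples_screened: int) -> float:
    """Fraction of screened samples that carry the mutation."""
    if samples_screened <= 0:
        raise ValueError("samples_screened must be positive")
    if not 0 <= mutated_samples <= samples_screened:
        raise ValueError("mutated_samples must be between 0 and samples_screened")
    return mutated_samples / samples_screened


def rank_by_frequency(records, min_screened=100):
    """Sort records by frequency, skipping those with too few screened samples.

    Each record is a dict with hypothetical keys 'mutation', 'count', 'screened'.
    """
    kept = [r for r in records if r["screened"] >= min_screened]
    return sorted(
        kept,
        key=lambda r: mutation_frequency(r["count"], r["screened"]),
        reverse=True,
    )


# Made-up numbers for illustration only.
example = [
    {"mutation": "variant_A", "count": 5, "screened": 50},
    {"mutation": "variant_B", "count": 10, "screened": 1000},
    {"mutation": "variant_C", "count": 300, "screened": 1000},
]
for record in rank_by_frequency(example):
    freq = mutation_frequency(record["count"], record["screened"])
    print(f"{record['mutation']}: {freq:.1%}")
```

The `min_screened` cut-off is a design choice: a 10% frequency from 50 samples is far less reliable than 1% from 1,000, so sparsely screened records are set aside rather than ranked at the top.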

Common Issues

API access denied or rate limited. COSMIC requires registration for API access. Free academic accounts have rate limits — add delays between requests. For large-scale data needs, download the COSMIC data files directly rather than using the API.
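Adding delays between requests can be as simple as the helper below. It is a sketch, not a COSMIC client: `fetch_all` is a hypothetical name, and `fetch` stands for whatever function you use to issue a single request. The `sleep` parameter exists only so the pause can be replaced in tests.

```python
import time


def fetch_all(fetch, queries, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(query) for each query, pausing between consecutive calls."""
    results = []
    for index, query in enumerate(queries):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first request
        results.append(fetch(query))
    return results


# Stand-in for a real request function, so the example runs offline.
def fake_fetch(gene):
    return {"gene": gene}


print(fetch_all(fake_fetch, ["TP53", "KRAS", "EGFR"], delay_seconds=0.1))
```

A suitable delay depends on the limits attached to your account, which are not specified here.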

Mutation nomenclature doesn't match between databases. COSMIC uses its own mutation IDs (COSV/COSM) alongside HGVS nomenclature. When cross-referencing with ClinVar or gnomAD, map via genomic coordinates (chromosome, position, ref, alt) rather than mutation names.
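Mapping via genomic coordinates might look like the sketch below. The record keys (`chrom`, `pos`, `ref`, `alt`, `id`) are hypothetical and the example coordinates are placeholders, not real variants. It assumes both datasets use the same genome build and the same indel representation, since neither is reconciled here.

```python
def variant_key(chrom, pos, ref, alt):
    """Normalize a variant into a comparable (chrom, pos, ref, alt) tuple."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # treat 'chr17' and '17' as the same chromosome
    return (chrom.upper(), int(pos), ref.upper(), alt.upper())


def match_by_coordinates(cosmic_records, other_records):
    """Pair records from two sources that describe the same genomic change."""
    index = {
        variant_key(r["chrom"], r["pos"], r["ref"], r["alt"]): r
        for r in other_records
    }
    matches = []
    for record in cosmic_records:
        key = variant_key(record["chrom"], record["pos"], record["ref"], record["alt"])
        if key in index:
            matches.append((record, index[key]))
    return matches


# Placeholder records for illustration only.
cosmic = [{"id": "COSMIC_EXAMPLE_1", "chrom": "17", "pos": 1000, "ref": "G", "alt": "A"}]
other = [
    {"id": "OTHER_EXAMPLE_1", "chrom": "chr17", "pos": "1000", "ref": "g", "alt": "a"},
    {"id": "OTHER_EXAMPLE_2", "chrom": "chr17", "pos": "1000", "ref": "G", "alt": "T"},
]
for left, right in match_by_coordinates(cosmic, other):
    print(left["id"], "<->", right["id"])
```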

Different cancer type classifications between studies. COSMIC uses specific primary site and histology classifications that may not match other databases' terminology. Lung cancer in COSMIC splits into multiple subtypes — search by primary site first, then refine by histology.
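The "primary site first, then histology" approach can be sketched as a two-stage filter. The record keys `primary_site` and `histology` and the sample values are illustrative assumptions, so substitute the field names and vocabulary used in your COSMIC export.

```python
def filter_samples(samples, primary_site, histology=None):
    """Select samples by primary site, then optionally narrow by histology."""
    site = primary_site.lower()
    selected = [s for s in samples if s["primary_site"].lower() == site]
    if histology is not None:
        wanted = histology.lower()
        selected = [s for s in selected if s["histology"].lower() == wanted]
    return selected


# Illustrative records only.
samples = [
    {"sample": "S1", "primary_site": "lung", "histology": "adenocarcinoma"},
    {"sample": "S2", "primary_site": "lung", "histology": "squamous_cell_carcinoma"},
    {"sample": "S3", "primary_site": "skin", "histology": "malignant_melanoma"},
]

all_lung = filter_samples(samples, "lung")
lung_adeno = filter_samples(samples, "Lung", histology="Adenocarcinoma")
print(len(all_lung), len(lung_adeno))  # 2 1
```

Running the broad site query first shows which histology labels are present before you commit to one.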
