Advanced bioRxiv Database

All-in-one skill covering efficient database search and retrieval tooling. Includes structured workflows, validation checks, and reusable patterns for scientific computing.

Skill · Cliptics · scientific · v1.0.0 · MIT

Advanced bioRxiv Database

A scientific computing skill for searching and retrieving preprints from bioRxiv — the preprint server for biology. Advanced bioRxiv Database helps you search by keywords, authors, date ranges, and subject areas, and retrieve full metadata including abstracts, DOIs, and publication status.

When to Use This Skill

Choose Advanced bioRxiv Database when:

  • Searching for recent preprints by topic, author, or date range
  • Monitoring new preprints in specific subject areas
  • Retrieving preprint metadata (abstract, DOI, dates, publication status)
  • Building literature review pipelines that include preprints

Consider alternatives when:

  • You need peer-reviewed papers only (use PubMed)
  • You need full-text access (use publisher APIs)
  • You're searching chemistry preprints (use ChemRxiv)
  • You need physics/math preprints (use arXiv)

Quick Start

```
claude "Search bioRxiv for recent CRISPR gene therapy preprints"
```

```python
import requests
from datetime import datetime, timedelta

# bioRxiv API — content detail endpoint
base_url = "https://api.biorxiv.org/details/biorxiv"

# Search by date range
end_date = datetime.now().strftime("%Y-%m-%d")
start_date = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")

response = requests.get(f"{base_url}/{start_date}/{end_date}/0/25")
data = response.json()

print(f"Total preprints in range: {data['messages'][0]['total']}")
for paper in data["collection"]:
    print(f"\n{paper['title']}")
    print(f"  DOI: {paper['doi']}")
    print(f"  Category: {paper['category']}")
    print(f"  Date: {paper['date']}")
    print(f"  Published: {paper.get('published', 'Not yet')}")
```

Core Concepts

bioRxiv API Endpoints

| Endpoint | Purpose | Format |
|---|---|---|
| `/details/biorxiv/{start}/{end}` | Preprints by date range | YYYY-MM-DD |
| `/details/biorxiv/{start}/{end}/{cursor}/{count}` | Paginated results | cursor=0, count=100 |
| `/pubs/biorxiv/{start}/{end}` | Published preprints only | YYYY-MM-DD |
| `/publisher/{prefix}/{start}/{end}` | By publisher DOI prefix | e.g., 10.1038 |
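
As a quick illustration of how the path segments in the table compose into a request URL, here is a minimal helper. The function name `details_url` and the `server` parameter default are our own conventions, not part of the API itself:

```python
API_BASE = "https://api.biorxiv.org"

def details_url(start, end, cursor=0, count=100, server="biorxiv"):
    """Compose a paginated details-endpoint URL for a YYYY-MM-DD date range."""
    return f"{API_BASE}/details/{server}/{start}/{end}/{cursor}/{count}"

print(details_url("2024-01-01", "2024-01-31"))
# → https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31/0/100
```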

Keyword Search Implementation

```python
import requests
from datetime import datetime, timedelta

def search_biorxiv(keyword, days=30, max_results=100):
    """Search bioRxiv preprints by keyword in title/abstract."""
    end = datetime.now().strftime("%Y-%m-%d")
    start = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
    results = []
    cursor = 0
    batch_size = 100
    while len(results) < max_results:
        url = f"https://api.biorxiv.org/details/biorxiv/{start}/{end}/{cursor}/{batch_size}"
        resp = requests.get(url)
        data = resp.json()
        if not data.get("collection"):
            break
        # The API has no full-text search, so filter client-side on metadata
        for paper in data["collection"]:
            text = f"{paper['title']} {paper['abstract']}".lower()
            if keyword.lower() in text:
                results.append(paper)
        cursor += batch_size
        total = int(data["messages"][0]["total"])
        if cursor >= total:
            break
    return results[:max_results]

# Search for CRISPR papers
crispr_papers = search_biorxiv("CRISPR", days=30)
print(f"Found {len(crispr_papers)} CRISPR preprints in last 30 days")
```

Publication Tracking

```python
import requests

def check_publication_status(dois):
    """Check if bioRxiv preprints have been published in journals."""
    published = []
    for doi in dois:
        # Use the pubs endpoint to check publication status
        url = f"https://api.biorxiv.org/pubs/biorxiv/{doi}"
        resp = requests.get(url)
        data = resp.json()
        if data.get("collection"):
            pub = data["collection"][0]
            if pub.get("published_doi"):
                published.append({
                    "preprint_doi": doi,
                    "journal": pub.get("published_journal", "Unknown"),
                    "published_doi": pub["published_doi"],
                    "published_date": pub.get("published_date", "Unknown"),
                })
    return published
```

Configuration

| Parameter | Description | Default |
|---|---|---|
| `api_base_url` | bioRxiv API base URL | `https://api.biorxiv.org` |
| `batch_size` | Results per API call (max 100) | 100 |
| `date_range_days` | Default search window | 30 |
| `include_abstracts` | Include full abstracts in results | true |
| `server` | `biorxiv` or `medrxiv` | `biorxiv` |
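
A minimal sketch of these parameters as a Python settings dict; the `CONFIG` name is hypothetical, so adapt it to however your pipeline stores configuration:

```python
# Hypothetical configuration mapping for the parameters above
CONFIG = {
    "api_base_url": "https://api.biorxiv.org",
    "batch_size": 100,          # the API returns at most 100 per call
    "date_range_days": 30,      # default search window
    "include_abstracts": True,
    "server": "biorxiv",        # or "medrxiv"
}
```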

Best Practices

  1. Paginate through all results. The bioRxiv API returns at most 100 results per request. Use the cursor parameter to fetch subsequent pages. Check the total field in the response to know when you've retrieved everything.

  2. Cache results for repeated searches. bioRxiv API doesn't enforce strict rate limits, but repeated queries for the same date range waste bandwidth. Cache responses locally with timestamps and only re-fetch when the date range extends beyond your cache.

  3. Use date ranges strategically. Narrow date ranges return results faster. For monitoring new preprints, query the last 1-7 days rather than large date ranges. For comprehensive literature reviews, query month by month to manage result volumes.

  4. Check publication status for citing preprints. Before citing a bioRxiv preprint, check if the peer-reviewed version has been published using the /pubs endpoint. Citing the published version is preferred when available.

  5. Combine with PubMed for comprehensive coverage. bioRxiv only has preprints. For a complete literature review, search both bioRxiv (for recent, unpublished work) and PubMed (for peer-reviewed published work). Deduplicate by DOI.
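
Practice 1 (pagination) can be sketched as a generator that advances the cursor until the reported total is exhausted. The `fetch_page` argument is a hypothetical seam, not part of the API: in production it would wrap `requests.get` on a details-endpoint URL, while tests can pass a stub:

```python
def iter_preprints(fetch_page, batch_size=100):
    """Yield preprints across pages, advancing the cursor until `total` is reached.

    `fetch_page(cursor)` is any callable returning one parsed API response:
    a dict with "collection" and "messages" keys, as the details endpoint returns.
    """
    cursor = 0
    while True:
        data = fetch_page(cursor)
        collection = data.get("collection") or []
        if not collection:
            break
        yield from collection
        cursor += batch_size
        total = int(data["messages"][0]["total"])
        if cursor >= total:
            break
```

In production, `fetch_page` might be `lambda cursor: requests.get(f"{base_url}/{start}/{end}/{cursor}/100").json()`.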
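
Practice 2 (caching) might look like the following file-backed sketch; the cache directory name and the 24-hour TTL are assumptions, not anything mandated by the bioRxiv API:

```python
import json
import time
from pathlib import Path

CACHE_DIR = Path(".biorxiv_cache")   # hypothetical local cache location
CACHE_TTL = 24 * 3600                # re-fetch after one day

def cached_fetch(key, fetch, ttl=CACHE_TTL):
    """Return a cached JSON response if fresh; otherwise call `fetch` and store it."""
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / f"{key}.json"
    if path.exists() and time.time() - path.stat().st_mtime < ttl:
        return json.loads(path.read_text())
    data = fetch()
    path.write_text(json.dumps(data))
    return data
```

A natural cache key is the date range itself, e.g. `"2024-01-01_2024-01-31_0"`.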
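
Practice 5's deduplication step can be sketched as a DOI-keyed merge that prefers the second source's record on collision (here, the peer-reviewed one). It assumes each record carries a `doi` field; DOI comparison is case-insensitive, since DOI case is not significant:

```python
def merge_by_doi(preprints, pubmed_records):
    """Merge two result sets, preferring the peer-reviewed record when DOIs collide."""
    merged = {p["doi"].lower(): p for p in preprints}
    merged.update({r["doi"].lower(): r for r in pubmed_records})
    return list(merged.values())
```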

Common Issues

API returns empty collection for valid date ranges. The API has a maximum date range per request (typically 1 month). Split longer ranges into monthly chunks and combine results. Also verify the date format is YYYY-MM-DD.
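
Splitting a long range into monthly chunks can be done with a small helper like this sketch (`monthly_ranges` is our own name, not an API feature):

```python
from datetime import date, timedelta

def monthly_ranges(start, end):
    """Split [start, end] into calendar-month chunks the API will accept."""
    ranges = []
    chunk_start = start
    while chunk_start <= end:
        # First day of the following month, then step back one day
        next_month = (chunk_start.replace(day=1) + timedelta(days=32)).replace(day=1)
        chunk_end = min(next_month - timedelta(days=1), end)
        ranges.append((chunk_start.isoformat(), chunk_end.isoformat()))
        chunk_start = next_month
    return ranges
```

Each `(start, end)` pair can then be fed to the details endpoint and the results concatenated.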

Keyword search misses relevant preprints. The bioRxiv API doesn't support full-text search — you can only filter client-side on the returned metadata. Use broad date ranges to capture more papers, then filter locally by matching keywords against title and abstract text.

Rate limiting during bulk downloads. While bioRxiv's API is generally permissive, rapid-fire requests may be throttled. Add a 1-second delay between paginated requests for bulk operations. For very large downloads, use the bioRxiv data dumps available on their FTP server.
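
The one-second delay can be sketched as a generic wrapper that spaces out successive calls; this is a general throttling pattern rather than anything bioRxiv-specific, and the `throttled` name is ours:

```python
import time

def throttled(fn, min_interval=1.0):
    """Wrap `fn` so successive calls are at least `min_interval` seconds apart."""
    last_call = [0.0]
    def wrapper(*args, **kwargs):
        wait = min_interval - (time.monotonic() - last_call[0])
        if wait > 0:
            time.sleep(wait)
        last_call[0] = time.monotonic()
        return fn(*args, **kwargs)
    return wrapper
```

For example, `get = throttled(requests.get)` and then call `get(url)` inside the pagination loop.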
