# Advanced bioRxiv Database
A scientific computing skill for searching and retrieving preprints from bioRxiv — the preprint server for biology. Advanced bioRxiv Database helps you search by keywords, authors, date ranges, and subject areas, and retrieve full metadata including abstracts, DOIs, and publication status.
## When to Use This Skill
Choose Advanced bioRxiv Database when:
- Searching for recent preprints by topic, author, or date range
- Monitoring new preprints in specific subject areas
- Retrieving preprint metadata (abstract, DOI, dates, publication status)
- Building literature review pipelines that include preprints
Consider alternatives when:
- You need peer-reviewed papers only (use PubMed)
- You need full-text access (use publisher APIs)
- You're searching chemistry preprints (use ChemRxiv)
- You need physics/math preprints (use arXiv)
## Quick Start

```bash
claude "Search bioRxiv for recent CRISPR gene therapy preprints"
```

```python
import requests
from datetime import datetime, timedelta

# bioRxiv API: content details endpoint
base_url = "https://api.biorxiv.org/details/biorxiv"

# Search by date range
end_date = datetime.now().strftime("%Y-%m-%d")
start_date = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")

response = requests.get(f"{base_url}/{start_date}/{end_date}/0/25")
data = response.json()

print(f"Total preprints in range: {data['messages'][0]['total']}")
for paper in data["collection"]:
    print(f"\n{paper['title']}")
    print(f"  DOI: {paper['doi']}")
    print(f"  Category: {paper['category']}")
    print(f"  Date: {paper['date']}")
    print(f"  Published: {paper.get('published', 'Not yet')}")
```
## Core Concepts

### bioRxiv API Endpoints

| Endpoint | Purpose | Format |
|---|---|---|
| `/details/biorxiv/{start}/{end}` | Preprints by date range | `YYYY-MM-DD` |
| `/details/biorxiv/{start}/{end}/{cursor}/{count}` | Paginated results | `cursor=0`, `count=100` |
| `/pubs/biorxiv/{start}/{end}` | Published preprints only | `YYYY-MM-DD` |
| `/publisher/{prefix}/{start}/{end}` | By publisher DOI prefix | e.g., `10.1038` |
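The endpoint patterns above can be made concrete with simple f-strings. A minimal sketch (the date range and the `10.1038` DOI prefix are placeholder values for illustration):

```python
base = "https://api.biorxiv.org"
start, end = "2024-01-01", "2024-01-31"  # placeholder date range

details_url = f"{base}/details/biorxiv/{start}/{end}"
paginated_url = f"{base}/details/biorxiv/{start}/{end}/0/100"  # cursor=0, count=100
pubs_url = f"{base}/pubs/biorxiv/{start}/{end}"
publisher_url = f"{base}/publisher/10.1038/{start}/{end}"  # example DOI prefix
```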
### Keyword Search Implementation

```python
import requests
from datetime import datetime, timedelta

def search_biorxiv(keyword, days=30, max_results=100):
    """Search bioRxiv preprints by keyword in title/abstract."""
    end = datetime.now().strftime("%Y-%m-%d")
    start = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
    results = []
    cursor = 0
    batch_size = 100
    while len(results) < max_results:
        url = f"https://api.biorxiv.org/details/biorxiv/{start}/{end}/{cursor}/{batch_size}"
        resp = requests.get(url)
        data = resp.json()
        if not data.get("collection"):
            break
        # The API has no server-side keyword search; filter client-side
        for paper in data["collection"]:
            text = f"{paper['title']} {paper['abstract']}".lower()
            if keyword.lower() in text:
                results.append(paper)
        cursor += batch_size
        total = int(data["messages"][0]["total"])
        if cursor >= total:
            break
    return results[:max_results]

# Search for CRISPR papers
crispr_papers = search_biorxiv("CRISPR", days=30)
print(f"Found {len(crispr_papers)} CRISPR preprints in last 30 days")
```
### Publication Tracking

```python
import requests

def check_publication_status(dois):
    """Check if bioRxiv preprints have been published in journals."""
    published = []
    for doi in dois:
        # Use the pubs endpoint to check publication status
        url = f"https://api.biorxiv.org/pubs/biorxiv/{doi}"
        resp = requests.get(url)
        data = resp.json()
        if data.get("collection"):
            pub = data["collection"][0]
            if pub.get("published_doi"):
                published.append({
                    "preprint_doi": doi,
                    "journal": pub.get("published_journal", "Unknown"),
                    "published_doi": pub["published_doi"],
                    "published_date": pub.get("published_date", "Unknown"),
                })
    return published
```
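The parsing step of this workflow can be exercised offline against a mocked response. The field names below follow the response shape used in the function; all sample values (DOIs, journal, dates) are invented for illustration:

```python
# Mocked /pubs response for a single preprint (all values hypothetical)
mock_response = {
    "collection": [{
        "published_doi": "10.1234/example.2024.001",
        "published_journal": "Example Journal",
        "published_date": "2024-05-01",
    }]
}

record = mock_response["collection"][0]
citation = None
if record.get("published_doi"):
    # Preprint has a peer-reviewed version; prefer citing that
    citation = {
        "preprint_doi": "10.1101/2024.01.01.000000",  # hypothetical preprint DOI
        "journal": record.get("published_journal", "Unknown"),
        "published_doi": record["published_doi"],
    }
```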
## Configuration

| Parameter | Description | Default |
|---|---|---|
| `api_base_url` | bioRxiv API base URL | `https://api.biorxiv.org` |
| `batch_size` | Results per API call (max 100) | `100` |
| `date_range_days` | Default search window | `30` |
| `include_abstracts` | Include full abstracts in results | `true` |
| `server` | `biorxiv` or `medrxiv` | `biorxiv` |
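One way these parameters come together is in a URL builder that swaps the `server` segment, since the same endpoint paths serve both bioRxiv and medRxiv. A minimal sketch (the `details_url` helper is an illustration, not part of the skill's API):

```python
config = {
    "api_base_url": "https://api.biorxiv.org",
    "server": "medrxiv",  # swapped from the default "biorxiv"
    "batch_size": 100,
}

def details_url(cfg, start, end, cursor=0):
    """Build a details-endpoint URL for the configured server."""
    return (f"{cfg['api_base_url']}/details/{cfg['server']}/"
            f"{start}/{end}/{cursor}/{cfg['batch_size']}")

url = details_url(config, "2024-06-01", "2024-06-07")
```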
## Best Practices
- **Paginate through all results.** The bioRxiv API returns at most 100 results per request. Use the cursor parameter to fetch subsequent pages, and check the `total` field in the response to know when you've retrieved everything.
- **Cache results for repeated searches.** The bioRxiv API doesn't enforce strict rate limits, but repeated queries for the same date range waste bandwidth. Cache responses locally with timestamps and only re-fetch when the date range extends beyond your cache.
- **Use date ranges strategically.** Narrow date ranges return results faster. For monitoring new preprints, query the last 1-7 days rather than large windows. For comprehensive literature reviews, query month by month to manage result volume.
- **Check publication status before citing preprints.** Before citing a bioRxiv preprint, check whether the peer-reviewed version has been published using the `/pubs` endpoint. Cite the published version when one is available.
- **Combine with PubMed for comprehensive coverage.** bioRxiv hosts only preprints. For a complete literature review, search both bioRxiv (for recent, unpublished work) and PubMed (for peer-reviewed published work), then deduplicate by DOI.
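The deduplication step from the last practice can be sketched as a first-wins merge keyed on normalized DOIs (the sample records below are invented for illustration):

```python
def dedupe_by_doi(*sources):
    """Merge paper lists from multiple sources, keeping the first record per DOI."""
    seen, merged = set(), []
    for papers in sources:
        for paper in papers:
            doi = paper["doi"].lower()
            if doi not in seen:
                seen.add(doi)
                merged.append(paper)
    return merged

# Hypothetical sample records from the two sources
biorxiv_hits = [{"doi": "10.1101/2024.01.01.000000", "title": "Preprint A"}]
pubmed_hits = [{"doi": "10.1101/2024.01.01.000000", "title": "Preprint A"},
               {"doi": "10.1234/example.b", "title": "Paper B"}]

merged = dedupe_by_doi(biorxiv_hits, pubmed_hits)  # 2 unique records
```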
## Common Issues
**API returns empty collection for valid date ranges.** The API has a maximum date range per request (typically 1 month). Split longer ranges into monthly chunks and combine results. Also verify the date format is `YYYY-MM-DD`.
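A minimal sketch of the monthly-chunking workaround, using pure date arithmetic (no API calls):

```python
from datetime import date, timedelta

def monthly_chunks(start, end):
    """Split [start, end] into calendar-month-sized (start, end) ISO-date pairs."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # First day of the next month, then step back one day for month end
        next_month = (chunk_start.replace(day=1) + timedelta(days=32)).replace(day=1)
        chunk_end = min(next_month - timedelta(days=1), end)
        chunks.append((chunk_start.isoformat(), chunk_end.isoformat()))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks

chunks = monthly_chunks(date(2024, 1, 15), date(2024, 3, 10))
# → [('2024-01-15', '2024-01-31'), ('2024-02-01', '2024-02-29'), ('2024-03-01', '2024-03-10')]
```

Each pair can then be fed to the details endpoint and the results concatenated.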
**Keyword search misses relevant preprints.** The bioRxiv API doesn't support full-text search; you can only filter client-side on the returned metadata. Use broad date ranges to capture more papers, then filter locally by matching keywords against title and abstract text.
**Rate limiting during bulk downloads.** While bioRxiv's API is generally permissive, rapid-fire requests may be throttled. Add a 1-second delay between paginated requests for bulk operations. For very large downloads, use the bioRxiv data dumps available on their FTP server.
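A throttled pagination helper along these lines can wrap any page-fetching function. The demo below uses a stand-in fetcher instead of live requests so it runs offline; against the real API you would pass a function that calls `requests.get` and `delay=1.0`:

```python
import time

def fetch_all_pages(fetch_page, total_pages, delay=1.0):
    """Call fetch_page(cursor) for each page, sleeping `delay` seconds between requests."""
    results = []
    for page in range(total_pages):
        results.extend(fetch_page(page * 100))  # bioRxiv cursors advance by batch size
        if page < total_pages - 1:
            time.sleep(delay)  # be polite between paginated requests
    return results

# Demo with a stand-in fetcher and a short delay (use delay=1.0 for the live API)
pages = fetch_all_pages(lambda cursor: [f"doi-{cursor}"], total_pages=3, delay=0.01)
# → ['doi-0', 'doi-100', 'doi-200']
```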