Comprehensive PubMed Database
Search and retrieve biomedical literature from PubMed, the National Library of Medicine's database of more than 35 million citations covering medicine, life sciences, and biomedical research. This skill covers advanced query construction, MeSH term searching, citation retrieval, and building literature analysis workflows.
When to Use This Skill
Choose Comprehensive PubMed Database when you need to:
- Search biomedical literature with advanced Boolean queries and MeSH terms
- Retrieve article metadata, abstracts, and citation data programmatically
- Build systematic review search strategies with reproducible queries
- Monitor new publications on specific topics with saved searches
Consider alternatives when:
- You need full-text article access (use publisher APIs or Unpaywall)
- You need cross-disciplinary academic search (use OpenAlex or Semantic Scholar)
- You need preprints and non-indexed papers (use bioRxiv or Google Scholar)
Quick Start
pip install biopython requests pandas
    from Bio import Entrez

    Entrez.email = "your.email@example.com"  # Required by NCBI; replace this placeholder

    # Search PubMed
    handle = Entrez.esearch(
        db="pubmed",
        term="CRISPR therapy clinical trial",
        retmax=20,
        sort="relevance",
    )
    results = Entrez.read(handle)
    handle.close()
    print(f"Total results: {results['Count']}")

    # Fetch article details (PubMed records are requested as XML via retmode)
    ids = results["IdList"]
    handle = Entrez.efetch(db="pubmed", id=ids, retmode="xml")
    records = Entrez.read(handle)
    handle.close()

    for article in records["PubmedArticle"][:5]:
        medline = article["MedlineCitation"]
        title = medline["Article"]["ArticleTitle"]
        year = medline["Article"]["Journal"]["JournalIssue"]["PubDate"].get("Year", "N/A")
        print(f"[{year}] {title}")
Core Concepts
Search Field Tags
| Tag | Field | Example |
|---|---|---|
| [ti] | Title | cancer[ti] |
| [tiab] | Title/Abstract | CRISPR[tiab] |
| [au] | Author | Zhang F[au] |
| [mesh] | MeSH heading | "Gene Editing"[mesh] |
| [pt] | Publication type | "Clinical Trial"[pt] |
| [dp] | Date of publication | 2024[dp] |
| [la] | Language | english[la] |
| [journal] | Journal name | "Nature"[journal] |
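The field tags in the table compose with Boolean operators into a single query string. The sketch below shows one way to assemble such a string in code. The helper names `tag`, `any_of`, and `all_of` are hypothetical (they are not part of Biopython or the E-utilities), and the quoting rule, which wraps multi-word terms in double quotes so they are searched as a phrase, is a simplifying assumption.

```python
def tag(term, field):
    """Return term[field], quoting multi-word terms so they are searched as a phrase."""
    term = term.strip()
    already_quoted = term.startswith('"') and term.endswith('"')
    if " " in term and not already_quoted:
        term = f'"{term}"'
    return f"{term}[{field}]"


def any_of(*clauses):
    """Join clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses):
    """Join clauses with AND."""
    return " AND ".join(clauses)


# Hypothetical example query built from tags listed in the table
query = all_of(
    tag("CRISPR", "tiab"),
    tag("Gene Editing", "mesh"),
    any_of(tag("2023", "dp"), tag("2024", "dp")),
    tag("english", "la"),
)
print(query)
# CRISPR[tiab] AND "Gene Editing"[mesh] AND (2023[dp] OR 2024[dp]) AND english[la]
```

The resulting string can be passed as the `term` argument of `Entrez.esearch`. Building queries from small functions keeps parentheses balanced and makes the exact search string easy to log.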
Advanced Search Strategies
    from Bio import Entrez
    import pandas as pd

    Entrez.email = "your.email@example.com"  # Required by NCBI; replace this placeholder


    def advanced_pubmed_search(query, max_results=100):
        """Execute an advanced PubMed search with full metadata retrieval."""
        # Search, storing the result set on the NCBI history server
        handle = Entrez.esearch(
            db="pubmed",
            term=query,
            retmax=max_results,
            sort="relevance",
            usehistory="y",
        )
        search_results = Entrez.read(handle)
        handle.close()
        total = int(search_results["Count"])
        webenv = search_results["WebEnv"]
        query_key = search_results["QueryKey"]

        # Fetch records using the history server
        handle = Entrez.efetch(
            db="pubmed",
            webenv=webenv,
            query_key=query_key,
            retmax=max_results,
            retmode="xml",
        )
        records = Entrez.read(handle)
        handle.close()

        articles = []
        for article in records["PubmedArticle"]:
            medline = article["MedlineCitation"]
            art = medline["Article"]

            # Extract authors
            authors = []
            for author in art.get("AuthorList", []):
                name = f"{author.get('LastName', '')} {author.get('Initials', '')}"
                authors.append(name.strip())

            # Extract abstract (may be absent for letters, editorials, older records)
            abstract_parts = art.get("Abstract", {}).get("AbstractText", [])
            abstract = " ".join(str(part) for part in abstract_parts)

            # Publication date
            pub_date = art["Journal"]["JournalIssue"]["PubDate"]
            year = pub_date.get("Year", "N/A")

            articles.append({
                "pmid": str(medline["PMID"]),
                "title": str(art["ArticleTitle"]),
                "authors": "; ".join(authors[:5]),
                "journal": art["Journal"]["Title"],
                "year": year,
                "abstract": abstract[:500],
                "doi": next(
                    (str(eid) for eid in art.get("ELocationID", [])
                     if eid.attributes.get("EIdType") == "doi"),
                    "",
                ),
            })

        return pd.DataFrame(articles), total


    # Complex Boolean search
    query = (
        '"machine learning"[tiab] AND "drug discovery"[tiab] '
        'AND ("2023"[dp] OR "2024"[dp]) '
        'AND "english"[la] AND "journal article"[pt]'
    )
    df, total = advanced_pubmed_search(query, max_results=50)
    print(f"Found {total} total, retrieved {len(df)}")
    print(df[["title", "journal", "year"]].head(10))
Citation Analysis
    from Bio import Entrez
    from collections import Counter

    Entrez.email = "your.email@example.com"  # Required by NCBI; replace this placeholder


    def analyze_search_results(query, max_results=200):
        """Analyze publication patterns for a search query."""
        handle = Entrez.esearch(db="pubmed", term=query, retmax=max_results)
        results = Entrez.read(handle)
        handle.close()
        ids = results["IdList"]
        if not ids:
            print("No results found.")
            return

        handle = Entrez.efetch(db="pubmed", id=ids, retmode="xml")
        records = Entrez.read(handle)
        handle.close()

        journals = []
        years = []
        mesh_terms = []

        for article in records["PubmedArticle"]:
            medline = article["MedlineCitation"]
            art = medline["Article"]
            journals.append(art["Journal"]["Title"])

            year = art["Journal"]["JournalIssue"]["PubDate"].get("Year")
            if year:
                years.append(year)

            # Records not yet indexed have no MeSH headings
            for heading in medline.get("MeshHeadingList", []):
                mesh_terms.append(str(heading["DescriptorName"]))

        print("Top Journals:")
        for journal, count in Counter(journals).most_common(10):
            print(f"  {count:3d} | {journal}")

        print("\nPublications by Year:")
        for year, count in sorted(Counter(years).items()):
            print(f"  {year}: {'█' * count} ({count})")

        print("\nTop MeSH Terms:")
        for term, count in Counter(mesh_terms).most_common(15):
            print(f"  {count:3d} | {term}")


    analyze_search_results('"single cell RNA"[tiab]')
Configuration
| Parameter | Description | Default |
|---|---|---|
| email | Required email for NCBI identification | Required |
| api_key | NCBI API key for higher rate limits | Optional |
| db | Database to search | "pubmed" |
| retmax | Maximum records to retrieve | 20 |
| sort | Sort order (relevance, pub_date) | "relevance" |
| usehistory | Use NCBI history server | "y" |
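One way to keep these settings in a single place is a small configuration object. The sketch below is an illustration only: `SearchConfig` and its methods are hypothetical names, and the defaults simply mirror the table. It avoids importing Biopython so it can run on its own; the `entrez` argument of `apply` is assumed to be the `Bio.Entrez` module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchConfig:
    """Hypothetical container for the parameters listed in the table."""
    email: str
    api_key: Optional[str] = None
    db: str = "pubmed"
    retmax: int = 20
    sort: str = "relevance"
    usehistory: str = "y"

    def apply(self, entrez) -> None:
        """Copy identification settings onto a module-like object such as Bio.Entrez."""
        entrez.email = self.email
        if self.api_key:
            entrez.api_key = self.api_key

    def esearch_kwargs(self, term: str) -> dict:
        """Return keyword arguments suitable for an esearch call."""
        return {
            "db": self.db,
            "term": term,
            "retmax": self.retmax,
            "sort": self.sort,
            "usehistory": self.usehistory,
        }


config = SearchConfig(email="your.email@example.com", retmax=50)
kwargs = config.esearch_kwargs('"Gene Editing"[mesh]')
print(kwargs)
```

With Biopython installed, the intended use would be `config.apply(Entrez)` followed by `Entrez.esearch(**kwargs)`.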
Best Practices
- Use MeSH terms for comprehensive searches: MeSH (Medical Subject Headings) is the controlled vocabulary that the National Library of Medicine uses to index articles. Searching "Neoplasms"[mesh] retrieves indexed cancer-related articles regardless of the specific terms used by authors. Records that have not been indexed yet carry no MeSH headings, so combine MeSH with text-word searches for thorough coverage.
- Set your NCBI API key for higher rate limits: Without an API key, you are limited to 3 requests per second. Register for a free key at NCBI to get 10 requests per second. Set it with Entrez.api_key = "your_key".
- Use the history server for large result sets: For queries returning thousands of results, use usehistory="y" with WebEnv and QueryKey to paginate through results without re-executing the search. Fetch in batches of 500 records.
- Build reproducible search strings: Document your exact search query, date of execution, and number of results returned. This is required for systematic reviews and enables others to replicate your literature search.
- Combine field tags with Boolean operators: Use AND/OR/NOT with field-specific tags for precise searches. "CRISPR"[ti] AND "therapy"[tiab] NOT "review"[pt] finds articles with CRISPR in the title and therapy in the title or abstract, excluding records tagged as reviews.
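The history-server practice above comes down to walking through the result set in fixed-size windows while staying under the rate limit. The following is a minimal sketch of that loop, not a definitive implementation. The names `batch_offsets` and `fetch_all` are hypothetical, and `fetch_batch` is an assumed caller-supplied function that performs the actual request for one window. The default delay of 0.34 seconds is an assumption chosen to stay under 3 requests per second without an API key.

```python
import time
from typing import Callable, Iterator, List, Tuple


def batch_offsets(total: int, batch_size: int = 500) -> Iterator[Tuple[int, int]]:
    """Yield (retstart, retmax) pairs that cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def fetch_all(
    total: int,
    fetch_batch: Callable[[int, int], List],
    batch_size: int = 500,
    delay: float = 0.34,
) -> List:
    """Collect every record by calling fetch_batch once per window."""
    records: List = []
    for retstart, retmax in batch_offsets(total, batch_size):
        records.extend(fetch_batch(retstart, retmax))
        time.sleep(delay)  # Pause between requests to respect the rate limit
    return records


# Stand-in for a real request so the sketch runs without network access
def fake_fetch(retstart, retmax):
    return list(range(retstart, retstart + retmax))


print(list(batch_offsets(1200, 500)))
# [(0, 500), (500, 500), (1000, 200)]
print(len(fetch_all(1200, fake_fetch, delay=0)))
# 1200
```

In real use, `fetch_batch` would wrap `Entrez.efetch(db="pubmed", webenv=webenv, query_key=query_key, retstart=retstart, retmax=retmax, retmode="xml")` and return the parsed `PubmedArticle` list for that window.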
Common Issues
Search returns too many irrelevant results: Broad terms like "cancer" return millions of results. Narrow the query with field tags ([ti] for title, [tiab] for title/abstract), date filters, and publication type filters. MeSH terms with subheadings (e.g., "Neoplasms/therapy"[mesh]) provide more focused results.
Entrez.efetch returns incomplete XML: When fetching large numbers of records in a single request, the XML response may be truncated. Use the history server and fetch in batches of 200-500 records, setting retmax to the batch size and advancing retstart for each subsequent batch.
Some articles are missing abstracts: Not all PubMed entries have abstracts; letters, editorials, and older articles may lack them. Check art.get("Abstract") before accessing the abstract text, and handle the case where it is None. When the abstract is missing, follow the record's full-text link to reach the content.
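The missing-abstract check can be wrapped in small helpers so every record is handled the same way. The sketch below is one possible approach, with hypothetical function names. It works on any dict-like object, so plain dictionaries stand in here for the parsed Article and PubDate elements. The fallback to a MedlineDate field (a free-text date such as "2023 Jan-Feb" that some records carry instead of Year) is an addition not covered in the text above.

```python
import re


def extract_abstract(article):
    """Return the abstract text, or an empty string when the record has none."""
    abstract = article.get("Abstract")
    if not abstract:
        return ""
    parts = abstract.get("AbstractText", [])
    return " ".join(str(part) for part in parts)


def extract_year(pub_date):
    """Return the publication year as a string, or None if it cannot be found."""
    year = pub_date.get("Year")
    if year:
        return str(year)
    # Fall back to the first four-digit year in a free-text MedlineDate
    match = re.search(r"\b(?:18|19|20)\d{2}\b", str(pub_date.get("MedlineDate", "")))
    return match.group(0) if match else None


# Plain dictionaries standing in for parsed PubMed records
with_abstract = {"Abstract": {"AbstractText": ["Background text.", "Results text."]}}
without_abstract = {"ArticleTitle": "A letter to the editor"}

print(repr(extract_abstract(with_abstract)))
# 'Background text. Results text.'
print(repr(extract_abstract(without_abstract)))
# ''
print(extract_year({"MedlineDate": "2023 Jan-Feb"}))
# 2023
```

Returning an empty string for the abstract keeps downstream code such as DataFrame construction free of None checks, while `extract_year` returns None so a missing year stays distinguishable from a real value.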