Comprehensive PubMed Database

Search and retrieve biomedical literature from PubMed, the premier database of 35M+ citations covering medicine, life sciences, and biomedical research. This skill covers advanced query construction, MeSH term searching, citation retrieval, and building literature analysis workflows.

When to Use This Skill

Choose Comprehensive PubMed Database when you need to:

Search biomedical literature with advanced Boolean queries and MeSH terms
Retrieve article metadata, abstracts, and citation data programmatically
Build systematic review search strategies with reproducible queries
Monitor new publications on specific topics with saved searches

Consider alternatives when:

You need full-text article access (use publisher APIs or Unpaywall)
You need cross-disciplinary academic search (use OpenAlex or Semantic Scholar)
You need preprints and non-indexed papers (use bioRxiv or Google Scholar)

Quick Start


pip install biopython requests pandas


from Bio import Entrez

Entrez.email = "your.email@example.com"  # Required by NCBI

# Search PubMed
handle = Entrez.esearch(
    db="pubmed",
    term="CRISPR therapy clinical trial",
    retmax=20,
    sort="relevance"
)
results = Entrez.read(handle)
print(f"Total results: {results['Count']}")

# Fetch article details
ids = results["IdList"]
handle = Entrez.efetch(db="pubmed", id=ids, retmode="xml")
records = Entrez.read(handle)

for article in records["PubmedArticle"][:5]:
    medline = article["MedlineCitation"]
    title = medline["Article"]["ArticleTitle"]
    year = medline["Article"]["Journal"]["JournalIssue"]["PubDate"].get("Year", "N/A")
    print(f"[{year}] {title}")

Core Concepts

Search Field Tags

Tag	Field	Example
`[ti]`	Title	`cancer[ti]`
`[tiab]`	Title/Abstract	`CRISPR[tiab]`
`[au]`	Author	`Zhang F[au]`
`[mesh]`	MeSH heading	`"Gene Editing"[mesh]`
`[pt]`	Publication type	`"Clinical Trial"[pt]`
`[dp]`	Date of publication	`2024[dp]`
`[la]`	Language	`english[la]`
`[journal]`	Journal name	`"Nature"[journal]`
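
The tags in the table compose mechanically, so a small helper can keep long queries readable. The sketch below is illustrative only: `tag`, `any_of`, and `all_of` are hypothetical helper names, not part of Biopython or E-utilities, and the quoting rule (quote any term that contains a space) is an assumption that matches the examples in the table.

```python
def tag(term: str, field: str) -> str:
    """Attach a PubMed field tag, quoting multi-word terms."""
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]"


def any_of(*clauses: str) -> str:
    """Join clauses with OR inside parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


query = all_of(
    tag("Gene Editing", "mesh"),
    any_of(tag("CRISPR", "tiab"), tag("base editing", "tiab")),
    tag("english", "la"),
)
print(query)
```

The resulting string can be passed as the `term` argument of `Entrez.esearch`.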

Advanced Search Strategies


from Bio import Entrez
import pandas as pd

Entrez.email = "your.email@example.com"

def advanced_pubmed_search(query, max_results=100):
    """Execute an advanced PubMed search with full metadata retrieval."""
    # Search
    handle = Entrez.esearch(
        db="pubmed", term=query,
        retmax=max_results, sort="relevance",
        usehistory="y"
    )
    search_results = Entrez.read(handle)
    total = int(search_results["Count"])
    webenv = search_results["WebEnv"]
    query_key = search_results["QueryKey"]

    # Fetch records using history server
    handle = Entrez.efetch(
        db="pubmed",
        webenv=webenv, query_key=query_key,
        retmax=max_results, retmode="xml"
    )
    records = Entrez.read(handle)

    articles = []
    for article in records["PubmedArticle"]:
        medline = article["MedlineCitation"]
        art = medline["Article"]

        # Extract authors
        authors = []
        for author in art.get("AuthorList", []):
            name = f"{author.get('LastName', '')} {author.get('Initials', '')}"
            authors.append(name.strip())

        # Extract abstract
        abstract_parts = art.get("Abstract", {}).get("AbstractText", [])
        abstract = " ".join(str(part) for part in abstract_parts)

        # Publication date
        pub_date = art["Journal"]["JournalIssue"]["PubDate"]
        year = pub_date.get("Year", "N/A")

        articles.append({
            "pmid": str(medline["PMID"]),
            "title": str(art["ArticleTitle"]),
            "authors": "; ".join(authors[:5]),
            "journal": art["Journal"]["Title"],
            "year": year,
            "abstract": abstract[:500],
            "doi": next(
                (str(eid) for eid in art.get("ELocationID", [])
                 if eid.attributes.get("EIdType") == "doi"),
                ""
            )
        })

    return pd.DataFrame(articles), total

# Complex Boolean search
query = (
    '"machine learning"[tiab] AND "drug discovery"[tiab] '
    'AND ("2023"[dp] OR "2024"[dp]) '
    'AND "english"[la] AND "journal article"[pt]'
)
df, total = advanced_pubmed_search(query, max_results=50)
print(f"Found {total} total, retrieved {len(df)}")
print(df[["title", "journal", "year"]].head(10))

Citation Analysis


from Bio import Entrez
from collections import Counter

Entrez.email = "your.email@example.com"  # Required by NCBI

def analyze_search_results(query, max_results=200):
    """Analyze publication patterns for a search query."""
    handle = Entrez.esearch(
        db="pubmed", term=query, retmax=max_results
    )
    results = Entrez.read(handle)
    ids = results["IdList"]

    handle = Entrez.efetch(db="pubmed", id=ids, retmode="xml")
    records = Entrez.read(handle)

    journals = []
    years = []
    mesh_terms = []

    for article in records["PubmedArticle"]:
        medline = article["MedlineCitation"]
        art = medline["Article"]

        journals.append(art["Journal"]["Title"])
        year = art["Journal"]["JournalIssue"]["PubDate"].get("Year")
        if year:
            years.append(year)

        for heading in medline.get("MeshHeadingList", []):
            mesh_terms.append(str(heading["DescriptorName"]))

    print("Top Journals:")
    for journal, count in Counter(journals).most_common(10):
        print(f"  {count:3d} | {journal}")

    print("\nPublications by Year:")
    for year, count in sorted(Counter(years).items()):
        print(f"  {year}: {'█' * count} ({count})")

    print("\nTop MeSH Terms:")
    for term, count in Counter(mesh_terms).most_common(15):
        print(f"  {count:3d} | {term}")

analyze_search_results('"single cell RNA"[tiab]')
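
The year lookups in the code above read only the `Year` key of `PubDate`. Some PubMed records carry a free-text `MedlineDate` (a season or a date range, for instance) instead, and those end up as "N/A" or are skipped. A minimal sketch of a fallback, assuming the parsed `PubDate` behaves like a dictionary; `extract_year` is a hypothetical helper name, and the regular expression only recognizes years from 1800 to 2099.

```python
import re


def extract_year(pub_date):
    """Return a four-digit year string from a parsed PubDate mapping, or None."""
    year = pub_date.get("Year")
    if year:
        return str(year)
    # Fall back to the first four-digit year found in MedlineDate.
    medline_date = str(pub_date.get("MedlineDate", ""))
    match = re.search(r"\b(1[89]\d{2}|20\d{2})\b", medline_date)
    return match.group(1) if match else None


print(extract_year({"Year": "2021", "Month": "Mar"}))
print(extract_year({"MedlineDate": "2019 Nov-Dec"}))
print(extract_year({}))
```

Swapping this in for the `.get("Year")` calls would keep such records in the per-year counts.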

Configuration

Parameter	Description	Default
`email`	Required email for NCBI identification	Required
`api_key`	NCBI API key for higher rate limits	Optional
`db`	Database to search	`"pubmed"`
`retmax`	Maximum records to retrieve	`20`
`sort`	Sort order (e.g. `relevance`, `pub_date`)	`"relevance"`
`usehistory`	Use NCBI history server	`"y"`
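
The `api_key` row mainly affects how fast requests may be sent: 3 per second without a key, 10 per second with one. If you call E-utilities directly, for example with `requests`, you have to pace the calls yourself. The following is a minimal sketch of such pacing; `RateLimiter` is a hypothetical class, and the injectable `clock` and `sleep` arguments exist only to make it easy to test.

```python
import time


class RateLimiter:
    """Space out calls so they stay within a requests-per-second budget."""

    def __init__(self, has_api_key=False, clock=time.monotonic, sleep=time.sleep):
        # 3 requests/second without a key, 10 requests/second with one.
        self.min_interval = 1.0 / (10 if has_api_key else 3)
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until min_interval has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


limiter = RateLimiter(has_api_key=False)
for _ in range(3):
    limiter.wait()  # call this immediately before each request
```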

Best Practices

Use MeSH terms for comprehensive searches: MeSH (Medical Subject Headings) is a controlled vocabulary maintained by the National Library of Medicine (NLM) and assigned to records during indexing. Searching "Neoplasms"[mesh] retrieves indexed cancer articles regardless of the wording the authors chose. Records that have not been indexed yet carry no MeSH headings, so combine MeSH with text-word searches for thorough coverage.
Set your NCBI API key for higher rate limits: Without an API key, you are limited to 3 requests per second. Register for a free key through your NCBI account to get 10 requests per second, then set it with Entrez.api_key = "your_key".
Use the history server for large result sets: For queries returning thousands of results, pass usehistory="y" and reuse the returned WebEnv and QueryKey to page through results without re-running the search. Fetch in batches of up to 500 records.
Build reproducible search strings: Document the exact query, the date it was run, and the number of results returned. Systematic reviews require this, and it lets others replicate your literature search.
Combine field tags with Boolean operators: Use AND, OR, and NOT with field-specific tags for precise searches. "CRISPR"[ti] AND "therapy"[tiab] NOT "review"[pt] finds records with CRISPR in the title and therapy in the title or abstract, excluding those tagged as reviews.
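
The reproducibility practice above amounts to saving three facts alongside each search. A minimal sketch, assuming a JSON record per search is an acceptable log format; `search_log_entry` and its field names are hypothetical, and the count shown is a placeholder for the value of `results["Count"]` in a real run.

```python
import json
from datetime import date


def search_log_entry(query, result_count, run_date=None, database="pubmed"):
    """Build a record documenting one search so it can be replicated later."""
    return {
        "database": database,
        "query": query,
        "run_date": (run_date or date.today()).isoformat(),
        "result_count": int(result_count),
    }


entry = search_log_entry(
    '"CRISPR"[ti] AND "therapy"[tiab] NOT "review"[pt]',
    "1234",  # placeholder count, not a real search result
    run_date=date(2024, 5, 1),
)
print(json.dumps(entry, indent=2))
```

Appending one such record per search to a file gives a simple audit trail for a review protocol.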

Common Issues

Search returns too many irrelevant results: Broad terms like "cancer" return millions of results. Narrow them with field tags ([ti] for title, [tiab] for title/abstract), date filters, and publication type filters. MeSH terms with subheadings (e.g., "Neoplasms/therapy"[mesh]) give more focused results.

Entrez.efetch returns incomplete XML: When fetching large numbers of records, the XML response may be truncated. Use the history server and fetch in batches of 200-500 records: set retmax to the batch size and advance retstart for each subsequent batch.
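
The batching advice can be sketched as a generator of `retstart`/`retmax` pairs. The helper name `batch_windows` is hypothetical; the commented call shows where `Entrez.efetch` would go, using `webenv` and `query_key` from an earlier `usehistory="y"` search.

```python
def batch_windows(total, batch_size=500):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


for retstart, retmax in batch_windows(1200, batch_size=500):
    # In a real run, each window would be passed to something like:
    #   Entrez.efetch(db="pubmed", webenv=webenv, query_key=query_key,
    #                 retstart=retstart, retmax=retmax, retmode="xml")
    print(retstart, retmax)
```

Parsing each batch separately also means a failed request costs one batch, not the whole download.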

Some articles are missing abstracts: Letters, editorials, and older articles often have none. Check art.get("Abstract") before reading the abstract text and handle the case where it is None. When the abstract is missing, the full-text link may be the only way to reach the content.
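
A defensive way to read abstracts, sketched under two assumptions: the parsed article behaves like nested dictionaries, and structured abstract sections expose their label through an `attributes` mapping, as the DOI lookup earlier in this document does for `ELocationID`. `abstract_text` is a hypothetical helper name.

```python
def abstract_text(article, default=""):
    """Return the abstract as one string, or `default` when there is none."""
    abstract = article.get("Abstract") or {}
    parts = []
    for part in abstract.get("AbstractText", []):
        # Structured abstracts label each section (e.g. BACKGROUND, METHODS).
        label = getattr(part, "attributes", {}).get("Label")
        text = str(part).strip()
        if text:
            parts.append(f"{label}: {text}" if label else text)
    return " ".join(parts) or default


print(repr(abstract_text({"ArticleTitle": "A letter without an abstract"})))
print(abstract_text({"Abstract": {"AbstractText": ["First part.", "Second part."]}}))
```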
