
Comprehensive PubMed Database

Boost productivity with direct REST access to PubMed. Includes structured workflows, validation checks, and reusable patterns for scientific literature search.

Skill · Cliptics · scientific · v1.0.0 · MIT


Search and retrieve biomedical literature from PubMed, the premier database of 35M+ citations covering medicine, life sciences, and biomedical research. This skill covers advanced query construction, MeSH term searching, citation retrieval, and building literature analysis workflows.

When to Use This Skill

Choose Comprehensive PubMed Database when you need to:

  • Search biomedical literature with advanced Boolean queries and MeSH terms
  • Retrieve article metadata, abstracts, and citation data programmatically
  • Build systematic review search strategies with reproducible queries
  • Monitor new publications on specific topics with saved searches

Consider alternatives when:

  • You need full-text article access (use publisher APIs or Unpaywall)
  • You need cross-disciplinary academic search (use OpenAlex or Semantic Scholar)
  • You need preprints and non-indexed papers (use bioRxiv or Google Scholar)

Quick Start

pip install biopython requests pandas
from Bio import Entrez

Entrez.email = "your.email@example.com"  # Required by NCBI; use your real address

# Search PubMed
handle = Entrez.esearch(
    db="pubmed",
    term="CRISPR therapy clinical trial",
    retmax=20,
    sort="relevance",
)
results = Entrez.read(handle)
handle.close()
print(f"Total results: {results['Count']}")

# Fetch article details as PubMed XML
ids = results["IdList"]
handle = Entrez.efetch(db="pubmed", id=ids, retmode="xml")
records = Entrez.read(handle)
handle.close()

for article in records["PubmedArticle"][:5]:
    medline = article["MedlineCitation"]
    title = medline["Article"]["ArticleTitle"]
    year = medline["Article"]["Journal"]["JournalIssue"]["PubDate"].get("Year", "N/A")
    print(f"[{year}] {title}")
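Biopython's Entrez module wraps the NCBI E-utilities REST endpoints, so the same search can also be sent directly over HTTP. The sketch below uses only the standard library, asks ESearch for JSON (retmode=json), and can optionally restrict results to records added in the last N days (reldate with datetype=edat), which is one way to approximate the topic monitoring mentioned above. The function names are hypothetical; the query parameters are standard ESearch parameters.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the NCBI E-utilities REST service (the endpoints Biopython's Entrez wraps).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, email, retmax=20, reldate=None, api_key=None):
    """Build an ESearch URL that returns JSON instead of XML."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,
        "retmode": "json",
        "email": email,
    }
    if reldate is not None:
        # Only records whose Entrez date (edat) falls within the last `reldate` days.
        params["reldate"] = reldate
        params["datetype"] = "edat"
    if api_key:
        params["api_key"] = api_key
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def parse_esearch(payload):
    """Extract the total hit count and the PMID list from an ESearch JSON body."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def esearch(term, email, **kwargs):
    """Send the search over the network and return (count, pmids)."""
    with urllib.request.urlopen(esearch_url(term, email, **kwargs)) as resp:
        return parse_esearch(resp.read().decode("utf-8"))
```

For example, esearch("CRISPR therapy clinical trial", "your.email@example.com", reldate=30) would return the total count and up to 20 PMIDs added in the last 30 days. Keeping URL building and parsing separate from the network call makes both easy to test offline.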

Core Concepts

Search Field Tags

Tag        Field                  Example
[ti]       Title                  cancer[ti]
[tiab]     Title/Abstract         CRISPR[tiab]
[au]       Author                 Zhang F[au]
[mesh]     MeSH heading           "Gene Editing"[mesh]
[pt]       Publication type       "Clinical Trial"[pt]
[dp]       Date of publication    2024[dp]
[la]       Language               english[la]
[journal]  Journal name           "Nature"[journal]
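Long queries are easier to keep correct when they are assembled from small pieces. This is a minimal sketch, assuming terms never contain double quotes: tag() quotes a term and appends one of the field tags from the table, and any_of()/all_of() join clauses with OR/AND inside parentheses so nesting stays unambiguous. These helper names are hypothetical and not part of Biopython.

```python
def tag(term, field):
    """Quote a term (so phrases stay together) and attach a PubMed field tag."""
    return f'"{term}"[{field}]'


def any_of(*clauses):
    """OR clauses together, wrapped in parentheses for safe nesting."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses):
    """AND clauses together, wrapped in parentheses for safe nesting."""
    return "(" + " AND ".join(clauses) + ")"


query = all_of(
    tag("CRISPR", "tiab"),
    tag("Gene Editing", "mesh"),
    any_of(tag("2023", "dp"), tag("2024", "dp")),
    tag("english", "la"),
)
print(query)
```

The resulting string can be passed unchanged as term= to Entrez.esearch; the outer parentheses are harmless at the top level.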

Advanced Search Strategies

from Bio import Entrez
import pandas as pd

Entrez.email = "your.email@example.com"


def advanced_pubmed_search(query, max_results=100):
    """Execute an advanced PubMed search with full metadata retrieval."""
    # Search and store the result set on the NCBI history server
    handle = Entrez.esearch(
        db="pubmed", term=query, retmax=max_results,
        sort="relevance", usehistory="y",
    )
    search_results = Entrez.read(handle)
    handle.close()
    total = int(search_results["Count"])
    webenv = search_results["WebEnv"]
    query_key = search_results["QueryKey"]

    # Fetch records from the history server
    handle = Entrez.efetch(
        db="pubmed", webenv=webenv, query_key=query_key,
        retmax=max_results, retmode="xml",
    )
    records = Entrez.read(handle)
    handle.close()

    articles = []
    for article in records["PubmedArticle"]:
        medline = article["MedlineCitation"]
        art = medline["Article"]

        # Authors as "LastName Initials"; skip entries without a personal name
        authors = []
        for author in art.get("AuthorList", []):
            name = f"{author.get('LastName', '')} {author.get('Initials', '')}".strip()
            if name:
                authors.append(name)

        # Abstract (structured abstracts have several AbstractText parts)
        abstract_parts = art.get("Abstract", {}).get("AbstractText", [])
        abstract = " ".join(str(part) for part in abstract_parts)

        # Publication date
        pub_date = art["Journal"]["JournalIssue"]["PubDate"]
        year = pub_date.get("Year", "N/A")

        # DOI, if present among the electronic location IDs
        doi = next(
            (str(eid) for eid in art.get("ELocationID", [])
             if eid.attributes.get("EIdType") == "doi"),
            "",
        )

        articles.append({
            "pmid": str(medline["PMID"]),
            "title": str(art["ArticleTitle"]),
            "authors": "; ".join(authors[:5]),
            "journal": str(art["Journal"]["Title"]),
            "year": year,
            "abstract": abstract[:500],
            "doi": doi,
        })

    return pd.DataFrame(articles), total


# Complex Boolean search
query = (
    '"machine learning"[tiab] AND "drug discovery"[tiab] '
    'AND ("2023"[dp] OR "2024"[dp]) '
    'AND "english"[la] AND "journal article"[pt]'
)
df, total = advanced_pubmed_search(query, max_results=50)
print(f"Found {total} total, retrieved {len(df)}")
print(df[["title", "journal", "year"]].head(10))
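The year extraction above falls back to "N/A" whenever PubDate lacks a Year element, which happens for records that only carry a free-text MedlineDate such as "2023 Nov-Dec". A small helper, sketched here under the assumption that the first four-digit number in MedlineDate is the publication year, recovers most of those cases. It only relies on .get(), so it works on the dictionaries Entrez.read returns as well as on plain dicts; the function name is hypothetical.

```python
import re


def publication_year(pub_date):
    """Return a four-digit year string from a PubMed PubDate mapping, or None.

    Assumption: when "Year" is missing, the first four-digit number in the
    free-text "MedlineDate" (e.g. "2023 Nov-Dec", "1998-1999") is the year.
    """
    year = pub_date.get("Year")
    if year:
        return str(year)
    medline_date = str(pub_date.get("MedlineDate", ""))
    match = re.search(r"\b(\d{4})\b", medline_date)
    return match.group(1) if match else None
```

In advanced_pubmed_search, pub_date.get("Year", "N/A") could then become publication_year(pub_date) or "N/A".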

Publication Pattern Analysis

from collections import Counter

from Bio import Entrez

Entrez.email = "your.email@example.com"


def analyze_search_results(query, max_results=200):
    """Summarize journals, years, and MeSH terms for a search query."""
    handle = Entrez.esearch(db="pubmed", term=query, retmax=max_results)
    results = Entrez.read(handle)
    handle.close()
    ids = results["IdList"]
    if not ids:
        print("No results.")
        return

    handle = Entrez.efetch(db="pubmed", id=ids, retmode="xml")
    records = Entrez.read(handle)
    handle.close()

    journals, years, mesh_terms = [], [], []
    for article in records["PubmedArticle"]:
        medline = article["MedlineCitation"]
        art = medline["Article"]
        journals.append(str(art["Journal"]["Title"]))
        year = art["Journal"]["JournalIssue"]["PubDate"].get("Year")
        if year:
            years.append(str(year))
        for heading in medline.get("MeshHeadingList", []):
            mesh_terms.append(str(heading["DescriptorName"]))

    print("Top Journals:")
    for journal, count in Counter(journals).most_common(10):
        print(f"  {count:3d} | {journal}")

    print("\nPublications by Year:")
    for year, count in sorted(Counter(years).items()):
        print(f"  {year}: {'█' * count} ({count})")

    print("\nTop MeSH Terms:")
    for term, count in Counter(mesh_terms).most_common(15):
        print(f"  {count:3d} | {term}")


analyze_search_results('"single cell RNA"[tiab]')
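Printing one block character per article, as the year chart above does, produces very long lines once a single year has hundreds of hits. One option, sketched below, scales every bar relative to the busiest year so the chart never exceeds a fixed width. year_histogram and its width parameter are illustrative names, not part of any library.

```python
from collections import Counter


def year_histogram(years, width=40):
    """Return text bar-chart lines of publication counts per year.

    Bars are scaled so the busiest year spans `width` characters; every
    year with at least one hit gets a bar of length 1 or more.
    """
    counts = Counter(years)
    if not counts:
        return []
    peak = max(counts.values())
    lines = []
    for year, count in sorted(counts.items()):
        bar_len = max(1, round(count * width / peak))
        lines.append(f"{year}: {'█' * bar_len} ({count})")
    return lines


for line in year_histogram(["2022", "2023", "2023", "2024", "2024", "2024"], width=6):
    print(line)
```

Inside analyze_search_results, the unscaled loop could be replaced by printing each line of year_histogram(years); the raw count stays in parentheses, so no information is lost by scaling.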

Configuration

Parameter    Description                                                     Default
email        Required email for NCBI identification                          Required, no default
api_key      NCBI API key for higher rate limits                             Optional
db           Database to search                                              "pubmed"
retmax       Maximum number of IDs returned by ESearch                       20
sort         Sort order: "relevance", "pub_date", "Author", "JournalName"    "relevance" in these examples
usehistory   Store the result set on the NCBI history server                 Off; pass "y" to enable
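When the REST endpoints are called directly rather than through Biopython (where the equivalents are Entrez.email, Entrez.tool, and Entrez.api_key), the identification parameters have to be attached to every request. The sketch below collects them in one place and derives the minimum spacing between calls from NCBI's limits of 3 requests per second without a key and 10 with one. The tool name "pubmed-skill" and the NCBI_API_KEY environment variable are assumptions for illustration.

```python
import os


def eutils_params(email, tool="pubmed-skill", api_key=None):
    """Build the identification parameters to send with every E-utilities call.

    "pubmed-skill" is a hypothetical tool name; use your own application name.
    If no key is passed, one is read from the (assumed) NCBI_API_KEY variable.
    """
    params = {"email": email, "tool": tool}
    key = api_key or os.environ.get("NCBI_API_KEY")
    if key:
        params["api_key"] = key
    return params


def min_interval(params):
    """Seconds to wait between requests: 10 req/s with a key, 3 req/s without."""
    return 1 / 10 if "api_key" in params else 1 / 3
```

These parameters can be merged into any ESearch or EFetch query string, and min_interval(params) used as the sleep between consecutive calls.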

Best Practices

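  1. Use MeSH terms for comprehensive searches — MeSH (Medical Subject Headings) is a controlled vocabulary that National Library of Medicine (NLM) indexing assigns to MEDLINE records. Searching "Neoplasms"[mesh] retrieves indexed cancer-related articles regardless of the wording authors used, and by default also includes the narrower terms beneath Neoplasms in the MeSH hierarchy. Recently added and non-MEDLINE citations may not carry MeSH terms yet, so combine MeSH with text-word searches for thorough coverage.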

  2. Set your NCBI API key for higher rate limits — Without an API key, you're limited to 3 requests per second. Register for a free key at NCBI to get 10 requests per second. Set it with Entrez.api_key = "your_key".

  3. Use the history server for large result sets — For queries returning thousands of results, use usehistory="y" with WebEnv and QueryKey to paginate through results without re-executing the search. Fetch in batches of 500 records.

  4. Build reproducible search strings — Document your exact search query, date of execution, and number of results returned. This is required for systematic reviews and enables others to replicate your literature search.

  5. Combine field tags with Boolean operators — Use AND/OR/NOT with field-specific tags for precise searches. "CRISPR"[ti] AND "therapy"[tiab] NOT "review"[pt] finds records with CRISPR in the title and therapy in the title or abstract, excluding anything indexed with the Review publication type.
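Practices 2 and 3 combine naturally: store the result set once with usehistory="y", then page through it with retstart/retmax while pausing between calls. The sketch below does this over plain HTTP with the standard library. The helper names and the default batch size of 500 (taken from practice 3) are illustrative, and fetch_all returns raw XML strings that still need parsing.

```python
import time
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batch_windows(total, batch_size=500):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def efetch_batch_url(webenv, query_key, retstart, retmax, email):
    """EFetch URL for one slice of a result set stored on the history server."""
    params = {
        "db": "pubmed",
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "retmode": "xml",
        "email": email,
    }
    return EFETCH + "?" + urllib.parse.urlencode(params)


def fetch_all(webenv, query_key, total, email, batch_size=500, delay=1 / 3):
    """Download every batch as raw XML, sleeping `delay` seconds after each call."""
    chunks = []
    for retstart, retmax in batch_windows(total, batch_size):
        url = efetch_batch_url(webenv, query_key, retstart, retmax, email)
        with urllib.request.urlopen(url) as resp:
            chunks.append(resp.read().decode("utf-8"))
        time.sleep(delay)
    return chunks
```

The WebEnv and QueryKey values come from an ESearch call made with usehistory="y", and total from its Count field; with an API key, delay can drop to 0.1 seconds.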

Common Issues

Search returns too many irrelevant results — Broad terms like "cancer" return millions of results. Use field tags ([ti] for title, [tiab] for title/abstract), date filters, and publication type filters. MeSH terms with subheadings (e.g., "Neoplasms/therapy"[mesh]) provide more focused results.

Entrez.efetch returns incomplete XML — When fetching large numbers of records, the XML response may be truncated. Use the history server and fetch in batches of 200-500 records. Set retmax to a reasonable number and paginate using retstart for subsequent batches.

Some articles missing abstracts — Not all PubMed entries have abstracts; letters, editorials, and older articles may lack them. Check art.get("Abstract") before accessing the abstract text, and handle the case where it returns None. When an abstract is missing, follow the record's DOI or PubMed Central link, where one exists, to reach the full text.
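A defensive abstract extractor can also work on the raw EFetch XML with xml.etree.ElementTree instead of Entrez.read. The sketch below returns an empty string when a record has no abstract and keeps the section labels of structured abstracts (BACKGROUND, METHODS, and so on). The function name is hypothetical; the element and attribute names (MedlineCitation, Article, Abstract, AbstractText, Label) follow the PubMed XML format.

```python
import xml.etree.ElementTree as ET


def abstract_text(article_xml):
    """Return the abstract of one <PubmedArticle> element as plain text.

    Returns "" when the record has no abstract. Structured abstracts are
    split into several <AbstractText> elements with a Label attribute;
    labels are kept as "LABEL: text". Inline markup such as <i> is flattened.
    """
    root = ET.fromstring(article_xml)
    parts = []
    for node in root.findall("./MedlineCitation/Article/Abstract/AbstractText"):
        text = "".join(node.itertext()).strip()
        label = node.get("Label")
        parts.append(f"{label}: {text}" if label else text)
    return " ".join(p for p in parts if p)
```

Restricting the path to Article/Abstract skips OtherAbstract elements, which can hold abstracts in other languages or from other sources.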
