Supported NCBI Libraries
This project supports the following NCBI libraries:
pubmed
protein
nuccore
nucleotide
assembly
blastdbinfo
books
cdd
clinvar
gap
gene
geoprofiles
medgen
omim
orgtrack
popset
pcassay
protfam
pccompound
pcsubstance
seqannot
biocollections
taxonomy
bioproject
biosample
sra
search_ncbi
This tool provides a simple and efficient way to search NCBI databases using the Entrez Programming Utilities (E-utilities).
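For orientation, an E-utilities search is an HTTP request to the ESearch endpoint. The sketch below only builds such a URL with the standard library and sends nothing over the network. It illustrates the underlying service and is not taken from the search_ncbi source; the helper name `build_esearch_url` and the example values are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the NCBI ESearch endpoint (part of E-utilities).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, email, api_key=None, retmax=20):
    """Return an ESearch URL for `term` in database `db` (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "email": email}
    if api_key:
        # The API key is optional, so it is only added when provided.
        params["api_key"] = api_key
    return ESEARCH_URL + "?" + urlencode(params)


# Placeholder search term and email address, for illustration only.
print(build_esearch_url("bioproject", "soil metagenome", "you@example.com"))
```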
Notice
This module is still in the development stage. If you encounter any issues or have suggestions, please open an issue on our GitHub repository.
We welcome all forms of contribution and feedback to improve this project.
Note: As this is an open-source project, please ensure that any communication or contribution adheres to our code of conduct and contribution guidelines.
Features
Installation
You can install it using one of the following methods:
Option 1: Install from Conda (Recommended)
You can install it from bioconda:
Option 2: Install from source
To install search_ncbi from source, follow these steps:
Clone the repository:
Navigate to the project directory:
Install the package:
Dependencies
search_ncbi requires Python 3.6 or later. Other dependencies are installed automatically when you install the package using one of the methods above.
search_ncbi Command Line Interface Usage
After installation, you can use the searchncbi command to interact with NCBI databases.
Basic Usage
Required Arguments
--email: Your email address for NCBI queries (required)
-d, --db: NCBI database to search (required)
-t, --term: Search term (required)
Optional Arguments
--api-key: Your NCBI API key (optional, but recommended for higher request limits)
-m, --max-results: Maximum number of results to return (default: all available results)
-b, --batch-size: Number of results to process in each batch (default: 500)
-o, --output: Output file name (default: “output.csv”)
-a, --action: Action to perform (default: “metadata”)
Actions
metadata: Process and save all metadata (default)
custom: Process and save custom filtered metadata
raw: Retrieve and save raw data
count: Get the total count of search results
id_list: Retrieve and save a list of IDs
Custom Filtering Options (for custom action)
--include: List of column names to include
--exclude: List of column names to exclude
--contains: List of strings that column names should contain
--regex: Regular expression for filtering column names
Examples
Search BioProject and save all metadata:
Search Nucleotide database with custom filtering:
Get raw data from Protein database:
Get total count of results for a Gene search:
Get ID list for SRA database:
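The five examples above can be approximated from the documented flags alone. The sketch below assembles such command lines in Python and prints them without running anything. The search terms, email address, regular expression, and output file names are placeholders rather than values from the project, and `build_command` is a hypothetical helper.

```python
import shlex

EMAIL = "you@example.com"  # placeholder address


def build_command(db, term, action="metadata", output=None, extra=()):
    """Assemble a searchncbi argument list from the documented flags."""
    argv = ["searchncbi", "--email", EMAIL, "-d", db, "-t", term, "-a", action]
    if output:
        argv += ["-o", output]
    argv += list(extra)
    return argv


examples = [
    build_command("bioproject", "example term", output="bioproject.csv"),
    build_command("nucleotide", "example term", action="custom",
                  output="nucleotide.csv", extra=["--regex", "Title"]),
    build_command("protein", "example term", action="raw", output="protein_raw.txt"),
    build_command("gene", "example term", action="count"),
    build_command("sra", "example term", action="id_list", output="sra_ids.txt"),
]

for argv in examples:
    # shlex.quote keeps multi-word search terms intact when pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```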
Python Module
Import
First, ensure that the search_ncbi package is installed. Then, import the NCBITools class in your Python script:
Initialization
Create an instance of NCBITools by providing your email address and an optional API key:
Note: The API key is optional but recommended for higher request limits.
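To make the request limits concrete: NCBI allows up to 3 E-utilities requests per second without an API key and up to 10 per second with one. The sketch below shows one way to space out calls accordingly. Whether search_ncbi throttles requests this way is not stated here, and both helper names are hypothetical.

```python
import time


def min_interval(api_key=None):
    """Seconds to wait between requests under NCBI's published limits."""
    # Up to 10 requests/second with an API key, 3 requests/second without.
    return 1 / 10 if api_key else 1 / 3


def throttled(calls, api_key=None, sleep=time.sleep):
    """Run callables in order, pausing between them to respect the limit."""
    results = []
    for i, call in enumerate(calls):
        if i:
            # No pause is needed before the very first request.
            sleep(min_interval(api_key))
        results.append(call())
    return results


print(min_interval())          # interval without an API key
print(min_interval("my-key"))  # interval with an API key
```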
Main Methods
1. Search and Process Data
Parameters:
db: NCBI database name (string)
term: Search term (string)
max_results: Maximum number of results (integer, optional)
batch_size: Batch size for processing (integer, default 500)
process_method: Processing method, ‘all’ or ‘custom’ (string, default ‘all’)
For the ‘custom’ processing method, additional filtering parameters can be used: include, exclude, contains, regex.
2. Get Raw Data
3. Get Search Result Count
4. Get ID List
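Long ID lists are typically retrieved in pages. The sketch below shows how a batch size and an optional result cap could translate into ESearch `retstart`/`retmax` pairs. It is an assumption about the paging logic, not the package's code, and `batch_ranges` is a hypothetical helper.

```python
def batch_ranges(total, batch_size=500, max_results=None):
    """Yield (retstart, retmax) pairs covering the requested results."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Cap the total when the caller asked for fewer results than exist.
    limit = total if max_results is None else min(total, max_results)
    for start in range(0, limit, batch_size):
        # The last batch may be smaller than batch_size.
        yield start, min(batch_size, limit - start)


# Example with a made-up result count of 1200.
for retstart, retmax in batch_ranges(total=1200, batch_size=500):
    print(retstart, retmax)
```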
5. Search and Save Metadata
6. Filter Metadata
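The filtering options (include, exclude, contains, regex) all operate on column names. The sketch below mimics them in plain Python to show what each option selects. How the package combines several options is not documented here, so the behavior is an assumption: every supplied filter must pass, and exclude is applied last. The function and the column names are hypothetical.

```python
import re


def filter_columns(columns, include=None, exclude=None, contains=None, regex=None):
    """Return the column names that pass the given filters (assumed semantics)."""
    selected = list(columns)
    if include:
        # Keep only columns named explicitly.
        selected = [c for c in selected if c in include]
    if contains:
        # Keep columns whose name contains at least one of the substrings.
        selected = [c for c in selected if any(s in c for s in contains)]
    if regex:
        pattern = re.compile(regex)
        selected = [c for c in selected if pattern.search(c)]
    if exclude:
        # Drop excluded columns last so exclusion always wins.
        selected = [c for c in selected if c not in exclude]
    return selected


# Made-up column names, for illustration only.
columns = ["Id", "Title", "Organism_Name", "Organism_Strain", "Project_Type"]
print(filter_columns(columns, contains=["Organism"], exclude=["Organism_Strain"]))
```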
7. Search, Save, and Filter Metadata (Complete Workflow)
Note: All methods return pandas DataFrames or appropriate data structures unless otherwise specified. Ensure proper handling of the returned data.
Contributing
Contributions to NCBI Tools are welcome! Please refer to our Contributing Guidelines for more information.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
If you have any questions or feedback, please open an issue on our GitHub repository.