# ClinicalTrials.gov Tool: Current State & Future Improvements
**Status**: Currently Implemented
**Priority**: High (Core Data Source for Drug Repurposing)
---
## Current Implementation
### What We Have (`src/tools/clinicaltrials.py`)
- V2 API search via `clinicaltrials.gov/api/v2/studies`
- Filters: `INTERVENTIONAL` study type, `RECRUITING` status
- Returns: NCT ID, title, conditions, interventions, phase, status
- Query preprocessing via shared `query_utils.py`
### Current Strengths
1. **Good Filtering**: Already filtering for interventional + recruiting
2. **V2 API**: Using the modern API (v1 deprecated)
3. **Phase Info**: Extracting trial phases for drug development context
### Current Limitations
1. **No Outcome Data**: Missing primary/secondary outcomes
2. **No Eligibility Criteria**: Missing inclusion/exclusion details
3. **No Sponsor Info**: Missing who's running the trial
4. **No Result Data**: For completed trials, no efficacy data
5. **Limited Drug Mapping**: No integration with drug databases
---
## API Capabilities We're Not Using
### Fields We Could Request
```python
# Current fields
fields = ["NCTId", "BriefTitle", "Condition", "InterventionName", "Phase", "OverallStatus"]

# Additional valuable fields
additional_fields = [
    "PrimaryOutcomeMeasure",    # What are they measuring?
    "SecondaryOutcomeMeasure",  # Secondary endpoints
    "EligibilityCriteria",      # Who can participate?
    "LeadSponsorName",          # Who's running the trial?
    "ResultsFirstPostDate",     # Has results?
    "StudyFirstPostDate",       # When was the record first posted?
    "CompletionDate",           # When finished?
    "EnrollmentCount",          # Sample size
    "InterventionDescription",  # Drug details
    "ArmGroupLabel",            # Treatment arms
    "InterventionOtherName",    # Drug aliases
]
```
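The two lists above have to end up in a single `fields` request parameter. A minimal sketch of merging them, assuming the comma-separated form used in the enhanced query later in this document; `build_fields_param` is a hypothetical helper name, not part of the existing tool:

```python
def build_fields_param(*field_groups: list[str]) -> str:
    """Merge field lists in order, drop duplicates, and join with commas."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for group in field_groups:
        for name in group:
            seen.setdefault(name, None)
    return ",".join(seen)


# Example: combine the current fields with a couple of the additional ones.
current = ["NCTId", "BriefTitle", "Phase"]
extra = ["Phase", "LeadSponsorName"]
fields_param = build_fields_param(current, extra)
```

Keeping the merge order-preserving makes the resulting request string stable, which helps if responses are cached by URL.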
### Filter Enhancements
```python
# Current
aggFilters = "studyType:INTERVENTIONAL,status:RECRUITING"
# Could add
"status:RECRUITING,ACTIVE_NOT_RECRUITING,COMPLETED" # Include completed for results
"phase:PHASE2,PHASE3" # Only later-stage trials
"resultsFirstPostDateRange:2020-01-01_" # Trials with posted results
```
---
## Recommended Improvements
### Phase 1: Richer Metadata
```python
EXTENDED_FIELDS = [
    "NCTId",
    "BriefTitle",
    "OfficialTitle",
    "Condition",
    "InterventionName",
    "InterventionDescription",
    "InterventionOtherName",  # Drug synonyms!
    "Phase",
    "OverallStatus",
    "PrimaryOutcomeMeasure",
    "EnrollmentCount",
    "LeadSponsorName",
    "StudyFirstPostDate",
]
```
### Phase 2: Results Retrieval
For completed trials, we can get actual efficacy data:
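```python
import httpx  # Assumption: any async HTTP client would work here


async def get_trial_results(nct_id: str) -> dict | None:
    """Fetch results for completed trials."""
    url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
    params = {
        "fields": "ResultsSection",
    }
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.get(url, params=params)
    if response.status_code == 404:
        return None  # Unknown NCT ID
    response.raise_for_status()
    # Outcome measures and statistics; None when no results are posted
    return response.json().get("resultsSection")
```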
### Phase 3: Drug Name Normalization
Map intervention names to standard identifiers:
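```python
import httpx  # Assumption: any async HTTP client would work here


# Problem: "Metformin", "Metformin HCl", "Glucophage" are the same drug
# Solution: Use RxNorm or DrugBank for normalization
async def normalize_drug_name(intervention: str) -> str | None:
    """Normalize drug name via RxNorm API."""
    url = "https://rxnav.nlm.nih.gov/REST/rxcui.json"
    async with httpx.AsyncClient(timeout=30.0) as client:
        # Passing the name via params URL-encodes spaces and punctuation
        response = await client.get(url, params={"name": intervention})
    response.raise_for_status()
    ids = response.json().get("idGroup", {}).get("rxnormId", [])
    # Returns standardized RxCUI, or None when RxNorm has no match
    return ids[0] if ids else None
```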
---
## Integration Opportunities
### With PubMed
Cross-reference trials with publications:
```python
# ClinicalTrials.gov provides PMID links
# Can correlate trial results with published papers
```
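As a starting point for that cross-referencing, PMIDs can be collected from a study record. This sketch assumes the references live under `protocolSection.referencesModule.references`, each with an optional `pmid` key; `extract_pmids` is a hypothetical helper name.

```python
def extract_pmids(study: dict) -> list[str]:
    """Collect unique PMIDs from a study record, preserving order."""
    protocol = study.get("protocolSection") or {}
    # Assumed layout: referencesModule -> references -> [{"pmid": ...}, ...]
    references = (protocol.get("referencesModule") or {}).get("references") or []
    pmids: list[str] = []
    for reference in references:
        pmid = reference.get("pmid")
        if pmid and pmid not in pmids:
            pmids.append(pmid)
    return pmids
```

The resulting list can be passed to whatever PubMed lookup the project already uses; references without a PMID are skipped.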
### With DrugBank/ChEMBL
Map interventions to:
- Mechanism of action
- Known targets
- Adverse effects
- Drug-drug interactions
---
## Python Libraries to Consider
| Library | Purpose | Notes |
|---------|---------|-------|
| [pytrials](https://pypi.org/project/pytrials/) | CT.gov wrapper | V2 API support unclear |
| [clinicaltrials-act-tracker](https://github.com/ebmdatalab/clinicaltrials-act-tracker) | Data tracking | More for analysis |
| [drugbank-downloader](https://pypi.org/project/drugbank-downloader/) | Drug mapping | Requires license |
---
## API Quirks & Gotchas
1. **Rate Limiting**: Undocumented, be conservative
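2. **Pagination**: Max 1000 results per request (`pageSize`); later pages are fetched by passing the response's `nextPageToken` as `pageToken`
3. **Field Names**: Case-sensitive. Response JSON keys are camelCase (e.g. `nctId`), while the `fields` lists in this document use PascalCase names (e.g. `NCTId`)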
4. **Empty Results**: Some fields may be null even if requested
5. **Status Changes**: Trials change status frequently
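
The pagination and rate-limiting points above can be sketched as a small loop. This is an illustration under assumptions: the response carries `studies` and `nextPageToken` keys, and `fetch_page` is a hypothetical injected coroutine that performs the actual HTTP request. The delay value is a guess, since the rate limit is undocumented.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical callable: takes request params, returns the decoded JSON page.
FetchPage = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


async def fetch_all_studies(
    fetch_page: FetchPage,
    base_params: dict[str, Any],
    max_pages: int = 5,
    delay_seconds: float = 0.5,
) -> list[dict[str, Any]]:
    """Follow nextPageToken up to max_pages, pausing between requests."""
    studies: list[dict[str, Any]] = []
    params = dict(base_params)
    for _ in range(max_pages):
        payload = await fetch_page(params)
        # "studies" may be missing or null on an empty page
        studies.extend(payload.get("studies") or [])
        token = payload.get("nextPageToken")
        if not token:
            break  # Last page reached
        params = {**base_params, "pageToken": token}
        await asyncio.sleep(delay_seconds)  # Conservative pause between pages
    return studies
```

Injecting `fetch_page` keeps the loop testable without network access and lets the caller decide which HTTP client and retry policy to use.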
---
## Example Enhanced Query
```python
# Evidence is assumed to be the project's result type;
# EXTENDED_FIELDS is the list defined under "Phase 1: Richer Metadata".
async def search_drug_repurposing_trials(
    drug_name: str,
    condition: str,
    include_completed: bool = True,
) -> list[Evidence]:
    """Search for trials repurposing a drug for a new condition."""
    statuses = ["RECRUITING", "ACTIVE_NOT_RECRUITING"]
    if include_completed:
        statuses.append("COMPLETED")

    params = {
        "query.intr": drug_name,
        "query.cond": condition,
        "filter.overallStatus": ",".join(statuses),
        "filter.studyType": "INTERVENTIONAL",
        "fields": ",".join(EXTENDED_FIELDS),
        "pageSize": 50,
    }
    # The request itself and the conversion to Evidence are not shown here.
```
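
The query above stops at building `params`. As a complement, here is a minimal sketch of flattening one returned study into a plain record before it is converted to the project's own result type. The nested paths (`protocolSection.identificationModule.nctId` and so on) are assumptions based on the V2 study data structure, and `flatten_study` is a hypothetical helper name.

```python
from typing import Any


def flatten_study(study: dict[str, Any]) -> dict[str, Any]:
    """Pull the core fields out of a nested study record, tolerating gaps."""
    protocol = study.get("protocolSection") or {}
    ident = protocol.get("identificationModule") or {}
    status = protocol.get("statusModule") or {}
    design = protocol.get("designModule") or {}
    conditions = (protocol.get("conditionsModule") or {}).get("conditions") or []
    interventions = (
        (protocol.get("armsInterventionsModule") or {}).get("interventions") or []
    )
    return {
        "nct_id": ident.get("nctId"),
        "title": ident.get("briefTitle"),
        "status": status.get("overallStatus"),
        "phases": design.get("phases") or [],
        "conditions": conditions,
        # Skip intervention entries that carry no name
        "interventions": [i["name"] for i in interventions if i.get("name")],
    }
```

Because requested fields may come back null, every level uses `or {}` / `or []` rather than relying on the key being present.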
---
## Sources
- [ClinicalTrials.gov API Documentation](https://clinicaltrials.gov/data-api/api)
- [CT.gov Field Definitions](https://clinicaltrials.gov/data-api/about-api/study-data-structure)
- [RxNorm API](https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.findRxcuiByString.html)