Methods & reproducibility
Documentation for researchers citing VectaBind. Predictions are computational estimates for ranking and triage — not a substitute for experimental binding assays, FEP, or structure-based docking campaigns.
Model
| Component | Detail |
|---|---|
| Architecture | Stage 6 SE(3)-equivariant EGNN + cross-attention ligand–pocket fusion |
| Protein representation | ESM2-3B embeddings (2560-d) on binding-pocket residues + Cα coordinates |
| Ligand representation | RDKit graph → GNN encoder (Stage 5/6 path) |
| Parameters | ~65M trainable |
| Training structures | ~94k complexes (PDBBind-derived pipeline) |
| API version | v1.0.0 · endpoint https://api.vectabind.com |
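The table lists an SE(3)-equivariant EGNN as the core architecture. What follows is not VectaBind's implementation. It is a minimal sketch of a generic EGNN-style layer on a toy, fully connected graph with scalar node features. The functions `phi_e`, `phi_x` and `phi_h` are hypothetical fixed stand-ins for learned MLPs. Messages depend only on squared interatomic distances, which are invariant, and coordinates move along relative position vectors. As a result, rotating or translating the input rotates or translates the output coordinates and leaves the features unchanged. The cross-attention ligand–pocket fusion and the ESM2 embeddings are omitted.

```python
import math
from typing import List, Tuple

Vec3 = List[float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return [a[k] - b[k] for k in range(3)]


def add(a: Vec3, b: Vec3) -> Vec3:
    return [a[k] + b[k] for k in range(3)]


def scale(a: Vec3, s: float) -> Vec3:
    return [a[k] * s for k in range(3)]


def sqdist(a: Vec3, b: Vec3) -> float:
    d = sub(a, b)
    return sum(c * c for c in d)


# Hypothetical fixed stand-ins for learned MLPs (not trained weights).
def phi_e(h_i: float, h_j: float, d2: float) -> float:
    """Edge message from node features and the invariant squared distance."""
    return math.tanh(0.5 * h_i + 0.3 * h_j - 0.2 * d2)


def phi_x(m_ij: float) -> float:
    """Scalar weight applied to the relative position vector x_i - x_j."""
    return 0.1 * m_ij


def phi_h(h_i: float, m_i: float) -> float:
    """Residual node-feature update from the aggregated messages."""
    return h_i + math.tanh(m_i)


def egnn_layer(h: List[float], x: List[Vec3]) -> Tuple[List[float], List[Vec3]]:
    """One EGNN-style layer: invariant features, equivariant coordinates."""
    n = len(h)
    new_h, new_x = [], []
    for i in range(n):
        m_i = 0.0
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            m_ij = phi_e(h[i], h[j], sqdist(x[i], x[j]))
            m_i += m_ij
            # Relative vectors rotate with the input and cancel any translation.
            shift = add(shift, scale(sub(x[i], x[j]), phi_x(m_ij)))
        new_h.append(phi_h(h[i], m_i))
        new_x.append(add(x[i], scale(shift, 1.0 / max(n - 1, 1))))
    return new_h, new_x
```

Applying a rotation R and translation t to every input coordinate should give R·x' + t for each output coordinate and identical features h'. Checking this numerically is a cheap sanity test for any equivariant layer. Because the update uses only distances and relative vectors, this toy layer is in fact E(3)-equivariant, which includes the SE(3) case.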
Outputs
- affinity (pKd) — potency-calibrated binding strength estimate
- bind_prob — probability of active-class binding (0–1), mapped from calibrated pKd
- confidence — coarse tier (high/medium/low) derived from bind_prob and pKd thresholds; not a Bayesian uncertainty interval
- Physicochemical — MW, LogP, QED, Lipinski via RDKit
- ChEMBL similarity — browser-side lookup against the ChEMBL REST API; clinical-analog flag = max_phase > 0 and Tanimoto similarity > 70%
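Two of the outputs above are easy to reproduce or interpret offline. This is a minimal sketch, not VectaBind code. `pkd_to_kd_nm` uses the standard definition pKd = -log10(Kd in mol/L). `tanimoto` assumes fingerprints are represented as sets of on-bit indices, a hypothetical representation because the fingerprint type used by the app is not stated. `clinical_analog_flag` restates the documented rule (max_phase > 0 and similarity > 70%) and assumes that a missing ChEMBL `max_phase` arrives as `None`.

```python
from typing import Optional, Set


def pkd_to_kd_nm(pkd: float) -> float:
    """Convert pKd (= -log10 of Kd in mol/L) to Kd in nanomolar."""
    # Kd [M] = 10**(-pKd), so Kd [nM] = 10**(-pKd) * 1e9 = 10**(9 - pKd)
    return 10.0 ** (9.0 - pkd)


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def clinical_analog_flag(max_phase: Optional[float], similarity: float) -> bool:
    """Documented rule: max_phase > 0 and similarity > 70% (0.70 as a fraction)."""
    # Assumption: a missing max_phase is None and is treated as "not clinical".
    if max_phase is None:
        return False
    return max_phase > 0 and similarity > 0.70


# Example: a predicted pKd of 7.0 corresponds to Kd = 100 nM.
print(pkd_to_kd_nm(7.0))  # 100.0
```

Because the threshold is strict, an analog at exactly 70% similarity is not flagged.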
Calibration
Raw model outputs are mapped through a potency calibration layer before display. Advanced and raw (pre-calibration) scores are available in the app under Compound analysis → Advanced · model internals. Docking (GNINA) is optional on the Pro tier and runs separately from the ML affinity head.
Targets & structures
Scoreable targets use pre-computed pocket embeddings from crystallographic or modeled binding sites. Alias names (e.g. EGFR, HER2) map to PDB pocket IDs via an internal registry (GET /targets). Custom pockets can be uploaded on the Pro tier via POST /proteins/upload.
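Alias lookup against the registry can be scripted with the standard library. This sketch only builds the GET /targets request and does not send it. The bearer-token header, the response shape (a JSON list of objects with `alias` and `pocket_id` keys) and the pocket IDs in the example are all hypothetical. Consult the REST API documentation for the actual authentication scheme and schema.

```python
import json
import urllib.request
from typing import Dict, Optional

API_BASE = "https://api.vectabind.com"  # endpoint listed in the Model table


def build_targets_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /targets request."""
    # Assumption: bearer-token auth; the real header name or scheme may differ.
    return urllib.request.Request(
        f"{API_BASE}/targets",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def parse_registry(body: bytes) -> Dict[str, str]:
    """Parse a hypothetical response: JSON list of {"alias": ..., "pocket_id": ...}."""
    entries = json.loads(body)
    return {e["alias"].upper(): e["pocket_id"] for e in entries}


def resolve_alias(registry: Dict[str, str], name: str) -> Optional[str]:
    """Case-insensitive alias lookup; returns None for targets not in the registry."""
    return registry.get(name.strip().upper())


# Sending the request would look like:
# with urllib.request.urlopen(build_targets_request(MY_KEY)) as resp:
#     registry = parse_registry(resp.read())
```

Resolving aliases locally before submitting a batch makes it easy to catch unknown targets early instead of on the first failed scoring call.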
Recommended citation language
“Binding affinity was estimated using VectaBind (Stage 6 EGNN + ESM2-3B, API v1.0.0) as a computational rank-ordering tool. Predictions were not treated as experimental Ki/Kd values.”
Limitations
- Not validated for covalent binders, PROTACs, or macrocycles outside the training distribution
- Salt forms and ambiguous stereochemistry in input SMILES can affect results
- Multi-target panels and MPO scores in the app are heuristic workflows — not clinical decision tools
- ChEMBL flags depend on public database coverage and Tanimoto similarity thresholds
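Because salt forms and stereochemistry in the input SMILES can shift predictions, a quick pre-submission check can help. This is a minimal, string-level heuristic and not part of VectaBind. The `smiles_preflight` function is hypothetical. It flags multi-component SMILES, where "." separates disconnected components such as counterions, and SMILES with no stereo descriptors at all. A missing descriptor is only a cue for manual review, since many molecules are genuinely achiral.

```python
from typing import List

STEREO_CHARS = "@/\\"  # tetrahedral (@, @@) and double-bond (/, \) markers in SMILES


def smiles_preflight(smiles: str) -> List[str]:
    """Return review warnings for a SMILES string before scoring (heuristic only)."""
    s = smiles.strip()
    warnings: List[str] = []
    if "." in s:
        # '.' separates disconnected components, e.g. a salt counterion or solvent.
        warnings.append("multi-component SMILES: strip salts/counterions before scoring")
    if not any(c in s for c in STEREO_CHARS):
        # No descriptors may be correct (achiral molecule) or may hide unassigned stereocenters.
        warnings.append("no stereo descriptors: confirm stereochemistry is intentionally unspecified")
    return warnings


print(smiles_preflight("CC(=O)[O-].[Na+]"))  # two warnings: salt form and no stereo
print(smiles_preflight("C[C@H](N)C(=O)O"))   # []
```

For actual standardization, such as keeping the parent fragment and neutralizing charges, a cheminformatics toolkit like RDKit's `rdMolStandardize` module is more reliable than string checks.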
Links
- Interactive app
- REST API documentation
- Hit Triage Workbench guide
- Privacy Policy
- Contact for methods questions or enterprise validation studies