Methods & reproducibility

This page documents VectaBind for researchers who cite it. Predictions are computational estimates for ranking and triage, not a substitute for experimental binding assays, free energy perturbation (FEP), or structure-based docking campaigns.

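Validation status: The reported MAE of 0.20 pKd units was measured on the PDBBind 2020 validation split, which was also used for model selection, so it may be optimistic relative to a held-out test set. Formal held-out test evaluation (e.g. CASF) is in progress. Until those results are available, use VectaBind to compare compounds relative to each other on the same target rather than as a source of absolute affinity values.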
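The distinction between absolute error and rank ordering can be made concrete with a small sketch. The numbers below are hypothetical placeholders, not VectaBind outputs or assay results, and the helper names (`mae`, `spearman`) are illustrative only. The sketch shows that a set of predictions can carry a sizeable constant offset, and therefore a large MAE, while still ordering compounds correctly.

```python
from statistics import mean


def mae(predicted, measured):
    """Mean absolute error between two equal-length sequences of pKd values."""
    return mean(abs(p - m) for p, m in zip(predicted, measured))


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(predicted, measured):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rp, rm = average_ranks(predicted), average_ranks(measured)
    mp, mm = mean(rp), mean(rm)
    cov = sum((a - mp) * (b - mm) for a, b in zip(rp, rm))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_m = sum((b - mm) ** 2 for b in rm)
    return cov / (var_p * var_m) ** 0.5


# Hypothetical pKd values for four compounds on one target (illustration only).
predicted = [6.1, 7.4, 5.2, 8.0]
measured = [6.8, 8.1, 5.9, 8.7]

print(f"MAE = {mae(predicted, measured):.2f} pKd units")
print(f"Spearman rho = {spearman(predicted, measured):.2f}")
```

In this toy case every prediction is 0.7 pKd units too low, yet the ranking is identical to the measured one, which is the situation in which relative comparison on a single target remains useful.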

Model

Component	Detail
Architecture	Stage 6 SE(3)-equivariant EGNN + cross-attention ligand–pocket fusion
Protein representation	ESM2-3B embeddings (2560-d) on binding-pocket residues + Cα coordinates
Ligand representation	RDKit graph → GNN encoder (Stage 5/6 path)
Parameters	~65M trainable
Training structures	~94k complexes (PDBBind-derived pipeline)
API version	v1.0.0 · endpoint `https://api.vectabind.com`

Outputs

affinity (pKd) — potency-calibrated binding strength estimate
bind_prob — probability of active-class binding (0–1), mapped from calibrated pKd
confidence — coarse tier (high / medium / low) from bind_prob and pKd thresholds; not a Bayesian uncertainty interval
Physicochemical — MW, LogP, QED, Lipinski via RDKit
ChEMBL similarity — browser-side lookup against ChEMBL REST API; clinical-analog flag = max_phase > 0 and similarity > 70%
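The clinical-analog rule stated above can be sketched as a small predicate. This is a minimal illustration, not the app's browser-side code: the `ChemblHit` record and its field names are assumptions modeled on the `max_phase` and `similarity` quantities named in the rule, similarity is taken to be a Tanimoto percentage on a 0 to 100 scale, and the compound identifiers are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChemblHit:
    """Hypothetical record for one similarity-search hit (field names assumed)."""
    chembl_id: str
    max_phase: Optional[float]  # highest clinical phase reached; None if unknown
    similarity: float           # Tanimoto similarity as a percentage (0-100)


def is_clinical_analog(hit: ChemblHit, min_similarity: float = 70.0) -> bool:
    """Apply the documented rule: max_phase > 0 and similarity > 70%.

    Both comparisons are strict, so a hit at exactly 70% is not flagged.
    A missing max_phase is treated as 0 (an assumption of this sketch).
    """
    phase = hit.max_phase if hit.max_phase is not None else 0
    return phase > 0 and hit.similarity > min_similarity


# Placeholder hits for illustration only.
hits = [
    ChemblHit("EXAMPLE-A", max_phase=4, similarity=82.5),     # flagged
    ChemblHit("EXAMPLE-B", max_phase=0, similarity=95.0),     # no clinical phase
    ChemblHit("EXAMPLE-C", max_phase=2, similarity=70.0),     # not above threshold
    ChemblHit("EXAMPLE-D", max_phase=None, similarity=88.0),  # phase unknown
]
flags = [is_clinical_analog(h) for h in hits]
print(flags)
```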

Calibration

Raw model outputs pass through a potency calibration layer before display. The uncalibrated scores are available in the app under Compound analysis → Advanced · model internals. Docking (GNINA) is optional on the Pro tier and runs separately from the ML affinity head.

Targets & structures

Scoreable targets use pre-computed pocket embeddings from crystallographic or modeled binding sites. Alias names (e.g. EGFR, HER2) map to PDB pocket IDs via an internal registry, which can be queried with `GET /targets`. Custom pockets can be uploaded on the Pro tier via `POST /proteins/upload`.
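As a rough sketch of how a script might address the target registry, the snippet below builds a `GET /targets` request using only the Python standard library. The base URL and path come from this page; everything else is an assumption. In particular, the response schema and the authentication scheme are not described here, so the helper (`build_targets_request`, a hypothetical name) only constructs the request and leaves any required headers to the caller.

```python
from typing import Dict, Optional
from urllib.request import Request

BASE_URL = "https://api.vectabind.com"  # endpoint listed in the Model table


def build_targets_request(extra_headers: Optional[Dict[str, str]] = None) -> Request:
    """Build (but do not send) a GET request for the target registry.

    extra_headers is a placeholder for whatever credentials the API requires;
    the authentication scheme is not documented on this page.
    """
    headers = {"Accept": "application/json"}
    if extra_headers:
        headers.update(extra_headers)
    return Request(f"{BASE_URL}/targets", headers=headers, method="GET")


request = build_targets_request()
print(request.get_method(), request.full_url)
```

Sending the request, for example with `urllib.request.urlopen(request)`, is left out on purpose, because how to interpret the response depends on a schema this page does not specify.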

Recommended citation language

“Binding affinity was estimated using VectaBind (Stage 6 EGNN + ESM2-3B, API v1.0.0) as a computational rank-ordering tool. Predictions were not treated as experimental K_i/K_d values.”

Limitations

Not validated for covalent binders, PROTACs, or macrocycles outside training distribution
Salt forms and stereochemical ambiguities in the input SMILES can change results
Multi-target panels and MPO scores in the app are heuristic workflows — not clinical decision tools
ChEMBL flags depend on public database coverage and Tanimoto similarity thresholds
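The SMILES limitation above lends itself to a simple pre-submission check. The sketch below is a string-level heuristic under stated assumptions, not part of VectaBind: the function name `smiles_input_warnings` is hypothetical, a `.` is read as a component separator (typical of salts and counterions), and the absence of `@`, `/`, and `\` is read as "no stereochemistry specified". It cannot tell whether a molecule actually has stereocenters, so the second warning will also fire for achiral molecules; a cheminformatics toolkit would be needed for a real check.

```python
from typing import List


def smiles_input_warnings(smiles: str) -> List[str]:
    """Return heuristic warnings about a SMILES string before scoring.

    Purely textual: no parsing or chemical validation is performed.
    """
    warnings = []
    # '.' separates disconnected components, e.g. a drug and its counterion.
    if "." in smiles:
        warnings.append("multiple components (possible salt or counterion)")
    # '@' marks tetrahedral stereocenters; '/' and '\' mark double-bond geometry.
    if not any(marker in smiles for marker in ("@", "/", "\\")):
        warnings.append("no stereo annotations (stereochemistry may be unspecified)")
    return warnings


# Sodium acetate: two components and no stereo annotations.
print(smiles_input_warnings("CC(=O)[O-].[Na+]"))
# Alanine with its stereocenter specified: no warnings.
print(smiles_input_warnings("C[C@H](N)C(=O)O"))
```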