Technology Overview
Prediction Model for hERG Binding

QT prolongation caused by drugs binding to the hERG channel can result in potentially fatal cardiac arrhythmias. In silico models that predict the hERG binding potential of a drug (a surrogate marker for cardiotoxicity) have been gaining acceptance as a means of flagging candidates that could pose problems in advanced stages of clinical development. Such models can also provide rational guidelines for designing molecules that avoid this problem.

Strand Genomics has developed a suite of hERG binding predictors. They employ several machine learning methods, such as neural networks and support vector machines, to identify a small set of molecular descriptors that correlate with hERG binding activity. The input to these predictors is the 2D structure of a molecule, which is used to compute the descriptors.

Binary Classification Model for hERG Binding
hERG binding activity is expressed as IC50, the concentration of drug required to inhibit IKr by 50%. Binders and nonbinders to hERG are typically defined by selecting an appropriate IC50 cutoff value. A high cutoff value makes the model more stringent, because more compounds are then classified as binders. One can relax the stringency by using lower cutoff values. Strand has developed three models for hERG binding that differ in their IC50 cutoff values. These models, in increasing order of stringency, are:
a) Model 1: cutoff value is IC50 = 1 µM
b) Model 2: cutoff value is IC50 = 5 µM
c) Model 3: cutoff value is IC50 = 50 µM
The flexibility in choosing models of different stringencies allows the user to pick the model most applicable for decision-making. For example, a user may wish to use a model with lower stringency at a very early design stage while using a more stringent model at a later stage of development. Only Strand offers this flexibility to the user.
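
The effect of the cutoff on the binder/nonbinder label can be illustrated with a minimal Python sketch. It is not Strand's implementation: the function names are hypothetical, and it assumes that a compound counts as a binder when its measured IC50 is at or below the cutoff (the text does not say whether the boundary is inclusive). Only the three cutoff values come from the model list above.

```python
# Cutoffs in µM, taken from Models 1 to 3 above.
MODEL_CUTOFFS_UM = {"Model 1": 1.0, "Model 2": 5.0, "Model 3": 50.0}


def label_compound(ic50_um, cutoff_um):
    """Label a compound from its measured IC50 (µM).

    Assumption: IC50 at or below the cutoff means "binder".
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return "binder" if ic50_um <= cutoff_um else "non-binder"


def label_under_all_models(ic50_um):
    """Return the label the same compound receives under each cutoff."""
    return {
        name: label_compound(ic50_um, cutoff)
        for name, cutoff in MODEL_CUTOFFS_UM.items()
    }


if __name__ == "__main__":
    # A hypothetical compound with IC50 = 3 µM.
    for model, label in label_under_all_models(3.0).items():
        print(model, label)
```

Under these assumptions, a compound with an IC50 of 3 µM is a nonbinder for Model 1 but a binder for Models 2 and 3, which is why the higher cutoffs are the more stringent ones.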

Training Set Profile

Data for the training sets were curated from multiple public domain sources and normalized to a common basis prior to model building. The training sets for Models 1 through 3 contain experimental hERG binding IC50 values for 127, 170 and 204 compounds, respectively.

Almost all molecules used in the model are commercially available drugs. 98% of these drugs are considered drug-like based on Lipinski's rule of five.

Number of Lipinski's Rule Violations    Number of Molecules
0                                       170
1                                       28
2                                       3
3                                       3
4                                       0
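
As an illustration of how a violation count like the one tabulated above is obtained, here is a minimal Python sketch of Lipinski's rule of five in its usual form (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The function names and the example descriptor values are hypothetical. The source does not state how the descriptors were computed, so they are passed in as plain numbers.

```python
from collections import Counter


def lipinski_violations(mol_weight, logp, h_bond_donors, h_bond_acceptors):
    """Count how many of the four rule-of-five criteria a molecule violates."""
    checks = [
        mol_weight > 500,       # molecular weight above 500 Da
        logp > 5,               # octanol-water partition coefficient above 5
        h_bond_donors > 5,      # more than 5 hydrogen-bond donors
        h_bond_acceptors > 10,  # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def violation_histogram(molecules):
    """Tally molecules by violation count (0 to 4), as in the table above."""
    counts = Counter(lipinski_violations(*m) for m in molecules)
    return {n: counts.get(n, 0) for n in range(5)}


if __name__ == "__main__":
    # Hypothetical descriptor tuples: (MW, logP, donors, acceptors).
    example = [
        (350.0, 2.1, 2, 5),
        (540.0, 4.0, 1, 6),
        (720.0, 6.3, 6, 12),
    ]
    print(violation_histogram(example))
```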

A self-dissimilarity test of the training set was carried out using 2D structural fingerprints. The self-dissimilarity was determined to be 64%, and the maximal dissimilarity observed was 96%.
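
The source does not name the fingerprint type or the dissimilarity measure used. Purely as an illustration, the following Python sketch assumes that each fingerprint is a set of "on" bit positions and that dissimilarity is 1 minus the Tanimoto coefficient, then reports the mean and the maximum over all pairs. The function names and this definition of self-dissimilarity are assumptions, not Strand's method.

```python
from itertools import combinations


def tanimoto_dissimilarity(fp_a, fp_b):
    """1 - Tanimoto coefficient for two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def dissimilarity_summary(fingerprints):
    """Return (mean, max) pairwise dissimilarity over a set of molecules."""
    if len(fingerprints) < 2:
        raise ValueError("need at least two fingerprints")
    values = [
        tanimoto_dissimilarity(a, b)
        for a, b in combinations(fingerprints, 2)
    ]
    return sum(values) / len(values), max(values)


if __name__ == "__main__":
    # Three hypothetical fingerprints.
    fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
    mean_d, max_d = dissimilarity_summary(fps)
    print(f"mean {mean_d:.0%}, max {max_d:.0%}")
```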

Model Characteristics: Validation
Cross Validation Statistics
IC50 cutoff (µM)   Number of Compounds   Model Type   Classification Accuracy
                                                      Binder   Non Binder   Overall
1                  127                   ANN          93%      81%          86%
5                  170                   ANN          84%      86%          85%
50                 204                   ANN          77%      78%          78%
50                 204                   SVM          80%      77%          78%
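
The three accuracy columns can be read as the fraction of true binders predicted correctly, the fraction of true nonbinders predicted correctly, and the fraction of all compounds predicted correctly. The Python sketch below computes them under that reading, which is an assumption about how the statistics were defined. The function name and the sample labels are hypothetical.

```python
def classification_accuracy(actual, predicted):
    """Per-class and overall accuracy for binary binder labels.

    Both arguments are sequences of booleans, where True means "binder".
    A class with no members yields None for its accuracy.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("no compounds to score")

    binder_total = binder_hits = 0
    nonbinder_total = nonbinder_hits = 0
    for truth, guess in zip(actual, predicted):
        if truth:
            binder_total += 1
            binder_hits += guess is True
        else:
            nonbinder_total += 1
            nonbinder_hits += guess is False

    return {
        "binder": binder_hits / binder_total if binder_total else None,
        "non_binder": nonbinder_hits / nonbinder_total if nonbinder_total else None,
        "overall": (binder_hits + nonbinder_hits) / len(actual),
    }


if __name__ == "__main__":
    # Hypothetical labels for five compounds.
    actual = [True, True, False, False, False]
    predicted = [True, False, False, False, True]
    print(classification_accuracy(actual, predicted))
```

Because the overall figure is weighted by class size, it sits close to the nonbinder accuracy whenever nonbinders make up most of the set.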

External Validation Statistics

IC50 cutoff (µM)   Number of Compounds   Model Type   Prediction Accuracy
                                                      Binder   Non Binder   Overall
1                  199                   ANN          100%     80%          81%
5                  191                   ANN          100%     77%          78%
50                 189                   ANN          88%      76%          77%
50                 189                   SVM          83%      79%          80%

© 2004 Strand Genomics. All Rights Reserved. | trutox@strandgenomics.com