Machine learning and proteomics more accurately predict cardiovascular risk

Cardiovascular disease (CVD) remains the leading cause of death in most industrialized and developing countries. Preventing cardiovascular disease depends on rapid diagnosis and on starting cardioprotective therapies early in the course of the disease. However, an accurate risk model that predicts an individual’s susceptibility to cardiovascular disease is still lacking.

A new study in Science Translational Medicine describes a novel proteomics-based model that predicts the risk of cardiovascular events over the next four years with greater accuracy than current clinical models.

Study: A proteomic surrogate for cardiovascular outcomes that is sensitive to multiple mechanisms of risk change.  Image Credit: alexacrib/


Endpoints for clinical trials of cardiovascular drugs include acute coronary events, hospitalizations, and deaths. However, relying on these endpoints has allowed some drugs to reach advanced development before they were found to increase cardiovascular risk. Conversely, other drugs with promising cardioprotective effects have not been approved for such indications because these effects were demonstrated too late in the development process.

Traditional cardiovascular risk factors are also not particularly useful in predicting risk in people with known cardiovascular diseases but with controlled cholesterol and blood pressure levels, people with multiple chronic diseases, and the elderly.

Finally, many of these risk factors, including age, gender, history of diabetes, and some imaging factors, do not change with treatment, so the calculated risk does not reflect the risk reduction achieved by agents that act independently of these factors. Thus, researchers in the current study sought to generate and test a new model of cardiovascular risk that would use new biomarkers as an outcome measure instead of previous clinical parameters.

“Idealized requirements for accurate and sensitive prognoses that respond agnostically and reliably to all changes in outcome, regardless of the mechanism of intervention, are key characteristics of a surrogate endpoint.”

Here, the researchers created a proteomics-based prognostic score that would predict actual cardiovascular outcomes in a relatively short time, while also including all known mechanisms and allowing the model to react to changes in outcomes. If successful, this score would be useful for phase II studies of drugs used in the prevention and treatment of cardiovascular disease and diabetes, as well as an endpoint for accelerated approval of breakthrough drugs.

Finally, the researchers also anticipate that their score could be used to selectively assign drugs to people at risk for CVD and measure patient outcomes.

Study results

The researchers measured 5,000 proteins in each plasma sample and applied machine learning to the results to develop a prognostic model. The model used 27 proteins and predicted the absolute risk that any component of the composite endpoint, which included heart attack, stroke, hospitalization for heart failure, and death from any cause, would occur within the next four years.

The model was tested in multiple cohorts with a range of comorbidities, and changes in parameters were measured over time. Overall, more than 11,600 participants with a four-year outcome were included in the study.

By the four-year mark, 22% of participants had experienced one or more of these events, for an event count of 2,500. These events included 622 hospitalizations for heart failure, 601 heart attacks, and 345 strokes.
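As a rough consistency check on these figures, the short Python sketch below recomputes the event count from the reported percentage and sums the three itemized event types. It assumes a cohort of exactly 11,600 people, whereas the article says "more than 11,600", so the results are approximate. The variable names are illustrative and do not come from the study.

```python
# Figures as reported in the article. The cohort size is an assumption:
# the article says "more than 11,600", so 11,600 is a lower bound.
participants = 11_600
event_fraction = 0.22

# Event types itemized in the article; other components of the composite
# endpoint (for example, deaths) are not broken down there.
itemized = {
    "heart_failure_hospitalization": 622,
    "heart_attack": 601,
    "stroke": 345,
}

approx_events = round(participants * event_fraction)
itemized_total = sum(itemized.values())

print(f"approximate events from 22% of cohort: {approx_events}")
print(f"itemized events: {itemized_total}")
```

Under that assumption, 22% of the cohort comes to roughly 2,550, in line with the reported count of about 2,500, and the three itemized event types account for 1,568 of them.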

Among the proteins used in this model, 14 showed a positive correlation with risk and 13 a negative correlation. These proteins correspond to ten or more biological processes, such as those involved in maintaining blood volume and sodium excretion, vesicle formation, angiogenesis, and glomerular filtration rate.

Mendelian randomization analysis was used to explore possible causal relationships for the 16 of these proteins that were found in the available PheWAS database. This showed that a dozen of them were linked to one or more CVD-related traits.

The current model could also predict event rates over a wide range of values. Four-year event rates in the highest quintile of predicted risk were five to seven times those in the lowest quintile in the first two sets of validation data. The meta-cohort, which included all 11,600 participants, also showed a sevenfold difference in the event rate over four years.
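The quintile comparison can be illustrated with a minimal Python sketch. It sorts participants by predicted risk, splits them into five equal groups, and divides the observed event rate in the top group by that in the bottom group. The cohort below is synthetic and constructed to give a sevenfold ratio; the function names and data are hypothetical and are not taken from the study.

```python
def quintile_event_rates(predicted_risk, had_event):
    """Sort participants by predicted risk, split them into five equal
    groups, and return the observed event rate per group (lowest first)."""
    order = sorted(range(len(predicted_risk)), key=lambda i: predicted_risk[i])
    n = len(order)
    rates = []
    for q in range(5):
        group = order[q * n // 5:(q + 1) * n // 5]
        rates.append(sum(had_event[i] for i in group) / len(group))
    return rates


def top_to_bottom_ratio(rates):
    """Fold difference between the highest- and lowest-risk quintiles."""
    return rates[-1] / rates[0]


# Synthetic cohort of 100 people (made-up data): predicted risk rises with
# the index, and each block of 20 people has a fixed number of events.
events_per_quintile = [1, 2, 3, 5, 7]
predicted_risk = [(i + 1) / 101 for i in range(100)]
had_event = []
for k in events_per_quintile:
    had_event.extend([1] * k + [0] * (20 - k))

rates = quintile_event_rates(predicted_risk, had_event)
ratio = top_to_bottom_ratio(rates)
print([round(r, 2) for r in rates])
print(f"top-to-bottom ratio: {ratio:.1f}")
```

A larger ratio means the score spreads people across a wider range of observed risk, which is what the article refers to as predicting event rates over a wide range of values.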

The scientists also created four risk categories based on protein values: low, low-medium, medium-high, and high risk. In the six studies that made up the meta-cohort, these had four-year event rates of 6%, 11%, 20%, and 43%, respectively. Additionally, the median time to event was less than two years overall, compared to 1.5 years in the highest quintile.
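To put these percentages in concrete terms, the following Python sketch converts each reported four-year event rate into an expected number of people with an event per 1,000 in that category, and compares the high and low categories. The group size of 1,000 and the helper name are illustrative assumptions; only the four rates come from the article.

```python
# Four-year event rates per risk category, as reported in the article.
category_rates = {
    "low": 0.06,
    "low-medium": 0.11,
    "medium-high": 0.20,
    "high": 0.43,
}


def expected_events(rate, group_size=1000):
    """Expected number of people with an event in a group of the given size."""
    return round(rate * group_size)


expected = {name: expected_events(rate) for name, rate in category_rates.items()}
for name, count in expected.items():
    print(f"{name}: about {count} per 1,000 people over four years")

# Ratio of the high-risk to the low-risk category event rate.
high_to_low = round(category_rates["high"] / category_rates["low"], 1)
print(f"high vs. low category: {high_to_low}-fold")
```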

The protein-predicted risk also moved in the right direction in response to adverse and beneficial changes. For example, in the ACCORD trial, which is part of the dataset used here, predicted CVD risk increased by 6% over two years, correctly anticipating adverse events that occurred after the second specimen was collected. The PRADA trial also reflected a 6% increase in risk from baseline within three months of starting anthracycline chemotherapy.

Beneficial changes were observed in response to the glucagon-like peptide 1 (GLP-1) receptor agonist exenatide in the EXSCEL trial. The four-year absolute event risk was reduced by 0.8% in just one year, compared to the predicted reduction of 1.5% with this model. In the DiRECT trial, in which nearly 50% of participants achieved diabetes remission within one year, the absolute risk was predicted to fall by 6.7% compared to the standard diet group.

Finally, the model correctly predicted no treatment effect in the subset of the ACCORD trial that had intensive diabetes control, or for patients in the PRADA trial in response to beta-blockers or angiotensin receptor blockers.

The model also predicted higher risk in groups known to have higher event rates, such as patients undergoing breast cancer treatment, those with prior events, current smokers, people with diabetes, and those with a history of cancer. In the first case, in the PRADA study, the predicted risk was 14% higher compared to the previous prediction of 5% from another cohort of matched women.


The model developed in the present study showed a consistent correlation between event rate and predicted absolute risk and outperformed currently available prognostic models. In addition, the current model had more than double the dynamic range and reclassified cardiovascular risk more accurately. Moreover, this model is biologically consistent, as the various biological processes involved in cardiovascular health are mediated and regulated by proteins.

“Reliable identification of individuals with an observed event rate > 50% and a median time to event of 18 months is of clinical and economic importance.”

Proteins also change with environmental conditions, depending on the level of gene expression. The 27 proteins used in the model were associated with processes that predict higher cardiovascular risk. Of these, 16 were part of a database exploring the correlation between these proteins and the genome, and 12 of those were causally linked to a genetic factor for CVD or one of its risk factors.

Under conditions of positive, negative, and neutral risk factor changes, this protein-based model showed true reductions, increases, or no change in predicted absolute risk. When other conditions associated with increased cardiovascular events were incorporated into the analysis, including smoking and diabetes, the model continued to correctly predict the elevated risk. It also predicted that untreated high systolic blood pressure and high lipid levels in the same group would increase the risk.

This shows that the surrogate is universal and will respond to a change in outcome, regardless of the mechanism. This multiprotein model is also more sensitive to risk factors than individual biomarkers.

Further work along the same lines could provide a much-needed universal surrogate for cardiovascular risk.

Journal reference:

  • Williams, SA, Ostroff, R., Hinterberg, MA, et al. (2022). A proteomic surrogate for cardiovascular outcomes that is sensitive to multiple mechanisms of risk change. Science Translational Medicine. doi:10.1126/scitranslmed.abj9625.
