Computational approaches for identifying neuropeptides: A comprehensive review

Abstract

Neuropeptides (NPs) are key signaling molecules that interact with G protein-coupled receptors, influencing neuronal activities and developmental pathways, as well as the endocrine and immune systems. They are significant in disease contexts, offering potential therapeutic targets for conditions such as anxiety, neurological disorders, cardiovascular health, and diabetes. Understanding and detecting NPs is crucial because of their complex functions in health and disease. Historically, identifying NPs via wet lab techniques has been time-consuming and costly. However, integrating computational methods has shown the potential to improve efficiency, accuracy, and cost-effectiveness. Computational techniques, such as artificial intelligence and machine learning, have been extensively researched in recent years for the identification of NPs. This review explores the application of machine learning (ML) techniques in predicting various aspects of NPs, including their sequences, cleavage sites, and precursors. Additionally, it provides insights into databases containing NP metadata and specialized tools used in this domain.

Keywords

  • MT: Bioinformatics
  • neuropeptides
  • neuropeptide identification tools
  • neuropeptide databases
  • neuropeptide analysis

Introduction

Neuropeptides (NPs) represent a complex and extensive group of signaling molecules that interact with G protein-coupled receptors. These messenger molecules impact a wide range of neuronal activities and contribute to crucial developmental processes, including biological, cognitive, and socioemotional aspects. Although NPs are present in glial cells, they play multiple roles, such as acting as neurotransmitter peptides in the endocrine system and as hormonal peptides in the immune system. Given their diverse functions, NPs significantly influence muscle contraction, food digestion, and various physiological and behavioral processes such as learning, memory, adaptation, and aging. This functional diversity gives NPs a significant impact on the development and progression of various diseases, as well as on their amelioration. Consequently, NPs have emerged as promising therapeutic targets for a diverse range of diseases affecting various physiological systems, including anxiety, behavior, neurology, cardiovascular health, diabetes, and a multitude of other conditions.

NP identification has received significant attention from scientists, leading to the development of different types of methods over the years to identify various types of peptides.

Corbière and colleagues comprehensively reviewed different approaches for identifying bioactive NPs in vertebrates. Their research covered various methods, including identification on the basis of biological activity, receptor analysis, the biochemical characteristics of the NPs, genomic approaches, peptidomics approaches, and de novo identification. While wet lab methods are extremely time-consuming and costly, the integration of computational approaches has significantly enhanced efficacy, accuracy, and cost-effectiveness.

In alignment with the valuable contributions of Corbière et al., this study explores the realm of computational approaches for NP prediction. By leveraging cutting-edge technologies, our research aims to provide an alternative and complementary viewpoint for identifying NPs.

Computational and in silico approaches have played a significant role in the majority of methods used for identifying NPs, complementing laboratory experiments. These methods include genomics-based peptide identification and mass spectrometry (MS), both of which utilize various computational techniques such as data analysis, bioinformatics, and predictive modeling. Genomics-based peptide identification relies heavily on computational analysis for tasks such as genetic sequence examination, sequence alignment, NP precursor (NPP) prediction, and conserved motif analysis. In the case of MS, computational algorithms and specialized software are used to reconstruct peptide sequences and predict peptide structures through de novo identification analysis. Additionally, computational methods are often employed to simplify statistical analysis, data interpretation, and result validation. The rise of AI and machine learning (ML) approaches in recent years has further enhanced peptide identification, particularly in predicting NP sequences.

The overall architecture of NP prediction reported in the literature is presented in Figure 1. The process of predicting NPs through ML/deep learning (DL) techniques can be summarized in five distinct phases. First, researchers extract labeled raw data from available databases. Next, the data are preprocessed for feature engineering. The methods utilized for feature engineering are classified into two groups, feature encoding schemes and feature embeddings, which are summarized in Tables 1 and 2, respectively. In the third phase, feature importance algorithms are often used to optimize the dimensions of the feature matrix. The optimized features are then fed into the constructed ML or DL model, which is subsequently evaluated. Finally, once satisfactory accuracy is achieved, the model can be deployed to predict NPs from unknown sequences. Several studies have made their tools available through web services or Python packages, facilitating further use by researchers in this field.

Figure 1 The overall architecture of NP prediction based on the literature

A systematic representation of the NP prediction framework based on information from the literature. The diagram outlines key stages in the NP prediction process, including data input, feature extraction, model training, and output.
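The five phases described above can be illustrated with a minimal sketch. It is not the method of any tool covered in this review: the sequences and labels are toy placeholders, variance ranking stands in for the feature importance algorithms, and a nearest-centroid classifier stands in for the ML or DL model. All function and variable names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Phase 1: labeled raw data. Toy sequences with placeholder labels
# (1 = NP, 0 = non-NP); NOT records taken from any NP database.
train = [("YGGFM", 1), ("YGGFL", 1), ("YGGFMRF", 1),
         ("AKKLAEEL", 0), ("LLKEAAKE", 0), ("EEALKKLA", 0)]
held_out = [("YGGFLRR", 1), ("KEALKELA", 0)]


# Phase 2: feature engineering with amino acid composition (AAC).
def encode_aac(seq):
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


# Phase 3: feature optimization. Variance ranking is an assumed, simple
# stand-in for the feature importance algorithms used in the literature.
def top_variance_columns(matrix, k):
    n = len(matrix)
    variances = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def project(vector, columns):
    return [vector[j] for j in columns]


# Phase 4: model construction and evaluation. A nearest-centroid classifier
# is used only because it needs no third-party library.
def fit_centroids(matrix, labels):
    centroids = {}
    for label in set(labels):
        rows = [row for row, y in zip(matrix, labels) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_label(centroids, vector):
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], vector))


features = [encode_aac(seq) for seq, _ in train]
labels = [label for _, label in train]
columns = top_variance_columns(features, k=5)
centroids = fit_centroids([project(row, columns) for row in features], labels)


# Phase 5: deployment on an unseen sequence.
def predict_peptide(seq):
    return nearest_label(centroids, project(encode_aac(seq), columns))


correct = sum(predict_peptide(seq) == label for seq, label in held_out)
accuracy = correct / len(held_out)
```

In practice, each phase would be replaced by its real counterpart: curated database records, one or more of the encodings or embeddings in Tables 1 and 2, a dedicated feature selection method, and a properly cross-validated model.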

No. | Studies | Utilized feature encoding schemes

Cleavage site predictor

1 | NeuroCS | CTD, DDE, PAAC, EAAC
2 | NeuroPID | biophysical quantitative properties, binary features, information-based statistics

NPP predictor

3 | NeuroPP | AAC, DPC, TPC, CPC, OCPC

NP predictor

4 | Ridzik and Brejová | one-hot
5 | NeuroPIpred | AAC, DPC, SC, BP
6 | PredNeuroP | AAC, DPC, BPNC, AAI, GAAC, GDPC, GTPC, CTD, AAE
7 | NeuroPpred-Fuse | AAC, DPC, ASDC, GGAP, CTD, PSAAC
8 | NeuroPred-FRL | BE, AAI, KmerAC, KgapAC, PrAC, TPPC, KgapAP, GDPC, GTPC, QSO, CTF
9 | NeuroPpred-SVM | AAC, PSAAC, AAP, AAT, CTD
10 | NeuroPred-CLQ | one-hot, supervised embedding
11 | NeuroCNN_DNB | one-hot, AAI, GGAP
13 | Akbar et al. | Bi-PSSM, KSB, discrete wavelet transforms from Bi-PSSM_DWT and KSB_DWT

Table 1

Feature encoding schemes utilized by different tools

AAC, amino acid composition; AAE, amino acid entropy; AAI, amino acid index; AAP, amino acid pair scale; AAT, amino acid trimer scale; ASDC, adaptive skip dipeptide composition; BE, binary encoding; Bi-PSSM, bigram position-specific scoring matrices; BP, binary profile; BPNC, binary profiling feature; CPC, combined peptide composition; CTD, composition-transition-distribution; CTF, conjoint triad; DDE, dipeptide deviation from expected mean; DPC, dipeptide composition; DWT, discrete wavelet transform; EAAC, enhanced amino acid composition; GAAC, grouped amino acid composition; GDPC, grouped dipeptide composition; GGAP, G-gap dipeptide composition; GTPC, grouped tripeptide composition; KgapAC, K-gap composition of amino acids; KgapAP, K-gap composition of profile-based amino acids; KmerAC, Kmer composition of amino acids; KSB, K-spaced bigram; OCPC, optimal combined peptide composition; PAAC, pseudo-amino acid composition; PrAC, profile-based composition of amino acids; PSAAC, position-specific amino acid composition; QSO, quasi-sequence order; SC, split composition; TPC, tripeptide composition; TPPC, tripeptide profile of amino acid composition.
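To make a few of the schemes in Table 1 concrete, the following is a minimal sketch of three of the simplest encodings (AAC, DPC, and one-hot), written from their standard definitions rather than from the source code of any listed tool. The function names are hypothetical, and the example peptide (Met-enkephalin, YGGFM) is used only as a short input.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: fraction of each residue (20 values)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each adjacent pair (400 values)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = max(len(pairs), 1)  # guard against single-residue input
    return [pairs.count(a + b) / total for a, b in product(AMINO_ACIDS, repeat=2)]


def one_hot(seq):
    """One-hot encoding: one 20-dimensional indicator vector per position."""
    return [[1 if residue == a else 0 for a in AMINO_ACIDS] for residue in seq]


peptide = "YGGFM"  # Met-enkephalin, example input only
aac_vector = aac(peptide)      # length 20, independent of peptide length
dpc_vector = dpc(peptide)      # length 400, independent of peptide length
one_hot_matrix = one_hot(peptide)  # 5 rows of 20 values, grows with length
```

Composition-based encodings such as AAC and DPC yield fixed-length vectors regardless of peptide length, whereas one-hot encoding preserves residue order but produces a matrix whose size depends on the sequence, so it is usually padded or truncated before being passed to a model.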

No. | Studies | Utilized feature embeddings

1 | DeepNeuropePred | ESM (protein language model)
2 | NeuroPpred-SVM | BERT-cls, BERT-avg
3 | NeuroPred-PLM | ESM (protein language model)
4 | NeuroPred-CLQ | word2vec embedding
5 | NeuroCNN_DNB | Skip-Gram

Table 2

Feature embeddings utilized by different tools

BERT, bidirectional encoder representations from transformers; SVM, support vector machine.
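The two BERT variants in Table 2 differ in how per-token vectors are reduced to a single sequence-level vector. The sketch below assumes the common convention that "cls" means taking the vector of the leading [CLS] token and "avg" means averaging the residue-token vectors; whether the cited tool excludes the [CLS] token from the average is an assumption here. The tiny embedding matrix is made up for illustration and does not come from a real model.

```python
def cls_pooling(token_embeddings):
    """Use the first ([CLS]) token vector as the sequence embedding."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings):
    """Average the residue-token vectors, skipping the leading [CLS] token."""
    residues = token_embeddings[1:]
    dim = len(residues[0])
    return [sum(vec[d] for vec in residues) / len(residues) for d in range(dim)]


# Hypothetical output of a language model for a 2-residue peptide:
# row 0 is the [CLS] token, rows 1-2 are residues (2-dimensional for brevity).
toy_embeddings = [
    [1.0, 0.0],
    [2.0, 4.0],
    [4.0, 8.0],
]

cls_vector = cls_pooling(toy_embeddings)
avg_vector = mean_pooling(toy_embeddings)
```

Either pooled vector has a fixed length equal to the model's hidden size, so it can be fed to a conventional classifier such as an SVM without padding.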

Computational research on NPs

Over the years, ML approaches have been implemented in three aspects of NP sequences, including predicting cleavage sites, predicting NPP identifiers, and predicting NPs from their sequences, which are clarified separately in Table 3.

Name | Type of prediction | Method (model with the best performance) | Recruited data | ACC | Sp | Sn | AUROC | MCC | Feature extraction | Base learner | Meta learner | Web server
Neuropeptid