Neuropeptides are crucial signaling molecules that regulate diverse physiological processes, including growth, social behavior, learning, memory, metabolism, homeostasis, reproduction, and neural differentiation, in both the nervous system and peripheral tissues. Dysregulation of neuropeptide signaling is closely linked to various pathological conditions, such as neurological disorders, metabolic diseases, cardiovascular conditions, and even cancer, which positions neuropeptides as potential therapeutic agents or targets for intervention. In recent years, research into neuropeptides has accelerated, and large amounts of data continue to accumulate in multiple databases. However, the study of neuropeptides is often impeded by the need for extensive, time-consuming experimental investigation. As a result, computational tools have become essential for the rapid, large-scale identification of neuropeptides. This review systematically discusses neuropeptide-related databases and computational tools. These databases organize extensive data on neuropeptide sequences, structures, and functions. Among them, NeuroPep2.0, with 11,417 neuropeptide entries, is currently the most widely used dataset for neuropeptide prediction. This review also examines the application of computational approaches to neuropeptide prediction. Whereas early methods relied predominantly on homologous sequence alignment and statistics of biochemical features, recent advances in machine learning have significantly improved prediction accuracy and efficiency. Tools developed by our research group using protein language models, such as NeuroPred-PLM and DeepNeuropePred, have substantially improved prediction performance.
In conclusion, this review provides a comprehensive overview of current neuropeptide databases and computational tools, surveys the available resources and analytical methods, and emphasizes the need for continuous optimization to advance neuropeptide research and its therapeutic applications.
Neuropeptides, databases, computational tools, machine learning, deep learning, prediction models
Neuropeptides are signaling molecules composed of amino acid chains that are synthesized and released by neurons and play a pivotal role in modulating signaling within the nervous system. They not only transmit signals directly but also regulate the release and activity of other neurotransmitters. As a result, neuropeptides influence a variety of physiological processes, including emotion, behavior, pain perception, stress responses, appetite, learning, memory, skeletal homeostasis, and metabolism. Unlike classical neurotransmitters such as acetylcholine and dopamine, neuropeptides are larger molecules with more complex structures, which enables them to convey richer chemical information and interact with a broader range of recognition sites. This structural complexity allows for more precise signal transmission and regulation. Furthermore, neuropeptides can be released not only at synaptic sites but also at sites outside synaptic specializations, from which they can diffuse over considerable distances to act on G protein-coupled receptors (GPCRs). Although neuropeptides diffuse and bind more slowly than classical neurotransmitters, they typically bind their receptors with higher affinity, producing sustained, long-lasting regulatory effects.
The identification of neuropeptides dates back to the 19th century, when biological activity was observed in extracts of brain tissue from various animals. In 1905, the physiologist Ernest Starling coined the term “hormone” for chemical messengers carried in the blood; some such messengers, including vasopressin (VP) and oxytocin (OT), are now classified as neuropeptides. By the 1950s, it had been established that many hormones are peptide chains. Extensive extraction and purification efforts led to the discovery of additional peptides in brain regions such as the hypothalamus and brainstem, highlighting their roles in behavior and memory. Beginning in the 1950s, de Wied’s pioneering research demonstrated that adrenocorticotropic hormone (ACTH), melanocyte-stimulating hormone (MSH), and VP influence learning and memory. In the 1970s, he introduced the term “neuropeptide” to describe hormone-like peptides and their fragments that exhibit neural activity. Techniques such as mass spectrometry, immunostaining, and radiolabeled ligand binding have been crucial for exploring the distribution, synthesis pathways, and functional mechanisms of neuropeptides. In the 21st century, advances in molecular biology, genomics, proteomics, and single-cell analysis have made neuropeptide research more precise and comprehensive.
Neuropeptides both regulate and transmit signals within the nervous system, and they also serve a diverse range of functions throughout the body. They regulate emotional and behavioral processes, pain perception, and learning and memory. Neuropeptides are also essential for metabolic regulation, particularly appetite control and energy homeostasis. In addition, they contribute to neuroprotection and nerve regeneration. Dysregulation or dysfunction of neuropeptides is often associated with neurological and psychiatric disorders, including Alzheimer’s disease, Parkinson’s disease, depression, anxiety, and chronic pain. Neuropeptide dysregulation can also impair metabolic and endocrine function, contributing to conditions such as obesity, metabolic syndrome, and diabetes. It has further been linked to cardiovascular disorders, and neuropeptides have been implicated in tumorigenesis, including in breast cancer and neuroblastoma.
Research on neuropeptides has substantially advanced our understanding of the nervous system and has broad applications in biomedicine and agriculture. Neuropeptides and their receptors have emerged as promising drug targets, with over 80 neuropeptide-related drugs approved by the U.S. Food and Drug Administration for clinical use. Various neuropeptides, as well as their analogs and receptor agonists, show potential for treating a range of diseases, including autism, depression, anxiety, diabetes, Alzheimer’s disease, stroke, and Parkinson’s disease. Neuropeptides also show promise in pest control because they regulate insect growth and behavior, offering environmentally friendly alternatives to conventional pesticides. As a key area in neuroscience, clinical research, and agriculture, neuropeptides hold vast potential for future research and practical applications.
Neurons initially synthesize neuropeptides as inactive precursor proteins, referred to as prepropeptides or prohormones. These precursors undergo a series of cleavage steps and post-translational modifications to generate biologically active mature peptides. A prepropeptide contains several elements, including a signal peptide, one or more neuropeptide sequences, spacer peptides, and cleavage sites. After the prepropeptide is synthesized on ribosomes bound to the rough endoplasmic reticulum (ER), its signal peptide directs it into the ER, where the signal peptide is cleaved off. Initial folding and glycosylation occur within the ER, yielding propeptides. The propeptides are then transported to the Golgi apparatus, where endopeptidases and carboxypeptidases cleave them into short, mature neuropeptides. Additional chemical modifications, such as glycosylation, sulfation, and methylation, further refine their structure and function. The processed neuropeptides are packaged into dense-core vesicles for storage and rapid release when required. During storage, further modifications may occur, including phosphorylation, N-terminal pyroglutamate formation, and C-terminal amidation, which finely regulate their activity upon release. Dense-core vesicles are transported throughout the neuron, allowing neuropeptides to be released at synapses as well as from the cell body and axons as needed.
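The processing steps above can be illustrated with a deliberately simplified Python sketch. It assumes common textbook rules rather than a full biological model: endopeptidase cleavage occurs after paired basic residues (KR, RR, KK, RK), the carboxypeptidase step trims leftover C-terminal Lys/Arg, and a C-terminal Gly marks a peptide for amidation. The signal peptide length is supplied by hand rather than predicted, and the helper `process_precursor` and the example sequence are hypothetical.

```python
from dataclasses import dataclass

# Assumed cleavage motif: paired basic residues (a simplification; real
# cleavage also depends on the surrounding sequence context).
DIBASIC_SITES = {"KR", "RR", "KK", "RK"}


@dataclass
class MaturePeptide:
    sequence: str
    amidated: bool  # True if a C-terminal Gly was consumed for amidation


def process_precursor(prepro: str, signal_len: int) -> list[MaturePeptide]:
    """Toy model of prepropeptide processing (hypothetical helper)."""
    # 1. Remove the signal peptide (ER step) to obtain the propeptide.
    pro = prepro[signal_len:]

    # 2. Endopeptidase-like cleavage after each dibasic pair.
    fragments, start, i = [], 0, 0
    while i < len(pro) - 1:
        if pro[i:i + 2] in DIBASIC_SITES:
            fragments.append(pro[start:i + 2])
            start = i + 2
            i += 2
        else:
            i += 1
    fragments.append(pro[start:])

    peptides = []
    for frag in fragments:
        # 3. Carboxypeptidase-like trimming of trailing basic residues.
        frag = frag.rstrip("KR")
        # 4. A C-terminal Gly acts as the amide donor: drop it and flag amidation.
        amidated = frag.endswith("G")
        if amidated:
            frag = frag[:-1]
        if frag:
            peptides.append(MaturePeptide(frag, amidated))
    return peptides


# Hypothetical precursor: 10-residue signal peptide + FMRFG-KR + spacer DEDE-KR + YGGFL
prepro = "MALLLLLAAA" + "FMRFGKR" + "DEDEKR" + "YGGFL"
for p in process_precursor(prepro, signal_len=10):
    print(p.sequence, "amidated" if p.amidated else "")
```

With this toy input, the first fragment becomes amidated FMRF (FMRFamide), the second is the spacer DEDE, and the last is YGGFL. The sketch cannot tell spacers from bioactive peptides, which is one reason dedicated prediction methods are needed.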
Upon neuronal stimulation, neuropeptides stored in dense-core vesicles are released into the synaptic cleft or extracellular space by exocytosis, often together with classical neurotransmitters. However, their release mechanisms differ: compared with classical neurotransmitters, neuropeptide release requires a lower Ca²⁺ concentration and occurs at greater distances from Ca²⁺ entry sites. This allows neuropeptides to act on distant receptors through a process known as volume transmission, producing broader and more prolonged regulatory effects. Unlike classical neurotransmitters, neuropeptides lack a reuptake mechanism and are degraded relatively slowly by peptidases, which further contributes to their sustained activity.
Because neuropeptide synthesis and release involve multiple steps and complex physiological mechanisms, identifying neuropeptides is challenging. As described in the following sections, computational methods for neuropeptide identification fall into three main categories: precursor prediction, mature neuropeptide prediction, and precursor cleavage site prediction. This categorization mirrors the key stages of neuropeptide synthesis and release, which helps clarify what each type of method identifies.
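One common way to frame cleavage site prediction is as per-residue classification: each candidate basic residue is represented by a fixed-length window of its neighbors, and a classifier labels the window as cleaved or not cleaved. The sketch below only builds and labels those windows. The restriction to K/R candidates, the window half-width, the `X` padding character, and the helper names are illustrative assumptions, not the settings of any specific tool discussed in this review.

```python
def candidate_site_windows(seq: str, half_width: int = 4, pad: str = "X"):
    """Return (position, window) pairs centered on every K or R residue."""
    # Pad both ends so residues near the termini still get full-length windows.
    padded = pad * half_width + seq + pad * half_width
    windows = []
    for i, aa in enumerate(seq):
        if aa in "KR":
            center = i + half_width  # index of residue i inside `padded`
            windows.append((i, padded[center - half_width:center + half_width + 1]))
    return windows


def label_windows(seq: str, known_sites: set[int], half_width: int = 4):
    """Attach 1/0 labels using a set of experimentally known cleavage positions."""
    return [
        (window, int(pos in known_sites))
        for pos, window in candidate_site_windows(seq, half_width)
    ]


# Hypothetical precursor fragment; the R at index 6 is treated as a known site.
examples = label_windows("FMRFGKRDEDE", known_sites={6}, half_width=2)
print(examples)
```

Here the internal R of FMRF and the K of the KR pair become negative examples, while the R after which cleavage is assumed to occur becomes the single positive, so a downstream classifier sees both kinds of basic residues.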
Although only a limited number of genes encode neuropeptides, the diversity of the neuropeptides produced is exceptionally high. This diversity arises from multiple mechanisms acting during neuropeptide synthesis. It also explains the many similar sequences found in neuropeptide databases and underscores the need for computational prediction methods to account for this sequence redundancy.
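A common way to account for such redundancy when building training and test sets is to cluster sequences by identity and keep one representative per cluster, as tools such as CD-HIT do. The sketch below is a much simpler greedy stand-in: it uses Python's `difflib.SequenceMatcher.ratio()` as a crude similarity score in place of alignment-based percent identity, and the 0.9 default threshold and the helper names are illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b).ratio()


def reduce_redundancy(seqs: list[str], threshold: float = 0.9) -> list[str]:
    """Greedy filter: visit sequences longest first and keep one only if it is
    below `threshold` similarity to every sequence already kept."""
    kept: list[str] = []
    for s in sorted(seqs, key=len, reverse=True):
        if all(similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


# Leu-enkephalin and Met-enkephalin share 4 of 5 residues (ratio 0.8).
peptides = ["YGGFL", "YGGFM", "FMRF"]
print(reduce_redundancy(peptides))                  # both enkephalins kept
print(reduce_redundancy(peptides, threshold=0.75))  # YGGFM removed
```

The threshold matters: a strict cutoff keeps near-identical paralogs that can leak between training and test sets, whereas a loose cutoff discards genuinely distinct family members.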
Proteins are typically translated from messenger RNAs (mRNAs), which are produced from precursor mRNAs (pre-mRNAs) in the nucleus through processes including splicing, capping, and polyadenylation. One important mechanism that neurons use to increase neuropeptide diversity is alternative splicing. For example, the calcitonin gene can be alternatively spliced to produce mRNAs encoding the mature calcitonin peptide (pCal) and the calcitonin gene-related peptides (pCGRP1 and pCGRP2).
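Alternative splicing can be pictured as choosing which exons of a single gene end up in the mature mRNA. The toy sketch below assumes the commonly described calcitonin/CGRP pattern, in which calcitonin mRNA includes exons 1, 2, 3, and 4, and CGRP mRNA includes exons 1, 2, and 3 plus exons 5 and 6. The exon sequences are short hypothetical placeholders, not real gene sequence, and `splice` and `shared_prefix` are illustrative helpers.

```python
# Hypothetical placeholder exon sequences (NOT the real calcitonin gene).
EXONS = {
    1: "AUGGCU",
    2: "CCAGGA",
    3: "UUCAAG",
    4: "GCAUAA",
    5: "ACCGUU",
    6: "GGCUGA",
}


def splice(exons: dict[int, str], include: list[int]) -> str:
    """Concatenate the selected exons in genomic order (introns already excised)."""
    return "".join(exons[n] for n in sorted(include))


def shared_prefix(a: str, b: str) -> str:
    """Longest common 5' prefix of two transcripts."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


# Two isoforms from one gene, following the assumed exon-choice pattern.
calcitonin_mrna = splice(EXONS, [1, 2, 3, 4])
cgrp_mrna = splice(EXONS, [1, 2, 3, 5, 6])
print(calcitonin_mrna)
print(cgrp_mrna)
print(shared_prefix(calcitonin_mrna, cgrp_mrna))  # the common exons 1-3
```

Because both isoforms share their 5' exons, their precursors share an N-terminal region while diverging downstream, one source of the closely related sequences that accumulate in neuropeptide databases.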
As previously noted, proteolytic processing occurs within the Golgi apparatus and dense-core vesicles, where precursor peptides are cleaved at specific sites. This selective cleavage, along with post-translational modifications, leads to the generation of a wide array of neuropeptides. A neuropeptide precursor containing a single bioactive peptide can undergo various post-translational modifications, resulting in the production of neuropeptides of differing lengths that retain identical C-terminal or N-terminal sequences. For instance, the cholecystokinin (CCK) gene encodes a pro