Exploring the understudied human kinome for research and therapeutic opportunities

ABSTRACT

The functions of protein kinases have been heavily studied, and inhibitors for many human kinases have been developed into FDA-approved therapeutics. A substantial fraction of the human kinome is nonetheless understudied. In this paper, members of the NIH Understudied Kinome Consortium mine public data on “dark” kinases to estimate the likelihood that they are functional. We start with a re-analysis of the human kinome and describe the criteria for creation of an inclusive set of 710 kinase domains and a curated set of 557 protein kinase-like (PKL) domains. Nearly all PKLs are expressed in one or more CCLE cell lines and a substantial number are also essential in the Cancer Dependency Map. Dark kinases are frequently differentially expressed or mutated in The Cancer Genome Atlas and other disease databases, and investigational and approved kinase inhibitors appear to inhibit them as off-targets. Thus, it seems likely that the dark human kinome contains multiple biologically important genes, a subset of which may be viable drug targets.

INTRODUCTION

Protein phosphorylation is widespread in eukaryotic cells and mediates many critical events in cell fate determination, cell cycle control and signal transduction. The structures and catalytic activities of eukaryotic protein kinases (ePKs), of which more than 500 are found in humans, have been intensively investigated for many years: to date, structures for over 280 unique domains and ~4,000 co-complexes have been deposited in the Protein Data Bank (PDB). The ePK fold is thought to have arisen in prokaryotes and evolved to include tyrosine kinases in metazoans, resulting in a diverse set of enzymes that are often linked in a single protein to other catalytic domains and to SH2, SH3 and other protein-binding domains. In addition, 13 human proteins have two ePK kinase domains. An excellent recent review describes the structural properties of ePKs and the drugs that bind them.

The kinase domain of protein kinase A (PKA), a hetero-oligomer of a regulatory and a catalytic subunit, was the first to be crystallized and is often regarded as the prototype of the ePK fold. It comprises two distinct lobes with an ATP-binding catalytic cleft lying between them. With respect to sequence, ePKs are characterized by 12 recurrent elements involving ~30 highly conserved residues. The kinase fold is remarkably adaptable, however, and has diverged in multiple ways to generate protein families distinct in sequence and structure from PKA. The eukaryotic-like kinases (eLKs) retain significant sequence similarity to the N-terminal region of ePKs but differ in the substrate-binding lobe; choline kinase A (CHKA) is a well-studied example of an eLK. Kinases with an atypical fold (aPKs) have weak sequence similarity to ePKs but nevertheless adopt an ePK-like structural fold; they include some well-studied kinases such as the DNA damage-sensing ATM and ATR enzymes as well as lipid kinases such as PI3K, one of the most heavily mutated genes in breast cancer.

In humans, ePKs, eLKs and aPKs are conventionally organized into ten groups based on sequence alignment and structure; this grouping often corresponds to modes of regulation and function. For example, tyrosine kinases represent a distinct branch of the kinome tree that includes 58 human receptor tyrosine kinases (RTKs) that bind extracellular ligands (growth factors) and share an extended regulatory spine that allosterically controls catalytic activity. The AGC group of kinases, in contrast, is regulated by a conserved C-terminal tail flanking the kinase domain. Over 200 additional proteins are annotated as “kinase” in UniProt but are unrelated to the protein kinase fold and are therefore termed uPKs (unrelated to Protein Kinases). Enzymes with phosphotransferase activity in the uPK family include hexokinases, which phosphorylate sugars, and STK19, which displays peptide-directed phosphotransferase activity and also binds protein kinase inhibitors.

The human kinome includes ~50 pseudokinases that lack one or more residues generally required for catalytic activity. These residues include the ATP-binding lysine (K) within the VAIK motif, the catalytic aspartate (D) within the HRD motif and the magnesium-binding aspartate (D) within the DFG motif. Many pseudokinases function in signal transduction despite the absence of key catalytic residues. For example, the EGFR family member ERBB3/HER3 is a pseudokinase that, when bound to ERBB2/HER2, forms a high-affinity receptor for heregulin growth factors. ERBB3 over-expression also promotes resistance to therapeutic ERBB2 inhibitors in breast cancer. Some proteins commonly annotated as pseudokinases nonetheless have phosphotransfer activity. Haspin, for example, is annotated as a pseudokinase in the ProKino database because it lacks a DFG motif in the catalytic domain, but it has been shown to phosphorylate histone H3 using a DYT motif instead; H3 phosphorylation by Haspin changes chromatin structure and mitotic outcome and is therefore physiologically important.
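The motif-based reasoning above can be illustrated with a minimal sketch. It assumes a plain string search for the three canonical motifs named in the text; the function names and the toy sequences are hypothetical and are not taken from this paper or from any database. Real annotation pipelines typically rely on alignment to curated profiles rather than exact string matching, so this is only meant to show why a motif-only rule would flag a Haspin-like domain as a pseudokinase.

```python
import re

# Canonical motifs named in the text. The parenthesised residue is the one
# generally required for catalysis (ATP-binding K, catalytic D, Mg-binding D).
# Assumption: exact string matches only; degenerate variants are not handled.
MOTIFS = {
    "VAIK": re.compile(r"VAI(K)"),
    "HRD": re.compile(r"HR(D)"),
    "DFG": re.compile(r"(D)FG"),
}


def missing_motifs(domain_seq):
    """Return the names of canonical motifs absent from a kinase domain sequence."""
    seq = domain_seq.upper()
    return [name for name, pattern in MOTIFS.items() if pattern.search(seq) is None]


def flags_as_pseudokinase(domain_seq):
    """Motif-only rule: flag the domain if any canonical motif is missing."""
    return len(missing_motifs(domain_seq)) > 0


# Toy fragments (hypothetical, NOT real protein sequences).
TOY_CANONICAL = "GSFGKVAIKTLHRDLKPENDFGLAR"  # carries VAIK, HRD and DFG
TOY_DYT = "GSFGKVAIKTLHRDLKPENDYTLSR"        # DFG replaced by DYT

if __name__ == "__main__":
    print(missing_motifs(TOY_CANONICAL))  # []
    print(missing_motifs(TOY_DYT))        # ['DFG']
```

Under this rule the DYT-containing toy fragment is flagged even though, as the Haspin example shows, a domain with such a substitution can still be catalytically active; a motif check is therefore a screening heuristic, not evidence of inactivity.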

Protein kinase inhibitors, and the few activators that have been identified (e.g. AMPK activation by salicylate and A-769662), are diverse in mechanism and structure. The molecules include ATP-competitive inhibitors that bind in the enzyme active site; non-competitive “allosteric” inhibitors that bind outside the active site; small molecule PROTAC degraders whose binding to a kinase promotes ubiquitin-dependent degradation; and antibodies that target the growth factor or ligand binding sites of receptor kinases or that interfere with a receptor’s ability to homo- or hetero-oligomerize. Kinase inhibitors have been intensively studied in human clinical trials and over 50 have been developed into FDA-approved drugs.

A substantial subset of the kinome has been little studied, despite the general importance of kinases in cellular physiology, their druggability and their frequent mutation in disease. This has given rise to a project within the NIH’s Illuminating the Druggable Genome (IDG) Program to investigate the understudied “dark kinome” and determine its role in human biology and disease. IDG has distributed a preliminary list of dark kinases based on estimates of the number of publications describing each kinase and the presence or absence of grant (NIH R01) funding; we and others have started to study the properties of these enzymes. As described in greater detail below, defining the dark kinome necessarily involves a working definition of the full kinome and a survey of the current state of knowledge. The starting point for this survey is the standard list of kinases put forward in a groundbreaking 2002 paper by Manning et al. that found the human kinome to have 514 members; this has subsequently been updated via the KinHub Web resource to include 522 human kinases (although many papers cite a number closer to 520-540).

While protein kinases could in principle be defined strictly as enzymes that catalyze phosphotransfer from ATP onto serine, threonine and tyrosine, such a definition would exclude biologically active pseudokinases and structurally and functionally related lipid kinases. It would also fail to account for a lack of functional data for a substantial number of proteins, potentially excluding kinases that are physiologically or catalytically active. An alternative definition uses sequence alignment and structural data to identify closely related folds, but excludes uPKs having kinase activity as well as bromodomains that are potently bound and inactivated by kinase inhibitors. A less restrictive list is useful for the kinome-wide activity profiling that is a routine part of kinase-focused drug discovery. Profiling typically involves screening compounds against panels of recombinant enzymes (e.g. KINOMEscan) or chemoproteomics in which competitive binding to ATP-like ligands on beads (so-called kinobeads or multiplexed inhibitor beads – MIBs) is assayed using mass spectrometry. Such screens benefit from a comprehensive list of binding domains for which selectivity can be assayed.

In this perspective we analyze the composition and properties of the dark kinome, with a focus on evidence that understudied kinases are expressed and potentially functional in normal cellular physiology and in disease. As a first step we generate new lists for membership in the full kinome based on a variety of inclusion and exclusion criteria. We also re-compute membership in the dark kinome and consolidate available data on dark kinase activity and function. This evidence is typically indirect, such as data from TCGA (The Cancer Genome Atlas) on the frequency with which a kinase is mutated in a particular type of cancer. In aggregate, however, the evidence suggests that the understudied kinome is likely to contain many enzymes worthy of in-depth study, a subset of which may be viable therapeutic targets. All of the information in this manuscript is available in supplementary materials and is currently being curated and released via the dark kinome portal.

RESULTS

The composition of the human kinome

A list of human kinases was obtained from Manning et al. (referred to below as ‘Manning’) and a second from Eid et al. (via the KinHub Web resource); a third list, of dark kinases according to IDG, was obtained from the NIH solicitation (updated in January 2018) and a fourth list, of all 684 proteins tagged as “kinases”, was obtained from UniProt. These lists are overlapping but not identical. For example, eight IDG dark kinases absent from Manning and KinHub (CSNK2A3, PIK3C2B, PIK3C2G, PIP4K2C, PI4KA, PIP5K1A, PIP5K1B, and PIP5K1C) are found in the UniProt list. We therefore assembled a superset of 710 domains (the “extended kinome”) and used curated alignment profiles and structural analysis to subdivide the domains into three primary categories: “Protein Kinase Like” (PKL), if the kinase domain was similar to known protein kinases in sequence and 3D-structure; “Unrelated to Protein Kinase” (uPK), if the kinase domain was distinct from known protein kinases; and “Unknown” if there was insufficient information to decide (see meth