Skip to main content

Computational analysis uncovers the deleterious SNPs along with the mutational spectrum of p53 gene and its differential expression pattern in pan-cancer

Abstract

Background

A variety of accessible data, including those of single-nucleotide polymorphisms (SNPs) on the human p53 gene, are made widely available on a global scale. Owing to this, our investigation aimed to deal with the detrimental SNPs in the p53 gene by executing various valid computational tools, including—Filter, SIFT, PredictSNP, Fathmm, UTRScan, ConSurf, SWISS-MODEL, Amber 16 package, Tm-Adjust, I-Mutant, Task Seek, GEPIA2 after practical and basic appraisal, dissolvable openness, atomic progression, analyzing the energy minimization and assessing the gene expression pattern.

Results

Out of the total 581 p53 SNPs, 420 SNPs were found to be missense or non-synonymous, 435 SNPs were in the three prime UTR, and 112 SNPs were in the five prime UTR from which 16 non-synonymous SNPs (nsSNPs) were predicted to be non-tolerable while PredictSNP package predicted 14. Concentrating on six bioinformatics tools of various dimensions, a combined output was generated, where 14 nsSNPs could exert a deleterious effect. We found 5 missense SNPs in the DNA binding domain's three crucial amino acid positions, using diverse SNP analyzing tools. The underlying discoveries were fortified by microsecond molecular dynamics (MD) simulations, TM-align, I-Mutant, and Project HOPE. The ExPASy-PROSITE tools characterized whether the mutations were located in the functional part of the protein or not. This study provides a decisive outcome, concluding the accessible SNPs' information by recognizing the five unfavorable nsSNPs—rs28934573 (S241F), rs11540652 (R248Q), rs121913342 (R248W), rs121913343 (R273C), and rs28934576 (R273H). By utilizing Heatmapper and GEPIA2, several visualization plots, including heat maps, box plots, and survival plots, were produced.

Conclusions

These plots disclosed differential expression patterns of the p53 gene in humans. The investigation focused on recognizing the detrimental nsSNPs, which augmented the danger posed by various oncogenesis in patients of different populations, including within the genome-wide studies (GWS).

Background

Single Nucleotide Polymorphism (SNP) is marked as the most prevalent form of genetic mutation in humans. About 93% of the human genes include at least a single SNP (Chakravarti 2001). SNPs contribute to most of the variations among individuals, making each person unique. SNPs can be in the coding, non-coding, and intergenic regions between two genes (Carninci et al. 2005; Liu et al. 2006). Although non-coding SNPs are phenotypically neutral, nsSNPs can influence phenotype by altering protein sequences (Chakravarti 2001; Carninci et al. 2005; Liu et al. 2006; Ng and Henikoff 2006). Moreover, nsSNPs alter the amino acids in their corresponding protein, which could have a deleterious effect on the structure and function (Dryja et al. 1990; Smith et al. 1994; Singh et al. 2020). They are associated with various human diseases and disorders. Several studies confirm the association of nsSNPs with their susceptibility to infection and the progression of autoimmune diseases and inflammatory disorders (Dryja et al. 1990; Smith et al. 1994; Singh et al. 2020; Barroso et al. 1999; Chasman and Adams 2001; Lander 1996). About 50% of the mutations implicated with hereditary genetic disorders are nsSNPs (Kelly and Barr 2014; Radivojac et al. 2010). As a result, many researchers focus on nsSNPs in cancer biology, precisely, cancer-causing genes.

Mutations in the tumor suppressor gene, p53, account for ~ 50% of human cancers (Doniger et al. 2008; Finlay et al. 1989; Baker et al. 1990; Hamzehloie et al. 2012; Zhang et al. 2020). p53 is a critical regulator of tissue homeostasis (Baugh et al. 2018; Diller et al. 1990; Chng et al. 2007), which further binds to stabilize DNA as a tetramer, leading to the regulation of genes. Consequently, this helps mediate critical cellular processes, including cell-cycle arrest, DNA repair, senescence, and apoptosis (Kastenhuber and Lowe 2017; Riley et al. 2008). The regular allele of p53 encodes a 53-kD nuclear phosphoprotein that plays an important role in controlling cell proliferation (Eliyahu et al. 1989; Ahuja et al. 1989; Takahashi et al. 1989; Bressac et al. 1990; Matsuda et al. 2005). However, in human tumors, point mutations, rearrangements, allelic loss, and deletions are found in the p53 gene (Hamosh et al. 2004; Sherry et al. 2001; UniProt Consortium 2007). Together with the changes in oncogenes and tumor suppressor genes, these abnormalities consist of a network of mutations that leads to malignancy. Despite the importance of p53, no computational studies have been reported that detect the deleterious nsSNPs in the p53 gene. Therefore, we carried out an in silico analysis of the p53 gene to characterize the deleterious mutations in this current investigation. Our study encompasses- (1) retrieving SNPs in the p53 gene from available databases, (2) allocating deleterious nsSNPs to their phenotypic effects based on the sequence and structure-based homology, and identifying the regulatory nsSNPs responsible for altering the patterns of splicing and gene expression, (3) predicting the role of the substitution of the amino acid on the secondary structures based on solvent stability and accessibility, and (4) predicting the effect of mutations in the domain structures.

Methods

The flowchart expounds on our study's process of identifying and characterizing detrimental SNPs in the p53 gene. The structural and functional consequences have been analyzed upon missense mutation (Fig. 1). The workflow is given in Fig. 2.

Fig. 1
figure 1

Role of P53 in maintaining genomic integrity

Fig. 2
figure 2

Flowchart of the method and materials

Retrieval of SNP datasets

The human p53 gene was retrieved from web-based data sources, such as the Online Mendelian Inheritance in Man (OMIM) (http://www.ncbi.nlm.nih.gov/omim/). The SNPs' information, both protein accession number and SNP ID of the p53 gene, was retrieved from the NCBI dbSNP (Database of Single Nucleotide Polymorphism) (Hamosh et al. 2004), and the protein FASTA sequence was retrieved from UniProt (http://www.uniprot.org/) (Sherry et al. 2001).

Analysis of functional consequences of nsSNPs

An online tool known as Sorting Intolerant From Tolerant (SIFT) was employed to identify the deleterious non-synonymous SNPs of the p53 gene (UniProt Consortium 2007). This program assumes that primary amino acids will be conserved in the protein family and that changes at particular positions are potentially harmful (Ng and Henikoff 2003; fathmm - Analyze Cancer-Associated Variants 2018). During mutagenesis studies in humans, SIFT can easily differentiate between functionally neutral or innocuous and detrimental polymorphisms (UniProt Consortium 2007). The SIFT program's algorithm uses SWISSPROT, nr, and TrEMBL databases to find homologous sequences with a query. The rsIDs (identification number) of each SNP of the human p53 gene obtained from NCBI were submitted to SIFT as a query for homology searching. The SIFT score ≤ 0.05 was set to indicate the deleterious effect of a non-synonymous mutation on protein function.

Characterization of functional nsSNPs

For characterization of functional nsSNPs, we used PredictSNP web server (Ashkenazy et al. 2010). It was constructed from three independent datasets by eliminating all inconsistencies, duplicities, and mutations before assessment. The standard dataset comprising over 43,000 mutations was taken for the impartial assessment of eight well-known prediction tools: nsSNPAnalyzer, MAPP, PANTHER, PolyPhen-1, PhD-SNP, PolyPhen-2, SIFT, and SNAP. The six best-performing tools were shared into an accord classifier PredictSNP, resulting in drastically better prediction implementation and simultaneous time returned results for all mutations, therefore corroborating that the unanimity prediction denotes an accurate and vigorous alternative compared to the predictions delivered by individual tools (Ashkenazy et al. 2010).

Prediction of cancer-promoting mutations

Some mutations may have an association with cancer. To predict the cancer-associated SNPs of the p53 gene, we used the Functional Analysis through Hidden Markov Models (Fathmm) webserver (http://fathmm.biocompute.org.uk/cancer.html). Fathmm allows combining sequence conservation within hidden Markov models (HMMs). Fathmm server is a high-throughput web server often employed to identify phenotypic, molecular, and functional consequences of protein variants on coding and non-coding regions. Fathmm employs unweighted, sequence/conservation-based, and weighted algorithms combined with sequence conservation with pathogenicity weights. In the Fathmm server, the default prediction threshold is set at − 0.75, where a prediction with a score less than this value predicts that the mutation is considered to be potentially associated with cancer. Cancer-promoting mutations are detrimental to our bodies. These types of mutations play a critical role in cell-cycle regulation, and mutations falling in the conserved region can also depress the nature of the domain.

Identification of functional SNPs in conserved regions

Functional amino acids remain conserved throughout evolution. Evolutionarily conserved amino acid residues in the p53 protein were identified by the ConSurf web server (http://consurf.tau.ac.il/2016/index_proteins.php) by using a Bayesian algorithm (conservation scores: 1–4 variable, 5–6 intermediate, and 7–9 conserved) (Bendl et al. 2014; Pesole et al. 2002). Protein FASTA sequence was submitted, and the conserved regions were predicted, shown by means of coloring scheme and conservation score of the amino acids. It also predicts the functional and structural residues of the protein. For further analysis, highly conserved amino acids at high-risk nsSNP sites were selected.

Scanning of UTR SNPs

Untranslated regions (UTRs) play vital roles in the post-transcriptional instruction and regulation of gene expression, which comprise the modulation of the transport of mRNAs out of the nucleus and translation competence, subcellular localization, and constancy (EMBL-EBI 2018). To find the functional SNPs, we employed UTRScan, a web server (Pesole et al. 2001) for alignment matching. UTRScan searches nucleotide (RNA, tRNA, DNA) or protein sequences to find UTR motifs and locate motifs that distinguish 3′UTR and 5′UTR sequences (in specific sequences). The UTRSite Database defines such motifs as a compilation of functional sequence arrangements in the 5′- or 3′-UTR sequences (Grillo et al. 2010; Zhang et al. 2013). If an SNP with a different nucleotide at each UTR is found to have dissimilar working patterns, this UTR SNP is expected to impact the mRNA stability. To perform this, 5′- and 3′-UTR SNPs from NCBI were submitted in FASTA format, and the results showed predicted UTRs at the specific region.

Identification of a deleterious mutation in the functional domain

The functional domain of the product of the p53 gene was identified using InterProScan (http://www.ebi.ac.uk/InterProScan/). InterProScan connects diverse protein signature identification methods from the InterPro consortium associate databases into one resource. A web-based version of InterPro is accessible for academic and profitable organizations from the EBI. The InterProScan tool allows scanning protein sequences received in FASTA format for matches against the InterPro protein signature databases. After analyzing the deleterious mutation from the SIFT mutation among them, it was identified in the functional domain of the p53 protein.

Modeling of the mutated protein

The three-dimensional structure (3D) of p53 was obtained from the Protein Data Bank (PDB entry 6XRE), and the missing region was constructed using the SWISS-MODEL server (Waterhouse et al. 2018). The p53 model was submitted to molecular dynamics (MD) simulations through 500 ns (ns). The most populated conformer through MD simulations was employed to construct mutants, and the mutations were implemented using PyMOL (Discovery Studio 2018). MD simulations were carried out using the Amber 16 package (DeLano 2002) and the ff14SB force field (Case et al. 2005). Systems were inserted in a dodecahedral box of 1 nm between the protein and the edge of the box using the TIP3P water model and neutralized utilizing Cl and Na+ counter-ions. The solvated protein was submitted through energetic minimization using the steepest descent method through 1000 steps. Following this, the system was equilibrated by 1 ns at 300 K, where the solvent was kept to desist, but the protein atoms were restrained. MD simulations of 500 ns were performed for the wild-type protein and 100 ns for all the mutants created from the p53 protein. MD simulations were run considering an NPT ensemble, and the time step for the MD simulations was 2 femtoseconds (fs). The temperature was set at 310 K using the V-rescale algorithm (Duan et al. 2003), and the pressure was set at 1 bar using Parrinello-Rahman (Duan et al. 2003). The LINCS algorithm (Pronk et al. 2013; Hess et al. 1998) and the SETTLE algorithm (Krieger and Vriend 2015; Miyamoto and Kollman 1992) constrain all bonds, including hydrogen atoms and water molecules. The particle mesh Ewald method (Eastman and Pande 2010; Darden et al. 1993) was used to treat the long-range electrostatic forces, whereas van der Waals forces were treated using a cutoff of 1.2 nm. Trajectories were analyzed with the cpptraj module of Amber 16 (DeLano 2002).

Energy minimization and RMSD calculation of the protein models

Using the TM-align algorithm, most populated conformers obtained through clustering analysis were used to estimate RMSD between wild type and mutants. TM-align combines the TM-score rotation matrix and dynamic programming (DP) to identify the best structural alignment between protein pairs. This server was used for the RMSD calculation of the protein structures (Baker 2017). YASARA-minimization server was employed to perform the energy minimization of the most populated conformers of both wild type and mutants obtained through MD simulations. YASARA (Yet Another Scientific Artificial Reality Application) is a modeling and simulation software with multiple applications. YASARA-minimization server uses YASARA force field for energy minimization that can optimize the damage of the mutant proteins and thus precisely calculates the reliable energy. To perform this task, the PDB file of the wild-type and mutant proteins was inserted as input data, and the result was also additionally examined (Zhang and Skolnick 2005).

Effect of mutation in protein stability

I- Mutant 3.0 server was employed to predict the alteration in stability upon representative mutations. I-Mutant is a high-throughput support vector machine (SVM)-based tool server. The server can automatically predict the alteration in the stability of the structure by examining the structure of the protein sequence. I-Mutant 3.0 can be utilized as a classifier to predict the sign of stability with mutations and a regression calculator to predict the distinction in Gibbs free energy. The resulting DDG value indicates the difference between the Gibbs free energy of mutated protein and the wild-type protein in kcal/mol (Krieger et al. 2009).

Prediction of structural effects upon mutation

Project HOPE (Project Have your Protein Explained) was employed to understand the effect of the amino acid substitutions (Capriotti et al. 2005). HOPE server was utilized for molecular dynamics simulation to observe the effect of the mutations on the structure of p53. This web server performed a BLAST against the PDB, built a homology model of the query protein through YASARA (if applicable), and collected 3D structure data from WHAT IF web services. Subsequently, the sequence from the UniProt database was retrieved, and features like an active site, motifs, domains, and so forth were also shown. Finally, to predict the protein features, Distributed Annotation System (DAS) servers were utilized to exchange annotations on genomic and protein sequences (HOPE 2018).

Mutational spectrum analysis

Several analysis techniques, e.g., heat map, differential analysis, and survival analysis, were performed to retrieve the gene expression level of P53 in different types of cancer. For mutational spectrum analysis, data regarding mutation type in P53 were retrieved from cBioPortal for Cancer Genomics (https://www.cbioportal.org/). cBioPortal is an open-access resource for exploring large-scale datasets, where data were from both extensive consortium efforts (e.g., TCGA) and individual laboratories. Those data were organized in Microsoft Excel, and the heat map was generated in Heatmapper (Venselaar et al. 2010). The heat map used column Z score to compare the expression level among p53 mutation types in specific cancers, where + 4 was the highest expression (red), and − 4 was the lowest expression (blue).

Differential analysis and survival analysis

We employed GEPIA2 (Gene Expression Profiling Interactive Analysis) datasets for differential analysis, investigating a gene's differential expression patterns and comparing those with TCGA and GTEx data (Babicki et al. 2016). Here, the signature score, generated by GEPIA2, was gauged by the mean value of log2 (TPM + 1) of each gene of Th-1 like signature gene set, with a cutoff of 1 and p value cutoff of 0.01. Data were normalized by the overall survival method, with a 95% confidence interval. The group cutoff was selected median with a cutoff high of 50% and a cutoff low of 50%. The statistical significance level was considered as p value ≤ 0.05.

Results

SNP database from for p53

The polymorphism data are available in several databases. NCBI dbSNP houses extensive data for different genes, and NCBI has the largest database that helps analyze single nucleotide polymorphisms. 581 SNPs have been found for cellular tumor antigen p53 (NCBI Reference Sequence: NP_000537.3). Among them, 420 SNPs were found to be missense or non-synonymous. Among 581, there were 435 SNPs in 3′UTR and 112 in 5′UTR regions. Only the missense, 3′ (3 prime), and 5′ (5 prime) UTR SNPs have been selected for further analysis.

Prediction of detrimental non-synonymous SNP

An online tool, SIFT, was employed to analyze an amino acid's conservancy by sequence homology; this helps to determine the conservation of a specific position of an amino acid in a protein. SIFT aligns paralogous and orthologous proteins' amino acid sequences while determining the effect of an amino acid replacement, which helps to analyze its functional importance and physical characteristics. SIFT takes rsIDs as input. Our analysis found 16 missense SNPs among 420 to be predicted deleterious using the SIFT web server (Table 1).

Table 1 Prediction of deleterious SNP by SIFT tool

Analysis of SIFT predicted deleterious SNPs

In SIFT analysis, 16 SNPs were found to be detrimental. These 16 SNPs were also analyzed in an online SNP analyzing tool known as PredictSNP, a package system where other SNP analyzing methods have been assembled. Protein sequences in FASTA formats were used as inputs in PredictSNP, and following this, the SIFT result mutation was done in the PredictSNP. The impact of the mutation was analyzed by choosing nsSNPAnalyzer, MAPP, PANTHER, PolyPhen-1, PhD-SNP, PolyPhen-2, SNAP, and SIFT tools. PredictSNP shows the result of different SNPs, mentioning which percentage are either characterized as detrimental or neutral; the percentage value indicates the confidence of the result. SNPs predicted as neutral by more than one tool have been excluded from our study (Table 2).

Table 2 Further analysis of 16 SIFT predicted nsSNPs by PredictSNP (online tool)

Identification of cancer-associated SNPs from predicted deleterious SNPs

SNPs predicted as deleterious were then analyzed using Fathmm to determine whether they were associated with cancer. Our analysis revealed that every SNP predicted as detrimental was also predicted as having their association with cancer. In the Fathmm server, the default prediction threshold is − 0.75, where a prediction with a score less than this indicates that the mutation is potentially associated with cancer (Table 3). Our result finds that the predicted score is much lower than the threshold, indicating a much higher potentiality of relating these SNPs to cancer (Table 3).

Table 3 Cancer-associated SNP where the threshold value is − 0.75. Scores having − 0.75 predicted as having their association with cancer

Identification of functionally important SNPs in the conserved regions

Some amino acids are crucial for the function of a protein. Essential amino acids contributing toward specific functions tend to be evolutionarily conserved. To identify the evolutionarily conserved amino acids and the proteins, we used the online tool ConSurf. ConSurf results are tabulated, where conserved amino acids in a specific position are shown with CS and color value, where the lower the CS and the higher the color value, the higher the conservancy (Table 4).

Table 4 Summary of amino acids (conservation profile) in that corresponds to region with high-risk nsSNPs. CS: conservation score, color (6–9 = conserved, 5 = average and 1–4 = variable); (s): predicted structural site; (f): predicted functional site

Functional SNPs in UTR identification

3′UTR regions significantly affect gene expression due to the defective ribosomal RNA translation or RNA half-life. 5′UTR also plays an important role in mRNA stabilization. The UTRscan server analyzed 77 3′UTR SNPs and 129 5′UTR SNPs of the p53 gene (Table 5). The UTRscan server looks for patterns in the UTR database for regulatory region motifs and, according to the given SNP information, predicts if any matched regulatory region is damaged (Tang et al. 2019; Pesole and Liuni 1999). UTRscan found 8 UTRsite motif matches in the p53 transcript. A total of 141 matches were found for 5 motifs.

Table 5 Result shows UTR regions in the p53 transcript from the UTRScan server

Prediction of a deleterious mutation in the functional domain of p53

InterProScan tool analysis found 3 functional domains within the p53 gene. InterProScan takes the FASTA format of protein sequences as input and scans for matching protein sequences against the InterPro protein signature databases. InterProScan also predicts the function of the residues, provides the consensus amino acid for protein function, and determines whether the predicted deleterious mutation SNP is necessary for a function or not. We can authenticate our prediction if the predicted deleterious SNP is found in the amino acid in the functional domain or functioning residue in Table 6. The InterProScan result is tabulated in Table 7.

Table 6 Functional domains and their position in p53 protein
Table 7 Functional residues in different functioning sites of p53 protein

Comparative modeling of high-risk Non-synonymous SNPs and MD simulations

We used SWISS-MODEL to get the 3D structure of human p53 protein with the predicted SNPs. Initially, the stability of the p53 protein was evaluated through one microsecond (µs). Analysis of root-mean-squared deviation (RMSD) and radius of gyration (Rg) values considering backbone atoms showed that p53 protein reached constant RMSD (Fig. 3A) and Rg (Fig. 3B) values after 0.6 µs with an average Rg value of 22.3 ± 0.3. Based on this result, clustering analysis was performed over the equilibrated simulation time (last 0.4 µs) to obtain the most populated conformer. Subsequently, the following non-synonymous mutations in the DNA binding domain were introduced: S241F, R248Q, R248W, R273H, and R273C using PyMOL, to obtain the p53 mutants. Analysis of the mutants shows that the five mutants reached stable RMSD (Fig. 3C) and Rg (Fig. 3D) values between 20 and 30 ns. Average Rg values showed the following values R248Q (23.9 ± 0.2), R273C (22.2 ± 0.2), R273H (23.6 ± 0.3), S241F (23.4 ± 0.2), and R248W (21.5 ± 0.2). This analysis indicates that R248Q, R273H, and S241F systems experience an increment of the hydrodynamic radius compared with the wild-type protein, whereas the R248W showed a small decrease and R273C maintained a similar radius to the wild-type protein.

Fig. 3
figure 3

RMSD, Rg, and RMSF analysis of the wild-type and mutant p53 protein. RMSD (A) and Rg (B) analysis of wild-type p53. RMSD (C), Rg (D), and RMSF (E) analysis of the wild type and mutants of p53 protein

Root-mean-square fluctuation (RMSF) analysis over the equilibrated simulation time (Fig. 3C) showed that the regions with the highest mobility are localized between the N-terminal region and residue 40 and between residues 60 and 90. Both these regions correspond to a long loop with four small α-helices. The lowest mobility of the region between the N-terminal region and residue 40 was observed for R248W, which also showed the lowest average Rg value, suggesting that this region could be responsible for the differences in Rg values.

Clustering analysis was performed over the equilibrated simulation time (last 70 ns) to obtain the most populated conformers. Wild type and mutants were then subjected to the YASARA energy minimization server for energy minimization. Energy minimization results showed decreased free energy for all mutant models compared to the wild-type models. The results are shown in Table 8. RMSD was calculated using the Tm-align tool, where the results were shown to be between 3.0 and 4.0 Å. These outcomes demonstrate a critical change in the protein structure that can alter its natural function.

Table 8 RMSD (Å) values and total free energy after energy minimization of both wild-type and mutated protein models

After mutation of the wild type, it was found that in every case, energy after minimization was much higher (more positive) for the mutants than the conventional wild type, indicating that these mutations destabilize the structure of the protein. In case of a mutation in the R273H and R273C domains, changing the position of the amino acid arginine by histidine or cysteine affects the structure of the protein more than the other mutations.

Prediction of protein structural stability

We used the neural network-based routine tool I-Mutant 3.0 to study the potential change in protein stability upon mutations. This tool took the input of the mutated protein models derived from the PHYRE-2 server in PDB format. I-Mutant 3.0 creates results taking the help of the ProTherm database. This database housed extensive experimental data on free energy alterations due to mutations. In addition, this tool predicts the score of free energy change due to mutations, incorporating the energy-based online tool FOLD-X. This increases the precision to 93% on one-third of the database if the FOLD-X analysis is incorporated with I-Mutant (Datta et al. 2015). Models with the following mutations: S241F, R248Q, R248W, R273H, and R273C were subjected to the server to predict DDG stability and RSA calculation. The result shows that every mutation decreased the stability of the protein. Mutation R273H was responsible for the lowest DDG value (− 1.62 kcal/mol), followed by R273C (− 1.52 kcal/mol). DDG values for other mutations ranged from − 0.51 kcal/mol to − 0.93 kcal/mol; these negative DDG values decreased protein stability. The results are shown in Table 9.

Table 9 I-mutant predictions for selected nsSNPs

Analysis of structural effect upon mutation in DNA binding domain

The InterProScan tool was used to find the functional domain in p53 protein and map the predicted deleterious mutations in these domains to speculate the changes they might cause in the domain structures. Among the predicted 14 detrimental SNPs revealed by different SNP analyzer tools, we found 5 missense SNPs in the 3 crucial amino acids located in a domain responsible for DNA binding. These amino acids are essential for the functional activity of the domain. Therefore, a mutation in this amino acid position could change the protein structure and function. We observe the effect on the structure of p53 due to these 5 missense SNPs using an online tool, HOPE.

In Fig. 4A, the wild-type residue has positively charged arginine (R). However, the mutation from arginine to glutamine (Q) at the 248th position makes the mutant neutral. In Fig. 4B, serine (S) mutated into phenylalanine (F) at 241th. The mutant residue is bigger and more hydrophobic than the wild type. In Fig. 4C, arginine (R) mutated into histidine (H) at 273rd position, and the mutant residue is more minor and neutral, whereas the wild type is positively charged. In Fig. 4D, the arginine (R) mutated into cysteine (C) at the 273rd position, and the mutant residue is more minor and neutral, but the wild type is positively charged. In Fig. 4E, arginine (R) mutated into tryptophan (W) at the 248th position. The mutant residue is more considerable and neutral, whereas the wild type is more significant and positively charged.

Fig. 4
figure 4

Effect in the structure due to mutation A R248Q, B S241F, C R273H, D R273C, E 248W (The protein is colored gray, green indicates the side chains of wild type, and mutant residue is shown in red)

Evaluation of p53 gene mutation level in various cancer

For the mutational spectrum analysis, we investigated the expression of six mutation types of p53—missense, synonymous, in-frame indel, nonsense, splice site, and frameshift in 33 cancer subtypes including ACC (Adrenocortical carcinoma), BLCA (bladder urothelial carcinoma), BRCA (breast invasive carcinoma), CESC (cervical squamous cell carcinoma and endocervical adenocarcinoma), CHOL (cholangiocarcinoma), COAD (colon adenocarcinoma), DLBC (lymphoid neoplasm diffuse large B cell lymphoma), ESCA (esophageal carcinoma), GBM (glioblastoma multiforme), HNSC (head and neck squamous cell carcinoma), KICK (kidney chromophobe), KIRC (kidney renal clear cell carcinoma), KIRP (kidney renal papillary cell carcinoma), LAML (acute myeloid leukemia), LGG (brain lower grade glioma), LIHC (liver hepatocellular carcinoma), LUAD (lung adenocarcinoma), LUSC (lung squamous cell carcinoma), MESO (mesothelioma), OV (ovarian serous cystadenocarcinoma), PAAD (pancreatic adenocarcinoma), PCPG (pheochromocytoma and paraganglioma), PRAD (prostate adenocarcinoma), READ (rectum adenocarcinoma), SARC (sarcoma), SKMC (skin cutaneous melanoma), STAD (stomach adenocarcinoma), TGCT (testicular germ cell tumors), THCA (thyroid carcinoma), THYM (thymoma), UCEC (uterine corpus endometrial carcinoma), UCS (uterine carcinosarcoma), and UVM (uveal melanoma) (Fig. 5). Employing column Z score, significant upregulation was represented in red and downregulation in blue. The heat map generated the hierarchical clustering of cancer subtypes based on their level of similarity. COADREAD to ESCA subtypes were assorted in one cluster and PRAD to GBMLGG in another, leaving out UCS. GBMLGG and UCS showed different expression patterns despite being in adjacent positions. Significant upregulation of missense and frameshift mutation had been perceived in the UCS cancer subtype, which supports the findings that the highest mutation rates in p53 results in UCS (91.2%), followed by OV (83%) (Bhagwat 2010).

Fig. 5
figure 5

Mutational spectrum analysis of 33 cancer subtypes

Differential gene expression analysis and correlation of p53 with the survival rate of patients

The investigation was proceeded further by assessing the transcription level of p53 in normal and tumor cells, where the blue box denoted normal cells, and cancer cells were marked by the orange box in the box plot. The result showed a significant difference between the expression level of standard and tumor cells in CHOL, COAD, DLBC, GBM, LAML, LGG, LUSC, OV, PAAD, READ, STAD, TGCT, THYM, and UCEC subtypes (Fig. 6). In these subtypes, the expression of p53 was upregulated in tumor cells, implying that p53 mutation has a strong association with the occurrence of malignancy. According to Perri et al., 2016, more than 50% of human carcinogenesis arises from the genetic alteration of the p53 gene (Wang and Sun 2017).

Fig. 6
figure 6

Differential expression analysis of P53 in CHOL, COAD, DLBC, GBM, LAML, LGG, LUSC, OV, PAAD, READ, STAD, TGCT, THYM, and UCEC

Survival analysis estimates the statistical probability of the survival period on-time event for cancer patients. The Kaplan–Meier method approximates the survival probability and visualizes the survival plots (Perri et al. 2016; Susmi et al. 2021). We compared the overall survival period between the high p53 and low p53 groups in different cancer subtypes. Only BRCA, COAD, LGG, and PRAD exhibited statistically significant outputs among the subtypes. The survival plots disclosed that the high expression of P53 was directly correlated with the high survival rate in LGG and COAD. Conversely, a higher survival rate was associated with low levels of P53 expression in BRCA and PRAD (Fig. 7).

Fig. 7
figure 7

Survival analysis plot of p53 in LGG, BRCA, COAD, and PRAD expressing the correlation of p53 expression with patients’ survival period (months)

Discussion

Single Nucleotide Polymorphisms or SNPs are the most common nucleic acid variations that result in differences among humans; SNPs are also responsible for many hereditary disorders due to amino acid substitutions. Though approximately 4 million SNPs could be found in the database, many SNPs do not cause disease-causing alterations to protein structure due to amino acid degeneracy, which consummately dispels mutations in critical functional regions. Genetic studies to differentiate the functionally neutral nature and disease-associated polymorphism have become a significant concern. Henceforth, SNPs that become dispersed throughout the genome often become excellent genetic markers. Most non-synonymous SNPs associated with the diseases are generally found in the exonic regions, but SNPs that occur in the intrinsic sites of gene disrupting regulatory regions ultimately affect the splicing process. With the increasing rate of reported SNPs in different databases, extensive population-based study becomes difficult due to the cost, and it remains tough to select a target SNP for the investigation while identifying the ones most prone to cause diseases. However, an in silico approach to detecting detrimental SNPs can be more helpful.

This study analyzed the SNP databases to seek SNPs that might be detrimental to p53, following data-driven methods. Search for nsSNPs against p53 resulted in 420 hits. The rsIDs were submitted to SIFT and PolyPhen-2 servers. SIFT and PolyPhen found 16 nsSNPs as non-tolerable and most likely damaging (Tables 1, 2). By performing the Fathmm test, we found 14 cancer-associated SNPs. ConSurf helps to predict evolutionarily conserved amino acids and found 12 SNPs. We found three functional domains and their position in the p53 gene by analyzing them through the InterPro scan server. SWISS-MODEL allowed predicting the 3D structure, which was refined through one µs of MD simulations. Clustering analysis allowed obtaining the most populated conformer over the equilibrated simulation time, including mutations. The 5 non-synonymous SNPs in the DNA binding domain (S241F, R248Q, R248W, R273H, and R273C) were predicted with PyMOL. These mutants were also submitted through MD simulations to obtain the most populated conformers over the equilibrated simulation time. YASARA energy minimization server showed decreased free energy for all the mutant models compared to the wild-type models. The least energy was minimized in case of mutation in the 273rd arginine amino acid position by histidine and cysteine, which affects the protein structure more than the other mutations.

Regarding human cancer, the p53 gene becomes the most frequently mutated gene, and the predominance of missense mutations is scattered over 200 codons (Ferreira and Patino 2016). p53 receives inputs from stress and abnormality sensors that function within the cell’s intracellular operating systems; if the degree of damage to the genome is excessive or if the levels of nucleotide pools, growth-promoting signals, glucose, or oxygenation are suboptimal, p53 can potentially halt further cell-cycle progression until these conditions have been normalized (Vogelstein et al. 2000). Alternatively, in the face of alarm signals, indicating overwhelming or irreparable damage to such cellular subsystems, p53 can also trigger the process of apoptosis. Mutation in p53 results in the loss of regulation or over-proliferation (Fridman and Lowe 2003). Tumor cells evolve various strategies to limit or circumvent apoptosis. The most common one includes the loss of p53 tumor suppressor function, eliminating this critical damage sensor from the apoptosis-inducing circuitry (Vogelstein et al. 2000).

The functional domains of p53 have been subjected to extensive analysis. We found 5 different SNPs in the functional domains of the p53 gene that are deleterious by analyzing with different dry-laboratory tools. R248 and R273 residue have a role in the structural integrity of the functional domain (Greenblatt et al. 1994; Hainaut et al. 1997). The tetrameric p53 protein (which is a dimer of a dimer) binds to four repeats of a consensus DNA sequence. S241 connects with the phosphate backbone in the major groove (Greenblatt et al. 1994). Amino acid substitution in the sequence of the functional domain may lead to alterations of the protein structures (Cho et al. 1994).

Mutations of the p53 gene cause diverse types of cancer in humans. The research found that 70% of mutations occurred in the p53 gene in lung cancer-affected patients. 45%, 60%, 20%, 10–30%, 60%, 40%, 10%, 30%, 60%, and 60% of mutations found in p53 gene, respectively, in stomach, colon, liver, prostate, head/neck, esophagus, leukocytes, lymphocytes, ovary, and bladder cancer-affected patients (Walker and Levine 1996).

p53 was allowed to go through extensive gene expression analysis. Utilizing Heatmapper, the heat map was engendered to assess the expression level of different mutations of p53 in 33 cancer subtypes. GEPIA2 was employed for differential analysis and survival analysis. Differential analysis revealed a substantial upregulation of p53 in the tumor cells in CHOL, COAD, DLBC, GBM, LAML, LGG, LUSC, OV, PAAD, READ, STAD, TGCT, THYM, and UCEC. Most P53 mutations lead to oncogenic progression (Wang and Sun 2017; IARC TP53 Database 2018). Additionally, survival analysis estimated the interconnection of the survival period with the gene expression. In the case of LGG and COAD, the expression level is positively interrelated with survival probability, whereas BRCA and PRAD demonstrate a negative correlation of survival period with the gene expression. This study revealed that despite some correct assumptions, web-based tools need to be more precise in detecting deleterious SNPs, and population-based studies are necessary to identify and further test the predicted SNPs in different populations.

Conclusions

In this study, different SNP analyzing tools have been employed to analyze the available data from the NCBI dbSNP database for the tumor suppressor gene p53. The predicted deleterious SNPs were evaluated for their potentially detrimental effects on protein function and stability. Five SNPs were predicted to be deleterious—rs28934573 (S241F), rs11540652 (R248Q), rs121913342 (R248W), rs121913343 (R273C), and rs28934576 (R273H); they have the highest probability to make p53 functional by changing their structure and functional residues involved in the active site formation. Henceforth, it is very likely that there are unreported nsSNPs that increase disease predisposition by altering protein function or structure. The findings of this study may help in the early diagnosis of the detrimental SNPs that have the probability of increasing the risk of different types of cancers. Individuals diagnosed with the above nsSNPs can take precautions to avoid other risk factors associated with cancer development as they are susceptible to cancer due to these nsSNPs in p53, a significant tumor suppressor gene. However, population-based studies and wet-laboratory experiments are beyond our scope for verifying the current study's findings. Therefore, extensive clinical studies are required to characterize the vastly available SNP data.

Availability of data materials

Dataset used in this study will be available as per request (mailing to the corresponding author).

Abbreviations

SNPs:

Single-nucleotide polymorphisms

References

Download references

Acknowledgements

Not applicable.

Funding

No funding from any public, private, or non-profit research agency was received for this study.

Author information

Authors and Affiliations

Authors

Contributions

SA, MS, and MKH conceptualized the study. SA, MS, MKH, and MB developed the methodology of the study. SA, MS, MB, NAF, MKH, and FH validated and scrutinized the study. SA, MS, MB, SIM, MKH, and FH investigated the study. SA, MS, MB, and SIM wrote the original draft. SA, NAF, MKH, and FH critically reviewed and edited the manuscript. All authors have read and agreed to the current version of the manuscript.

Corresponding author

Correspondence to Md. Kamrul Hasan.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors report no competing interests. The authors alone are responsible for the content and writing of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Alam, S., Sayem, M., Bello, M. et al. Computational analysis uncovers the deleterious SNPs along with the mutational spectrum of p53 gene and its differential expression pattern in pan-cancer. Bull Natl Res Cent 46, 191 (2022). https://doi.org/10.1186/s42269-022-00859-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s42269-022-00859-0

Keywords