Determination of regulatory motifs and pathogenicity of intronic variants of GNPTAB, GNPTG, and NAGPA genes in individuals with stuttering

Background: Stuttering is a fluency disorder typically characterized by part-word repetitions, voiced or voiceless sound prolongations, and broken words. Evidence suggests that 1% of the world population stutters. Past research provides compelling evidence linking stuttering to non-synonymous coding variants. This study evaluates the intronic regions of the GNPTAB, GNPTG, and NAGPA genes for possible pathogenicity of intronic variants in unrelated non-syndromic stutterers from a south Indian cohort. Results: High-throughput sequencing revealed 41 intronic variants. The computational tool RegSNPs-intron identified three intronic variants with a plausible pathogenic impact, rs11110995 A>G, rs11830792 A>G, and rs1001171 T>A, which were found in 37.9%, 26.5%, and 59.4% of stutterers, respectively. RegulomeDB identified the regulatory motifs and susceptible loci of the intronic variants. Conclusions: This study reports the identification, association, and interpretation of the pathogenicity and regulatory significance of intronic variants in the context of noncoding DNA elements. Future work in a larger cohort of stutterers is warranted to better understand the role of the intronic variants, and a cohort of fluent controls would be valuable.


Background
Stuttering is a fluency disorder that produces various forms of speech interruption and affects all language groups. It typically arises in children aged ~2 to 5 years, when they begin to develop more complex speech and language (Reilly et al. 2013; Didirková et al. 2021; Polikowsky et al. 2022). Stuttering is more common in males than in females, with a male-to-female ratio of 5:1, and most affected individuals, particularly females, recover spontaneously or with the aid of speech therapy (Yairi and Ambrose 2013). It has long been observed that stuttering frequently runs in families and is highly heritable (Fedyna et al. 2011; Barnes and Neutel 2016; Bloodstein et al. 2021). Various studies have demonstrated a strong genetic influence on stuttering risk and identified coding variants in the GNPTAB, GNPTG, and NAGPA genes, which have been linked to the lysosomal enzyme-targeting pathway (Riaz et al. 2005; Kang et al. 2010; Raza et al. 2016; Frigerio-Domingues and Drayna 2017; Gunasekaran et al. 2021).
Over the years, research on the neurological aspects of stuttering has been carried out to understand the nature and metabolism of the disorder (Alm 2021). Expression of stuttering genes (GNPTAB and NAGPA) in children with persistent stuttering and non-stuttering controls revealed gray matter differences linked to lysosomal deficits (Chow et al. 2020). Lysosomal deficits likely reduce the processing of biomolecules (Alm 2021). Energy metabolism was examined in mice carrying the mutant GNPTAB gene; these mice had fewer astrocytes in the brain, which may be related to a reduced peak rate of energy supply to the motor system (Barnes and Neutel 2016).
*Correspondence: santoshm79@gmail.com
Genome-wide association studies (GWAS) by high-throughput sequencing have identified several loci linked with the trait and additional candidate genes. Exonic mutations in the SLC6A3 gene (rs2617604, rs28364997, rs28364998) and the DRD2 gene (rs6275, rs6277) were detected among Han Chinese patients with speech disfluency (Lan et al. 2009); AP4E1 gene variants (rs760021635, rs556450190) in a large African family (Raza et al. 2013, 2015); a CYP17A1 gene variant (rs743572) in a Kurdish population (Mohammadi et al. 2017); and a CYRIA gene variant (rs12613255) in patients of European ancestry (Shaw et al. 2021). High-throughput sequencing has also transformed the detection of variants in noncoding segments (introns), which several GWAS have reported in abundance (Reuter et al. 2015; Elliott and Larsson 2021). Sequence elements within nuclear introns may modulate significant functions in gene expression, mRNA export, splicing, alternative splicing, and transcription coupling (Berk 2016; Panaro et al. 2022). Studies on intronic variants in stuttering are limited, and only a few have reported intronic alleles, and those in small numbers. Therefore, the current study was performed to identify the intronic single-nucleotide variants (iSNVs) of three candidate genes (GNPTAB, GNPTG, and NAGPA) and to assess the possible pathogenicity of intronic variants in a south Indian cohort of people who stutter.

Methods

Recruitment and stuttering examination
The study included 100 participants (94 male and 6 female) older than 18 years who enrolled for speech impairment assessment at the All India Institute of Speech and Hearing (AIISH). The study participants underwent a detailed speech pathology examination. Individuals with developmental stuttering but no associated communication, cognitive, psychological, or neurological problems were selected. Of the 100 participants, 67 (67%) had a family history of stuttering and the remaining 33 (33%) had none. Stuttering severity ranged from very mild (46/100, 46%) through moderate (36/100, 36%) to very severe (18/100, 18%), with age at onset of 2-5 years. The Stuttering Severity Instrument (Riley and Bakker 2009) was used to document the severity of overt stuttering.

Sample and DNA isolation
About 5 ml of peripheral venous blood was collected from the study participants (n = 100) by standard phlebotomy. DNA was isolated using the PureLink™ Genomic DNA Mini Kit (Thermo Fisher Scientific) as per the manufacturer's protocol.

Massively parallel sequencing and analysis
Of the 100 samples, 79 (75 males and 4 females; mean age ± SD = 26 ± 6.49 years) were selected on the basis of DNA quantitation. Custom-targeted libraries were constructed using the Ion AmpliSeq Library Kit Plus (Life Technologies), and PCR enrichment was done using the Ion AmpliSeq Exome RDY panel (Life Technologies) according to the manufacturer's protocols. Sequencing was performed on the Ion Proton™ next-generation sequencing system (Life Technologies) following the manufacturer's guidelines. All sequencing data passed specific minimal quality control requirements. Sequence reads were aligned to the reference genome (hg19) using TMAP Alignment (Thermo Fisher Scientific), and variants were detected using the Ion Reporter (Thermo Fisher Scientific).

Allele frequency estimation, functional annotation, and pathogenicity prediction of iSNVs
Intronic variants were filtered based on the allele frequencies. The allele frequencies of the variants were compared with the gnomAD database (Karczewski et al. 2020) (https://gnomad.broadinstitute.org/), which served as a control. RegulomeDB (http://regulomedb.org), a database integrating information from the Encyclopedia of DNA Elements (ENCODE), was used to annotate single-nucleotide variants (Boyle et al. 2012). RegSNPs-intron (https://regsnps-intron.ccbb.iupui.edu/), a computational framework, was used to predict the pathogenic impact of intronic single-nucleotide variants (Lin et al. 2019).
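The allele-frequency step can be illustrated with a minimal Python sketch. It assumes biallelic variants and diploid genotype counts. The genotype counts and the reference frequency below are hypothetical placeholders, not data from this study, and the function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts for one biallelic variant in a diploid cohort."""
    hom_ref: int  # individuals carrying two reference alleles
    het: int      # individuals carrying one alternate allele
    hom_alt: int  # individuals carrying two alternate alleles


def alt_allele_frequency(counts: GenotypeCounts) -> float:
    """Alternate-allele frequency = alternate alleles / total alleles."""
    n_individuals = counts.hom_ref + counts.het + counts.hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    alt_alleles = counts.het + 2 * counts.hom_alt
    return alt_alleles / (2 * n_individuals)


def frequency_difference(cohort_af: float, reference_af: float) -> float:
    """Signed difference between the cohort and a reference frequency."""
    return cohort_af - reference_af


# Hypothetical example: 79 individuals; the counts are invented for illustration.
example = GenotypeCounts(hom_ref=49, het=25, hom_alt=5)
cohort_af = alt_allele_frequency(example)

# Placeholder for a population frequency that would be looked up in gnomAD.
reference_af = 0.20

diff = frequency_difference(cohort_af, reference_af)
print(f"cohort AF = {cohort_af:.3f}, difference vs reference = {diff:+.3f}")
```

The sketch counts alleles rather than carriers, so a heterozygous individual contributes one alternate allele and a homozygous individual contributes two.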

Statistical analysis
Descriptive statistics (mean and standard deviation, SD) and probability values of the allele frequencies were computed using the Statistical Package for Social Sciences (SPSS v21; IBM Corp., New York).

Results
In this study, massively parallel sequencing of the three genes GNPTAB, GNPTG, and NAGPA identified 41 iSNVs in 79 samples (75 males and 4 females; mean age ± SD = 26 ± 6.49 years). Among the indexed patients, mild stuttering (46%) was the most prevalent, followed by moderate (36%) and severe (18%) stuttering, and all the study participants were of south Indian descent. Allele frequencies of the 41 iSNVs were compared with the South Asian and total allele frequencies recorded in the gnomAD database; the comparison was highly significant and consistent for both the South Asian (p = 0.001) and total (p = 0.001) allele frequencies (Table 1).

Functional annotation of intronic SNVs identified in this study
RegulomeDB was used to identify the potential regulatory/functional iSNVs. Of the 41 iSNVs identified in this study, 38 had RegulomeDB scores of 1-6 and 3 had a score of 7 (Table 1 and Additional file 1: Table S1). Further, 6 iSNVs showed comparatively more evidence for a regulatory element, with a score of 1: 5 iSNVs (rs11111002, rs4764814, rs4764813, rs1001171, and rs1001170) had a score of 1f and 1 iSNV (rs11110995) had a score of 1d. Expression quantitative trait loci (eQTLs), which explain a fraction of the genetic variance of a gene expression phenotype (Nica and Dermitzakis 2013), were observed among the GNPTAB and NAGPA gene variants. Notably, the lower the RegulomeDB score, the more likely the variant is to lie within a potential functional region (Liao et al. 2016). Detailed information about the regulatory iSNVs and the functional annotation of the other variants observed in the study (likely affecting binding, less likely affecting binding, and minimal binding evidence) is shown in Additional file 1: Table S1.
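The ordering of variants by RegulomeDB score can be sketched in Python as follows. The rsIDs and scores are those reported in this article. The category labels are an assumption based on the broad groups of the RegulomeDB scoring scheme (the label for score 7 in particular), and the helper names are illustrative only.

```python
import re

# Broad categories keyed by the numeric part of the score. The labels are
# an assumed summary of the RegulomeDB scheme, not output from the database.
CATEGORY_BY_RANK = {
    1: "likely to affect binding and linked to expression of a gene target",
    2: "likely to affect binding",
    3: "less likely to affect binding",
    4: "minimal binding evidence",
    5: "minimal binding evidence",
    6: "minimal binding evidence",
    7: "no data",
}


def parse_score(score: str) -> tuple:
    """Split a score such as '1d' into (1, 'd'); '6' becomes (6, '')."""
    match = re.fullmatch(r"([1-7])([a-f]?)", score.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized RegulomeDB score: {score!r}")
    return int(match.group(1)), match.group(2)


def category(score: str) -> str:
    """Return the broad category for a score, ignoring the sub-score letter."""
    return CATEGORY_BY_RANK[parse_score(score)[0]]


# Scores as reported in the text of this article.
scores = {
    "rs11111002": "1f",
    "rs4764814": "1f",
    "rs4764813": "1f",
    "rs1001171": "1f",
    "rs1001170": "1f",
    "rs11110995": "1d",
    "rs11830792": "6",
}

# Lower scores sort first, so variants with more regulatory evidence lead.
ranked = sorted(scores, key=lambda rsid: parse_score(scores[rsid]))
for rsid in ranked:
    print(rsid, scores[rsid], category(scores[rsid]))
```

Sorting on the (number, letter) pair places 1d ahead of 1f and both ahead of 6, which mirrors the statement that a lower score indicates stronger evidence of a functional region.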

Discussion
Stuttering is a disorder of speech interruptions or disfluency that is highly heritable and has a strong genetic influence. This study describes the potential regulatory and pathogenic effects of intronic SNVs. Apart from the coding exonic variants, noncoding introns play a vital part in gene regulation (Rose 2019). The diversity of proteins is increased by alternative splicing, in which introns play important roles in producing multiple protein variants from a single gene in a eukaryotic cell (Wang et al. 2015; Yang et al. 2021). Conserved sequences in the introns flanking conserved alternative exons regulate alternative splicing (Pan et al. 2008; Vaz-Drago et al. 2017; Yang et al. 2021). In this study, we investigated the intronic variants of the GNPTAB, GNPTG, and NAGPA genes and predicted the pathogenic impact of intronic SNVs using the RegSNPs-intron tool. This study identified three possibly pathogenic intronic variants: rs11110995, rs11830792, and rs1001171. Previous studies have reported an intronic variant, g.10985G>A, in the GNPTG gene among Iranian stutterers (Kazemi et al. 2018), and another intronic variant, c.192+618G>A (rs7837758), in the ZMAT4 gene in stutterers of African ancestry (Shaw et al. 2021).
Of the three possibly pathogenic intronic variants detected, rs11110995 A>G in the GNPTAB gene had a RegulomeDB score of 1d, indicating an eQTL that likely affects binding and is linked to the expression of a gene target; pathogenicity estimation showed a damaging effect, and the variant was detected in 30/79 (37.9%) of the stutterers. The variant rs11830792 A>G in the GNPTAB gene had a RegulomeDB score of 6, which indicates whether a certain position in the DNA sequence is bound or unbound by a transcription factor; pathogenicity estimation was possibly damaging, and the iSNV was detected in 21/79 (26.5%) of the stutterers. The variant rs1001171 T>A in the NAGPA gene had a RegulomeDB score of 1f, which likewise indicates an effect on binding and a link to the expression of a gene target; its pathogenicity was estimated to be possibly damaging, and it was detected in 47/79 (59.4%) of stutterers. No pathogenic iSNVs were detected in the GNPTG gene. In summary, these databases provided evidence allowing us to examine the nucleotide variations responsible for conservation,

Conclusions
This study used the RegSNPs-intron tool to identify three intronic variants of possible pathogenic impact (rs11110995, rs11830792, and rs1001171) in stuttering patients; such variants are known to be associated with a certain genetic trait. The regulatory function of the intronic variants was assessed using the RegulomeDB database, which documented a few potential regulatory variants and susceptible loci. Thus, the combination of the two computational approaches may help to understand the regulatory regions and to derive a valid hypothesis as to their function. The limitations of this study include the relatively small sample size and the recruitment of patients from a single center, which may limit generalizability. Therefore, future work confirming the current findings in a larger cohort of stutterers is warranted to better understand the role of the intronic variants, and a cohort of fluent controls would be valuable.