Delirium detection in hospitalized adults: the performance of the 4 ’A’s Test and the modified Confusion Assessment Method for the Emergency Department. A comparison study

Background: Early detection of delirium through systematic screening is essential to mitigate and prevent its possible consequences. The 4 ’A’s Test (4AT) is a new tool that can be used for delirium detection easily and without special training. The modified Confusion Assessment Method for the Emergency Department (mCAM-ED) is an operationalized version of the Confusion Assessment Method, a delirium screening tool used worldwide in clinical practice and research. This is the first comparison of the two delirium screening tools. This study aimed to investigate the performance of the 4AT compared to the mCAM-ED in detecting delirium in hospitalized patients. Methods: In this prospective single-centre cross-sectional pilot study, patients from six wards were selected consecutively. All patients underwent delirium screening with the gold standard, the mCAM-ED. To rate the algorithm of the 4AT, corresponding items of the mCAM-ED were derived and used. Results: A total of 116 patients with a median age of 73 years were included. Dementia was present in 11 (9.5%) patients, and 42.2% were women. Delirium was present in 8/116 (6.9%) and 16/116 (13.8%) patients according to the mCAM-ED and the 4AT, respectively. In comparison, the 4AT showed a sensitivity of 100% (95% CI 0.63, 1.00), a specificity of 93% (95% CI 0.86, 0.97), a positive likelihood ratio of 13.50 (95% CI 6.93, 26.30) and a negative likelihood ratio of 0.00 (95% CI 0.00, NaN). Conclusions: In this first comparison, the 4AT showed a high rate of false-positive scores, which may result in an increased need for further in-depth assessments.

withdrawal or exposure to a toxin (American Psychiatric Association 2013).
Delirium occurs in people of every age (Inouye et al. 2014), and people older than 65 years have a higher risk of developing it (Inouye 2006). In the Western world, the prevalence of delirium is 18-35% on general medical wards and 17% on surgical wards (Inouye et al. 2014). Delirium is associated with several adverse consequences for patients, health professionals and health care systems (Inouye et al. 2014; Leslie et al. 2008; Wand et al. 2013). Individuals with delirium show a higher mortality rate after five years (Moskowitz et al. 2017). Delirium can lead to progression of existing dementia (Davis et al. 2017) and to an increased risk of developing dementia (Olofsson et al. 2018). Each additional day of delirium persistence decreases self-efficacy in terms of daily activities and cognition (Han et al. 2017). Delirium is also emotionally very stressful for patients and their relatives (Cohen et al. 2009). The increased care needs of patients with delirium place enormous demands on nurses (Belanger and Ducharme 2011). In addition, patients with delirium incur 2.5 times the average daily hospital health care costs of patients without delirium (Leslie et al. 2008).
Early detection of delirium is crucial to mitigate and prevent its potential consequences (Mittal et al. 2011) and can be achieved with systematic screening (Inouye et al. 2001; Grossmann et al. 2014). Although early and precise detection of delirium enables early intervention and treatment of the underlying cause (Inouye 2006; Inouye et al. 2014), up to 80% of delirium cases go undetected by healthcare professionals (Inouye et al. 2001; Collins et al. 2010). As a result, causal treatment is often delayed or missed. It is therefore vital that screening tools perform well and are time efficient (De and Wand 2015).
As the Diagnostic and Statistical Manual of Mental Disorders (DSM) has evolved, new screening tools have been developed or existing screening tools have been enhanced (Fick et al. 2015), such as the 4 'A's Test (4AT) (MacLullich et al. 2011) and the modified Confusion Assessment Method for the Emergency Department (mCAM-ED) (Grossmann et al. 2014; Hasemann et al. 2018a). The 4AT is a concise and easy-to-use tool (De et al. 2017) that does not require special training (Bellelli et al. 2014). The mCAM-ED is a modified version of the Confusion Assessment Method (CAM) (Inouye et al. 1990), a tool for delirium screening in clinical practice and research that is used worldwide (Wei et al. 2008).
Both tools detect delirium in the same cognitive domains (i.e. consciousness, attention and cognition). They differ in that the 4AT uses fewer questions and does not assess disorganized thinking. The mCAM-ED requires training and is therefore more complex to implement (Grossmann et al. 2014). Furthermore, when the CAM was used without a formal assessment, nurses recognized delirium in only 19% of cases, corresponding to a misinterpretation rate of up to 80% (Inouye et al. 2001). The mCAM-ED addressed this disadvantage by defining the items (Grossmann et al. 2014; Hasemann et al. 2018a). As the 4AT is less complex, a comparison of its performance in delirium detection with that of the mCAM-ED is of great interest. However, to our knowledge, no comparison between the 4AT and the mCAM-ED has yet been performed. The aim of this study was to investigate the performance, i.e. sensitivity and specificity, positive and negative likelihood ratios, and positive and negative predictive values, of the 4AT compared to the mCAM-ED in detecting delirium in hospitalized patients. We expected the two tools to produce relatively similar results, with the 4AT showing a sensitivity and specificity of 95% each. We conducted this small preliminary study as a pilot study to evaluate the feasibility of rating the 4AT based on the mCAM-ED assessment.

Design
This study is a prospective single-centre cross-sectional pilot study, which investigated the performance accuracy of the 4AT. The mCAM-ED was applied as a gold standard.

Setting and sample
This pilot study was carried out in the University Hospital Basel, a tertiary care centre in north-western Switzerland. In 2018, about 38,000 patients were hospitalized at the University Hospital Basel. Overall, the University Hospital Basel has a total capacity of 770 beds (Universitätsspital Basel 2019). Patients were consecutively selected from six wards with a total capacity of approximately 200 beds. The following disciplines were included: orthopaedics, traumatology and spinal disorders, cardiovascular, pulmonary and metabolic diseases, diseases of the blood and blood-forming organs, kidney diseases and tumours of various organ systems.
Patients from these disciplines were included in the study if they were 18 years or older and already hospitalized on one of the study wards on the first day of the study. Patients were excluded if they were receiving end-of-life care, had communication difficulties such as not understanding or speaking German sufficiently to answer questions, had impaired communication due to aphasia or coma, or had severe hearing impairment.

Variables and measurements
Demographic information was collected on sex (male, female), age (in years), neurocognitive disorders (dementia, delirium), day of admission and the unit in which the patients were hospitalized. Delirium screening was performed exclusively with the mCAM-ED. The 4AT rating was derived from this assessment: corresponding items of the mCAM-ED were used to rate the 4AT algorithm (Table 1).

The 4 'A's Test (4AT)
The 4AT examines three cognitive domains (alertness, cognition, and attention) as well as the onset and course of symptoms.
[Table 1: corresponding items of the mCAM-ED and the 4AT. Attention: omissions and time required for the task (score: ≥ 3 points indicate inattention); until July or further (score: ≥ 7 months = 0, starts but < 7 months / refuses = 1, untestable = 2). Acute onset or fluctuation: observation during the interview and information from third party; in the previous 2 weeks or 24 h (score: yes = 4; no = 0).]
The score for item alertness and item acute onset or fluctuation is 0 or 4. The score for item AMT-4 and item attention is 0, 1 or 2. The cumulative score of the four sections is between 0 and 12. A value of 0 indicates that the probability of delirium in a patient is low. A score between 1 and 3 indicates a possible cognitive impairment of the tested patient. With a score of 4 or more, the presence of delirium is likely (MacLullich et al. 2011).
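The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the item ranges and cut-offs stated in the text, not the tool developers' implementation; the function name, parameter names and category labels are hypothetical.

```python
from typing import Tuple


def score_4at(alertness: int, amt4: int, attention: int,
              acute_change: int) -> Tuple[int, str]:
    """Sum the four 4AT item scores and map the total to an interpretation."""
    # Item ranges as stated in the text: alertness and acute onset or
    # fluctuation score 0 or 4; AMT-4 and attention score 0, 1 or 2.
    if alertness not in (0, 4) or acute_change not in (0, 4):
        raise ValueError("alertness and acute_change must be 0 or 4")
    if amt4 not in (0, 1, 2) or attention not in (0, 1, 2):
        raise ValueError("amt4 and attention must be 0, 1 or 2")

    total = alertness + amt4 + attention + acute_change  # range 0 to 12
    if total >= 4:
        return total, "delirium likely"
    if total >= 1:
        return total, "possible cognitive impairment"
    return total, "delirium unlikely"


# A single abnormal 4-point item is enough to reach the delirium cut-off.
print(score_4at(alertness=0, amt4=0, attention=0, acute_change=4))
```

Because alertness and acute onset or fluctuation each contribute 4 points, either item alone reaches the cut-off of 4, regardless of the AMT-4 and attention scores.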
Since 2013, the 4AT has been validated in a variety of environments and patient populations: acute elderly emergency department (Gagne et al. 2018;Hendry et al. 2016

The modified Confusion Assessment Method for the Emergency Department (mCAM-ED)
To assess delirium with the mCAM-ED, changes in attention, cognition, disorganized thinking, and acute onset and/or fluctuation are assessed. The mCAM-ED consists of a two-step approach, which may save time in the ED setting. When used on wards for delirium consultation by the geriatric consultation team, however, both steps can be used on a regular basis (Hasemann 2019). In the first step, inattention is identified with the MBT (Meagher et al. 2015). In the second step, if attention is abnormal, acute cognitive changes are evaluated by a structured interview with the Mental Status Questionnaire (MSQ) (Kahn et al. 1960), altered level of consciousness is measured with the modified Richmond Agitation Sedation Scale (Chester et al. 2012) and disorganized thinking is assessed with the Comprehension Test (Hart et al. 1996). In addition, fluctuating symptoms are observed during the assessment. Possible or probable delirium is present when (a) an acute change in cognition (MSQ) or fluctuating symptoms, (b) inattention (MBT), and (c) disorganized thinking and/or (d) a changed level of consciousness are identified in the interview (Fig. 1) (Inouye et al. 1990).
The mCAM-ED was developed and validated in the emergency department (Grossmann et al. 2014;Hasemann et al. 2018a). The mCAM-ED had a sensitivity of 90% (Confidence Interval [CI] 0.70; 0.97) and specificity of 98% (CI 0.95; 0.99) compared to DSM-IV-TR in delirium detection. The assessment with the mCAM-ED needed on average five minutes (Hasemann et al. 2018a).
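The decision rule described above, (a) and (b) and ((c) or (d)), can be written as a short Python sketch. It is an illustration under the assumption that each feature has already been rated as present or absent; the function and parameter names are hypothetical and do not come from the mCAM-ED documentation.

```python
def mcam_ed_positive(acute_change_or_fluctuation: bool,
                     inattention: bool,
                     disorganized_thinking: bool,
                     altered_consciousness: bool) -> bool:
    """Return True if the CAM-type algorithm indicates possible or probable delirium."""
    # Step 1: attention screen (MBT). Without inattention the rule cannot be met.
    if not inattention:
        return False
    # Step 2: acute change or fluctuation is required, plus at least one of
    # disorganized thinking or an altered level of consciousness.
    return bool(acute_change_or_fluctuation
                and (disorganized_thinking or altered_consciousness))


print(mcam_ed_positive(True, True, False, True))   # True
print(mcam_ed_positive(True, True, False, False))  # False
```

In this form, at least three of the four features must be present for a positive rating, and no single feature is sufficient on its own.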

Data collection and management
Data collection took place on two days in August 2018. On the first day of the study, the 32 trained nursing research assistants (RAs) were assigned to eligible patients. The RAs visited the assigned patients on the first day of the study and conducted the assessments. On the second day of the study, the RAs supplemented the paper Case Report Form (CRF) with data from the electronic medical record. This included demographic information such as sex, age, and diagnosis of pre-existing dementia or recent delirium. Double data entry was used to minimize erroneous entries and increase data quality. In addition, any paper CRF without clear answers was discussed with the principal investigator to reach a consensus on the data. The 4AT algorithm was then rated using the corresponding items of the mCAM-ED.

Training of research assistants
The training took place one day before data collection and consisted of a one-day course on the screening and assessment of delirium and dementia. Case vignettes and video examples accompanied the theoretical input on delirium and dementia. An advanced practice nurse with a PhD from the geriatric consultation service provided the training.

Ethical considerations
The RAs asked the patients to sign the written informed consent to participate in the study on the first day of data collection. If a patient was unable to make an informed decision, relatives or proxies were contacted to sign the written informed consent. The Ethics Committee Northwest Central Switzerland approved this study under the number EKNZ: 2018-00616.

Data analysis
Demographic information was analysed by frequencies and percentages for categorical variables. For continuous data, the median and quartiles (Q1, Q3) were reported. For the group comparisons, Fisher's exact test or the Mann-Whitney U test was performed according to the distribution of the data. These analyses were performed using IBM SPSS (Statistical Package for the Social Sciences) version 22.0. To address the study aim, the performance of the 4AT was expressed as sensitivity, specificity, positive and negative likelihood ratios and positive and negative predictive values (PPV and NPV), each presented together with a two-sided 95% CI. The exact CI for sensitivity, specificity, PPV and NPV was computed using Collett's method (Collett 1999). For the positive and negative likelihood ratios, a two-sided 95% CI was calculated and presented according to Simel's method (Simel et al. 1991). The determination of the sample size was based on the validation study of the mCAM-ED (Hasemann et al. 2018a). Analysis of the performance was conducted using the software program "R" Version 3.6.1 (R Core Team 2018) with the package "epiR" (Stevenson 2020). Missing data were handled by pairwise deletion. The significance level for all analyses was set to α = 0.05.
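To illustrate what an exact binomial CI involves, the following Python sketch computes a two-sided interval by inverting the binomial distribution with bisection. It assumes that the exact interval used here is the Clopper-Pearson type; it is not the epiR code used in the study, and all function names are hypothetical. The example counts (8 of 8 and 100 of 108) are inferred from the reported numbers of mCAM-ED-positive and 4AT-positive patients rather than copied from a table.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0.0 for k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g, target: float, iters: int = 100) -> float:
    """Bisection on [0, 1] for g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x: int, n: int, alpha: float = 0.05):
    """Two-sided exact (Clopper-Pearson type) interval for the proportion x/n."""
    # Lower limit: p at which P(X >= x) equals alpha/2 (0 if x == 0).
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: p at which P(X <= x) equals alpha/2 (1 if x == n).
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


print(exact_binomial_ci(8, 8))      # sensitivity: 8 of 8 (inferred counts)
print(exact_binomial_ci(100, 108))  # specificity: 100 of 108 (inferred counts)
```

With only eight reference-positive patients, the lower limit for a proportion of 8/8 is about 0.63, which shows how wide an exact interval remains in a small sample even when the point estimate is 100%.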

Patient characteristics
During the study period, 211 patients were hospitalized on the six wards, of whom 116 were included in the study. A total of 95 patients were excluded for reasons including refusal to participate (n = 29), admission after assessment (n = 22) or absence of patients (n = 19) (Fig. 2). The median age of patients was 73.0 years (Q1 56.3, Q3 83.0) and 49 (42.2%) of the patients were women. Pre-existing dementia was present in 11 (9.5%) patients. A total of 8 out of 116 patients (6.9%) had delirium according to the gold standard, the mCAM-ED, and 16 (13.8%) had delirium according to the 4AT. Subgroup analysis of patients with delirium and patients without delirium showed no statistically significant difference with respect to age (U = −1.799, p = 0.072), sex (p = 1.000) or pre-existing dementia (p = 0.167) (Table 2). There was a wide range of main diagnoses and heterogeneity among the patients (Additional file 1: Table 5), and two-thirds of them were hospitalized on medical wards (Table 2).

Performance
The 4AT correctly identified 100% (95% CI 0.63, 1.00) of the patients who were rated as delirious by the RAs with the mCAM-ED (sensitivity). Additionally, the 4AT correctly identified 93% (95% CI 0.86, 0.97) of non-delirious patients as being non-delirious in comparison to the gold standard (specificity). Among the delirium assignments of the 4AT, 50% (95% CI 0.25, 0.75) were also judged as delirious by the RAs with the mCAM-ED (positive predictive value), and 100% (95% CI 0.96, 1.00) of the 4AT non-delirium assignments were confirmed by the gold standard (negative predictive value). A positive 4AT result was 13.5 times (95% CI 6.93, 26.30) more likely in a patient with delirium than in a non-delirious patient (positive likelihood ratio). Conversely, the probability of falsely classifying a patient in delirium as non-delirious with the 4AT was zero (negative likelihood ratio) (Tables 3, 4). The two tools differed in how item scores affected the result. In two cases the assessment of attention was rated as an attention deficit, in three cases it was not. This was the consequence of the different cut-offs for rating inattention in the 4AT and the mCAM-ED (Additional file 2: Table 6).
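The reported measures can be reproduced from a 2×2 table. The Python sketch below is a worked check, not the authors' analysis code: the cell counts (TP = 8, FN = 0, FP = 8, TN = 100) are reconstructed from the reported 8 mCAM-ED-positive patients, 16 4AT-positive patients and 100% sensitivity among 116 patients. The standard error formula is the commonly used log-method for likelihood ratios, assumed here to correspond to Simel's method, with z = 1.96.

```python
from math import exp, log, sqrt

# Reconstructed 2x2 counts (4AT versus mCAM-ED as reference); an assumption
# derived from the figures in the text, not copied from Tables 3 and 4.
tp, fn, fp, tn = 8, 0, 8, 100

sensitivity = tp / (tp + fn)   # 8 / 8
specificity = tn / (tn + fp)   # 100 / 108
ppv = tp / (tp + fp)           # 8 / 16
npv = tn / (tn + fn)           # 100 / 100
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity

# Log-method 95% CI for the positive likelihood ratio.
z = 1.96
se_log_lr_pos = sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
lr_pos_ci = (exp(log(lr_pos) - z * se_log_lr_pos),
             exp(log(lr_pos) + z * se_log_lr_pos))

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(f"PPV={ppv:.2f} NPV={npv:.2f}")
print(f"LR+={lr_pos:.2f} CI=({lr_pos_ci[0]:.2f}, {lr_pos_ci[1]:.2f}) LR-={lr_neg:.2f}")
```

Because there are no false negatives, the negative likelihood ratio is exactly 0 and its logarithm is undefined, so a log-based interval cannot be completed; this is presumably why the upper limit is reported as NaN.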

Discussion
In this comparison of the 4AT with the mCAM-ED in detecting delirium in hospitalized patients, the 4AT showed good performance, but our expectations of a 95% agreement were not fulfilled: the 4AT and the mCAM-ED did not perform equally well. The 4AT rated exactly twice as many patients as delirious as the mCAM-ED (16 vs. 8). The 4AT requires only an impaired level of alertness or an acute onset or fluctuation in order to suggest the presence of delirium, whereas the algorithm of the mCAM-ED requires more items to positively rate a delirium. Although the mCAM-ED consists of more items and an additional neurocognitive domain (disorganized thinking), the remaining neurocognitive domains (attention, cognition and alertness) do not differ between the mCAM-ED and the 4AT. However, the weighting of the domains in the algorithm for detecting delirium varies between the mCAM-ED and the 4AT. While three out of four domains of the mCAM-ED are required to confirm delirium (Fig. 1) (Grossmann et al. 2014; Hasemann et al. 2018a), the 4AT requires only one domain (i.e. acute onset and/or fluctuation or alertness) (MacLullich et al. 2011). In the end, the 4AT gains sensitivity at the expense of specificity by using a single criterion to confirm delirium. It is noteworthy that the mCAM-ED is closer to the DSM criteria than the 4AT because it requires three items rather than a single criterion. According to DSM-5, the following five points must be fulfilled for a diagnosis of delirium: (a) disturbed attention or awareness, (b) sudden change and fluctuation in severity, (c) an additional disturbance in cognition, (d) not better explained by the presence of a neurocognitive disorder, and (e) context of medical history, substance abuse or withdrawal, or exposure to a toxin (American Psychiatric Association 2013). The mCAM-ED criteria therefore cover the DSM-5 criteria for delirium to a greater extent than those of the 4AT.
Thus, the 4AT is more susceptible to overestimating the presence of delirium than the mCAM-ED.
Considering the different samples and settings, the 4AT performed similarly well in our study as in the first validation study by Bellelli et al. (2014). In their comparison with DSM-IV-TR, sensitivity and specificity were 89.7% and 84.1%, respectively (Bellelli et al. 2014), which differs slightly from this comparison with the mCAM-ED, where sensitivity and specificity were 100% and 93%, respectively. In the recently published validation study in patients in the emergency department or acute general wards, the 4AT showed a lower sensitivity of 75% and a higher specificity of 94.5% compared to the DSM-IV-TR (MacLullich et al. 2019). In a validation study of several short screening tools compared to the CAM in a stroke unit, the 4AT showed a good sensitivity of 100% and a reasonable specificity of 82% (Lees et al. 2013).
The delirium prevalence of 6.9% measured in this study with the mCAM-ED is consistent with data from other Swiss studies. Similar prevalences of 7.0% and 9.5% were measured in the emergency department with DSM-IV-TR (Hasemann et al. 2018a) and the mCAM-ED (Grossmann et al. 2014), respectively. In a pilot study conducted in a central hospital, the prevalence was twice as high (14%) as in this study (Schwarber et al. 2017). However, in a recently published study, the prevalence of delirium in intermediate/general medicine wards was nearly four times higher (27.3%) than in this study (Schubert et al. 2018). These figures demonstrate that the prevalence of delirium in Switzerland is highly variable. One aspect contributing to variability in prevalence is the method of measurement. In the hospital-wide Swiss study by Schubert et al. (2018), a period prevalence was measured with the Delirium Observation Scale (DOS) (Schubert et al. 2018). The DOS has some shortcomings in discriminating between delirium and dementia (specificity 92%) (Hasemann et al. 2018b), which might explain the high prevalence. The unknown proportion of patients with dementia in the study by Schubert et al. (2018) might have contributed to a high prevalence of delirium through misclassification of dementia as delirium. Furthermore, the period measurement could also have contributed. In contrast to other studies, Schubert et al. (2018) determined a period prevalence and not a point prevalence. Since delirium does not usually occur over the entire hospitalization period, but only over several days, a period prevalence would capture more delirium than a point prevalence. In another Swiss study, in which a prevalence of around 16% was measured throughout the entire hospital stay in the medical wards, a considerable variability in prevalence was observed from day to day.
Delirium point prevalence varied from 15% to about 50% between days in comparison with the prevalence across the entire hospital stay in the same patient sample (Hasemann 2014). Symptoms of delirium are not observed constantly, but are subject to fluctuations over the course of the day (American Psychiatric Association 2013). A point prevalence only reflects the condition at one specific point in time and is therefore subject to fluctuations. These need to be taken into account in the analysis, and our measured prevalence is subject to these fluctuations as well.
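The difference between point and period prevalence discussed above can be made concrete with a toy calculation. The Python sketch below uses entirely hypothetical data (five invented patients with invented stay and delirium days); it illustrates the two definitions only and has no connection to the study data.

```python
# Hypothetical ward: each patient has a set of hospital days ("stay") and a
# possibly empty set of days on which delirium was present.
patients = [
    {"stay": range(1, 6), "delirium_days": {2, 3}},
    {"stay": range(1, 8), "delirium_days": set()},
    {"stay": range(2, 7), "delirium_days": {5}},
    {"stay": range(1, 5), "delirium_days": set()},
    {"stay": range(3, 9), "delirium_days": {3, 4, 5}},
]


def point_prevalence(day: int) -> float:
    """Share of patients present on a given day who are delirious on that day."""
    present = [p for p in patients if day in p["stay"]]
    return sum(day in p["delirium_days"] for p in present) / len(present)


def period_prevalence() -> float:
    """Share of all patients who were delirious on at least one day of their stay."""
    return sum(bool(p["delirium_days"]) for p in patients) / len(patients)


for day in (1, 3, 5):
    print(f"day {day}: point prevalence = {point_prevalence(day):.2f}")
print(f"period prevalence = {period_prevalence():.2f}")
```

In this invented example the point prevalence depends strongly on the day chosen, while the period prevalence counts every patient with at least one delirious day, so a single-day measurement can understate how many patients are affected over a stay.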
In the first and in a recently published validation study of the 4AT, patients aged 70 years and older were included (Bellelli et al. 2014; MacLullich et al. 2019). On average, the patients of Bellelli et al. (2014) were 84 years old (Bellelli et al. 2014), which is considerably older than our sample. In a point prevalence study conducted in Ireland, almost 20% of all inpatients with a median age of 69 years had delirium (Ryan et al. 2013). This is considerably higher than in the sample in this study despite the similar median age. The variability in the inclusion criterion of age on the one hand and the contextual setting and population on the other hand could explain the differences in prevalence. Patients aged 65 and older are more predisposed to the development of delirium due to their age (Inouye 2006), and in intensive care units a markedly higher prevalence and incidence of delirium is observed than in regular wards (Inouye et al. 2014). However, depending on the specialisation of regular wards, different incidences of delirium may occur in relation to patient populations, with surgical wards having a higher incidence than medical wards (Inouye et al. 2014). Compared to the gold standard, the 4AT classified twice as many patients as delirious. The false-positive cases may generate more in-depth assessments, resulting in an increased burden for patients and nursing staff. Nurses are already challenged by the increased demand for care and the increased workload associated with delirious patients (Schubert et al. 2018), so further in-depth assessments of false-positive results might add to the already high workload. This may result in a lower acceptance of delirium screening tools with high false-positive rates.
Several studies claim that the 4AT is a brief and easy-to-use tool that requires no training (De and Wand 2015; MacLullich et al. 2011; Bellelli et al. 2014). Bellelli et al. (2014) pointed out that experienced physicians used the 4AT and that other professionals such as nurses should test this tool in everyday clinical practice (Bellelli et al. 2014). However, in a quality improvement project on an acute geriatric ward, the 4AT showed a low performance (sensitivity 50.0%, specificity 86.2%). In that project, the nurses were given a lecture with information about the tool and a two-week training course (Myrstad et al. 2019). It was noted that the sensitivity of the 4AT in daily clinical routine was low despite the short practice phase. The main reason for the low performance was the nurses' difficulty in rating acute onset and/or fluctuation (Myrstad et al. 2019). Therefore, in-depth training for staff prior to the introduction of the 4AT in the daily clinical routine was suggested (Myrstad et al. 2019). Similar findings were reported in another quality improvement project in a hospice, where the 4AT was applied to patients at admission. In spite of initial implementation challenges, the tool was seen as useful in this population. However, the study concluded that an increased use of the screening requires continuous feedback and training (Baird and Spiller 2017). This illustrates that even easy-to-use tools require training, both for their adequate application and for the detection of delirium, so that early intervention can be carried out and negative consequences avoided.

Strengths and limitations
A strength of this study is that the ratings of the mCAM-ED and the 4AT originate from the same person, so no inter-rater differences are present. Another strength is the consecutive sampling, which minimizes potential sampling bias. In accordance with the approach of Voyer et al. (2008), who used the CAM as a gold standard in an accuracy study for delirium symptoms based on the entries in the nurses' medical records (Voyer et al. 2008), our use of the mCAM-ED as a gold standard corresponds to current practice and is valid. The CAM was also used as a gold standard in two 4AT validation studies (Gagne et al. 2018; Lees et al. 2013). Certain shortcomings of this study must be acknowledged. Only 116 of 211 inpatients could be included in this pilot study. The young and less experienced RAs may have contributed to the high rejection rate. The RAs may have felt overwhelmed and may not have dared to contact patients' relatives to seek informed consent, so that potentially vulnerable patients were not included. In addition, many young people do not like talking on the phone (Kupke 2020), which may have been another inhibition threshold. Another limitation is the low prevalence of delirium at 6.9%. The low prevalence may result from a consecutive sample with a low median age of 73 years and the assessment of delirium on a point prevalence basis. Furthermore, the low prevalence of delirium may be due to the nurse-led hospital-integrated delirium consultation service and an established delirium management programme. Moreover, our sample is very heterogeneous, with an age range from 19 to 98 years, and also includes patients with a diagnosed neurocognitive disorder. Initially, we planned to conduct a sub-analysis with stratified age groups. However, the sample was too small to do so.

Conclusions
In this first comparison of delirium detection in inpatients, the 4AT showed a high rate of false-positive delirium assessments. This may result in an increased need for further in-depth assessments, which may contribute to a higher workload and a higher screening burden for patients. Although the 4AT claims to require no training, nurses need more support, as our study and other studies suggest.