Review article

https://doi.org/10.11613/BM.2023.010101

On determining the sensitivity and specificity of a new diagnostic test through comparing its results against a non-gold-standard test

Farrokh Habibzadeh (ORCID: orcid.org/0000-0001-5360-2900)



page 5-9



Abstract

Diagnostic tests are important clinical tools. To assess the sensitivity and specificity of a new test, its results should be compared against those of a gold-standard test. However, a gold-standard test is not always available. Herein, I show that the new test can be compared against a well-established diagnostic test (not a gold standard, but one with known sensitivity and specificity), and that the sensitivity and specificity the new test would have had if it had been compared against the gold-standard test can then be computed. The technique presented is useful in situations where the gold standard is not readily available.

Keywords

biostatistics; diagnostic tests; prevalence; sensitivity and specificity

Hrčak ID:

287709

URI

https://hrcak.srce.hr/287709

Publication date:

15.2.2023.





Introduction

Diagnostic tests are among the important tools commonly used in clinical medicine. Before a new test can be used in clinical practice, its clinical validity should be evaluated. Studies assessing the clinical validity of a test (also termed diagnostic accuracy studies) involve determining the test performance indices, including the test sensitivity (Se) and specificity (Sp) (1). Other common performance indices are the positive and negative predictive values and the likelihood ratios, which can be calculated from the Se, the Sp, and the prevalence (pr) of the disease of interest (2,3). To determine the performance of a test, its results should be evaluated against another test, the so-called reference standard (4). The reference standard can be a gold-standard test, i.e., a test with an Se and Sp of 1.0 (or 100%); a gold-standard test can thus correctly discriminate those with and without the disease or condition of interest. For a test with binary results, the outcome is clear: positive or negative. For tests with continuous results, however, we need to set a cut-off value to categorize the results as positive or negative (2). Compared to the gold standard, the results can be categorized into true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) results (Table 1a). The Se and Sp of the test are defined as follows (5):

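$$
\mathrm{Se} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, \qquad \mathrm{Sp} = \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}} \tag{1}
$$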
Table 1 Results of a hypothetical test validity study

a) T1 against the gold-standard test

| | Gold standard positive | Gold standard negative | Total |
|---|---|---|---|
| T1 positive | TP: 85 [π Se1] | FP: 40 [(1 – π)(1 – Sp1)] | 125 |
| T1 negative | FN: 15 [π (1 – Se1)] | TN: 360 [(1 – π) Sp1] | 375 |
| Total | 100 | 400 | 500 |

b) T2 against T1

| | T1 positive | T1 negative | Total |
|---|---|---|---|
| T2 positive | 107 [pr Se2,1] | 104 [(1 – pr)(1 – Sp2,1)] | 211 |
| T2 negative | 43 [pr (1 – Se2,1)] | 346 [(1 – pr) Sp2,1] | 389 |
| Total | 150 | 450 | 600 |

c) T2 against the gold-standard test

| | Gold standard positive | Gold standard negative | Total |
|---|---|---|---|
| T2 positive | 76 [π Se2] | 64 [(1 – π)(1 – Sp2)] | 140 |
| T2 negative | 4 [π (1 – Se2)] | 256 [(1 – π) Sp2] | 260 |
| Total | 80 | 320 | 400 |

a) A well-established test, T1, against the gold-standard test; b) a new test, T2, against T1 (note that here the true prevalence, π, is replaced by the apparent prevalence, pr (7), as T1 is not a gold standard); and c) another hypothetical study in which T2 is tested against the gold standard. Each cell shows the count and, in brackets, the expected proportion of the total sample falling in that cell. TP – true positive. FP – false positive. FN – false negative. TN – true negative. π – true prevalence. pr – apparent prevalence. Se – sensitivity. Sp – specificity.

Both Se and Sp are proportions whose underlying counts follow the binomial distribution. Hence, the squared standard errors (SE²) of Se and Sp are:

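$$
\mathrm{SE}^2(\mathrm{Se}) = \frac{\mathrm{Se}\,(1-\mathrm{Se})}{\mathrm{TP}+\mathrm{FN}}, \qquad \mathrm{SE}^2(\mathrm{Sp}) = \frac{\mathrm{Sp}\,(1-\mathrm{Sp})}{\mathrm{TN}+\mathrm{FP}} \tag{2}
$$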

The prevalence of the disease, π, is then:

$$
\pi = \frac{\mathrm{TP}+\mathrm{FN}}{\mathrm{TP}+\mathrm{FP}+\mathrm{FN}+\mathrm{TN}} \tag{3}
$$
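The quantities in Eqs. 1 to 3 can be computed directly from the four cell counts of a 2×2 table. The following is a minimal Python sketch; the function name `two_by_two_indices` is illustrative and not part of any library. Applied to Table 1a, it reproduces the Se1, Sp1, and SE² values used in the Example section.

```python
def two_by_two_indices(tp: int, fp: int, fn: int, tn: int):
    """Return Se, Sp, prevalence, SE^2(Se) and SE^2(Sp) of a 2x2 table (Eqs. 1-3)."""
    diseased = tp + fn                            # reference-positive column total
    disease_free = tn + fp                        # reference-negative column total
    se = tp / diseased                            # Eq. 1
    sp = tn / disease_free                        # Eq. 1
    prev = diseased / (diseased + disease_free)   # Eq. 3
    se_var = se * (1 - se) / diseased             # Eq. 2 (binomial variance)
    sp_var = sp * (1 - sp) / disease_free         # Eq. 2 (binomial variance)
    return se, sp, prev, se_var, sp_var


# Table 1a: T1 against the gold standard
se1, sp1, prev, se1_var, sp1_var = two_by_two_indices(tp=85, fp=40, fn=15, tn=360)
print(f"Se1 = {se1:.2f}, Sp1 = {sp1:.2f}, prevalence = {prev:.2f}")
print(f"SE^2(Se1) = {se1_var:.4g}, SE^2(Sp1) = {sp1_var:.4g}")
```

The printed squared SEs (0.001275 and 0.000225) round to the 1.3 × 10⁻³ and 2.3 × 10⁻⁴ quoted in the Example section.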

Combining Eq. 1 and Eq. 3, we have:

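$$
\begin{aligned}
P(\mathrm{TP}) &= \pi\,\mathrm{Se}, & P(\mathrm{FP}) &= (1-\pi)(1-\mathrm{Sp}),\\
P(\mathrm{FN}) &= \pi\,(1-\mathrm{Se}), & P(\mathrm{TN}) &= (1-\pi)\,\mathrm{Sp}
\end{aligned}
\tag{4}
$$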

where P(x) designates the probability of x.

To evaluate the Se and Sp of a new test, its results are commonly compared against those obtained from a gold-standard test. Nonetheless, a gold-standard test may not always be available: for certain diseases or conditions, it either does not exist or is very difficult or expensive to perform (6). The question that arises is whether it is possible to calculate the Se and Sp of the new test from the results of its comparison with an imperfect reference standard, that is, a well-established (but not a gold-standard) test. This is not a new question, and several solutions have been proposed so far (1). Herein, I propose an analytical method to address it.
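Read in reverse, Eq. 4 gives the expected number of people in each cell of a 2×2 table for a given π, Se, and Sp. This small Python sketch (the helper name `cell_probabilities` is hypothetical) regenerates the counts of Table 1a from π = 0.20, Se1 = 0.85, and Sp1 = 0.90.

```python
def cell_probabilities(prev: float, se: float, sp: float) -> dict:
    """Joint probabilities of the four cells of a 2x2 table (Eq. 4)."""
    return {
        "TP": prev * se,                # P(T+ and D+)
        "FP": (1 - prev) * (1 - sp),    # P(T+ and D-)
        "FN": prev * (1 - se),          # P(T- and D+)
        "TN": (1 - prev) * sp,          # P(T- and D-)
    }


# Expected counts for Table 1a (N = 500)
n = 500
expected = {cell: n * p for cell, p in cell_probabilities(0.20, 0.85, 0.90).items()}
print({cell: round(count) for cell, count in expected.items()})
# {'TP': 85, 'FP': 40, 'FN': 15, 'TN': 360}
```

The same function applied with π = 0.20, Se2 = 0.95, Sp2 = 0.80 and N = 400 gives the counts of Table 1c.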

Stating the question

Suppose that we have a well-established test, T1, with known Se and Sp (measured against a gold-standard test) of Se1 and Sp1, respectively (Table 1a). Now, suppose that we have a new test, T2, whose results were compared against T1 (not against a gold standard), yielding an Se and Sp (against T1) of Se2,1 and Sp2,1 (Table 1b). We wish to derive the Se and Sp that T2 would have had (Se2 and Sp2) had it been tested against the gold standard (e.g., Table 1c).

The proposed solution

When we compare T2 against T1, the calculated prevalence, pr, is not the true prevalence, π, because T1 is not a gold standard and thus yields FP and FN results. However, the true prevalence, π, can be calculated from pr, Se1, and Sp1 as follows (7):

$$
\pi = \frac{\mathrm{pr} + \mathrm{Sp}_1 - 1}{\mathrm{Se}_1 + \mathrm{Sp}_1 - 1} \tag{5}
$$
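Eq. 5 is a one-line correction. A minimal Python sketch follows, assuming the apparent prevalence and the Se and Sp of the reference test are given as proportions; the function name `true_prevalence` is illustrative. The inputs are those of the Example section (pr = 150/600 from Table 1b; Se1 and Sp1 from Table 1a).

```python
def true_prevalence(apparent_prev: float, se_ref: float, sp_ref: float) -> float:
    """Correct the apparent prevalence measured with an imperfect test (Eq. 5)."""
    youden = se_ref + sp_ref - 1          # denominator of Eq. 5
    if youden <= 0:
        raise ValueError("Se + Sp - 1 of the reference test must be positive")
    return (apparent_prev + sp_ref - 1) / youden


pi = true_prevalence(150 / 600, 0.85, 0.90)
print(f"{pi:.2f}")  # 0.20
```

With sample estimates, the corrected value can fall outside [0, 1] (for instance, when pr is smaller than 1 – Sp1), so it may need to be checked before being used further.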

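Based on Eq. 4 and basic probability rules, and assuming that the results of T1 and T2 are independent of each other within the diseased and within the disease-free groups (i.e., conditionally independent given the disease status), we have (Table 1) (8,9):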

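$$
\begin{aligned}
\mathrm{Se}_{2,1} = P(T_2^{+} \mid T_1^{+})
&= \frac{P(T_1^{+} \mid D^{+})\,P(T_2^{+} \mid D^{+})\,P(D^{+}) + P(T_1^{+} \mid D^{-})\,P(T_2^{+} \mid D^{-})\,P(D^{-})}{P(T_1^{+})}\\
&= \frac{\pi\,\mathrm{Se}_1\mathrm{Se}_2 + (1-\pi)(1-\mathrm{Sp}_1)(1-\mathrm{Sp}_2)}{\mathrm{pr}}
\end{aligned}
\tag{6}
$$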

and

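$$
\begin{aligned}
\mathrm{Sp}_{2,1} = P(T_2^{-} \mid T_1^{-})
&= \frac{P(T_1^{-} \mid D^{+})\,P(T_2^{-} \mid D^{+})\,P(D^{+}) + P(T_1^{-} \mid D^{-})\,P(T_2^{-} \mid D^{-})\,P(D^{-})}{P(T_1^{-})}\\
&= \frac{\pi\,(1-\mathrm{Se}_1)(1-\mathrm{Se}_2) + (1-\pi)\,\mathrm{Sp}_1\mathrm{Sp}_2}{1-\mathrm{pr}}
\end{aligned}
\tag{7}
$$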

where T+ and T− represent positive and negative results of the test indicated by the subscript; and D+ and D−, the presence and absence of the disease, respectively. P(A|B) denotes the conditional probability of event A given event B.
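Eqs. 6 and 7 can also be run forward: given π and the true Se and Sp of both tests, they predict what a comparison of T2 against T1 should show. The sketch below assumes, as the derivation does, that T1 and T2 are conditionally independent given the disease status; the function name `expected_agreement` is hypothetical. With the values behind Table 1a and Table 1c, it predicts Se2,1 ≈ 0.71 and Sp2,1 ≈ 0.77, in line with 107/150 and 346/450 in Table 1b.

```python
def expected_agreement(prev, se1, sp1, se2, sp2):
    """Se2,1 and Sp2,1 expected when T2 is read against T1 (Eqs. 6 and 7).

    Assumes T1 and T2 are conditionally independent given disease status.
    """
    pr = prev * se1 + (1 - prev) * (1 - sp1)                     # P(T1+), apparent prevalence
    both_pos = prev * se1 * se2 + (1 - prev) * (1 - sp1) * (1 - sp2)   # P(T1+ and T2+)
    both_neg = prev * (1 - se1) * (1 - se2) + (1 - prev) * sp1 * sp2   # P(T1- and T2-)
    return both_pos / pr, both_neg / (1 - pr)


se21, sp21 = expected_agreement(prev=0.20, se1=0.85, sp1=0.90, se2=0.95, sp2=0.80)
print(f"Se2,1 = {se21:.2f}, Sp2,1 = {sp21:.2f}")  # Se2,1 = 0.71, Sp2,1 = 0.77
```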

Based on Eq. 6, we have:

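$$
\mathrm{pr}\,\mathrm{Se}_{2,1} = \pi\,\mathrm{Se}_1\mathrm{Se}_2 + (1-\pi)(1-\mathrm{Sp}_1)(1-\mathrm{Sp}_2) \tag{8}
$$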

Solving for Se2 gives:

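$$
\mathrm{Se}_2 = \frac{\mathrm{pr}\,\mathrm{Se}_{2,1} - (1-\pi)(1-\mathrm{Sp}_1)(1-\mathrm{Sp}_2)}{\pi\,\mathrm{Se}_1} \tag{9}
$$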

Based on Eq. 7, we have:

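$$
(1-\mathrm{pr})\,\mathrm{Sp}_{2,1} = \pi\,(1-\mathrm{Se}_1)(1-\mathrm{Se}_2) + (1-\pi)\,\mathrm{Sp}_1\mathrm{Sp}_2 \tag{10}
$$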

Then:

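$$
\mathrm{Sp}_2 = \frac{(1-\mathrm{pr})\,\mathrm{Sp}_{2,1} - \pi\,(1-\mathrm{Se}_1)(1-\mathrm{Se}_2)}{(1-\pi)\,\mathrm{Sp}_1} \tag{11}
$$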
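For a fixed π, Eqs. 9 and 11 (equivalently, Eqs. 8 and 10) are linear in Se2 and Sp2, so they can be solved numerically as a 2×2 linear system before any closed form is derived. The Python sketch below uses Cramer's rule, a choice made here for brevity (any linear solver would do); the function name `solve_se2_sp2` is hypothetical, and the inputs are the Table 1b data with π from Eq. 5.

```python
def solve_se2_sp2(pi, pr, se21, sp21, se1, sp1):
    """Solve Eqs. 8 and 10, which are linear in Se2 and Sp2 once pi is fixed."""
    a = pi * se1                  # coefficient of Se2 in Eq. 8
    b = (1 - pi) * (1 - sp1)      # weight of (1 - Sp2) in Eq. 8
    c = pi * (1 - se1)            # weight of (1 - Se2) in Eq. 10
    d = (1 - pi) * sp1            # coefficient of Sp2 in Eq. 10
    # Rearranged:   a*Se2 - b*Sp2 = pr*Se2,1 - b
    #              -c*Se2 + d*Sp2 = (1 - pr)*Sp2,1 - c
    r1 = pr * se21 - b
    r2 = (1 - pr) * sp21 - c
    det = a * d - b * c           # equals pi*(1 - pi)*(Se1 + Sp1 - 1)
    if abs(det) < 1e-12:
        raise ValueError("singular system: is Se1 + Sp1 - 1 (or pi) zero?")
    se2 = (r1 * d + b * r2) / det     # Cramer's rule
    sp2 = (a * r2 + c * r1) / det
    return se2, sp2


pr, se21, sp21, se1, sp1 = 150 / 600, 107 / 150, 346 / 450, 0.85, 0.90
pi = (pr + sp1 - 1) / (se1 + sp1 - 1)     # Eq. 5
se2, sp2 = solve_se2_sp2(pi, pr, se21, sp21, se1, sp1)
print(f"Se2 = {se2:.2f}, Sp2 = {sp2:.2f}")  # Se2 = 0.95, Sp2 = 0.80
```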

Equations 9 and 11 form a system of two simultaneous equations. Substituting π from Eq. 5 and solving for Se2 and Sp2 yields:

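$$
\begin{aligned}
\mathrm{Se}_2 &= \frac{\mathrm{pr}\,\mathrm{Se}_{2,1}\mathrm{Sp}_1 - (1-\mathrm{pr})(1-\mathrm{Sp}_1)(1-\mathrm{Sp}_{2,1})}{\mathrm{pr} + \mathrm{Sp}_1 - 1},\\
\mathrm{Sp}_2 &= \frac{(1-\mathrm{pr})\,\mathrm{Se}_1\mathrm{Sp}_{2,1} - \mathrm{pr}\,(1-\mathrm{Se}_1)(1-\mathrm{Se}_{2,1})}{\mathrm{Se}_1 - \mathrm{pr}}
\end{aligned}
\tag{12}
$$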
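Eq. 12 transcribed into Python, with a check that the result satisfies Eqs. 8 and 10 when π is taken from Eq. 5. The function name `corrected_se_sp` is illustrative, and the inputs are those of the Example section (Table 1a and Table 1b).

```python
def corrected_se_sp(pr, se21, sp21, se1, sp1):
    """Se2 and Sp2 of the new test as if read against the gold standard (Eq. 12)."""
    se2 = (pr * se21 * sp1 - (1 - pr) * (1 - sp1) * (1 - sp21)) / (pr + sp1 - 1)
    sp2 = ((1 - pr) * se1 * sp21 - pr * (1 - se1) * (1 - se21)) / (se1 - pr)
    return se2, sp2


pr, se21, sp21, se1, sp1 = 150 / 600, 107 / 150, 346 / 450, 0.85, 0.90
se2, sp2 = corrected_se_sp(pr, se21, sp21, se1, sp1)
print(f"Se2 = {se2:.4f}, Sp2 = {sp2:.4f}")  # Se2 = 0.9544, Sp2 = 0.7990

# Sanity check: the solution should satisfy Eqs. 8 and 10 with pi from Eq. 5
pi = (pr + sp1 - 1) / (se1 + sp1 - 1)
eq8_residual = pi * se1 * se2 + (1 - pi) * (1 - sp1) * (1 - sp2) - pr * se21
eq10_residual = pi * (1 - se1) * (1 - se2) + (1 - pi) * sp1 * sp2 - (1 - pr) * sp21
print(f"residuals: {eq8_residual:.1e}, {eq10_residual:.1e}")
```

Note that Se2 depends only on pr, Se2,1, Sp2,1, and Sp1, and Sp2 only on pr, Se2,1, Sp2,1, and Se1; this is what the variance calculation below relies on.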

If f is a function of k independent random variables, x1, x2, …, xk, then the squared SE of f can be approximated by a first-order Taylor expansion as (10,11):

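$$
\mathrm{SE}^2(f) \approx \sum_{i=1}^{k} \left(\frac{\partial f}{\partial x_i}\right)^{2} \mathrm{SE}^2(x_i) \tag{13}
$$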
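Eq. 13 is the first-order Taylor (delta-method) approximation. As a sketch, the partial derivatives can be approximated by central finite differences instead of being derived by hand; this numerical shortcut, the function name `delta_method_variance`, and the step size `h` are assumptions of this example rather than part of the method described here. The block checks the function on f(a, b) = ab, whose delta-method variance b²SE²(a) + a²SE²(b) is known.

```python
from typing import Callable, Sequence


def delta_method_variance(f: Callable[..., float], x: Sequence[float],
                          variances: Sequence[float], h: float = 1e-6) -> float:
    """First-order approximation of SE^2(f) for independent inputs (Eq. 13).

    Each partial derivative is estimated by a central finite difference.
    """
    total = 0.0
    for i, var_i in enumerate(variances):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        dfdx = (f(*up) - f(*down)) / (2 * h)   # approximate df/dx_i
        total += dfdx ** 2 * var_i
    return total


# f(a, b) = a*b at a = 2, b = 3: 3**2 * 0.01 + 2**2 * 0.04 = 0.25
var_ab = delta_method_variance(lambda a, b: a * b, [2.0, 3.0], [0.01, 0.04])
print(round(var_ab, 6))  # 0.25
```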

Assuming that Se2 is a function of independent random variables pr, Se2,1, Sp2,1, and Sp1 (Eq. 12), using Eq. 13 and employing basic calculus, we have:

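$$
\begin{aligned}
\mathrm{SE}^2(\mathrm{Se}_2) ={}& \left[\frac{\mathrm{Sp}_1(1-\mathrm{Sp}_1)\,J}{(\mathrm{pr}+\mathrm{Sp}_1-1)^2}\right]^2 \mathrm{SE}^2(\mathrm{pr})
+ \left[\frac{\mathrm{pr}\,\mathrm{Sp}_1}{\mathrm{pr}+\mathrm{Sp}_1-1}\right]^2 \mathrm{SE}^2(\mathrm{Se}_{2,1})\\
&+ \left[\frac{(1-\mathrm{pr})(1-\mathrm{Sp}_1)}{\mathrm{pr}+\mathrm{Sp}_1-1}\right]^2 \mathrm{SE}^2(\mathrm{Sp}_{2,1})
+ \left[\frac{\mathrm{pr}\,(1-\mathrm{pr})\,J}{(\mathrm{pr}+\mathrm{Sp}_1-1)^2}\right]^2 \mathrm{SE}^2(\mathrm{Sp}_1),\\
&\text{where } J = \mathrm{Se}_{2,1} + \mathrm{Sp}_{2,1} - 1
\end{aligned}
\tag{14}
$$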

In the same way, assuming that Sp2 is a function of independent random variables pr, Se2,1, Sp2,1, and Se1 (Eq. 12), we have:

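$$
\begin{aligned}
\mathrm{SE}^2(\mathrm{Sp}_2) ={}& \left[\frac{\mathrm{Se}_1(1-\mathrm{Se}_1)\,J}{(\mathrm{Se}_1-\mathrm{pr})^2}\right]^2 \mathrm{SE}^2(\mathrm{pr})
+ \left[\frac{\mathrm{pr}\,(1-\mathrm{Se}_1)}{\mathrm{Se}_1-\mathrm{pr}}\right]^2 \mathrm{SE}^2(\mathrm{Se}_{2,1})\\
&+ \left[\frac{(1-\mathrm{pr})\,\mathrm{Se}_1}{\mathrm{Se}_1-\mathrm{pr}}\right]^2 \mathrm{SE}^2(\mathrm{Sp}_{2,1})
+ \left[\frac{\mathrm{pr}\,(1-\mathrm{pr})\,J}{(\mathrm{Se}_1-\mathrm{pr})^2}\right]^2 \mathrm{SE}^2(\mathrm{Se}_1),\\
&\text{where } J = \mathrm{Se}_{2,1} + \mathrm{Sp}_{2,1} - 1
\end{aligned}
\tag{15}
$$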

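The squared SEs of the Se and Sp of the tests (Se1, Sp1, Se2,1, and Sp2,1) can be calculated using Eq. 2; that of pr is calculated in the same way, as pr(1 – pr)/N, where N is the total number of people tested with both T1 and T2.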
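Eqs. 14 and 15, fed with the binomial variances of Eq. 2 (and pr(1 – pr)/N for the apparent prevalence), can be evaluated as follows. This is a minimal sketch using the data of the Example section; the function name `se2_sp2_variances` is hypothetical.

```python
def se2_sp2_variances(pr, se21, sp21, se1, sp1, v_pr, v_se21, v_sp21, v_se1, v_sp1):
    """Squared SEs of Se2 and Sp2 from Eqs. 14 and 15 (first-order delta method)."""
    j = se21 + sp21 - 1          # J = Se2,1 + Sp2,1 - 1
    d_se = pr + sp1 - 1          # denominator of Se2 in Eq. 12
    d_sp = se1 - pr              # denominator of Sp2 in Eq. 12
    v_se2 = ((sp1 * (1 - sp1) * j / d_se**2) ** 2 * v_pr        # pr term
             + (pr * sp1 / d_se) ** 2 * v_se21                  # Se2,1 term
             + ((1 - pr) * (1 - sp1) / d_se) ** 2 * v_sp21      # Sp2,1 term
             + (pr * (1 - pr) * j / d_se**2) ** 2 * v_sp1)      # Sp1 term
    v_sp2 = ((se1 * (1 - se1) * j / d_sp**2) ** 2 * v_pr        # pr term
             + (pr * (1 - se1) / d_sp) ** 2 * v_se21            # Se2,1 term
             + ((1 - pr) * se1 / d_sp) ** 2 * v_sp21            # Sp2,1 term
             + (pr * (1 - pr) * j / d_sp**2) ** 2 * v_se1)      # Se1 term
    return v_se2, v_sp2


# Inputs from Table 1a (T1 vs gold standard) and Table 1b (T2 vs T1)
pr, se21, sp21, se1, sp1 = 150 / 600, 107 / 150, 346 / 450, 0.85, 0.90
v_se2, v_sp2 = se2_sp2_variances(
    pr, se21, sp21, se1, sp1,
    v_pr=pr * (1 - pr) / 600,
    v_se21=se21 * (1 - se21) / 150,
    v_sp21=sp21 * (1 - sp21) / 450,
    v_se1=se1 * (1 - se1) / 100,
    v_sp1=sp1 * (1 - sp1) / 400,
)
print(f"SE^2(Se2) = {v_se2:.1e}, SE^2(Sp2) = {v_sp2:.1e}")  # SE^2(Se2) = 8.0e-03, SE^2(Sp2) = 5.4e-04
```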

Discussion

It was shown that the Se and Sp of a test can be determined with acceptable accuracy even if a gold-standard test is not available. The Se and Sp of the new test (T2), derived by transforming the values obtained from its comparison with a non-gold-standard test (Se2,1 and Sp2,1), are acceptably close to the values that would have been obtained had the test been compared with the gold standard (Se2 and Sp2). The variances of the calculated Se2 and Sp2 (Eqs. 14 and 15) are, however, higher than those obtained had T2 been compared directly against the gold standard instead of T1. This is attributed to the uncertainty in the variables used in the calculation (Eq. 12). To illustrate the proposed technique, let us apply it to an example.
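The claim that the indirect route inflates the variance can be checked with a small Monte Carlo simulation. The sketch below is not part of the method described here; it assumes the true values implied by Table 1 (π = 0.20, Se1 = 0.85, Sp1 = 0.90, Se2 = 0.95, Sp2 = 0.80), conditional independence of T1 and T2 given disease status, random (rather than fixed) numbers of diseased people, and the sample sizes of Table 1 (500, 600, and 400). It repeatedly simulates the three studies and compares the spread of Se2 estimated through T1 (Eq. 12) with that of Se2 estimated directly.

```python
import random
from statistics import pvariance

# Hypothetical true values, chosen to match Table 1
PI, SE1, SP1, SE2, SP2 = 0.20, 0.85, 0.90, 0.95, 0.80


def simulate_result(diseased: bool, se: float, sp: float, rng: random.Random) -> bool:
    """One binary test result drawn given the true disease status."""
    return rng.random() < (se if diseased else 1 - sp)


def one_replicate(rng: random.Random):
    """Simulate the three studies of Table 1 once; return (indirect, direct) Se2."""
    # Study A (n = 500): T1 against the gold standard; only Sp1 enters Se2 (Eq. 12)
    dis = [rng.random() < PI for _ in range(500)]
    t1 = [simulate_result(d, SE1, SP1, rng) for d in dis]
    sp1 = sum(not t and not d for t, d in zip(t1, dis)) / (len(dis) - sum(dis))

    # Study B (n = 600): T2 against T1, independent given disease status
    dis = [rng.random() < PI for _ in range(600)]
    t1 = [simulate_result(d, SE1, SP1, rng) for d in dis]
    t2 = [simulate_result(d, SE2, SP2, rng) for d in dis]
    n_t1_pos = sum(t1)
    pr = n_t1_pos / len(dis)
    se21 = sum(a and b for a, b in zip(t1, t2)) / n_t1_pos
    sp21 = sum(not a and not b for a, b in zip(t1, t2)) / (len(dis) - n_t1_pos)
    indirect = (pr * se21 * sp1 - (1 - pr) * (1 - sp1) * (1 - sp21)) / (pr + sp1 - 1)

    # Study C (n = 400): T2 directly against the gold standard
    dis = [rng.random() < PI for _ in range(400)]
    t2 = [simulate_result(d, SE2, SP2, rng) for d in dis]
    direct = sum(t and d for t, d in zip(t2, dis)) / sum(dis)
    return indirect, direct


rng = random.Random(2023)
draws = [one_replicate(rng) for _ in range(500)]
var_indirect = pvariance([ind for ind, _ in draws])
var_direct = pvariance([drc for _, drc in draws])
print(f"Var(Se2) via T1: {var_indirect:.1e}; directly: {var_direct:.1e}")
```

The simulated variances should be roughly of the same order as the delta-method values in the Example (about 8 × 10⁻³ through T1 versus about 6 × 10⁻⁴ directly); the exact figures depend on the seed and the number of replicates.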

Example

Suppose that in a validity study of 500 (arbitrarily chosen) randomly selected people, a diagnostic test (let us call it T1) was tested against the gold standard (Table 1a), and that the test correctly identified 85 of 100 diseased people, hence an Se (Se1) of 0.85, and 360 of 400 disease-free individuals, hence an Sp (Sp1) of 0.90 (Table 1a). The calculated SE² values for Se1 and Sp1 are 1.3 × 10⁻³ and 2.3 × 10⁻⁴, respectively (Eq. 2). Also, suppose that in a validity study of 600 (arbitrarily chosen) randomly selected people, the results of a new diagnostic test, T2, were compared against T1 (Table 1b). Based on the information provided, the apparent prevalence, pr, is 150/600 = 0.25 (SE² = 3.1 × 10⁻⁴). Using Eq. 5, the true prevalence, π, is:

$$
\pi = \frac{0.25 + 0.90 - 1}{0.85 + 0.90 - 1} = \frac{0.15}{0.75} = 0.20 \tag{16}
$$

which equals the disease prevalence measured with the gold-standard test (100/500 = 0.20; Table 1a). The Se and Sp of T2 against T1 (Table 1b), along with their SE², are then:

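$$
\mathrm{Se}_{2,1} = \frac{107}{150} = 0.71\ \left(\mathrm{SE}^2 = 1.4 \times 10^{-3}\right), \qquad \mathrm{Sp}_{2,1} = \frac{346}{450} = 0.77\ \left(\mathrm{SE}^2 = 3.9 \times 10^{-4}\right) \tag{17}
$$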

Plugging these values into Eqs. 12, 14, and 15 gives estimates of Se2 and Sp2 of 0.95 (SE² = 8.0 × 10⁻³; 95% confidence interval (CI): 0.77 to 1.00) and 0.80 (SE² = 5.4 × 10⁻⁴; 95% CI: 0.75 to 0.85), respectively. These are compatible with the results that would have been obtained had T2 been compared against the gold-standard test: 0.95 (SE² = 5.9 × 10⁻⁴; 95% CI: 0.90 to 1.00) and 0.80 (SE² = 5.0 × 10⁻⁴; 95% CI: 0.76 to 0.84), respectively (Table 1c). Note that the 95% CIs of Se2 and Sp2 derived through comparison with T1 are wider than those obtained through direct comparison against a gold-standard test.
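The whole calculation of this example, from the raw counts of Table 1a and Table 1b to the 95% CIs, can be sketched in Python as below. Two choices are assumptions of the sketch rather than part of the described method: the partial derivatives in Eq. 13 are obtained by central finite differences instead of from Eqs. 14 and 15, and the 95% CIs are normal-approximation (Wald) intervals truncated to [0, 1]. All function names are illustrative.

```python
from math import sqrt


def se2_eq12(pr, se21, sp21, sp1):
    """Se2 from Eq. 12."""
    return (pr * se21 * sp1 - (1 - pr) * (1 - sp1) * (1 - sp21)) / (pr + sp1 - 1)


def sp2_eq12(pr, se21, sp21, se1):
    """Sp2 from Eq. 12."""
    return ((1 - pr) * se1 * sp21 - pr * (1 - se1) * (1 - se21)) / (se1 - pr)


def delta_variance(f, x, variances, h=1e-6):
    """Eq. 13 with central finite-difference partial derivatives."""
    total = 0.0
    for i, v in enumerate(variances):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        total += ((f(*up) - f(*down)) / (2 * h)) ** 2 * v
    return total


def prop(k, n):
    """Binomial proportion k/n and its squared SE (Eq. 2)."""
    p = k / n
    return p, p * (1 - p) / n


se1, v_se1 = prop(85, 100)       # Table 1a
sp1, v_sp1 = prop(360, 400)
pr, v_pr = prop(150, 600)        # Table 1b
se21, v_se21 = prop(107, 150)
sp21, v_sp21 = prop(346, 450)

estimates = {
    "Se2": (se2_eq12, [pr, se21, sp21, sp1], [v_pr, v_se21, v_sp21, v_sp1]),
    "Sp2": (sp2_eq12, [pr, se21, sp21, se1], [v_pr, v_se21, v_sp21, v_se1]),
}
results = {}
for name, (f, x, v) in estimates.items():
    p, var = f(*x), delta_variance(f, x, v)
    half = 1.96 * sqrt(var)                             # Wald 95% CI half-width
    lo, hi = max(0.0, p - half), min(1.0, p + half)     # truncated to [0, 1]
    results[name] = (p, var, lo, hi)
    print(f"{name} = {p:.2f} (SE^2 = {var:.1e}; 95% CI: {lo:.2f} to {hi:.2f})")
```

Run with unrounded intermediate values, the sketch prints a lower limit of 0.78 for Se2 and an upper limit of 0.84 for Sp2; the limits reported above (0.77 and 0.85) are reproduced when the rounded estimates and SE² values (0.95 and 8.0 × 10⁻³; 0.80 and 5.4 × 10⁻⁴) are used instead.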

In conclusion, this technique appears useful, particularly where the gold-standard test is not readily available or is expensive. Further studies are needed to elaborate on, among other things, the conditions of the validity study in which Se1 and Sp1 are estimated, the minimum number of data points required, and the probable effect of the prevalence of the disease or condition of interest on the choice of the reference test.

Notes

Potential conflict of interest

None declared.

References

1 

Umemneku Chikere CM, Wilson K, Graziadio S, Vale L, Allen AJ. Diagnostic test evaluation methodology: A systematic review of methods employed to evaluate diagnostic tests in the absence of gold standard - An update. PLoS One. 2019;14:e0223832. https://doi.org/10.1371/journal.pone.0223832 PubMed: http://www.ncbi.nlm.nih.gov/pubmed/31603953

2 

Habibzadeh F, Habibzadeh P, Yadollahie M. On determining the most appropriate test cut-off value: the case of tests with continuous results. Biochem Med (Zagreb). 2016;26:297–307. https://doi.org/10.11613/BM.2016.034 PubMed: http://www.ncbi.nlm.nih.gov/pubmed/27812299

3 

Habibzadeh F, Habibzadeh P. The likelihood ratio and its graphical representation. Biochem Med (Zagreb). 2019;29:020101. https://doi.org/10.11613/BM.2019.020101 PubMed: http://www.ncbi.nlm.nih.gov/pubmed/31015780

4 

Kohn MA, Carpenter CR, Newman TB. Understanding the direction of bias in studies of diagnostic test accuracy. Acad Emerg Med. 2013;20:1194–206. https://doi.org/10.1111/acem.12255 PubMed: http://www.ncbi.nlm.nih.gov/pubmed/24238322

5 

Altman DG, Bland JM. Diagnostic tests. 1: Sensitivity and specificity. BMJ. 1994;308:1552. https://doi.org/10.1136/bmj.308.6943.1552 PubMed: http://www.ncbi.nlm.nih.gov/pubmed/8019315

6 

Newman TB, Kohn MA, editors. Evidence-Based Diagnosis. 2nd ed. New York: Cambridge University Press; 2009.

7 

Habibzadeh F, Habibzadeh P, Yadollahie M. The apparent prevalence, the true prevalence. Biochem Med (Zagreb). 2022;32:020101. https://doi.org/10.11613/BM.2022.020101 PubMed: http://www.ncbi.nlm.nih.gov/pubmed/35799992

8 

Enøe C, Georgiadis MP, Johnson WO. Estimation of sensitivity and specificity of diagnostic tests and disease prevalence when the true disease state is unknown. Prev Vet Med. 2000;45:61–81. https://doi.org/10.1016/S0167-5877(00)00117-3 PubMed: http://www.ncbi.nlm.nih.gov/pubmed/10802334

9 

Toft N, Jørgensen E, Højsgaard S. Diagnosing diagnostic tests: evaluating the assumptions underlying the estimation of sensitivity and specificity in the absence of a gold standard. Prev Vet Med. 2005;68:19–33. https://doi.org/10.1016/j.prevetmed.2005.01.006 PubMed: http://www.ncbi.nlm.nih.gov/pubmed/15795013

10 

Champac V, Gervacio JG. Appendix A: Variance of a Function of Random Variables Approximated with Taylor’s Theorem. In: Champac V, Gervacio JG, eds. Timing Performance of Nanometer Digital Circuits Under Process Variations: Springer; 2018.

11 

Topping J, editor. Errors of Observation and their Treatment: Springer Science & Business Media; 2012.

