## Multireader multicase variance analysis for binary data

JOSA A, Vol. 24, Issue 12, pp. B70-B80 (2007)

http://dx.doi.org/10.1364/JOSAA.24.000B70

### Abstract

Multireader multicase (MRMC) variance analysis has become widely utilized to analyze observer studies for which the summary measure is the area under the receiver operating characteristic (ROC) curve. We extend MRMC variance analysis to binary data and also to generic study designs in which every reader may not interpret every case. A subset of the fundamental moments central to MRMC variance analysis of the area under the ROC curve (AUC) is found to be required. Through multiple simulation configurations, we compare our unbiased variance estimates to naïve estimates across a range of study designs, average percent correct, and numbers of readers and cases.

© 2007 Optical Society of America

## 1. INTRODUCTION

*M*-alternative forced-choice (MAFC) experiment. Sensitivity is the percent of abnormals correctly identified, and specificity is the percent of normals correctly identified. We shall also refer to the abnormals as the signal-present cases (hypothesis 1,

*M*-alternatives within a trial contains the signal. So, in the typical two-alternative forced-choice (2AFC) task a trial is often a pair of images, one signal-absent and one signal-present, displayed side by side or in sequence. The outcome of the choice is binary; the reader is either right or wrong. The rate at which the reader correctly picks the alternative with the signal is the PC.

*g* specifies the case and *γ* specifies the reader. This success outcome is 0 when reader *r* incorrectly identifies case *g* and 1 when the reader is successful.
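The binary success outcomes described above can be turned into a percent correct (PC) per reader by simple averaging. The following is a minimal sketch, assuming a fully crossed toy study; the data and the names `success` and `percent_correct` are hypothetical and not taken from the paper.

```python
# Hypothetical success outcomes: one row per reader, one column per case.
# 1 = the reader identified the case correctly, 0 = incorrectly.
success = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]


def percent_correct(outcomes):
    """Return the fraction of binary success outcomes equal to 1."""
    return sum(outcomes) / len(outcomes)


# PC of each reader, then the average over readers.
reader_pc = [percent_correct(row) for row in success]
average_pc = sum(reader_pc) / len(reader_pc)

print(reader_pc)
print(average_pc)
```

In this toy example every reader interprets every case, so averaging over readers and averaging over all readings give the same number.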

## 2. THEORY AND METHODS

### 2A. Setup

*D* and a success matrix *S*. Both matrices are

*i* stands for the

*r* for the

*D* is specified before data are collected; for a random study design, there is a protocol, or sampling scheme, that determines a distribution for the possible study designs.
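To make the roles of the two matrices concrete, here is a small sketch of a fixed study design in which not every reader interprets every case. It assumes the convention that a design entry is 1 when the reader interprets the case and 0 otherwise, and that every reader interprets at least one case; the data and function names are hypothetical. It contrasts weights that treat each reader equally with weights that treat each reading equally.

```python
# Assumed convention: design[r][i] = 1 if reader r interprets case i, else 0.
design = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
]
# success[r][i] = 1 if reader r was correct on case i.
# Entries where design[r][i] == 0 are placeholders and are ignored.
success = [
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
]


def reader_averaged_pc(design, success):
    """Average the per-reader PCs, weighting each reader equally."""
    per_reader = []
    for d_row, s_row in zip(design, success):
        n_read = sum(d_row)
        n_right = sum(d * s for d, s in zip(d_row, s_row))
        per_reader.append(n_right / n_read)
    return sum(per_reader) / len(per_reader)


def reading_averaged_pc(design, success):
    """Pool all readings, weighting each reading equally."""
    n_read = sum(sum(d_row) for d_row in design)
    n_right = sum(
        d * s
        for d_row, s_row in zip(design, success)
        for d, s in zip(d_row, s_row)
    )
    return n_right / n_read


print(reader_averaged_pc(design, success))
print(reading_averaged_pc(design, success))
```

Because the readers here interpret different numbers of cases, the two weightings give different values (8/9 versus 7/8). For a fully crossed design they coincide.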

### 2B. Population Quantities

#### 2B1. Fixed Study Designs

*D* is straightforward:

Note that we use brackets ⟨…⟩ to denote expected values over all random variables, and the notation

*D* and average over the remaining random quantities: the readers and cases. The expected reader-averaged PC, as is shown above, has no dependence on the study design or the reader weights.

*γ* or *g* when needed to indicate weights treating each reader equally or weights treating each reading equally. The moments themselves are nothing more than second moments (

*γ* and case *g*.

#### 2B2. Special Cases and Random Study Designs

### 2C. Variance Estimates

#### 2C1. Fixed Study Design

*γ* or

*g* to

*r* and

*r*. We also normalize the weights in the sum over

#### 2C2. Random Study Design

#### 2C3. Naïve Estimates

### 2D. Simulation

#### 2D1. Model

*Φ* is the cumulative distribution function (cdf) of the standard normal. Furthermore, since the only randomness in this last expression comes from the currently fixed reader effects

*τ* go to infinity. The second option starts over, eliminating the condition on

*r* in Eq. (20). Noticing that

#### 2D2. Simulation Configurations

## 3. SIMULATION RESULTS AND DISCUSSION

*Expected Variance*. Before we assess our estimates, it is worthwhile to show the variances expected from all the experiments. Figure 3 shows the population variances for all the high-PC (0.96) simulation configurations compared to the expected values (from MC averaging) of the naïve variance estimates. Our moment estimators are unbiased, so their expected values equal the population variances. At the bottom of each column of plots, the *x* axis is labeled according to the size of the simulated experiment. The 27 different components-of-variance configurations are then explored within each experiment size according to the reader component of variance

*Root-mean-square error*. Here we assess the variance estimators with the relative root-mean-square error (RRMSE), or

$$\mathrm{RRMSE} = \frac{1}{V}\left( \left(\langle \hat{V} \rangle - V\right)^{2} + \mathrm{var}\!\left(\hat{V}\right) \right)^{1/2},$$

where $V$ is the population variance being estimated and $\hat{V}$ is its estimate. The first term in the parentheses is the squared bias of the variance estimate and the second term is the variance of the variance estimate; the term in front of the parentheses scales the RMSE to the truth. Thus, the scale of RRMSE can be interpreted as the total error given as a fraction of what we are trying to estimate. Of course, since our moment estimator is unbiased, the RRMSE can also be interpreted as just the standard deviation of our estimator relative to what is being estimated.
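As an illustration of the RRMSE described above, the sketch below computes it from a set of Monte Carlo variance estimates and a known population variance. The function name `rrmse` and the numbers in the usage lines are hypothetical; the squared bias and the variance of the estimates are both taken over the Monte Carlo replicates, using a divisor of `n` so that their sum equals the mean squared error.

```python
import math


def rrmse(variance_estimates, true_variance):
    """Relative root-mean-square error of a set of variance estimates.

    Combines the squared bias and the variance of the estimates, takes the
    square root, and scales the result by the true (population) variance.
    """
    n = len(variance_estimates)
    mean_estimate = sum(variance_estimates) / n
    squared_bias = (mean_estimate - true_variance) ** 2
    # Divisor n (not n - 1): squared_bias + spread is then exactly the MSE.
    spread = sum((v - mean_estimate) ** 2 for v in variance_estimates) / n
    return math.sqrt(squared_bias + spread) / true_variance


# Unbiased on average: only the spread of the estimates contributes.
unbiased_case = rrmse([0.8, 1.2], 1.0)
# No spread, but every estimate is twice the truth: only bias contributes.
biased_case = rrmse([2.0, 2.0], 1.0)

print(unbiased_case, biased_case)
```

The two usage cases separate the contributions: the first has zero bias, so the RRMSE is the relative standard deviation of the estimates, while the second has zero spread, so the RRMSE is the relative bias.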

*x* axis is labeled according to the size of the simulated experiment, while the different variance configurations are explored within each experiment size, sorted by the reader component of variance

## 4. CONCLUSIONS AND FUTURE WORK

## APPENDIX A: SECOND-MOMENT, FIXED STUDY DESIGN

*r*,

*i*,

*r*,

*i*,

*i* equal

*i* of

*i* and

## APPENDIX B: COMPONENTS OF VARIANCE

*γ* averaged over all cases in the population, or

We shall denote the mean and variance of this distribution, respectively, as

These two quantities arise naturally from the variance obtained when estimating the reader-specific performance of a random reader *γ* reading a random set of

*g* averaged over all readers in the population, or

As with reader skill, we have a mean and variance of the case difficulty,

*G* denotes a set of cases;

*et al.*[15

**OCIS Codes**

(000.5490) General : Probability theory, stochastic processes, and statistics

(110.3000) Imaging systems : Image quality assessment

(330.5510) Vision, color, and visual optics : Psychophysics

**History**

Original Manuscript: April 13, 2007

Revised Manuscript: July 7, 2007

Manuscript Accepted: August 2, 2007

Published: September 28, 2007

**Virtual Issues**

Vol. 3, Iss. 1 *Virtual Journal for Biomedical Optics*

**Citation**

Brandon D. Gallas, Gene A. Pennello, and Kyle J. Myers, "Multireader multicase variance analysis for binary data," J. Opt. Soc. Am. A **24**, B70-B80 (2007)

http://www.opticsinfobase.org/josaa/abstract.cfm?URI=josaa-24-12-B70

### References

- D. D. Dorfman, K. S. Berbaum, and C. E. Metz, "Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method," Invest. Radiol. 27, 723-731 (1992). [CrossRef] [PubMed]
- S. V. Beiden, R. F. Wagner, and G. Campbell, "Components-of-variance models and multiple-bootstrap experiments: an alternative method for random-effects, receiver operating characteristic analysis," Acad. Radiol. 7, 341-349 (2000). [CrossRef] [PubMed]
- N. A. Obuchowski, S. V. Beiden, K. S. Berbaum, S. L. Hillis, H. Ishwaran, H. H. Song, and R. F. Wagner, "Multireader, multicase receiver operating characteristic analysis: an empirical comparison of five methods," Acad. Radiol. 11, 980-995 (2004). [CrossRef] [PubMed]
- B. D. Gallas, "One-shot estimate of MRMC variance: AUC," Acad. Radiol. 13, 353-362 (2006). [CrossRef] [PubMed]
- B. D. Gallas and D. G. Brown, "Reader studies for validation of CAD systems," submitted to Neural Networks.
- C. A. Roe and C. E. Metz, "Dorfman-Berbaum-Metz method for statistical analysis of multireader, multimodality receiver operating characteristic (ROC) data: validation with computer simulation," Acad. Radiol. 4, 298-303 (1997). [CrossRef] [PubMed]
- M. Schiffman and M. E. Adrianza, "ASCUS-LSIL triage study: design, methods and characteristics of trial participants," Acta Cytol. 44, 726-742 (2000). [CrossRef] [PubMed]
- J. Jeronimo, L. S. Massad, and M. Schiffman, "Visual appearance of the uterine cervix: correlation with human papillomavirus detection and type," Am. J. Obstet. Gynecol. 197, 47.e1-47.e8 (2007). [CrossRef]
- S. L. Hillis and K. S. Berbaum, "Monte Carlo validation of the Dorfman-Berbaum-Metz method using normalized pseudovalues and less data-based model simplification," Acad. Radiol. 12, 1534-1541 (2005). [CrossRef] [PubMed]
- S. L. Hillis, N. A. Obuchowski, K. M. Schartz, and K. S. Berbaum, "A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette methods for receiver operating characteristic (ROC) data," Stat. Med. 24, 1579-1607 (2005). [CrossRef] [PubMed]
- X. Song and X.-H. Zhou, "A marginal model approach for analysis of multi-reader multi-test receiver operating characteristic (ROC) data," Biostatistics 6, 303-312 (2005). [CrossRef] [PubMed]
- W. A. Yousef, R. F. Wagner, and M. H. Loew, "Assessing classifiers from two independent data sets using ROC analysis: a nonparametric approach," IEEE Trans. Pattern Anal. Mach. Intell. 28, 1809-1817 (2006). [CrossRef] [PubMed]
- M. S. Pepe, The Statistical Evaluation of Medical Tests for Classification and Prediction (Oxford U. Press, 2003).
- C. A. Roe and C. E. Metz, "Variance-component modeling in the analysis of receiver operating characteristic (ROC) index estimates," Acad. Radiol. 4, 587-600 (1997). [CrossRef] [PubMed]
- H. H. Barrett, M. A. Kupinski, and E. Clarkson, "Probabilistic Foundations of the MRMC Method," Proc. SPIE 5749, 21-31 (2005). [CrossRef]
- E. Clarkson, M. A. Kupinski, and H. H. Barrett, "A probabilistic model for the MRMC method. Part 1. theoretical development," Acad. Radiol. 13, 1410-1421 (2006). [CrossRef] [PubMed]
