2017. Cooper WA et al. – Intra- and inter-observer reproducibility assessment of PD-L1 biomarker in non-small cell lung cancer (NSCLC)

 

Cooper WA, Russell PA, Cherian M, Duhig EE, Godbolt D, Jessup PJ, et al. Intra- and inter-observer reproducibility assessment of PD-L1 biomarker in non-small cell lung cancer (NSCLC). Clin Cancer Res. 2017;23(16):4569-77.

https://www.ncbi.nlm.nih.gov/pubmed/28420726

Abstract

Purpose: Reliable and reproducible methods for identifying PD-L1 expression on tumor cells are necessary to identify responders to anti-PD-1 therapy. We tested the reproducibility of the assessment of PD-L1 expression in non-small cell lung cancer (NSCLC) tissue samples by pathologists.

Experimental Design: NSCLC samples were stained with PD-L1 22C3 pharmDx kit using the Dako Autostainer Link 48 Platform. Two sample sets of 60 samples each were designed to assess inter- and intraobserver reproducibility considering two cut points for positivity: 1% or 50% of PD-L1 stained tumor cells. A randomization process was used to obtain equal distribution of PD-L1 positive and negative samples within each sample set. Ten pathologists were randomly assigned to two subgroups. Subgroup 1 analyzed all samples on two consecutive days. Subgroup 2 performed the same assessments, except they received a 1-hour training session prior to the second assessment.

Results: For intraobserver reproducibility, the overall percent agreement (OPA) was 89.7% [95% confidence interval (CI), 85.7-92.6] for the 1% cut point and 91.3% (95% CI, 87.6-94.0) for the 50% cut point. For interobserver reproducibility, OPA was 84.2% (95% CI, 82.8-85.5) for the 1% cut point and 81.9% (95% CI, 80.4-83.3) for the 50% cut point, and Cohen’s κ coefficients were 0.68 (95% CI, 0.65-0.71) and 0.58 (95% CI, 0.55-0.62), respectively. The training was found to have no or very little impact on intra- or interobserver reproducibility.
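The two agreement statistics reported above can be illustrated with a minimal sketch for a single pair of readers. The abstract does not say how results were pooled across the ten pathologists, so this is not the authors' procedure. The function names, the example scores, and the choice to call a sample positive when its score is at or above the cut point are all assumptions made for illustration.

```python
from typing import Sequence


def call_positive(score_percent: float, cut_point: float) -> bool:
    """Dichotomize a percentage of PD-L1 stained tumor cells.

    Assumption: a sample is positive when its score is >= the cut point.
    """
    return score_percent >= cut_point


def overall_percent_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Percentage of samples on which two sets of calls agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty call lists of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters making binary calls."""
    n = len(a)
    p_observed = overall_percent_agreement(a, b) / 100.0
    p_a = sum(a) / n  # fraction called positive by rater A
    p_b = sum(b) / n  # fraction called positive by rater B
    # Agreement expected by chance if the raters were independent.
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical scores (percent stained tumor cells) from two readers.
reader_1 = [0, 5, 60, 45, 80, 0, 55, 30]
reader_2 = [0, 2, 50, 55, 90, 1, 40, 30]

for cut in (1, 50):
    calls_1 = [call_positive(s, cut) for s in reader_1]
    calls_2 = [call_positive(s, cut) for s in reader_2]
    opa = overall_percent_agreement(calls_1, calls_2)
    kappa = cohen_kappa(calls_1, calls_2)
    print(f"cut point {cut}%: OPA = {opa:.1f}%, kappa = {kappa:.2f}")
```

With these made-up scores the two readers disagree mostly on samples close to the 50% cut point, which lowers κ more than it lowers OPA because κ discounts the agreement expected by chance.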

Conclusions: Pathologists reported good reproducibility at both 1% and 50% cut points. More adapted training could potentially increase reliability, in particular for samples with PD-L1 proportion scores around 50%.