John E. Ware, Jr., PhD
Chief Science Officer and Chairman, JWRG, Worcester, MA, USA

I very much appreciate this opportunity to write during the celebration of Mapi’s 40th anniversary and to share some memories from patient-centered research during that time as well as some comments on what I believe is important for us to keep in mind going forward.
Coincidentally, Mapi and I began our journeys in patient-centered measurement around the same time in the early 1970s. Our paths crossed after I left the California-based RAND Corporation more than 15 years later. I moved to The Health Institute (THI) at Tufts Medical Center in Boston to begin introducing patient-reported outcome (PRO) measurement advances achieved in the Health Insurance Experiment (HIE) and Medical Outcomes Study (MOS) into healthcare delivery systems. While the HIE proved that health-related quality of life (HRQOL) surveys could be self-administered, the MOS took lessons learned from the HIE to a new level by developing surveys that measured a core health profile applicable to both well and sick populations. At THI, we developed more practical surveys, along with supporting products, such as user manuals, needed for their use.


Meeting Bernard Jambon and others at Mapi was very timely. We joined together to teach how to conceptualize and measure PROs for purposes of population health surveys and clinical trials worldwide. I also recall asking Bernard when we first met whether he thought PROs had progressed enough to begin translating and culturally adapting the improving surveys, hoping that he would say “yes” and would be interested in joining such a project. The International Quality of Life Assessment (IQOLA) Project began soon thereafter, starting in 1991 as a small project to translate the SF-36 in five countries and later expanding greatly. As acknowledged in a 1998 issue of the Journal of Clinical Epidemiology, Bernard Jambon, Catherine Acquadro, Katrin Conway, and others at Mapi participated at the core of IQOLA, coordinating our international meetings and the project’s overall administration during the early years. Also acknowledged in JCE was the support of the medical products industry, beginning with unrestricted grants for the IQOLA Project from the two founding IQOLA sponsors, Glaxo Wellcome and Schering-Plough. They were soon joined by 11 associate sponsors and by 50 other medical products companies.


As our focus changed from using PRO surveys in research to their real-world application, our psychometric methods also were evolving. Until the early 1990s, our measurement development was based on classical psychometrics. Starting in the early 1990s, we began to implement item response theory (IRT) methods, which enabled us to evaluate translations more rigorously and to develop better survey items and better scoring methods. A by-product of growing importance today is the notion of quantifying and cross-calibrating a family of directly comparable forms measuring each health domain. The shortest member of such a family is a single-item global measure of the domain; the longest is a full bank of items; in between lie multi-item static (pre-selected) forms and computer adaptive tests (CAT), which select and administer only the items that are most informative for each respondent. Such a family of static and CAT measures is better suited to satisfying the requirements of different PRO measurement applications while maintaining direct comparability of scores in relation to a common underlying metric.
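The CAT idea described above can be sketched in a few lines. The following is a minimal illustration only, assuming a simple two-parameter logistic (2PL) IRT model and a hypothetical item bank; the parameters and selection rule are stand-ins for illustration, not any actual SF-36 or IQOLA calibration.

```python
import math

def prob_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta
    (a = discrimination, b = item location)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """CAT step: pick the unadministered item that is most
    informative at the current trait estimate theta."""
    best_idx, best_info = None, -1.0
    for idx, (a, b) in enumerate(bank):
        if idx in administered:
            continue
        info = item_information(theta, a, b)
        if info > best_info:
            best_idx, best_info = idx, info
    return best_idx

# Hypothetical item bank: (discrimination a, location b) pairs.
bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]
theta = 0.0                      # provisional trait estimate
first_item = select_next_item(theta, bank, set())
```

Because every form in the family, from a single global item to the full bank, is calibrated to the same underlying metric, scores remain directly comparable regardless of which items a given respondent happens to receive.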


Our first large-scale, Internet-based application of the IRT/CAT approach was another industry-sponsored project that developed more practical and cross-calibrated measures of headache impact, as published in nine papers in Quality of Life Research in 2003. During that project I realized that headache-specific PROs should better represent the content domains of generic PROs. The same holds for nearly all chronic conditions; however, most disease-specific surveys capture only half or fewer of the most frequently measured HRQOL domains. In response to this apparent shortcoming, we are presently evaluating Disease-specific Quality of Life Scales (QDIS®), which expand and standardize the content and scoring of HRQOL impact across diseases. The project is extending IRT and CAT to rapidly and reliably standardize an overall summary metric quantifying how much each disease limits a patient’s daily activities and overall HRQOL. The breakthrough that made this practical is Internet-based tools that are programmed to efficiently measure the HRQOL impact attributed to each disease, but which also are standardized to allow disease-specific scores to be aggregated into an estimate of overall HRQOL impact when multiple comorbid conditions (MCC) are present. For the first time, individualized as opposed to population-based weighting of MCC impact may be possible in PRO-based comparative effectiveness research.
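The contrast between population-based and individualized weighting of comorbid-condition impact can be made concrete with a toy calculation. This sketch is purely illustrative: the weighted-average rule, the scores, and the weights are assumptions for exposition, not the actual QDIS® scoring algorithm.

```python
# Illustrative only: combining per-condition HRQOL impact scores that
# share a common standardized metric into one overall estimate.
# The combination rule and all numbers are hypothetical.

def overall_impact(scores, weights=None):
    """Weighted average of per-condition impact scores.
    With no weights, every condition counts equally
    (population-style); patient-specific weights give an
    individualized estimate instead."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical patient with three comorbid conditions, each scored
# on the same standardized metric (higher = greater impact).
scores = [40.0, 55.0, 70.0]
population_style = overall_impact(scores)             # equal weights
individualized = overall_impact(scores, [3.0, 1.0, 1.0])  # patient-specific
```

The point of the sketch is simply that aggregation across conditions only makes sense once the disease-specific scores are standardized to a common metric, which is what the IRT/CAT calibration provides.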


Thorough HRQOL measurement validation is the basis for interpreting PRO differences and for determining whether significant differences are also important. However, over the years of following and contributing to the HRQOL literature and participating in numerous regulatory and other reviews of clinical trial results, I began to realize that it is often problems with the internal validity of study designs, not measurement validity, that weaken causal attributions of HRQOL outcomes to a specific treatment. Dan Frendl, an MD-PhD student at UMass Medical School, and I published a 2014 Medical Care article reporting results from a systematic review of 185 well-controlled drug trials documenting primary clinical and HRQOL outcomes measured by the SF-36 Health Survey over a 17-year period (1995-2011). We estimated that more than half were multinational. In more than 80% of trials, the SF-36 was responsive enough to capture the benefits of improvements in laboratory tests and other clinical outcomes caused by drug therapies. However, wide variability was observed in the rates at which drug treatments achieved accepted minimally important difference (MID) thresholds (about 58% overall; 0-100% across clinical areas). These results reflect variations in treatment efficacy more than measurement validity and responsiveness.


One challenge for the field going forward is taking advantage of advances in data collection and measurement while maintaining score comparability with the growing body of evidence necessary for meaningful interpretation of outcomes.