Local dependence in health outcome measurement: Lessons from the 8-item Parkinson’s disease questionnaire (PDQ-8)

Introduction: The PDQ-8 is a widely used patient-reported health outcome measure in
Parkinson’s disease (PD) research and practice. As such, it is influential in clinical and
policy decision-making processes that impact the care of persons with PD. However, its
target variable is undefined and rigorous testing of its measurement properties is lacking.
This study examined the measurement properties of the PDQ-8 to understand its role as
an outcome measure.

Methods: Complete PDQ-8 item response data from 1289 people with PD from the
Swedish national registry for Parkinson’s disease were used. PDQ-8 items represent eight
health-related problems and have five ordered response categories (“never” to “always”;
scored 0-4). Data were analyzed according to the Rasch model using the RUMM2030plus
software, focusing on targeting, reliability, response category functioning, model fit,
differential item functioning (DIF) by sex and age, and local dependence (LD).
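
As an illustration of two of the properties listed above (response category functioning and person location), the following minimal sketch computes category probabilities for one five-category item under the polytomous Rasch (partial credit) parameterization and checks whether its thresholds are ordered. It is not the RUMM2030plus implementation. The function names and the threshold values are hypothetical and are not estimates from this study.

```python
import math


def category_probabilities(theta, thresholds):
    """Return P(X = x) for x = 0..m under the polytomous Rasch model.

    theta      -- person location in logits
    thresholds -- list of m threshold locations in logits (m + 1 categories)
    """
    # Exponent for category x is the sum over k <= x of (theta - tau_k);
    # the empty sum for x = 0 is 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds):
    """True if every threshold lies above the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds for one item scored 0-4 (assumption, not study data).
example_thresholds = [-1.5, -0.5, 0.5, 1.5]
probs = category_probabilities(-1.30, example_thresholds)
```

A threshold is the location at which two adjacent categories are equally likely, so disordered thresholds mean that the response categories are not working in the intended order of increasing severity.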

Results: The sample represented all stages of PD severity (stage 0: no signs of disease; stage 5: confined to bed/wheelchair unless aided), and 64% were men. The
mean (SD) age was 71 (9.5) years. The mean (SD) person location was -1.30 (0.91)
logits. Reliability was 0.67. There were disordered thresholds for all items but one. Four
items had significant Bonferroni-corrected chi-square statistics (P < 0.001), of which two
had large fit residuals (-3.26 and 4.99). There was uniform DIF by age (two items) and
sex (one item). Residual correlations identified LD for three item pairs. Subtests were
created stepwise to absorb LD. This revealed additional LD, leading to identification of
two conceptually logical subtests (four items each). Subtests showed a mean (SD) person
location of -0.98 (0.7) logits, reduced reliability (0.58), ordered response categories, improved fit residuals (≤ ±0.47; P < 0.001), and DIF by age for one subtest but no DIF by sex.
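
The LD steps described above can be sketched as follows, under stated assumptions: standardized item residuals are taken as given, item pairs are flagged when their residual correlation exceeds the average residual correlation by a margin, and flagged items are summed into a subtest. The 0.2 margin is a commonly used rule of thumb and an assumption here; the abstract does not state the criterion used. All names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_local_dependence(residuals, margin=0.2):
    """Flag item pairs whose residual correlation exceeds average + margin.

    residuals -- dict mapping item name to a list of standardized
                 residuals, one per person (assumed already computed)
    margin    -- assumed cutoff above the average residual correlation
    """
    pairs = {
        pair: pearson(residuals[pair[0]], residuals[pair[1]])
        for pair in combinations(residuals, 2)
    }
    average = sum(pairs.values()) / len(pairs)
    return sorted(pair for pair, r in pairs.items() if r > average + margin)


def make_subtest(responses, items):
    """Sum the scores of the given items per person to form one subtest."""
    return [sum(person[item] for item in items) for person in responses]


# Toy residuals for four hypothetical items and six persons (not study data).
toy_residuals = {
    "a": [1, -1, 1, -1, 1, -1],
    "b": [1, -1, 1, -1, 1, -1],
    "c": [1, 1, -1, -1, 0, 0],
    "d": [-1, -1, 1, 1, 0, 0],
}
flagged = flag_local_dependence(toy_residuals)

# Toy item responses scored 0-4 (not study data).
toy_responses = [{"a": 0, "b": 4, "c": 2}, {"a": 1, "b": 1, "c": 3}]
subtest_scores = make_subtest(toy_responses, ["a", "b"])
```

Combining dependent items into a subtest absorbs their shared residual variance into a single score, which is why reliability estimates typically drop once LD is accounted for.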

Discussion: Identifying and resolving LD improved several of the PDQ-8’s measurement properties but revealed inferior reliability, and targeting remained suboptimal.
Together with unclear construct validity, this argues against the appropriateness of the
PDQ-8 as an outcome measure. These experiences demonstrate the central role of LD and
the importance of considering LD when testing rating scale measurement properties.
