Interpretation of response categories in patient-reported rating scales: a controlled study among people with Parkinson's disease

Ida Knutsson, Helena Rydström, Jan Reimer, Per Nyberg, Peter Hagell

BACKGROUND: Unambiguous interpretation of ordered rating scale response categories requires distinct meanings of category labels. Also, summation of item responses into total scores assumes equal intervals between categories. While studies have identified problems with rating scale response category functioning, there is a paucity of empirical studies regarding how respondents interpret response categories. We investigated how people with Parkinson's disease (PD) and age-matched control subjects interpret commonly used rating scale response categories, and attempted to identify distinct and roughly equally spaced response categories for patient-reported rating scales.

METHODS: Twenty-one rating scale response categories representing frequency, intensity and level of agreement were presented in random order to 51 people with PD (36 men; mean age, 66 years) and 36 age-matched controls (14 men; mean age, 66 years). Respondents indicated their interpretation of each category on 100-mm visual analog scales (VAS) anchored by Never--Always, Not at all--Extremely, and Totally disagree--Completely agree. VAS values were compared between groups. To identify the best options for three-, four-, five-, and six-category scales, we sought response categories whose mean values, with non-overlapping 95% confidence intervals (CIs), corresponded to equally spaced locations on the VAS line.
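The equal-spacing check described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: that the criterion values for a k-category scale are the k equally spaced points from 0 to 100 mm, and that the 95% CI of each mean is a normal-approximation interval. The function names and the example ratings are hypothetical and are not study data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def criterion_values(k, vas_length=100.0):
    """Equally spaced target locations for k categories on the VAS line.

    Assumption: targets run from 0 to vas_length inclusive.
    """
    step = vas_length / (k - 1)
    return [i * step for i in range(k)]


def ci95(values):
    """Normal-approximation 95% CI for the mean (an assumed method)."""
    m = mean(values)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * stdev(values) / sqrt(len(values))
    return (m - half_width, m + half_width)


def covers_criteria(ratings_by_category):
    """For each category (ordered lowest to highest), report whether
    its 95% CI includes the corresponding equal-spacing criterion."""
    targets = criterion_values(len(ratings_by_category))
    flags = []
    for ratings, target in zip(ratings_by_category, targets):
        low, high = ci95(ratings)
        flags.append(low <= target <= high)
    return flags


def non_overlapping(ratings_by_category):
    """True if the 95% CIs of adjacent categories do not overlap."""
    cis = [ci95(r) for r in ratings_by_category]
    return all(cis[i][1] < cis[i + 1][0] for i in range(len(cis) - 1))


# Made-up ratings (mm) for a hypothetical three-category scale.
example = [[0, 2, 4], [48, 50, 52], [90, 92, 94]]
print(criterion_values(3))
print(covers_criteria(example))
print(non_overlapping(example))
```

In this made-up example the highest category is rated well below 100 mm, so its CI misses the criterion value even though the three CIs are distinct. This is the kind of failure the equal-spacing criterion is meant to detect.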

RESULTS: VAS values did not differ between the PD and control samples (P = 0.286) or according to educational level (P = 0.220), age (P = 0.220), self-reported physical functioning (P = 0.501) and mental health (P = 0.238), or (for the PD sample) PD duration (P = 0.213) or presence of dyskinesias (P = 0.212). Attempts to identify roughly equally spaced response categories for three-, four-, five-, and six-category scales were unsuccessful, as the 95% CIs of one or several of the identified response categories failed to include the criterion values for equal distances.

CONCLUSIONS: This study offers an evidence base for selecting more interpretable patient-reported rating scale response categories. However, problems associated with raw rating scale data, primarily related to their ordinal structure, also became apparent. This argues for the application of methodologies such as Rasch measurement. Rating scale response categories need to be treated with rigour in the construction and analysis of rating scales.

