Intra and Inter-Rater Reliability and Convergent Validity of FIT-HaNSA in Individuals with Grade II Whiplash Associated Disorder
Michael Pierrynowski1, Colleen McPhee2, Saurabh P. Mehta3, *, Joy C. MacDermid1, 4, Anita Gross1
Identifiers and Pagination:
Year: 2016
First Page: 179
Last Page: 189
Publisher ID: TOORTHJ-10-179
Article History:
Received Date: 16/09/2015
Revision Received Date: 25/02/2016
Acceptance Date: 28/02/2016
Electronic publication date: 13/06/2016
Collection year: 2016
open-access license: This is an open access article licensed under the terms of the Creative Commons Attribution-Non-Commercial 4.0 International Public License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/legalcode), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.
Whiplash-Associated Disorders (WAD) are common following a motor vehicle accident. The Functional Impairment Test - Hand, and Neck/Shoulder/Arm (FIT-HaNSA) assesses upper extremity physical performance. It has been validated in patients with shoulder pathology but not in those with WAD.
To establish the intra- and inter-rater reliability and the known-group and construct validity of the FIT-HaNSA in patients with Grade II WAD (WAD2).
Twenty-five patients with WAD2 and 41 healthy controls were recruited. Numeric Pain Rating Scale (NPRS), Neck Disability Index (NDI), Disabilities of the Arm, Shoulder and Hand (DASH), cervical range of motion (CROM), and FIT-HaNSA were completed at two sessions conducted 2 to 7 days apart by two raters. Intraclass correlation coefficients (ICC) were used to describe intra- and inter-rater reliability. Spearman rank correlation coefficients (ρ) were used to quantify the associations between scores of the FIT-HaNSA and other measures in the WAD2 group (convergent construct validity).
The intra- and inter-rater ICCs for the FIT-HaNSA scores ranged from 0.88 to 0.89 in the control group and 0.78 to 0.85 in the WAD2 group. Statistically significant differences in FIT-HaNSA performance between the two groups suggested known-group construct validity (P < 0.001). The correlations between the NPRS, NDI, DASH, CROM and FIT-HaNSA were generally poor (ρ < 0.4).
The study results indicate that the total FIT-HaNSA score has good intra- and inter-rater reliability and construct validity in WAD2 patients and healthy controls.
Whiplash-Associated Disorders (WAD) are the most common type of injury following a motor vehicle accident. Grade I and II injuries represent 90% of WAD claims. Grade II WAD (WAD2) cases persisting beyond two to six months account for most of the financial burden and are warning signs of impending chronicity. The incidence of WAD2 in Western countries is 300 per 100,000 inhabitants.
Evidence-based clinical practice guidelines suggest that the physical examination of WAD2 patients should include tests of inspection, range of motion, strength, palpation, provocation, muscular stability and cervical proprioception. Research also suggests that, apart from self-report ability measures (e.g. the Neck Disability Index (NDI) or the numeric pain rating scale (NPRS)), measures assessing physical performance should also be used when assessing WAD2 patients. In a clinical setting, physical performance can be assessed by testing a patient’s ability to execute a standardized activity in a standardized environment. Usually, the time to complete the activity or the number of repetitions performed is used to quantify physical performance. Conversely, self-report measures examine patients’ perception and experience of their ability to perform functional tasks. In patient groups with various musculoskeletal diagnoses, such as advanced knee or hip osteoarthritis and chronic low back pain, poor to fair concordance between physical performance and self-report measures of ability suggests that they assess different perspectives of function [8, 9]. As such, physical performance tests and self-report measures are complementary and should both be used [8-10].
The Functional Impairment Test - Hand, and Neck/Shoulder/Arm (FIT-HaNSA) is a relatively new physical performance test that measures the ability of a patient to perform upper limb reach, object grip and manipulation, and sustained overhead positioning. FIT-HaNSA development and psychometric evaluation have been described elsewhere. Excellent test-retest reliabilities (intraclass correlation coefficient (ICC) > 0.96) for the FIT-HaNSA were reported in controls and individuals with shoulder disorders [11, 12]. In addition, the FIT-HaNSA scores have demonstrated good discriminant validity as well as expected convergent (r = -0.73 to -0.83 with the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire) and divergent relationships (strength (r = 0.12 to 0.66) and shoulder range of motion (r = 0.45 to 0.64)) in patients with shoulder pathology [11, 12].
FIT-HaNSA has been used with patients with known neck and shoulder pathology [11, 12]; however, its psychometric properties have not been evaluated in the WAD2 population. Additionally, previous research studies have not examined the intra- and inter-rater reliability of FIT-HaNSA. The purposes of this study are to estimate 1) inter- and intra-rater reliability, 2) known-group construct validity, and 3) convergent validity with self-report measures of pain, ability, and impairment of FIT-HaNSA in samples of participants with and without WAD2.
MATERIAL AND METHODS
Forty-one control participants were recruited through public advertisement at a University and a Hospital. Control participants were included if they were over the age of 18, fluent in writing and speaking English and were not experiencing head, neck or upper extremity pain at the time of testing. Exclusion criteria for the control participants were: past history of a motor vehicle accident requiring rehabilitative treatment and concurrent medical concern that could significantly alter performance (e.g. past or present cervical disc herniation, cervical fracture or instability diagnosed through imaging techniques, previous neck or upper extremity surgery, rotator cuff tear diagnosed through ultrasound, neurological conditions affecting upper extremity, rheumatoid arthritis or fibromyalgia).
Twenty-five participants with WAD2 were consecutively recruited from a private outpatient physiotherapy practice during a one-year period. Eligibility was determined by one of the clinic’s two physiotherapists. All of the inclusion and exclusion criteria used to enroll the control participants were followed, except that the WAD2 participants were experiencing neck pain (with or without head, face and arm pain) as the result of a motor vehicle accident that occurred more than six weeks earlier and were classified as WAD2 using the Spitzer criteria.
Two physical therapists, both Fellows of the Canadian Academy of Manual Therapists, administered all measures to the WAD2 patients. The physical therapists attended a 90-minute training session one month prior to the start of the study to become proficient in administering the FIT-HaNSA and to review documentation procedures. During the training session, the raters tested non-study volunteers. A second pair of raters assessed the control participants. These raters were instructed by the developer of FIT-HaNSA (JM) and a physical therapist (CM) regarding the administration of the pain, ability and impairment measures.
Measures for pain (NPRS), self-reported disability (NDI and DASH), movement impairment (active cervical range of motion (ROM)) and physical performance (FIT-HaNSA) were obtained.
NPRS: The NPRS is commonly used in assessing neck pain intensity [13, 14]. A patient is asked to rate his or her pain intensity over a 24-hour period on an 11-point scale where 0 indicates “no pain” and 10 indicates the “worst pain imaginable”. The NPRS demonstrates fair to good reliability (ICC = 0.64 to 0.86) when used in neck pain populations [14, 15]. The minimal detectable change (MDC90) of the NPRS ranges from 1.3 to 2 points in a mixed orthopedic group including chronic neck pain [13, 15].
NDI: The NDI is a 10-item disease-specific self-report measure of function that captures perceived disability resulting from neck pain. Each item is measured on a six-point scale from zero (no disability) to five (complete disability), giving a total score between 0 and 50. The NDI total score can be interpreted, in terms of level of disability, as follows: 0 to 4 = none; 5 to 14 = mild; 15 to 24 = moderate; 25 to 34 = severe; and over 34 = complete [16, 17]. The NDI has fair to good test-retest reliability (ICC values between 0.50 and 0.98) in patients with different neck diagnoses, and its MDC90 is reported to be between 5 and 10 points.
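The NDI interpretation bands described above can be expressed as a small lookup. The following Python sketch is illustrative only: the function names are hypothetical, and the handling of non-integer scores (treating each band as running up to the start of the next) is an assumption made here so that percentage scores can be converted back to the 0 to 50 scale.

```python
def ndi_percent_to_raw(percent: float) -> float:
    """Convert an NDI score expressed as a percentage to the 0-50 raw scale."""
    return percent * 50.0 / 100.0


def ndi_disability_level(total_score: float) -> str:
    """Map a raw NDI total (0-50) to the disability bands quoted in the text.

    Assumption: fractional scores fall into the band whose lower bound
    they have reached (e.g. 14.5 is still 'mild').
    """
    if not 0 <= total_score <= 50:
        raise ValueError("NDI total score must lie between 0 and 50")
    if total_score < 5:
        return "none"
    if total_score < 15:
        return "mild"
    if total_score < 25:
        return "moderate"
    if total_score < 35:
        return "severe"
    return "complete"


# Example: a percentage score of 43.8% corresponds to 21.9 raw points.
level = ndi_disability_level(ndi_percent_to_raw(43.8))
```

Under these assumptions, a score of 43.8% maps to 21.9 raw points, which falls in the moderate band.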
DASH: The DASH is a 30-item region-specific outcome measure developed to evaluate upper extremity functional status in the presence of a musculoskeletal condition. Each item is scored on a scale of one to five, and the total score ranges from 0 to 100, with lower scores indicating greater ability. The reliability of the DASH is good (ICC of 0.90) and its MDC90 is 10.2 points. It has been validated for use in patients with neck pain.
CROM: Cervical ROM was measured using a mechanical protractor (CROM) (Performance Attainment Associates, St Paul, MN) that quantified cervical angular ROM, in degrees, in the sagittal (flexion-extension), frontal (right-left bend) and transverse (right-left axial rotation) planes. Measurement of cervical ROM is common in the evaluation of patients with neck pain. The reliability of the CROM is good (ICC = 0.80) in patients with neck pain. Fletcher and Bandy reported an MDC90 of 5˚ to 10˚ for each plane of motion for cervical ROM.
FIT-HaNSA: The FIT-HaNSA protocol consists of three timed tasks, each performed for a maximum of 300 seconds (s), with a pause of approximately 30 s between tasks (set-up time for the next task). Task 1 (waist-up) requires the patient to alternately “grab, lift, move and place” three 1000 g containers located on shelves at waist level and 25 cm above waist level, using the affected arm, at a metronome pace of 60 beats per minute for 300 s or until the patient feels unable to continue. The time to complete Task 1 is measured using a stopwatch. Task 2 (eye-down) is identical to Task 1 except that the two shelves are placed at eye level and 25 cm below. Task 3 (overhead work) requires the patient to repeatedly screw and unscrew bolts in a sagittal plane oriented plate positioned at eye level using both arms. The FIT-HaNSA tasks have demonstrated excellent test-retest reliability (ICC > 0.89) in healthy controls and those with shoulder pathology [11, 12].
Four raters, two at each site, were randomized prior to the start of the study to determine who would assess each participant at each of the two test sessions. Approximately half of the participants were assessed by the same rater at the two test sessions; the other half were assessed by a different rater at the second test session. Testing occurred in a physiotherapy clinic or in a university laboratory using the JobSim System (JTECH Medical, Salt Lake City, UT). The WAD2 participants completed the study protocol before the start of treatment on the test day. During each participant’s first session, information regarding his/her age, sex, height, mass, and accident date (if a WAD2 participant) was recorded. Three self-report measures (NPRS, NDI, DASH) were completed by the participant. Shortly thereafter, the rater assessed cervical ROM using the CROM, and then the FIT-HaNSA was administered. Finally, the same rater scored the self-report measures and placed the participant’s data in an envelope, which was then sealed. These envelopes were coded such that WAD2 or control group identity was blinded.
The participant was then scheduled to attend a second session 2 to 7 days following the first session. The WAD2 participants continued their medications and prescribed therapy between the two test sessions. The control participants were requested to refrain from non-typical activities. During the second session, the NPRS, NDI, DASH, CROM and FIT-HaNSA were administered. Again, the participant’s data were placed in an envelope and sealed.
The research ethics board at McMaster University, Hamilton, Canada approved the study.
Two authors (MP, CM) performed data entry, screening, and inspection of scatterplots and histograms. Descriptive statistics including means and standard deviations were calculated for age, height, mass and duration of symptoms to determine the baseline characteristics of the WAD2 and control groups. Similar descriptive statistics were also calculated for the scores on the outcome measures for the two groups at each session.
ICCs were calculated to examine the reproducibility of the FIT-HaNSA tasks. Separate analyses were conducted for participants assessed by the same rater at both sessions (intra-rater reliability) and for participants assessed by a different rater at each session (inter-rater reliability). ICCs were calculated for each of the three FIT-HaNSA tasks and the total score for both the WAD2 and control groups. ICC values > 0.7 were considered suggestive of good reliability.
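As an illustration of the reliability coefficient, the sketch below computes a two-way random-effects, absolute-agreement, single-measure ICC, the ICC(2,1) form named in the caption of Table 3, from the mean squares of a participants-by-sessions table. It is a minimal Python sketch, not the SPSS procedure used in the study; the function name and the example scores are illustrative and are not study data.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per participant and one
    column per session (or rater).

    Formula: (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n),
    where MSR, MSC and MSE are the row, column and residual mean squares
    of a two-way ANOVA without replication.
    """
    n = len(scores)          # participants
    k = len(scores[0])       # sessions or raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative scores only (6 participants, 4 ratings each).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
```

For this example the coefficient is approximately 0.29, well below the 0.7 threshold the study treats as good reliability, because the column (rater) means differ markedly. In the study design each row would hold one participant's two session scores.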
The standard error of measurement (SEM) quantifies the error associated with a single score. SEM was determined by calculating the square root of the mean square error term from the analysis of variance (ANOVA) tables. The SEM value was used to calculate the 90% minimal detectable change (MDC90 = 1.65 × SEM × √2), a statistic used to assess whether the change in a participant’s score over time is a true change rather than random error.
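The MDC90 arithmetic is simple enough to check by hand, and a short Python sketch makes the steps explicit. The function names are illustrative; the inputs in the example are the rounded inter-rater SEM values reported in the Results for the total FIT-HaNSA score.

```python
import math


def sem_from_mean_square_error(mean_square_error: float) -> float:
    """SEM as the square root of the ANOVA mean square error term."""
    return math.sqrt(mean_square_error)


def mdc90(sem: float) -> float:
    """90% minimal detectable change: 1.65 * SEM * sqrt(2)."""
    return 1.65 * sem * math.sqrt(2)


# Rounded inter-rater SEM values for the total score, in seconds.
mdc_wad2 = mdc90(76)     # about 177 s
mdc_control = mdc90(41)  # about 96 s
```

These come out at roughly 177 s and 96 s, close to the reported 176 s and 95 s; the small differences presumably reflect rounding of the SEM before it was reported.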
Bland and Altman analyses were used to determine the limits of agreement associated with FIT-HaNSA scores. An ICC is influenced by the size of the between-subjects variance, whereas Bland and Altman analyses examine within-subject variability or random error. Bland and Altman analyses plot the differences in scores between test and retest against the mean of test and retest scores. The mean difference and the standard deviation of the differences are used to construct 95% limits of agreement (LOA95).
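A minimal Python sketch of the limits-of-agreement calculation follows. It assumes the conventional definition (bias ± 1.96 standard deviations of the paired differences); the multiplier is exposed as a parameter because the exact value used in the study is not stated. The function name and the example scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_limits(session1, session2, multiplier=1.96):
    """Return (bias, lower limit, upper limit) for paired scores.

    bias is the mean of (session2 - session1); the limits are
    bias -/+ multiplier * SD of the differences (sample SD, n - 1).
    """
    diffs = [second - first for first, second in zip(session1, session2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - multiplier * sd, bias + multiplier * sd


# Hypothetical total scores (s) for four participants at two sessions.
bias, lower, upper = bland_altman_limits(
    [100, 200, 300, 400],
    [110, 190, 320, 420],
)
```

In a full Bland and Altman plot, each difference would be plotted against the mean of the participant's two scores, with horizontal lines drawn at the bias and at the two limits.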
To examine known-group construct validity, a repeated measures mixed design ANOVA was used to determine whether performance differed significantly between the WAD2 and control groups across sessions for the three tasks and total FIT-HaNSA scores. The main effect for the between-group factor was deemed statistically significant at the P < 0.05 level.
To measure convergent construct validity, Spearman rank correlation coefficients (ρ) were used to quantify the magnitude of the association between FIT-HaNSA and the NPRS, NDI, DASH and CROM in the WAD2 group on each occasion. The correlations were classified as poor (|ρ| < 0.40), fair (0.40 ≤ |ρ| ≤ 0.75) or good (|ρ| > 0.75).
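The correlation step can be sketched in plain Python: rank each variable (giving tied values their average rank), compute the Pearson correlation of the ranks, and classify the magnitude using the thresholds above. The function names and the example values are hypothetical, and this is not the SPSS routine used in the study.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def classify_correlation(rho):
    """Poor (|rho| < 0.40), fair (0.40 to 0.75) or good (> 0.75)."""
    magnitude = abs(rho)
    if magnitude < 0.40:
        return "poor"
    if magnitude <= 0.75:
        return "fair"
    return "good"


# Hypothetical paired scores containing one tie in the second variable.
rho = spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
label = classify_correlation(rho)
```

Classifying on the absolute value means that a strong negative association, such as the one expected between a disability score and a performance time, is graded by its strength rather than its sign.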
SPSS Version 18.0 (SPSS Inc, Chicago, IL) was used for all analyses.
In the WAD2 group, there were 19 females (76%) and 6 males (24%), and in the control group there were 29 females (71%) and 12 males (29%). The WAD2 and control groups were similar in age, height, and mass (Table 1). The mean duration of symptoms in the WAD2 group was 12.4 months, suggesting chronicity of the neck pain. The NPRS, NDI, DASH and CROM scores further characterize the groups (Table 2). The NPRS, NDI and DASH scores showed greater variability in the WAD2 group than in the control group, as indicated by the larger standard deviations. The mean CROM scores were relatively stable between sessions for each group.
Table 1. Demographics (mean, standard deviation) of the WAD2 and control groups.
|Characteristic||WAD2 (n = 25)||Control (n = 41)||P value|
|Age (years)||36.4 (13.8)||34.0 (14.2)||p = 0.58|
|Height (centimeters)||165.9 (11.4)||169.6 (9.7)||p = 0.19|
|Mass (kilograms)||71.1 (20.5)||75.7 (26.6)||p = 0.27|
|Duration of Symptoms (months)||12.4 (13.5)|
The mean scores for the three tasks and the total score were lower for the WAD2 group than for the controls (see Table 2). The average FIT-HaNSA scores between the two sessions in the WAD2 group differed by 12 s for Task 1, 5 s for Task 2, and 0 s for Task 3. Both the WAD2 and control groups performed best on Task 1, followed by Task 3, and scored poorest on Task 2. Two WAD2 and 22 control participants demonstrated ceiling effects (achieved scores of 300 s). No WAD2 or control participant scored 0 s.
Table 2. Descriptive statistics (mean, standard deviation) for the self-report, cervical motion and FIT-HaNSA scores for the WAD2 and control groups at two test sessions.
|Measure||WAD2 (n = 25), Session 1||WAD2 (n = 25), Session 2||Control (n = 41), Session 1||Control (n = 41), Session 2|
|NPRS (%)||51 (21)||54 (26)||1 (4)||2 (4)|
|NDI (%)||43.8 (12.6)||42.9 (14.6)||3.4 (4.2)||2.4 (4.1)|
|DASH (%)||38.9 (15.7)||38.3 (16.0)||1.7 (2.3)||1.2 (2.0)|
|CROMrr (°)||61 (12)||62 (14)||72 (10)||74 (7)|
|CROMlr (°)||63 (13)||63 (12)||74 (8)||74 (8)|
|CROMf (°)||45 (12)||44 (11)||54 (8)||57 (9)|
|CROMe (°)||50 (17)||51 (18)||78 (15)||73 (15)|
|CROMrb (°)||33 (10)||34 (12)||47 (13)||47 (9)|
|CROMlb (°)||35 (9)||35 (9)||50 (12)||50 (12)|
|FIT-HaNSA Task 1 (s)||201 (80)||213 (75)||296 (20)||294 (36)|
|FIT-HaNSA Task 2 (s)||117 (75)||122 (66)||232 (89)||251 (78)|
|FIT-HaNSA Task 3 (s)||170 (79)||170 (75)||281 (49)||277 (59)|
|FIT-HaNSA Total (s)||488 (208)||506 (189)||809 (132)||820 (141)|
NPRS = Numeric Pain Rating Scale; NDI = Neck Disability Index; DASH = Disabilities of the Arm, Shoulder and Hand; CROMrr = cervical right axial rotation range of motion; CROMlr = cervical left axial rotation range of motion; CROMf = cervical flexion range of motion; CROMe = cervical extension range of motion; CROMrb = cervical right bend range of motion; CROMlb = cervical left bend range of motion.
The intra and inter-rater reliabilities were fair to excellent for Tasks 1, 2 and 3 and the total score (see Table 3) when the raters assessed the control participants. The reliability coefficients were lower (ICC of 0.54 to 0.80) when the WAD2 participants were tested. The total FIT-HaNSA scores for the WAD2 had larger SEM and MDC90 values compared to the control group. The WAD2 group’s SEM and MDC90 were 76 and 176 s, respectively for the inter-rater testing compared to the control group’s 41 and 95 s, respectively.
Table 3. Intra- and inter-rater reliabilities of Task 1, 2, 3, and total FIT-HaNSA scores for the WAD2 (n=18) and control (n=41) participants. Different numbers of participants (indicated below) were included in the intra- and inter-rater reliability calculations. Intraclass correlation coefficients (ICC2,1) with 95% confidence intervals (in parentheses) are presented.
|Task 1||Task 2||Task 3||Total FIT-HaNSA|
(n = 25)
(n = 11)
(0.27 - 0.91)
(0.21 - 0.91)
(0.20 - 0.90)
(0.37 - 0.94)
(n = 14)
(0.01 - 0.82)
(0.39 - 0.91)
(0.59 - 0.95)
(n = 41)
(0.48 - 0.90)
(0.44 - 0.73)
(0.92 - 0.99)
(0.72 - 0.95)
(0.52 - 0.89)
(0.85 - 0.97)
(0.77 - 0.95)
The Bland and Altman plot for the WAD2 group’s total FIT-HaNSA scores (Fig. 1) indicated a 26 s bias. The standard deviation of the differences was 124 s for the WAD2 group and the 95% LOA was 248 s. The Bland and Altman plot for the control group’s total FIT-HaNSA scores indicated some systematic improvement in performance (bias), as the mean Session 2 score was 13 s higher than the mean Session 1 score (Fig. 2). These scores were evenly distributed above and below the bias line, indicating that the variance was not influenced by the size of the mean. The standard deviation of the differences was 64 s for the control group and the 95% LOA was 128 s.
The FIT-HaNSA performance differed significantly between the WAD2 and control groups for Task 1 (F = 53.3, df = 1, 64, P < 0.001), Task 2 (F = 42.0, df = 1, 64, P < 0.001), Task 3 (F = 49.8, df = 1, 64, P < 0.001) and the total FIT-HaNSA score (F = 62.6, df = 1, 64, P < 0.001). Based on these findings, the FIT-HaNSA total score can be considered to have good known-group construct validity.
Spearman rank correlations between the FIT-HaNSA, NPRS, NDI, DASH, and CROM scores for the WAD2 group for Session 1 are presented in Fig. 3. Of the 78 correlations, most (59) were poor (|ρ| < 0.4). The NDI - DASH and CROMrr - CROMlr scores had good correlations (|ρ| > 0.75), as did the total and three task FIT-HaNSA scores with one another (except Task 1 - Task 3, which was fair).
Fig. 3. Pairwise scatterplots (upper tableau) and Spearman rank correlations (lower tableau) for the total and Task 1, 2, and 3 FIT-HaNSA, NPRS, NDI, DASH, CROMrr, CROMlr, CROMf, CROMe, CROMrb, and CROMlb scores for the 25 WAD2 participants assessed at Session 1. The correlations are color coded as red = low (|ρ| < 0.4), amber = moderate (0.4 ≤ |ρ| ≤ 0.7) and green = high (|ρ| > 0.7). Each outcome’s distribution is also plotted as a histogram along the diagonal. Abbreviations are defined in Table 2.
This study found that the FIT-HaNSA has fair to good within and between raters’ reliabilities, for both WAD2 and control participants and can discriminate between WAD2 and control participants. It showed poor concordance with pain, ability and impairment measures, suggesting that it measures a different aspect of outcome. The study provides further support to the preliminary literature regarding the reliability and validity of the FIT-HaNSA that were conducted in patients with shoulder pathology [11, 12].
The results of the study can be generalized to patients with chronic WAD2 attending an outpatient clinic. The WAD2 population was consecutively sampled, which minimized volunteerism and other selection biases that may call into question representativeness. Patients with WAD2 can exhibit substantial heterogeneity in clinical presentation. Since only 25 individuals with WAD2 were recruited, variations in test results may have been further inflated. The control sample was recruited across two sites (hospital and university setting), which improves the generalizability of the study results.
The raters and the WAD2 and control participants were blinded to the FIT-HaNSA scores between sessions to minimize attempts to match or improve performance. The test-retest interval of 2 to 7 days used in this study is considered acceptable for patients with musculoskeletal diagnoses. This interval allows the participant time to recover from potential muscular fatigue and pain, yet is short enough to limit real change in upper extremity function. The NPRS, NDI, DASH and CROM values suggest that the participants did not change between the two sessions.
Clinicians are interested in how WAD2 participants change over time. The MDC90 provides a threshold above which a clinician can be 90% confident that a true change has occurred. The MDC90 associated with the WAD2 group was 176 s (20%). This suggests that a WAD2 participant’s total FIT-HaNSA score must change by at least 176 s between test sessions to reflect a true change. This required change of 20% of the total score is consistent with the change values associated with the NPRS, NDI and DASH, which range from 7% to 30% [13, 15, 18, 20, 33, 34].
The control group performed better, as reflected in the three task scores and the total FIT-HaNSA score. Both groups performed poorest on Task 2. This result is similar to what has been observed previously in shoulder pathology groups [11, 12]. It has been hypothesized that Task 2 is the most difficult for shoulder impingement patients because it moves the injured shoulder into the “impinged position” [11, 12]. In WAD2 and control groups, Task 2 may be difficult because Task 1 “pre-fatigues” the upper quadrant musculature. Task 1 was included within the FIT-HaNSA performance test to challenge the shoulder girdle musculature prior to subsequent testing and reduce potential floor effects. This approach is effective in that WAD2, shoulder pathology, and control groups perform best on Task 1 and no floor scores were reported [11, 12].
Clinical observation suggests that patients with WAD2 often report that overhead tasks are difficult due to pain in the neck region and that they have decreased cervical extension ROM. It is reasonable to assume that WAD2 participants would have the greatest difficulty with Task 3 due to the requirement of “looking up” for 300 s. However, the study results indicate that Task 2 (repetitive upper extremity activity) was more challenging than Task 3. A similar finding was reported for participants with shoulder pathology and for patients with chronic neck pain, suggesting that the interaction between the upper fibers of trapezius and the serratus anterior muscles is compromised during the performance of Task 2 in the presence of chronic pain. Additionally, our WAD2 patients were observed to bilaterally shrug the shoulders to alter the mechanics of the neck and shoulder musculature and decrease the amount of neck extension during Task 3.
The Correlations Between the FIT-HaNSA and the NPRS, NDI, DASH and CROM were Generally Poor (ρ < 0.4)
The relationships between the FIT-HaNSA, NPRS, NDI, DASH and CROM scores reinforce the theory that the relationships between physical performance, pain, ability and impairment, whether determined by self-perception or actual performance, are varied and complex. Previous research in shoulder pathology groups has presented similar findings [11, 12]. Patients may either over- or underestimate their functional ability. For example, back pain patients with depression demonstrate a tendency to underestimate their functional ability, but depression had no significant effect on treadmill walking ability. The results of this study substantiate the concept that pain, ability and impairment measures should be used in conjunction with a physical performance test when evaluating patients with WAD2, as they all provide information about different aspects of human health secondary to injury.
There are some limitations associated with this study. The intended sample was 40 WAD2 and 40 control participants to achieve acceptable power, but only 25 WAD2 participants were recruited. Dividing the participants among inter-rater and intra-rater groups further reduced the sample available for each reliability analysis. The number of sessions could have been increased, which would have reduced the required sample size, but this was not possible due to clinician time constraints. A more reasonable solution may be to evaluate the reliability of the protocol amongst raters who might be more apt to administer the test, such as kinesiologists and physical therapy assistants. This study was also limited in that the relationships between FIT-HaNSA and NPRS, NDI, DASH and CROM were examined at two closely separated times.
The results of this study indicate that the total FIT-HaNSA score can be reliably measured when used with patients with WAD2. The scores of the three tasks of FIT-HaNSA have fair to good within and between raters’ reliabilities for patients with WAD2. A clinician can administer FIT-HaNSA with confidence and interpret scores in a meaningful manner. The total FIT-HaNSA score discriminates between WAD2 and control participants, demonstrating known-group construct validity. The convergent construct validity findings show that most relationships between FIT-HaNSA performance and measures of pain, ability and impairment are poor, supporting the continued use of a variety of assessment techniques in patients with WAD2.
CONFLICT OF INTEREST
The authors confirm that this article content has no conflict of interest.