Inter-Rater reliability of the Functional Movement Screen (FMS) amongst NHS Physiotherapists

Objectives: The primary objective of this study was to determine the inter-rater reliability of the FMS when used by novice NHS (National Health Service) Physiotherapists. The secondary objective was to determine whether clinical specialism has any impact on the inter-rater reliability of the FMS. Design: Reliability study. Methods: Forty participants with no recent musculoskeletal (MSK) injury were video recorded completing the 7 component FMS tests. Six NHS Physiotherapists with no previous experience using the FMS attended a 2 hour training programme delivered by a certified FMS practitioner. Raters then viewed and scored videos of the 40 participants completing the FMS. Results: The inter-rater reliability of the FMS composite score was excellent (ICC = 0.82, 95% CI: 0.41-0.93). Non-specialist rotational Physiotherapists demonstrated excellent inter-rater reliability (ICC = 0.89, 95% CI: 0.78-0.94), whereas the specialist musculoskeletal Physiotherapists demonstrated good inter-rater reliability (ICC = 0.79, 95% CI: 0.19-0.92) for FMS composite score. The seven individual movement tests of the FMS demonstrated poor to excellent inter-rater reliability. The Hurdle Step was the least reliable of the movement tests (kw = 0.15, 95% CI: -0.09-0.38), whereas Shoulder Mobility was the most reliable of the movement tests (kw = 0.85, 95% CI: 0.72-0.97). The seven individual movement tests of the FMS demonstrated moderate to excellent inter-rater reliability between non-specialist rotational Physiotherapists. In contrast, the seven individual movement tests of the FMS demonstrated poor to excellent inter-rater reliability between specialist musculoskeletal Physiotherapists. Conclusion: The FMS represents a good attempt to objectify the subjective, with the FMS composite score demonstrating excellent inter-rater reliability. Due to poor construct validity, it has been suggested that only component scores should be utilised.
The results from this study suggest that five of the seven individual movement tests do not demonstrate acceptable reliability for clinical use. With the composite score lacking construct validity and the majority of the component scores lacking both intra- and inter-rater reliability, the continued use of FMS within clinical practice is not supported.


Introduction
Traditional approaches to musculoskeletal examination and treatment have focused on isolated methods such as joint range of movement, muscle strength and muscle length [1]. This isolated approach, however, fails to encompass the entire kinetic or kinematic chain whilst also ignoring the role of the nervous system in movement; 'the brain knows nothing of individual muscle action, but knows only of movement' [2].
doi: 10.7243/2055-2386-5-17
Approaches to rehabilitation are currently changing with a move away from attempting to target isolated muscle groups or joints, in favour of moving towards a more comprehensive analysis and assessment of movement [3]. It has been proposed that rehabilitation approaches that attempt to maximise cortical neuroplastic change may provide the greatest potential for rehabilitation success [4]. In order to achieve this, cognitive effort and motor skill training of meaningful movements or tasks is required; a potential limitation of the current bed-based methods of assessment and treatment [5].
This has been well demonstrated where static hip extension range measured using the Thomas' Test was not reflective of the peak hip extension seen during running, showing a low correlation between static and dynamic measures of hip extension [6]. It is therefore apparent that what happens during dynamic movement is not a reflection of what happens during clinical testing. Furthermore, improvements in hip flexibility did not transfer to increased mobility during dynamic movement [7]; changes in passive range of motion do not therefore automatically transfer to changes in movement.
An isolated rehabilitation approach following injury is not sufficient, and an injury in a single area of the body can adversely affect regions away from the injury site [8][9][10]; this is known as regional interdependence. Regional interdependence refers to how or why an injury or habitual movement pattern in one area of the body may be contributing to altered movement patterns in another region of the body [11].
Previous pain or injury can affect future movement due to the proprioceptive representation of the involved body part in the primary sensory cortex changing [12]. This in turn may have future implications for motor output and motor control as these representations are the maps utilised to plan and execute movement; if the representation of a body part is inaccurate, then motor control may be compromised, potentially creating a feedback loop of mutual degradation [12,13].
Utilising standardised, validated outcome measures is an explicit requirement of the Quality Assurance Standards [14] and the HCPC Standards of Proficiency for Physiotherapists [15]. It is therefore apparent that a valid and reliable outcome measure that assesses multiple aspects of function simultaneously is required [8]. The Functional Movement Screen (FMS) is gaining notoriety among clinicians as an outcome assessment tool to quantify movement patterns [3,16], with the developers of the FMS describing the tool as a 'ranking and grading system that documents movement patterns that are deemed key to normal function' [17]. The FMS consists of 3 screening tests and 7 component tests: the Deep Squat (DS), Hurdle Step (HS), Inline Lunge (ILL), Shoulder Mobility (SM), Active Straight Leg Raise (ASLR), Trunk Stability Push-Up (TSPU) and Rotary Stability (RS). These are then scored using a categorical scale ranging from 0-3, resulting in a potential overall score ranging from 0-21. The scoring method is described in detail in our previous study [20] and for a detailed description of how each component/screening movement is performed, the reader is directed to Cook et al. (2010) [17].
Prior to considering the validity of an assessment tool, it is important to first determine whether it is reliable [18]. The FMS consists of three clearing tests to determine the presence of pain, graded as positive or negative, and seven individual movement tests that are each graded on a scale of 0-3, with a composite score calculated when all seven movement tests are completed with a total available score of 21 [3]. A score of 3 corresponds with the individual being able to correctly complete the movement without compensation; a score of 2 corresponds with the individual being able to perform the movement with compensation; a score of 1 corresponds with the individual being unable to perform the movement; a score of 0 corresponds with the individual reporting pain at any time when performing the movement [19]. A recent study conducted by Palmer, Cuff and Lindley (2017) questioned the intra-rater reliability of the FMS amongst NHS clinicians, both specialist musculoskeletal (MSK) and rotational Physiotherapists [20]. Furthermore, the inter-rater reliability is unclear, with reports ranging from fair to excellent [16,21], and has not been determined amongst UK public sector clinicians [20].
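As an illustration of the scoring arithmetic described above, the following minimal Python sketch sums the seven component scores (each 0-3) into a composite score out of 21. The function name, the dictionary layout and the example scores are hypothetical; they are not taken from the FMS documentation or from this study's data.

```python
# Abbreviations of the seven FMS component tests, as used in this paper.
FMS_TESTS = ("DS", "HS", "ILL", "SM", "ASLR", "TSPU", "RS")


def composite_score(component_scores):
    """Sum the seven component scores (each 0-3) into a 0-21 composite."""
    missing = [test for test in FMS_TESTS if test not in component_scores]
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    for test in FMS_TESTS:
        score = component_scores[test]
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{test} score must be 0, 1, 2 or 3, got {score!r}")
    return sum(component_scores[test] for test in FMS_TESTS)


# Hypothetical participant, for illustration only.
example = {"DS": 2, "HS": 2, "ILL": 3, "SM": 3, "ASLR": 2, "TSPU": 1, "RS": 2}
print(composite_score(example))  # 15
```

The sketch rejects incomplete score sheets because the composite is only calculated when all seven movement tests have been completed.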
Currently, the inter-rater reliability of the FMS has not been determined amongst clinicians working specifically within the UK public health sector, limiting the generalisability of previous studies where raters have been Athletic Trainers, Private Physiotherapists or Students. Additionally, only one study has investigated the reliability of the FMS amongst novice raters [22]; however, the 20-hour intensive rater training and use of only Physiotherapy students limits application to clinical practice. The samples of populations used in these studies to date are grossly homogenous, utilising healthy physiotherapy students [19,21], military recruits [22], or high level athletes [16,23], further limiting application to clinical practice and potentially affecting the reliability coefficient produced [18].
When assessing the reliability of a scale, it is important that intra-rater reliability is determined prior to assessing inter-rater reliability [24]. Inter-rater reliability represents all potential errors encountered within intra-rater reliability as well as the potential for error between raters. Therefore, a scale with excellent inter-rater reliability would be suggestive of high intra-rater reliability; however, a scale with excellent intra-rater reliability does not necessarily suggest high inter-rater reliability [24].
The intra-rater reliability of the FMS when used by novice National Health Service (NHS) Physiotherapist raters has been shown to be excellent (ICC 0.91; 0.89-0.93 95%CI) as part of a concurrent study [20].
Therefore, the primary objective of this study was to determine the inter-rater reliability of the FMS when used by novice NHS Physiotherapist raters. The secondary objective was to determine whether clinical specialism has any impact on the inter-rater reliability of the FMS.

Methods

Participants
A purposive convenience sample of 40 participants was obtained through word of mouth, recruitment posters and University email; participants were recruited over a three-month period from Sheffield, South Yorkshire. Participants were considered eligible for inclusion if they were over 18 years of age and were able and willing to adhere to trial procedures. Participants were excluded if they refused participation, reported pain on any of the three FMS clearing tests, answered 'Yes' to any questions on the PAR-Q, had received treatment for, or reported having, any musculoskeletal pathology in the last six weeks, were pregnant, or had a cardiac medical history, hypertension or neurological impairment (Table 1).
Potential participants had the study explained to them and were informed of the entrance criteria. Each individual completed a PAR-Q questionnaire to assess safety and suitability for inclusion; the PAR-Q is designed to identify adults for whom physical activity might be inappropriate without seeking prior medical advice [25]. Potential participants were then asked to complete the three FMS clearing tests consisting of the impingement, prone press-up and posterior-rocking clearing tests [17].
Those individuals who met the entrance criteria were then provided with an information sheet and given the opportunity to volunteer, before signing the informed consent form and being entered into the study. Participants provided informed consent for inclusion in both the intra-rater [20] and inter-rater reliability studies.

Raters
A purposive judgmental sample of six NHS Physiotherapists without prior FMS experience was obtained through word of mouth and recruitment posters from three NHS trusts in South Yorkshire, West Yorkshire and Derbyshire, UK. The raters consisted of three specialist musculoskeletal and three non-specialist rotational Physiotherapists. Eligible Physiotherapists who met the relevant inclusion/exclusion criteria according to their level of specialism (Tables 2 and 3) completed an informed consent form and were entered into the study.

Training
The six raters attended a two-hour training session in order to introduce and train them in using the FMS. This session was delivered by a certified FMS practitioner and consisted of an introduction to the FMS, the seven individual movement tests and the three relevant modifications for the RS, DS and PU tests. The raters were shown the three FMS clearing tests and informed that these tests were part of the exclusion criteria for eligible participants. The raters were given a detailed explanation of the 0-3 categorical scoring system used for each individual movement test and were shown two example videos to gain familiarity with the FMS; both videos were in the format to be used in the rating session to ensure familiarisation with study procedures and the scoring system.

Procedures
Participants were recorded from both sagittal and coronal views using two Sony HDR-XR260 (Sony Corporation, Minato, Tokyo, Japan) camcorders whilst completing the seven individual test movements. Camera placement is illustrated in Figure 1 and followed the video method previously validated by Shultz et al (2013) [16].
The seven individual test movements were then explained and demonstrated to the participants by a clinician experienced in the use of the FMS using an official FMS Test Kit (Functional Movement Systems Incorporated, Chatham, Virginia, USA).
Each participant was also filmed completing the relevant modifications for the DS, PU and RS tests. Participants repeated each movement test three times in accordance with the FMS protocol as described by Cook [20].

Rating sessions
Each rater was set up on two individual computers. Computer one showed a continuous video of all (non-modified) FMS movements for each participant; computer two contained the modifications of the DS, PU and RS tests. This was done to allow the raters to easily transition between the continuous (non-modified) FMS video and viewing the modification videos when applicable. The rater was instructed to only view the DS, PU or RS modification video if they felt that the participant did not score a '3' in that movement test. To prevent raters from analysing still images of FMS component tests, they were only permitted to pause the continuous FMS videos during introductory/transitionary screens [20]. This was to allow for breaks as required and to allow transition to observe modification videos on computer two.
Raters completed two rating sessions, two weeks apart. This two-week wash-out period was used to minimise the potential for recollection of previous scoring, which could have biased results. To further minimise bias, the order in which raters viewed the participants' videos between the two rating sessions was randomised using online software [26] and raters were blinded to their previous scores as well as those of the other raters.
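The randomisation step above can be sketched as follows. The study itself used online software [26]; this Python version is only a stand-in for illustration, and the function name, the seed value and the use of participant numbers 1-40 as video identifiers are assumptions.

```python
import random


def viewing_orders(participant_ids, n_sessions=2, seed=None):
    """Return one independently shuffled viewing order per rating session."""
    rng = random.Random(seed)  # a fixed seed makes the orders reproducible
    orders = []
    for _ in range(n_sessions):
        order = list(participant_ids)  # copy so the input is left untouched
        rng.shuffle(order)
        orders.append(order)
    return orders


# Hypothetical identifiers for the 40 participant videos.
participants = list(range(1, 41))
session_one, session_two = viewing_orders(participants, seed=2024)
print(session_one[:5], session_two[:5])
```

Each session receives its own shuffle of the same 40 videos, so every rater still scores every participant in both sessions.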

Statistical analysis
A weighted kappa (kw) statistic with quadratic weighting and 95% Confidence Interval (95% CI) was used to determine the inter-rater reliability of component scores [27].
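For readers unfamiliar with the statistic, the following is a minimal Python sketch of a weighted kappa for one pair of raters scoring on the 0-3 scale. It is not the MedCalc implementation used in this study, it does not compute the 95% CI, and how agreement was summarised across more than two raters is not addressed here. The function name, the `weighting` argument and the example ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories=(0, 1, 2, 3), weighting="quadratic"):
    """Cohen's kappa for two raters, with quadratic or unweighted disagreement."""
    if weighting not in ("quadratic", "unweighted"):
        raise ValueError("weighting must be 'quadratic' or 'unweighted'")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")

    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Observed proportion of items in each (rater A score, rater B score) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weighting == "quadratic":
            # Penalty grows with the squared distance between the two scores.
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0  # any disagreement counts equally

    observed_dis = sum(disagreement(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_dis / expected_dis


# Hypothetical scores from two raters for six participants on one movement test.
a = [0, 1, 2, 3, 1, 2]
b = [0, 1, 2, 3, 2, 1]
print(weighted_kappa(a, b), weighted_kappa(a, b, weighting="unweighted"))
```

In this example the two disagreements are each only one scale point apart, so the quadratic-weighted coefficient is higher than the unweighted one, which treats every disagreement as total.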
The inter-rater reliability of composite FMS scores was assessed using an Intraclass Correlation Coefficient (ICC) with 95% CI, and the composite scores were assessed for normal distribution using the Kolmogorov-Smirnov test [27].
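As a sketch of how an ICC is obtained from a participants-by-raters table of composite scores, the Python function below computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The choice of this form is an assumption made for illustration, since the ICC model used in this study is not stated here; the function name and the example scores are also hypothetical, and no confidence interval is calculated.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per participant, one column per rater."""
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two participants are required")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every participant needs scores from the same 2+ raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: participants, raters, residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical composite scores: five participants (rows) by three raters (columns).
scores = [
    [14, 15, 14],
    [9, 10, 11],
    [17, 17, 16],
    [12, 13, 12],
    [19, 18, 19],
]
print(round(icc_2_1(scores), 2))
```

The coefficient is the share of total variance attributable to differences between participants, so systematic differences between raters lower it as well as random scoring error.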
Data were analysed using MedCalc for Windows (Version

Results
Forty participants (20 Males, 20 Females) met inclusion/exclusion criteria and completed the study. Three participants (2 Males, 1 Female) withdrew from the study before rating had commenced due to scheduling difficulties. The mean age of the participants was 28.9 years (range 18.9-60.3; SD ± 11.65). Participants included demonstrated a wide range of routine physical activity on University of California at Los Angeles (UCLA) Activity Scoring [29] (range 3-10; mean 7.75 ± 2.4) (Table 4).
The total FMS scores of all participants were within a range of 5-19. The Kolmogorov-Smirnov test showed that the FMS composite scores were not normally distributed across the six raters (p<0.001).
The inter-rater reliability of the FMS composite scores resulted in an ICC of 0.82 (95% CI: 0.41-0.93) and was considered excellent (Table 5).
The seven individual movement tests of the FMS demonstrated poor to excellent inter-rater reliability (Table 6). The HS was the least reliable of the movement tests (kw = 0.15, 95% CI: -0.09-0.38), whereas the SM was the most reliable of the movement tests (kw = 0.85, 95% CI: 0.72-0.97).
The seven individual movement tests of the FMS demonstrated moderate to excellent inter-rater reliability between non-specialist rotational Physiotherapists (

Discussion
The primary aim of this study was to determine the inter-rater reliability of the FMS when used by novice NHS Physiotherapists. moderate-excellent reliability [28]. The findings of this study correspond with the current evidence base with regard to the inter-rater reliability of the FMS composite score. Previous studies have demonstrated inter-rater reliability for the composite score to be fair-excellent [21,23,31]; and Smith et al. (2013) demonstrated excellent inter-rater reliability [19].
Fair inter-rater reliability has been demonstrated by Shultz et al. (2013) (Kα=0.38, 95% CI 0.35-0.41). The participants used in that study represent a small, homogenous sample of elite athletes with total scores ranging from 14-20 [16]. The Krippendorff's α (Kα) statistic calculates reliability based upon the differences in the observed range of scores included in the analysis. The ICC, however, which is ubiquitous elsewhere in the literature investigating inter-rater reliability (Table 10), focuses on the ratio of total variance and variance between groups; thus the use of Kα may explain this anomalous result in the literature [33].
The clinical utility of the FMS composite score has been questioned [32]. The authors conducted a factor analysis of the FMS amongst a large sample of 934 high-level athletic participants and demonstrated that the FMS does not represent a unidimensional construct and that each individual movement test may be measuring a separate multidimensional construct. When the heterogeneity of the individual movements is considered, these findings are not surprising. The authors concluded therefore that each individual movement test may offer greater clinical utility than the composite score, recommending the use of the composite score with caution as it is not clear what the FMS composite is aiming to measure.
Results from Kazman et al. (2014) are interesting to consider within the context of this study and the findings of previous studies of FMS inter-rater reliability [32]. The inter-rater reliability of the FMS component scores ranged from poor to excellent (Table 6), with five of the seven individual movement tests demonstrating less than an acceptable level of reliability; acceptable reliability defined as >0.70 [30]. With the utility of the FMS composite score being questioned and some authors suggesting use of the composite score with caution [32], the findings of this study, which are in accordance with    [21]. This is in contrast to the findings of this present study; despite both studies being of moderate quality, they are limited by the inclusion of only one pair of raters, unclear blinding of raters, and the utilisation of an unweighted kappa statistic [31]. This represents a strong limitation, as the unweighted kappa statistic is appropriate for determining the reliability of nominal data and, as such, will only demonstrate whether raters agree or disagree; it does not take the level of disagreement into consideration [18].
The current literature (Tables 6 and 10) does not show any of the component movement tests to consistently demonstrate an acceptable level of reliability [34]. Given the apparent variability and inconsistency in the reliability coefficients, clinical use of the individual movement tests is advised only with caution.
The secondary aim of this study was to determine whether clinical specialism impacts on the inter-rater reliability of the FMS when used by NHS Physiotherapists. In contrast to our findings relating to the intra-rater reliability of the FMS [20], non-specialist rotational Physiotherapists showed excellent inter-rater reliability for composite scores (ICC=0.89, 95% CI 0.78-0.94) in comparison to musculoskeletal specialists, who demonstrated good inter-rater reliability (ICC=0.79, 95% CI 0.19-0.92). The clinical implication of this is that the FMS can be used by both specialist and non-specialist Physiotherapists with acceptable reliability.
There was no consistent difference between the non-specialist rotational Physiotherapists and the specialist musculoskeletal Physiotherapists with regard to the component scores (Tables 8 and 9). This is again in keeping with our findings of the intra-rater reliability of FMS component scores [20].
The developers of the FMS propose that the FMS is a 'ranking and grading system that documents movement patterns that are key to normal function', labelling the individual movement tests as 'functional' [17]. However, what is functional for one individual may not be functional for another individual; the FMS is limited as it attempts to define function for the individual rather than letting the individual define their function to the clinician [35]. Furthermore, the FMS assumes that there is a correct way to move and that any deviation from this is abnormal, a dysfunction that needs 'correction' [17]. This is in contrast to good quality evidence suggesting that variability is a common feature of human movement and is intrinsic to all biological systems [36], and it omits the important role that individual psychology and cognition have upon human movement [37]. It would therefore appear that the FMS is engrained within the biomedical model.
It has been suggested that it is currently not possible to identify the complete optimal solution for a given movement and that clinicians need to demonstrate more individualised clinical assessment procedures [38]. Furthermore, variability of movement is inherent and unavoidable in individuals due to the constraints that shape behaviour [39]. Variations of movement within individuals and deviations in movement between individuals may replicate attempts to maximise the variability to help the individual adapt to the demands of any given situation depending on the environment. The recognition of this variability and the understanding that movement tests are not always achieved in the same way both within and between individuals suggests the need to re-evaluate the application of the biomedical model in assessing movement [39]. The question is therefore, is the FMS testing the function of the individual as the developers propose, or is the FMS merely testing the seven component movement tests? All movement is a skill [40] and extensive motor practice can reorganise movements so that individuals acquire that skill and in turn become better at that movement [41]. By removing context and the environment, and therefore defining function for the individual, a low score on one of the individual FMS movement tests may actually represent an unpractised skill as opposed to a faulty movement pattern or dysfunction. Consequently, does change in FMS score within individuals over time or following 'corrective' exercise prescription [17] demonstrate correction of the limitations that the FMS is proposed to identify, or learned effects as the individual becomes more practised at the individual movement test?
The need for a valid and reliable outcome measure that assesses multiple aspects of function simultaneously is apparent [8]. Currently, the literature does not support the use of the FMS as this outcome measure. The clinical application therefore is the need for clinicians to demonstrate a more individualised clinical assessment when analysing an individual's movement, to understand the context, psychology and biomechanics of an individual's function in order to determine how the individual interacts and performs their functional movement [39].
This study has contributed to the reliability literature of the FMS by addressing limitations in previous research investigating inter-rater reliability. Participants in previous studies have been grossly homogenous, whereas participants in this study were more representative of the population seen in routine clinical practice [42] and demonstrated broad variability in both age and routine activity levels (Table 4). Although the sample of participants used should be regarded as a strength of this study, the exclusion of symptomatic individuals limits the representation of clinical practice. Future studies should consider the inclusion of individuals with injuries.
This study recruited 40 participants; a sample of 40 participants is required to achieve an ICC >0.6 if the measure is truly reliable [43]. Only one previous study has included more participants when investigating the inter-rater reliability of the FMS [22]; however, the raters did not rate all participants (n=64), with each rater measuring only between 14 and 18 participants. Therefore, with all raters in this present study rating each participant, it can be concluded that this is the largest reliability study investigating the inter-rater reliability of the FMS.
Another limitation of this investigation is that a standardised video method was utilised to record participants completing the FMS; although this video method has previously been validated [16], it may not replicate routine clinical practice.
Ordinal data is regarded as non-parametric data; however, non-parametric data can behave like parametric data if the data set is normally distributed [44,45]. The data set in this study was not normally distributed (KS p<0.001) and therefore the use of an ICC may potentially represent an over-inflation of the inter-rater reliability of FMS composite scores. It is not apparent from the current literature whether any previous investigators have assessed for normal distribution of the data before utilising an ICC, so this suggests a potential limitation of the inter-rater reliability evidence as a whole.
It is apparent that for the FMS to continue to be used clinically, further research is required. Future studies should aim to replicate the results of Kazman et al. (2014) with regard to determining the dimensionality of the FMS within a more heterogeneous sample population [32]. With the use of the composite score appearing to lack clinical utility, there is a need to examine the construct validity of the FMS. Additionally, the reliability literature is limited by small sample sizes and, as such, wide confidence intervals and in turn poor precision with regard to reliability coefficients. Future research may consider investigating the reliability of each individual movement test and should consider the utilisation of guidelines to improve the quality of reporting [43,46]. Investigations of the reliability of each individual movement test would be time efficient and allow the use of large sample sizes not currently evident in the literature [43,46].

Conclusion
The FMS represents a good attempt to objectify the subjective with the FMS composite score demonstrating excellent inter-rater reliability. However, it does not appear that the FMS composite score offers much in the way of clinical utility and therefore should be interpreted and utilised with caution.
The inter-rater reliability of the FMS component scores is varied and inconsistent in the literature with the results from this study suggesting that five of the seven individual movement tests do not demonstrate acceptable reliability for clinical use; supporting the findings from our previous paper [20].
With the composite score lacking construct validity and the majority of the component scores lacking both intra- and inter-rater reliability, the continued use of FMS within clinical practice is not supported.