Measuring health workers’ motivation composition: validation of a scale based on Self-Determination Theory in Burkina Faso

Background: Although motivation of health workers in low- and middle-income countries (LMICs) has become a topic of increasing interest to policy makers and researchers in recent years, many aspects are not well understood to date. This is partly due to a lack of appropriate measurement instruments. This article presents evidence on the construct validity of a psychometric scale developed to measure motivation composition, i.e., the extent to which motivation of different origin within and outside of a person contributes to their overall work motivation. It is theoretically grounded in Self-Determination Theory (SDT).
Methods: We conducted a cross-sectional survey of 1142 nurses in 522 government health facilities in 24 districts of Burkina Faso. We assessed the scale’s validity in a confirmatory factor analysis framework, investigating whether the scale measures what it was intended to measure (content, structural, and convergent/discriminant validity) and whether it does so equally well across health worker subgroups (measurement invariance).
Results: Our results show that the scale measures a slightly modified version of the SDT continuum of motivation well. Measurements were overall comparable between subgroups, but results indicate that caution is warranted if a comparison of motivation scores between groups is the focus of analysis.
Conclusions: The scale is a valuable addition to the repository of measurement tools for health worker motivation in LMICs. We expect it to prove useful in the quest for a more comprehensive understanding of motivation as well as of the effects and potential side effects of interventions intended to enhance motivation.
Electronic supplementary material: The online version of this article (doi:10.1186/s12960-017-0208-1) contains supplementary material, which is available to authorized users.


Background
Recent years have witnessed an increased awareness of the paramount importance of a motivated health workforce for the functioning of health systems, particularly in countries burdened by severe resource limitations [1]. Interventions targeting health worker motivation such as performance-based financing (PBF) have become extremely popular among policy makers in low- and middle-income countries (LMICs) [2,3]. Despite the attention such interventions are receiving, gaps in understanding remain. In particular, the mechanisms through which interventions bring about motivational changes and potential side effects thereof remain poorly understood [4][5][6][7][8]. For instance, there is an ongoing debate around whether the monetary incentives involved in PBF undermine intrinsic motivation ("crowding out effect") [5].
The limited availability of context-adapted research tools to study motivation is a major factor contributing to this knowledge gap. Research on health worker motivation in LMICs has mostly focused on the overall amount or on determinants and outcomes of motivation, leaving other relevant dimensions discussed in the psychological literature such as motivation composition relatively unexplored [4,9]. Corresponding quantitative measurement tools (e.g., [10][11][12][13]), while without doubt useful to answer many research questions, are not suited to others, including that around the crowding out effect which deals with a shift in motivation composition from intrinsic to extrinsic forms.
Against this background, this article contributes to expanding the methodological repository for health worker motivation research by presenting evidence on the construct validity of a newly developed psychometric scale to measure health worker motivation composition. We define motivation composition as the extent to which motivation of different origin within and outside a person contributes to their overall work motivation. The scale is theoretically grounded in Deci and Ryan's Self-Determination Theory (SDT) [4,14] and was developed for use in questionnaires or structured interviews. It assesses general motivation towards work rather than task- or situation-specific motivation. The article presents evidence for the scale's validity from a structured survey with nurses in Burkina Faso. Table 1 contains our specific research questions.

The self-determination continuum of motivation
Self-Determination Theory was introduced in the mid-1980s as a general framework of human motivation [14] and has since been extensively studied and further refined [15]. As part of the overall theory, SDT proposes the self-determination continuum of motivation (Fig. 1), a taxonomy of five major dimensions of motivation that are distinguished by the extent to which they stem from contingencies outside the person (controlled motivation) or originate within the person (autonomous motivation) [16]. The scale validated in this article measures these five motivation dimensions. Motivation originating fully within the person, such as pure enjoyment of a task, is termed intrinsic motivation in SDT. Extrinsic motivation, in contrast, refers to motivation derived from an instrumental purpose of behavior. External regulation corresponds to what is usually referred to as extrinsic motivation: the wish to attain or avoid some consequence. SDT differentiates three additional dimensions of extrinsic motivation by the degree to which the associated contingencies have become part of the person's self: introjected regulation refers to motivation derived from self-pride, reputation, or feelings of duty, identified regulation to motivation driven by recognition of the importance of one's job, and integrated regulation to full congruency between one's personal goals and values and those of one's job. They differ from external regulation in that they do not need to be maintained from the outside through rewards or punishment. However, they are not fully intrinsic as corresponding behavior is instrumental in catering to a person's set of values and goals rather than performed out of pure interest or enjoyment. A large body of research has linked autonomous forms of motivation to more favorable performance and other outcomes (e.g., wellbeing, organizational commitment) than controlled forms of motivation [9,15,17].
The validity and usefulness of the SDT taxonomy has been confirmed in a wide range of work settings, although mostly in North America and Europe [15]. However, the few studies from LMIC (non-healthcare) settings [18] and the (non-SDT-based) literature on health worker motivation in LMICs suggest its validity in LMIC healthcare contexts as well. Specifically, sources of motivation identified by the latter correspond well to the five dimensions differentiated by the SDT taxonomy (e.g., [4,7,8,[10][11][12][13][19][20][21][22][23][24][25]). For a theoretical application of SDT and the taxonomy to LMIC healthcare settings, see [4].

Study context
Burkina Faso's healthcare delivery system relies primarily on the public sector which manages approximately 80% of healthcare facilities. Primary healthcare services are mostly provided by nurses, midwives, and assistant nurses and midwives. Like many other LMICs, Burkina Faso's health system suffers from multiple challenges including a shortage of certain health worker cadres, their unequal geographical distribution, and challenging working conditions including low pay, substandard infrastructure and equipment, poor supervision, shortages in drugs and other supplies, and few incentives for individual high performance [26][27][28]. In 2014, the Ministry of Health with support from the World Bank implemented a PBF pilot intervention to strengthen the healthcare system by addressing some of these challenges. Our study took place in the context of the impact evaluation of this intervention.

Motivation composition measure
The psychometric scale to measure motivation composition was developed by our research team prior to the validation study presented in this article. A detailed description of this process can be found in Additional file 1. A pretest confirmed the scale's content validity, supporting the validity of the SDT taxonomy in the context and affirming that the items cover the five motivation constructs well and in context-appropriate language.
Similar to other SDT-based measures (e.g., [18,29]), the scale's measurement rationale is grounded in the idea that individuals will reveal their underlying motivation composition in the reasons they provide for their actions. Following an introduction, a reflective exercise, and a guiding question ("Why are you motivated to work?"), respondents are thus presented with 26 reasons for which they might be motivated to work (4-12 per motivation dimension; see Additional file 1). They are asked to indicate, on an 11-point scale and with a visual aid, the degree to which each of these reasons is important for their personal work motivation. Respondents' answers are then used to derive an estimate of their underlying motivation level on the five dimensions.
Fig. 1 The self-determination continuum of motivation. Legend: adapted from [15,16]
In order to counteract diverse response biases, we used a hybrid mode of administration, with interviewers reading out instructions and items but interviewees recording their own answers on a separate questionnaire copy. The questionnaire was administered in French in light of the high French proficiency level of Burkinabé health workers. One explicit aim of the validation analyses was the selection of a subsample of items for a final shorter and easy-to-administer scale.

Sample
We assessed the scale's validity with data from a structured health worker survey implemented between October 2013 and March 2014 in the context of the abovementioned PBF impact evaluation baseline. Accordingly, the sampling strategy was aligned with the cluster sampling strategy of the impact evaluation [30]. Data was collected from approximately two thirds of all government health facilities in 24 districts of six regions of the country. Research assistants were instructed to interview all nurses, midwives, and assistant nurses and midwives in 498 primary facilities, as well as selected staff in 24 secondary-level facilities, who were present on the day of the study team visit. Fifty-five per cent of all nursing and midwifery staff were on duty and present on the day of facility visit. Of those, interviewers were able to interview approximately 80%, resulting in a total sample size of 1142 (per facility: mean = 2.2, sd = 1.6, min = 1, max = 11). In addition to the motivation scale, the survey contained questions on training, clinical knowledge, compensation, and working conditions. Data was collected on paper and digitized using a double data entry strategy. Table 2 shows the sample distribution on key characteristics.

Structural validity analyses
The structural validity analyses (research question 1 (RQ1)) aimed to confirm that the scale measures the motivation dimensions of the SDT continuum as intended. We first conducted a thorough integrated semantic and psychometric item analysis (including inspection of item distribution and correlation patterns; in Stata 12), in response to which we excluded 8 items from the initial 26-item scale due to suboptimal psychometric properties or phrasing (see Additional file 2). The remaining 18 items were subsequently subjected to a confirmatory factor analysis using structural equation modeling (SEM). In line with standard SEM terminology, we refer to the five motivation dimensions as "factors" from here forward. We tested the five-factor model corresponding to Fig. 1 against the three theoretically viable alternative models in Table 3, which emerged as alternative taxonomies during the scale development process or have shown good model-data fit in previous research (e.g., [18,31]). All modeling was performed with Mplus 7.31, using a maximum likelihood estimator with robust standard errors to account for our non-normal data distribution. Standard errors were adjusted according to the clustered sample structure. Missing values were handled with Mplus' standard full information procedure. All factors were allowed to covary. No cross-loadings or correlated item residuals were specified to facilitate interpretation in light of potential use of the scale with composite scores. Models were evaluated with standard fit indices, including χ², comparative fit index (CFI), standardized root mean square residual (SRMR), and root mean square error of approximation (RMSEA), and compared to each other with the Akaike information criterion (AIC) [32]. For the best-fitting Model C, we inspected all model parameters, Mplus' modification indices, factor correlations, and Cronbach's α.
We eliminated 3 further items in the model-fitting process (see Additional file 2), arriving at the final 15-item scale in Table 4. All results presented in this article are based on this final 15-item scale.
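To make the fit indices named above more concrete, the following is a minimal Python sketch of commonly used textbook formulas for RMSEA, CFI, and AIC. It is an illustration only, not the computation performed by Mplus: software packages differ in details (for example, whether N or N − 1 enters the RMSEA denominator, and how robust estimators rescale χ²). All function names and input values are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA from a model chi-square, its degrees of freedom, and sample size n.

    Uses the common form sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    Assumption: some programs use n instead of n - 1.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index of the tested model relative to a baseline model."""
    d_model = max(chi2_model - df_model, 0.0)
    # The denominator is the larger of the two noncentrality estimates.
    d_base = max(chi2_baseline - df_baseline, d_model)
    if d_base == 0.0:
        return 1.0
    return 1.0 - d_model / d_base


def aic(loglik, n_params):
    """Akaike information criterion; smaller values indicate a better trade-off
    between fit and model complexity."""
    return 2 * n_params - 2 * loglik


# Hypothetical values, chosen only to show how the functions are called.
print(round(rmsea(chi2=200.0, df=80, n=1142), 3))
print(round(cfi(200.0, 80, 3000.0, 105), 3))
print(aic(loglik=-1000.0, n_params=50))
```

When competing models are fitted to the same data, the one with the lowest AIC is preferred, which is how the alternative factor models are ranked against each other in the text.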

Generalizability analyses
The generalizability analyses aimed to confirm that the scale measures the same motivation dimensions equally well in different sample subgroups ("measurement invariance"; RQ2). This is a necessary requirement for later substantive analyses aiming to compare motivation across different health worker subgroups. Specifically, we tested the scale for invariance across sexes, seniority levels, and qualification levels. Following the steps outlined in Table 5 [33], Model C was simultaneously estimated in each respective subgroup, with an increasing number of parameters restricted to equality between subgroups in each testing step. The scale is measurement invariant at each level when the added equality restrictions do not lead to significantly worse model fit [34].
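Each invariance step compares a more restricted model with the next less restricted one. With a robust maximum likelihood estimator, this comparison is usually made with a rescaled likelihood ratio test. The sketch below follows the formula documented by Mplus for difference testing with loglikelihoods and scaling correction factors; the function names and all numeric inputs are hypothetical and serve only as an illustration, not as a reproduction of the analyses reported here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, nu = math.exp(-half), 2              # closed form for df = 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    while nu < df:
        # Recurrence: Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


def scaled_lr_test(ll_restricted, c_restricted, p_restricted,
                   ll_free, c_free, p_free):
    """Rescaled likelihood ratio test between two nested models.

    ll_*: loglikelihoods, c_*: scaling correction factors,
    p_*: numbers of freely estimated parameters.
    """
    df = p_free - p_restricted
    # Scaling correction for the difference test.
    cd = (p_restricted * c_restricted - p_free * c_free) / (p_restricted - p_free)
    stat = -2.0 * (ll_restricted - ll_free) / cd
    return stat, df, chi2_sf(stat, df)


# Hypothetical values for a less restricted and a more restricted model.
stat, df, p = scaled_lr_test(ll_restricted=-1010.0, c_restricted=1.20, p_restricted=40,
                             ll_free=-1000.0, c_free=1.25, p_free=50)
print(stat, df, p)
```

A nonsignificant p-value would be read as support for invariance at the tested level. Note that the correction term can occasionally become negative in practice, in which case the statistic is not interpretable and alternative procedures are needed.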

Convergent/discriminant validity analyses
The convergent/discriminant validity analyses aimed to provide further evidence that the scale measures the SDT taxonomy as intended by relating motivation with constructs for which relationships with the SDT motivation dimensions are relatively well established (RQ3). If the new scale does indeed measure what it is intended to measure, relationships with external constructs should approximately correspond to those found in previous research, contextual differences taken into consideration. Specifically, we related motivation to organizational support, organizational commitment, and intentions to quit.
Details on hypotheses and measurement of the external constructs are provided in Table 6. We built a separate model for each external construct by adding a measurement model for the respective construct to Model C, allowing the external construct factor to covary with all five motivation factors.

Structural validity
The structural validity analyses aimed to confirm that the scale does indeed measure the different motivation dimensions of the SDT continuum. We intended to test the "pure" SDT model (Fig. 1; Model A in Table 3) against three theoretically viable alternative models. Unfortunately, Model A could not be estimated with the final subset of items as we were only able to retain one integrated regulation item. Table 7 presents fit statistics for the three alternative models. Model C, which combines the integrated and identified dimensions but differentiates external regulation into a social and an economic subcomponent, clearly demonstrated the best fit. χ² was significant as expected given our relatively large sample size, high factor correlations, and non-normally distributed data [32,35] but of a magnitude that does not warrant concerns for model fit. All other fit indices were good in absolute terms, indicating that the modified five-factor model is well represented in the data. All following results thus pertain to Model C. A graphic representation including standardized coefficients for all estimated parameters as well as modification indices is given in Additional file 2. For each motivation factor, item-factor loadings are of relatively similar magnitude; the items thus indicate the respective factor with similar strength. Modification indices signal that some items, particularly ext6 and ext7, load to some extent on factors other than the intended one. Overall, however, such cross-loadings are low in magnitude, indicating good item discriminatory power. Although also mostly low in magnitude, modification indices show many residual (error term) correlations, particularly for the external regulation (EXT) items. Factor correlations (Table 8) display the expected simplex pattern, i.e., decreasing magnitude with decreasing conceptual closeness. Cronbach's α is relatively low for all factors.

Table 6 (excerpt). Columns: Construct and hypotheses; Measurement
Organizational support: extent to which respondents feel supported by their supervisor and coworkers, both technically and emotionally.
Intentions to quit: measured with three items partly adopted from [11] (α = .72). Item examples: "I often feel like leaving my job."; "Accepting to work for this facility was a mistake." Response scale: 0 (do not agree at all)-10 (completely agree) with visual aid (analogous to the motivation measure).

Generalizability
The measurement invariance analyses aimed to confirm that the scale has the same measurement properties in different subsamples and that measurements (scores, variances, etc.) can thus be compared between health worker subgroups. Table 9 shows the results for sex, seniority, and health worker qualification level. The scale is fully invariant for seniority in healthcare. Only partial measurement invariance could be established for sex. Specifically, women scored higher than men on intro1 and ext6, but lower on intro2, at the same underlying levels of introjected and external regulation, respectively (scalar non-invariance). This raises concerns about the comparability of factor means for the subscales concerned. However, as intro1 and intro2 are biased in opposite directions by about the same magnitude, we can assume that the biases cancel each other out. For ext6, considering that it is only one of four items measuring economic external regulation and the systematic difference in scoring is relatively small, we can also assume that the overall bias is of little practical relevance [33]. We could also establish only partial scalar invariance for qualification level. Item ext7 had a somewhat higher factor loading (i.e., the item is more strongly indicative of the factor) in fully qualified than in assistant nurses (metric non-invariance). At the scalar level, fully qualified nurses systematically scored higher on intro1, ext7, and im3 and lower on intro2. Following similar reasoning as for sex, however, we can reasonably assume that these systematic differences do not substantially threaten comparability between groups.

Convergent/discriminant validity
The convergent/discriminant validity analyses aimed to provide additional evidence that the scale measures what it was intended to measure by relating motivation to other variables with which the relationship is well established. Table 10 shows correlations of the motivation factors with the three constructs introduced in Table 6. Correlation patterns are generally in the expected directions, supporting the notion that the scale measures the SDT continuum of motivation well. Organizational support and organizational commitment are more strongly related to introjected regulation than expected based on previous research. Correlations of all motivation factors with intentions to quit are weaker than expected. These results are likely substantive findings reflecting realities in the specific context rather than indications of measurement issues, however [6].
Legend: IM intrinsic motivation factor, IDEN integrated/identified regulation factor, INTRO introjected regulation factor, EXT external regulation factor, EXT-S external regulation-social factor, EXT-E external regulation-economic factor, AUT autonomous motivation factor, CTRL controlled motivation factor

Discussion
The paper presents evidence on the validity of a newly developed scale to measure motivation composition of health workers, i.e., the relative contribution of different kinds of motivation to their overall work motivation, from a sample of nurses in Burkina Faso.
Our findings show that the scale measures a somewhat modified version of the SDT continuum of motivation well and relatively consistently in different health worker subgroups. Specifically, our analyses suggest that the scale is not able to distinguish between integrated and identified regulation. This finding is in line with what emerged during the scale development process and with previous attempts to measure the SDT continuum [18,29]. From an applied perspective, not distinguishing the two dimensions is even advantageous insofar as policy implications are similar and interpretation is thus facilitated. Our analyses further suggest separating external regulation into a social dimension, including aspects of social interaction and recognition, and an economic dimension, pertaining to the economic security one's job provides. Again, such a distinction is sensible from an applied point of view in light of the different policy implications related to the two dimensions. The modified taxonomy measured by the scale is visualized in Fig. 2.
Legend: Interpretation of the absolute model fit indices [32]: Nonsignificant χ² values indicate good model-data fit. However, due to a number of conceptual and statistical issues, χ² is often significant even in the case of relatively good model fit. CFI values approaching .95 as well as RMSEA values of .05 or smaller and SRMR values of .05 and smaller are considered indicative of good model fit. Interpretation of the likelihood ratio test statistics: #free parms is the number of freely estimated model parameters; these are gradually restricted in the invariance testing process as parameters are forced to equality in the compared subgroups (see Table 5). LR (with above model and its degrees of freedom) is the χ²-distributed test statistic of the rescaled likelihood ratio test. In each row, it refers to the difference in fit of the respective model and the next less restricted (i.e., above) model. Statistical nonsignificance indicates that the more restricted model fits similarly as the above less restricted model, i.e., that the added parameter equality restrictions for the compared sample subgroups do not substantially worsen model fit and that the scale can thus be considered measurement invariant for the compared groups at the respective level.

Methodological discussion
Our results are generally as expected. The structural and convergent/discriminant validity analyses support that the scale measures the SDT taxonomy of motivation, albeit in slightly modified form as explained above. It does so equally well for different health worker subgroups, although with some caveats (see below), indicating that the scale can be used for between-group comparisons. However, two aspects deserve further discussion. First, despite good overall fit of the data to the five-factor model, we found relatively low levels of Cronbach's α for all factors but EXT-E. While low αs are no longer perceived as indicators of low measurement quality [36][37][38], they do signal that our items cover different sub-aspects of the respective dimensions rather than being extremely similar. This is no problem per se, but the relative conceptual breadth of the motivation dimensions should be taken into account when interpreting measurements. Should α be even lower in other settings, a re-evaluation of the scale items and the scale's dimensionality might be necessary. Second, factor correlations were relatively large in magnitude compared to other SDT-based measures (e.g., [18]). We believe there to be two main reasons. For one, respondents generally scored relatively high despite the various measures in place, so that common method and acquiescence bias likely inflated correlations [39]. Additionally, we found cross-loadings and residual correlations for many items, which, although mostly small, likely also contributed to inflated factor correlations. They might have partially been caused by the more specific item phrasing compared to other SDT-based measures [18,29]. Cross-loadings and residual correlations are often explicitly modeled to improve overall model fit, for instance, in exploratory structural equation models (ESEM) [32,40].
In light of our already good fit, we opted against doing so based on the assumption that future users of the scale might want to analyze data using composite scores, which would be difficult with a scale "calibrated" in an ESEM framework.
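For readers less familiar with Cronbach's α as discussed above, the following minimal Python sketch computes it from complete item responses using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: list of respondents, each a list of k item scores (no missing values).
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the unweighted sum score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five respondents to three items on a 0-10 scale.
responses = [
    [8, 7, 9],
    [6, 6, 5],
    [9, 8, 9],
    [4, 6, 5],
    [7, 5, 8],
]
print(cronbach_alpha(responses))
```

Because α increases with the average inter-item correlation, items that deliberately cover different sub-aspects of a dimension will yield lower values than near-duplicate items, which matches the interpretation given in the text.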

Measurement reliability and sensitivity
We were unable to examine measurement reliability (i.e., accuracy and consistency) in-depth within the scope of our study, beyond what was possible in the scale development process. We thus cannot exclude that respondents' scores are to some extent influenced by random or systematic measurement error rather than solely by underlying levels of motivation. The convergent/discriminant validity analysis results, specifically their consistency with previous research, imply that random measurement error is at acceptable levels. Based on the continued high scores on many items, however, we suspect that some social desirability or acquiescence bias might still be at play, systematically inflating scores in relation to their "true values" for certain items. This warrants caution when interpreting absolute scores and calls into question the scale's sensitivity "at the ceiling," i.e., its ability to distinguish respondents or measure change at high motivation levels. Generally, note that systematic biases are less of a concern when investigating relationships of motivation with other variables or changes in motivation over time, assuming that biases stay constant.

Criterion validity
In addition to the convergent/discriminant validity analyses in this study, it would be important to also examine the scale against more tangible criteria such as work performance in the future.
Fig. 2 The modified SDT taxonomy of motivation as measured by the scale

Recommendations for future use
We welcome the use of the scale in future research and are confident that the scale will prove a valid instrument with health workers in other countries and settings as well. The scale will be useful for researchers who want to not only investigate overall levels of work motivation ("motivation intensity") but also study how motivation of different origin and characteristics contribute to these overall levels ("motivation composition") to understand how different "motivation profiles" relate to outcomes of interest [4]. Based on our experiences with the scale so far, we would like to offer the following recommendations to researchers interested in using the tool:

Use the full 26-item scale
Use the full 26-item scale, if possible within the scope of your research. Although we are confident that the item list covers the most important reasons for work motivation even beyond Burkinabé nurses, our item selection for the final 15-item scale was heavily empirically driven and thus reliant on the specific sample. We cannot exclude that a different item selection would have resulted from a different sample.

Use a response scale with seven to nine options
Although our 11-point scale seemed to have had certain advantages, we suspect that it might have overwhelmed some respondents, who might have had difficulty conceptualizing the fine differences between scores on the 11-point scale. See Additional file 1 for a more extensive discussion.

Test for measurement invariance
Test for measurement invariance to identify noninvariant scale items before moving on to the actual analysis of interest. Our generalizability analyses suggest that it is possible to compare measurements for different health worker subgroups on all statistical parameters (e.g., means, variances) if analyses are performed in an SEM framework. If factor means for different subgroups are to be compared using composite scores, however, systematic differences in scoring between groups are potentially more problematic as they might create spurious group differences or mask real ones [37,41]. In our sample, respondents from different subgroups showed somewhat different scoring behavior on items im3, intro1, intro2, ext6, and ext7.

Use SEM for the actual analysis of interest
Generally, substantive analyses on data collected with the scale can be done in one of the following two ways. One can calculate composite scores and use them in any other type of analysis (e.g., as predictor or outcome variables in regression models). Composite scores are usually calculated as the unweighted means of responses to all items pertaining to a factor/dimension. Alternatively, one can continue in an SEM framework by adding a structural part corresponding to a regression model to the measurement model. The composite score calculation is skipped and substantive relationships are directly estimated from the items via the latent factors, thus preserving full variance in the data. For this and other reasons, SEM is clearly preferred by psychometricians and generally leads to better estimates [37,42] but is statistically complex and requires large samples [32].
Beyond its general advantages, we also recommend SEM based on a number of specific results of our analyses. Calculating composite scores bears a risk of imprecision if systematic differences in item-factor loadings (i.e., items have different indicative values for the motivation factor) or intercepts (i.e., systematic differences in item scores which are unrelated to the underlying motivation level) are not accounted for. As with other biases, this is less of an issue if relationships between variables or change over time is the focus of interest, but of critical importance if interpretation of absolute motivation levels is planned. We found only slightly inhomogeneous factor loadings and intercepts in our sample which did not seem to lead to substantial differences between composite scores and latent factor scores. However, more substantial differences are possible in other settings. If the use of SEM is not feasible, we strongly recommend developing a good understanding of all item properties before embarking on substantive analyses with composite scores. Should differences in factor loadings or intercepts across items be more substantial, one might consider weighting items when calculating composite scores rather than giving equal weight to all items, or adding constants to balance differences in intercepts. Note that such adjustments have implications for the interpretation of the measurement (i.e., the "meaning" and level of the respective motivation dimension), depending on how each item effectively contributes to the composite scores. They should thus be applied with caution.
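The two composite score variants described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the item labels intro1 and intro2 appear in this article, but the response values, the weights, and the function name are hypothetical, and the pair is used here only as an example subset of a dimension's items.

```python
def composite_score(responses, items, weights=None):
    """Composite score for one motivation dimension.

    responses: dict mapping item label -> response (e.g., 0-10)
    items: labels of the items belonging to the dimension
    weights: optional dict mapping item label -> weight; if omitted,
             the unweighted mean is returned.
    """
    values = [responses[item] for item in items]
    if weights is None:
        return sum(values) / len(values)
    w = [weights[item] for item in items]
    # Weighted mean: items with larger weights contribute more to the score.
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)


# Hypothetical responses of one respondent.
answers = {"intro1": 8, "intro2": 6}

unweighted = composite_score(answers, ["intro1", "intro2"])
# Hypothetical weights, e.g., derived from factor loadings in a prior analysis.
weighted = composite_score(answers, ["intro1", "intro2"],
                           weights={"intro1": 3.0, "intro2": 1.0})
print(unweighted, weighted)
```

As the text cautions, any weighting changes what the resulting score means, so the weights used should be reported alongside the results.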

Conclusions
This article presents evidence for the validity of a Self-Determination Theory-based scale to measure health worker motivation composition. Our results show that the scale measures a modified version of the SDT taxonomy well and relatively consistently across health worker subgroups. Results of the convergent/discriminant validation indicate that the five dimensions of motivation relate differently to important work outcomes, underlining the value of investigating motivation composition for the development of a more profound understanding of health worker motivation. We hope that our tool will contribute to meaningful research informing the design of effective and side effect-free interventions to enhance motivation and performance.