Test-retest reliability of the Work Ability Index (WAI) in nursing workers

Sérgio Henrique Almeida da Silva Junior, Ana Glória Godoi Vasconcelos, Rosane Harter Griep, Lúcia Rotenberg


This paper assesses the test-retest reliability of the Work Ability Index (WAI) in nursing workers. A self-administered questionnaire was applied twice, with an interval of seven to fifteen days, to a group of 80 workers (nurses and nursing aides/assistants) at a public hospital in Rio de Janeiro, Brazil. Reliability was estimated using quadratic weighted kappa statistics, the intraclass correlation coefficient (ICC) and the Bland and Altman plot. Eighty-one percent of participants were women; ages ranged from 22 to 67 years (mean = 39.1; SD = 10.8 years); 36.3% had completed higher education. The global score of the WAI presented ICC = 0.79 (95%CI 0.67 to 0.86), and the categorical WAI (classified as low, moderate, good and excellent) presented weighted kappa = 0.69 (95%CI 0.50 to 0.80). The quadratic weighted kappa of the WAI items ranged from 0.39 to 0.82 and the Bland and Altman plot did not show a systematic pattern. The agreement between the test and retest measures shows an acceptable degree of reliability, suggesting the adequacy of the assessment process among nursing workers.

Keywords: Reliability; Work ability; Nursing; Kappa; Intraclass Correlation Coefficient (ICC); Bland and Altman plot


The Work Ability Index (WAI) assesses workers' perception of "how well they are now or will be in the near future and how well they can perform their job, based on the demands, their health status and physical and mental abilities"1. It is considered to be a predictive measure of situations of work capacity loss, early retirement, sickness absenteeism and unemployment2.

In Brazil, the WAI has been used to assess functional capacity and/or to identify associated factors among factory workers3-6, electricians7, university professors8, bus drivers9 and nursing teams10-14. However, only three studies have assessed the psychometric performance of this index3,14,15, and only one of them included health professionals, specifically nursing teams.

The interest in studies on the WAI in nursing professionals is due to particularities of this group in Brazil, such as long work shifts (usually lasting 12 hours) and multiple jobs, which result in long working hours in addition to housework, as this is a predominantly female group. In the hospital context, nursing is the main workforce. In this context, workers are exposed to several occupational stressors, both environmental and organizational, which include very specific workloads and demands that can be potential determinants of impairments in health, well-being and work capacity, as previously observed in national10-13,16 and international studies17.

Epidemiological studies provide evidence of determinants of diseases in human populations that depend, among other conditioning factors, on the quality of measures, health tests and data from interviews, assessed through validity and reliability studies. The present study assesses test-retest reliability (temporal stability) of the Brazilian version of the WAI in nursing workers.


Data collection for this study was performed in a public hospital in the city of Rio de Janeiro, Southeastern Brazil, between April and May 2005. For the test-retest study, a systematic sample of 10% was selected from a list of 1,100 nursing workers, including day-shift and night-shift nurses, nursing aides and nursing assistants. Each participant read and signed an informed consent form and subsequently completed a self-administered questionnaire during working hours, in a reserved location, with the support of trained professionals. Respondents were asked to complete the questionnaire again after an interval of seven to 15 days to test the adequacy of the instrument's measurement process. Of the 111 workers who participated in the test, 80 (72.1%) also completed the retest. No selective losses related to socio-demographic or occupational characteristics were identified. When absences, shift changes or the impossibility of responding on a given occasion prevented the retest, a new approach was made three days later, as professionals worked one day and were off the following three days. As a result, four workers (5%) responded to the retest after a longer interval than planned (18 or 19 days). The authors declared no conflicts of interest, and the research project was approved by the Oswaldo Cruz Foundation (FIOCRUZ) Research Ethics Committee (Protocol 241/04).
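The sampling step described above can be sketched in a few lines. This is only an illustration of systematic sampling in general: the roster identifiers, the random start and the `systematic_sample` helper are assumptions, not details reported by the study.

```python
import random


def systematic_sample(roster, fraction, seed=None):
    """Take every k-th entry after a random start, with k = round(1 / fraction)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    step = round(1 / fraction)
    # Random start within the first sampling interval (an assumption;
    # the study does not describe how the start was chosen).
    start = random.Random(seed).randrange(step)
    return roster[start::step]


# Hypothetical roster of 1,100 worker identifiers.
roster = [f"worker_{i:04d}" for i in range(1, 1101)]
sample = systematic_sample(roster, fraction=0.10, seed=1)

# Retest retention reported in the text: 80 of 111 respondents.
retention_percent = round(100 * 80 / 111, 1)
```

With a list of 1,100 names and a 10% fraction, every start position yields 110 selected names, and the retention figure reproduces the 72.1% given in the text.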

Work Ability Index (WAI)

The WAI version translated and adapted to Brazilian Portuguese and published by Tuomi et al.1 and validated by Martinez et al.3 and Silva Junior et al.14 was used in the present study. The items comprising the WAI, synthesized into seven dimensions, are shown in Chart 1. The overall WAI corresponds to a score that varies from seven (lowest index) to 49 (highest index), categorized into four levels: low (7-27), average (28-36), good (37-43) and high (44-49)1.

Chart 1
Number of questions and points scores for each dimension of the WAI.
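As a minimal sketch, the four categories quoted above can be encoded as a lookup on the total score. The function name and the handling of non-integer scores (compared against the lower bound of each category) are assumptions made for illustration.

```python
# Lower bound of each category, from highest to lowest, as quoted in the text:
# low (7-27), average (28-36), good (37-43), high (44-49).
WAI_LOWER_BOUNDS = [(44, "high"), (37, "good"), (28, "average"), (7, "low")]


def wai_category(score):
    """Map a total WAI score (7 to 49) to its category label."""
    if not 7 <= score <= 49:
        raise ValueError("the total WAI score must lie between 7 and 49")
    for lower_bound, label in WAI_LOWER_BOUNDS:
        if score >= lower_bound:
            return label


# Hypothetical scores, one per category.
labels = [wai_category(s) for s in (20, 30, 40, 45)]
```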

Data analysis

The intraclass correlation coefficient (ICC) was used to analyze the test-retest stability of items, scores of dimensions (continuous variables) and total WAI score. Quadratic weighted kappa was applied to the assessment of ordinal variables with more than two categories. Discordant responses were weighted by the squared deviations from exact agreement18, as this weighting enables an interpretation equivalent to that of the ICC.
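For readers unfamiliar with the statistic, the following is a plain-Python sketch of quadratic weighted kappa for two ordinal ratings of the same subjects. It follows the standard definition (disagreement weights proportional to the squared distance between categories); it is not the authors' code, and the example data are invented.

```python
def quadratic_weighted_kappa(test, retest, categories):
    """Cohen's kappa with quadratic disagreement weights.

    `categories` lists the ordinal levels from lowest to highest.
    """
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and equally long")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    position = {c: i for i, c in enumerate(categories)}
    n = len(test)

    # Cross-tabulation of test (rows) against retest (columns).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(test, retest):
        observed[position[a]][position[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1 - weighted_observed / weighted_expected


# Invented ratings for six subjects.
levels = ["low", "average", "good", "high"]
first = ["low", "average", "good", "good", "high", "average"]
second = ["low", "good", "good", "average", "high", "average"]
kappa = quadratic_weighted_kappa(first, second, levels)
```

Because the weights grow with the square of the distance, a shift of two categories is penalized four times as heavily as a shift of one.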

Confidence intervals of 95% were estimated for all statistics. The following criteria for the interpretation of the level of agreement, proposed by Landis and Koch19, were adopted to assess kappa: a) almost perfect: from 0.81 to 1.00; b) substantial: from 0.61 to 0.80; c) moderate: from 0.41 to 0.60; d) fair: from 0.21 to 0.40; e) slight: from 0 to 0.20; and f) poor: < 0. The following criteria18 were used to assess the ICC: a) high: from 0.75 to 1; b) moderate: from 0.4 to 0.74; and c) poor: < 0.4. The Bland-Altman plot20 was used to assess the pattern of disagreement between repeated measurements (test-retest).
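The two interpretation scales above translate directly into lookup functions. The sketch below assumes that values falling between the printed bands (for example, 0.405) are assigned by comparing against the upper bound of each kappa band and the lower bound of each ICC band; the function names are illustrative.

```python
def landis_koch_label(kappa):
    """Interpretation bands for kappa proposed by Landis and Koch."""
    if kappa < 0:
        return "poor"
    for upper_bound, label in [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]:
        if kappa <= upper_bound:
            return label
    return "almost perfect"


def icc_label(icc):
    """Interpretation bands for the ICC quoted in the text."""
    if icc >= 0.75:
        return "high"
    if icc >= 0.4:
        return "moderate"
    return "poor"


# Values reported in this study.
item_range = (landis_koch_label(0.39), landis_koch_label(0.82))
categorical_wai = landis_koch_label(0.69)
total_score = icc_label(0.79)
```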

For individuals with less than 50% of missing data (five or fewer items without response in the WAI questionnaire), missing data imputation21 was performed, using the mean value if the scale was continuous or the median if the scale was discrete.
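A minimal sketch of this rule follows, assuming that the mean or median is taken over the observed responses to the same item across respondents; the text does not state this explicitly, and the helper name and data are hypothetical.

```python
from statistics import mean, median


def impute_item(values, discrete):
    """Fill missing responses (None) with the item's median or mean.

    Assumption: the summary is computed over the observed responses
    of all respondents to the same item.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute an item with no observed responses")
    fill = median(observed) if discrete else mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical responses to one discrete item and one continuous item.
discrete_item = impute_item([1, None, 3, 5], discrete=True)
continuous_item = impute_item([1.0, None, 2.0], discrete=False)
```

Note that the median of an even number of observed values can fall between two scale points, so a rounding rule would be needed for strictly discrete items.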

Total WAI score normality was tested with the Kolmogorov-Smirnov test and the comparison of means of total WAI scores in the test-retest was performed with the paired t-test.
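The paired comparison can be sketched with the standard library alone. The code below computes the mean difference, the paired t statistic and a confidence interval for a critical value supplied by the caller; it does not compute a p-value or run the Kolmogorov-Smirnov test, and the example scores are invented.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(test, retest, t_critical):
    """Paired t statistic and confidence interval for retest minus test.

    `t_critical` is the two-sided critical value of Student's t for
    len(test) - 1 degrees of freedom (about 1.990 for 79 degrees of
    freedom at the 95% level).
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two paired observations")
    differences = [b - a for a, b in zip(test, retest)]
    n = len(differences)
    mean_difference = mean(differences)
    standard_error = stdev(differences) / sqrt(n)
    if standard_error == 0:
        raise ValueError("all differences are identical; t is undefined")
    return {
        "mean_difference": mean_difference,
        "t": mean_difference / standard_error,
        "df": n - 1,
        "ci": (
            mean_difference - t_critical * standard_error,
            mean_difference + t_critical * standard_error,
        ),
    }


# Invented scores for four subjects (critical value 3.182 for 3 df).
result = paired_t([10, 12, 14, 16], [11, 12, 16, 17], t_critical=3.182)
```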


Workers from 27 hospital sectors were interviewed, of whom 81.3% were women and 36.3% had completed higher education. Age varied from 22 to 67 years and mean age was 39.1 years (SD = 10.8). With regard to occupational characteristics, 30% were nurses, 50% were nursing aides and 20% were nursing assistants; 38.8% were civil servants; 56.3% worked the night shift; and 47.5% reported having another nursing job. The total WAI score had a normal distribution both in the test (p = 0.587) and retest (p = 0.237), enabling analyses that assume normality, such as the Bland-Altman plot. This plot (Figure 1) shows that 95% of the differences between the first and second WAI measurements were between -6 and +6 points, with individual differences varying from -9 to +8 points in the study interval; 33 individuals obtained a score higher than the mean and 32 others a score lower than it. There were four points (5%) outside the range of the mean ± 2 standard deviations.

Figure 1
Bland and Altman plot: differences between test and retest plotted against the mean of test and retest, with 95% limits of agreement, in nursing workers in Rio de Janeiro, RJ, 2005 (N = 80).
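The quantities behind a Bland-Altman plot can be computed as follows. This is a sketch under stated assumptions: limits are taken as the mean difference ± 2 standard deviations, matching the wording of the text, and the function name and data are illustrative.

```python
from statistics import mean, stdev


def bland_altman(test, retest, multiplier=2.0):
    """Return the plotted points, bias and limits of agreement."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two paired observations")
    differences = [b - a for a, b in zip(test, retest)]   # y axis
    pair_means = [(a + b) / 2 for a, b in zip(test, retest)]  # x axis
    bias = mean(differences)
    spread = multiplier * stdev(differences)
    lower, upper = bias - spread, bias + spread
    outside = sum(1 for d in differences if d < lower or d > upper)
    return {
        "points": list(zip(pair_means, differences)),
        "bias": bias,
        "limits": (lower, upper),
        "outside": outside,
    }


# Invented scores for four subjects.
summary = bland_altman([10, 12, 14, 16], [11, 12, 16, 17])
```

A random scatter of the points around the bias line, with no trend along the x axis, is what "no systematic pattern" refers to.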

The mean WAI scores in the test and retest were similar (39.7 points [SD = 4.8] versus 39.6 points [SD = 5.0]) and the difference was not statistically significant (0.175 points; 95%CI -0.535 to 0.885). The total WAI score showed an ICC = 0.79 (95%CI 0.67 to 0.86). When assessed per item, the WAI showed agreement, measured by the quadratic weighted kappa, which varied from fair (0.39) for the items "Considering your health, do you think you will be able to perform your current job in two years?" and "Work capacity compared to the best capacity throughout life" to almost perfect (0.82) for the scores of current diseases diagnosed by a physician (Table 1).
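As a rough consistency check, the reported confidence interval can be related to the spread of the individual differences. The arithmetic below assumes the interval was computed from a paired t distribution with 79 degrees of freedom, which the text does not state explicitly.

```latex
\begin{aligned}
\text{half-width} &= \frac{0.885 - (-0.535)}{2} = 0.710 \\
SE_{\bar{d}} &\approx \frac{0.710}{t_{0.975,\,79}} \approx \frac{0.710}{1.990} \approx 0.357 \\
SD_{d} &\approx 0.357 \times \sqrt{80} \approx 3.19 \\
\bar{d} \pm 1.96\, SD_{d} &\approx 0.175 \pm 6.25
\end{aligned}
```

Under that assumption, the implied range of individual differences is in line with the approximately -6 to +6 points described for the Bland-Altman plot.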

Table 1
Intraclass correlation coefficient and quadratic weighted kappa of the dimensions and total score of the WAI.
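The text does not specify which form of the ICC was computed. As one plausible illustration, the sketch below implements the two-way random effects, absolute agreement, single-measurement form (often written ICC(2,1)) for a test and a retest series; treat the choice of model as an assumption.

```python
def icc_two_way_absolute(test, retest):
    """ICC(2,1) for two measurement occasions per subject."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long series with at least two subjects")
    n, k = len(test), 2
    pairs = list(zip(test, retest))
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / k for a, b in pairs]
    occasion_means = [sum(test) / n, sum(retest) / n]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand_mean) ** 2 for pair in pairs for x in pair)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_occasions = n * sum((m - grand_mean) ** 2 for m in occasion_means)
    ss_error = ss_total - ss_subjects - ss_occasions

    ms_subjects = ss_subjects / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = (
        ms_subjects + (k - 1) * ms_error + k * (ms_occasions - ms_error) / n
    )
    if denominator == 0:
        raise ValueError("ICC is undefined when there is no variance")
    return (ms_subjects - ms_error) / denominator


# Invented scores: a constant shift of one point lowers absolute agreement.
shifted = icc_two_way_absolute([1, 2, 3, 4], [2, 3, 4, 5])
```

In this form, a systematic shift between occasions reduces the coefficient even when the ranking of subjects is perfectly preserved, which is why it suits test-retest questions about absolute agreement.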

When the four categories were analyzed (categorical WAI), the percentage of agreement was 67.5% (quadratic weighted kappa = 0.69; 95%CI 0.50 to 0.80). In the retest, 13 individuals were classified in a higher category and 13 in a lower category, compared to the first measurement classification (Table 2).

Table 2
Classification of subjects according to WAI category in the test and retest measures.
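The percentage of agreement reported above follows directly from the counts of reclassified individuals among the 80 participants:

```latex
\frac{80 - (13 + 13)}{80} = \frac{54}{80} = 0.675 = 67.5\%
```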


In general, the study suggests an adequacy of the WAI psychometric properties with regard to the test-retest stability among nursing professionals. The Bland-Altman plot did not show a systematic pattern, i.e. differences seemed to be random. The indices obtained varied from fair to almost perfect agreement, showing that the instrument's test-retest reliability was acceptable.

Similar results were identified in construction workers22, where 5% of points were out of the expected range (from -6.86 to +6.86). Additionally, the data on agreement obtained by these authors were similar to those of the present study, as they found an agreement of 66% (64 of 97), including 13 individuals classified in a higher category and 19 classified in a lower category, compared to the first measurement.

However, the present results differ from those found by Renosto et al.15 in metallurgical workers, where only two points (1.3%) were out of the range (from -7.1 to +7.1 points). These authors identified an ICC for the overall score of 0.84 and weighted kappa varying between 0.54 for the item "Work capacity compared to the best capacity throughout life" and 0.90 for the score of current diseases diagnosed by a physician.

Among the factors that could explain the differences between the results of these studies are the methodological aspects involved in the completion of the questionnaire and the interval between the test and retest. The present study and that by Zwart et al.22 used the self-administered questionnaire, whereas that by Renosto et al.15 was based on the interviewer-assisted questionnaire, which could contribute to the improvement in the instrument's psychometric performance. In contrast, the shorter interval of application of the WAI in the present study (between one and two weeks), compared to the four-week interval of the studies previously mentioned, could promote a higher agreement, due to the greater chance of recalling the responses given in the first application. However, this factor does not explain the results, as the agreement indices were similar to those obtained by Zwart et al.22 (self-administered questionnaire and four weeks) and lower than those obtained by Renosto et al.15 (interviewer-assisted questionnaire and four weeks). According to Streiner and Norman23, the period of application of the retest should be neither too short, as participants could simply recall their responses, nor too long, as changes in the occurrence of events could explain the variations identified.

It should be emphasized that the variation in the test-retest interval (between nine and 19 days) among the workers assessed did not affect the psychometric performance observed in this study. Complementary analyses indicate that the range of time intervals for the retest did not introduce variability capable of compromising the study results. Taking the median retest interval (12 days) as the cut-off point, the paired t-test did not show a significant difference in the overall WAI score between the test and retest, either among participants who responded with an interval of up to 12 days (t = 0.643, p = 0.522) or among those who responded with an interval longer than 12 days (t = 0.465, p = 0.643).

A total of 12 respondents had one or two items with incomplete responses and these values were imputed as previously described. Data imputation based on either the mean or the median promotes higher agreement, as it reduces data variability.

Although the study included nursing workers with different characteristics using systematic random sampling, the reduced number of participants in the sample affects the accuracy of estimates, in addition to not enabling WAI reliability to be explored according to subgroups associated with level of education, sex and age. However, the sample is sufficiently large for studies on psychometric evaluation, which recommend approximately ten participants per item/dimension assessed24,25.

One limitation of the present study, as noted above, is that the sample size did not enable WAI reliability to be explored according to subgroups associated with level of education, sex and age. Additionally, the use of a specific work group such as nursing professionals does not allow the present findings to be extended to other occupations. Another limitation, previously pointed out by Martinez et al.3, is the definition of cut-off points of the WAI score, based on the results obtained from Finnish workers. As Brazilian workers have a different demographic composition and are exposed to working and living conditions distinct from those existing in Finland, they are probably subject to a different functional aging pattern and, for this reason, the original cut-off points may not be valid.

The healthy worker effect, present in cross-sectional studies in occupational epidemiology, should be emphasized, as it often excludes individuals who are possibly ill from studies26,27. This effect can lead to the underestimation of the risks posed by the work process, because those who are most affected cannot remain in their jobs, either due to a leave of absence for health treatment, lay-offs, or other reasons.

Acceptable results on stability provide additional support to the applicability of the index to research in the area of workers' health. New studies on WAI validity in nursing professionals are being performed by the same research group, seeking to complement the evaluation of the psychometric adaptation.


  • 1
    Tuomi K, Ilmarinen J, Katajarinne L, Tulkki A. Índice de Capacidade para o Trabalho. 1ª ed. São Carlos: EDUFSCAR; 2005.
  • 2
    Welch LS. Improving work ability in construction workers--let's get to work. Scand J Work Environ Health 2009; 35(5): 321-4.
  • 3
    Martinez MC, Latorre M do RD de O, Fischer FM. Validade e confiabilidade da versão brasileira do Índice de Capacidade para o Trabalho. Rev Saúde Pública 2009; 43: 525-32.
  • 4
    Metzner RJ, Fischer FM. Fadiga e capacidade para o trabalho em turnos fixos de doze horas. Rev Saúde Pública [Internet] 2001; 35. Available at: http://www.scielosp.org/scielo.php?pid=S0034-89102001000600008&script=sci_arttext [Accessed 8 September 2011]
  • 5
    Metzner RJ, Fischer FM, Nogueira DP. Comparação da percepção de fadiga e de capacidade para o trabalho entre trabalhadores têxteis de empresas que se encontram em diferentes estágios de responsabilidade social empresarial no estado de São Paulo, Brasil. Saúde Soc 2008; 17: 46-55.
  • 6
    Walsh I, Corral S, Franco R, Canetti E, Alem M, Coury H. Capacidade para o trabalho em indivíduos com lesões músculo-esqueléticas crônicas. Rev Saúde Pública 2004; 38: 149-56.
  • 7
    Martinez MC, Latorre M do RD de O. Saúde e capacidade para o trabalho em trabalhadores de área administrativa [Health and work ability among office workers]. Rev Saúde Pública 2006; 40: 851-8.
  • 8
    Marqueze EC, Moreno CR de C. Satisfação no trabalho e capacidade para o trabalho entre docentes universitários. Psicologia em Estudo 2009;14.
  • 9
    Sampaio RF, Coelho CM, Barbosa FB, Mancini MC, Parreira VF. Work ability and stress in a bus transportation company in Belo Horizonte, Brazil [Avaliação da capacidade para o trabalho e estresse em uma empresa de transporte coletivo de Belo Horizonte, Brasil]. Ciênc saúde colet 2009; 14: 287-96.
  • 10
    Duran ECM, Cocco MIM. Capacidade para o trabalho entre trabalhadores de enfermagem do pronto-socorro de um hospital universitário [Work ability among nursing workers at the emergency service of a university hospital]. Rev Latino-Am Enfermagem 2004; 12: 43-9.
  • 11
    Rotenberg L, Griep RH, Fischer FM, Fonseca M de JM, Landsbergis P. Working at night and work ability among nursing personnel: when precarious employment makes the difference. Int Arch Occup Environ Health 2009; 82(7): 877-85.
  • 12
    Rotenberg L, Portela LF, Banks B, Griep RH, Fischer FM, Landsbergis P. A gender approach to work ability and its relationship to professional and domestic work hours among nursing personnel. Appl Ergon 2008; 39(5): 646-52.
  • 13
    Fischer FM, Borges FN da S, Rotenberg L, Latorre M do RD de O, Soares NS, Rosa PLFS et al. Work ability of health care shift workers: What matters? Chronobiol Int 2006; 23(6): 1165-79.
  • 14
    Silva Junior SHA da, Vasconcelos AGG, Griep RH, Rotenberg L. Validade e confiabilidade do índice de capacidade para o trabalho (ICT) em trabalhadores de enfermagem. Cad Saúde Pública 2011; 27: 1077-87.
  • 15
    Renosto A, Biz P, Hennington ÉA, Pattussi MP. Confiabilidade teste-reteste do Índice de Capacidade para o Trabalho em trabalhadores metalúrgicos do Sul do Brasil. Rev Bras Epidemiol 2009; 12: 217-25.
  • 16
    Portela LF. Morbidade referida em profissionais da enfermagem: relações com o horário de trabalho, jornada semanal e trabalho doméstico [master's dissertation]. Rio de Janeiro: Escola Nacional de Saúde Pública; 2003.
  • 17
    Peters VPJM, de Rijk AE, Boumans NPG. Nurses' satisfaction with shiftwork and associations with work, home and health characteristics: a survey in the Netherlands. J Adv Nurs 2009; 65(12): 2689-700.
  • 18
    Fleiss JL, Cohen J. The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability. Educational and Psychological Measurement 1973; 33(3): 613-9.
  • 19
    Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33(1): 159-74.
  • 20
    Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1(8476): 307-10.
  • 21
    Anderson LF. Legislative roll-call analysis. Evanston: Northwestern University Press; 1966.
  • 22
    Zwart BCH, Frings-Dresen MHW, van Duivenbooden JC. Test-retest reliability of the Work Ability Index questionnaire. Occupational Medicine 2002; 52(4): 177-81.
  • 23
    Streiner DL, Norman GR. Health Measurement Scales: A practical guide to their development and use. 4th ed. Oxford: Oxford University Press, USA; 2008.
  • 24
    Crocker. Introduction to Classical and Modern Test Theory. 1st ed. New York: Wadsworth Publishing; 1986.
  • 25
    Pasquali L. Instrumentos psicológicos: manual prático de elaboração. Brasília: LabPAM/IBAPP; 1999.
  • 26
    McMichael AJ. Standardized mortality ratios and the "healthy worker effect": Scratching beneath the surface. J Occup Med 1976; 18(3): 165-8.
  • 27
    Checkoway H, Pearce N, Crawford-Brown DJ. Research Methods in Occupational Epidemiology. First Edition. New York/Oxford: Oxford University Press, USA; 1989.

Publication Dates

  • Publication in this collection
    Mar 2013


  • Received
    03 Mar 2011
  • Reviewed
    15 Sept 2011
  • Accepted
    09 Dec 2011
Associação Brasileira de Pós-Graduação em Saúde Coletiva, São Paulo - SP - Brazil
E-mail: revbrepi@usp.br