Agreement between data obtained from repeated interviews with a six-year interval


Concordância entre dados obtidos em entrevistas repetidas com seis anos de intervalo



Carolina Castro Martins(I); Maria Letícia Ramos-Jorge(I); Jaime Aparecido Cury(II); Isabela Almeida Pordeus(III); Saul Martins Paiva(III)

(I) Programa de Pós-Graduação em Odontologia. Faculdade de Odontologia. Universidade Federal de Minas Gerais (UFMG). Belo Horizonte, MG, Brasil
(II) Faculdade de Odontologia de Piracicaba. Universidade Estadual de Campinas. Piracicaba, SP, Brasil
(III) Departamento de Odontopediatria e Ortodontia. Faculdade de Odontologia. UFMG. Belo Horizonte, MG, Brasil





The objective of the study was to compare information collected through face-to-face interviews at baseline and six years later in a city of Southeastern Brazil. In 1998, 32 mothers of children aged 20 to 30 months answered a face-to-face interview with structured questions regarding their children's brushing habits. Six years later, the same interview was repeated with the same mothers. The two interviews were compared for overall agreement, kappa and weighted kappa. Overall agreement between the interviews ranged from 41% to 96%. Kappa values ranged from 0.00 to 0.65 (poor to good) and were statistically significant for only two of the six questions. The results showed a lack of agreement when the same interview was repeated six years later, indicating that recall bias can be a methodological problem of interviews.

Descriptors: Interviews. Data Collection. Reproducibility of Results. Bias (Epidemiology).


O objetivo do estudo foi comparar a informação coletada em entrevista pessoal num primeiro momento e seis anos depois, em Minas Gerais. Em 1998, 32 mães (N=32) de crianças com idade entre 20 a 30 meses responderam à entrevista pessoal com questões estruturadas sobre os hábitos de escovação das crianças, sendo repetida seis anos depois. As duas entrevistas foram comparadas em concordância geral e em coeficientes kappa e kappa ponderado. A concordância geral entre as entrevistas variou de 41% a 96%. Os valores de kappa variaram de 0,00 a 0,65 (muito ruim a bom), sem diferença significativa. Os resultados mostraram que houve ausência de concordância quando a mesma entrevista foi conduzida seis anos depois, mostrando que o viés de memória pode ser um problema metodológico das entrevistas.

Descritores: Entrevistas. Coleta de Dados. Reprodutibilidade dos Testes. Viés (Epidemiologia).




Minimizing bias in order to produce more valid results is a major challenge in survey-based research. A number of research methodologies are employed to investigate oral behavior and other health habits, including face-to-face and telephone interviews, mailed questionnaires, diary data and computer-based questionnaires. Studies using these methods have also attempted to evaluate their validity and reproducibility.1,3

Studies have investigated the reproducibility of research methodologies by comparing computer-based questionnaires with printed ones and with face-to-face interviews, and mailed questionnaires with telephone interviews.1,4 However, it is also important to evaluate the agreement of responses when the same method is used with the same population at two different points in time. Do respondents answer the same question in the same way as before? It is important to evaluate the precision of responses, especially when the investigation concerns events that occurred years earlier. Agreement between two observers, or between two classifications on nominal or ordinal scales, can be measured with the kappa coefficient.5

The aim of the present study was to compare information collected through face-to-face interviews at baseline and six years later.



The study was conducted as part of a cohort study on fluoride intake among 32 children aged 20-30 months in Southeastern Brazil. In 1998, mothers answered a face-to-face interview about their children's current brushing habits with fluoridated toothpaste. The interview was conducted by a single researcher (SMP).

All 32 mothers interviewed in 1998 agreed to participate in the second interview, conducted six years later (2004), by a second interviewer (CCM) previously trained by the first one. On this second occasion, mothers were asked to recall the past habits of their children when they were 20-30 months of age. Both interviews lasted on average ten minutes.

Both interviews were conducted at the mother's home at a prescheduled time. They followed a protocol of structured questions on the child's brushing habits with fluoridated toothpaste. The data analyzed comprised six identical questions included in both interviews, each with two to four answer options.

Data were entered into Stata and an electronic spreadsheet. Interviews were compared for overall agreement ((a+d)/(a+b+c+d), i.e., the proportion of concordant responses), kappa and weighted kappa, with 95% confidence intervals; p<0.05 was considered significant. Strength of agreement was based on the criteria defined by Altman2 (1991): 0.00 to 0.20 = poor; 0.21 to 0.40 = fair; 0.41 to 0.60 = moderate; 0.61 to 0.80 = good; 0.81 to 1.00 = very good. The frequency of missing data was compared between the two interviews. "I don't know" and "I don't remember" responses were considered missing data and thus were not included in the calculation of kappa and overall agreement. These responses are not presented in the Table.
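The agreement measures described above can be illustrated with a minimal sketch. The answers below are hypothetical and are not the study data, the function names are illustrative, and Python is used here only for illustration (the study reports using Stata and a spreadsheet). The sketch excludes "I don't know" and "I don't remember" responses, computes overall agreement and Cohen's kappa,5 and labels the result with the Altman2 cut-offs.

```python
from collections import Counter

# Responses treated as missing data, as described in the text.
MISSING = {"I don't know", "I don't remember"}


def drop_missing(first, second):
    """Pair the two interviews and keep pairs where both answers are valid."""
    return [(a, b) for a, b in zip(first, second)
            if a not in MISSING and b not in MISSING]


def overall_agreement(pairs):
    """Proportion of mothers who gave the same answer in both interviews."""
    return sum(a == b for a, b in pairs) / len(pairs)


def cohen_kappa(pairs):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(pairs)
    p_o = overall_agreement(pairs)
    first_counts = Counter(a for a, _ in pairs)
    second_counts = Counter(b for _, b in pairs)
    # Agreement expected by chance, from the marginal frequencies.
    p_e = sum(first_counts[c] * second_counts[c] for c in first_counts) / n ** 2
    if p_e == 1:
        return float("nan")  # undefined when every answer is the same
    return (p_o - p_e) / (1 - p_e)


def altman_label(kappa):
    """Strength of agreement according to the cut-offs quoted in the text."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical answers from ten mothers to one question (not the study data).
answers_1998 = ["mother"] * 6 + ["child"] * 3 + ["I don't know"]
answers_2004 = ["mother"] * 5 + ["child"] * 3 + ["mother"] * 2

pairs = drop_missing(answers_1998, answers_2004)
kappa = cohen_kappa(pairs)
print(f"valid pairs: {len(pairs)}")
print(f"overall agreement: {overall_agreement(pairs):.2f}")
print(f"kappa: {kappa:.2f} ({altman_label(kappa)})")
```

In this hypothetical example, seven of nine valid pairs agree (78%), yet kappa is only 0.50 (moderate), because more than half of that agreement would be expected by chance alone.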

The study was approved by the Universidade Federal de Minas Gerais Research Ethics Committee.



The Table displays the comparison of responses between 1998 and 2004. Overall agreement between the two interviews ranged from 41% to 96%. The level of agreement determined by kappa values ranged from 0.00 to 0.65. The question with the lowest agreement was "Who brushed the child's teeth?" (k=0.00, 95% CI: 0.09–0.00, p=1.000). Kappa was statistically significant (p<0.05) for two questions: "Who dispensed the toothpaste on the child's toothbrush?" (k=0.65, 95% CI: 0.01–1.00, p=0.001) and "What kind of toothpaste did your child use?" (k=0.33, 95% CI: 0.00–0.73, p=0.016).

For the questions "How often did your child brush his teeth per day?" and "How much toothpaste was dispensed on the brush?," weighted kappa values were 0.23 and 0.12, respectively.
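For questions with ordered answer options, weighted kappa penalizes a change between adjacent options less than a change between distant ones. The text does not state which weighting scheme was used, so the sketch below assumes linear disagreement weights; the answer categories and the responses are hypothetical, not the study data.

```python
def weighted_kappa(pairs, categories):
    """Weighted kappa with linear disagreement weights |i - j| / (k - 1).

    pairs: (first answer, second answer) tuples.
    categories: answer options listed in their natural order.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(pairs)

    # Cross-tabulate the first interview (rows) against the second (columns).
    table = [[0] * k for _ in range(k)]
    for a, b in pairs:
        table[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * table[i][j] / n
            expected += w * row_totals[i] * col_totals[j] / n ** 2
    if expected == 0:
        return float("nan")  # undefined when every answer is the same
    return 1 - observed / expected


# Hypothetical ordered options and answers from ten mothers.
options = ["once", "twice", "three or more"]
answers = (
    [("once", "once")] * 3 + [("once", "twice")]
    + [("twice", "twice")] * 2 + [("twice", "three or more")]
    + [("three or more", "three or more")] * 2 + [("three or more", "once")]
)
print(f"weighted kappa: {weighted_kappa(answers, options):.2f}")
```

With only two answer options, the linear weights reduce to 0 and 1 and the result equals unweighted kappa.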

The frequency of missing data was low, but higher in the interview conducted in 1998 (5.7%) than in that conducted in 2004 (0%).



The findings concern the comparison of responses between the two interviews and what it implies about the ability to recall past events.

The response rate was 100%, which is unusual in longitudinal research, since a proportion of subjects is often lost over the study period. All mothers were located and agreed to participate, which is noteworthy.

The missing data in the present study consisted of "I don't know" and "I don't remember" answers, which accounted for 11 out of a total of 192 answers. These responses were not included in the calculation of kappa and overall agreement. The frequency of missing data was relatively low, although the small study sample may have influenced this finding. Missing data were more frequent in the interview conducted in 1998 (5.7%) than in that conducted in 2004 (0%). The lower frequency of missing data in 2004 may be due to information on oral health acquired over the six-year period, which mothers may have drawn on even when they did not actually remember their children's habits. Thus, the very low frequency of missing data in 2004 does not imply reliable answers. Aitken et al1 (2004) found 0.5% of missing data in telephone interviews and 3.8% in mailed questionnaires, figures similar to those of the present study.

Overall agreement across questions ranged from 41% to 96%, with kappa values ranging from 0.00 to 0.65 (poor to good).2 Unlike kappa, overall agreement does not correct for chance agreement (Table), but it is used to evaluate agreement between two measures of the same individual.5 The present study evaluated the agreement of an interview conducted with the same group of mothers over time. Kappa values ranging from poor to good indicate low reproducibility of responses over time. Judging by the p-values, agreement was not statistically significant except for the questions "Who dispensed the toothpaste on the child's toothbrush?" (k=0.65, 95% CI: 0.01–1.00, p=0.001) and "What kind of toothpaste did your child use?" (k=0.33, 95% CI: 0.00–0.73, p=0.016). For the former, which had the higher kappa value, only one mother changed her answer between the two interviews. The good agreement indicated by this kappa value is perhaps due to the fact that this information is already widespread among the population, along with young children's inability to dispense toothpaste on the brush by themselves.
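The difference between overall agreement and kappa can be seen in a small worked sketch. The two 2×2 tables below are hypothetical, not the study data; a and d count mothers who gave the same answer in both interviews and b and c those who changed, following the a, b, c, d notation used for overall agreement in the methods. Both tables have 96% overall agreement, but when nearly all answers fall into a single category, about 96% agreement is already expected by chance and kappa drops to zero.

```python
def agreement_and_kappa(a, b, c, d):
    """Return (overall agreement, kappa) for a 2x2 table.

    Rows are the first interview and columns the second:
    a = (X, X), b = (X, Y), c = (Y, X), d = (Y, Y).
    """
    n = a + b + c + d
    p_o = (a + d) / n
    # Chance agreement from the row and column totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if p_e == 1:
        return p_o, float("nan")  # kappa undefined: only one answer ever given
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical table 1: almost every mother gives answer X in both interviews.
skewed = agreement_and_kappa(24, 1, 0, 0)
# Hypothetical table 2: answers split evenly between X and Y.
balanced = agreement_and_kappa(12, 1, 0, 12)

print(f"skewed:   agreement={skewed[0]:.2f}, kappa={skewed[1]:.2f}")
print(f"balanced: agreement={balanced[0]:.2f}, kappa={balanced[1]:.2f}")
```

This is one way a question can show very high overall agreement together with a kappa close to 0.00; whether it explains any particular question in the present study cannot be determined from the text alone.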

In regard to the question "What kind of toothpaste did your child use?," 11 mothers who responded "the regular one" in 1998 changed their answer to "children's toothpaste" in 2004. Several factors may have influenced this change, such as new brands introduced in the market, marketing strategies, and new siblings born during the recall period who used children's toothpaste. Children might also have changed habits during the six-year period, which may have confused their mothers.

Other studies compared two methodological instruments and found considerable agreement. Chestnutt et al4 (2004) found good to very good agreement when comparing a face-to-face interview to a computer-assisted questionnaire (k=0.68 to 0.90). Berthelsen et al3 (2000) found an overall reliability of 93% when comparing a computer-assisted questionnaire to a printed questionnaire. Both studies comprised questions on oral practices. The present study differs from these two studies in that it did not compare two different methods, but rather the same method over time.

Depending on the type of question and sample size, kappa values may increase or decrease. Many mothers changed their answers in the second interview. These changes may be explained by several reasons: mothers may have received new information on oral health and adopted new habits; some questions may be very subjective and may have confused parents when recalling the amount of toothpaste or brushing frequency; mothers may have over-reported some habits in order to appear careful; individuals may choose to respond positively to please the interviewer or to adapt their answers to the socially desirable norm; and recall bias, i.e., mothers' faulty memory of the child's dental practices.

The lack of agreement between the two interviews could be explained by the type of information addressed. Brushing habits may not represent important information to mothers, which would explain inadequate recall. The six-year period between interviews may be a long time, but studies regarding childhood habits usually ask parents to recall episodes of their children's lives from long ago. This is why the present study conducted the second interview many years later. Furthermore, the small sample may have affected the outcome; kappa values might have been different had the study been conducted on a larger sample.

The lack of agreement may be due to recall bias in parents' accounts, which represents a great challenge to epidemiologic research. Many epidemiologic investigations are based on data collected through interviews and questionnaires regarding past events. The results of the present study revealed a lack of agreement when the same interview is repeated after some time. Investigators should be aware of this problem and undergo proper training before conducting interviews. Care should be taken not to influence the respondents and to avoid bias when interpreting the results.



1 Aitken JF, Philippa HY, Janda M, Elwood M, Ring IT, Lowe JB. Comparability of skin screening histories obtained by telephone interviews and mailed questionnaires: a randomized crossover study. Am J Epidemiol. 2004;160(6):598-604.        

2 Altman DG. Practical statistics for medical research. London: Chapman and Hall; 1991.        

3 Berthelsen CL, Stilley KR. Automated personal health inventory for dentistry: a pilot study. J Am Dent Assoc. 2000;131(1):59-66.        

4 Chestnutt IG, Morgan MZ, Hoddell C, Playle R. A comparison of a computer-based questionnaire and personal interviews in determining oral health-related behaviors. Community Dent Oral Epidemiol. 2004;32(6):410-7.        

5 Cohen J. A coefficient of agreement for nominal scales. Educational Psychol Meas. 1960;20(1):37-46.        



Correspondência | Correspondence:
Carolina de Castro Martins
R. Carangola 62/101, Bairro Santo Antônio
30330-240 Belo Horizonte, MG, Brasil

Received: 22/5/2007
Reviewed: 15/8/2007
Approved: 1/10/2007



CC Martins was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES – Master's scholarship).
Article based on master's dissertation by CC Martins presented to the Faculdade de Odontologia of Universidade Federal de Minas Gerais, in 2004.
Presented at the Annual Meeting of the Sociedade Brasileira de Pesquisa Odontológica (Brazilian Dental Research Society), held in Atibaia (São Paulo, Brasil), from September 4th to 6th, 2006.

Faculdade de Saúde Pública da Universidade de São Paulo São Paulo - SP - Brazil