Feasibility of road traffic injury surveillance integrating police and health insurance data sets in the Dominican Republic


Factibilidad de la vigilancia de las lesiones por accidentes de tránsito mediante la integración de los conjuntos de datos de la policía y del seguro nacional de salud en la República Dominicana



Adrian Puello (I); Junaid Bhatti (II); Louis-Rachid Salmi (III)

(I) Escuela de Salud Publica, Universidad Autónoma de Santo Domingo, Santo Domingo, Dominican Republic. Send correspondence to: Adrian Puello, apuello60@uasd.edu.do
(II) Douglas Hospital Research Centre, Montreal, Quebec, Canada
(III) Institut de Santé Publique, d'Epidemiologie et de Developpement, Centre Institut National de la Santé et de la Recherche Médicale U897, Epidemiologie et Biostatistique, Université de Bordeaux, Bordeaux, France




OBJECTIVE: To assess the feasibility of semiautomated linking of road traffic injury (RTI) cases in different data sets in low- and middle-income countries.
METHODS: The study population consisted of RTI cases that occurred in the Dominican Republic in 2010 and were identified in police and health insurance data sets. After duplicates were removed and fatality reporting was corrected by using forensic data, police and health insurance RTI records were linked if they had the same province, collision date, and gender, and ages within five years of each other. A multinomial logistic regression model assessed the likelihood of being in only one of the data sets.
RESULTS: One of five records was a duplicate, including 21.1% of 6 396 police and 16.2% of 6 178 insurance records. Health insurance data recorded 43 of 417 deaths as only injured. Capture-recapture estimated that each data set recorded about one of five RTI cases. Characteristics associated with increased likelihood (P < 0.05) of being only in the police data set were female gender [adjusted odds ratio (OR) = 2.5], age 16 years (OR = 1.7), collision in the regions of Cibao Northeast (OR = 4.1) and Valdesia (OR = 6.4), day of occurrence from Tuesday to Saturday (ORs from 1.5 to 2.9), month of occurrence from October to December (ORs from 1.6 to 4.5), and occupant of four-wheeled vehicles (OR = 5.4) or trucks (OR = 5.3).
CONCLUSIONS: Consistent semiautomated linking procedures were feasible to ascertain the RTI burden in the Dominican Republic and could be improved by standardized coding of police and health insurance RTI reporting.

Key words: Data analysis; accidents, traffic; wounds and injuries; safety; insurance, health; Dominican Republic.


OBJETIVO:  Evaluar la factibilidad de la vinculación semiautomática de los registros de casos de lesiones por accidentes de tránsito (LAT) de diferentes conjuntos de datos en países de ingresos bajos y medianos.
MÉTODOS:  La población de estudio la constituían los casos de LAT ocurridos en la República Dominicana en el 2010 y registrados en los conjuntos de datos de la policía y del seguro nacional de salud. Después de eliminar los casos duplicados y corregir la notificación de defunciones a partir de los datos forenses, se vincularon los registros de LAT de la policía y el seguro de enfermedad si los casos correspondían a la misma provincia, fecha de colisión y sexo, y la edad era similar con una diferencia no superior a cinco años. Se evaluó la probabilidad de aparecer únicamente en uno de los conjuntos de datos mediante un modelo de regresión logística polinómica.

RESULTADOS: Uno de cada cinco registros estaba duplicado (21,1% de los 6 396 registros de la policía y 16,2% de los 6 178 registros del seguro). En el conjunto de datos del seguro nacional de salud se registraron 43 de las 417 defunciones como únicamente lesionados. Mediante el método de captura-recaptura se calculó que en ambos conjuntos de datos se registraban uno de cada cinco casos de LAT. Las características asociadas con una mayor probabilidad (P < 0,05) de aparecer únicamente en el conjunto de datos de la policía fueron el sexo femenino (razón de posibilidades ajustada [OR] = 2,5), la edad 16 años (OR = 1,7), la colisión en las regiones del nordeste de Cibao (OR = 4,1) y Valdesia (OR = 6,4), el día del accidente de martes a sábado (OR de 1,5 a 2,9), el mes del accidente de octubre a diciembre (OR de 1,6 a 4,5) y los ocupantes de vehículos de cuatro ruedas (OR = 5,4) o camiones (OR = 5,3).
CONCLUSIONES:  Los procedimientos sistemáticos de vinculación semiautomatizada se mostraron factibles para evaluar la carga de LAT en la República Dominicana, y se podrían mejorar mediante la codificación estandarizada de las notificaciones de LAT de la policía y del seguro nacional de salud.

Palabras clave: Análisis de datos; accidentes de tránsito; heridas y traumatismos; seguridad; seguro de salud; República Dominicana.



Road traffic injuries (RTIs) are a major public health problem, particularly in low- and middle-income countries (LMICs), where they result in about 1.2 million deaths annually (1). According to the World Health Organization (WHO), a major obstacle to decreasing this burden is the lack of data systems that provide comprehensive, accurate, and reliable information (2, 3). Previous work in LMICs consistently showed that police records, the most cited source of RTI statistics internationally, substantially underreported RTIs (4, 5). Underreporting and inadequate information on local risk factors are considered major obstacles to implementing RTI prevention and control measures in LMICs (1-5).

Several studies responded to this gap by linking police records with health care data sets, such as emergency and ambulance records, and using capture-recapture methods to document the extent of underreporting (6, 7). The main limitation of these efforts was the manual linking of records, which made it impossible to integrate these methods into RTI surveillance beyond the study period (6-8). Semiautomated linking of RTI records in an LMIC setting could be a powerful tool for developing an integrated, cost-effective RTI surveillance system and overcoming the above-mentioned limitations, but it has rarely been tested in LMICs (8).

The Dominican Republic (DR) is a middle-income country of the Latin America and Caribbean region, with a population of slightly under 10 million inhabitants distributed in 31 provinces (9). Developed regions and large cities concentrate more than 60% of the population (9, 10). Like other LMICs, its economy has grown significantly over the past decade; road transport plays an essential role in this growth, and more than 2.5 million vehicles were registered in 2010 (10). Traffic laws are comparable to those in high-income countries, although undisciplined driving is common because traffic controls are insufficient. Consequently, the RTI situation in the DR, with an RTI mortality rate estimated at 41 per 100 000 inhabitants (11), does not differ from that in other LMICs. Multiple authorities, such as police (the official source), health, insurance, forensic, and transport, are involved in RTI data collection for their specific purposes. Regional reports from the WHO confirmed underreporting of road traffic fatalities (RTFs) in the DR and clearly indicated the need for more comprehensive RTI surveillance, as no single data set was complete (12). Yet the feasibility of integrating existing data sets has never been assessed (12). The objectives of this study were to evaluate the quality and availability, in the two main data sets (police and health insurance), of characteristics that could be used for linking RTI records; to test the feasibility of a semiautomated linking process; and to compare reporting of RTI-related characteristics among different data sets.



Approval was obtained from the Autonomous University of Santo Domingo Institutional Review Board. For reasons of confidentiality, identity variables were assessed in linking procedures only. This report does not allow identification of any injured person.

Study population and design

The study population consisted of RTI cases (injury or fatality) reported in the DR in 2010. Two data sets recording RTIs, namely police and health insurance, were assessed for the availability of internationally recommended characteristics (13, 14). By using multiple characteristics, records were linked to assess their level of ascertainment by capture-recapture methods (15). Finally, potential limits in ascertainment were assessed by comparing reporting of characteristics among data sets using multivariate analyses.

Case definitions

An RTI case was defined as a person who had sustained physical damage as a result of a road traffic collision (RTC) in the DR in 2010. This definition included RTFs, defined as any person who died immediately or within 30 days as a result of an RTC. Suicides were excluded from RTFs.

Data sets

When an RTC occurs in the DR, the police are contacted, go on site, and complete a standardized RTC report. The reports are computerized on a spreadsheet as a national data set maintained by the National Authority of Transport. This data set has information on the characteristics of each RTI case and on crash details.

In 2009, the Superintendence of Health and Work Risks started to systematically register data on medical services related to RTIs occurring among 2.3 million insured residents of the DR. This system is updated daily through a standardized online declaration of all requests for reimbursement of medical and pharmacy services and payments. The insurance companies' correspondent transcribes information from the patient record, including injury diagnosis according to the International Classification of Diseases, 10th revision.

Identification of duplicates and data cleansing

As a first step, the quality of the data sets was checked for variable format, classes, and codes; for instance, we verified whether the RTC date was recorded in the same format for all entries. A semiautomated search for duplicate records was conducted using Microsoft Office Access 2010 by linking all common characteristics, such as the national identity number (NIN), names, date, gender, province, type of collision, age, admission diagnosis, and hospital where the person received medical care (5, 6). Variables with a similar format were linked automatically, and unformatted or free-text variables (e.g., family names and street names) were linked manually. A duplicate record was defined as an instance of the same person being counted twice. All identified duplicate records were verified manually and merged into the oldest record if NIN, gender, and province were identical or if any of these were identical and name, date, and age were similar. An additional correction for fatality was performed because a previous study indicated that fatality might not be accurately reported in police- and health-based data sets (16). Both data sets were therefore compared with forensic medicine data, and cases reported as injured in these data sets but registered as dead in the forensic medicine database were corrected.
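The duplicate-merging rule described above can be illustrated with a minimal Python sketch. The study itself used Microsoft Access, so this is not the authors' implementation. The `Record` fields, the name-similarity threshold of 0.85, and the one-day and one-year tolerances for "similar" dates and ages are all assumptions introduced for illustration; the source does not define them.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Record:
    record_id: int        # hypothetical entry order; a lower value means an older record
    nin: Optional[str]    # national identity number, may be missing
    name: str
    gender: str
    province: str
    rtc_date: date
    age: int


def similar_name(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy comparison of free-text names; the threshold is an assumption."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def is_candidate_duplicate(a: Record, b: Record) -> bool:
    same_nin = a.nin is not None and a.nin == b.nin
    same_gender = a.gender == b.gender
    same_province = a.province == b.province
    # Rule 1: NIN, gender, and province are all identical.
    if same_nin and same_gender and same_province:
        return True
    # Rule 2: at least one of them is identical and name, date, and age are similar.
    any_identical = same_nin or same_gender or same_province
    close_date = abs((a.rtc_date - b.rtc_date).days) <= 1  # assumed tolerance
    close_age = abs(a.age - b.age) <= 1                    # assumed tolerance
    return any_identical and similar_name(a.name, b.name) and close_date and close_age


def find_candidate_duplicates(records):
    """Return (older_id, newer_id) pairs that still need manual verification."""
    ordered = sorted(records, key=lambda r: r.record_id)
    pairs = []
    for i, older in enumerate(ordered):
        for newer in ordered[i + 1:]:
            if is_candidate_duplicate(older, newer):
                pairs.append((older.record_id, newer.record_id))
    return pairs
```

The function only proposes candidate pairs; in keeping with the semiautomated design, a person would confirm each pair before the newer record is merged into the older one.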

Assessment of variable availability

We verified whether police and health insurance data sets included core characteristics recommended in the WHO injury surveillance guidelines: identifier, age, gender, intent, activity, place of occurrence, nature of injury, mechanism of injury, mode of transport, road user, and counterpart. The list was extended to include characteristics identified in the recent WHO and Centers for Disease Control and Prevention guidelines (13, 14).

Linking of data sets and statistical analyses

Duplicate records were characterized as proportions of total records registered. The availability of variables was characterized by the proportion of RTI records with each variable documented after removal of duplicates. RTI case records in police and insurance data sets were linked by using a semiautomated linking procedure. As gender and province were found to be the two most consistently coded variables, we devised two linking strategies: a less restrictive strategy required province and gender to be identical, RTC dates to be similar (within two days of each other), ages to be similar (within five years), and any two of the other variables to be similar; a more restrictive strategy required province, gender, and date to be identical and age and any three of the other variables to be similar. The RTI case characteristics associated with appearing in only one data set were assessed by multinomial logistic regression analyses. For these analyses, RTI cases were coded as "0" for linked, "1" for only in police records, and "2" for only in insurance records. Microsoft Access was used for the linking procedure, and SAS version 9.1 was used for all statistical analyses.
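As a rough illustration of the two linking strategies and of the outcome coding used for the regression, consider the following Python sketch. It is not the Access procedure used in the study. The `Case` structure, the `other` dictionary of remaining shared variables, the treatment of "similar" as exact equality for those variables, and the greedy one-to-one matching are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Case:
    province: str
    gender: str
    rtc_date: date
    age: int
    other: dict  # hypothetical remaining shared variables, e.g. road-user type


def n_other_matches(p: Case, h: Case) -> int:
    """Count other shared variables with equal, non-missing values."""
    shared = p.other.keys() & h.other.keys()
    return sum(1 for k in shared if p.other[k] is not None and p.other[k] == h.other[k])


def links_less_restrictive(p: Case, h: Case) -> bool:
    # Province and gender identical, dates within two days, ages within
    # five years, and at least two other variables agreeing.
    return (p.province == h.province and p.gender == h.gender
            and abs((p.rtc_date - h.rtc_date).days) <= 2
            and abs(p.age - h.age) <= 5
            and n_other_matches(p, h) >= 2)


def links_more_restrictive(p: Case, h: Case) -> bool:
    # Province, gender, and date identical, ages within five years,
    # and at least three other variables agreeing.
    return (p.province == h.province and p.gender == h.gender
            and p.rtc_date == h.rtc_date
            and abs(p.age - h.age) <= 5
            and n_other_matches(p, h) >= 3)


def code_outcomes(police, insurance, rule):
    """Return outcome codes: 0 = linked, 1 = police only, 2 = insurance only."""
    matched = set()  # indices of insurance records already linked
    codes = []
    for p in police:
        hit = next((j for j, h in enumerate(insurance)
                    if j not in matched and rule(p, h)), None)
        if hit is None:
            codes.append(1)
        else:
            matched.add(hit)
            codes.append(0)
    codes.extend(2 for j in range(len(insurance)) if j not in matched)
    return codes
```

Passing `links_less_restrictive` or `links_more_restrictive` as `rule` shows how tightening the date and other-variable requirements moves records from the linked category into the two unlinked categories.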



Duplicate removal and fatality reporting

Of the 6 396 police records, 21.1% were double entries, which led to a final count of 5 047 police-reported RTIs. Similarly, the insurance data set initially included 6 178 records but, after removal of double entries (16.2%), 5 176 RTI records were available for linking. Police records had 2 156 entry errors on characteristics such as age, date, hour, and name. The insurance data included 417 patients who died, including 43 who were correctly identified and 374 who were identified only as injuries. Police data were relatively complete and concordant, as 97.1% (n = 1 631) of deaths in the forensic records were also reported as deaths in the police data set. An additional 54 fatalities in the police data set could not be verified because of a lack of any personal identifier.
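The fatality correction against forensic data can be sketched as follows. This is a simplified illustration, not the study's procedure: it assumes that records are matched to the forensic database on the NIN alone and that each record is a dictionary with hypothetical `nin` and `outcome` keys.

```python
def correct_fatalities(records, forensic_deaths):
    """Correct outcomes using forensic data.

    records: list of dicts with keys 'nin' and 'outcome' ('injured' or 'dead').
    forensic_deaths: set of NINs registered as dead in the forensic data.
    Returns (corrected_records, n_corrected, n_unverifiable).
    """
    corrected, n_corrected, n_unverifiable = [], 0, 0
    for rec in records:
        rec = dict(rec)  # work on a copy so the input is left unchanged
        nin = rec.get("nin")
        if nin is None:
            # Without an identifier, a reported death cannot be verified.
            if rec["outcome"] == "dead":
                n_unverifiable += 1
        elif nin in forensic_deaths and rec["outcome"] == "injured":
            rec["outcome"] = "dead"
            n_corrected += 1
        corrected.append(rec)
    return corrected, n_corrected, n_unverifiable
```

Counting unverifiable deaths separately mirrors the situation reported for the police data set, where some fatalities lacked any personal identifier.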

Data availability

The completeness of the insurance data set with regard to RTI-related characteristics was high (100%) compared with police records (Table 1). Insurance data sets contained collision details, RTI case information, and care-related information but lacked information about collision sites. The NIN was available in only 30% of the police data set.
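Availability, defined in the methods as the proportion of RTI records with each variable documented, can be computed with a short Python sketch. The dictionary-based record layout and the treatment of `None` and empty strings as missing are assumptions for illustration only.

```python
def availability(records, variables):
    """Percentage of records in which each variable is documented."""
    n = len(records)
    if n == 0:
        return {v: 0.0 for v in variables}
    result = {}
    for v in variables:
        # A value counts as documented unless it is None or an empty string.
        documented = sum(1 for r in records if r.get(v) not in (None, ""))
        result[v] = round(100.0 * documented / n, 1)
    return result
```

Applied to a deduplicated data set, such a function would produce one percentage per variable, which is the kind of figure reported for the NIN in the police data.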



Linking and ascertainment

The less restrictive linking identified 5 855 unique records, including 37.0% linked, 12.0% unlinked police, and 51.0% unlinked insurance records. The more restrictive criteria identified 8 517 records, including 17.0% linked records, 39.5% unlinked police records, and 43.5% unlinked insurance records (Table 2). The ascertainment of RTI cases was comparable for police and insurance data sets with both the less restrictive (39.8% vs. 40.8%) and the more restrictive (19.8% vs. 21.4%) linking.
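For readers unfamiliar with two-source capture-recapture, the following Python sketch shows how ascertainment can be derived from the counts of records in each source and the number of linked records. The source does not state which estimator was used, so the Lincoln-Petersen and Chapman estimators below are assumptions, and the function names are hypothetical.

```python
def lincoln_petersen(n_police, n_insurance, n_linked):
    """Two-source estimate of the total number of cases."""
    if n_linked <= 0:
        raise ValueError("at least one linked record is required")
    return n_police * n_insurance / n_linked


def chapman(n_police, n_insurance, n_linked):
    """Chapman's variant, which is less biased when few records are linked."""
    return (n_police + 1) * (n_insurance + 1) / (n_linked + 1) - 1


def ascertainment(n_source, n_total):
    """Percentage of the estimated total captured by one source."""
    return 100.0 * n_source / n_total
```

With the Lincoln-Petersen estimator, the ascertainment of one source reduces algebraically to the share of the other source's records that were linked, which is why stricter linking criteria (fewer linked records) yield lower ascertainment for both sources.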



Characteristics associated with linking

Characteristics associated with successful linking results were age, gender, type of road user, vehicle type, province or region, day, month, and hour of collision occurrence (Table 3). Characteristics associated with the likelihood of appearing only in the health insurance data set were female gender [odds ratio (OR) = 6.7], day of collision from Monday to Friday (ORs from 1.7 to 2.7), and hour of collision from 6:00 am to 11:59 am (OR = 1.3) (Table 4). Similarly, characteristics associated with the likelihood of appearing only in the police data set were female gender (OR = 2.5), age 16 years (OR = 1.7), collision in the region of Cibao Northeast (OR = 4.1) or Valdesia (OR = 6.4), day of collision from Tuesday to Saturday (ORs from 1.5 to 2.9), month of collision from October to December (ORs from 1.6 to 4.5), and occupant of four-wheeled vehicle (OR = 5.4) or truck (OR = 5.3).






This is the first nationwide study to assess the feasibility of semiautomated linkage of RTI cases in the DR or in the Central American region (8). It identified several problems in record keeping and linking that could hamper development of an integrated RTI surveillance system in the DR. For instance, initial data cleansing showed that about one of five records was a duplicate in each data set. Similarly, health insurance data recorded many fatalities as injuries, whereas the police records were more accurate regarding fatality reporting. Police records were incomplete for many characteristics, notably the NIN, which could be the most useful variable in linking the data sets. Unexpectedly, and contrary to previous work in LMICs (8), capture-recapture estimations showed that the ascertainments of the police and insurance data sets were comparable; each accounted for about two of five RTIs. It was also possible to identify RTI characteristics that were associated with limited linking of police and insurance data sets in the DR.

This study illustrates several strengths of applying capture-recapture methods in an LMIC setting (8, 17). First, it is a nationwide study on an island country with a considerable population. It uses new nationwide health insurance data, unlike most studies originating from LMICs, which included only major urban areas (8). Furthermore, police records were more complete in terms of RTC description, whereas hospital records included detailed information about the injury and care provided. This complementarity could be useful in developing a cost-effective integrated RTI surveillance system in the DR.

Nonetheless, this study identified some notable discrepancies that have to be taken into consideration when developing an integrated RTI surveillance system. The foremost problem in both police and health insurance data sets was the considerable number of duplicate records. This problem has rarely been highlighted in previous capture-recapture studies that focused on estimating RTI case burden (8, 18). The use of multiple forms during treatment appeared to be the most likely reason for duplicate reporting in the health insurance data set. One hypothesis is that there might be some intentional overreporting, as there is a maximum reimbursement limit per collision, and some RTI cases might be encouraged to register as related to a new event. Our methods suggested that these errors were correctable provided that name, date, and province of collision, admission diagnosis, and hospital were linked while new data were being entered. Clearly, data cleansing is a prerequisite for the construction of an integrated surveillance system. For the police data set, correct registration of the NIN, available in only one-third of police records, could also reduce duplicate reporting and facilitate linking of the two data sets.

Both police and health insurance data sets included the WHO-recommended minimum characteristics (11). Nevertheless, a large number of potentially important characteristics related to human and risk factors, such as alcohol consumption, seat-belt wearing, helmet use, or traffic volume, were not recorded. These characteristics could be added inexpensively to the current police data set. Furthermore, while police data were uninformative about severity and diagnosis, the insurance data set underreported fatality as an outcome. The study showed that semiautomated linking could improve the availability of such information if linking characteristics were standardized across different data sets (19, 20). Regular data updates and RTI case follow-up procedures between data sets could also improve the quality of official statistics (21). Forensic data could be systematically and inexpensively accessed as a gold standard to confirm RTI fatalities (8, 22).

Previous capture-recapture studies in Latin American and Caribbean countries, such as Brazil, Colombia, and Argentina, focused on noncommunicable and infectious diseases (8). Only one study (Nicaragua) assessed reporting of RTI, rail, and other traffic injuries (8, 23). It showed that police records were more complete than hospital records in terms of death reporting. Police records accounted for 1 of 2 deaths but only 1 of 38 injuries, whereas hospital records accounted for 1 of 5 RTIs and fatalities (23). This information is consistent with previous work on RTI estimation using capture-recapture methods (8). This study showed that police and insurance ascertainment of RTI was comparable in the DR, suggesting that the overlap required for capture-recapture was sufficient (8, 17). Nonetheless, underreporting of RTI needs to be addressed in both police and hospital data-recording procedures in the DR. While some RTI cases were more likely to be found in only one of the data sets, it is possible that populations in rural or low socioeconomic areas are not adequately represented in police and hospital records, a potential limitation in consistently ascertaining RTI burden.

This study showed that the RTI underreporting problem in official statistics from the DR could be dealt with by using semiautomated data-linking procedures. These procedures, which appeared feasible in the DR, could be useful in assessing the quality of data and in identifying coding and reporting issues in other LMICs, which still rely mostly on manual information linking (18, 19). These results could also be useful in improving awareness of coding-related problems among police and health professionals involved in RTC data collection (20, 21). Nonetheless, we have concerns about the capture homogeneity of the population, which is required for the capture-recapture method to be valid (17). As the appearance of RTI cases in the health insurance data set strongly depends on whether they are registered with an insurer, not everyone in the population might have the same probability of being captured by the insurance companies. Therefore, we might have underestimated the actual RTI burden (18). Further, not all characteristics were coded uniformly; transcription of Dominican names (usually including two given and two family names) might have resulted in an inability to identify all unique records in the linking process with commonly used software, as was the case in this study. It was not possible to evaluate such limitations within the scope of this study; doing so would require in-depth investigations to fully comprehend RTI underreporting problems with respect to social circumstances.

Finally, some specific recommendations can be made to improve the utility of police and insurance data sets for RTI surveillance in the DR. First, police officers should be briefed about reporting the NIN so that police records can be easily linked to insurance or other data sets. Similarly, the insurance data set should be updated on injury outcomes using forensic data so that injury outcomes are accurately detected. A joint commission could be established between police and insurance departments so that they can work together toward harmonizing RTI definitions and effectively using both data sets for RTI prevention in the DR. In addition, the inconsistent coding of variables across data sets needs to be addressed in the future with software that can robustly detect and match variables in multiple formats.



The authors are grateful to the Consejo Nacional de la Seguridad Social of the Dominican Republic and the French Embassy at Santo Domingo for financially supporting this project. The authors also acknowledge Jorge Asjana and Rafael Montero from Universidad Autónoma de Santo Domingo and Eugênia Maria Silveira R. and Marcos Rodriguez from the Pan American Health Organization for their input to the manuscript.

Funding. Financial support to conduct this research was provided by the Consejo Nacional de la Seguridad Social, Ministerio de Educación Superior, Ciencia y Tecnología and INSERM U897, Équipe Prévention et Prise en Charge des Traumatismes, Bordeaux, France. Funding bodies had no input in study design, interpretation of results, or the decision to submit this manuscript.

Conflicts of interest. None.



1. Peden M, Scurfield R, Sleet D, Mohan D, Hyder AA, Jarawan E, et al., eds. World report on road traffic injury prevention. Geneva: WHO; 2004. Available from: http://whqlibdoc.who.int/publications/2004/9241562609.pdf Accessed 28 March 2011.

2. Lunevicius R, Herbert HK, Hyder AA. The epidemiology of road traffic injuries in the Republic of Lithuania, 1998-2007. Eur J Public Health. 2010;20(6):702-6.

3. Mathers CD, Loncar D. Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med. 2006;3(11):e442.

4. Dandona R, Kumar GA, Ameer MA, Reddy GB, Dandona L. Underreporting of road traffic injuries to the police: results from two data sources in urban India. Inj Prev. 2008;14(6):360-5.

5. Razzak JA, Luby SP. Estimating deaths and injuries due to road traffic accidents in Karachi, Pakistan, through the capture-recapture method. Int J Epidemiol. 1998;27(5):866-70.

6. Aptel I, Salmi LR, Masson F, Bourdé A, Henrion G, Erny P. Road accident statistics: discrepancies between police and hospital data in a French island. Accid Anal Prev. 1999;31(1-2):101-8.

7. Lopez DG, Rosman DL, Jelinek GA, Wilkes GJ, Sprivulis PC. Complementing police road-crash records with trauma registry data: an initial evaluation. Accid Anal Prev. 2000;32(6):771-7.

8. Van Hest R, Grant A, Abubakar I. Quality assessment of capture-recapture studies in resource-limited countries. Trop Med Int Health. 2011;16(8):1019-41.

9. Caceres F, Martinez M, Lopez B. IX Censo Nacional de Población y Vivienda 2010: resultados preliminares. Santo Domingo: Oficina Nacional de Estadísticas; 2011. Available from: http://censo2010.one.gob.do/resultados/Resumen_resultados_generales_censo_2010.pdf Accessed 29 May 2012.

10. Álvarez P, Madera L. República Dominicana en cifras 2010. Santo Domingo: Oficina Nacional de Estadísticas; 2011. Available from: http://www.one.gob.do/index.php?module=articles&func=view&ptid=14&catid=143 Accessed 29 May 2012.

11. Mathers C, Boerma T, Fat D. The global burden of disease. Geneva: WHO; 2008. Available from: http://www.who.int/healthinfo/global_burden_disease/2004_report_update/en/index.html Accessed 29 May 2012.

12. Fraade-Blanar L, Concha-Eastman A, Baker T. Injury in the Americas: the relative burden and challenge. Rev Panam Salud Publica. 2007;22(4):254-9.

13. Harvey A, ed. Data systems: a road traffic safety manual for decision-makers and practitioners. Geneva: WHO; 2010. Available from: http://whqlibdoc.who.int/publications/2010/9789241598965_eng.pdf Accessed 29 May 2012.

14. Holder Y, Peden M, Krug E, Lund J, Gururaj G, Kobusingye O, eds. Injury surveillance guidelines. Geneva: WHO and Centers for Disease Control and Prevention; 2001. Available from: http://www.who.int/violence_injury_prevention/media/en/136.pdf Accessed 29 May 2012.

15. Ballivet S, Salmi LR, Dubourdieu D. Capture-recapture method to determine the best design of a surveillance system: application to a thyroid cancer registry. Eur J Epidemiol. 2000;16(2):147-53.

16. Bhatti JA, Razzak JA, Lagarde E, Salmi LR. Differences in police, ambulance, and emergency department reporting of traffic injuries on Karachi-Hala road, Pakistan. BMC Res Notes. 2011;4:75.

17. Hubert B, Desenclos JC. Evaluation de l'exhaustivité et la représentativité d'un système de surveillance en utilisant la méthode de capture-recapture. Application à la surveillance des infections à méningocoques en France en 1989 et 1990. Rev Epidemiol Santé Publique. 1993;41:241-9.

18. McCarty DJ, Tull ES, Moy CS, Kwoh CK, LaPorte RE. Ascertainment corrected rates: applications of capture-recapture methods. Int J Epidemiol. 1993;22(3):559-65.

19. Farchi S, Molino N, Giorgi Rossi P, Borgia P, Krzyzanowski M, Dalbokova D, et al. Defining a common set of indicators to monitor road accidents in the European Union. BMC Public Health. 2006;6(1):183.

20. Calil AM, Sallum EA, Domingues C de A, Nogueira L de S. Mapping injuries in traffic accident victims: a literature review. Rev Lat Am Enfermagem. 2009;17(1):120-5.

21. Henson R, Hadfield JM, Cooper S. Injury control strategies: extending the quality and quantity of data relating to road traffic accidents in children. J Accid Emerg Med. 1999;16(2):87-90.

22. Rojas Medina Y, Espitia-Hardeman V, Dellinger AM, Loayza M, Leiva R, Cisneros G. A road traffic injury surveillance system using combined data sources in Peru. Rev Panam Salud Publica. 2011;29(3):191-7.

23. Tercero F, Andersson R. Measuring transport injuries in a developing country: an application of the capture-recapture method. Accid Anal Prev. 2004;36:13-20.



Manuscript received on 12 August 2012.
Revised version accepted for publication on 10 June 2013.

Organización Panamericana de la Salud Washington - Washington - United States
E-mail: contacto_rpsp@paho.org