Sampling design for the Birth in Brazil: National Survey into Labor and Birth

Vasconcellos, Mauricio Teixeira Leite de; Silva, Pedro Luis do Nascimento; Pereira, Ana Paula Esteves; Schilithz, Arthur Orlando Correa; Souza Junior, Paulo Roberto Borges de; Szwarcwald, Celia Landmann

doi:10.1590/0102-311X00176013

Abstract

This paper describes the sample design for the National Survey into Labor and Birth in Brazil. The hospitals with 500 or more live births in 2007 were stratified by the five Brazilian regions, by state capital or not, and by type of governance. They were then selected with probability proportional to the number of live births in 2007. An inverse sampling method was used to select as many days (minimum of 7) as necessary to reach 90 interviews in the hospital. Postnatal women were sampled with equal probability from the set of eligible women who had entered the hospital on the sampled days. Initial sample weights were computed as the reciprocals of the sample inclusion probabilities and were calibrated to ensure that total estimates of the number of live births from the survey matched the known figures obtained from the Brazilian System of Information on Live Births. For the two telephone follow-up waves (6 and 12 months later), the postnatal woman’s response probability was modelled using baseline covariate information in order to adjust the sample weights for nonresponse in each follow-up wave.

Sampling Studies; Stratified Sampling; Statistical Models; Parturition

Introduction

According to do Carmo Leal et al. ¹, the objectives of the National Survey into Labour and Birth were: (1) to describe the incidence of excessive caesarean section (according to Robson’s groups) and examine the consequences on women’s and new-borns’ health; (2) to investigate the relationship between excessive caesarean section and late preterm birth and low birth weight; and (3) to investigate the relationship between excessive caesarean section and the use of technological procedures after birth.

This article describes the sample design used in the survey including the definition of the survey population, the stratification of primary sampling units, the criteria for selection of hospitals, days and postnatal women, the base sample weights calculation and their calibration. It also describes the strategy used for estimating the response probabilities of respondents in the two additional telephone follow-up waves six and 12 months after the interview in the hospital, in order to calculate the sampling weights for the respondents in each follow-up wave.

Survey population, first stage sampling frame and stratification

The survey population ² corresponds to the set of postnatal women who gave birth in 2011 in hospitals with 500 or more live births in 2007, according to the Information System on Live Births (SINASC; http://portal.saude.gov.br/portal/saude/visualizar_texto.cfm?idtxt=21379). The SINASC was created by the Brazilian Ministry of Health in 1990 to gather epidemiological information on live births in hospitals and households all over the country.

For operational reasons, several groups were excluded from the survey population: postnatal women with severe mental health disorders, homeless women, foreigners who did not understand Portuguese, deaf or mute women, and women sectioned by court order. Given the survey population definition, only hospitals with 500 or more live births in 2007 were included in the first stage sampling frame. In the end, 1,403 of the 3,961 hospitals registered in 2007 were eligible for the study, accounting for 2,228,534 (77.1%) of the 2,891,328 live births that year.

The hospitals in the first stage sampling frame were stratified by the combination of macro-region, location (state capital or not) and type of hospital governance (public, private or mixed), defining the strata presented in Table 1. This stratification ensured that the different types of governance were represented in all five macro-regions of the country, separately for the state capitals and the other cities, which differ substantially in the dimension and kinds of health services. Mixed governance was used for private hospitals that had beds contracted by the public sector.

Table 1
Number of live births and hospitals in survey population and sample size, according to strata.

Sample size and its allocation by stratum

According to do Carmo Leal et al. ¹, the sample size in each stratum was calculated based on the caesarean section rate in Brazil in 2007 of 46.6%, with a 5% significance level and 95% power to detect differences of 14% between public, mixed and private hospitals. The minimum sample per stratum was 341 postnatal women. Since the sample was clustered by hospital, a design effect of approximately 1.3 was used to inflate the initial sample sizes, leading to a minimum sample size of 450 postnatal women per stratum.

Although not usual in sample surveys, this way of determining the sample size is common in clinical trials and randomized experiments. It derives from a two-tailed test of the hypothesis of equality between the proportions in the treatment and control groups ³. For this calculation, expression 3.14 from Fleiss ⁴ was used.
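The calculation described above can be sketched as follows. This is only an illustration, not the authors' code: it assumes equal group sizes, reads the difference of 14% as 14 percentage points above the 46.6% baseline (so the second proportion is 0.606), and uses the continuity-corrected form of the two-proportion sample size formula. The function name `fleiss_n` is hypothetical. Under these assumptions the result agrees with the minimum of 341 postnatal women quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def fleiss_n(p1, p2, alpha=0.05, power=0.95):
    """Per-group sample size for comparing two proportions (two-tailed test),
    with continuity correction; equal group sizes are assumed."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value of the test
    z_b = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    delta = abs(p2 - p1)
    # Sample size before the continuity correction
    n0 = (z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    # Continuity correction
    n = n0 / 4 * (1 + sqrt(1 + 4 / (n0 * delta))) ** 2
    return ceil(n)


n_min = fleiss_n(0.466, 0.466 + 0.14)   # assumed reading of "differences of 14%"
n_deff = n_min * 1.3                    # inflate by the design effect of about 1.3
print(n_min, round(n_deff, 1))          # 341 443.3
```

The inflated value of about 443 is what the text rounds up to the 450 postnatal women per stratum.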

According to do Carmo Leal et al. ¹, the sample size has a power of 80% to detect adverse outcomes in the order of 3%, and differences of at least 1.5% among large geographic regions or type of hospital governance (public/private/mixed).

Considering the minimum size of 450 postnatal women per stratum, it was decided to select at least five hospitals per stratum, leading to a sample size of 90 postnatal women per hospital. If an equal allocation among the strata were used, these parameters would lead to a sample size of 210 hospitals. However, an allocation proportional to the number of hospitals was used, which led to a sample size of 266 hospitals, since in all strata with an allocated sample size smaller than five hospitals the sample size was increased to five in order to ensure a minimum of five hospitals and 450 postnatal women, as indicated in Table 1.

Hospital selection

In the first stage, the hospitals were selected with probability proportional to size (PPS), with size defined as the number of live births in the hospital according to SINASC 2007. As usual in PPS selection, the hospitals with large numbers of live births (more than 13 per day on average, in this case) were included with certainty in the sample and treated as selection strata for sampling days and postnatal women. In strata with five or fewer hospitals, a take-all procedure was used and each hospital was also treated as a selection stratum for the subsequent sampling stages.

The hospital selection was done systematically ⁵, after sorting the hospitals in each stratum in ascending order by number of live births in 2007. The sample inclusion probabilities of hospitals are provided in expressions (1a) and (1b) of Figure 1.
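A minimal sketch of systematic PPS selection within one stratum is given below. It is not the authors' implementation and does not reproduce expressions (1a) and (1b), which appear only in Figure 1. The function name `pps_systematic` and the example sizes are hypothetical, sizes are assumed to be positive, and certainty units are identified by the generic rule that a unit's expected number of selections reaches 1, rather than by the 13 births per day threshold mentioned above.

```python
import random


def pps_systematic(sizes, n, seed=None):
    """Systematic PPS selection of n units from one stratum.

    Returns (selected indices, inclusion probabilities for all units)."""
    rng = random.Random(seed)
    N = len(sizes)
    if n >= N:                        # take-all stratum: every unit is selected
        return list(range(N)), [1.0] * N

    certainty = set()
    while True:                       # move very large units to the certainty set
        rest = [i for i in range(N) if i not in certainty]
        m = n - len(certainty)        # units still to be drawn
        total = sum(sizes[i] for i in rest)
        big = [i for i in rest if m * sizes[i] >= total]
        if not big:
            break
        certainty.update(big)

    probs = [1.0 if i in certainty else m * sizes[i] / total for i in range(N)]

    # Systematic draw over the cumulated sizes, units sorted in ascending order
    rest.sort(key=lambda i: sizes[i])
    step = total / m
    point = rng.random() * step       # random start in [0, step)
    selected, cum = sorted(certainty), 0
    for i in rest:
        cum += sizes[i]
        while point < cum and len(selected) < n:
            selected.append(i)
            point += step
    return selected, probs


# Hypothetical stratum: live births in 2007 for six hospitals, three to be selected
births_2007 = [600, 800, 1200, 1500, 2500, 9000]
sample, incl_prob = pps_systematic(births_2007, 3, seed=1)
print(sample, [round(p, 3) for p in incl_prob])
```

In this example the largest hospital is taken with certainty and the other two are drawn systematically, so the inclusion probabilities sum to the number of hospitals selected.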

Figure 1
Sample probability scheme.

Selection of survey days

In the second stage of sampling, an inverse sampling method ²,⁶ was used to select as many days as necessary to reach 90 postnatal women interviewed in the hospital. This method, originally proposed by Haldane ⁶ to estimate frequencies and proportions, can be defined as a technique to sample as many units (in this case, days) as needed to be observed in order to obtain a pre-specified number of successes or, in this case, 90 interviews performed with postnatal women in the hospital.

It is called inverse sampling because, rather than defining a fixed number of days sufficient to have an expected sample size of 90 interviews, as done by Veloso et al. ⁷, it defines the number of interviews performed as the stopping rule of the consecutive sample of survey days. The first survey day in each hospital was always selected with equal probability during the year, as indicated by expression (2) of Figure 1. The -1 terms in the numerator and denominator of expression (2) are explained by the loss of one degree of freedom due to the stopping rule, as defined by Haldane ⁶.

To account for differences in the number of live births between weekends and working days, a minimum of seven consecutive days was mandatory, and the size of the field team was determined so as to ensure this rule.
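The stopping rule can be illustrated with a short sketch. It assumes that the number of completed interviews per day is known for a run of consecutive days; the function name `survey_days` and the daily figures are hypothetical, and the sketch does not reproduce expression (2).

```python
def survey_days(daily_interviews, target=90, min_days=7):
    """Number of consecutive days needed to reach `target` interviews,
    subject to a minimum of `min_days` days."""
    done = 0
    for day, k in enumerate(daily_interviews, start=1):
        done += k
        if done >= target and day >= min_days:
            return day
    raise ValueError("target not reached within the days supplied")


print(survey_days([12] * 30))  # 8
print(survey_days([4] * 30))   # 23
```

With 12 interviews per day the target is first reached on the eighth day, so the seven-day minimum is respected without further waiting; with four interviews per day, 23 days are needed.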

Selection of postnatal women

The number of postnatal women to be selected per day and hospital depended on the number of live births and the numbers of interview shifts and interviewers per day in the hospital. To establish the number of shifts and interviewers, the mean number of live births per day per hospital in 2007 was used and four combinations were defined: (1) one interviewer and one shift for four interviews; (2) one interviewer and two shifts for six interviews; (3) two interviewers and one shift for eight interviews; and (4) two interviewers and two shifts for twelve interviews.

To ensure a random selection of postnatal women, the survey central office prepared tables with the order numbers of the women to be interviewed, according to the number of live births (up to 40) and the number of interviews per day and hospital (4, 6, 8 and 12). The order number of each postnatal woman was defined by her order of entrance in the hospital. Some additional order numbers were selected for replacement of non-responses.
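A selection table of this kind could be generated along the following lines. This is a hedged sketch, not the procedure actually used by the survey central office: the function names, the number of reserve order numbers (two) and the seed are assumptions made for illustration only.

```python
import random


def order_numbers(n_births, n_interviews, n_reserve=2, rng=random):
    """Draw, with equal probability and without replacement, the order numbers
    (1 = first woman admitted) to interview, plus reserves for replacement."""
    k = min(n_interviews, n_births)
    r = min(n_reserve, n_births - k)
    drawn = rng.sample(range(1, n_births + 1), k + r)
    return sorted(drawn[:k]), drawn[k:]


def selection_table(max_births=40, quotas=(4, 6, 8, 12), seed=2011):
    """One entry per (number of live births, interviews per day) combination."""
    rng = random.Random(seed)
    return {(b, q): order_numbers(b, q, rng=rng)
            for b in range(1, max_births + 1) for q in quotas}


table = selection_table()
main, reserve = table[(40, 12)]
print(main, reserve)
```

Under this sketch, on a day with B eligible women and a quota of k interviews, each woman has the same conditional selection probability min(k, B)/B.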

Unfortunately, the number of live births per hospital and survey day was not recorded during the field work. To overcome this problem, the SINASC 2011 and 2012 files were processed to determine the number of live births in each hospital and survey day, as required to calculate the inclusion probabilities described in expression (3) of Figure 1.

Treatment of non-responses

Nine sampled hospitals refused to take part in the survey, and three had closed their maternity service before the start of the fieldwork. The established replacement procedure for hospital non-response consisted of replacing the non-responding hospital by the next hospital in the stratum, according to the sort order of hospitals in the first stage sampling frame. Despite this, it was not possible to replace two non-responding hospitals among private hospitals located in non-capital cities in the Northeast region, as indicated in Table 1.

Postnatal women’s non-response was treated, where possible, by replacement according to the selection tables prepared for each hospital or by the inverse sampling procedure used in survey day selection (more days were added to the sample until 90 complete interviews were achieved per hospital). If the maternity service closed during the field work, the inverse sampling procedure was interrupted and restarted as soon as the maternity service reopened.

A total of 1,356 (5.7%) of the postnatal women selected were replaced, 15% due to early hospital discharge and 85% due to refusal to participate. The sample comprised 23,940 postnatal women interviewed in 266 hospitals. During processing, records with no data from the woman or no new-born medical records were excluded, and the final sample comprised 23,894 postnatal women (Table 1).

Sample weighting and calibration of sample weights

As indicated in Figure 1, the base sample weights were calculated as the reciprocals of the product of the inclusion probabilities in each sampling stage.

As is usual in official statistical surveys (according to Silva ⁸), calibration of the base sample weights was performed to enforce coherence between sample estimates and known population totals obtained from an external source. In addition, up to a point, calibration helps to compensate for potential sampling and nonresponse biases.

Since the field work was conducted in 2011 (and at the beginning of 2012 for a few hospitals), it seemed appropriate to keep the coherence between sample-based estimates and the total number of live births as obtained from the SINASC 2011 for the hospitals in the sampling frame, i.e. those with 500 or more live births in 2007.

For this reason, a ratio type calibration procedure of the base sample weights was performed within each of the selection strata, as indicated in expression (6) of Figure 1.
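The base weighting and the ratio-type calibration can be sketched as below. This is an illustration under stated assumptions and not a transcription of the expressions in Figure 1: the function names, the stratum labels and all numeric values are hypothetical, and each woman is counted as one live birth unless a `births` list is supplied.

```python
def base_weight(p_hospital, p_day, p_woman):
    """Base weight: reciprocal of the product of the stage inclusion probabilities."""
    return 1.0 / (p_hospital * p_day * p_woman)


def ratio_calibrate(base_weights, strata, known_totals, births=None):
    """Scale the base weights within each selection stratum so that the
    weighted number of live births equals the known stratum total."""
    births = births or [1] * len(base_weights)
    estimate = {}
    for w, h, y in zip(base_weights, strata, births):
        estimate[h] = estimate.get(h, 0.0) + w * y
    return [w * known_totals[h] / estimate[h]
            for w, h in zip(base_weights, strata)]


# Hypothetical example with two selection strata
w_base = [base_weight(0.5, 0.02, 0.4), 150.0, 100.0]   # first weight is about 250
strata = ["A", "A", "B"]
totals = {"A": 300.0, "B": 120.0}                      # known live births per stratum
w_cal = ratio_calibrate(w_base, strata, totals)
print([round(w, 1) for w in w_cal])                    # [187.5, 112.5, 120.0]
```

After calibration the weights in each stratum add up to the known total, which is why the relative errors for the calibrated weights are zero by construction.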

Results comparing population data with estimates obtained using both the base and calibrated sample weights are presented in Table 2. These results show the coherence between estimates based on calibrated weights and the known population totals, as expected. Also as expected, calibration leads to a slight increase in the variation of the sample weights, as shown in Table 3. This increase in sample weight variation is the price paid to ensure the coherence of the estimates.

Sample weights for the two telephone follow-up waves

As expected, it was not possible to contact all postnatal women interviewed in the baseline survey during the two telephone interview follow-up waves. Several approaches could be used to compensate for this nonresponse: (1) probabilistic imputation of non-respondents’ data; (2) treating the responding sample as a subsample of the baseline sample; or (3) modelling the probability of response in each follow-up wave as a function of some covariates obtained in the baseline survey and using these to derive nonresponse weight adjustments for responding women in each follow-up wave.

Considering the information on responses achieved in each follow-up wave as provided in Table 3, note that 67.4% and 49.9% of the women interviewed in the baseline survey responded in the first and second follow-up waves respectively. Due to the high nonresponse rates, the first two options were not considered suitable alternatives for nonresponse compensation.

Thus the solution adopted was to model the response probabilities using the covariate information available from the baseline survey. The procedure used was proposed by Little ⁹, and is also described in Lepkowski ¹⁰ and Brick & Montaquila ¹¹.

The general idea behind the procedure used to obtain the sample weights in each telephone interview follow-up wave can be described in four steps, as presented in Figure 2.

Figure 2
Modeling response probabilities to calculate adjustments to the weights of the two segments.

In the first step, a model was fitted to explain the probability of responding to each follow-up wave for each postnatal woman in the baseline sample using the baseline covariate information as well as the follow-up wave response indicator. This procedure was applied independently for each follow-up wave.

In the second step, the predicted values of the response probabilities in each follow-up wave were estimated using the model fitted in step one.

In the third step, for each follow-up wave the quintiles of the predicted response probabilities were used to define five weight adjustment classes in which a response rate was estimated by the ratio of the sum of respondents’ baseline calibrated sample weights to the total of baseline calibrated sample weights of postnatal women of the class, as indicated by expression (9) of Figure 2.

In the last step, the reciprocals of the response rates estimated by follow-up wave and weight adjustment class were used to adjust the baseline calibrated sample weights of the postnatal women interviewed in each follow-up wave.
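The third and fourth steps can be sketched as follows, taking the predicted response probabilities from steps one and two as given (the text does not specify the model; a logistic regression would be one option). The function name `nonresponse_adjust` and the example data are hypothetical, and the classes are formed by ranking the predicted probabilities into five groups of equal size, which is an assumption about how the quintile classes were built.

```python
def nonresponse_adjust(weights, responded, p_hat, n_classes=5):
    """Steps 3 and 4: weighting classes from ranked predicted probabilities,
    weighted response rate per class, and adjusted weights for respondents."""
    n = len(weights)
    order = sorted(range(n), key=lambda i: p_hat[i])
    cls = [0] * n
    for rank, i in enumerate(order):
        cls[i] = rank * n_classes // n          # equal-sized classes (quintiles if 5)

    total = [0.0] * n_classes                   # sum of baseline weights per class
    resp = [0.0] * n_classes                    # same sum, respondents only
    for i in range(n):
        total[cls[i]] += weights[i]
        if responded[i]:
            resp[cls[i]] += weights[i]
    rates = [r / t if t > 0 else 0.0 for r, t in zip(resp, total)]

    # Respondents carry the weight of the whole class; nonrespondents get weight 0
    adjusted = [weights[i] / rates[cls[i]] if responded[i] else 0.0
                for i in range(n)]
    return adjusted, rates


# Hypothetical data: 20 women with increasing predicted response probabilities
w = [50 + 5 * i for i in range(20)]             # baseline calibrated weights
p = [0.30 + 0.03 * i for i in range(20)]        # predicted response probabilities
r = [i % 3 != 0 for i in range(20)]             # follow-up response indicator
adj, rates = nonresponse_adjust(w, r, p)
print([round(x, 3) for x in rates])
```

As long as every class contains at least one respondent, the adjusted weights of the respondents add up to the total of the baseline calibrated weights.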

For the models of response probability, the set of potential predictor variables initially considered included: macro-region; located in capital city or not; type of hospital governance; postnatal woman’s socioeconomic class (A+B, C, or D+E); delivery payment (public, private health insurance, or directly out of pocket); postnatal woman’s age class (12-19 years, 20-34 years, and 35 years or more); “Have you got any work where you get paid?” (yes or no); “Were you satisfied with your pregnancy at its beginning?” (yes or no); “Still birth or neonatal death of child?” (yes or no); race or skin color (white, black, brown, yellow, or indigenous); “Were there obstetric complications during gestation leading to negative perinatal outcomes?” (yes or no); and, for the second follow-up wave only, whether the woman had responded to the first follow-up wave (yes or no).

For the first follow-up wave, the significant predictor variables were the three variables that defined sample strata (macro-region, capital or not and type of hospital governance), postnatal woman’s socioeconomic class and postnatal woman’s age class.

For the second follow-up wave the significant variables were the same five variables listed above plus “Have you got any work where you get paid?”, “Were you satisfied with your pregnancy at its beginning?” and “Still birth or neonatal death of child?”.

In the correction of the follow-up sample weights (third step), the predicted response probabilities were not used directly to adjust the baseline calibrated sample weights in each follow-up wave, in order to avoid undesirable variation in the final weights. In fact, Kish ¹² demonstrates that sample weights may reduce bias but often increase the variance of weighted estimators, since the ratio between the variance of the weighted estimator and the variance of the corresponding unweighted estimator is equal to 1 plus the square of the coefficient of variation of the sample weights. Thus the solution adopted in the third and fourth steps corrects the follow-up sample weights for nonresponse while keeping the increase in weight variation to a minimum (Table 3).
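The variance ratio attributed to Kish can be evaluated directly from the coefficients of variation reported in Table 3. The sketch below is illustrative only; the function names are hypothetical, and the second function assumes the coefficient of variation is computed with the population (divide by n) variance.

```python
def kish_factor_from_cv(cv_percent):
    """Variance ratio 1 + CV^2, with the CV of the weights given in %."""
    return 1 + (cv_percent / 100) ** 2


def kish_factor(weights):
    """Same quantity computed from the weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Coefficients of variation of the sample weights reported in Table 3
for label, cv in [("base", 86.4), ("calibrated", 99.2),
                  ("1st follow-up", 101.6), ("2nd follow-up", 105.4)]:
    print(label, round(kish_factor_from_cv(cv), 3))
# base 1.746, calibrated 1.984, 1st follow-up 2.032, 2nd follow-up 2.111
```

On this measure the variance ratio rises from about 1.98 for the calibrated baseline weights to about 2.03 and 2.11 for the two follow-up waves.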

Acknowledgments

To the regional and state coordinators, supervisors, interviewers and crew of the study and the mothers who participated and made this study possible.

References

1. do Carmo Leal M, da Silva AA, Dias MA, da Gama SG, Rattner D, Moreira ME, et al. Birth in Brazil: national survey into labour and birth. Reprod Health 2012; 9:15.
2. Cochran WG. Sampling techniques. 3rd Ed. New York: John Wiley & Sons; 1977.
3. Altman DG. Practical statistics for medical research. London: Chapman and Hall; 1991.
4. Fleiss JL. Statistical methods for rates and proportions. 2nd Ed. New York: John Wiley & Sons; 1981.
5. Madow WG. On the theory of systematic sampling, II. Annals of Mathematical Statistics 1949; 20:333-54.
6. Haldane JBS. On a method of estimating frequencies. Biometrika 1945; 33:222-5.
7. Veloso VG, Portela MC, Vasconcellos MTL, Matzenbacher LA, Vasconcelos ALR, Grinsztejn B, et al. HIV testing among pregnant women in Brazil: rates and predictors. Rev Saúde Pública 2008; 42:859-67.
8. Silva PLN. Calibration estimation: when and why, how much and how. Rio de Janeiro: Instituto Brasileiro de Geografia e Estatística; 2004. (Textos para Discussão da Diretoria de Pesquisas, 14).
9. Little RJ. Survey nonresponse adjustments. International Statistical Review 1986; 54:139-57.
10. Lepkowski J. Non-observation error in household surveys in developing countries. In: Department of Economic and Social Affairs, Statistics Division, editor. Household surveys in developing and transition countries. New York: United Nations; 2005. p. 149-69. (Series F, 96).
11. Brick JM, Montaquila JM. Nonresponse and weighting. In: Pfeffermann D, Rao CR, editors. Handbook of statistics 29A. Sample surveys: design, methods and applications. Philadelphia: Elsevier; 2009. p. 163-85.
12. Kish L. Weighting for unequal Pi. Journal of Official Statistics 1992; 8:183-200.

Funding
National Council for Scientific and Technological Development (CNPq); Science and Technology Department, Secretariat of Science, Technology, and Strategic Inputs, Brazilian Ministry of Health; National School of Public Health, Oswaldo Cruz Foundation (INOVA Project); and Foundation for Supporting Research in the State of Rio de Janeiro (Faperj).

Publication Dates

Publication in this collection
Aug 2014

History

Received
09 Oct 2013
Reviewed
26 Feb 2014
Accepted
24 Mar 2014

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 1
Macro-regions and hospital type of governance	Total				State capitals				Non-capitals
Macro-regions and hospital type of governance	Live births in 2007	Hospitals in 2007	Hospital sample size	Effective sample size of women	Live births in 2007	Hospitals in 2007	Hospital sample size	Effective sample size of women	Live births in 2007	Hospitals in 2007	Hospital sample size	Effective sample size of women
Total	2,228,534	1,403	266	23,894	802,543	308	84	7,551	1,425,991	1,095	182	16,343
Public	932,617	531	95	8,537	412,069	137	30	2,699	520,548	394	65	5,838
Mixed	966,190	649	115	10,330	186,580	61	24	2,157	779,610	588	91	8,173
Private	329,727	223	56	5,027	203,894	110	30	2,695	125,833	113	26	2,332
North
Public	136,987	91	17	1,531	57,320	14	5	448	79,667	77	12	1,083
Mixed	74,641	47	10	899	31,366	12	5	450	43,275	35	5	449
Private	10,721	9	5	450	10,721	9	5	450	0	0	0	0
Northeast
Public	341,638	211	31	2,779	141,079	44	6	538	200,559	167	25	2,241
Mixed	273,815	160	28	2,516	51,892	17	5	450	221,923	143	23	2,066
Private *	46,213	31	9	801	42,502	26	6	539	3,711	5	3	262
Southeast
Public	313,853	155	26	2,341	141,235	53	8	722	172,618	102	18	1,619
Mixed	402,730	273	42	3,776	61,976	14	5	452	340,754	259	37	3,324
Private	213,047	136	21	1,888	113,219	51	8	718	99,828	85	13	1,170
South
Public	74,770	36	11	991	31,126	10	6	541	43,644	26	5	450
Mixed	156,559	130	24	2,159	15,384	4	4	360	141,175	126	20	1,799
Private	40,141	31	11	989	22,947	13	6	539	17,194	18	5	450
Central
Public	65,369	38	10	895	41,309	16	5	450	24,060	22	5	445
Mixed	58,445	39	11	980	25,962	14	5	445	32,483	25	6	535
Private	19,605	16	10	899	14,505	11	5	449	5,100	5	5	450

Table 2
Macro-regions and type of hospital governance	Population data from SINASC 2011	Base sample weight		Calibrated sample weight
Macro-regions and type of hospital governance	Population data from SINASC 2011	Estimate	Relative error (%) *	Estimate	Relative error (%) *
Total	2,337,476	2,697,463	15.4	2,337,476	0.0
Public	962,273	1,058,939	10.0	962,273	0.0
Mixed	1,036,634	1,170,514	12.9	1,036,634	0.0
Private	338,569	468,010	38.2	338,569	0.0
North
Public	154,305	161,788	4.8	154,305	0.0
Mixed	57,571	83,284	44.7	57,571	0.0
Private	12,690	13,430	5.8	12,690	0.0
Northeast
Public	334,541	376,493	12.5	334,541	0.0
Mixed	230,107	360,287	56.6	230,107	0.0
Private	110,702	67,497	-39.0	110,702	0.0
Southeast
Public	337,772	362,600	7.4	337,772	0.0
Mixed	501,644	458,582	-8.6	501,644	0.0
Private	154,042	296,744	92.6	154,042	0.0
South
Public	66,793	75,919	13.7	66,793	0.0
Mixed	182,224	197,981	8.6	182,224	0.0
Private	42,932	67,762	57.8	42,932	0.0
Central
Public	68,862	82,139	19.3	68,862	0.0
Mixed	65,088	70,381	8.1	65,088	0.0
Private	18,203	22,577	24.0	18,203	0.0

Table 3
Summary statistic	Base sample weight	Calibrated sample weight	1st follow-up wave sample weight	2nd follow-up wave sample weight
Number of observations	23,894	23,894	16,109	11,925
Minimum	7.4	4.5	6.0	7.0
First quartile (Q1)	69.4	55.3	76.8	103.3
Median	96.1	78.6	119.0	162.6
Third quartile (Q3)	132.6	114.8	175.5	255.2
Maximum	3,499.9	4,194.9	3,870.4	7,395.8
Range (maximum – minimum)	3,492.5	4,190.4	3,864.4	7,388.8
Interquartile range (Q3 – Q1)	63.2	59.5	98.7	151.9
Mode	19.3	14.9	29.6	39.5
Mean	112.9	97.8	149.1	211.0
Standard deviation	97.6	97.0	151.5	222.4
Coefficient of variation (%)	86.4	99.2	101.6	105.4