ORIGINAL ARTICLE
Year : 2019  Volume : 26  Issue : 2  Page : 123–126
Forecasting Indian infant mortality rate: An application of autoregressive integrated moving average model
Amit K Mishra, Chandar Sahanaa, Mani Manikandan
Department of Community Medicine, Pondicherry Institute of Medical Sciences, Puducherry, India

BACKGROUND: The infant mortality rate (IMR) reflects the socioeconomic development of a nation. The IMR fell by 28% in 2015–2016 (National Family Health Survey-4 [NFHS-4]) as compared to 2005–2006 (NFHS-3), from 57/1000 to 41/1000 live births. The target fixed by the Government of India for IMR in 2019 is 28/1000 live births (National Health Policy, 2017). One of the most common methods of forecasting is the autoregressive integrated moving average (ARIMA) model. A forecast of IMR can help in implementing interventions to bring infant mortality within the target range.
MATERIALS AND METHODS: The objective of the study was to give a detailed explanation of the ARIMA model and to use it to forecast the IMR (2017–2025). Secondary data analysis and forecasting were performed on the year-wise IMR data extracted from the “open government data platform India” website.
RESULTS: The sample-period forecast (1971–2016) from the selected ARIMA (2, 1, 1) model closely tracked the observed data. The post-sample forecast with ARIMA (2, 1, 1) showed a decreasing trend of IMR (2017–2025). The forecast IMR for 2025 is 15/1000 live births.
CONCLUSION: In the current study, long time series IMR data were used to forecast the IMR for 9 years. The forecast showed that IMR would decline from 33/1000 live births in 2017 to 15/1000 live births in 2025. When the actual data for another year (2017) are available, the model can be checked for validity and a more accurate forecast can be performed.
Introduction

Globally, infant mortality appears to have declined on average by about 23/1000 live births and mortality of older infants by about 25/1000 live births.[1] In 2000, when the UN adopted eight Millennium Development Goals (MDGs), the infant mortality rate (IMR) in India was more than 25 times, with rural India being higher than rural areas in the developed countries.[2] IMR reflects the socioeconomic development of a nation. India made significant progress in reducing the IMR (to 41/1000 live births)[3] but was unable to achieve the MDG target in 2015.[4] The IMR fell by 28% in 2015–2016 (National Family Health Survey-4 [NFHS-4]) as compared to 2005–2006 (NFHS-3), from 57/1000 to 41/1000 live births.[5],[6] As per the NFHS-4, Kerala reported the lowest IMR and Uttar Pradesh the highest.[7] The target fixed by the Government of India for IMR in 2019 is 28/1000 live births (National Health Policy, 2017).[8],[6] One of the targets in the Sustainable Development Goals is to reduce under-5 mortality to 25/1000 live births by 2030.[9] Infant mortality accounts for over 80% of the under-5 mortality rate.[10]

IMR fluctuates with the health status of the country, which is a dynamic process. If a forecast of IMR is available, interventions can be planned and implemented effectively at the right time. One of the most common methods of forecasting is the autoregressive integrated moving average (ARIMA) model, which is widely used in the health and agricultural sciences because it is simple and easy to use.[11] As infant mortality is one of the indicators of the health status of the country, the current study attempts to forecast the IMR of India up to 2025, with a detailed stepwise explanation of the ARIMA model.
Materials and Methods

The current study was based on a secondary data analysis of the IMR of India between 1971 and 2016, collected from the “open government data platform India” website.[12] The data available to the public are not individually identifiable (the data referred to in the present study were annual mortality rates and not individual data). Ethical approval was not obtained for this study as there was no direct involvement of any human subject.

For the analysis (forecast), the ARIMA model, introduced by Box and Jenkins in the 1960s for forecasting a time series variable (and also called the Box–Jenkins model), was used. The ARIMA method is an extrapolation method for forecasting a variable, and like other methods of forecasting, it requires only long time series data. To forecast any event, the best ARIMA model has to be selected in three steps: model identification, parameter estimation, and model validation. Once the model is validated, the forecast can be carried out with the available data. In the current study, the IMR data for the period 1971–2016 (46 years) were used to fulfill the need for a long time series data set for analysis using the ARIMA model.

The expression for the ARIMA (p, d, q) model is $Y_t = b_0 + \varphi_1 Y_{t-1} + \dots + \varphi_p Y_{t-p} + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} + e_t$, where $Y_t$ and $e_t$ are the actual value and the random error at period $t$, respectively, p is the order of the autoregressive (AR) part, d is the degree of differencing involved, and q is the order of the moving average (MA) part.

The first step (model identification) in forecasting any time series data using the ARIMA model is to check whether the available data are stationary, because the ARIMA model applies to stationary data. A stationary data series is one whose values vary over time around a constant mean with a constant variance. Hence, to apply the ARIMA model to data that are nonstationary, the data should first be transformed into a stationary series.
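To make the expression and the role of differencing concrete, the following minimal sketch computes a one-step-ahead value for an ARIMA (p, 1, q) model: it takes first differences, applies the ARMA expression to the differenced series with the unknown future error set to zero, and then adds the result back to the last observed value. The function names `arma_one_step` and `arima_d1_one_step`, the coefficients, and the data are all hypothetical and chosen only for illustration; they are not the estimates or the IMR series of this study.

```python
def arma_one_step(x, errors, b0, phi, theta):
    """One-step-ahead value from the ARMA expression, with the future error e_t set to 0.

    x      - past values of the (stationary) series, oldest first
    errors - past random errors e, oldest first
    phi    - AR coefficients [phi_1, ..., phi_p]
    theta  - MA coefficients [theta_1, ..., theta_q]
    """
    ar = sum(c * x[-i] for i, c in enumerate(phi, start=1))
    ma = sum(c * errors[-i] for i, c in enumerate(theta, start=1))
    return b0 + ar + ma


def arima_d1_one_step(y, errors, b0, phi, theta):
    """ARIMA(p, 1, q): apply the ARMA expression to first differences, then undo the differencing."""
    x = [b - a for a, b in zip(y, y[1:])]  # X_t = Y_t - Y_{t-1}
    return y[-1] + arma_one_step(x, errors, b0, phi, theta)


# Made-up numbers purely for illustration; these are not estimates from the study.
y = [50.0, 47.0, 45.0, 42.0]
forecast = arima_d1_one_step(y, errors=[0.5], b0=-1.0, phi=[0.4, 0.2], theta=[0.3])
print(forecast)
```

With these made-up inputs the predicted difference is about -2.45, so the forecast is about 39.55. In practice the coefficients and the past errors come from fitting the model to the data rather than being supplied by hand.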
There are several ways to ascertain the stationarity of data. The most common method is to examine the time plot of the series. [Figure 1] shows that the data are nonstationary, as there is a downward trend and the values do not vary over time around a constant mean and variance. Another method of checking for stationarity is to examine the autocorrelation function (ACF) and partial ACF (PACF) of the IMR time series data. [Figure 2] shows that the values are not within the upper and lower confidence limits, which signifies that the data set is nonstationary.{Figure 1}{Figure 2}

Nonstationary data can be transformed into stationary data by appropriate differencing. The new variable constructed after differencing of order (d) is $X_t$, which can be examined for stationarity. For each order of differencing (d), the resulting differenced series should be checked for stationarity, which can be done using the ACF and PACF. In the current study, differencing of order 1 was sufficient to achieve stationarity in mean. The stationarity of the series after differencing of order 1 was checked using the ACF and PACF of the IMR data.

After transforming the data to a stationary set, the next step (parameter estimation) is to identify the values of p and q, where p is the order of the AR component and q is the order of the MA component. When AR and MA are combined, the model is known as ARMA. However, time series data have two other components besides AR and MA: the “trend (long-term variation)” and “seasonality (short-term variation).” When the ARMA model is combined with differencing to remove the trend (the “integrated” part), it is known as the ARIMA model. Each ARIMA model used to forecast the data under consideration requires a set of values (p, d, q). The value of “d” has already been estimated as 1 for the current forecast.
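The ACF check and the effect of first-order differencing described above can be illustrated with a short sketch. It computes the sample autocorrelations and flags the lags that fall outside the approximate 95% limits of ±1.96/√n, a common large-sample approximation that is assumed here and may differ from the limits drawn by a particular statistical package. The helper names `acf` and `outside_limits` and the series `y` are hypothetical; the values are illustrative and are not the IMR data. The PACF is not covered by this sketch.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_k for k = 0..max_lag."""
    n = len(series)
    m = sum(series) / n
    dev = [v - m for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def outside_limits(series, max_lag):
    """Lags (>= 1) whose autocorrelation lies outside the approximate 95% limits."""
    limit = 1.96 / math.sqrt(len(series))
    r = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > limit]


# Illustrative downward-trending series; not the study data.
y = [100, 97, 94, 91, 90, 87, 84, 83, 82, 79, 78, 75, 74, 73, 72, 71, 68]
x = [b - a for a, b in zip(y, y[1:])]  # first difference (d = 1)

print(outside_limits(y, 5))  # trending series: low-order lags fall outside the limits
print(outside_limits(x, 5))  # differenced series: no lag falls outside the limits
```

For this made-up series, the raw values have large low-order autocorrelations because of the trend, while every autocorrelation of the differenced series up to lag 5 stays inside the limits, which is the pattern expected when one round of differencing is enough.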
Trying different values (0, 1, 2) for p and q in every combination yields many candidate models. The criteria for choosing the best forecasting model for the available data are the minimum Akaike information criterion (AIC) and Bayesian information criterion (BIC). For the current study, the AIC and BIC values were estimated for the different combinations of p and q, and the model with the lowest values (ARIMA [2, 1, 1]) was selected for the forecast [Table 1].{Table 1}

The last step (model validation) in ARIMA model estimation is the validation of the selected model. There are many methods of validating the selected model, but the most common (goodness of fit) is to examine the ACF and PACF of the residuals of IMR. As shown in [Figure 3], the ACF and PACF of the residuals of IMR at different lags are within the confidence limits. This indicates that the selected ARIMA model is appropriate for forecasting the data under consideration.{Figure 3}

Results

Using the selected ARIMA (2, 1, 1) model, the IMR was forecast for a post-sample period of 9 years (2017–2025). The results showed a declining trend of IMR in the predicted period [Table 2] and [Figure 4]. The predicted infant mortality in India is 28/1000 live births in 2019 (19.6, 35.5) and 15/1000 live births by 2025 (1.2, 27.3).{Table 2}{Figure 4}

Discussion

According to the available literature, forecasting for a specific period is the fourth and final step [13],[14] in ARIMA modeling. There are two kinds of forecasts: the sample period forecast (for the period for which data are available) and the post-sample period forecast (for the actual forecasting period).[11] The sample period forecast gives confidence in the model, as it reveals the gap between the actual data and the forecast data. This is an indirect indicator of the accuracy of the model. The post-sample forecast generates a genuine forecast for a specific period for use in planning and intervention.
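The distinction between the two kinds of forecast can be sketched with the simplest member of the ARIMA family, a random walk with drift, that is, ARIMA (0, 1, 0) with a constant. This is deliberately not the ARIMA (2, 1, 1) model selected in this study; it is swapped in only because its forecasts can be written in a few lines. The function name `drift_forecast` and the data are hypothetical, and the interval is an approximation that ignores the uncertainty in the estimated drift.

```python
import math


def drift_forecast(series, steps, z=1.96):
    """Forecast with a random walk plus drift, i.e. ARIMA(0, 1, 0) with a constant.

    Returns (in_sample, post_sample):
      in_sample   - one-step-ahead forecasts for series[1:]
      post_sample - list of (point, lower, upper) for horizons h = 1..steps
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    n = len(diffs)
    drift = sum(diffs) / n
    # Variance of the differences around the drift (one parameter estimated).
    sigma2 = sum((d - drift) ** 2 for d in diffs) / (n - 1)

    # Sample period: each value is predicted from the previous observed value.
    in_sample = [prev + drift for prev in series[:-1]]

    # Post-sample period: extrapolate from the last observed value.
    post_sample = []
    for h in range(1, steps + 1):
        point = series[-1] + h * drift
        half_width = z * math.sqrt(h * sigma2)  # uncertainty accumulates with the horizon
        post_sample.append((point, point - half_width, point + half_width))
    return in_sample, post_sample


# Illustrative values only; not the IMR series analysed in this study.
y = [100, 97, 94, 91, 90, 87, 84, 83, 82, 79, 78, 75, 74, 73, 72, 71, 68]
fitted, future = drift_forecast(y, steps=9)

for h, (point, lower, upper) in enumerate(future, start=1):
    print(f"h={h}: {point:.1f} ({lower:.1f}, {upper:.1f})")
```

Comparing `fitted` with `y[1:]` gives the sample period gap between forecast and actual values, while `future` holds the post-sample forecast. The interval half-width grows with the square root of the horizon, so the band is wider at 9 years than at 1 year.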
The selected ARIMA (2, 1, 1) model is used in the current study, and the IMR has been forecast for both the sample and post-sample periods [Table 2] and [Figure 4]. [Figure 4] shows the forecast of IMR. The gap between the actual data (red line) and the forecast data (blue line) is visible but negligible, which gives confidence in the selected model for further forecasts. The post-sample forecast has been plotted with its 95% confidence interval. As the period of forecast increases, the confidence range becomes broader. In the current study, the post-sample forecast is for 9 years, and the confidence interval widens noticeably, from 25.8812–38.9695 to 1.19033–27.3160. There is a decreasing trend of IMR in the period of forecast (2017–2025), and the predicted IMR by 2025 is 15/1000 live births. The predicted data show that the nation will achieve the target IMR of 28/1000 live births in 2019, as fixed by the National Health Policy 2017, Government of India.[8]

Many factors directly or indirectly affect infant mortality and are therefore reflected in the rates. An analysis of these annual rates indirectly takes into account the factors affecting them. One of the limitations of the current study is that the ARIMA model does not allow these factors to be considered directly in the analysis.

Conclusion

The ARIMA model is one of the most commonly used forecasting methods because of its simplicity and its usefulness for any time series data, the only requirement being long time series data. The forecast is not 100% perfect; however, if up-to-date data are available and the best model has been selected, the accuracy of forecasting any variable is improved. In the current study, long time series IMR data were used to forecast the IMR for 9 years. The predicted data show that IMR will decline from 33/1000 live births in 2017 to 15/1000 live births by 2025.
According to the forecast, the target fixed by the National Health Policy, 2017 (28/1000 live births) for 2019 is achievable. When the data for another year (2017) are available, the model can be checked for validity, and the forecast can be made more accurate. The current study demonstrates the application of a statistical tool for forecasting an event to help in the proper planning of interventions. Similarly, other health events could be forecast so that appropriate interventions can be planned at the right time.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

References


