Shekhar's Science Blog

Tuesday, May 19, 2020

Prediction for India: Confidence in Data & Analysis for United States of America




Dr Himanshu Shekhar


Introduction: After analyzing the confirmed COVID-19 cases of 11 countries, it was found that in every one of these countries, where turnaround has been achieved, the variation of daily confirmed cases follows a normal distribution curve. The total duration of the Pandemic was between 60 and 80 days for most of the countries, and no clear dependence on population, peak value, recoveries, cases per million, etc. could be found that would explain which factors are responsible for the turnaround within a particular time frame. For India also, a simple normal distribution curve was fitted and the turnaround was predicted by extrapolation. If peak daily confirmed cases can be restricted to 6000, turnaround in India is possible by 08 June 2020. However, a concern raised by Shri Gautam Sitesh, about the use of confidence intervals when approximating data by a normal distribution, is worth exploring. The current post tries to address this concern for India. The post also analyzes confirmed cases for the United States of America, as desired by Shri Anup Kumar, one of the avid readers of the blog.
Confidence Interval: Suppose that, from a production batch of 100, 4 samples are tested and 1 is bad. It is then stated that 75% of the products are good, but the confidence level is low. If 40 samples are tested and 10 are bad, then again 75% of the products are good, but the confidence level is relatively high. In general, a confidence interval is associated with each mean and standard deviation; it indicates how realistically a sample mean represents the population mean, or how realistically a sample standard deviation represents that of the population. For the current problem of mapping actual daily confirmed cases of COVID-19 onto a normal distribution curve, the main assumption is the peak value of daily confirmed cases, which is fixed at 6000 for prediction. At most, the areas under the actual and predicted curves, up to the last available data point, may be compared to gauge the level of confidence. However, a better approach is to check the Goodness of fit, and this is carried out for the confirmed cases in India till 19.05.2020.
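The batch example above can be made concrete with a small Python sketch. It uses the Wilson score interval for a proportion, which is one common choice and not necessarily the method the reader had in mind; the function name `wilson_interval` and the 95% level (z = 1.96) are assumptions made for illustration.

```python
import math

def wilson_interval(good, n, z=1.96):
    """Wilson score interval for the fraction of good items.

    good: number of good samples, n: samples tested,
    z: normal quantile (1.96 is assumed here for a 95% interval).
    """
    p = good / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# 4 tested, 1 bad  -> 3 good;  40 tested, 10 bad -> 30 good
small = wilson_interval(3, 4)
large = wilson_interval(30, 40)
print("n = 4 :", small)
print("n = 40:", large)
```

Both samples give the same point estimate of 75% good, but the interval for 4 samples is much wider (roughly 0.30 to 0.95) than the one for 40 samples (roughly 0.60 to 0.86).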
Goodness of Fit: Although the extrapolation of data is based on experience and assumptions, and its validity has yet to be ascertained, the quality of the fit can be checked through the Pearson coefficient of the fit up to 19.05.2020, for each estimate. Assuming different peak values of daily confirmed cases, the turnaround is obtained from the normal distribution curve and the Pearson coefficient is determined for each case. All the coefficients till 19.05.2020 are above 0.985, so a fair Goodness of fit is ensured. The peak values are taken as 5000, 6000, 7000 and 8000, and the superimposed normal distribution curves are compiled.
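For readers who wish to repeat the check, the Pearson coefficient between an observed series and a fitted series can be computed as in the minimal sketch below. The function name `pearson` is an assumption, and the two series are placeholder numbers generated only to exercise the function; they are not the actual case counts used in this post.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Placeholder series for illustration only; these are NOT real case counts.
model = [100 * math.exp(-((d - 10) ** 2) / 50) for d in range(11)]
# Stand-in for "observed" values: the model perturbed by alternating +/-5%.
observed = [m * (1.05 if i % 2 == 0 else 0.95) for i, m in enumerate(model)]

r = pearson(observed, model)
print(f"Pearson coefficient: {r:.5f}")
```

A coefficient close to 1 only says that the fitted curve tracks the data available so far; it does not by itself validate the extrapolated part of the curve.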
The values of various parameters are tabulated below.

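| Set | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Mean (days) | 50 | 50 | 50 | 50 |
| Standard Deviation | 900 | 1000 | 1500 | 1800 |
| Amplitude (daily cases) | 5000 | 6000 | 7000 | 8000 |
| Offset (days) | 95 | 100 | 110 | 120 |
| Turnaround | 24-May | 29-May | 8-Jun | 18-Jun |
| Pearson Coefficient | 0.98616 | 0.98557 | 0.98783 | 0.98713 |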
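The turnaround dates in the table are consistent with taking the peak at day number offset + mean, with day 1 counted as 1 January 2020 (the same convention under which day 85 is 25 March 2020 later in this post). The short Python sketch below checks this; the helper name `day_to_date` is an assumption introduced for illustration.

```python
from datetime import date, timedelta

def day_to_date(day_number):
    """Convert a day number to a date, assuming day 1 = 1 January 2020."""
    return date(2020, 1, 1) + timedelta(days=day_number - 1)

# (offset, mean) for sets 1 to 4, copied from the table
sets = {1: (95, 50), 2: (100, 50), 3: (110, 50), 4: (120, 50)}

# Peak (turnaround) day is taken as offset + mean
turnaround = {k: day_to_date(offset + mean) for k, (offset, mean) in sets.items()}

for k, d in turnaround.items():
    print(f"Set {k}: day {sum(sets[k])} -> {d.strftime('%d %b %Y')}")
```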

An initial attempt is made here with a simple normal distribution curve, and the parameters are derived for each assumed maximum of daily confirmed cases of COVID-19 for India. The actual situation, however, may involve a sustained period of peak daily cases or a less sharp fall in the number of daily confirmed cases. Currently, more attention is focused on the turnaround. If turnaround is successfully achieved, the next stage, of reducing daily confirmed cases to less than 100, will be attempted.
United States of America: One of the avid readers and my friend, Shri Anup Kumar, suggested including the variation for America. In the previous posts, only data of those countries where the Pandemic is controlled were considered. The countries considered are New Zealand, Switzerland, Germany, Israel, Japan, South Korea, Australia, Malaysia, Greece, Croatia and Iceland. No doubt, these countries have superior healthcare facilities. They also differ in population, population density and level of literacy. These were the concerns raised by some of the readers. However, the effect of these parameters on daily confirmed cases could not be quantified or expressed by any means.
The United States of America is, like India, a large country with a large population. However, its healthcare facilities and literacy levels are better than India's. The curve of cumulative confirmed cases for the United States of America and India, on the same scale, is shown in the figure, till 19.05.2020.



Contrary to other countries, where the variation started with a slow rise, followed by a reduction in slope and then a turnaround and control, the United States of America is showing a straight-line rise without any curvature, either upward or downward. India is also shown on the same scale; its curve has an upward curvature, but the values are minuscule in comparison. The simulated straight line for the United States of America has a uniform rise of 28000 confirmed cases per day from day 85 (25 March 2020). Such a uniform rise is not shown by any other country. The daily confirmed cases for the United States of America are also analyzed using normal distribution curves.


The curve of daily confirmed cases of COVID-19 rose suddenly at the beginning to around 30000 cases and remained more or less at that level, with wide fluctuations in values. In between, it attained a maximum value of 48529 on day 117 (26 April 2020). A normal distribution curve is superimposed, but it is a combination of two normal distributions. Both have an offset of 77 days (17 March 2020). The mean for the earlier curve is 20 days, while for the later one it is 50 days. The amplitudes for simulation were 20000 and 25000, respectively. Since the rise is steep, the standard deviation for the earlier curve is smaller (i.e. 100), while for the later peak a higher standard deviation of 1200 is taken. For the simulated curve, the first peak of 32223 is observed on day 99 (08 April 2020). The second peak of 25002 is observed on day 127 (06 May 2020). The reduction in the daily number of confirmed cases to less than 100 is observed at around 210 days (28 July 2020), as per the simulated normal distribution curve.
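The simulated curve described above can be reproduced with the minimal Python sketch below. The exact formula is not stated in the post, so the form used here is an assumption: each curve is taken as amplitude × exp(−(day − offset − mean)² / s), with the quoted "standard deviation" used directly as the divisor s (which would correspond to a conventional standard deviation of √(s/2), about 7 days and 24.5 days). The function names are also assumptions. Under this reading, the sketch gives peaks matching the quoted values.

```python
import math
from datetime import date, timedelta

def bell(day, offset, mean, spread, amplitude):
    """One bell curve; 'spread' is the quoted 'standard deviation' used as a divisor (assumed form)."""
    x = day - offset - mean
    return amplitude * math.exp(-x * x / spread)

def us_daily(day):
    """Sum of the two curves with the parameters quoted in the post."""
    return bell(day, 77, 20, 100, 20000) + bell(day, 77, 50, 1200, 25000)

def day_to_date(day):
    """Day 1 is taken as 1 January 2020."""
    return date(2020, 1, 1) + timedelta(days=day - 1)

values = {d: us_daily(d) for d in range(77, 260)}

# Search windows around each expected peak (window limits chosen for illustration)
first_peak = max(range(77, 110), key=values.get)
second_peak = max(range(115, 140), key=values.get)
# First day after the second peak with fewer than 100 simulated cases
below_100 = next(d for d in range(second_peak, 260) if values[d] < 100)

for label, d in [("first peak", first_peak), ("second peak", second_peak), ("below 100", below_100)]:
    print(f"{label}: day {d} ({day_to_date(d)}), {values[d]:.0f} cases")
```

With these assumptions, the peaks fall on day 99 and day 127, and the simulated daily cases drop below 100 within a day of day 210.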
For India, a simple normal distribution curve is fitted and the Pearson coefficient is used to confirm the Goodness of fit. However, the situation seen in the United States of America can be repeated in India: the curve may remain at its peak for a prolonged duration, which would require a change in the normal distribution fitting strategy. A combination of two normal distribution curves may then be needed to simulate the peak for India, and this will delay complete control over the Pandemic, which is taken to be achieved when the daily number of confirmed cases is less than 100.
Conclusion: Efforts are made to understand the behaviour of the Pandemic in different countries of the world, with the aim of understanding its nature in India. The condition of India is analyzed for different assumed values of peak daily confirmed cases. The turnaround is estimated assuming a simple normal distribution, which may differ from reality. However, the Pearson coefficient for each normal distribution fit lies between 0.985 and 0.988, which indicates that the chosen parameters match the actual data well and that the extrapolations may be correct. In this post, the data for confirmed cases in the United States of America are also analyzed and simulated by a combination of two normal distribution curves with different means, standard deviations and amplitudes. The simulation indicates that, in the United States of America, daily confirmed cases may not fall below 100 until as late as 28 July 2020.

4 comments:

  1. Thanks for your constant concern in this terrifying moment. But I would request you to please elaborate the conclusion paragraph with respect to India. Also, the threshold of daily confirmed cases for treating the pandemic as ended may be taken as at least 500 cases instead of 100, because India is a great country with a population of 137 crore and an unpredictable, mass-level displacement of labour. Thanks a lot.

  2. I fully appreciate your concern about India. I am deliberately not touching upon the reasons leading to the rise in confirmed cases. I am restricting myself to the mathematical treatment of the daily confirmed cases. A value of less than 500 daily confirmed cases may be considered as control, but 500 is a large number of cases and may lead to higher numbers as a second surge. That is why I am deliberately taking a lower number. However, as mentioned in the post, the current concern is more about turnaround, and the reduction pattern will be attempted when that stage comes. Thanks for constantly motivating me and giving suggestions and ideas. Regards.

  3. In 3 hours, this post of the blog attained 100 views. Wow! Thank you all for motivation.

  4. The second option in the table, for peak 6000 with turnaround on 29.05.2020, appears to be holding true as of today (27.05.2020). I hope that no upsurge is seen in India next Monday.