Prediction for India: Confidence in Data & Analysis for United States of America
Dr Himanshu Shekhar
Introduction:
An analysis of confirmed COVID-19 cases in 11 countries showed that,
in every country where a turnaround has been achieved, the variation
of daily confirmed cases follows a normal distribution curve. The
total duration of the Pandemic was 60-80 days for most of these
countries. No clear dependence on population, peak values, recovery,
cases per million, etc. could be found, so the factors responsible for
a turnaround within a particular time frame remain unidentified. For
India, too, a simple normal distribution curve was fitted and the
turnaround was predicted by extrapolation. If peak daily confirmed
cases can be restricted to 6000, a turnaround in India is possible by
08 June 2020. However, a concern raised by Shri Gautam Sitesh, about
the use of confidence intervals in approximating normal data
distributions, is worth exploring. This post tries to address that
concern for India. It also analyzes confirmed cases for the United
States of America, as requested by Shri Anup Kumar, one of the avid
readers of the blog.
Confidence Interval:
Suppose that, from a production batch of 100, 4 samples are tested
and 1 is bad. It can then be stated that 75% of the products are good,
but the confidence level is low. If 40 samples are tested and 10 are
bad, again 75% of the products are good, but the confidence level is
relatively high. In general, a confidence interval is associated with
each mean and standard deviation; it indicates how realistically a
sample mean represents the population mean, or how realistically a
sample standard deviation represents that of the population. For the
current problem of matching actual daily confirmed cases of COVID-19
to a normal distribution curve, the main assumption is the peak value
of daily confirmed cases, which is fixed at 6000 for prediction. At
most, the area under the curve up to the last available date can be
compared between actuals and predictions to judge the level of
confidence. A better approach, however, is to check the Goodness of
fit, and this is carried out for the confirmed cases in India till
19.05.2020.
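The 4-sample versus 40-sample comparison can be made concrete with a rough calculation. The sketch below uses the Wilson score interval at an approximate 95% level; this choice of interval is an assumption made for illustration (the post does not name a method), it ignores the finite batch size of 100, and `wilson_interval` is a hypothetical helper name.

```python
import math

def wilson_interval(good, tested, z=1.96):
    """Approximate 95% Wilson score interval for the proportion good/tested.

    Assumption: samples are treated as drawn from a very large batch,
    so no finite-population correction is applied.
    """
    p = good / tested
    denom = 1 + z * z / tested
    centre = (p + z * z / (2 * tested)) / denom
    half = z * math.sqrt(p * (1 - p) / tested + z * z / (4 * tested * tested)) / denom
    return centre - half, centre + half

small = wilson_interval(3, 4)    # 4 tested, 1 bad  -> 75% good
large = wilson_interval(30, 40)  # 40 tested, 10 bad -> 75% good
print("4 samples :", small)
print("40 samples:", large)
```

Both cases estimate 75% good, but under these assumptions the interval for 4 samples spans roughly 0.30 to 0.95, while for 40 samples it narrows to roughly 0.60 to 0.86.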
Goodness of Fit:
The extrapolation is based on experience and assumptions, and its
validity has yet to be ascertained. The quality of the fit to the data
available so far, however, can be checked through the Pearson
coefficient of the fit up to 19.05.2020, for each estimate. Assuming
different peak values of daily confirmed cases, the turnaround is
ascertained using a normal distribution curve, and the Pearson
coefficient is determined for each case. All the coefficients till
19.05.2020 are above 0.985, so a fair Goodness of fit is ensured. The
peak values are taken as 5000, 6000, 7000 and 8000, and the
superimposed normal distribution curves are compiled.
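For readers who want to repeat the check, the Pearson coefficient between an observed series and a fitted curve can be computed as sketched below. The two series here are synthetic placeholders, not real case counts, and `pearson_r` is an illustrative helper name rather than anything used in the post.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic placeholder data (NOT real COVID-19 counts):
# a rising bell-shaped "model" and an "observed" series that wobbles around it.
model = [100 * math.exp(-((d - 20) ** 2) / 150) for d in range(20)]
observed = [m * (1 + 0.1 * math.sin(d)) for d, m in enumerate(model)]

r = pearson_r(observed, model)
print("Pearson coefficient:", r)
```

A value close to 1 only says that the fitted curve tracks the data seen so far; it says nothing about how well the extrapolated part of the curve will hold.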
The values of various parameters are tabulated below.

| Set | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Mean | 50 | 50 | 50 | 50 |
| Standard Deviation | 900 | 1000 | 1500 | 1800 |
| Amplitude | 5000 | 6000 | 7000 | 8000 |
| Offset | 95 | 100 | 110 | 120 |
| Turnaround | 24-May | 29-May | 8-Jun | 18-Jun |
| Pearson Coefficient | 0.98616 | 0.98557 | 0.98783 | 0.98713 |
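The Turnaround dates appear to follow directly from the Offset and Mean values: adding the two gives a day number that, counted with 1 January 2020 as day 1, matches each tabulated date. That day-numbering convention is an assumption here, inferred from the day/date pairs quoted for the United States of America later in the post (for example, day 77 = 17 March 2020). `day_to_date` and `sets` are illustrative names.

```python
from datetime import date, timedelta

def day_to_date(day_number):
    """Convert a day number to a calendar date, assuming day 1 = 1 Jan 2020."""
    return date(2019, 12, 31) + timedelta(days=day_number)

# (offset, mean) for each set, taken from the table of parameters
sets = {1: (95, 50), 2: (100, 50), 3: (110, 50), 4: (120, 50)}

# Assumed reading: the curve peaks (turnaround) at offset + mean
turnaround = {k: day_to_date(offset + mean) for k, (offset, mean) in sets.items()}

for k, d in turnaround.items():
    print("Set", k, "turnaround:", d.strftime("%d-%b"))
```

Under this reading, each 10-day shift in the offset moves the turnaround by the same 10 days, since the mean is held at 50 in all four sets.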
An initial attempt has been made with a simple normal distribution
curve, and the parameters are derived for each assumed maximum of
daily confirmed cases of COVID-19 in India. The actual situation,
however, may involve a sustained period of peak daily cases or a less
sharp fall in the number of daily confirmed cases. For now, attention
is focused on the turnaround. If the turnaround is successfully
achieved, the next stage, reducing daily confirmed cases to fewer than
100, will be attempted.
United States of America:
Shri Anup Kumar, my friend and one of the avid readers, suggested
including the variation for America. Previous posts considered data
only for countries where the Pandemic has been controlled: New
Zealand, Switzerland, Germany, Israel, Japan, South Korea, Australia,
Malaysia, Greece, Croatia and Iceland. No doubt, these countries have
superior healthcare facilities. They also differ in population,
population density and level of literacy; these were the concerns
raised by some of the readers. However, the effect of these
parameters on daily confirmed cases could not be quantified by any
means.
Like India, the United States of America is a large and populous
country, although its healthcare facilities and literacy levels are
better than India's. The curves of cumulative confirmed cases for the
United States of America and India till 19.05.2020 are shown on the
same scale in the figure.
The curve of daily confirmed COVID-19 cases showed a sudden early
rise to around 30000 cases and then remained more or less at that
level, with wide fluctuations. In between, it attained a maximum
value of 48529 on day 117 (26 April 2020). The superimposed curve is
a combination of two normal distributions. Both have an offset of 77
days (17 March 2020). The mean for the earlier curve is 20 days,
while for the later one it is 50 days. The amplitudes for the
simulation were 20000 and 25000, respectively. Since the rise is
steep, the standard deviation for the earlier curve is smaller (i.e.
100), while for the later peak a higher standard deviation of 1200 is
taken. For the simulated curve, the first peak of 32223 is observed
on day 99 (08 April 2020) and the second peak of 25002 on day 127 (06
May 2020). As per the simulated curve, daily confirmed cases fall
below 100 in around 210 days (28 July 2020).
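The post does not state the formula behind the simulated curve. The sketch below assumes each component has the form amplitude * exp(-(t - mean)^2 / spread), where t is the number of days since the offset and the quoted "standard deviation" is used directly as the divisor `spread`. This form is an inference, not the author's stated method; it is chosen because it reproduces the peak values and days quoted above. `component` and `simulated` are illustrative names.

```python
import math

OFFSET = 77  # day number of 17 March 2020, as quoted in the post

def component(day, amplitude, mean, spread):
    """One bell-shaped component (assumed form, see lead-in)."""
    t = day - OFFSET
    return amplitude * math.exp(-((t - mean) ** 2) / spread)

def simulated(day):
    """Sum of the two components with the parameters quoted in the post."""
    return component(day, 20000, 20, 100) + component(day, 25000, 50, 1200)

print("Day 99 :", simulated(99))   # post quotes a first peak of 32223
print("Day 127:", simulated(127))  # post quotes a second peak of 25002

# First day after the second peak on which the curve is below 100 cases
first_below_100 = next(d for d in range(128, 400) if simulated(d) < 100)
print("First day below 100:", first_below_100)
```

Under these assumptions the sum evaluates to about 32223.5 on day 99 and about 25002.5 on day 127, and first drops below 100 at about day 209, in line with the figures quoted above.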
For India, a simple normal distribution curve is fitted and the
Pearson coefficient is ascertained to confirm the Goodness of fit.
However, the situation seen in the United States of America may be
repeated in India: the curve may remain at its peak for a prolonged
duration, requiring a change in the fitting strategy. A combination
of two normal distribution curves may then be needed to simulate the
peak for India, and this would delay complete control over the
Pandemic, defined here as fewer than 100 daily confirmed cases.
Conclusion:
Efforts have been made to understand the behaviour of the Pandemic in
different countries of the world, with the aim of understanding its
nature in India. The situation in India is analyzed for different
assumed peak values of daily confirmed cases. The turnaround is
predicted assuming a simple normal distribution, which may differ
from reality. However, the Pearson coefficient for each normal
distribution fit is above 0.985, which indicates that the chosen
parameters match the actual data well and the extrapolations may be
correct. In this post, data for confirmed cases in the United States
of America are also analyzed and simulated by a combination of two
normal distribution curves with different means, standard deviations
and amplitudes. The simulation indicates that, in the United States
of America, reaching fewer than 100 daily cases may take until as
late as 28 July 2020.



Thanks for your constant concern in this terrific moment. But I would request you to please elaborate the conclusion paragraph with respect to India. Also, the daily confirmed cases at which the pandemic is assumed to have ended may be taken as a minimum of 500 instead of 100, because India is a great country with a population of 137 crore and an unpredictable displacement of labour on a mass level. Thanks a lot.
I fully appreciate your concern about India. I am deliberately not touching upon the reasons leading to the rise in confirmed cases. I am restricting myself to the mathematical treatment of the daily confirmed cases. A value of less than 500 daily confirmed cases may be considered as control, but 500 is a large number, which could lead to higher numbers as a second surge. That is why I am deliberately taking a lower number. However, as mentioned in the post, the current concern is more about the turnaround, and the reduction pattern will be attempted when that stage comes. Thanks for constantly motivating me and giving suggestions and ideas. Regards.
In 3 hours, this post of the blog attained 100 views. Wow! Thank you all for the motivation.
The second option of the table for turnaround, for peak 6000, on 29.05.2020 may be looking true on 27.05.2020 (today). Hope that no upsurge is seen in India next Monday.