Details of the IEBC Algorithm That Amollo Otiende Presented During the NASA Petition Hearing
Yesterday Amollo Otiende presented a graphical linear regression formula of the form Y = a + bX. Here Y is the dependent variable, representing Uhuru’s cumulative votes at any particular point in time during results transmission; X is the independent variable, in this case Raila’s cumulative votes; a is the Y intercept, representing the votes Uhuru would have had at the hypothetical point where Raila’s votes were zero; and b is the slope, the number of additional Uhuru votes for each additional Raila vote. The values of a and b in the equation were 183,546 and 1.2045 respectively, giving Y = 1.2045X + 183,546. The R² statistic for this equation is 0.9977. R² indicates how well the data fit a regression line, otherwise known as the line of best fit: an R² close to 1 indicates that the data lie almost exactly on the line, whereas an R² close to zero indicates that a straight line explains almost none of the variation in the data.
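For reference, these are the standard ordinary least-squares (OLS) formulas for the coefficients of a line Y = a + bX fitted to n data points (X_i, Y_i), and for R². That the line presented in court was computed exactly this way is an assumption; OLS is simply the usual method behind a line of best fit.

```latex
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}

\hat{Y}_i = a + b X_i,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}
```

Here \bar{X} and \bar{Y} are the averages of Raila’s and Uhuru’s cumulative votes over the n reporting times, and \hat{Y}_i is the value the line predicts for Uhuru at reporting time i.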
This formula, if genuine, matters because two or more unrelated data series are not expected to fall on a straight line when plotted against each other. If they do fall on a straight line, as was shown yesterday in the Supreme Court, then knowing any value of one variable lets you calculate the corresponding value of the other using a straight-line formula such as the linear regression formula Amollo Otiende presented.
Data produced by random events can be arranged in such a way that they fall on a straight line, but such an arrangement removes the randomness inherent in the data. If the data fall on a straight line by themselves, without anyone forcing them to, then they cannot be said to have come from unrelated phenomena. That is, two unrelated variables that readily produce a straight line, or another strongly correlated shape such as a curve, must have an underlying phenomenon dictating the values the variables take under given conditions. In the case of election results streaming in from random data points, that underlying phenomenon could be an algorithm manipulating the transmission of the results as they arrive.
To test the foregoing reasoning, I retrieved all the Presidential results posted by various media houses on Twitter between 17:40 hours on August 8th 2017 and 02:09 hours on August 10th 2017. In total I collected 21 data points, tabulated them, and generated a line of best fit, as shown in the tables and figures that follow.
Table 1 above shows the cumulative votes as reported at various points in time, from when the presidential results first started streaming in up to the point when the media houses stopped publishing new results on Twitter. Anyone can check these data points independently by going through the tweets of @KBCChannel1, @CitizenTVKenya and @KTNNews published between the start and stop times indicated in the table.
When the formula Y = 1.2045x + 183546 was applied to the data above, where x represents the votes Raila had garnered at any point in time, the predicted votes for Uhuru deviated from his actual votes by more than a hundred thousand votes at some points. On average, however, the deviation comes to 57,096 votes, which is only 0.07% of the average actual votes Uhuru had garnered over the reported period. The formula can therefore be said to fit the data closely.
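The calculation described above can be sketched in Python as follows. This is a minimal illustration, not the author’s own spreadsheet: the function names are hypothetical, the deviation is taken as F. Expected minus actual (the article does not state its sign convention), and the numbers in the usage lines are made up because Table 1 is not reproduced in this text.

```python
from statistics import mean

# Coefficients of the formula presented in court: Y = 1.2045x + 183546
OTIENDE_SLOPE = 1.2045
OTIENDE_INTERCEPT = 183_546


def expected_and_deviation(raila, uhuru, slope=OTIENDE_SLOPE, intercept=OTIENDE_INTERCEPT):
    """Predict Uhuru's cumulative votes from Raila's and compare with the actual counts.

    Deviation is defined here (an assumption) as expected minus actual.
    """
    if len(raila) != len(uhuru):
        raise ValueError("raila and uhuru must have the same number of data points")
    expected = [slope * x + intercept for x in raila]
    deviation = [e - y for e, y in zip(expected, uhuru)]
    return expected, deviation


def summarize(deviation, actual):
    """Mean signed, mean absolute and largest absolute deviation, plus the mean
    absolute deviation as a percentage of the mean actual vote count."""
    mean_abs = mean(abs(d) for d in deviation)
    return {
        "mean_signed": mean(deviation),
        "mean_abs": mean_abs,
        "max_abs": max(abs(d) for d in deviation),
        "pct_of_mean_actual": 100 * mean_abs / mean(actual),
    }


# Illustrative (made-up) cumulative counts at three reporting times.
raila = [100_000, 1_000_000, 5_000_000]
uhuru = [300_000, 1_400_000, 6_200_000]
expected, deviation = expected_and_deviation(raila, uhuru)
print(summarize(deviation, uhuru))
```

Replacing the made-up lists with the rows of Table 1 should approximate the comparison in the text, subject to the sign and averaging conventions assumed here; passing slope=1.21612 and intercept=61_285 evaluates the refined formula instead.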
Although the formula showed a reasonable level of accuracy, the formula Amollo Otiende presented was derived from only 10 data points, so its coefficients may be imprecise. Increasing the data points from 10 to 21 should tighten the formula, and, as expected, a new formula was obtained when the above data were plotted on a scatter graph and a line of best fit drawn. The new formula is Y = 1.21612x + 61,285, with an R² of 0.99932, closer to 1 than before, as illustrated in figure 1 below.
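A line of best fit and its R² can be computed directly with ordinary least squares. The sketch below is a plain-Python illustration that assumes the refit used standard OLS (the usual method behind a linear trendline); `fit_line` is a hypothetical name and the sample data are made up.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x, returning
    (slope, intercept, r_squared)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all equal; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y values are all equal; R^2 is undefined")
    return slope, intercept, 1 - ss_res / ss_tot


print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0, 1.0)
```

Given Raila’s and Uhuru’s cumulative counts from the 21 data points, this function should return the slope, intercept and R² reported above, on the assumption that those were ordinary least-squares values.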
When this formula was plugged back into the data table that generated it, the formula-expected (F. Expected) votes for Uhuru Kenyatta and the corresponding deviations were as shown in Table 2 below.
The new formula Y = 1.21612x + 61,285 generates F. Expected votes that deviate from the actual votes by an average of -15,175 votes (15,175 in magnitude), which is about 3.76 times smaller than the average deviation of the formula presented by Amollo Otiende at the Supreme Court. The new average deviation is only 0.02% of the average actual votes garnered by Uhuru over time. Although the formula presented by Amollo Otiende passes the test of accuracy, had the NASA team been more thorough they could have generated an even tighter formula, as demonstrated above.
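Two pieces of arithmetic sit behind this comparison. The first is the ratio of the two average deviations, which gives the 3.76 quoted above. The second is a general property of least-squares lines with an intercept, worth keeping in mind when averaging deviations: on the data used to fit the line, the signed deviations sum to zero because a = \bar{Y} - b\bar{X}, so the magnitude (mean absolute deviation) is the more informative average for judging a fitted line.

```latex
\frac{57{,}096}{15{,}175} \approx 3.76

\sum_{i=1}^{n} \bigl(Y_i - a - bX_i\bigr)
  = n\bar{Y} - na - nb\bar{X}
  = n\bigl(\bar{Y} - a - b\bar{X}\bigr)
  = 0
```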
As I explained at the start of this article, if the results had streamed in randomly and independently, without any manipulation, the votes Uhuru and Raila had cumulatively garnered at various points in time should not have fitted so neatly onto a straight line. To test this expectation, I assigned both Uhuru and Raila random vote totals at the exact time points recorded in Tables 1 and 2 above, but retained the first and last totals actually reported for the two candidates. This means the randomised results I gave the two candidates could, in theory, have occurred.
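The article does not say exactly how the random totals were drawn. One possible procedure, assumed here purely for illustration, is to draw the intermediate totals uniformly between the actual first and last reported values and sort them, so that each candidate’s cumulative count never decreases. `randomized_cumulative` is a hypothetical helper that would be called once per candidate.

```python
import random


def randomized_cumulative(first, last, n_points, seed=None):
    """Hypothetical sketch: n_points cumulative vote totals that keep the actually
    reported first and last values, with uniformly random totals in between.

    The middle values are sorted so the series never decreases, as a cumulative
    count must not.
    """
    if n_points < 2:
        raise ValueError("need at least the first and last data points")
    if last < first:
        raise ValueError("a cumulative total cannot decrease")
    rng = random.Random(seed)  # separate generator so runs are repeatable with a seed
    middle = sorted(rng.randint(first, last) for _ in range(n_points - 2))
    return [first, *middle, last]


# Made-up endpoints, 21 reporting times, fixed seed for repeatability.
print(randomized_cumulative(50_000, 8_000_000, 21, seed=2017))
```

Because each candidate’s series is drawn separately, nothing ties one candidate’s count at a given reporting time to the other’s.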
I then plotted these values on a scatter graph and generated a line of best fit, as shown in figure 2 below.
Notice from figure 2 how several data points fall well away from the line of best fit. These random data points also produce a linear regression formula, but with a lower R² of 0.9862, indicating that the two variables fit a straight line less closely. The regression formula Y = 1.1388x + 29,534, generated from these randomised data, gives the F. Expected votes for Uhuru and the resulting deviations shown in Table 3 below.
Take a closer look at the deviation column. The deviations have now widened: in three instances the F. Expected votes differ from Uhuru’s actual votes by more than 1 million votes. By contrast, the largest deviations produced by Amollo Otiende’s formula and by my refined formula were no more than 150,700 votes. The average deviation in this randomised table is in the range of 400,000 votes, at least 26 times larger than the average deviation actually recorded during the transmission of the 2017 Presidential results. A random deviation could therefore have been of the order of 0.45% of the average actual votes, well above the meagre 0.02% that was actually recorded.
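Taking the randomised average deviation as roughly 400,000 votes, as stated, the “26 times” figure follows from dividing it by the average deviation of the refined formula:

```latex
\frac{400{,}000}{15{,}175} \approx 26.4
```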
In conclusion, the Presidential results clearly did not stream in randomly: plotting Raila’s votes against Uhuru’s votes as reported at various points between 17:40 hours on August 8th 2017 and 02:09 hours on August 10th 2017 produces an almost perfect fit to a line of best fit. The strong correlation has been alleged to be caused by an algorithm implanted in IEBC servers, an allegation that NASA may or may not prove once the system audit is finalized.