Monday 25 March 2019

Is the Median Likely to be Correct?



Let's take fifteen items. The following are the likelihoods of the items being correct overall, or of one or two lifespans being in deficit or in excess:

overall correct = 0.53250254490066198554120957851409
one lifespan in deficit = 0.19457948503687863317008595913649
one lifespan in excess = 0.19457948503687863317008595913649
two lifespans in deficit = 0.03491280835932598368558101356028
two lifespans in excess = 0.03491280835932598368558101356028


And the following are the likelihoods of each of minimum, lower quartile, median, higher quartile and maximum being correct:

minimum = 0.92284623643389001647747623232692248
lower quartile = 0.90567961833184941798449959605929867
median = 0.92247223095350467350482677091964878
higher quartile = 0.90567961833184941798449959605929867
maximum = 0.92284623643389001647747623232692248


For each of these, I added together

all correct + with one lifespan in excess (calculated ratios) + with one lifespan in deficit (mirrored) + with two lifespans in excess (calculated ratios) + with two lifespans in deficit (mirrored)


As to all correct, that needs no explanation.

As to one or two lifespans in excess, with calculated ratios, I'm coming to that. As to one or two lifespans in deficit, mirrored: this means the calculated ratios were applied the other way round, with the median mapping to the median, minimum and maximum swapped, and the quartiles swapped. I was too lazy to do an extra calculation for one or two lifespans deficient in years.

For all correct, each of the above got one full share of the known 53.25 %. For one in deficit or excess, each ratio was multiplied by the likelihood of one in excess or deficit, 19.46 % in 15 items, as we know, starting from the assumption that deficit and excess are equally likely at 2.5 % each for each item, and that correctness is 95 % probable. And similarly, the ratios for two in excess or deficit were multiplied by the 3.49 % likelihood.
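As a rough cross-check of those three percentages, here is a minimal sketch in Python. It rests on a reading of the figures that the text does not spell out: that "overall correct" means excesses and deficits cancel out (net balance zero), and "one in excess" means a net balance of +1, and so on. The function name and the dictionary are my own labels, not anything from the earlier posts.

```python
from collections import defaultdict

# Per-item assumption taken from the text:
# 95 % correct (0), 2.5 % too long (+1), 2.5 % too short (-1).
P_ITEM = {-1: 0.025, 0: 0.95, +1: 0.025}

def net_balance_distribution(n_items, p_item=P_ITEM):
    """Probability of each net balance (items in excess minus items in deficit).

    ASSUMPTION: items err independently of each other.
    """
    dist = {0: 1.0}
    for _ in range(n_items):
        nxt = defaultdict(float)
        for net, p in dist.items():
            for step, q in p_item.items():
                nxt[net + step] += p * q
        dist = dict(nxt)
    return dist

dist15 = net_balance_distribution(15)
for net in (0, 1, -1, 2, -2):
    print(f"net balance {net:+d}: {dist15[net]:.8f}")
```

Under that reading, the values for net balance 0, ±1 and ±2 should land close to the 53.25 %, 19.46 % and 3.49 % used here, which is why I think the figures were built that way.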

Now, how did I calculate the ratios?

I decided on an empirical method, starting with 15 theoretically possible lifespans, chosen semi-randomly.

The correct values for these are 30 at the minimum, then 35, 35, then 40 and 42 around the lower quartile, then 45, 50, then at the median you have 52, then 55, 56, then around the higher quartile you have 60 and 62, then 70, 71, and last, at the maximum, you have 73.

I listed them in a scrambled order rather than in numeric order, so I had to re-sort them by magnitude each time. That way I avoided starting from an order I was already OK with and possibly forgetting to re-sort. I then added up the outcomes.

For faults, I used only excess, and then for deficit, I mirrored the values.

One fault starts with 60 (the first in the "preliminary order") becoming 65 and ends with 45 becoming 50.

Two faults start with 60 and 62 becoming 65 and 67, then 60, 62, 30 becoming 65, 62, 35, and so on.
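The counting procedure just described can be sketched in Python. This is an illustration under assumptions, not the hand count itself: every fault adds exactly 5 years (as in the examples 60 to 65 and 45 to 50), a statistic counts as "correct" when its value is unchanged after re-sorting, and the quartiles are taken as the pairs at positions 4 and 5 and at positions 11 and 12. The function names are hypothetical.

```python
from itertools import combinations

# The fifteen "correct" lifespans from the text, in numeric order.
LIFESPANS = [30, 35, 35, 40, 42, 45, 50, 52, 55, 56, 60, 62, 70, 71, 73]

def summary(values):
    """Five summary statistics for exactly 15 values (positions 1, 4-5, 8, 11-12, 15)."""
    s = sorted(values)
    return {
        "minimum": s[0],
        "lower quartile": (s[3], s[4]),
        "median": s[7],
        "higher quartile": (s[10], s[11]),
        "maximum": s[14],
    }

def unchanged_counts(values, n_faults, shift=5):
    """Count, over every choice of n_faults faulty items, how often each
    statistic keeps its correct value. ASSUMPTION: each fault is +shift years."""
    reference = summary(values)
    counts = dict.fromkeys(reference, 0)
    total = 0
    for faulty in combinations(range(len(values)), n_faults):
        bumped = list(values)
        for i in faulty:
            bumped[i] += shift
        result = summary(bumped)
        for name in counts:
            counts[name] += result[name] == reference[name]
        total += 1
    return counts, total

counts1, total1 = unchanged_counts(LIFESPANS, 1)
for name, hits in counts1.items():
    print(f"{name}: {hits}/{total1}")
```

Calling `unchanged_counts(LIFESPANS, 2)` runs the same check over all 105 pairs of faulty items, which is tedious to do by hand.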

The ratios for getting correct values were, as far as I could count them, for one in excess: 14/15 for the minimum, 13/15 for each of lower quartile and median, 12/15 for each of higher quartile and maximum. I presumed the mirrored ratios would be roughly valid for one in deficit. For two in excess, where the same process of mirroring applies, the ratios were (and I made some counting mistakes): 92/105 for the minimum, 81/105 for the lower quartile, 80/106 for the median (sic, my bad), 66/105 for the higher quartile and 64/104 for the maximum (sic, my bad).
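To make the addition concrete, here is a small Python sketch of the weighted sum, using only the likelihoods and ratios quoted in this post. The mirroring table and the function name are my own way of writing it down, so treat them as an assumption about how the sum was put together.

```python
# Likelihoods for 15 items, as listed above.
P_CORRECT = 0.53250254490066198554
P_ONE = 0.19457948503687863317   # one lifespan in excess (same for deficit)
P_TWO = 0.03491280835932598368   # two lifespans in excess (same for deficit)

# Ratios of "statistic still correct", as counted in the text (miscounts included).
ONE_EXCESS = {"minimum": 14 / 15, "lower quartile": 13 / 15, "median": 13 / 15,
              "higher quartile": 12 / 15, "maximum": 12 / 15}
TWO_EXCESS = {"minimum": 92 / 105, "lower quartile": 81 / 105, "median": 80 / 106,
              "higher quartile": 66 / 105, "maximum": 64 / 104}

# Mirroring: the deficit case reuses the excess ratio of the opposite statistic.
MIRROR = {"minimum": "maximum", "lower quartile": "higher quartile",
          "median": "median", "higher quartile": "lower quartile",
          "maximum": "minimum"}

def likelihood_correct(stat):
    """All correct + one in excess + one in deficit (mirrored)
    + two in excess + two in deficit (mirrored)."""
    other = MIRROR[stat]
    return (P_CORRECT
            + P_ONE * (ONE_EXCESS[stat] + ONE_EXCESS[other])
            + P_TWO * (TWO_EXCESS[stat] + TWO_EXCESS[other]))

for stat in MIRROR:
    print(f"{stat}: {likelihood_correct(stat):.8f}")
```

With these inputs the median comes out at about 0.92247 and the quartiles at about 0.90568, in line with the list above. For minimum and maximum the same sum gives about 0.9218, a little under the 0.9228 listed, so a slightly different count may have gone into that figure.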

So, in getting correct median values, it is not just a matter of whether none, one or two values are off, but also of how likely an error is to be reflected in the median, and that is less likely than the error itself.

This means that when I took medians over series of Wikipedian biographies, with 95 % correctness per item, a correct median in 15 items is about 92.25 % likely.

And while with 30 items (I might be back on that one tomorrow, today is a holiday) the likelihood of none or one in excess or deficit sinks and that of two in excess or deficit rises, the median, quartiles and extremes make up a smaller part of the spectrum, so errors are less likely to be reflected in them.

For 15 items, you have unitary extremes and middle, at values number 1, 8 and 15, and binary quartiles, at 4 and 5 around the lower, at 11 and 12 around the higher. This means 7 of the 15 values are positions where an error can show up in the summary statistics.

For 30 items, you have unitary extremes and quartiles, at values number 1, 8, 23 and 30, and a binary median with values 15 and 16 around it.

Now, 6 values in 30 is clearly a smaller share than 7 values in 15, so the median, quartiles and extremes for 30 values should be even less likely to be wrong than for 15.

As I have already done several statistics from Wikipedian lifespans, I can tell you a median of 52 is really on the lower side for premodern times. On the other hand, this is in contexts with either ancestor bias or professional bias. You don't become an ancestor of anyone if you die at 5, nor do you become king or painter if you die at 5. This means that both types of bias exclude infant and child mortality. The medians are not the age you could expect to live to when you were born (and in fact you were hardly calculating life expectancy at that point, even if your parents were); they are more like what you could reasonably expect to live to if you had already married and had children (in some cases a wife of 12 died at or just after giving birth; even so, the medians are definitely above the "45" often cited) or if you were already active in a profession. In cases where I have instead calculated for "children of", the overall lifespans, reflecting life expectancy at birth, are clearly somewhat lower, perhaps 45 or so.

Hans Georg Lundahl
Nanterre
Annunciation of Our Lady
25.III.2019
