>Statistical thinking & the ability to read and write

>Statistics are hugely underused, and even when they are used, they are often used improperly. Much of the bolstering of weak arguments, miscommunication and ad-hoc ideology creation performed by individuals and organisations from the industrial revolution onwards is down to poor use of statistics.

Entire industries are devoted to generating statistics, duplicating monstrous amounts of effort. Yet when those statistics are referred to (e.g. in sales pitches, organisational reports, building/product tolerances, war crimes tribunals, political and environmental manifestos, news articles and business plans), they are so heavily caveated as to be meaningless. This contrasts with the relatively deterministic, transparent and auditable approach organisations use to produce business intelligence (BI) to support their own decisions. Coupling this BI with governmental statistics is often necessary for the best decision-making support, so commercial BI is itself hamstrung by the state of statistics.

There are two key reasons for this:

1) Statistics are hard to find. If you are looking for, say, the number of people who currently work in London (a simple enough figure that many service organisations would need to know), you will find this close to impossible. The UK Office for National Statistics (ONS) does not have this on its main site. Nor does the Greater London Authority or the newly released UK government linked data site. After you have wasted maybe twenty minutes of your time, you will be reduced to searching for “how many people work in London” and then trawling through answers others have given when that same question has been asked. You will receive answers, but many will come from small organisations or individuals that do not quote their sources. In the worst case, you may not even find these, and will instead use an unofficial figure for the whole of the UK which you have had to factor down to make sense just for London. If you search hard enough you will find what you are looking for at an ONS micro-site (completely different URL to ONS), but this data is over five years old.

2) Statistics have a poor image. The blame for this may, in part, be laid on the famous quote attributed to Disraeli, “Lies, damned lies and statistics”, which set generations of professionals to thinking they were akin to a practical yet modish Victorian politician when they disregarded statistics and cocked a snook at the establishment in favour of their own experience. Showing that you are practical with maverick tendencies while (by overtly disregarding information that may cast doubt on your decision-making) shoring yourself up against failure is a powerful incentive. By contrast, other famous statistical quotes have been forgotten (“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write” [H.G. Wells]). Short-sighted governmental data integration, hugely delayed and over-budget government data-centric projects such as the UK National Health Service’s records system, confusion over key statistics (e.g. the number of asylum seekers) and high-profile data losses haven’t helped matters since.

There is also a bit of a myth that statistical interpretation is an art, that the general public can be confused or even sent entirely the wrong message by engaging with statistics, and that only statisticians can work with this data. Certainly there is this side to statistical analysis (basically anything involving probability, subsets, distributions and meta-statistics), but for the most part both the general public and organisations are crying out for basic statistical information (the answer to one question without qualifiers such as where/if) that is, quite simply: on a website (we can all just about manage now, thanks), produced or sponsored by the Government (we need a basic level of trust) and stamped with a creation date (we need to know if it’s old). If statistics are estimates, we need to know that, and any proportions need to indicate the sample size. We need this because we are now sophisticated enough to know that “8/10 owners said their cats preferred it” has less impact if we are talking about a dataset of ten cats rather than 10K cats (we don’t really need the proportions themselves, though, since we’re capable of working out proportions ourselves).
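To make the cats point concrete, here is a minimal sketch of why the sample size matters. It assumes the Wilson score interval as the way of putting a rough 95% range around a proportion; that choice, the function name and the two sample sizes are illustrative assumptions rather than anything a statistics body prescribes.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96 assumed)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# The same "8 out of 10" headline, from ten cats and from ten thousand cats.
for successes, n in [(8, 10), (8000, 10000)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: plausible range {low:.0%} to {high:.0%}")
```

With ten cats the plausible range runs from roughly half to over nine in ten; with ten thousand it narrows to within about a percentage point either side of 80%. That is why a published proportion should always carry its sample size.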

We don’t want graphs since the scale can be manipulated. We don’t want averages since they are similarly open to abuse (mean, median or mode?). If we make a mistake and relate subsets incorrectly, then the people we are communicating with may identify this, and that in itself becomes part of the informational mix (perhaps we were ill-prepared and they should treat everything else we say with care). We don’t actually need sophisticated Natural Language Processing (NLP), BI or Semantic Web techniques to do this. It would be nice if it were linked data, but concentrate on sourcing it first. We are not really that bothered about accuracy either (since it’s unlikely we are budgeting or drawing up accounts from governmental statistics).

Mostly we are making decisions on this information and we are happy rounding to the nearest ten percent. Are we against further immigration? Is there enough footfall traffic to open a flower shop? Do renters prefer furnished or unfurnished properties in London? Which party has the record for the least taxation? What are the major industries for a given area? We just need all the governmental data to be gathered and kept current (on at least a yearly basis) on one site with a moderately well thought-out Query By Example (QBE)-based interface. That’s it.
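As a rough illustration of how little machinery this needs, here is a minimal sketch of a Query By Example lookup over records that each carry a source, a creation date and an estimate flag. The `Statistic` record, its field names, the `query_by_example` and `most_recent` helpers and every value below are hypothetical placeholders, not real figures or a real government API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statistic:
    question: str      # the one question this figure answers
    area: str
    value: float
    source: str        # who produced or sponsored the figure
    created: date      # so we can tell if it is old
    is_estimate: bool  # so we know whether it is a count or an estimate


# Placeholder records only: the values are NOT real figures.
RECORDS = [
    Statistic("people working in area", "London", 1.0, "Hypothetical Agency", date(2001, 1, 1), True),
    Statistic("people working in area", "London", 2.0, "Hypothetical Agency", date(2002, 1, 1), True),
    Statistic("people working in area", "Leeds", 3.0, "Hypothetical Agency", date(2002, 1, 1), False),
]


def query_by_example(records, example):
    """Return the records whose fields equal every field given in the example."""
    return [r for r in records
            if all(getattr(r, field) == wanted for field, wanted in example.items())]


def most_recent(records):
    """Pick the newest record, or None if nothing matched."""
    return max(records, key=lambda r: r.created, default=None)


matches = query_by_example(RECORDS, {"question": "people working in area", "area": "London"})
latest = most_recent(matches)
print(latest)
```

The user fills in only the fields they care about and gets back one answer with its source, date and estimate flag attached, which is about all the interface needs to do.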

Reading and writing have been fundamental human rights in developed countries for decades. Broadband Internet access is fast becoming one too. Surely we need to see access to consistent, underwritten government statistics in the same light. Where other political parties dispute the figures, they should be able to launch an inquiry into them. Too many inquiries will themselves become a statistic, open to interpretation. Such access is absolutely in the interest of organisations and individuals who follow current affairs.



