Tag: #ggplot2

With under two weeks to go to the 2015 UK general election, there's no better time to take stock of all the voter intention polls being published by the British media. The data has already been aggregated by UKpollingreport (a site I've used before to analyse the Scottish independence referendum), and there's not much more to be said than is already widely known: it's likely no single party will win an outright majority, and we'll be left with a coalition or minority government.

[full post]


Living in Edinburgh it's been hard to avoid the build-up to Scotland's referendum on independence. On September 18th 2014, less than a month away as I write this, people living in Scotland will go to the polls to answer the question: Should Scotland be an independent country?

[full post]


Apparently an '80s commercial for the helmet manufacturer Bell bore the slogan: "If you've got a $10 head, wear a $10 helmet". Nowadays it's a deeply ingrained and widely accepted idea among bikers that it's worth spending a lot of money on your headgear. A top-of-the-line Arai can sell for almost four figures, particularly if you want a nice race rep design. But what are you getting for your money and, in particular, is it any safer than a helmet you pick up for a tenth of that price?

[full post]


Arnie 2010 (source)

I recently read Arnie's autobiography (great fun). In it he writes about the various roles he's had, discussing the movies that flopped or were surprise box office successes, but it's hard to build up an overall picture of his career from these fragments. Similarly, the raw filmography lists at IMDb and Wikipedia are pretty uninspiring.

That gave me the idea of charting his movie career over time, attempting to show a lot of information at once: how well each film did at the box office relative to its budget, and at what points these successes and failures happened over the last few decades. After some Python-powered scraping of IMDb data, this is what I came up with:

[full post]


The most popular accounts on Twitter have millions of followers, but what are their demographics like? Twitter doesn't collect or release this kind of information, and even details like name and location are only voluntarily added to people's profiles. Unlike Google+ and Facebook, Twitter has no real-name policy: it doesn't care what you call yourself, because it can still divine useful information from your account activity.

For example, you can optionally set your location on your Twitter profile. Should you choose not to, Twitter can simply geolocate your IP address. If you use an anonymiser or VPN, it could use the timing of your account activity to infer a timezone. This could then be refined to a city or town using the topics you tweet about and the locations of friends and services you mention most.

[full post]


"Overrated" and "underrated" are slippery terms to quantify. An interesting way of looking at this, I thought, would be to compare the reviews of film critics with those of Joe Public, reasoning that a film which is roundly lauded by the Hollywood press but proves disappointing for the real audience would be "overrated", and vice versa.
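The comparison boils down to a signed difference between the two scores. Below is a minimal sketch in Python (not the post's own code, which is in R), assuming critic and audience scores share a 0 to 100 scale. The `rating_gap` and `label` helpers, the 20-point threshold and the example films are all hypothetical, made up purely for illustration:

```python
# Hypothetical sketch: classify films by how far critic opinion outruns audience opinion.
# Assumes both scores are on the same 0-100 scale; the threshold is an arbitrary choice.

def rating_gap(critic_score: float, audience_score: float) -> float:
    """Positive gap: critics liked it more than audiences ("overrated").
    Negative gap: audiences liked it more than critics ("underrated")."""
    return critic_score - audience_score


def label(gap: float, threshold: float = 20.0) -> str:
    """Turn a gap into a label, treating small gaps as agreement."""
    if gap >= threshold:
        return "overrated"
    if gap <= -threshold:
        return "underrated"
    return "about right"


# Made-up films and (critic, audience) scores, for illustration only.
films = {
    "Film A": (90, 55),
    "Film B": (40, 80),
    "Film C": (70, 68),
}

for title, (critics, audience) in films.items():
    gap = rating_gap(critics, audience)
    print(f"{title}: gap {gap:+.0f} -> {label(gap)}")
```

Using a simple difference keeps the measure symmetric, so "overrated" and "underrated" are mirror images of each other. A ratio or a rank-based comparison would be reasonable alternatives if the two scores weren't on comparable scales.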

To get some data for this I turned to the most prominent review aggregator: Rotten Tomatoes. All this analysis was done in the R programming language, and full code to reproduce it will be attached at the end.

[full post]


There seems to be a general consensus that author lists in academic articles are growing. Wikipedia says so, and I've also come across a published letter and a short Nature article which accept this is the case and discuss ways of mitigating the issue. Recently there was an interesting discussion of the subject on academia.stackexchange, but again without much quantification. Luckily, given the array of literature database APIs and language bindings available, it should be pretty easy to investigate with some statistical analysis in R.

[full post]