Fortunately, there's a better approach. Let's imagine a language that has only two valid sentences, and every tweet must be one of the two. The messages are relatively long, but there's not a lot of information in each one — all they tell you is whether the person decided to send the trap message or the horse message. It's effectively a 1 or a 0. Although there are a lot of letters, for a reader who knows the pattern of the language, each tweet carries only one bit of information per sentence.
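The "one bit per sentence" point can be made concrete with a few lines of Python (a sketch; the two short strings stand in for the two long sentences in the text):

```python
import math

# A toy "language" with only two valid sentences (hypothetical
# stand-ins for the two long sentences described in the text).
sentences = [
    "the trap message",
    "the horse message",
]

# If each sentence is equally likely, one message carries
# log2(number of choices) bits -- independent of its length.
bits_per_message = math.log2(len(sentences))
print(bits_per_message)  # 1.0
```

Doubling the length of either sentence changes nothing: the information content depends only on how many distinct messages could have been sent.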
This example hints at a very deep idea: information is fundamentally tied to the recipient's uncertainty about the message's content and his or her ability to predict it in advance.
Claude Shannon — who almost singlehandedly invented modern information theory — had a clever method for measuring the information content of a language. He showed groups of people samples of typical written English that were cut off at a random point, then asked them to guess which letter came next. Based on the rates of correct guesses — and rigorous mathematical analysis — Shannon determined that the information content of typical written English was about one bit per letter.
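Shannon's measure is the entropy formula H = -Σ p·log₂(p). A minimal Python sketch (the skewed distribution below is an illustrative assumption, not Shannon's actual data) shows why predictable sources carry less than one bit per symbol:

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin is maximally unpredictable: exactly 1 bit per flip.
print(entropy_bits([0.5, 0.5]))  # 1.0

# A heavily skewed source (one outcome at 90%) is easier to guess,
# so it carries fewer bits per symbol -- as English letters do.
print(entropy_bits([0.9, 0.05, 0.05]))  # ~0.569
```

The guessing game estimates exactly this quantity: the easier the next letter is to predict, the lower the entropy.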
Indeed, if you use a good file compressor on typical English text, it shrinks to a fraction of its original size, confirming how little information each letter really carries. And the number of possible tweets? It's such a staggeringly large number that it hardly matters whether it's one person reading or a billion — they won't be able to make a meaningful dent in the list in the lifetime of Earth. Instead, let's think back to that bird sharpening its beak on the mountaintop. Suppose that the bird scrapes off a tiny bit of rock from the mountain when it visits every thousand years, and it carries away those few dozen dust particles when it leaves.
A normal bird would probably deposit more beak material on the mountaintop than it would wear away, but virtually nothing else about this scenario is normal either, so we'll just go with it. Let's say you read tweets aloud for 16 hours a day, every day.
And behind you, every thousand years, the bird arrives and scrapes off a few invisible specks of dust from the top of the hundred-mile mountain with its beak. A hundred eternal years, in which the bird grinds away 36,500 mountains, make an eternal century.
Best counterexample to any person who believes you can divide literature into genres. I caught the mention of "Pern" and realized it was because I've never read any Pern books. And hey, Randall will be in London? Even farther from me than where he lives! Oh well. Colour me envious of anyone who can take advantage of the opportunity.
As for the "incomplete explanation — needs more" tag: does it? I feel like every element is explained just fine, and as someone nearly completely unfamiliar with the series and who doesn't use Twitter, I have trouble imagining anyone needing more explanation than I do.
Below the tweet are several action buttons typical of a Twitter post for comments, replying, likes, etc. The tweet reads as follows: I'll be visiting the UK next week! Discussion: Threadfall on Twitter beginning ?
I too have noticed an increase around that time; it was mostly screenshots of note-apps before. Maybe it's a reference to the Twitter stock price falling.

Here is the code fed into the bot of the Tweet-a-Program project. The project allows anyone to tweet ideas in the form of short programs that are executed automatically. The results are then tweeted back, which is often quite entertaining:
The key here is the intelligent Interpreter function with its AmbiguityFunction option that, given a generic name string, can interpret it as a list of cities with that name. To get closer to the original xkcd comic, I wanted many more cities, but listing those cities explicitly would exceed the standard tweet size limit of 140 characters.
I needed, therefore, a way to find them programmatically. Instead of using Interpreter, I resorted to processing the data directly, and also added a few styling options:. The obvious shortcomings are label collision and a patchy, nonuniform spatial distribution. For those interested in the so-called code-golf aspect of it, however, I have a few notes for you at the end of this article.
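The "process the data directly" route amounts to building a name → cities index. A toy Python sketch of the same idea (the miniature dataset below is hypothetical, standing in for the full Wolfram city data):

```python
from collections import defaultdict

# (name, state, population) -- a tiny hypothetical stand-in dataset.
cities = [
    ("Springfield", "IL", 114_000),
    ("Springfield", "MO", 169_000),
    ("Springfield", "MA", 155_000),
    ("Portland",    "OR", 650_000),
    ("Portland",    "ME",  68_000),
    ("Boston",      "MA", 690_000),
]

# Index every city under its name string, so an ambiguous name
# resolves to the full list of matching cities.
by_name = defaultdict(list)
for name, state, pop in cities:
    by_name[name].append((state, pop))

print(by_name["Springfield"])
```

Looking up "Springfield" returns all three entries at once, which is exactly the ambiguity the real code has to handle.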
The more complete and structured the data, the simpler the algorithm needed to process it. So we start by getting all the necessary data characteristics:. For these roughly 32,000 US cities, we have three columns of data. The second and third columns are unique city and state entities that can be used to extract different information about them, such as geolocation, area, and crime rates:. We need the first column to gather all cities with a specific name string.
After dropping all name strings with just a single corresponding city, we are left with roughly 4,000 ambiguous strings:. We cannot plot thousands of cities with labels on a map that would fit a standard screen, because it would be an unreadable mess, and obviously Randall Munroe does not do that, either. So which cities out of those thousands should we pick?
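Dropping the single-match names can be sketched as a simple filter (Python illustration; the counts below are hypothetical):

```python
from collections import Counter

# name -> how many distinct cities carry it (hypothetical counts).
name_counts = Counter({
    "Springfield": 3, "Portland": 2, "Boston": 1, "Franklin": 4,
})

# Keep only names shared by more than one city -- the "ambiguous" ones.
ambiguous = {name for name, n in name_counts.items() if n > 1}
print(sorted(ambiguous))  # ['Franklin', 'Portland', 'Springfield']
```

"Boston" is unique here, so it drops out; only genuinely ambiguous names survive the filter.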
This is where data visualization design comes into play. I suggest the following steps. A simple approach to a more uniform spatial city distribution is to pick just a few cities in each state. For this, I will group cities by state:. The syntax 2 ;; UpTo[3] means that in each group of homonymous cities, I skip the first one, typically the most populous and famous, and take the second and third largest, if available.
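The Wolfram span `2 ;; UpTo[3]` has a direct analogue in Python slicing, which likewise never fails on short lists (a sketch with hypothetical city lists, sorted largest first):

```python
# Cities sharing one name, sorted by population, largest first
# (hypothetical ordering).
namesakes = ["Springfield, MO", "Springfield, MA", "Springfield, IL",
             "Springfield, OR"]

# Like 2 ;; UpTo[3]: skip the first (most famous) namesake and take
# the second and third, if available.
picked = namesakes[1:3]
print(picked)  # ['Springfield, MA', 'Springfield, IL']

# With only two namesakes, the slice quietly yields just one city.
print(["Portland, OR", "Portland, ME"][1:3])  # ['Portland, ME']
```

The `UpTo[3]` part is what the slice's forgiving upper bound mimics: asking for more elements than exist is not an error.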
Sometimes there are just two namesakes. Considering only the second- and third-largest namesakes reduces the original total number of such cities from roughly 15,000 to roughly 6,000. This is still too many cities to show on the map. I can reduce this number further by picking, for example, only two cities for each of the 50 states, for a total of 100 cities, which is quite reasonable for a good map layout.
For small Rhode Island, two city labels are too many, while for Texas, two are too few. A simple function can help define how many cities to pick per state depending on its area:. I will not go over three labels per state, but this function is quite arbitrary, and readers are welcome to experiment to find better map layouts.
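The area-based cap can be sketched as a small step function (the thresholds below are my own illustrative guesses, not the article's actual cutoffs):

```python
def labels_per_state(area_sq_mi: float) -> int:
    """How many city labels a state gets, based on its land area.
    Thresholds are illustrative assumptions; capped at three labels."""
    if area_sq_mi < 10_000:    # small states like Rhode Island
        return 1
    if area_sq_mi < 100_000:   # mid-sized states
        return 2
    return 3                   # very large states like Texas

print(labels_per_state(1_545))    # Rhode Island's area -> 1
print(labels_per_state(268_596))  # Texas's area -> 3
```

Any monotone function of area works here; the step thresholds are exactly the knobs a reader would tune to find a better map layout.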