Graphing Russia’s Election Fraud

Following Russia’s parliamentary elections on December 4, a link posted to Reddit reported an impossibly high turnout (99.51%) and near-unanimous support (99.48%) for Putin’s ruling party, United Russia, in the last place one would expect it: the republic of Chechnya. Even if relations with the secessionist region have improved since the Second Chechen War, both the turnout and United Russia’s vote share are a complete joke. This absurdity prompted a more thorough examination of all regions, many of which were also plagued by irregularities. In this post, I will present detailed visualizations of both region- and precinct-level election data, and point out some highly likely instances of fraud.


Compression, physical XORs, and NN clustering!

Today, Samson and Diran gave new twists and applications for their previous topics.

Lossless Dataset Compression: Samson discussed lossless compression of statistical data, capable of shrinking huge datasets (often hundreds of TB) by orders of magnitude. The goal is not to squeeze the data into a smaller footprint than other compressors achieve, but rather to keep the data readily accessible while it remains compressed.
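
A sketch of that accessibility trade-off: compressing in fixed-size chunks costs a little ratio but lets any record be read by decompressing only its chunk. The chunk size and row format here are my own toy assumptions, not Samson’s actual scheme.

```python
import zlib

CHUNK = 1000  # records per chunk (assumed granularity)

# Toy dataset: one newline-terminated record per row.
records = [f"row-{i},{i * i}\n".encode() for i in range(10_000)]

# Compress each chunk independently.
chunks = [
    zlib.compress(b"".join(records[start:start + CHUNK]))
    for start in range(0, len(records), CHUNK)
]

def read_record(i):
    """Fetch record i by decompressing only the chunk that contains it."""
    c = i // CHUNK
    rows = zlib.decompress(chunks[c]).splitlines(keepends=True)
    return rows[i % CHUNK]

print(read_record(4321))  # touches only chunk 4, not the whole dataset
```

Decompressing a single chunk is O(CHUNK) regardless of total dataset size, which is what keeps the compressed data “readily accessible.”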

Encryption, Encore: Diran described a way to transmit secret messages across untrustworthy carriers. Using the previously discussed homomorphic encryption, and splitting the message into multiple planes coupled with random noise, one ensures that no single carrier possessing only some of the planes can compute the secret. Rather, decryption requires XORing every plane together, which can in fact be done physically with multiple layers of invisible ink, or with a way of encoding 1s and 0s that allows the union of dark spots to simulate XORs.
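
The plane-splitting step can be sketched with plain byte-wise XOR (this shows only the splitting and recombination, not the homomorphic encryption or the invisible-ink encoding; the function names are my own):

```python
import os
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_into_planes(secret: bytes, n: int) -> list:
    """Split secret into n planes; any subset of fewer than n reveals nothing."""
    planes = [os.urandom(len(secret)) for _ in range(n - 1)]
    # The final plane cancels the random noise, so XORing all n planes
    # together reconstructs the secret exactly.
    planes.append(reduce(xor_bytes, planes, secret))
    return planes

def recombine(planes: list) -> bytes:
    """XOR every plane together to recover the secret."""
    return reduce(xor_bytes, planes)

shares = split_into_planes(b"attack at dawn", 4)
print(recombine(shares))  # all four planes together recover the message
```

Because n−1 of the planes are uniform random noise, each plane on its own is statistically indistinguishable from noise, which is exactly why no single carrier learns anything.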

Neural Network Clustering: As part of his “neural network” series, Samson introduced a simple, single-hidden-layer network capable of determining, in a spatially clustered dataset, which cluster (or subset of clusters, “0” or “1”) an arbitrary data point should be classified under. Each hidden unit is a Gaussian of the distance between a given point and a cluster centre, so that nearer clusters have greater influence. Finding the optimal output weights during training requires only least-squares regression, allowing all data points to be fit at once.
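
A minimal sketch of such a network on a toy two-cluster dataset (the cluster centres, Gaussian width, and data are my own assumptions, not Samson’s example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two spatial clusters labelled 0 and 1.
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
X = np.vstack([c + rng.normal(scale=0.7, size=(50, 2)) for c in centers])
y = np.repeat([0.0, 1.0], 50)

def hidden(points, centers, width=1.0):
    """Hidden layer: Gaussian of the distance from each point to each centre."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

# Training is a single least-squares solve for the output weights,
# fitting all data points at once.
H = hidden(X, centers)
w, *_ = np.linalg.lstsq(H, y, rcond=None)

def classify(points):
    """Label 1 if the weighted hidden activations exceed 0.5, else 0."""
    return (hidden(points, centers) @ w > 0.5).astype(int)

print(classify(np.array([[0.2, -0.1], [3.8, 4.2]])))  # → [0 1]
```

Because the hidden activations are fixed given the centres, the only trainable parameters are the output weights, and least squares finds them in closed form with no iterative training.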

Have you gone mod?: Jimmy offered a counterargument to Lisa’s proposal that sequential numbers, such as a website’s user IDs, yield non-random subsets when selected by their residue mod some chosen number.
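
A toy simulation of my own framing the disagreement (not either argument as actually presented): selecting sequential IDs by residue is systematic rather than random sampling, yet if the trait being measured is assigned independently of the ID, the residue-class subset’s rate matches the overall rate.

```python
import random

random.seed(42)
N, m = 100_000, 7

# Assign each sequential ID a trait independently of the ID itself.
trait = [random.random() < 0.3 for _ in range(N)]

# "Sample" by taking one residue class mod m -- a deterministic subset.
subset = [trait[i] for i in range(N) if i % m == 3]

print(sum(trait) / N)             # overall trait rate, close to 0.3
print(sum(subset) / len(subset))  # residue-class rate, also close to 0.3
```

The subset is deterministic, but when IDs carry no information about the trait, its composition is statistically indistinguishable from a random sample; the worry only bites when the trait correlates with ID order (e.g. sign-up date).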

The next meeting will be Tuesday, Nov 22nd at 3pm in the Stats Club Office, M3 3109.

A Site for Data Scientists to Prove Their Skills and Make Money

Who’s up to challenge some big boys?

Airlines, insurance companies, hospitals and many other organizations are trying to figure out how to corral all their data and turn it into something useful.

Kaggle, a start-up, has figured out a way to connect these companies with the mathematicians and scientists who crunch numbers for a living or a hobby. On Thursday, it announced it had raised $11 million from investors including Khosla Ventures, Index Ventures and Hal Varian, Google’s chief economist.

“We’re really making big data into a sport,” said Anthony Goldbloom, Kaggle’s founder.