Advertisement Bidding, Kaggle Contest, New Term & New Faces

Today, the Data Science team kicked off its first meeting of 2012! Welcome to Arthur, David, Paul, and Will for joining the cause! The meeting covered exciting new topics ranging from internet ad bidding to Kaggle competition.

Advertisement Bidding: Paul talked about internet ad bidding, i.e. bidding on advertisement slots on platforms like Google AdWords. He introduced the optimization problem inherent to ad bidding, the difference in pricing between advertisement slots on Google, and explained various bidding methods.

Stay Alert! Kaggle Competition: David gave a walk-through of the Stay Alert! Ford challenge on Kaggle. He described two main findings which propelled his model to finish 6th in the overall rankings. The first was a data visualization method which, in his case, zeroed out the randomly added datapoints. The second was the discovery of interaction effects between variables.

The next meeting will be held next Monday at 4:30pm in the Stats Club Office M3 3109.


Stats and Crypto: the missing link

A couple weeks ago I came across a talk by Prof. Teske on the recent advancements in homomorphic encryptions. Uncharacteristic of a crypto talk, this one dwelt very little with the crypto itself. Rather, it was lighthearted and succinct, understandable for a novice like myself. The talk left me excited and eager to learn more. Channeling from the experience, I would like to keep the following post simple and accessible. Let’s get started.

Homomorphic Encryption: a walkthrough

The idea of homomorphic encryption is to perform operations on encrypted elements in a way to mirror the same operation if it were performed prior to being encrypted. (Whoa lots of wordy words, hang in there!)

Let’s run through a scenario: say Alice has 25 dollars in her pocket and Bob has 16 dollars in his. They would like to know whether their combined pocket money forms the universe number. As neither Alice nor Bob want to reveal their actual amount they agree on performing an exchange like the following:

  • Alice encrypts her number into the cipher: [[25]]
  • Bob encrypts his number into the cipher: [[16]]
  • Alice and Bob together perform a homomorphic addition of their respective ciphers, the result of which is the encryption of  the sum: [[25]] [+] [[16]] => [[25+16]]=[[41]]
  • Alice and Bob together encrypts the universe number: [[42]]
  • Finally, together they perform a check for equality: [[41]] [==] [[42]]

Both Alice and Bob get to know the True/False result, but they do not get any additional knowledge. Hence the scheme does not reveal (1) the associated sum and (2) the amount possessed by the counterpart. Therefore both Alice’s and Bob’s secret, well, stays secret.

The above example showcased an encryption scheme with a single homomorphic operation, the addition. Another type of homomorphic operation is the multiplication. Encryption schemes that support both types of operations are called fully homomorphic encryptions.

Aside from the obvious increase in the number of operations allowed, fully homomorphic encryption allows for much more (keeners can read it off from wikipedia).

A Real-Life Usage: electronic voting

Homomorphic encryptions are often used in electronic voting. The goal in doing so is to preserve the privacy of each ballot while being able to accurately determine the winner.

Let’s look at another simplified example:

Given an election with four candidates {A,B,C,D}, each voter can vote for one and only one candidate in the following way,

  • pick a candidate, say C
  • create the tuple <0,0,1,0>
  • encrypt each elements of the tuple using a randomized encryption (ie an encryption where a given message if encrypted twice will produce two different ciphers)
  • sends out the encrypted tuple <[[0]],[[0]],[[1]],[[0]]> as the ballot
After the election has ended, the commission ends up with a set of data, the encrypted tuples from all voters. Each entry of the dataset looks like gibberish to the commissioner. The commissioner proceeds as follows,
  • performs a homomorphic sum over each individual columns of the tuples resulting in <[[sum_A]], [[sum_B]], [[sum_C]], [[sum_D]]>
  • loops i from 0 to n=#number of votes casted (with consent from voters)
    • checks whether [[i]] [==] [[sum_A]], [[sum_B]],  [[sum_C]], [[sum_D]]
  • determine whether max(sum_A, sum_B, sum_C, sum_D)
Obviously, in practice, the systems contain much more rigor than my make-shift example.

What’s To Come

Fully homomorphic encryptions is a fairly novel subject: its first implementation appeared in 2010. The benefits from it are potentially enormous. Maybe one day we’ll be able to perform statistical computations on a cloud of encrypted data.