Stats and Crypto: the missing link

A couple weeks ago I came across a talk by Prof. Teske on the recent advancements in homomorphic encryptions. Uncharacteristic of a crypto talk, this one dwelt very little with the crypto itself. Rather, it was lighthearted and succinct, understandable for a novice like myself. The talk left me excited and eager to learn more. Channeling from the experience, I would like to keep the following post simple and accessible. Let’s get started.

Homomorphic Encryption: a walkthrough

The idea of homomorphic encryption is to perform operations on encrypted elements in a way to mirror the same operation if it were performed prior to being encrypted. (Whoa lots of wordy words, hang in there!)

Let’s run through a scenario: say Alice has 25 dollars in her pocket and Bob has 16 dollars in his. They would like to know whether their combined pocket money forms the universe number. As neither Alice nor Bob want to reveal their actual amount they agree on performing an exchange like the following:

  • Alice encrypts her number into the cipher: [[25]]
  • Bob encrypts his number into the cipher: [[16]]
  • Alice and Bob together perform a homomorphic addition of their respective ciphers, the result of which is the encryption of  the sum: [[25]] [+] [[16]] => [[25+16]]=[[41]]
  • Alice and Bob together encrypts the universe number: [[42]]
  • Finally, together they perform a check for equality: [[41]] [==] [[42]]

Both Alice and Bob get to know the True/False result, but they do not get any additional knowledge. Hence the scheme does not reveal (1) the associated sum and (2) the amount possessed by the counterpart. Therefore both Alice’s and Bob’s secret, well, stays secret.

The above example showcased an encryption scheme with a single homomorphic operation, the addition. Another type of homomorphic operation is the multiplication. Encryption schemes that support both types of operations are called fully homomorphic encryptions.

Aside from the obvious increase in the number of operations allowed, fully homomorphic encryption allows for much more (keeners can read it off from wikipedia).

A Real-Life Usage: electronic voting

Homomorphic encryptions are often used in electronic voting. The goal in doing so is to preserve the privacy of each ballot while being able to accurately determine the winner.

Let’s look at another simplified example:

Given an election with four candidates {A,B,C,D}, each voter can vote for one and only one candidate in the following way,

  • pick a candidate, say C
  • create the tuple <0,0,1,0>
  • encrypt each elements of the tuple using a randomized encryption (ie an encryption where a given message if encrypted twice will produce two different ciphers)
  • sends out the encrypted tuple <[[0]],[[0]],[[1]],[[0]]> as the ballot
After the election has ended, the commission ends up with a set of data, the encrypted tuples from all voters. Each entry of the dataset looks like gibberish to the commissioner. The commissioner proceeds as follows,
  • performs a homomorphic sum over each individual columns of the tuples resulting in <[[sum_A]], [[sum_B]], [[sum_C]], [[sum_D]]>
  • loops i from 0 to n=#number of votes casted (with consent from voters)
    • checks whether [[i]] [==] [[sum_A]], [[sum_B]],  [[sum_C]], [[sum_D]]
  • determine whether max(sum_A, sum_B, sum_C, sum_D)
Obviously, in practice, the systems contain much more rigor than my make-shift example.

What’s To Come

Fully homomorphic encryptions is a fairly novel subject: its first implementation appeared in 2010. The benefits from it are potentially enormous. Maybe one day we’ll be able to perform statistical computations on a cloud of encrypted data.


Neural nets, encryption, and more

Today’s meeting was jam packed with everything from neural networks to work in progress on an automated way of doing homework.

Neural Networks: Samson introduced us to neural networks, and how to learn weights of a neural network by backpropogating the errors and using gradient descent. [Separate post pending.]

Homomorphic Encryption: Diran talked about homomorphic encryption, an encryption schema where one can perform operations (such as addition) on the encrypted messages and get meaningful results. With an encryption schema that preserves addition and multiplications, we can do things like linear regression on encrypted data. [Separate post pending.]

AOL Search Dataset: Jee explained the AOL search dataset, a dataset consisting of an anonymized list of users and the terms they searched for on AOL over the span of three months. Previous projects that use this dataset include the query “topic” connection visualization  (type something on the gray area at the top of the page) and a visualization of question queries.

Jimmy also presented on a work in progress: an automatic homework solution generator that outputs latex code for solving a given formula.

Next meeting will be next Tuesday Nov 8th at 3pm in the Stats Club Office M3 3109.