In my previous post about my status on my SPO600 final project, I decided upon the Redis project. I had done a bit of exploring of its source code and I thought I found a hash function that was used extensively. Recently, I've explored its source code more and learned more about how it works internally. I discovered that the hash function I found was actually a dependency used in limited circumstances. It was taken from the C client code (and I'm looking at the server application's code here). I can only imagine that it would be used in limited situations. It certainly wouldn't be used the way I thought it was - for generating a key for a hash table data structure into which all data inserted into Redis was stored. That, it turns out, is the hash function in this source code file.

This hashing function is used by the hash table data structure of Redis. It isn't an ordinary hash function either. It's SipHash, and it has history. I decided to look into exactly what "SipHash" was. I would need to understand it well in order to understand the code I'm to optimize after all. It was created by a research initiative by a pair of computer scientists named Jean-Philippe Aumasson and Daniel J. Bernstein. The goal of that project was to find a hash function that would be cryptographic (use a secret key so that every time the function is ran with a given input, the same value won't be produced again) but faster than most cryptographic hash functions, almost as fast as non-cryptographic hash functions. This would enable it to be used in network infrastructure and hash table data structures in a way that would prevent "hash flooding" attacks.

The implications of choosing this function for my project are more than I previously thought. This hash function became successful, and it was used in not only Redis, but also the data structures of popular programming languages like Python and Ruby. This means that if I'm able to optimize it for the AArch64 architecture and I can get word out about my improvements, I could potentially improve the performance of many computer systems worldwide, since this would also affect deployments of web frameworks like Django (Python) and Ruby on Rails (Ruby).

My next steps will be to continue exploring this hash function's source code and to profile Redis after building it to find out how much impact this hash function's execution has on the program's run time right now. This will be a good preparatory step to help me find out how much I can improve Redis's performance by working on this.


This was originally posted on the blog I used for my SPO600 class while studying at Seneca College.