As I mentioned in my previous post, I set out to choose a better hash function to use in my project. My main concerns with the function I had originally chosen were that it wouldn't be called very often, it didn't see much real industry usage (it would only be called in the C client of the software, not the server itself), and it was just too simple to leave me much room for optimization. My new chosen function, SipHash, has more steps and, unless the community has already optimized it as much as possible, should give me more work to do.
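
For context on what "more steps" means: SipHash's core is a short sequence of add-rotate-xor (ARX) operations applied to four 64-bit state words, repeated for each block of input plus some finalization rounds. This is the round macro from the public reference implementation, which Redis' "src/siphash.c" appears to be based on:

#include <stdint.h>

#define ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))

/* One SipHash round. v0..v3 are the four 64-bit words of internal state. */
#define SIPROUND           \
    do {                   \
        v0 += v1;          \
        v1 = ROTL(v1, 13); \
        v1 ^= v0;          \
        v0 = ROTL(v0, 32); \
        v2 += v3;          \
        v3 = ROTL(v3, 16); \
        v3 ^= v2;          \
        v0 += v3;          \
        v3 = ROTL(v3, 21); \
        v3 ^= v0;          \
        v2 += v1;          \
        v1 = ROTL(v1, 17); \
        v1 ^= v2;          \
        v2 = ROTL(v2, 32); \
    } while (0)

Each of those adds, rotates, and XORs is a potential target for optimization work.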

Over the past few days I have been getting acquainted with Redis' build process and the source code file containing the hash function. Luckily, my experience in the SPO600 class has helped prepare me for this. I'm familiar with the process of going into an application's source code directory and using the "make" and "make install" commands to build and install the software. In this case, I'm not interested in installing it, just building it and running my own tests with the produced binaries. Redis' GitHub instructions are very user-friendly and confirm that all I have to do is go into the "src" directory and run "make". The instructions then explain that the "redis-server" binary produced (in the same "src" directory) is the built server.

My plan for this project, then, is to add code that logs when the hash function is hit and how much time is spent during its execution, and then build Redis using those steps. That gives me a benchmark, and I can repeat the same steps for the optimized code later. I'll keep two copies of the repo so that I can run tests repeatedly, create Bash scripts to save myself time, and write a client of some sort to insert records into the running Redis processes to trigger that hash function. I want this project to be a real-world representation of software optimization, so instead of extracting the hash function and benchmarking it in isolation, I want to benchmark it in the context of a running Redis process. This will also give me a chance to add more timing code later on to time the entire Redis record-insert path, so that I can see the real-world impact of my optimizations.

I've already begun work according to the above plan. I created the Bash scripts to build...

#!/bin/bash

# This script builds the default build of Redis.
# Matthew Welke 2018

cd redis_default

make

cd ..

and run...

#!/bin/bash

# This script starts the default build of Redis.
# Matthew Welke 2018

./redis_default/src/redis-server --port 6380 --loglevel verbose

...the default (unoptimized) Redis process.

I've also created a set of Ruby scripts that use the Ruby client for Redis to insert records. I chose Ruby because it's a high-level language that I find easy and quick to program with. I've already used it for Ruby on Rails development, and I think Ruby itself will be useful for any tasks I need to do for this project.

This script contains the code common to the "insert" and "read" scripts:

# Matthew Welke 2018

# Helper classes and functions for test data
############################################

require 'active_support' # base library, must be loaded before the core extensions
require 'active_support/core_ext' # gives us Integer#days and JSON serialization of plain objects
require 'time' # Time.parse, used in TestModel.from_hash below

class TestModel
    attr_accessor :id, :name, :date_of_birth

    def initialize(id, name, date_of_birth)
        @id = id
        @name = name
        @date_of_birth = date_of_birth
    end

    def greeting
        "My name is #{@name} and I was born on #{@date_of_birth.to_s}."
    end

    def self.from_hash(h)
        TestModel.new(
            h['id'].to_i,
            h['name'],
            Time.parse(h['date_of_birth'])
        )
    end
end

class IdSequence
    def initialize
        @last_used = 0
    end

    def get
        @last_used += 1
    end
end

NAMES = [
    'Adam',
    'Bob',
    'Charlie',
    'David'
]

DOB_DAY_RANGE = (1..365).to_a

ID_SEQ = IdSequence.new

def random_test_model
    TestModel.new(ID_SEQ.get, NAMES.sample, Time.now + DOB_DAY_RANGE.sample.days)
end
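
For reference, because "active_support/core_ext" is loaded, calling to_json on one of these plain Ruby objects serializes its instance variables, so a TestModel comes out looking something like this (the exact timestamp format depends on ActiveSupport's defaults):

{"id":1,"name":"Charlie","date_of_birth":"2018-01-03T19:35:17.000-05:00"}

That's why from_hash can rebuild the object from the string keys 'id', 'name', and 'date_of_birth'.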

And these are the "insert"...

# This script inserts test data into my Redis stores.
# Matthew Welke 2018

require 'json'
require 'redis'

require_relative './common.rb'

redis = Redis.new(host: 'localhost', port: 6380, db: 15)

inserted = 0

999999.times do
    rtm = random_test_model
    redis.set(1, rtm.to_json) # Always key 1, so memory usage doesn't grow.

    inserted += 1
end

puts "#{inserted} inserted into Redis."

...and "read"...

# This script is used to test reading data back from a Redis store.
# Matthew Welke 2018

require 'json'
require 'redis'

require_relative './common.rb'

redis = Redis.new(host: 'localhost', port: 6380, db: 15)

data = TestModel.from_hash(JSON.parse(redis.get(1)))

puts data.greeting

...scripts.

The insert script inserts random test data into Redis, and the read script ensures the data that comes back out isn't corrupted.

To do the benchmarking inside the Redis processes, I added the following code to the source code file "src/siphash.c" in the Redis code base, above the function "siphash". Its purpose is to track the number of times the function has been executed and the cumulative execution time of the work portion of the function:

// Benchmarking counters shared across calls: total executions and cumulative time
unsigned int _siphashExecCount = 0;
double _siphashExecTime = 0;

I added the following code inside the function, before the work portion begins (note that printf and clock require "stdio.h" and "time.h" to be included at the top of the file):

_siphashExecCount += 1;

printf("-- ENTERED SIPHASH (%d) --\n", _siphashExecCount);
printf("-- Total siphash exec time: %.3f seconds --\n", _siphashExecTime);

// Begin timing test
clock_t _begin = clock();

And I added the following code at the end of the work portion of the function:

// End timing test
clock_t _end = clock();

double _timeSpent = (double)(_end - _begin) / CLOCKS_PER_SEC;
_siphashExecTime += _timeSpent;
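
Putting those pieces together, the instrumented function looks roughly like this. This is just a sketch: the signature matches the one in "src/siphash.c" as far as I can tell, but the actual hashing work is elided, and the includes go at the top of the file:

#include <stdio.h>  /* printf */
#include <stdint.h> /* uint64_t, uint8_t */
#include <stddef.h> /* size_t */
#include <time.h>   /* clock, clock_t, CLOCKS_PER_SEC */

unsigned int _siphashExecCount = 0;
double _siphashExecTime = 0;

uint64_t siphash(const uint8_t *in, const size_t inlen, const uint8_t *k) {
    _siphashExecCount += 1;

    printf("-- ENTERED SIPHASH (%d) --\n", _siphashExecCount);
    printf("-- Total siphash exec time: %.3f seconds --\n", _siphashExecTime);

    // Begin timing test
    clock_t _begin = clock();

    uint64_t hash = 0; /* ...the actual SipHash work happens here... */

    // End timing test
    clock_t _end = clock();

    double _timeSpent = (double)(_end - _begin) / CLOCKS_PER_SEC;
    _siphashExecTime += _timeSpent;

    return hash;
}

One caveat I should keep in mind: clock() has fairly coarse resolution, so a single sub-microsecond siphash call can easily round to zero, which would explain the "0.000 seconds" totals in the output below.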

Combined, this results in output like the following appearing on my screen when I run the Redis process. I use verbose logging mode so that I can see my logging code alongside Redis' actual internal workings as it runs:

29846:M 03 Jan 19:35:16.304 - 0 clients connected (0 slaves), 487416 bytes in use
29846:M 03 Jan 19:35:17.680 - Accepted 127.0.0.1:52576
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH (235) --
-- Total siphash exec time: 0.000 seconds --
-- ENTERED SIPHASH (236) --
-- Total siphash exec time: 0.000 seconds --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH (237) --
-- Total siphash exec time: 0.000 seconds --
-- ENTERED SIPHASH (238) --
-- Total siphash exec time: 0.000 seconds --
-- ENTERED SIPHASH (239) --
-- Total siphash exec time: 0.000 seconds --
-- ENTERED SIPHASH (240) --
-- Total siphash exec time: 0.000 seconds --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH NOCASE --
-- ENTERED SIPHASH (241) --
-- Total siphash exec time: 0.000 seconds --
-- ENTERED SIPHASH (242) --
-- Total siphash exec time: 0.000 seconds --
-- ENTERED SIPHASH (243) --
-- Total siphash exec time: 0.000 seconds --
-- ENTERED SIPHASH (244) --
-- Total siphash exec time: 0.000 seconds --
29846:M 03 Jan 19:35:17.682 - Client closed connection
29846:M 03 Jan 19:35:21.316 - DB 15: 1 keys (0 volatile) in 4 slots HT.
29846:M 03 Jan 19:35:21.317 - 0 clients connected (0 slaves), 487576 bytes in use
29846:M 03 Jan 19:35:26.329 - DB 15: 1 keys (0 volatile) in 4 slots HT.
29846:M 03 Jan 19:35:26.329 - 0 clients connected (0 slaves), 487576 bytes in use
^C29846:signal-handler (1515026128) Received SIGINT scheduling shutdown...
29846:M 03 Jan 19:35:28.736 # User requested shutdown...
29846:M 03 Jan 19:35:28.736 # Redis is now ready to exit, bye bye...

Note that the (235) at the top means the hash function had already been executed 235 times before my Ruby client ran. Also note that "ENTERED SIPHASH NOCASE" comes from another function I noticed in that source code file that appears to be related to hashing. I tracked it too, and I plan to further analyze these functions' invocations with respect to the entire code base later on, so that I get a proper benchmark of the real-world impact my optimizations would have on Redis.

Interestingly, the output sample above corresponds to only 3 objects being inserted into Redis. My Ruby client ran the Redis SET command just 3 times, yet the hash functions were executed far more than 3 times.

Reflection

I tested these tools I created and they worked as expected. However, I think I may have taken the wrong approach with the test data. I want the test to be as precise as possible, so I should make sure the exact same data goes into the hash function as its parameter each time. Instead of using random test data, I should pre-calculate a set of random data (perhaps using /dev/urandom) and then feed that same data into the Redis processes (both the default and the optimized ones) for every benchmark I do. I think my experience so far as a developer, which has mostly been limited to web development, is what got me on the path of thinking my test data needed to be real-world and random.
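
For example, a small C program could generate the dataset once and save it to a file, and every benchmark run (on both builds) would then replay that same file. This is just a sketch, and the file names and sizes are hypothetical:

#include <stdio.h>

#define NUM_KEYS 10000
#define KEY_LEN  16

/* Dump NUM_KEYS fixed-length random keys from /dev/urandom into a file.
 * Run once; afterwards, every benchmark reads test_keys.bin instead of
 * generating new random data. */
int main(void) {
    FILE *urandom = fopen("/dev/urandom", "rb");
    FILE *out = fopen("test_keys.bin", "wb");
    if (urandom == NULL || out == NULL) {
        fprintf(stderr, "failed to open files\n");
        return 1;
    }

    unsigned char buf[KEY_LEN];
    for (int i = 0; i < NUM_KEYS; i++) {
        if (fread(buf, 1, KEY_LEN, urandom) != KEY_LEN) return 1;
        fwrite(buf, 1, KEY_LEN, out);
    }

    fclose(urandom);
    fclose(out);
    return 0;
}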

Another issue I recognized when reviewing my work so far is that my method of testing Redis (using Ruby scripts) may not go as smoothly as I expected once I get onto the AArch64 server. It works well on my x86_64 laptop, sure, but a different CPU architecture can change the situation. Ruby is infamous for installation headaches when the gems you need have C extensions. I don't think any of the gems my scripts use have C extensions, but I should make sure the server supports running Ruby and those gems in particular.

Thankfully, our kind overlord has trusted us with sudo access to the school's servers for this class. We have permission to install things, so hopefully I'll be able to install Ruby and/or the gems without issues. If not, I'll have to come up with something else. I would probably try another high-level language like JavaScript (Node.js), or just end up making compiled programs using the Redis C client. One way or another, I know I'll be able to do this test; I'm just lazy, so I'm looking for the easiest way.

Next Steps

My next steps will be to come up with better test data (fixed, rather than regenerated randomly for each test) and to set up my testing environment on the AArch64 school server. Then I will begin benchmarking the default Redis process and tinkering with ways to optimize the Redis code.


This was originally posted on the blog I used for my SPO600 class while studying at Seneca College.