I completed a suite of unit tests for the CSV export feature of the VAT (Visual Analysis Tool). The tests go across all the collections and exercise many possible types of joins. It's impractical to write a unit test for every combination of joined collections and included columns, so I aimed to at least connect every type of collection to every other type of collection (Hit joined with Browser was good enough; there was no need to cover the other four "media" type collections). For each pairing, I did one join that included just the id column from the joined table and one join that included the id and other columns from that table. In total, we have over 50 CSV export unit tests that we can run to make sure the data we get back still makes sense if we change the schema.
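To make the test matrix a little more concrete, here is a minimal sketch of how the pairings and the two join variants could be enumerated. The collection names, the "name" column, and the buildJoinCases helper are placeholders I made up for illustration; they are not the real VAT schema or test code.

```javascript
// Hypothetical collection types; the real VAT schema may differ.
const collectionTypes = ["Hit", "User", "Browser"];

// The two join variants described above: id only, and id plus other columns.
const joinVariants = [
  { name: "id-only", columns: ["id"] },
  { name: "id-plus-columns", columns: ["id", "name"] }, // "name" is a placeholder column
];

// Build one test case per ordered pair of distinct collection types, per variant.
function buildJoinCases(types, variants) {
  const cases = [];
  for (const base of types) {
    for (const joined of types) {
      if (base === joined) continue; // skip joining a collection to itself
      for (const variant of variants) {
        cases.push({
          base,
          joined,
          variant: variant.name,
          columns: variant.columns,
        });
      }
    }
  }
  return cases;
}

const cases = buildJoinCases(collectionTypes, joinVariants);
console.log(cases.length); // 3 types * 2 other types * 2 variants = 12
```

Each generated case could then drive one CSV export test, which keeps the number of tests tied to the number of collection types rather than to every possible column combination.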

We completed the feature that keeps our copy of the User information in our MongoDB up to date with the copy in their SQL Server. We originally planned on creating an API to handle this. When we received a hit to log to our database, we would call the API, which would access their database and retrieve the extra user information. This way, we would always be up to date. However, we realized that because we were already putting the SQL Server user ID on the web page (to be picked up by our JavaScript code that logs things), it would be trivial to add the rest of the user information to the generated web page too. Then the JavaScript we already created to mine the data can just look for these additional pieces of information and log them as well. This approach avoids the need to create another API for this feature. In the end, less is more, right?
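As a rough sketch of the idea, the logging script could copy whatever user fields the page exposes onto the hit before sending it. The field names, the attachUserInfo helper, and the shape of the hit record are all assumptions for illustration; pageData stands in for something like an element's dataset in the browser.

```javascript
// Hypothetical field names; the real page markup and hit schema are not shown here.
const USER_FIELDS = ["userId", "userName", "userEmail"];

// Copy whichever user fields the page exposes onto a copy of the hit record.
// Missing or empty fields are skipped, so anonymous pages still log normally.
function attachUserInfo(hit, pageData) {
  const enriched = { ...hit };
  for (const field of USER_FIELDS) {
    const value = pageData[field];
    if (value !== undefined && value !== "") {
      enriched[field] = value;
    }
  }
  return enriched;
}

// Example: a logged-in page exposes the ID and a name, but no email.
const hit = { url: "/products/42", timestamp: 1700000000000 };
const pageData = { userId: "1234", userName: "Pat", userEmail: "" };
console.log(attachUserInfo(hit, pageData));
```

Because the user fields ride along with every hit, the copy in MongoDB gets refreshed whenever the user visits a page, without a separate call back to the SQL Server.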

We're beginning to think about the UAT (User Affinity Tool) at this point. We still have time before we start working on it, so it would be nice to let our ideas digest over the break before we return for our second consecutive co-op term. We need to think about where the information used in the recommendation formula comes from.

Should we use all hits for that user? This is safe, but it prevents us from using the hits a person had before they registered as a user. What if they never register? Their hits would be ignored.

Should we include all the hits in a shared session (for example with someone using a shared computer)? This would get us lots of information, but it may be less accurate. It would be less personalized.
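To compare the two options, here is a small sketch of what each selection might look like over a list of hits. The hit shape (id, userId, sessionId) and both helper functions are hypothetical; we haven't settled on any of this yet.

```javascript
// Hypothetical hit shape: { id, userId (null when anonymous), sessionId }.
const hits = [
  { id: 1, userId: null, sessionId: "s1" }, // anonymous hit before login
  { id: 2, userId: "u1", sessionId: "s1" },
  { id: 3, userId: "u2", sessionId: "s1" }, // someone else on a shared computer
  { id: 4, userId: "u1", sessionId: "s2" },
  { id: 5, userId: "u2", sessionId: "s3" },
];

// Option 1: only hits explicitly tied to the user.
function hitsByUser(allHits, userId) {
  return allHits.filter((h) => h.userId === userId);
}

// Option 2: every hit in any session the user appeared in.
function hitsBySharedSessions(allHits, userId) {
  const sessions = new Set(hitsByUser(allHits, userId).map((h) => h.sessionId));
  return allHits.filter((h) => sessions.has(h.sessionId));
}

console.log(hitsByUser(hits, "u1").map((h) => h.id)); // [ 2, 4 ]
console.log(hitsBySharedSessions(hits, "u1").map((h) => h.id)); // [ 1, 2, 3, 4 ]
```

The second option picks up the anonymous hit from before login, but it also pulls in the other person's hit from the shared session, which is exactly the accuracy trade-off we're weighing.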

We began thinking about the pros and cons of these approaches, and we'll continue to brainstorm until we begin this work.

Note: This was originally posted on the blog I used for my co-op term while at Seneca College (mswelke.wordpress.com) before being imported here.