Today we went live on more areas of the site. We're going to ramp up slowly and make sure our MongoDB database can handle all the incoming data. We're currently receiving about 50-100 hits per hour (where a hit is one visitor viewing one web page), and it's coping fine. We expect it to cope with much, much more, thanks to the design decisions we made (like using WebSockets instead of AJAX to send user behavior to our server).

I made a performance monitoring script that runs every 15 minutes on my work computer. It runs a simple test query that scans every hit in the hit collection. I want to know how many milliseconds that query takes, and I want to chart that against the number of hits in the collection, so that we'll know the performance of our setup as a function of the number of hits it needs to scan. So far, it's taking about 2-4 ms per query, and I suspect most of that is the round trip to AWS and back, with only a tiny fraction spent executing the MongoDB query itself. We'll have to see how that changes as time goes on. Right now the charts I can make in LibreOffice from the data I have look pretty, but they're pretty meaningless.
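To make the idea concrete, here is a minimal sketch of that kind of timing script, not the actual one. It assumes the query is wrapped in a function that returns how many hits it scanned. The names `time_query_ms`, `append_sample`, and `fake_full_scan` are made up for illustration, and the stand-in query just walks an in-memory list so the example runs without a database.

```python
import csv
import time
from pathlib import Path
from typing import Callable


def time_query_ms(run_query: Callable[[], int]) -> tuple[int, float]:
    """Run the query once and return (hits scanned, elapsed milliseconds)."""
    start = time.perf_counter()
    hit_count = run_query()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return hit_count, elapsed_ms


def append_sample(log_path: Path, hit_count: int, elapsed_ms: float) -> None:
    """Append one sample to a CSV file that a spreadsheet can chart later."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            # Write the header only when the file is first created.
            writer.writerow(["timestamp", "hit_count", "elapsed_ms"])
        writer.writerow([int(time.time()), hit_count, f"{elapsed_ms:.3f}"])


def fake_full_scan() -> int:
    # Hypothetical stand-in for the real query. In the real setup this
    # function would call the MongoDB driver and touch every hit document.
    hits = [{"page": f"/article/{i}"} for i in range(1000)]
    return sum(1 for _ in hits)


if __name__ == "__main__":
    count, ms = time_query_ms(fake_full_scan)
    print(f"scanned {count} hits in {ms:.3f} ms")
```

Each scheduled run would call `time_query_ms` and then `append_sample`, so the CSV ends up with one row per run. Plotting `elapsed_ms` against `hit_count` gives the chart described above. Note that a timing taken on the client includes the network round trip, so it measures the whole setup rather than the database alone.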


I want to know how well this thing holds up when there are millions of hits in the collection. I also want to make more thorough tests that show how it performs on the complicated queries we actually need, the kind that hold intermediate results in memory on the web server and then go back to the MongoDB server for another query before they're done. We know from experience that those take far longer than 4 ms, and I think tools like this script will help us understand how the performance of our database degrades as it scales.

I also began working on new information we need to gather about user behavior during a visit. We need to know whether visitors click on the social media sharing buttons, to get a better idea of how interested they are in the article they're reading. Right now we track things like clicks on links, but we don't yet distinguish which links and buttons are being clicked. I started analyzing the markup produced by their CMS to see if those buttons had useful CSS classes (easy for us to target) and they did! So it shouldn't be too hard to add that to what we're already collecting.
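As a rough illustration of that markup check, the sketch below scans a page for links and buttons whose CSS classes look share-related. Everything specific in it is an assumption: the keyword list, the sample markup, and the class names are invented, since the real CMS output isn't shown here.

```python
from html.parser import HTMLParser

# Hypothetical keywords; the real CMS class names may differ.
SHARE_KEYWORDS = ("share", "facebook", "twitter")


class ShareButtonFinder(HTMLParser):
    """Collect (tag, css_class) pairs for share-looking links and buttons."""

    def __init__(self) -> None:
        super().__init__()
        self.matches: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "button"):
            return
        # A missing or valueless class attribute is treated as empty.
        class_attr = dict(attrs).get("class") or ""
        for css_class in class_attr.split():
            if any(k in css_class.lower() for k in SHARE_KEYWORDS):
                self.matches.append((tag, css_class))


def find_share_classes(markup: str) -> list[tuple[str, str]]:
    finder = ShareButtonFinder()
    finder.feed(markup)
    finder.close()
    return finder.matches


# Invented sample markup, only for demonstration.
SAMPLE = """
<div class="article-tools">
  <a class="btn share-facebook" href="#">Share</a>
  <button class="btn share-twitter">Tweet</button>
  <a class="read-more" href="/next">Read more</a>
</div>
"""

if __name__ == "__main__":
    for tag, css_class in find_share_classes(SAMPLE):
        print(tag, css_class)
```

Once the classes are known, the tracking code running in the browser could tag a click as a share click by checking for those same classes on the clicked element.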

Note: This was originally posted on the blog I used for my co-op term while at Seneca College before being imported here.