Today we checked out the wonders of NoSQL, specifically DynamoDB. As we get closer to finalizing which technology to use for the project, we wanted to investigate the database the client currently uses for their own setup. That's DynamoDB, a NoSQL database-as-a-service that's apparently quick to set up, easy to use, and infinitely scalable. Its scalability comes from the fact that Amazon abstracts away all the details of maintaining a robust database that stores tons of data. We would simply throw data into it.

The problem is... throwing data in and pulling data out seems to be all DynamoDB is capable of. It excels at a few things: its speed and scalability are amazing. But due to its nature, you cannot query a table based on arbitrary attributes without scanning every item in the table and checking it. To do an efficient query you need the primary key (or a secondary index set up ahead of time). This is useless for data analysis. We need something we can ask "what are the articles published between such and such dates that fit into such and such category" and so on. We need something more capable than DynamoDB for the kind of work we're going to be doing. Luckily, Amazon does offer another automatically managed database service... RDS. With RDS we do need to worry about instances and how powerful they should be, and it's billed by the hour rather than by the millisecond, but it's able to do what we need. It can run databases such as PostgreSQL, which at this point we strongly believe we will need.
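To make that "articles between two dates in a category" question concrete, here is a minimal sketch of the kind of query a relational database handles in one statement. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so it runs anywhere; the `articles` table, its columns, the sample rows, and the `articles_in_category` helper are all made up for illustration and are not our actual schema.

```python
import sqlite3

# SQLite (in memory) stands in for PostgreSQL here so the sketch runs anywhere.
# The table, columns, and rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " category TEXT NOT NULL,"
    " published TEXT NOT NULL)"  # ISO 8601 dates, so string order == date order
)
conn.executemany(
    "INSERT INTO articles (title, category, published) VALUES (?, ?, ?)",
    [
        ("Budget passes", "politics", "2014-01-10"),
        ("Local team wins", "sports", "2014-01-15"),
        ("Election recap", "politics", "2014-02-20"),
        ("Mayor resigns", "politics", "2014-03-05"),
    ],
)


def articles_in_category(conn, category, start, end):
    """Return titles in a category published between start and end (inclusive)."""
    rows = conn.execute(
        "SELECT title FROM articles"
        " WHERE category = ? AND published BETWEEN ? AND ?"
        " ORDER BY published",
        (category, start, end),
    )
    return [title for (title,) in rows]


titles = articles_in_category(conn, "politics", "2014-01-01", "2014-02-28")
print(titles)
```

Neither `category` nor `published` is the primary key here, and the database still filters on both in a single query. That is the kind of question we expect to be asking constantly.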

The rest of the week will be spent preparing the plan we'll submit to the client on Thursday, and hopefully we'll get to start building at that point! I've been anxious to start building. So far, it feels like we haven't done anything measurable or useful. But I suppose when it comes to programming, it's best to measure thrice and cut once.

Note: This was originally posted on the blog I used for my co-op term while at Seneca College before being imported here.