Now that my error logging is finished and can display the errors that all three parts of our system (the Push API, the Get API, and the browser code) encounter and log, I'm moving on to researching how to go about open sourcing the work we've done. Up until now, we've only been coding according to what our industry partner wants and how we like to code. I've seen lots of open source projects on GitHub, and I've noticed some differences from our code. They're heavy on documentation, and they tend to be broken into small modules with simple APIs, designed so that anyone in the world can easily dive in and start using them. I feel like we'll have a lot of work to do to get the system we've created ready to be published as open source software.

In my research, I encountered a talk given by Andrey Petrov, the creator of urllib3, a low-level Python HTTP library. He began by explaining the need for the code he created (in 2008, there was no thread-safe way to do concurrent HTTP in Python), which is why he created the library, and he wanted to share it with the world. Sounds good so far... this is exactly what I imagined open sourcing your work would be like. But then he explained more. He told the story of how he got to the point of making a library to do this. He didn't just wake up one morning and decide, "I'm going to make something and then share it with the world." His solution didn't come from nowhere. It started with a problem.

His problem was that the company he worked at needed to interact with AWS S3 from Python (back in 2008), and he needed a tool to do this efficiently. All that existed at the time was synchronous, and he jokes that it would have taken them three weeks to download all of their files from S3. So he ended up creating a new S3 client, and he decided he was going to open source that. But to do that, he first needed to implement concurrent HTTP functionality, so he set out to create that tool and decided he would open source it too. He ended up recursively creating open source libraries, and the most deeply nested one, this concurrent, multithreaded Python HTTP library, was the one that struck gold in terms of developer interest. Today, it has almost 1,000 GitHub stars. It is a dependency of Requests, a higher-level Python HTTP library with over 10,000 GitHub stars, and of pip, Python's package manager. And this crucial piece of open source software only came about to solve a completely unrelated problem a developer had while working at a private company.

I think what I learned from watching him describe the history behind his open source libraries is that open sourcing is not just setting your GitHub repository to public and calling it a day. You logically deconstruct your code into modules that can stand on their own and be useful for a wide variety of use cases. If we think about the work we've done, we should be able to identify "milestones" where we had created something that could stand on its own. The code we had at each milestone may be where we draw the line between modules and encapsulate that completed work into a library. It might also be a good idea for us to actively think about this process as we develop, so that we identify these milestones as we code rather than having to go back and refactor later.

One concern I have is that we used off-the-shelf open source tools, and our work was more about assembling them into something unique than coding a unique feature. But perhaps it's the particular arrangement of those technologies that makes our system unique, and that could still be worth publishing.

When we do begin going back over what we've done, a blog post by a developer at Basecamp who created an open source Android library details steps to follow to make sure you've created something useful: refactoring, creating a public API (and hiding internals behind a private API), documenting, and so on. It's good to know there are resources out there to help us work through this process.

Finally, my research included studying the different open source licences available. The GPL (GNU General Public License) is often called "copyleft" because it uses copyright law to keep software free rather than to restrict it. If you distribute a modified version of GPL code, your changes must also be released under the GPL. Even if you only use GPL code, for example by linking to a GPL library at run-time, the Free Software Foundation's position is that you must release your own code under the GPL when you distribute it. The GPL has often been called "viral" for this reason, and many companies shy away from it because of this restriction. The Linux kernel and many Linux applications use it. It's closely associated with the expression "free as in freedom": once something is released under the GPL, it stays free forever, even as it evolves.

Other licences are friendlier to commercial use. The MIT License allows commercial use as long as the original copyright notice (author attribution) is kept, and it says nothing explicit about patents. The Apache License also allows commercial use, but anyone who contributes code grants users a licence to any of their patents that the contribution covers. These "permissive" licences are more common among companies: businesses that use open source software tend to seek out projects under these licences rather than the GPL, and when they release their own software, they may choose one of these licences so they can share it with other businesses. There's a modern movement to have businesses both use more open source software and contribute more back to the open source community, and permissive licences help make that possible.

We will have to carefully consider which licence to use when we open source our system, and we will have to examine the software we used and check which licences it was released under. This may restrict which licences we're even allowed to use when we release it.

Note: This was originally posted on the blog I used for my co-op term at Seneca College before being imported here.