For the fourth lab in my SPO600 class, I'll be building software (as opposed to blindly running "sudo apt-get install" or a similar command). I need to pick any piece of GNU software to build, and then also build glibc. An important part of this lab is that I shouldn't install the software, just build it. What's the difference?

Background

When I first began using computers, it was with Windows. To be honest, that's not quite true. My first time using a computer was on my family's old MS-DOS machine, where my father created shortcuts that I could type in to play my video games. But "my" first computer was a Windows machine, and I soon had to learn how to install my own software. I grew up thinking that installing software meant double-clicking the .exe file and clicking "Next" in the install wizard a few times until it was done. I had no idea what was happening behind the scenes. I also thought there was only one way to install software, and that Windows had some proprietary way of absorbing the code and hiding the installed program where I would never see it.

Fast forward to today, and I'm learning how compiled programs actually work. By now I've learned about the preprocessor, compiler, and linker. I now realize that a compiled program is just a binary file that can be run as long as the CPU architecture and operating system support it. Installing software on Linux just means placing those binaries where they are most convenient for you, the user. My first Linux distribution was Ubuntu, so back then I learned how to use the "apt-get" command. Nowadays I use Arch (an amazing distro, in my opinion), so I learned about "pacman", and that these are all just package managers: tools that are really good at moving binaries around, creating symlinks, and putting programs in directories on your PATH so that you can use the software easily.

Building vs. Installing

So how does this relate to my lab? I finally get to see what happens before the "apt-get". Package managers work with finished binaries, and the build step is what turns source code into those binaries, so they can be organized into package repositories or installed by users themselves. To build software is to convert it from source code into binary files. To install software is to put those binary files in place and make any other system changes needed to make the software easy to use. For my lab, I'll just be building.

Wget

I downloaded the source code archive and extracted it. While poking around, I noticed a file called "INSTALL", so I opened it in a text editor and found instructions on how to build and install the software. It said to run "./configure && make && make install". I wanted to know more about these commands before running them, so I searched around and found a blog post that helped me understand each one. I learned that "configure" is, by convention, a script that inspects the user's system and learns about it, so that the build process knows where to find compilers, dependencies, etc. I assumed that after that, the "make" and "make install" steps did the actual build. I ran it and found that it did a bit more than I expected:

learning_how_to_build_software_1

I built it, sure. I also installed it! I did exactly what I was instructed not to do. This newly built wget binary is now the one my system uses by default. I don't think this should be too hard to fix, though. When a command is run, the system searches each directory in PATH, in order, and runs the first matching executable (or symlink to one) that it finds. My original wget is still in its own directory; the new one just sits in a directory that gets searched first. I just need to make my system choose the original one instead. This should do the trick:

learning_how_to_build_software_2

And it does! With the new wget renamed to wget2, my system can't find it when the "wget" command is invoked (by me or any of my installed programs). Instead, my system finds the original one, located in the "/usr/bin" directory.

Next time, I should run only "./configure && make" to build the software. The "make install" step is what actually installs it, copying the built files into system directories (under "/usr/local" in this case) and making any other changes the package needs. That's what happened with my wget build, and I don't want it to happen with glibc. glibc is a crucial system library that many other programs depend on. If I break it, I may need to reinstall my operating system.

Moving on, let's test my wget. I'll use it to download the glibc source code for the next part of my lab:

learning_how_to_build_software_3

It worked! :)

glibc

Now let's repeat this process to build and test glibc. After I extracted the downloaded archive and looked at its contents, it seemed very similar to wget's source code. It also has an "INSTALL" file, but its instructions are more in-depth than wget's. They say to pass the "--prefix" option to "./configure" to tell it where the library should be installed. This is where it differs from wget. With wget, the build produced a single program binary (which relies on a few libraries), so it makes sense to put that binary in a standard spot meant for programs. glibc, on the other hand, is itself a system library, so installing it is much more invasive: it puts libraries exactly where many other programs on the system are instructed to look for them.

By default, glibc's installation prefix is "/usr/local". That's the same directory my wget was installed under, so this seems to be a convention (it's the usual place for software installed by hand rather than by the package manager). However, I want to keep the built files confined to my home directory, so that I can be confident they won't affect my system in any bad way. I'll build glibc in "/home/mwelke/glibc_build".

Before building, I need to make a small change to glibc's source code so that I can prove it really is my build that I'm testing later. My lab instructions suggest introducing a small bug.

The answer to life, the universe, and everything is 42. So I can improve the efficiency of atoi() (declared in stdlib.h) by making it always return 42 instead of wasting its time parsing any other integer from its string argument:

learning_how_to_build_software_4

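Now it's time to build glibc. I create the directory "~/glibc_build", change into it, and run glibc's configure script from there with the appropriate --prefix (glibc can't be built inside its own source directory, so a separate build directory is required anyway):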

learning_how_to_build_software_5

Next, it's time to run "make". It produces a lot of output, and since glibc is a much bigger project than wget, it takes a while. Soon it begins to produce files in my build directory:

learning_how_to_build_software_6

Next, I write a small program to test my change:

learning_how_to_build_software_7

To make this program run against my newly built glibc instead of the system's built-in one, I use the testrun.sh script that the glibc build provides. It runs the program and forces it to load my locally built glibc instead of the system glibc:

learning_how_to_build_software_8

It works! I'm sure the millions of programmers around the world who use glibc would appreciate my change. I should send in a pull request.

Interestingly, this was my second attempt at changing glibc and seeing the result in a compiled test program. My first attempt was to change M_PI from 3.14... to 300.14... in "math.h". However, after I ran "make" and used "testrun.sh" to run my test program against this build, the change didn't show up. So I decided to change a source code file instead, and chose the atoi() function for my second attempt. I'm guessing that header files behave differently from source code files when libraries are dynamically linked. I should investigate this further.

Overall, this has been a fun exercise in learning more about what tons of devs around the world do when they build the software I rely on every day. I'm glad I did it!


This was originally posted on the blog I used for my SPO600 class while studying at Seneca College.