Wednesday, July 3, 2013

"So...funny story..."

     First off, it has been a while, and I should apologize for that. Writing in a format that doesn't offend is tricky, and conveying anything of importance is even more so. On top of that, my personal life has gotten in the way, which actually leads to this post: a hopefully enjoyable deviation from attempting to teach things about programming.

     Today's story comes to you courtesy of my first-ever programming job, now four years ago. I was a sophomore in college, with basic experience in pointers and building linked lists in C++. I was given a unique opportunity to program a software-defined radio (from here on known as an "SDR" -- basically, a very expensive WiFi card). Doing so required implementing a Windows NDIS driver, which as you can probably guess meant I was a little in over my head, with a bit of a learning curve to match.

     SDRs serve to reduce the cost of R&D for a new protocol. Developers have a choice: build unique hardware from the ground up, plug in some code, and hope that everything works easy-peasy, or use one of these things, which have a high initial cost but are purpose-built to run any stack of code. They handle their own ADC/DAC, and just need a "front-end" (and antenna) to transmit on a given spectrum. For our purposes, I was transmitting a modified 802.11b protocol, so I had to use the 2.4GHz spectrum.

     Again: I was a newbie. Readers might be quick to point out "wait, you're still pretty young". Yes, but I have experience now, and have learned from my mistakes. Such as this one.

     Seeing as the aforementioned 802.11b protocol was our team's own creation (focusing on cooperative networking, in case anyone is curious), a lot of implementation decisions were made on the fly. Design wasn't really a factor -- I could make linked lists, and I tried to separate my code blocks as best I could, but my functions were embarrassingly long. That's neither here nor there, but better design probably would have helped with debugging (see: previous post).

     Onto the subject of mistakes and bugs: when you are running an executable within a command prompt, and it segfaults, it exits "gracefully" -- alerting the user with a printed message to the screen, and killing the process, restoring control of that prompt to the user.

     Seeing as this driver was being developed, learned about, and decided upon simultaneously, there were numerous times when I would have a null dereference. For those unaware, that's where you dereference a pointer -- attempting to access a member of whatever object it points to -- but the pointer was NULL to begin with. You access memory you shouldn't, and the computer yells at you -- AKA a "segfault".

     What do you think a computer does when a null dereference/segfault occurs in kernel-mode code, stuff that is supposed to be super-efficient and super-careful and super-accurate?

Bad Things happen.
     In the TRANSMIT stage of our protocol, this was an inconvenience. One computer goes down while I reboot, recover as much of the logs as possible (only towards the end of the job did I get adept at using WinDbg + kernel-mode debugging in XP), and try to figure out which printf() call was showing something that it shouldn't.

     In the RECEIVE stage, it's an entirely different beast. My (malformed) packet has just taken down every SDR-equipped computer in the lab (aside from the source node). One null dereference, one segfault, and my lab is painted blue.

Honestly, it looked cool as hell.

Segfaults are bad, mkay?

1) Oh, and if you have a really, really bad day, the blue-screen might be a hint that you've just corrupted your Windows install. Very shortly after such experiences, I taught myself how to use Clonezilla, so I'd have a backup partition to restore from easily.

2) When we got these SDRs, each setup came in three parts: the SDR, a chip to support the front-end, and the front-end itself. These cost about $4000, total, per SDR setup. As another horror story, the instant we took two of them out of the box they came in, we heard pops and could smell a bit of smoke. A fuse blew, instantaneously. Fortunately, between a $0.10 fuse from Digikey and someone who was far more adept at PCB reading, voltage-drop measurements, and soldering than I, we salvaged those things.

Wednesday, May 15, 2013

A Different Kind Of Lego

     I'm going to talk a bit about Object Oriented Programming, and why it's a great tool for you to use to solve a problem.

     It is far from the only tool, and there are plenty of circumstances where it isn't appropriate. You'll notice this is a theme in programming. Sometimes, a decent OOP hierarchy is just Too. Damn. Slow. The "Introduction to Data Structures and Algorithms" class at my university is an example of this -- I designed ALL of my projects with SOME semblance of "objects" (I can't call it OOP and look myself in the mirror) -- and the speed of my program took a noticeable hit. That class was speed-based, so my grade took a hit as well.

     I'm a very big fan of the following mantra:

     "Make it work, make it pretty, make it fast."

     The last two, as mentioned above, are fairly interchangeable, based on the needs of the program. And your program is useless without the first. Functionality above all, but there's no need to make things harder on yourself when you do the eventual code restructuring for (pretty | speed), which is where the subject of this post comes in.

     Let's talk about abstraction. CS (and CE, let's be honest -- I'll cover it in a bit) is all about abstraction. People throw around the "pillars of OOP", and a common "weed out the ones who didn't pay attention in CS 102" interview question is "Explain (inheritance | polymorphism | encapsulation)" (Source: I was asked it, and the interviewer was surprised that I had an immediate answer. I was surprised other candidates did not).

     I would argue that an understanding of what abstraction /is/, and why we use it, is fundamental for a good programmer.

     See, abstraction lets you modularize your code. Think of it as (*gasp*) turning your code into a Lego model. Each component of your code is an individual Brick. If you're anything like me, you've had those Bricks for 10, 15 years, and your little brother chewed on some of them, so they don't click in as well -- the model doesn't go together easily, or at all. Some are even (ick) MegaBloks. Every once in a while you find a K'nex.
     Such is your code when designed in a decent OOP manner. Each object is a Brick. They click together REALLY nicely -- in fact the only reason they exist at all is to click together with each other. Private/public properties exist to force things to click only in certain manners -- if you're using superglue instead of the studs of a Brick, you are Doing It Wrong.

     Every once in a while you get a chewed-upon Brick, or a MegaBlok, or a K'Nex. OOP allows you to say "this isn't the end of the world, I'll just get a new Brick," and be on your merry way.

     Or you could have no modularity, no abstraction whatsoever to your code. You have a superglued conglomerate mess, and if you have a bug (hint: you do. You might not know it yet, but you do.) then it will be HELLA difficult to detect (see: Knowing You're a Tool), and even worse, HELLA difficult to fix. See, you have to bash open your scale model of the Statue of Liberty with a hammer, pull out the broken section, patch together a fix with duct tape and prayer (we call this a "hack", and they're frowned upon. Hacking the Matrix is cool. Hacking your codebase is not.), throw it in there, and superglue it all back together again.

Not OOP.
     Everything in CS/CE has abstraction. Consider: transistors -> logic gates -> combinational/sequential circuits -> basic mathematical circuits (timers, counters, ALUs, etc.) -> ... -> processor. That's CE for you, and the idea is you can make something VERY complicated VERY elegantly out of VERY tiny components.

     CS is no different, and that elegance, alongside knowing how to apply it, is more key than I feel is being taught. Industry thrives on it, especially in projects being developed simultaneously by non-trivially large teams. I talked about Revision Control before. Well, consider: which do you think is easier -- 10 programmers all working on the same 4 files that collectively do something very large and complicated, or each programmer "owning" their own task and related set of files? You trade a handful of files for many, but who cares? If you throw a 1.5 KLOC file at me, my eyes will glaze over anyway. Which do you think Git plays nicer with (if you know how to merge nicely, this is admittedly a non-issue)?

    Each programmer owns* their own Brick. They can click into their team, and have it pulled out and replaced if need be, when it's discovered you are trying to use a MegaBlok when the team really needs a Lego.

    It works, and it's pretty. It might not be the fastest, but that's okay. I'll willingly make that trade, and so will every part of industry I've experienced to date.

    That's not even getting into the discussions of the benefits of "code contracts" and RAII (Though I intend to). This is purely based on "look how easy it is to refactor your code and find/fix bugs".

    And here's a little secret: even the code I wrote that was supposed to be blazing fast (the NDIS driver) had abstraction, had some semblance of object-orientedness.

    So for the love of all that is holy, please stop saying "I have the following, and it works for 7 out of 12 test cases, but I can't find the bug..." and then showing me a 200-line function with seven or eight STL containers all being indexed into simultaneously. I will want to strangle you.


*The concept of ownership is under some debate from senior developers. If you "own" some code, and that code is wrong, should you really own it? Or should they take it away from you?

For Friday: (Introduction to) Introduction to Object Oriented Programming Week, Part 2

Saturday, May 11, 2013

Git 101

A couple corners of the internet that I've been on have been asking "What's Git, and why should I care?" This is a repost of my answer to that.

It doesn't really follow my preferred format for my blog, as it is a tutorial instead of a dialogue, but that's alright. My overall goal is the sharing of information, so share I shall.

Git is revision control. That means if you mess up, rather than having to remember what your old code looked like and did, you can just revert back to a previous state.

Here's something fairly important to note:

This is NOT Revision Control. This is a disaster. I know this because I did it this way, once upon a time.

Git allows for remote hosting on sites like Bitbucket, Github, and even Google Code. That way, you can pull your code down from a remote location on computers, edit it, and push it back up. It's essentially a cloud of data storage, like Dropbox, but smarter for code.

This IS Revision Control. It is noticeably easier to navigate, and in general, use.

If in Windows: Download Git Bash, because I like me a command prompt, and it teaches you what's going on a bit better.
If in Linux or OS X: Git is in your package manager -- "sudo apt-get install git" on Debian/Ubuntu, for example.
Cool, we now have a Command Prompt for Git. It uses Unix-based commands, like "ls" and "cd" and "mkdir" and the like; you'll notice the Windows Git Bash shell is really an MSYS shell (a Cygwin derivative). Not a coincidence.
So now, let's assume you have some code that you want to start a revision control system on.
We'll call it hello_world.cpp, in a folder called SENIOR_DESIGN_PROJECT.
On your remote hosting site (Bitbucket has unlimited free repositories, or folders-of-code, and Github has something similar for students IIRC), you can click the big MAKE NEW REPOSITORY button, and it will create a new empty remote repository.
You can run a command (in your respective Terminal, be it Terminal in Linux/OSX, or Git Bash in Windows) on your code folder, as prompted by the $ symbol:
 $ git init
to initialize that folder to be compatible with Git. You'll have to do this once per project.
Next up, link that folder to your remote repository by telling it where EXACTLY to push things back up to the cloud (as discussed earlier).
 $ git remote add origin http://some.github.repository/SENIOR_DESIGN_PROJ.git
What this says is "this local codebase is linked to a remote codebase, which we'll call 'origin', found on this git repository at this address."
Follow it up with:
$ git add .
$ git commit -m "Initial Commit."
$ git push -u origin master
to say "push this branch up to origin's master branch, and (that's the -u) remember that link for all future pushes". For simplicity's sake, we'll only care about the Master branch.
Let's run through the commands you just used one by one real quick.
 $ git add .
Adds all NEW/CHANGED files (i.e. hello_world.cpp in this case) to a pending STAGING area.
 $ git commit -m "message" 
Commits that staging area, locking it down as some kind of feature or bugfix or whatever MESSAGE is. This is known as a "commit".
 $ git push
Actually pushes this code up to your remote repo. Notice I purposely left out all the extra flags -- you only really need those for your FIRST ever push for this project!
You can see which files you've changed since your last commit with
 $ git status
And pull down changes from other users/computers with
 $ git pull
Now let's consider a different computer.
On a different computer (we'll call it C2), if you run:
 $ git clone http://some.github.repository/SENIOR_DESIGN_PROJ.git
You'll be prompted for your password, then your computer will go off and grab the most recent code and store it in a folder called SENIOR_DESIGN_PROJ.
You make some changes to the source code. Fix a bug, whatever.
 $ git add .
 $ git commit -m "I fixed that stupid memory leak."
 $ git push
And back on C1:
 $ git pull
Congrats, you just pulled back down the bugfix that you wrote on an entirely different computer! This is how teams are able to work on a codebase simultaneously -- they all commit and merge it all into branches. In our case, we, again, only care about the Master branch.
Or, you make some changes, decide these were bad changes, and you want to roll back:
 $ git checkout <filename>
throws away your uncommitted changes to a single file, restoring it to the last committed version. (To instead un-stage a file you've already added, use "git reset <filename>".)
 $ git reset --hard HEAD
throws away ALL uncommitted changes, resetting your entire codebase back to the last commit.
 $ git checkout commit-number
checks out your entire codebase as it was at an arbitrary commit-number (you can find these numbers, or "hashes", in your remote host's commit history). This is VERY handy if you mess up your code base, but an earlier version worked.

Set up git repo on a project folder:
 $ git init
Set up a link to a remote repo:
 $ git remote add origin http://blahblahblah
Do an initial commit and push:
 $ git add .
 $ git commit -m "Initial commit"
 $ git push -u origin master
Make some changes, see what they were:
 $ git status
Pull to make sure you didn't have conflicting changes waiting in the remote repository (i.e. a team member fixed something while you were working on it; rather than mess up their fix, reset your own and use theirs):
 $ git pull
 $ git checkout hello_world_teammate_revised.cpp
Make some better changes, and push them back up:
 $ git add .
 $ git commit -m "I fixed teammate's bug!"
 $ git push
That's basically the basics! With this you can store all your code remotely, navigate the remote location to find commit numbers, roll back code as need be (either single files, or ENTIRE groups of files to previously working commits), and work on a project with a team!

Next Week: (Introduction to) Introduction to Object Oriented Programming Week, Part 1: A Different Kind Of Lego

Friday, May 10, 2013


    I am an interesting enigma in the world of CS. That's Computer Science, not Counter-Strike, though I'm game for a round if you are (Source, not 1.6. Sorry, old-school gamers...and new-school, I guess, with GO being out now. Way to alienate right away, self.)

    I have been programming "professionally" for four years now, without actually having held a true, contracted, full-time position -- only in the summers (that's not entirely true; at my first programming job, I elected to work concurrently with school). Many of my peers have only one internship under their belt, maybe two, before they go into a full-time job. I've done C-based NDIS driver development, web development on the ASP.NET MVC platform, and most recently going on two summers' worth of C# MVP development. That's a lot of internship, hence the title of this blog. It's also a lot of school, as I am going for two degrees -- both Computer Science AND Computer Engineering. Hence the driver dev, and being an enigma.

    There is no shortage of blogs out there that will tell you why your thought patterns are wrong. I'm actually quite a fan of them, and I'll get into why in just a moment (seems odd, right?). Even though I'm an old-as-dirt intern, I feel I can learn a lot (which is appropriate. I may be as old as dirt, but I am still just an intern). A lot of the blogs I've been reading are from senior developers (and when I say senior, I mean senior, as in worked for Microsoft for so long that they left to start their own company), and they tend to know what they're doing. They're really good to learn from. I feel I can be good to learn from too -- I am not so presumptuous as to say "your methodology is flawed" (or at least, I hope I'm not), but I do feel I can bring something to the table: a unique look from someone who has experienced both quite a bit of college and quite a bit of industry, and can help bridge the gap between them.

    I will make an effort to update this fairly often -- it will serve as a good exercise of commitment for me, and hopefully a pleasant distraction from elements of my life, as hobbies are intended to do. Perhaps two or three times a week, that seems like a reasonable goal.

    As for now: hello, Internet. Let's see how this goes.

Next: Git101