Wednesday, May 15, 2013

A Different Kind Of Lego


     I'm going to talk a bit about Object Oriented Programming, and why it's a great tool for solving problems.

     It is far from the only tool, and there are plenty of circumstances where it isn't appropriate. You'll notice this is a theme in programming. Sometimes, a decent OOP hierarchy is just Too. Damn. Slow. The "Introduction to Data Structures and Algorithms" class at my university is an example of this -- I designed ALL of my projects with SOME semblance of "objects" (I can't call it OOP and look myself in the mirror) -- and the speed of my programs took a noticeable hit. That class was speed-based, so my grade took a hit as well.

     I'm a very big fan of the following mantra:

     "Make it work, make it pretty, make it fast."

     The last two, as mentioned above, are fairly interchangeable, depending on the needs of the program. And your program is useless without the first. Functionality above all, but there's no need to make things harder on yourself when you do the eventual code restructuring for (pretty | speed), which is where the subject of this post comes in.

     Let's talk about abstraction. CS (and CE, let's be honest -- I'll cover it in a bit) is all about abstraction. People throw around the "pillars of OOP", and a common "weed out the ones who didn't pay attention in CS 102" interview question is "Explain (inheritance | polymorphism | encapsulation)" (Source: I was asked it, and the interviewer was surprised that I had an immediate answer. I was surprised other candidates did not).

     I would argue that an understanding of what abstraction /is/, and why we use it, is fundamental for a good programmer.

     See, abstraction lets you modularize your code. Think of it as (*gasp*) turning your code into a Lego model. Each component of your code is an individual Brick. If you're anything like me, you've had those Bricks for 10, 15 years, and your little brother chewed on some of them, so they don't click in as well and the model doesn't go together as easily or as cleanly. Some are even (ick) MegaBloks. Every once in a while you find a K'Nex.
OOP.
     Such is your code when designed in a decent OOP manner. Each object is a Brick. They click together REALLY nicely -- in fact the only reason they exist at all is to click together with each other. Private/public properties exist to force things to click only in certain manners -- if you're using superglue instead of the studs of a Brick, you are Doing It Wrong.
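     To make the stud metaphor concrete, here's a minimal sketch (the class and every name in it are invented for illustration, not pulled from any real codebase): the public methods are the studs, the private data is the plastic you're not supposed to glue to.

    // A toy Brick: callers can only interact through its public "studs".
    #include <iostream>
    #include <string>
    #include <vector>

    class MessageLog {
    public:
        void add(const std::string& line) {       // a stud: the only way in
            lines_.push_back(line);
        }
        void print() const {                       // another stud: the only way out
            for (const std::string& line : lines_)
                std::cout << line << '\n';
        }
    private:
        std::vector<std::string> lines_;           // innards: nothing to superglue to
    };

    int main() {
        MessageLog log;
        log.add("clicked in just fine");
        log.print();
        return 0;
    }

     If MessageLog turns out to be a chewed-on Brick, you can swap in a replacement with the same studs and nothing else in the model has to change.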

     Every once in a while you get a chewed-upon Brick, or a MegaBlok, or a K'Nex. OOP allows you to say "this isn't the end of the world, I'll just get a new Brick," and be on your merry way.

     Or you could have no modularity, no abstraction whatsoever to your code. You have a superglued conglomerate mess, and if you have a bug (hint: you do. You might not know it yet, but you do) then it will be HELLA difficult to detect (see: Knowing You're a Tool), and even worse, HELLA difficult to fix. See, you have to bash open your scale model of the Statue of Liberty with a hammer, pull out the broken section, patch together a fix with duct tape and prayer (we call this a "hack", and hacks are frowned upon. Hacking the Matrix is cool. Hacking your codebase is not.), throw it in there, and superglue it all up together again.

Not OOP.
     Everything in CS/CE has abstraction. Consider: transistors->logic gates->combinational/sequential circuits->basic mathematical circuits (timers, combinators, ALUs, etc.)->....->processor. That's CE for you, and the idea is you can make something VERY complicated VERY elegantly out of VERY tiny components.
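     If you want to see that same layering without a soldering iron, here's a toy sketch (every function here is invented for illustration): one "transistor-level" primitive, logic gates built from it, and a small combinational circuit built from the gates. Each layer only needs to know about the layer directly beneath it.

    #include <iostream>

    // The lone primitive -- pretend this is the transistor level.
    bool NAND(bool a, bool b) { return !(a && b); }

    // Gates, built purely out of NAND.
    bool NOT(bool a)         { return NAND(a, a); }
    bool AND(bool a, bool b) { return NOT(NAND(a, b)); }
    bool OR(bool a, bool b)  { return NAND(NOT(a), NOT(b)); }
    bool XOR(bool a, bool b) { return AND(OR(a, b), NAND(a, b)); }

    // A combinational circuit, built purely out of gates: a one-bit half adder.
    void halfAdder(bool a, bool b, bool& sum, bool& carry) {
        sum   = XOR(a, b);
        carry = AND(a, b);
    }

    int main() {
        bool sum = false, carry = false;
        halfAdder(true, true, sum, carry);
        std::cout << "1 + 1 = sum " << sum << ", carry " << carry << '\n';  // sum 0, carry 1
        return 0;
    }

     Stack enough of those layers and eventually you have a processor -- and at no point did any layer need to care how the layers below it were built.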

     CS is no different, and that elegance, along with knowing how to apply it, matters more than I feel is being taught. Industry thrives on it, especially in projects being developed simultaneously by non-trivially large teams. I talked about Revision Control before. Well, consider: which do you think is easier, 10 programmers all working on the same 4 files that collectively do something very large and complicated, or each programmer "owning" their own task and its related set of files? You trade off having a small number of files, but who cares? If you throw a 1.5 KLOC file at me, my eyes will glaze over anyway. And which do you think Git plays nicer with (if you know how to merge nicely, this is admittedly a non-issue)?

    Each programmer owns* their own Brick. They can click it into their team's model, and have it pulled out and replaced if need be -- say, when it's discovered they're trying to use a MegaBlok where the team really needs a Lego.

    It works, and it's pretty. It might not be the fastest, but that's okay. I'll willingly make that trade, and so will every part of industry I've experienced to date.

    That's not even getting into the benefits of "code contracts" and RAII (though I intend to). This is purely based on "look how easy it is to refactor your code and find/fix bugs".
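    Just as a taste of the RAII side of that (a minimal sketch with hypothetical names, well ahead of the fuller post): the constructor grabs a resource, the destructor releases it, and the cleanup Brick clicks itself out automatically, even when an exception goes flying past.

    #include <cstdio>
    #include <stdexcept>

    // Illustrative wrapper only: acquire in the constructor, release in the destructor.
    class File {
    public:
        explicit File(const char* path) : handle_(std::fopen(path, "w")) {
            if (!handle_) throw std::runtime_error("could not open file");
        }
        ~File() { std::fclose(handle_); }                  // always runs on scope exit
        void write(const char* text) { std::fputs(text, handle_); }
    private:
        std::FILE* handle_;
        File(const File&);                                 // non-copyable
        File& operator=(const File&);
    };

    int main() {
        try {
            File log("out.txt");
            log.write("hello\n");
            // even if something throws here, ~File() still closes the handle
        } catch (const std::exception& e) {
            std::fprintf(stderr, "%s\n", e.what());
        }
        return 0;
    }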

    And here's a little secret: even the code I wrote that was supposed to be blazing fast (the NDIS driver) had abstraction, had some semblance of object-orientedness.

    So for the love of all that is holy, please stop saying "I have the following, and it works for 7 out of 12 test cases, but I can't find the bug..." and then showing me a 200-line function that has seven or eight STL containers all being indexed into simultaneously. I will want to strangle you.

-----------

*The concept of ownership is a point of some debate among senior developers. If you "own" some code, and that code is wrong, should you really keep owning it? Or should they take it away from you?

For Friday: (Introduction to) Introduction to Object Oriented Programming Week, Part 2

Saturday, May 11, 2013

Git 101


A couple of corners of the internet that I frequent have been asking, "What's Git, and why should I care?" This is a repost of my answer to that.

It doesn't really follow my preferred format for my blog, as it is a tutorial instead of a dialogue, but that's alright. My overall goal is the sharing of information, so share I shall.



Git is revision control. That means if you mess up, rather than having to remember what your old code looked like and did, you can just revert back to a previous state.

Here's something fairly important to note:

This is NOT Revision Control. This is a disaster. I know this because I did it this way, once upon a time.


Git allows for remote hosting on sites like Bitbucket, Github, and even Google Code. That way, you can pull your code down from a remote location on computers, edit it, and push it back up. It's essentially a cloud of data storage, like Dropbox, but smarter for code.

This IS Revision Control. It is noticeably easier to navigate, and in general, use.


~~THE BASICS~~
If in Windows: Download Git Bash, because I like me a command prompt, and it teaches you what's going on a bit better.
If in Linux or OS X: there are handy-dandy package-manager installs -- "sudo apt-get install git" on Ubuntu, if I recall correctly.
Cool, we now have a Command Prompt for Git. It uses Unix-based commands, like "ls" and "cd" and "mkdir" and the like, so you'll notice the Windows Git Bash shell is really a Unix-style shell (an MSYS/Cygwin relative). Not a coincidence.
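One quick piece of housekeeping I'd suggest before your first commit (assuming a fresh install -- the name and email below are obviously placeholders for your own): tell Git who you are, so your commits get labeled properly.
 $ git config --global user.name "Your Name"
 $ git config --global user.email "you@example.com"
You only have to do this once per computer.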
~~LINK TO REMOTE REPOSITORIES~~
So now, let's assume you have some code that you want to start a revision control system on.
We'll call it hello_world.cpp, in a folder called SENIOR_DESIGN_PROJECT.
On your remote hosting site (Bitbucket has unlimited free repositories, or folders-of-code, and Github has something similar for students IIRC), you can click the big MAKE NEW REPOSITORY button, and it will create a new empty remote repository.
THEN.
You can run a command (in your respective Terminal, be it Terminal in Linux/OSX, or Git Bash in Windows) on your code folder, as prompted by the $ symbol:
 $ git init
to initialize that folder to be compatible with Git. You'll have to do this once per project.
Next up, link that folder to your remote repository by telling it where EXACTLY to push things back up to the cloud (as discussed earlier).
 $ git remote add origin http://some.github.repository/SENIOR_DESIGN_PROJ.git
What this says is "this local codebase is linked to a remote codebase, which we'll call 'origin', found on this git repository at this address."
Follow it up with:
$ git add .
$ git commit -m "Initial Commit."
$ git push -u origin master
to say "push this up to the Master branch on origin, and (thanks to the -u flag) remember that pairing for future pushes and pulls". For simplicity's sake, we'll only care about the Master branch.
~~WHAT DID I JUST DO~~
Let's run through the commands you just used one by one real quick.
 $ git add .
Adds all NEW/CHANGED files (i.e. hello_world.cpp in this case) to a pending STAGING area.
 $ git commit -m "message" 
Commits that staging area, locking it down as some kind of feature or bugfix or whatever MESSAGE is. This is known as a "commit".
 $ git push
Actually pushes this code up to your remote repo. Notice I purposely left out all the extra flags -- you only really need those for your FIRST ever push for this project!
You can see which files you've changed since your last commit with
 $ git status
And pull down changes from other users/computers with
 $ git pull
Now let's consider a different computer.
On a different computer (we'll call it C2), if you run:
 $ git clone http://some.github.repository/SENIOR_DESIGN_PROJ.git
You may be prompted for your password (if the repository is private); then your computer will go off and grab the most recent code and store it in a folder called SENIOR_DESIGN_PROJ.
You make some changes to the source code. Fix a bug, whatever.
SO:
 $ git add .
 $ git commit -m "I fixed that stupid memory leak."
 $ git push
And back on C1:
 $ git pull
Congrats, you just pulled back down the bugfix that you wrote on an entirely different computer! This is how teams are able to work on a codebase simultaneously -- they all commit, and merge it all together in branches. In our case, we, again, only care about the Master branch.
~~I DONE GOOFED~~
Or, you make some changes, decide these were bad changes, and you want to roll back:
 $ git checkout <filename>
discards your local changes to a single file, restoring it to the version from your last commit.
 $ git reset --hard HEAD
resets your entire working copy back to your most recent commit, throwing away any uncommitted changes.
 $ git checkout commit-number
checks out your entire codebase as it was at an arbitrary commit number (you end up in a "detached HEAD" state, but for digging an old version back out, that's fine). This is VERY handy if you mess up your codebase, but an earlier version worked.
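If you're wondering where those commit numbers come from, you don't have to go digging through the remote site -- you can list them right in your terminal:
 $ git log --oneline
Each line starts with the (abbreviated) commit number, followed by that commit's message.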

TL;DR
Set up git repo on a project folder:
 $ git init
Set up a link to a remote repo:
 $ git remote add origin http://blahblahblah
Do an initial commit and push:
 $ git add .
 $ git commit -m "Initial commit"
 $ git push -u origin master
Make some changes, see what they were:
 $ git status
Pull to make sure there aren't conflicting changes sitting in the remote repository (i.e., a team member fixed it while you were working on it; you don't want to clobber their fix, so reset your own copy and use theirs):
 $ git pull
 $ git checkout hello_world_teammate_revised.cpp
Make some better changes, and push them back up:
 $ git add .
 $ git commit -m "I fixed my teammate's bug!"
 $ git push
That's basically the basics! With this you can store all your code remotely, navigate the remote location to find commit numbers, roll back code as need be (either single files, or ENTIRE groups of files to previously working commits), and work on a project with a team!

Next Week: (Introduction to) Introduction to Object Oriented Programming Week, Part 1: A Different Kind Of Lego

Friday, May 10, 2013

Beginnings

    I am an interesting enigma in the world of CS. That's Computer Science, not Counter-Strike, though I'm game for a round if you are (Source, not 1.6. Sorry, old-school gamers...and new-school, I guess, with GO being out now. Way to alienate right away, self.)

    I have been programming "professionally" for four years now, without actually having held a true, contracted, full-time position. Only in the summers (that's not entirely true; I elected to do my first programming job concurrently with school). Many of my peers only have one internship under their belt before they go into a full-time job. Maybe even two. I've done C-based NDIS driver development, web development on the ASP.NET MVC platform, and most recently going on two summers' worth of C# MVP development. That's a lot of internship, hence the title of this blog. It's also a lot of school, as I am going for two degrees -- both Computer Science AND Computer Engineering. Hence the driver dev, and being an enigma.

    There is no shortage of blogs out there that will tell you why your thought patterns are wrong. I'm actually quite a fan of them, and I'll get into why in just a moment (seems odd, right?). Even though I'm an old-as-dirt intern, I feel I can learn a lot (which is appropriate. I may be as old as dirt, but I am still just an intern). A lot of the blogs I've been reading are from senior developers (and when I say senior, I mean senior, as in worked for Microsoft for so long that they left to start their own company), and they tend to know what they're doing. They're really good to learn from. I feel I can be good to learn from too -- I am not so presumptuous as to say "your methodology is flawed" (or at least, I hope I'm not), but I do feel I can bring something to the table: a unique look from someone who has experienced both quite a bit of college and quite a bit of industry, and can help bridge the gap between them.

    I will make an effort to update this fairly often -- it will serve as a good exercise in commitment for me, and hopefully a pleasant distraction from elements of my life, as hobbies are intended to be. Perhaps two or three times a week; that seems like a reasonable goal.

    As for now: hello, Internet. Let's see how this goes.

Next: Git101