getting git

Git is wonderful. In the same way that Unix is an ‘operating system done right’, git is ‘source control done right’.

It has a surprising history, being basically the creation of Linus Torvalds, who decided that all the available source control solutions were useless, and so just wrote his own. That is fabulously cool.

This is a small article to introduce you to git.

Learning Curve

There definitely is a learning curve with git. At least that was the case for me.

For starters, I had to throw away some concepts which I thought were set in stone. Git forces you to think about code in a different way. It turns out that that way of thinking has large advantages.

The single biggest mental stumbling block for me was that I always considered the real code, the precious bit, to be the one I’m currently working on, and the code that is in the source control system to be a backup. A byproduct, so to speak. You need to completely overturn this with git.

The real stuff, the precious stuff, is what is in your repository. What you are currently working with, your working directory, that’s something like a scrapbook. You’re trying things out, making changes here and there, but it’s all very temporary and it’s not where the real stuff is. It shouldn’t be frightening to overwrite your working directory, for instance.

Repository

Let me backtrack a little and introduce the three areas in a local git repository (I am not talking about a remote repository, I am talking about a local git repository, on your local hard drive, and blazingly fast).

The repository is where the history of your project is stored. It’s quite similar to a series of snapshots of your whole project. Each of these snapshots contains your whole project (technically, git does not store a fresh copy of every file each time: files that haven’t changed are shared between snapshots, and the stored data is compressed. But conceptually, think of these snapshots as whole copies of your project).

These snapshots are called commits. Each one is unique and can be referred to in a number of different ways, either directly (by using its identifier, which is a SHA-1 hash) or relatively (for example, three commits before the last one I did).

Each commit (except the very first one) points to a parent commit. When you have two different commits pointing to the same parent, these two commits are the starting points of different branches. The two codestreams, from this branching point, start having a life of their own, although it’s possible to merge different branches together at a later stage.
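To make the parent pointers a bit more tangible, here is a small sketch. It assumes a throwaway repository created with mktemp; the branch names feature-a and feature-b, the file names and the user name/e-mail are made up for the example.

```shell
set -e
# Throwaway repository in a temporary directory (assumption: git is installed)
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

# One commit that will become the common parent
echo "one" > notes.txt
git add notes.txt
git commit -q -m "first commit"
parent=$(git rev-parse HEAD)

# First child of that commit, on its own branch
git checkout -q -b feature-a
echo "a" > a.txt
git add a.txt
git commit -q -m "work on feature-a"
a_parent=$(git rev-parse HEAD~1)

# Second child of the same commit, on another branch
git checkout -q -b feature-b "$parent"
echo "b" > b.txt
git add b.txt
git commit -q -m "work on feature-b"
b_parent=$(git rev-parse HEAD~1)

# Both branch tips point back to the same parent commit
git log --oneline --graph --all
```

The graph printed at the end should show the two branches forking from the first commit.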

Decentralized weirdness

This is a good time to introduce another weirdness that took me some time to grok. There is no primacy between branches. It’s a delightfully anarchic system, no bosses – a little like the anarcho-syndicalist communes of ‘The Holy Grail’. There is no master control program, no centralized master system. One can define certain branches to be more ‘important’ than others, but it’s just a naming convention, it’s not something that is built in.

The same applies between your local repository and the remote repository. Git really doesn’t assume that one of them is more important than the other. It’s very disquieting, but liberating once you’ve understood it.

Working Directory

Next, your working directory. This is where your whole project is, where the code is, images, what have you. This is what gets ‘snapshotted’ each time one makes a commit. It would be possible to snapshot your whole project every time, i.e. every single file, but it turns out that that is a bad idea, because you end up saving too much and it gets really difficult to understand what exactly happened between two commits. I must admit to having done that at the beginning, because I was stuck in my ‘backup’ concept.

No, what you want to do is make lots of small commits, with each commit changing as few files as possible, as long as they pertain to one common change.

Staging Area

Enter the staging area. Really well named, this is where you determine which files you want to belong to the next commit. You’re not interested in the files that have changed only because something trivial like a timestamp has changed, and you’re not interested in unrelated changes either. You really want to encapsulate all the changes necessary for a single bug fix or a single new feature into a commit.
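A minimal sketch of what this looks like in practice, assuming a throwaway repository and two made-up files, feature.txt and unrelated.txt. Both files change, but only one of them is staged for the next commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

# Starting point: both files are committed
echo "v1" > feature.txt
echo "v1" > unrelated.txt
git add feature.txt unrelated.txt
git commit -q -m "initial commit"

# Both files change in the working directory...
echo "v2" > feature.txt
echo "v2" > unrelated.txt

# ...but only one of them goes into the staging area
git add feature.txt

staged=$(git diff --cached --name-only)    # what the next commit would contain
unstaged=$(git diff --name-only)           # changed, but not staged
echo "staged:   $staged"
echo "unstaged: $unstaged"
git status --short
```

A git commit at this point would only record the change to feature.txt; unrelated.txt stays modified in the working directory.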

Overview

The following diagram shows the basic logic: you add files to the staging area with the git add command, then you commit those added files with a git commit command. Bringing back a specific commit to the working directory is done with the git checkout command.

[Diagram: working directory → git add → staging area → git commit → repository; repository → git checkout → working directory]

So, you’ve got this conceptually? Working directory, repository, staging area? Right. This is what you should work with.

I got very confused by how this is actually built. The repository and the staging area are stored in a folder called .git which is contained within the working directory.

This made my head wobble. Surely, not inside the directory? Surely, that way lies folly? Infinite recursions? Surely it should be external? But that’s how it’s built. Inside. It’s elegant once you start thinking about it but it made my head hurt.

Start with the command line

I would recommend starting with a command-line interface. I know that there are many GUIs out there (SourceTree being a particularly good one), but all they are doing is adding an interface on top of the command-line instructions.

I would recommend moving to a GUI once you’ve learnt the basics of git. Additionally, the command line is very helpful for noobs, as it notices when you have typed something that doesn’t make sense and makes a suggestion. Stack Overflow and the git documentation are the places to go if you get stuck.

http://gitref.org/ is a good, compact resource.

You can download the git command-line bash here.

Useful Commands

Here I have listed those commands which I am using regularly and feel comfortable with:

Initializing

[code gutter=”false”]
git init
[/code]

The first initialization creates the magical hidden folder .git. This is where the actual repository is stored (see above). Run the command from inside the directory which you want to ‘track’.
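If you want to see this for yourself without touching a real project, a small sketch along these lines should do. The temporary directory is an assumption of the example; in real life you would cd into your project folder instead.

```shell
set -e
# Stand-in for your project folder (assumption: a fresh temporary directory)
project=$(mktemp -d)
cd "$project"

git init -q

# The hidden .git folder now sits inside the working directory
ls -a

# git itself reports where the repository lives, relative to where you are
gitdir=$(git rev-parse --git-dir)
echo "repository is stored in: $gitdir"
```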

Staging

Staging means ‘saying these are the files I wish to commit’

The command for staging is

[code gutter=”false”]
git add <filename>
[/code]

alternatively you can use

[code gutter=”false”]
git add .
[/code]

which will add all the files in the current directory (. is the current directory)

Committing

[code gutter=”false”]
git commit --message='Committing message'
[/code]

The convention is to write a message with multiple lines, a bit structured like an e-mail. The first line should be like the subject of the e-mail, the other lines are the ‘body’ of the e-mail.
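One way to get such an e-mail-like message from the command line is to pass -m twice: the first becomes the subject, the second becomes the body. A small sketch, assuming a throwaway repository; the file name and the message texts are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "hello" > readme.txt
git add readme.txt

# First -m is the subject line, second -m becomes the body paragraph
git commit -q -m "Add readme" -m "Explain what the project is for, so newcomers know where to start."

subject=$(git log -1 --format=%s)   # subject of the last commit
body=$(git log -1 --format=%b)      # body of the last commit
echo "subject: $subject"
echo "body:    $body"
```

Leaving out the message altogether (a plain git commit) opens an editor instead, which is more comfortable for longer bodies.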

Creating a remote repository:

[code gutter=”false”]
git remote
[/code]

shows the remote repositories linked to the current repository. The remote repository itself needs to have been created beforehand, for instance on the GitHub site.

I use GitHub, so the example here is with GitHub. First log into GitHub and create a repository there. Note the URL of the remote repository (there is a convenient button for this).

[code gutter=”false”]
git remote add origin <your remote repository URL>
[/code]

Note: origin here is one of these conventions, it’s the default name of the remote repository. Don’t make the mistake of naming one of your remote branches origin. Remember, convention!

[code gutter=”false”]
git remote -v
[/code]

shows the URLs linked to your remote repository.

[code gutter=”false”]
git remote show origin
[/code]

This command shows all the branches on the origin repository (the remote one, origin is its name by default)
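To try the remote commands without a GitHub account, a bare repository on your own disk can stand in for the remote one. This is only a sketch under that assumption; the directory names remote.git and local are made up.

```shell
set -e
work=$(mktemp -d)

# A bare repository on disk plays the role of the GitHub repository (assumption)
git init -q --bare "$work/remote.git"

# The local repository that we link to it
git init -q "$work/local"
cd "$work/local"
git remote add origin "$work/remote.git"

git remote            # prints: origin
git remote -v         # shows the fetch and push URLs for origin

url=$(git remote get-url origin)
echo "origin points to: $url"
```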

Fetching

[code gutter=”false”]
git fetch
[/code]

fetches the data from the remote repository but does not merge it into your local branches.

Pulling

[code gutter=”false”]
git pull
[/code]

fetches, then merges the data.

Pushing

[code gutter=”false”]
git push <remote> <local branch name>:<remote branch to push into>
[/code]

pushes the data from the local repository to the remote one.
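The difference between fetching, pulling and pushing is easiest to see with two local repositories sharing one remote. The sketch below assumes a bare repository on disk as the remote and two hypothetical developers, alice and bob; none of these names come from a real setup.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"      # stand-in for the remote repository

# alice creates the first commit and pushes it
git init -q "$work/alice"
cd "$work/alice"
git config user.name "Alice Example"       # hypothetical identity
git config user.email "alice@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "first commit"
branch=$(git symbolic-ref --short HEAD)    # whatever the default branch is called
git remote add origin "$work/remote.git"
git push -q origin "$branch:$branch"

# bob clones the remote and gets v1
git clone -q -b "$branch" "$work/remote.git" "$work/bob"

# alice pushes a second commit
echo "v2" > file.txt
git commit -q -am "second commit"
git push -q origin "$branch:$branch"

# bob fetches: the new commit arrives, but his working directory is untouched
cd "$work/bob"
git fetch -q origin
after_fetch=$(cat file.txt)

# bob pulls: fetch plus merge, so his working directory is updated
git pull -q origin "$branch"
after_pull=$(cat file.txt)

echo "after fetch: $after_fetch, after pull: $after_pull"
```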

Checkout (restore files or change branch)

[code gutter=”false”]
git checkout
[/code]

This switches branches or restores working tree files. It’s a bit of an uncomfortable command at first because it seems to be doing two completely different things. Actually, it’s doing the exact same thing – it’s overwriting your working directory with information from a particular commit. Your working directory is a scrapbook, remember?

I find this useful to avoid committing fluff (my technical term for files which have, technically, changed (a timestamp for instance) but which don’t contain actual changes). I add the files that I want to have committed with git add, then I do a git commit, and then a

[code gutter=”false”]
git checkout -- .
[/code]

to overwrite the fluff in the working directory.
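Put together, the add / commit / checkout sequence might look like the sketch below. It assumes a throwaway repository and two made-up files: wanted.txt holds a real change, fluff.txt only has a changed timestamp line.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "v1" > wanted.txt
echo "generated at 10:00" > fluff.txt
git add wanted.txt fluff.txt
git commit -q -m "initial commit"

# A real change and a fluff change
echo "v2" > wanted.txt
echo "generated at 10:05" > fluff.txt

# Commit only the real change
git add wanted.txt
git commit -q -m "update wanted.txt"

# Throw away everything that was not committed
git checkout -- .

wanted=$(cat wanted.txt)
fluff=$(cat fluff.txt)
echo "wanted.txt: $wanted"
echo "fluff.txt:  $fluff"
```

Be aware that the last step really does discard the uncommitted changes, so only run it when you are sure they are fluff.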

If you notice you’ve just committed nonsense (in this example I replaced the whole About This Database document with nothing), you want to restore the file to what it was before. Here git checkout comes to the rescue again:

[code gutter=”false”]
git checkout HEAD~1 -- odp/Resources/AboutDocument
[/code]

HEAD~1 is a way to refer to a specific commit, i.e. ‘The commit on HEAD’ minus (~) one (1)
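Here is the same rescue as a self-contained sketch. It assumes a throwaway repository and a made-up file about.txt standing in for the About document.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "About This Database" > about.txt
git add about.txt
git commit -q -m "add about document"

# The nonsense commit: the file is replaced with nothing
: > about.txt
git commit -q -am "replace about document with nothing"

# HEAD~1 is the commit just before the current one
previous=$(git log -1 --format=%s HEAD~1)
echo "HEAD~1 is: $previous"

# Bring back the file as it was in that commit
git checkout HEAD~1 -- about.txt
restored=$(cat about.txt)
echo "restored content: $restored"

# The restored file is already staged, so it can be committed straight away
git commit -q -m "restore about document"
```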

Conventions for naming files and directories:

dash dash

If you see -- before a filename, it’s a disambiguator saying ‘what follows are necessarily files’.

dot

. is the current directory – so, everything.

Syntax for defining an entire directory with everything in it

odp/Resources/ would define that particular directory, along with anything inside it. Note the trailing slash!

Conventions for naming commits:

HEAD: HEAD is a pointer to the commit that was last checked out into the working directory. Think of it as ‘the commit corresponding to the current state of my working directory before I made any changes’.

<commit name>~1 The tilde here means ‘minus’, so this would be the previous commit.

Configuration options that are useful

In git Bash, I immediately increase the font size to something readable with our big screens, and then

[code gutter=”false”]
git config --global core.autocrlf false
[/code]

which avoids the annoying ‘lf/crlf’ warnings, since I’m mostly working in a Windows environment (sniff).
Do this before your initial commit! (see https://help.github.com/articles/dealing-with-line-endings/)
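If you’d rather not change the setting for every repository on your machine, leaving out --global applies it to a single repository only. A small sketch, assuming a throwaway repository created just for the example:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

# Without --global the setting is written to this repository's .git/config only
git config core.autocrlf false

# Read the value back to check what git will actually use here
setting=$(git config --get core.autocrlf)
echo "core.autocrlf is $setting"
```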

Setting up SSH keys with Git Bash

Since I’m lazy and don’t want to type in my username and password every time I connect to git, here’s how to do it cleanly via SSH.

[code gutter=”false”]
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
[/code]

Press Enter to accept the default file in which to save the key, then enter a passphrase (a complicated one, alright?).

Add the keys to the ssh-agent (This is a background agent whose job is to always automatically enter your passphrase in. You should be aware that this makes your local physical machine a lot more vulnerable)

[code gutter=”false”]
eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa
[/code]

copy the PUBLIC key to the clipboard (windows)

[code gutter=”false”]
clip < ~/.ssh/id_rsa.pub
[/code]

or macOS

[code gutter=”false”]
pbcopy < ~/.ssh/id_rsa.pub
[/code]

Add the SSH key to your GitHub account, then test the keys with

[code gutter=”false”]
ssh -T git@github.com
[/code]

Ignoring certain files or directories

There are some occasions when you don’t want certain files to be tracked by git. Typical examples are binaries built from your source code, or hidden files generated by your IDE.

You can see which files are not being tracked with this command:

[code gutter=”false”]
git ls-files --others --exclude-standard
[/code]
or alternatively,

[code gutter=”false”]
git add -A -n
[/code]

There are unfortunately many different ways to define the excluded files. There is a central configuration file, whose location one can find with

[code gutter=”false”]
git config --get core.excludesfile
[/code]

Within your project, there are two files that control the exclusions:

[code gutter=”false”]
.gitignore
.git/info/exclude
[/code]

Confused? Here is an explanation from Junio Hamano (the maintainer of Git)

The .gitignore and .git/info/exclude are the two UIs to invoke the same mechanism. In-tree .gitignore are to be shared among project members (i.e. everybody working on the project should consider the paths that match the ignore pattern in there as cruft). On the other hand, .git/info/exclude is meant for personal ignore patterns (i.e. you, while working on the project, consider them as cruft).
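A small sketch of the two files side by side, assuming a throwaway repository. The patterns and file names (build/, scratch.txt, main.c) are made up: build/ is treated as cruft for everybody, scratch.txt only for me.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

# Shared pattern: lives in the tree and would be committed with the project
echo "build/" > .gitignore

# Personal pattern: lives inside .git and is never shared
mkdir -p .git/info
echo "scratch.txt" >> .git/info/exclude

# Some files: one real source file, one build product, one personal scratch file
mkdir build
touch build/app.bin scratch.txt main.c

# Both mechanisms are honoured by --exclude-standard
untracked=$(git ls-files --others --exclude-standard)
echo "$untracked"

# check-ignore succeeds (exit code 0) for paths that match an ignore pattern
git check-ignore -q build/app.bin && echo "build/app.bin is ignored"
git check-ignore -q scratch.txt && echo "scratch.txt is ignored"
```

Only .gitignore itself and main.c should be listed as untracked candidates.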

This command combination is practical (thanks Akira Yamammoto)

[code gutter=”false”]
git rm -r --cached .
[/code]

(recursively removes all the files in the current directory from git’s index, i.e. they stop being tracked, while leaving them untouched on disk) then

[code gutter=”false”]
git add .
git commit -m "fixed untracked files"
[/code]
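The whole sequence can be tried out safely in a throwaway repository. In this sketch the file names are made up: local.log was committed by accident and is then added to .gitignore.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

# local.log gets tracked by accident
echo "code" > main.txt
echo "noise" > local.log
git add .
git commit -q -m "initial commit"

# Decide afterwards that local.log should be ignored
echo "local.log" > .gitignore

# Empty the index, then re-add everything; this time .gitignore is respected
git rm -r -q --cached .
git add .
git commit -q -m "fixed untracked files"

tracked=$(git ls-files)
echo "still tracked:"
echo "$tracked"
ls local.log        # the file itself is still on disk
```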

Notes/Domino best practices

Best practice when gitting an odp project: Never disconnect the nsf from the eclipse project.
Never change the nsf the odp is connected to.

Thanks to Adrian Osterwalder for the tips.

2 thoughts on “getting git”

julianbuss

31. December 2015 at 17:50

Hi Andrew,

good article, helps me understand git a bit better. But what I’m still unsure about: compared to SVN, what are the key advantages of git? If I already use SVN in small teams (two to three developers), what would be better / easier with git?
andrewmagerman

4. January 2016 at 13:35

Although I haven’t used SVN in anger, the key advantage is the decentralized concept: you’re working against a very fast local repository first. There’s also an appreciable speed argument: I find that commits and adds are done so quickly, it’s easy to integrate into a daily routine of coding, and personally I found that I was making commits much more regularly. Another advantage seems to be that git can cope with very complex use-cases, whereas SVN is less granular (and therefore also easier to use, apparently).
