Git - Revision Control Perfected
Commits
A commit is meant to record a set of changes introduced to a project. What it really does is associate a tree object—representing a complete snapshot of a directory structure at a moment in time—with contextual information about it, such as who made the change and when, a description, and its parent commit(s).
A commit doesn't actually store a list of changes (a "diff") directly, but it doesn't need to. What changed can be calculated on demand by comparing the commit's tree to that of its parent. Comparing two trees is a lightweight operation, so there is no need to store this information. Because there is nothing special about the parent commit other than chronology, any commit can be compared to any other just as easily, regardless of how many commits lie between them.
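As a rough illustration of diffs being computed on demand, the sketch below builds a throwaway repository and compares different pairs of commits with git diff. The scratch directory, the file names and the "Jane Example" identity are placeholders I made up for the example.

```shell
# Throwaway repository; every name and file here is a placeholder.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Example"
git config user.email jane@example.com

# Three commits, each adding a different file.
for f in one two three; do
    echo "$f" > "$f.txt"
    git add "$f.txt"
    git commit -q -m "Add $f.txt"
done

# Compare HEAD with its immediate parent: only three.txt differs.
git diff --name-only HEAD~1 HEAD

# Compare HEAD with its grandparent: same command, different pair of trees.
git diff --name-only HEAD~2 HEAD
```

Nothing about the second comparison is more expensive in principle than the first; Git simply compares two snapshots.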
Every commit has a parent except the first one. Commits usually have a single parent, but they will have more if they are the result of a merge (I explain branching and merging later in this article). A commit from a merge still is just a snapshot in time like any other, but its history has more than one lineage.
By following the chain of parent references backward from the current commit, the entire history of a project can be reconstructed and browsed all the way back to the first commit.
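To see the parent chain for yourself, something like the following should work. It is only a sketch in a scratch repository; the file name, commit messages and identity are invented for the example.

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Example"
git config user.email jane@example.com

# Three commits in a row, each one the parent of the next.
for n in 1 2 3; do
    echo "revision $n" > notes.txt
    git add notes.txt
    git commit -q -m "Revision $n"
done

# %h = abbreviated commit hash, %p = abbreviated parent hash(es), %s = subject.
# The first commit has no parent, so its parent field is empty.
git log --format='commit %h  parent %p  %s'

# HEAD~2 follows the parent reference twice, landing on the first commit.
git log -1 --format=%s HEAD~2
```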
A commit is expanded recursively into a project history in exactly the same manner as a tree is expanded into a directory structure. More important, just as the SHA1 of a tree is a fingerprint of all the data in all the trees and blobs below it, the SHA1 of a commit is a fingerprint of all the data in its tree, as well as all of the data in all the commits that preceded it.
This happens automatically because references are part of an object's overall content. The SHA1 of each object is computed, in part, from the SHA1s of any objects it references, which in turn were computed from the SHA1s they referenced and so on.
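The chain of references can be inspected directly with git cat-file. The sketch below again uses a scratch repository with placeholder names; the only fixed value is the SHA1 of a blob containing "hello world" followed by a newline, which is the same on every machine because it depends on nothing but the content.

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Example"
git config user.email jane@example.com

echo "hello world" > greeting.txt
git add greeting.txt
git commit -q -m "First"

echo "more" >> greeting.txt
git commit -q -a -m "Second"

# A blob's name is derived from its content alone.
printf 'hello world\n' | git hash-object --stdin
# prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# The commit object is a few lines of text that name a tree and a parent by SHA1.
git cat-file -p HEAD

# The tree, in turn, names the blob for greeting.txt by SHA1.
git cat-file -p 'HEAD^{tree}'
```

Change one byte of greeting.txt in the first commit and its blob, its tree, that commit and every later commit would all get different names.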
Tags
A tag is just a named reference to an object—usually a commit. Tags typically are used to associate a particular version number with a commit. The 40-character SHA1 names are many things, but human-friendly isn't one of them. Tags solve this problem by letting you give an object an additional name.
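There are two types of tags: object tags and lightweight tags. Object tags (called annotated tags in Git's own documentation) are stored as objects in the repository and carry their own message, tagger and date. Lightweight tags are not objects in the repository, but instead are simple refs like branches, except that they don't change. (I explain branches in more detail in the Branching and Merging section below.)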
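Here is a small sketch of both kinds of tag side by side. The tag names v1.0 and v1.1, the message and the scratch repository are placeholders; the -a option is what asks Git for an object tag (Git's documentation calls these annotated tags).

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Example"
git config user.email jane@example.com

echo "release notes" > README
git add README
git commit -q -m "Initial commit"

git tag v1.0                        # lightweight tag: a plain ref to the commit
git tag -a v1.1 -m "Release 1.1"    # object tag: a tag object stored in the repository

git cat-file -t v1.0    # prints "commit": the ref points straight at the commit
git cat-file -t v1.1    # prints "tag": the ref points at a tag object

# Both names still lead to the same commit.
git rev-parse 'v1.0^{commit}' 'v1.1^{commit}'
```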
Setting Up Git
If you don't already have Git on your system, install it with your package manager. Because Git is primarily a simple command-line tool, installing it is quick and easy under any modern distro.
You'll want to set the name and e-mail address that will be recorded in new commits:
git config --global user.name "John Doe"
git config --global user.email john@example.com
This just sets these parameters in the config file ~/.gitconfig. The config has a simple syntax and could be edited by hand just as easily.
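If you are curious what that syntax looks like, the sketch below writes the same two settings to a scratch file instead of your real ~/.gitconfig and prints the result. The --file option and the temporary path are my choice for the illustration, so that nothing in your own configuration is touched.

```shell
# Write the settings to a scratch file rather than ~/.gitconfig.
cfg=$(mktemp)
git config --file "$cfg" user.name "John Doe"
git config --file "$cfg" user.email john@example.com

# Show the resulting file: a [user] section with name and email keys.
cat "$cfg"

# Read a single value back.
git config --file "$cfg" --get user.email
```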
User Interface
Git's interface consists of the "working copy" (the files you directly interact with when working on the project), a local repository stored in a hidden .git subdirectory at the root of the working copy, and commands to move data back and forth between them, or to and from remote repositories.
The advantages of this design are many, but right away you'll notice that there aren't pesky version control files scattered throughout the working copy, and that you can work off-line without any loss of features. In fact, Git doesn't have any concept of a central authority, so you always are "working off-line" unless you specifically ask Git to exchange commits with your peers.
The repository is made up of files that are manipulated by invoking the git command from within the working copy. There is no special server process or extra overhead, and you can have as many repositories on your system as you like.
You can turn any directory into a working copy/repository just by running this command from within it:
git init
Next, add all the files within the working copy to be tracked and commit them:
git add .
git commit -m "My first commit"
You can commit additional changes as frequently or infrequently as you like by calling git add followed by git commit after each modification you want to record.
If you're new to Git, you may be wondering why you need to call git add each time. It has to do with the process of "staging" a set of changes before committing them, and it's one of the most common sources of confusion. When you call git add on one or more files, they are added to the Index. The files in the Index, not those in the working copy, are what get committed when you call git commit.
Think of the Index as what will become the next commit. It simply provides an extra layer of granularity and control in the commit process. It allows you to commit some of the differences in your working copy, but not others, which is useful in many situations.
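A short sketch of committing some differences but not others follows. The two files a.txt and b.txt and the scratch repository are invented for the example.

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Example"
git config user.email jane@example.com

echo "a" > a.txt
echo "b" > b.txt
git add .
git commit -q -m "Initial commit"

# Modify both files, but stage only one of them.
echo "a changed" >> a.txt
echo "b changed" >> b.txt
git add a.txt
git commit -q -m "Change a.txt only"

# b.txt is still modified in the working copy and was not committed.
git status --short    # prints " M b.txt"
```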
You don't have to take advantage of the Index if you don't want to, and you're not doing anything "wrong" if you don't. If you want to pretend it doesn't exist, just remember to call git add . from the root of the working copy (which will update the Index to match) each time, immediately before git commit. You also can use the -a option with git commit to add changes automatically; however, it will not add new files, only changes to existing files. Running git add . from the root picks up new files as well (apart from anything excluded by your ignore rules).
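The difference between the two approaches is easy to check in a scratch repository. In this sketch, tracked.txt and untracked.txt are placeholder names.

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Example"
git config user.email jane@example.com

echo "first" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

# Change an existing file and create a brand-new one.
echo "edit" >> tracked.txt
echo "brand new" > untracked.txt

# -a stages the change to tracked.txt but leaves the new file alone.
git commit -q -a -m "Commit with -a"
git status --short    # prints "?? untracked.txt"

# git add . picks up the new file as well.
git add .
git commit -q -m "Add the new file"
git status --short    # prints nothing: the working copy is clean
```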
The exact work flow and specific style of commands largely are left up to you as long as you follow the basic rules.
The git status command shows you all the differences between your working copy and the Index, and the Index and the most recent commit (the current HEAD):
git status
This lets you see pending changes easily at any given time, and it even reminds you of relevant commands like git add to stage pending changes into the Index, or git reset HEAD <file> to remove (unstage) changes that were added previously.
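As a sketch of staging and then unstaging a change, the following uses the compact --short form of git status so the two states are easy to compare. The file notes.txt and the scratch repository are placeholders.

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Example"
git config user.email jane@example.com

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Stage a change: the Index now differs from HEAD.
echo "draft" >> notes.txt
git add notes.txt
git status --short    # prints "M  notes.txt" (staged)

# Unstage it again: the change survives in the working copy only.
git reset -q HEAD notes.txt
git status --short    # prints " M notes.txt" (modified, not staged)
```

The left-hand column reports the Index compared with HEAD, and the right-hand column reports the working copy compared with the Index.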
Comments
Interesting
Good to read this informative article here on this website. It’s an interesting post.
Thank you!
I run a few websites myself
I run a few websites myself and needed exactly what I was able to read here. I might take a look to see whether I did it right.
Best regards
http://www.flirtcenter24.de/
About Git
Worth keeping
Checking Out A Small Subset Of Files On A Small Device?
The limitation that I immediately ran into when I considered migrating to git is to check out some (rather randomly selected) subset of files on a small/portable computing device.
Say I have a big repository of files and I only needed a very small subset of files while on the go -- to refer to and to be edited.
It was originally a small netbook computer where I could check out a few directories from a big repository and be able to edit files on the netbook computer while on the bus.
Netbook might have grown larger with regard to its disk storage, but now, I want to do the same on an Android phone.
git's sparse checkout feature still pulls the entire repository to the device. It only checks out a subset of files to give the appearance of a sparse checkout, but it doesn't resolve the storage issue.
I don't think git submodules help, as, I think, one can't easily move selected files across repositories with all history intact (i.e., every now and then, add some additional directories to the list available to small devices by moving them to a submodule, when it becomes necessary), as one can easily do with CVS.
The only solution that I can think of is to remotely mount .git/objects/ directory and deal with its limitation.
Does anyone with some creative brain power have a solution that lifts this limitation?
Thanks.
Split-able git Tree?
Given that:
Tree object = Blobs file names + permissions + Blobs collection.
Can splitting a git repository be implemented by splitting one of git's Tree objects into 2 (sub-)Tree objects on a personal workstation (perhaps with new Commit objects to keep track of the split), allowing a smaller tree to be checked out to a small device?
Remote changes (done by others) can then be merged to the personal workstation (as staging), before being merged to the split Tree branches for the small devices if necessary.
Changes on the small devices can be merged to the personal workstation (as staging), before being pulled by others?
Would that solve the disk space problem by limiting checkout to a small (sub-) Tree?
If this idea works, would some able developer turn it into an implementation?
Thanks.
On the guarantees of SHA1
First of all I want to thank the author for this clear and concise article.
However, I want to point out some inaccuracy regarding the paragraph on SHA1. The author states that SHA1 guarantees that the data in the blobs is different, and that the chance that two pieces of data have the same SHA1 is infinitesimally small. I disagree on this point.
The 40-character string that SHA1 outputs gives us 16^40 = 2^160 ~~ 10^16 different checksums. Although this is big enough to assume the above-described 'guarantee', the claim about the infinitesimal chance is just wrong.
Consider for example 2^160 + 1 pairwise distinct files (this is data, be it hypothetical). The chance that there will be two different pieces of data in this set having the same checksum is 1. And 1 is very very different from infinitesimal.
I agree that it is highly unlikely that two such files will occur in practice, let alone in one project. (For example, each person on earth would have to create about 100.000 distinct files, to come close to the 2^160 files.) Still I wanted to point this out about the cryptographic features of SHA1.
There is not enough matter in
There is not enough matter in the universe to store 2^64 bits, much less 2^160 bits, even if you stored 1 bit per atom.
Your math is *way* off. 2^160
Your math is *way* off. 2^160 ~~ 10^48.
Very nice introduction
Congrats for a very clear and concise introduction for something as difficult to teach as git.
I love git, and had to give git training to Subversion users -- hard work! It really amounts to unlearning SVN and learning something completely new.