54 Git & GitHub: The Basics
git is a distributed version control system. It allows you to track changes to code and to collaborate with others on code development. Code lives in a repository, which can be hosted on a remote or local server. GitHub is the largest online service for hosting git repositories.
- Git: System that tracks changes to code from multiple users
  - Free and open source distributed version control system
  - Developed in 2005 by Linus Torvalds to support Linux kernel development
  - Used by 87.2% of developers as of 2018, according to Stack Overflow
- Repository: Data (code) + metadata (i.e. log of changes over time)
  - Data structure that holds metadata for a set of directories / files (set of commit objects, historical record of changes)
- GitHub: Online service that holds Git repositories (public & private)
  - Git repository hosting service
  - Largest source code host in the world: > 40M users, > 100M repositories
  - Acquired by Microsoft for 7.5 billion USD in 2018
54.1 Installing git
Check if your system already includes an installation of git. If not, you can either install it through your system’s package manager or download it from the official git website.
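One way to perform the check described above is sketched here. It assumes a POSIX-like shell; the variable name `git_msg` is only illustrative.

```shell
# Look for a git executable on the PATH and report its version if found
if command -v git >/dev/null 2>&1; then
  git_msg="$(git --version)"
else
  git_msg="git not found: install it via your package manager or the official git website"
fi
echo "$git_msg"
```

If git is installed, the message is the version string reported by `git --version`.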
54.2 Basic git usage
In the system terminal, all git commands begin with git and are followed by a command name.
54.2.1 Cloning (“Downloading”)
Download a repository to your computer for the first time using clone. Replace “user” with the username and “repo” with the repository name:
git clone https://github.com/user/repo.git
This will clone the remote repository to a folder named ‘repo’. You can optionally provide a different folder name after the URL.
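The optional folder name can be tried without network access. The sketch below is not tied to any real project: it assumes a local bare repository as a stand-in for the hosted one, and the folder name `myproject` is hypothetical.

```shell
# Stand-in for a hosted repository: a local bare repository (assumption for this sketch)
tmp="$(mktemp -d)"
git init --quiet --bare "$tmp/repo.git"

# Same form as: git clone https://github.com/user/repo.git myproject
# (git warns that the repository is empty, which is expected here)
git clone --quiet "$tmp/repo.git" "$tmp/myproject"

# The clone records where it came from as the remote named "origin"
git -C "$tmp/myproject" remote -v
```

With a real hosted repository, only the URL changes; the folder name argument works the same way.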
54.2.2 Pulling (“Updating”)
Update a previously cloned repository using pull:
git pull
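To see what pull does, the following sketch simulates two collaborators working against the same repository. It assumes a local bare repository standing in for the hosted one; the names "alice" and "bob", the file `notes.txt`, and the identities are hypothetical.

```shell
# Stand-in for a hosted repository: a local bare repository (assumption for this sketch)
tmp="$(mktemp -d)"
git init --quiet --bare "$tmp/repo.git"

# First collaborator (hypothetical "alice") clones, commits, and pushes
git clone --quiet "$tmp/repo.git" "$tmp/alice"
cd "$tmp/alice"
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git push --quiet origin HEAD

# Second collaborator (hypothetical "bob") clones the repository as it is now
git clone --quiet "$tmp/repo.git" "$tmp/bob"

# Alice pushes another change after Bob has cloned
echo "second line" >> notes.txt
git commit --quiet -am "Extend notes"
git push --quiet origin HEAD

# Bob updates his copy: pull fetches the new commit and merges it into his branch
cd "$tmp/bob"
git pull --quiet
cat notes.txt
```

After the pull, Bob's copy of `notes.txt` contains both lines.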
54.2.3 Pushing (“Uploading”)
Get info on local changes to the repository using status:
git status
Working locally, stage new or modified files for commit using add:
git add /path/to/file
Still working locally, commit changes with an informative message using commit:
git commit -m "Fixed this or added that"
Note that the above steps did not require an internet connection, but the following one does. Push one or more commits to the remote repository using push:
git push
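The status, add, commit, and push steps can be run end to end as a rough sketch. It assumes a local bare repository as a stand-in for the hosted one; the file `analysis.R`, the commit message, and the identity are hypothetical. `git push origin HEAD` is used here because the freshly created stand-in has no branches yet; in an existing clone, plain `git push` is usually enough.

```shell
# Stand-in for a hosted repository: a local bare repository (assumption for this sketch)
tmp="$(mktemp -d)"
git init --quiet --bare "$tmp/repo.git"
git clone --quiet "$tmp/repo.git" "$tmp/work"
cd "$tmp/work"

# Identity used for commits in this repository only (hypothetical values)
git config user.name "Example User"
git config user.email "user@example.com"

echo 'x <- 1' > analysis.R
git status --short            # the untracked file is listed with "??"
git add analysis.R
git status --short            # the staged new file is listed with "A"
git commit --quiet -m "Add analysis script"

# Upload the current branch to the remote under the same name
git push --quiet origin HEAD
git log --oneline
```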
54.2.4 Collaborating
The main way of contributing to a project is by a) making a new “branch” of the repository, b) making your edits, and c) either merging to master yourself or requesting your edits be merged by the owner/s of the repository. This allows multiple people to work on the codebase without getting in each other’s way.
54.2.5 Branching and merging
Scenario: you are working on your own project, hosted on its own repository. You want to develop a new feature, which may take some time to code and test before you make it part of your official project code.
- Create a new branch, e.g. devel
- Work in your new branch until all testing is successful
- Merge back to the master branch
As always, work from your system terminal, within a directory in your repository.
Create a new branch:
git branch devel
Switch to your new branch:
git checkout devel
Work on your code, using git add/commit/push as per usual.
When you are done testing and are happy to merge back to master:
git checkout master
git merge devel
git push
All the commits performed while you were working in the devel branch will be included in that last git push from master.
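The whole branch-and-merge cycle can be rehearsed in a throwaway local repository. This is a sketch under a few assumptions: the file `notes.txt` and the identity are hypothetical, the push step is left out because no remote is configured, and the default branch name is read from git rather than hard-coded since it may be "master" or "main" depending on configuration. `git checkout -b devel` combines `git branch devel` and `git checkout devel` in one step.

```shell
# Throwaway repository for the exercise
tmp="$(mktemp -d)"
git init --quiet "$tmp/project"
cd "$tmp/project"
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"

# Remember the default branch name ("master" or "main", depending on configuration)
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Create the devel branch and switch to it in one step
git checkout --quiet -b devel
echo "new feature" >> notes.txt
git commit --quiet -am "Add new feature"

# Switch back and merge the work done in devel
git checkout --quiet "$main_branch"
git merge --quiet devel
git log --oneline
```

After the merge, the default branch contains both commits, including the one made on devel.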
54.2.6 Pull request
Scenario: You are contributing to a repository along with other collaborators. You want to suggest that a new feature be added to the code:
- Create a new branch, e.g. mynewfeature
- Work in the new branch until testing is complete and you are happy to share
- Go to the repository website, select your branch, and open a “Pull request” asking that the changes in your mynewfeature branch be merged into master
- The repository owner/s will review the request and can merge it
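The terminal side of this scenario, up to the point where the pull request is opened on the website, might look like the sketch below. It assumes a local bare repository as a stand-in for the shared one; the files `README.md` and `feature.R` and the identity are hypothetical. The `-u` (`--set-upstream`) flag links the local branch to its remote counterpart, so later plain `git push` and `git pull` calls know where to go.

```shell
# Stand-in for the shared hosted repository (assumption for this sketch)
tmp="$(mktemp -d)"
git init --quiet --bare "$tmp/repo.git"
git clone --quiet "$tmp/repo.git" "$tmp/work"
cd "$tmp/work"
git config user.name "Example User"
git config user.email "user@example.com"

# Give the stand-in repository some existing history
echo "base" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet origin HEAD

# Create the feature branch and do the work there
git checkout --quiet -b mynewfeature
echo "f <- function(x) x + 1" > feature.R
git add feature.R
git commit --quiet -m "Add new feature"

# Publish the branch; the pull request is then opened on the repository website
git push --quiet -u origin mynewfeature
git branch -r
```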
54.3 Gists
GitHub also offers a very convenient pastebin-like service called Gist, which lets you quickly and easily share code snippets.
To share some R code using a gist:
- Visit the gist site.
- Write in/copy-paste some code
- Add a name including a .R suffix at the top left of the entry box
- Copy-paste the URL to share with others
54.4 Git and GitHub for open and reproducible science
It is recommended to create a new GitHub repository for each new research project. It may also be worthwhile to create a new repository when it’s time to publish a paper, to include all final working code that should accompany the publication (and, e.g., exclude all trial-and-error and testing code). As always, make sure to follow journal requirements for reporting data deposition (which includes code) and accessibility.
54.5 Git Resources
Git and GitHub are very powerful and flexible, with a great deal of functionality. Some resources to learn (a great deal) more:
- Git cheat sheet
- GitHub guides
- Pro Git Book by Scott Chacon and Ben Straub
See this “GitHub for beginners” blog post on the GitHub blog.