8.2 Creating a repository

Generally, when you’re working on a project in R, it starts in one of two ways: (a) you are beginning a new project, and you know it’s something you want to keep track of, document, and share, or (b) you are exploring some data, and the resulting work may move into another project or develop into a larger project of its own. In either case, we recommend using version control, but the way you approach it may differ. For example, if you know from the start that this will be a long-lasting or important project, you should probably start by initializing a git repo. If, instead, you’re just doing some exploratory analyses, there’s a chance the work could move into an existing repo or, perhaps, even be thrown away. In that case, you may find yourself wanting to start a repository from an existing project. We’ll cover each of these situations below.

8.2.1 Starting with an empty project

Generally, if you’re starting a new project, you should begin by initializing a repo. The easiest way to do this is to:

  • Create a new empty repo on GitHub
  • Clone the repo locally
  • Start your project (e.g., as an RStudio Project)
  • Commit your initial changes
  • Push the changes to the remote
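
For readers who prefer the command line, the steps above can be sketched roughly as follows. This is only an illustration, not the GitKraken workflow described in this chapter: a bare repository in a temporary folder stands in for the GitHub repo, and the folder name, file name, user name, and email are all made-up placeholders. With a real repo you would clone the URL that GitHub shows instead.

```shell
# Stand-in for the GitHub repo: a bare repository in a temporary folder.
# With a real repo, skip these two lines and use the URL from GitHub.
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/remote.git"

# 1. Clone the (still empty) remote repo locally
git clone --quiet "$sandbox/remote.git" "$sandbox/my-project"
cd "$sandbox/my-project"

# 2. Start your project: here, a single placeholder R script
echo 'plot(cars)' > analysis.R

# 3. Commit your initial changes
git add analysis.R
git -c user.name="Example User" -c user.email="user@example.com" \
  commit --quiet -m "Add initial analysis script"

# 4. Push the changes to the remote
git push --quiet origin HEAD
```

Git may warn that you have cloned an empty repository; that is expected here, because the stand-in remote has no README or other starting files.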

Once you’re logged into GitHub, you will see a + in the top right of the navigation bar with a drop down menu. In the menu, you will see “New repository”.

You can then choose a name for the repo, which would ideally be descriptive of the project, provide a (slightly) more detailed description of the project, choose whether the repo should be public or private, and select any files the repo should be initialized with. Note that the repo name must be unique within your account (i.e., you can’t have two repos with the same name). I typically initialize the repo with a README, which I fill in later, along with a .gitignore file and a basic license. GitHub has many helpful .gitignore templates, including one for R. The .gitignore file tells git which files should not be tracked. There are also a number of different licenses available, and even more that can be added afterward. A good default is the MIT license, which is a permissive license allowing others to freely share and distribute your code as long as they include the license (which includes a copyright notice with your name and the year). I am also generally in favor of the Creative Commons Attribution license (CC BY 4.0), which allows sharing provided proper attribution is given to the original developer. Note that this license must be added after creating the repo; an easy way to do this is by running usethis::use_ccby_license() once you’re in a project in your repo.

If you’re feeling uncertain, the below is a decent template to get started.

The repository is now set up on the remote (i.e., on GitHub).

But we still need to get it onto our local computer. We do this by cloning the remote repo to our local machine. We can clone with GitKraken, but first we need to copy the link to the remote repo. We do this by clicking the green “Code” button and then clicking the clipboard icon, which copies the link shown.

Cloning: Copy all the files and the commit history from a remote (online) repo to your local computer.

Next, we’re going to open GitKraken, and select “Clone a repo”.

There are multiple ways to clone the repo from GitKraken, but we’ve generally found the URL approach to be the easiest. You can click “Browse” to change the location that the repo will be cloned to (a new folder containing the entire contents of the online repo will be created there). Then paste the URL copied earlier into the corresponding field and click “Clone the repo!”.

GitKraken will then download the files, and you will be prompted with a dialogue box asking if you would like to open the repository in GitKraken.

Select “Yes” and you will see the repository.

In the middle of the screen you will see “Initial commit”. This is where you will see all the commits that have been made, along with the commit messages. This probably doesn’t make a whole lot of sense at this point because we haven’t talked about what commits are. However, this folder is now set up to monitor any files in it. If we add new files here, they will show up as new. If we make edits to an existing file, the changes to the specific lines will be tracked. We’ll get to all of this more in a bit, but first let’s talk about how to initialize a repo with an existing project.
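
To make the idea of monitoring a little more concrete, here is a minimal command-line sketch of what git reports when a file is added and another is edited. It uses a throwaway repository in a temporary folder; the file names, user name, and email are placeholders invented for the example, and GitKraken presents the same information graphically.

```shell
# Throwaway repository with one committed file (all names are illustrative)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
echo 'x <- 1' > script.R
git add script.R
git -c user.name="Example User" -c user.email="user@example.com" \
  commit --quiet -m "Initial commit"

# Edit the existing file and add a brand-new one
echo 'y <- 2' >> script.R
echo 'notes' > notes.txt

# Short status: " M" marks a modified tracked file, "??" an untracked (new) file
git status --short

# Line-level changes to the tracked file; added lines start with "+"
git --no-pager diff script.R
```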

8.2.2 Starting with an existing project

I regularly find myself in situations where I have written code that I thought would be “throw away” work, but that evolves into a larger project. Any time a project is ongoing, you should consider using git, particularly if you are collaborating with others.

Let’s say I have an existing project that looks like the below. There are two folders, currently, one for scripts and one for plots. We might also (typically) have a folder for data, but in this case my script uses built-in data so there’s no need to store it here.

I’ve decided this is an important project that is going to last beyond just today. I’m therefore going to use version control (via git) and connect it to a remote repo. We start by going to GitKraken and clicking “Start a local repo”.

We then click “Browse” to find the folder. Importantly (and rather strangely), the name of the project folder needs to go in the “Name” field, which means the “Initialize in” field should point to the folder that contains our project folder, rather than to the project folder itself. The “Full path” field will then point to our project folder. At this point you can initialize with a .gitignore file and license from the GitKraken defaults. You can leave the “Default branch name” field as “main”, which is the current standard.

After you click “Create repository” it will take you to a screen similar to the one we saw before. The difference is that you will now see //WIP above the initial commit and file names in the “Unstaged Files” area. We’ll get to all this in a bit.

If you look back at your actual folder, and you have hidden files set to be visible (on a Mac, ⌘ + shift + .), you will see a .git folder. This contains all the infrastructure needed to monitor the files in the folder.
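
If you are curious what this looks like outside of GitKraken, the sketch below initializes git in an existing folder from the command line. It is a rough equivalent under stated assumptions, not a description of what GitKraken runs internally: the project folder and script are invented for the example, and no .gitignore or license is created.

```shell
# Illustrative existing project with "scripts" and "plots" folders
project="$(mktemp -d)/my-analysis"
mkdir -p "$project/scripts" "$project/plots"
echo 'plot(mtcars$wt, mtcars$mpg)' > "$project/scripts/explore.R"

# Initialize the repository inside the project folder itself
cd "$project"
git init --quiet
# Name the default branch "main" (Git 2.28+ can do this with `git init -b main`)
git symbolic-ref HEAD refs/heads/main

# Nothing is committed yet, so existing files are listed as untracked
git status --short

# The hidden .git folder now appears alongside the project folders
ls -a
```

Note that the empty plots folder does not appear in the status output, because git tracks files rather than folders.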

You’ll now also see the .gitignore file (which is also viewable through the “Files” pane in RStudio) which tells git the files that it should not monitor. If we open this up (either through a text editor or RStudio) you will see things like .Rhistory and /cache/. These are the defaults from the template we used, but we could delete anything in here and git would then monitor those files. Similarly, there may be other files we want to ignore, and we could add those here. For example, you might want to make a project mostly publicly available, but not include private, protected data. You could do this by adding a folder that has all of the data you do not want to share, and then put the name of this folder in the .gitignore file (e.g., adding private-data/ would ignore all the files in the “private-data” folder).
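
As a small illustration of the private-data example, the sketch below writes a two-line .gitignore and then asks git whether a file in that folder is ignored. The repository lives in a temporary folder, and the data file and script are placeholders made up for the example.

```shell
# Throwaway repository (all file names are illustrative)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Ignore R history files and everything inside private-data/
printf '%s\n' '.Rhistory' 'private-data/' > .gitignore

mkdir private-data
echo 'id,score' > private-data/students.csv
echo 'summary(cars)' > analysis.R

# Ask git which rule (if any) causes a path to be ignored
git check-ignore -v private-data/students.csv

# The ignored folder is left out of the list of untracked files
git status --short
```

Keep in mind that .gitignore only affects files git is not already tracking, so it is best to add the entry before the private data is ever committed.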

Now, we need to connect this repo to a remote, or online, repo on GitHub. Just as before, we’re going to go to GitHub and create a new repo there. Unlike before, however, we want to create the repo without any files (i.e., no license, .gitignore file, or README). You can still give your repo a description online, but it should not have any existing files because we already have them all locally.

Once you’ve created the empty repo, it will look like the below, providing suggestions for connecting this remote repo to a local repo.

Next, we’re just going to copy the link, as we did before, go back to GitKraken, and click the green plus button (which only shows up on hover) in the remote section of the toolbar on the left-hand side.

This will then bring up a dialogue box (below) asking how you would like to add the remote. Click on URL along the top, then enter the name of the repo and paste the URL in the “Pull URL” box. The text you enter into “Pull URL” should be automatically copied to “Push URL”, but if not, copy it there too.

Finally, click the “Add Remote” button, and you’re done! Very little will actually change on the GitKraken side, but you should see the name of the remote repo listed under “Remote” on the left-hand side (below where you clicked the button to add a remote).
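
For comparison, here is a hedged command-line sketch of connecting an existing local repo to an empty remote. A bare repository in a temporary folder stands in for the empty GitHub repo, the remote is given the conventional name "origin" (an assumption for the example; in GitKraken you choose the name yourself), and the file, user name, and email are placeholders.

```shell
sandbox="$(mktemp -d)"
# Stand-in for the empty GitHub repo
git init --quiet --bare "$sandbox/remote.git"

# Existing local repo with one commit
mkdir "$sandbox/my-analysis"
cd "$sandbox/my-analysis"
git init --quiet
echo 'summary(cars)' > analysis.R
git add analysis.R
git -c user.name="Example User" -c user.email="user@example.com" \
  commit --quiet -m "Initial commit"

# Add the remote under the name "origin"
git remote add origin "$sandbox/remote.git"

# List remotes: the same URL is used for fetch (pull) and push
git remote -v

# Push the current branch and remember origin as its upstream
git push --quiet --set-upstream origin HEAD
```

With a real GitHub repo, the only change would be replacing the local path with the URL copied from GitHub.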