Chapter 6 Introduction to the Command Line

Objectives:

  1. To start using the command line for basic coding
  2. To learn how to navigate in the command line
  3. To create our first lines of code to process text files

The UNIX command line is a program that interprets commands, given as text instructions, and has the computer execute them.

This means that you can tell the computer to list files, print text on the screen, or do a mathematical operation using a set of commands, and the computer will execute them and provide you with an answer.

However, before we start using the command line I decided to make a list of the most commonly used commands in UNIX for your future reference.


6.1 Commonly used commands

6.1.1 Logging in

  • Logging into the cluster: 140.232.222.154 in your browser
  • Logging into the cluster via FTP to upload/download files (e.g., FileZilla, Cyberduck): ssh username@140.232.222.228

6.1.3 Moving and downloading files

  • Copying a file: cp file destination
  • Renaming a file/Moving a file: mv file destination_or_new_name
  • Downloading a file: wget web_address_of_the_file
  • Removing a file: rm file
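
As a minimal sketch of how the copy, rename, and remove commands fit together, the lines below work on a throwaway file in the current folder. The file names (notes.txt and so on) are made up for illustration, and touch is used here only as a quick way to create an empty file to practise on.

```shell
# Setup only: create an empty practice file.
touch notes.txt

# Copy it: the original stays, and a second file appears.
cp notes.txt notes_backup.txt

# Rename the copy: mv with a new name in the same folder is a rename.
mv notes_backup.txt notes_old.txt

# Remove the original. There is no trash can: rm deletes for good.
rm notes.txt

# List the folder: notes_old.txt is there, notes.txt and notes_backup.txt are not.
ls
```

If the second argument of cp or mv is an existing folder instead of a new name, the file is placed inside that folder and keeps its name.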

6.1.4 Editing files

  • Edit a text file: nano (however, we will try to use Atom for now)
  • Counting the number of lines, words and bytes of a file: wc
  • Match a pattern and extract text: grep pattern_of_interest
  • Match and replace a pattern in a text file: sed 's/match/replacement/g'
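
To give a feel for these three tools before we use them on real data, here is a small sketch on a made-up file. The file name fruits.txt and its contents are assumptions for illustration only, and the first line is just setup to give us something to work on.

```shell
# Setup only: create a three-line practice file called fruits.txt.
printf 'apple pie\nbanana bread\napple juice\n' > fruits.txt

# wc prints three numbers (lines, words, bytes) followed by the file name.
wc fruits.txt

# grep prints every line that contains the pattern.
grep apple fruits.txt

# sed prints the text with each match of the pattern replaced.
sed 's/apple/pear/g' fruits.txt
```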

6.2 Section 1: Basics of the command line

Today, we will do very basic things on the cluster. First, we will learn about our whereabouts (where our home folder is), then we will learn how to create a folder for the lab within our home folder. Then, we will create a folder for lab 2 and download a file to it. Finally, we will do a similar exercise to the one in class and count the number of occurrences of a word in a text file.

6.2.1 Logging into the cluster.

  1. Log into your RStudio session through your browser as indicated above
  2. Go to the Terminal tab in the Command Pane Viewer at the bottom of your screen

WELCOME TO SMAUG!!

6.2.2 Where are we? Who am I?

First thing: you land in the magical realm of the cluster. Now… where are you?

Question 1

  • What command would you use to check which working directory you are in?

“Where am I?”

In addition, in case you forget your username, you can always use the command whoami:

“Who am I?”

This means two things:

  1. Our home directory, where all of our files are, is located at /Smaug_SSD/MBB101/username
  2. Our username, in case we forget it, can always be looked up with whoami.

6.2.3 Running basic commands

So, the idea of the command line is to run commands, right? Well, congrats! You have already run two commands!

Question 2

  • Which commands have you already run?

So, as you now know, to run a command you just type it and press Enter, and that’s it!

For example: you want to know what time it is? Use the command date:

“What time is it?”

We will learn to use more commands as we use the cluster more and more.
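
As a small sketch of what running commands looks like, the lines below run date on its own and then with one extra piece of text after it. The +%Y format and the echo command are not used elsewhere in this lab; they appear here only to illustrate that a command can be followed by extra instructions.

```shell
# A command on its own: print the current date and time.
date

# The same command with a format argument: print only the year.
date +%Y

# echo prints back whatever text you give it.
echo "Hello, cluster!"
```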

6.2.4 Creating files and navigating through the cluster

In order to follow the structure of the reproducibility class, we will create folders for each lab session and store the files in them.

  1. Before we start, see if you have any folders in your home folder. Check that with the ls command:

“Listing files”

If you see nothing new, then that means you have nothing in that folder. That’s good!

  2. For today’s lab let’s create a folder called MBB101.

Question 3

  • Using the command list above, what command would you use to create a folder?

Exactly, so use that command to create the folder, and then use the ls command to check if the folder was correctly created (ls is short for list, as it lists all of the files within a folder):

“making our first folder”

Nice, the folder has been created!

To navigate into the new MBB101 folder, use the cd command (cd is short for change directory).

The syntax of cd is cd destination, so in our case, we would do cd MBB101:

“Changing directories”

Notice that the prompt, after username@smaug, now shows MBB101. This means we have been able to move into our folder!

You can also check that you are inside the folder of interest by using the pwd command (pwd is short for print working directory).
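
To see cd and pwd working together, here is a minimal sketch that uses /tmp, a folder that exists on most UNIX systems, so it does not touch your lab folders. The two dots (..) are a shorthand we have not used yet: they stand for the folder above the current one.

```shell
# Move to /tmp using its full path, then confirm where we are.
cd /tmp
pwd

# Two dots mean "the folder above this one", so this moves up to /.
cd ..
pwd
```

Typing cd on its own, with no destination, returns you to your home folder.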

Question 4

  • Create a folder within MBB101 called Lab_2. How would you do it? Add a screenshot of the command and the result.

Good job! Nicely done at creating your first folder!

6.2.5 Downloading files into the cluster.

After creating your Lab_2 folder, we need to populate it with files.

I have created a file for us to do a quick and fun exercise today and have uploaded it to a public repository of files called GitHub, at this link: https://raw.githubusercontent.com/Tabima/MBB101/master/Lab_2/text_file.txt.

To download the file, use the wget command (the name wget combines World Wide Web and get):

“Downloading text file”

Question 5

  • What are the contents of this file? What command from the list above would you use to open or view a file?

Question 6

  • How many lines does this file have? What command from the list above would you use to count the number of lines? What are the three outputs from the command?

6.2.6 Extracting patterns from a text file

Finally, let’s do the examples from Monday’s course:

  • How many times is the word “Dursley” found in the document?
  • How many times is Harry mentioned?
  • How was the weather on that Tuesday?

To find and match a pattern, we can use the command grep. (grep is short for g/re/p: globally search for a regular expression and print matching lines)

So, if we want to find out if the word “Dursley” is mentioned in the document we use the command:

grep Dursley text_file.txt

NOTE: Before you run the command, do you see anything weird?

In this case, we are using two more arguments after the grep command: The pattern of interest we want (Dursley) and the file that we want to search in (text_file.txt).

To learn how to use these commands, we can google the command or use the man command (short for manual).

Alright, what does the output of grep look like?

“grepping for the first time”

Ah cool, we can see if the pattern is present or not!
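
One detail worth knowing before answering the questions below is that grep treats upper and lower case letters as different. The sketch here shows this on a made-up three-line file; pets.txt and its contents are assumptions for illustration, and -i is a standard grep option that ignores case.

```shell
# Setup only: a three-line practice file (hypothetical content).
printf 'The cat sat.\nA dog barked.\nthe CAT slept.\n' > pets.txt

# Prints the first line only: the match is case-sensitive, so "CAT" is skipped.
grep cat pets.txt

# -i ignores case, so the first and third lines are both printed.
grep -i cat pets.txt
```

So if a search returns fewer lines than you expected, checking the capitalization of the pattern is a good first step.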

Question 7

  • What happens if you try and answer these questions using grep?

    • How many times is the word “Dursley” found in the document?
    • How many times is Harry mentioned?
    • How was the weather on that Tuesday?

Summarize how grep can help you with finding these patterns of interest.

Question 8

  • Using the man grep command, how can you use grep to count the occurrences of a text string? For example, can grep count the number of times the word Dursley exists in the file? Write the command you would use and the results.

6.3 Section 2: Advanced command line.

6.3.1 Identifying paths

We now know how to download and check files, but we also need to know how to find where the files are, in case we need to use paths in the future.
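
As a sketch of the two ways a path can be written, the lines below point at the same practice file three times. The file name path_demo.txt is made up, and $PWD is a shell variable (not introduced elsewhere in this lab) that holds the same folder name that pwd prints.

```shell
# Setup only: an empty practice file in the current folder.
touch path_demo.txt

# A relative path is written starting from where you are now.
ls path_demo.txt
ls ./path_demo.txt

# An absolute path starts at the root folder (/) and works from any folder.
# Joining the current folder and the file name gives the absolute path.
ls "$PWD/path_demo.txt"
```

The relative forms stop working as soon as you cd somewhere else, while the absolute form keeps pointing at the same file.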

For example, what is the path of the text_file.txt we were working on?

Question 9

  • Using the readlink -f command, can you indicate the full path of this file?

Question 10

  • If you are in your home (~) folder, and you have a file there called file.txt, what is the relative path of this file?

6.3.2 Creating copies of a file into a new folder.

We want to modify the text_file.txt to add, remove, replace or count elements in a new version of it.

So, the idea is to create a new folder inside your Lab_2/ folder, named modified/.

Create the folder and copy the text_file.txt file into a new file called modified_file.txt inside modified/ using the cp command.

Question 11

  • Please present the syntax of the commands that carry out the instructions above.

Question 12

  • What is the absolute path of the modified_file.txt?

Let’s check if it worked:

“Copying file”


6.3.3 Replacing patterns using sed

So, if you recall the theory class, we learnt about the sed command to replace patterns.

The idea is that you will replace some patterns in the modified_file.txt file and then check if the results make sense.
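
Before running sed on the lab file, it can help to see what the g at the end of the expression does. This is a minimal sketch on a made-up one-line file; drinks.txt and its contents are assumptions for illustration, and cat is used only to print the file.

```shell
# Setup only: a one-line practice file (hypothetical content).
printf 'tea and tea and tea\n' > drinks.txt

# Without g, only the first match on each line is replaced.
sed 's/tea/coffee/' drinks.txt

# With g (global), every match on the line is replaced.
sed 's/tea/coffee/g' drinks.txt

# sed printed the edited text to the screen; the file itself is unchanged.
cat drinks.txt
```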

  1. Change the Dursley patterns into Potter and save the file as potter.txt using the sed 's/Dursley/Potter/g' modified_file.txt > potter.txt command.

Question 13

  • What does the > in the command mean?

Question 14

  • Check the number of occurrences of the word Potter in the potter.txt file using grep.

Question 15

  • How many times does the word Potter appear? Is it the same number as in question 7 when you checked the number of times that Harry was mentioned? Why?

Question 17

Count the number of lines of the potter.txt file and compare it to that of the original file.

  • Is the number of lines similar between potter.txt and modified_file.txt? Very briefly justify your answer.

  2. Extract all the lines that say Potter from potter.txt into a new file called potter_lines.txt.

Question 18

  • Present the syntax to execute the previous instruction.

Question 19

  • How many lines say Potter in the file? Is the number different from questions 14 and 15? Explain your answer very briefly.