Linux Bash Shell Basic Commands
Using the Hive shell for genomic analyses. Hive is a high-performance computing (HPC) system of the Faculty of Natural Sciences at the University of Haifa. This document provides helpful commands for working in the Hive shell.
To connect to the Hive access node, use the SSH protocol. If you work on Windows, you can download Bitvise, an SSH client, for an easy connection to the server.
If you want to get familiar with Linux and its command line in order to access the full range of bioinformatics tools available to researchers, I strongly recommend taking the free course Bioinformatics for Biologists: An Introduction to Linux, Bash Scripting, and R. In this course, the educators from the Wellcome Genome Campus (WGC) Advanced Courses and Scientific Conferences are joined by experts from the Institut Pasteur de Tunis to give practical training using real biological data in different biological scenarios. This will help you see how you can work with data in your own field of biology.
Basic commands
- cd: change the current directory (cd nameoffolder moves into nameoffolder)
- pwd: print the working directory (the full path of the folder you are in)
- ls: list the files in the current directory
- cd ..: move up one folder (to the parent directory)
- clear: clear the terminal
- ls -l: list the contents of the directory in long form (file names with permissions, sizes, dates and times)
- ls -t: sort files by modification time, newest first
- ls -ltr: long listing sorted by modification time in reverse order, so the most recently modified files appear at the bottom
- mkdir nameofnewfolder: create a new folder in the working directory
- cp nameoffile nameoffolder: copy a file into a folder
- rm nameoffile: remove (permanently delete) a file
- mv nameoffile nameoffolder: move a file from one folder to another (like cut/paste); mv oldname newname renames a file
- man ls: show the manual page for ls, with all of its options (man works for other commands too)
- screen -r: resume (reattach to) a detached screen session
- screen -ls: list the active screen sessions
- bash nameoffile: run a bash script
- sacct: check the status of your jobs in the Hive queue
- echo $nameofvariable: print the value of a variable
- which nameofcommand: show the full path of the program that runs when you type that command
- ls *.txt: list all the .txt files in the directory
- less nameoffile: show the content of a file one page at a time. If you want to look at a very long file, use less: you can scroll up and down through the file, press space to advance one page, and press q to quit.
- head nameoffile: display the first lines of a file (10 by default)
- tail nameoffile: display the last lines of a file (10 by default)
- cat nameoffile: display the contents of a file
- sort: order lines alphabetically (use sort -n to order them numerically)
- uniq: remove repeated lines, but only if they are adjacent to each other (so it is usually run after sort)
- grep: search for lines that match a regular expression and print them (the name comes from "globally search for a regular expression and print")
- touch nameoffile: create a new, empty file
- chown: change the user ownership of a file
- chgrp: change the group ownership of a file
- chmod: change the permissions of a file. For example, chmod 666 file.txt makes file.txt readable and writable by its owner, its group, and all other users (chmod 600 file.txt would restrict this to the owner only), and chmod g+w file.txt adds write permission for members of the file's group.
- set: without arguments, print the names and values of all shell variables (including environment variables) and shell functions; with an option such as set -x, turn a shell option on (set +x turns it off again)
- unset: delete shell and environment variables
- export: set environment variables, which are passed on to the programs and scripts you start from the shell (e.g. export name="Sam")
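A short practice session can help tie several of these commands together. The sketch below works inside a temporary folder created with mktemp -d, so it does not touch your own files; the file names and the chromosome list are made up for illustration.

```shell
#!/usr/bin/env bash
# Practice in a throwaway folder so nothing in your real directories is changed
practice_dir=$(mktemp -d)
cd "${practice_dir}" || exit 1
pwd                                   # print the working directory

mkdir results                         # create a new folder
touch notes.txt                       # create a new, empty file
printf 'chr2\nchr1\nchr2\nchr10\n' > chromosomes.txt   # hypothetical sample data

cp chromosomes.txt results            # copy the file into the folder
ls -l results                         # long listing of the folder's contents

# sort puts identical lines next to each other, so uniq can remove the repeats
sort chromosomes.txt | uniq > unique_chromosomes.txt
head unique_chromosomes.txt           # alphabetical order: chr1, chr10, chr2
```

Note that plain sort places chr10 before chr2, because it compares text rather than numbers. When you are finished, you can delete the practice folder with rm -r "${practice_dir}".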
To assign a variable, we use the = symbol, with no spaces around it:
name="Sam"
Because a variable's value may contain whitespace, which bash would otherwise split into separate words, it's good practice to encase the variable in double quotes when you use it. Wrapping the variable name in curly brackets also marks clearly where the name ends:
echo "${name}"
You can initialise an array by assigning values, separated by spaces, inside round brackets. For example:
array=("value 1" "value 2" "value 3")
Each value in an array is known as an element. Each element in an array is referenced by a numerical index. This index starts at 0. The syntax to access the first value in our array would be:
echo "${array[0]}"
We can return all of the values in our array by using the @ symbol:
echo "${array[@]}"
Modern Bash syntax for conditional expressions encases our comparative expression inside double square brackets ([[ and ]]).
The syntax for this is:
[[ option arg1 ]] or [[ arg1 operator arg2 ]]
A conditional expression evaluates to true or false: if the condition is met, it returns true (an exit status of 0), and if not, false (a non-zero exit status). Returns true if the file exists:
[[ -e ${file} ]]
Returns true if the file exists and is a directory:
[[ -d ${directory} ]]
Returns true if the file exists and is a regular file:
[[ -f ${file} ]]
Returns true if the file exists and is readable:
[[ -r ${file} ]]
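In practice these tests are usually combined with if (covered below); && joins two tests and ! negates one. This is a sketch only: the folder and file names are hypothetical and are created in a temporary directory.

```shell
#!/usr/bin/env bash
# Create a hypothetical file and folder to test against
workdir=$(mktemp -d)
file="${workdir}/reads.txt"
directory="${workdir}/results"
touch "${file}"
mkdir "${directory}"

if [[ -f ${file} && -r ${file} ]]          # both tests must be true
then
    echo "${file} is a regular, readable file"
fi

if [[ -d ${directory} ]]
then
    echo "${directory} is a directory"
fi

if [[ ! -e ${workdir}/missing.txt ]]       # ! negates the test
then
    echo "missing.txt does not exist"
fi
```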
Strings, as sequences of characters, can be compared. There are two string conditional expressions you need to be aware of:
- == is equal to
- != is not equal to
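This condition will return true if string1 and string2 are identical:
[[ "${string1}" == "${string2}" ]]
This condition will return true if string1 and string2 are different from one another:
[[ "${string1}" != "${string2}" ]]
Quote the right-hand side: inside [[ ]], an unquoted right-hand operand of == or != is treated as a glob pattern rather than as a literal string.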
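The sketch below, using hypothetical file names, shows how quoting the right-hand side changes a comparison: quoted, == compares literal text; unquoted, the right-hand value is used as a glob pattern, so * matches any characters.

```shell
#!/usr/bin/env bash
string1="sample_A.fastq"
string2="*.fastq"

# Quoted right-hand side: literal comparison, so the strings differ
if [[ "${string1}" == "${string2}" ]]; then literal="equal"; else literal="different"; fi

# Unquoted right-hand side: string2 acts as a pattern and matches string1
if [[ ${string1} == ${string2} ]]; then pattern="matches"; else pattern="no match"; fi

echo "literal comparison: ${literal}"    # literal comparison: different
echo "pattern match: ${pattern}"         # pattern match: matches
```

Pattern matching can be handy on purpose (for example, to check a file extension), but when you mean "these two strings are identical", keep the quotes.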
Redirect the standard output of a script into a file (output.txt is created, or overwritten if it already exists):
bash script.sh > output.txt
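Standard output is only one of a script's output streams; error messages go to standard error. In the sketch below, script.sh is a hypothetical two-line script created on the fly: > overwrites a file, >> appends to it, 2> redirects standard error, and 2>&1 sends standard error to the same place as standard output.

```shell
#!/usr/bin/env bash
# Work in a temporary folder so no real files are overwritten
outdir=$(mktemp -d)
cd "${outdir}" || exit 1

# Hypothetical script: one line to standard output, one line to standard error
printf 'echo "result line"\necho "warning line" >&2\n' > script.sh

bash script.sh > output.txt                  # stdout to output.txt; stderr still shows on screen
bash script.sh >> output.txt 2> errors.txt   # >> appends stdout; 2> sends stderr to errors.txt
bash script.sh > all.txt 2>&1                # both streams go to all.txt
```

The order in the last line matters: 2>&1 copies wherever standard output points at that moment, so it must come after > all.txt.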
Conditional statements come in many forms. The most basic form essentially says: IF our conditions are met, THEN execute the following code. The basic syntax is:
> if [[ condition ]]
> then
> command
> fi
Conditional statements can be extended with a second clause by using an if..else statement. Here we are saying: IF the conditions are met, THEN execute the first set of commands; ELSE (if they are not met), execute a different set of commands. The syntax for this looks like:
> if [[ condition ]]
> then
> command1
> else
> command2
> fi
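As a concrete sketch, the hypothetical function check_reads below uses if..else with a numeric test: -ge means "greater than or equal to" and compares numbers, whereas == compares strings. The read counts and threshold are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical check: does a sample have at least the minimum number of reads?
check_reads() {
    local reads=$1          # number of reads in the sample
    local min_reads=$2      # minimum required
    if [[ ${reads} -ge ${min_reads} ]]
    then
        echo "pass"
    else
        echo "fail"
    fi
}

check_reads 1200 1000       # pass
check_reads 800 1000        # fail
```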
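A case statement compares a value against each pattern in turn and runs the commands under the first pattern that matches. ;; ends each branch, and the *) pattern matches anything, so it acts as the default. The case syntax looks like this: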
> case $string in
> pattern_1)
> command
> ;;
> pattern_2)
> alternate command
> ;;
> *)
> default command
> ;;
> esac
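A sketch of case with a hypothetical helper, describe_file, that classifies file names by their extension. A | separates alternative patterns within one branch, and *) catches everything else.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe a file by its extension
describe_file() {
    local filename=$1
    case ${filename} in
        *.fastq|*.fq)
            echo "reads"
            ;;
        *.bam)
            echo "alignment"
            ;;
        *)
            echo "unknown"
            ;;
    esac
}

describe_file sample1.fastq   # reads
describe_file sample1.bam     # alignment
describe_file notes.txt       # unknown
```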
The basic syntax for a for loop is:
> for variable in ${list}
> do
> # Execute some commands
> done
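For example, a for loop can visit every .txt file in a folder. This sketch creates two empty, hypothetical files in a temporary directory and counts them; $((count + 1)) is bash arithmetic expansion.

```shell
#!/usr/bin/env bash
# Create two empty, hypothetical .txt files in a temporary folder
workdir=$(mktemp -d)
touch "${workdir}/sample1.txt" "${workdir}/sample2.txt"

count=0
for file in "${workdir}"/*.txt
do
    echo "processing ${file}"
    count=$((count + 1))          # add 1 to count
done
echo "${count} files processed"   # 2 files processed
```

If no file matches the pattern, bash passes the pattern itself as a single item, so the loop runs once with the literal text "*.txt" unless shopt -s nullglob is set.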
A while loop repeats its commands as long as the condition is true. The basic syntax for a while loop is:
> while [[ condition ]]
> do
> # Commands to run
> done
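A minimal while sketch: the loop runs as long as the condition is true, so the counter variable (a made-up name) must change inside the body, otherwise the loop never ends.

```shell
#!/usr/bin/env bash
# Count down from 3; the loop stops as soon as the condition becomes false
counter=3
while [[ ${counter} -gt 0 ]]
do
    echo "counter is ${counter}"
    counter=$((counter - 1))
done
echo "done, counter is ${counter}"   # done, counter is 0
```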
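An until loop is the reverse of a while loop: it repeats its commands as long as the condition is false and stops as soon as it becomes true. The basic syntax for an until loop is:
> until [[ condition ]]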
> do
> # Commands to run
> done
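The until loop below is a minimal sketch that keeps doubling a number until it reaches at least 100; the body runs only while the condition is still false.

```shell
#!/usr/bin/env bash
# Double the value until it is at least 100 (1, 2, 4, ..., 64, 128)
value=1
until [[ ${value} -ge 100 ]]
do
    value=$((value * 2))
done
echo "${value}"   # 128
```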
Bash function syntax is pretty straightforward. Start off by defining the function name, followed by parentheses. The commands that we want to execute are found between the curly brackets and are known as the body of the function.
> function my_function() {
> #some code
> }
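A slightly fuller sketch of a function: arguments arrive as $1, $2 and so on, local keeps a variable private to the function, and return sets the function's exit status. count_lines is a hypothetical name, and it is written in the name() { ... } form, which bash accepts with or without the function keyword.

```shell
#!/usr/bin/env bash
# Hypothetical function: print the number of lines in a file
count_lines() {
    local file=$1                          # first argument passed to the function
    if [[ ! -f ${file} ]]
    then
        echo "no such file: ${file}" >&2   # error message to standard error
        return 1                           # non-zero exit status signals failure
    fi
    wc -l < "${file}"
}

tmp=$(mktemp)
printf 'line 1\nline 2\n' > "${tmp}"
count_lines "${tmp}"                       # prints the line count, 2
```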
AWK is a programming language that is particularly useful for processing text files and extracting data, especially when a file is split into columns or delimited by a specific character (e.g. a comma or a tab).
- awk -F'\t' '{print $1}' Diamonds_fix.txt: print the value in the first column of the file Diamonds_fix.txt.
- awk -F'\t' '$2=="Ideal" {print $0}' Diamonds_fix.txt: print only the lines of Diamonds_fix.txt in which column 2 (cut) contains the value "Ideal".
Patterns can be combined using the && symbol (for and), so a line is printed only if two or more conditions are met; when no {action} is given, awk prints the whole matching line. For example:
awk -F'\t' '$2=="Ideal" && $4=="SI2"' Diamonds_fix.txt
In addition to strings, awk can also filter on numeric values. For example:
awk -F'\t' '$1>1' Diamonds_fix.txt prints all lines in which the first column has a value greater than 1.
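The sketch below runs the same kind of filters on a small, made-up tab-delimited file whose columns are assumed to be carat, cut, color and clarity (it is not the real Diamonds_fix.txt). NR > 1 skips the header line; without it, the header text would be compared as a string and could be printed or counted by mistake. The END block runs once after the last line, here to print a count.

```shell
#!/usr/bin/env bash
# Build a tiny, hypothetical tab-delimited file with a header line
tsv=$(mktemp)
printf 'carat\tcut\tcolor\tclarity\n' > "${tsv}"
printf '0.23\tIdeal\tE\tSI2\n' >> "${tsv}"
printf '1.50\tIdeal\tG\tVS1\n' >> "${tsv}"
printf '1.20\tPremium\tH\tSI2\n' >> "${tsv}"

# Print the carat of every "Ideal" diamond, skipping the header (NR > 1)
awk -F'\t' 'NR > 1 && $2 == "Ideal" {print $1}' "${tsv}"              # 0.23 and 1.50

# Count the rows with carat greater than 1
awk -F'\t' 'NR > 1 && $1 > 1 {count++} END {print count}' "${tsv}"   # 2
```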
Shared software on Hive is installed in /data/apps and is managed with the module utility Lmod, which lets you load and unload the software, and the specific versions of it, that you need. You can read about module here.
Use the command module avail to see which software is already installed in the shared location.
