Computing Cheatsheet (Bash, regex, sed, awk, grep, find, unix commands, etc.)

xigayari's version from 2016-05-05 08:06

Bash/unix/linux/shell commands


code/command/shell variable    does/holds/is useful for
display -resize "200%" *.png &    open *.png for viewing
\ls, \grep, etc.    override aliases for commands (this is important for ls > foo.txt, which otherwise gives strange output formatting)
ls file_{6,3,0}.txt    will ls file_6.txt, file_3.txt, & file_0.txt, but not the other files
ls -d -1 $PWD/*.*    ls but with the full path showing
ls --hide "*.fits"    show all files NOT matching *.fits (GNU ls)
rm !(*.fits)    remove all files NOT matching *.fits (in bash this needs shopt -s extglob)
head -n 5 foo.py    print first 5 lines of foo.py
tail -n 5 foo.py    print last 5 lines of foo.py
wget -O savename link    download the file link from the internet and save it to the name savename
rsync -a /path/from/ /path/to/ --include \*/ --exclude \*    ? rsync is like a more advanced version of cp (with --include \*/ --exclude \* only the directory tree is copied, not the files)
du -sh */    count the amount of data in each subdirectory of the current directory
command > output    overwrite output
command >> output    append to output
command 2>&1 | tee -a foo.log    print script output (stdout and stderr) to the screen and append it to file foo.log
xargs    passes piped output as arguments to a command. Might need this with some commands that don't play well with pipelines
history | cut -c 8-    history without line numbers
fs lq    shows how much of my AFS disk quota is available
df -h .    see how much space is available to me
$1, $2,..., $@    first input ($1), 2nd input ($2), ... , all inputs ($@)
$_    last argument of the previous command
LAST="`cat /tmp/x`"; exec >/dev/tty; exec > >(tee /tmp/x)    env var $LAST automatically contains output of last command
$$    process id of the current process
$0    name of the shell or shell script.
$_    at shell startup, the absolute file name of the shell or script being executed as passed in the argument list. Subsequently, it expands to the last argument to the previous command, after expansion. It is also set to the full pathname of each command executed and placed in the environment exported to that command.
$?    exit status of the most recently executed foreground pipeline.
quote string "${var}"    the expanded version of the shell variable, i.e. if you already ran export var=happy it is interpreted as happy
(ex. sed "/${cluster}/d" foo deletes lines with the expanded version of the variable in them!)
quote string '${var}'    won't expand shell variable var, i.e. it is interpreted as the literal text ${var}
(ex. sed '/${cluster}/d' foo deletes lines with the literal text ${cluster} in them)
command uniq    removes repeated consecutive lines (should use sort first)
command sort    sorts lines
command let cut=${rms}*5    let does basic integer math, i.e. now ${cut} is defined
(ex. for i in {1..3} ; do let num=i*2000 ; makesubimage ${num} 3500 2000 4000 < MACS0018+16_W-C-RC_coadd.fits > MACS0018+16_W-C-RC_coadd_sub${i}.fits ; done)
kill -9 PID#    kill job with PID# (works from separate terminal!)
ps -u awright    list all jobs you are running on machine
condition?result-if-true:result-if-false    if/else (ternary operator) in bash; it only works inside arithmetic, e.g. $(( a > b ? a : b ))
for i in {1..5}; do echo $i; done    loop over numbers 1-5
&&    use to chain commands together (rather than ;) so that the chain will stop if something exits abnormally
for ((CHIP=1;CHIP<=${NCHIPS};CHIP++)); do
if [ -e SUBARU_${CHIP}.reg ]; then
cp SUBARU_${CHIP}.reg globalweight_${CHIP}.reg
fi
done
use to loop over 10 chips (with NCHIPS=10), copying each region file that exists
echo ${out} | od -c    see details of an env variable (ex. this can help determine if there is a hidden escape sequence/etc.)
atime – Access Time (ls -lu)    last time the data from a file was accessed, i.e. read by one of the Unix processes directly or through commands and scripts.
ctime – Change Time (ls -lc)    last time the file's inode (owner, permissions, etc.) or its contents changed.
mtime – Modify Time (ls -l)    time of the last change to the file's contents. It does not change with owner or permission changes, and is therefore used for tracking the actual changes to data of the file itself.
stat foo.any    get Access/Modify/Change time as well as user who owns the file (Uid) and other info
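The quoting and chaining rows above can be tried out with a small sketch like the following. The variable value and the sample lines are made up for illustration; only the sed, && and $? behaviour comes from the table.

```shell
# Hypothetical sample data: three lines in a temporary file.
cluster="MACS0018"
tmp=$(mktemp)
printf '%s\n' 'MACS0018 line' 'literal ${cluster} line' 'other line' > "$tmp"

# Double quotes: ${cluster} is expanded, so sed deletes lines containing MACS0018.
expanded=$(sed "/${cluster}/d" "$tmp")

# Single quotes: nothing is expanded, so sed deletes lines containing the text ${cluster}.
literal=$(sed '/${cluster}/d' "$tmp")

# && only runs the second command if the first one succeeds;
# $? holds the exit status of the chain (1 here, because false failed).
false && echo "this is never printed"
status=$?

echo "double quotes kept:"; echo "$expanded"
echo "single quotes kept:"; echo "$literal"
echo "exit status of the failed chain: $status"
rm -f "$tmp"
```

Using ; instead of && in the last step would run the echo regardless of whether the first command failed.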

scp and sftp (copy and move across computers)

sftp: put an 'l' in front of a command to run it on the local machine. So in the sftp interpreter use "lls" to list local files or "lcd" to change the local directory


Question Answer
scp myfile    myfile to path
scp -r ./path    copy remote directory here

awk, find, regex, vi, sed, and grep

awk (and paste/cat/column/sort/uniq/other ascii things)

Question Answer
use awk NR    for the Number of Rows (records read so far)
awk -v filter="${filt}" '{s+=$1;print filter " pretty (used) ave exptime is " s/NR}' foo_${filt}.txt    get average of column 1 (prints the running average on every line; the last line is the average of the whole column)
awk 'NR%30==1' all.bcc.in    take every 30th row of file
awk '{print $4, $10}' some.dat > some_4_10.dat    make new file some_4_10.dat with columns 4 and 10 from some.dat
awk '{if ($1<23) print $1, $2}' bccall_i.txt    print columns 1 and 2 of rows where the element in column 1 is less than 23 (replacing $1, $2 with $0 will print entire row)
awk 'BEGIN{do this once, before main loop}{do this thing on every line}END{do this once, after main loop}' foo.txt    BEGIN, END, and do on all lines
awk '{if (condition1 && condition2 || condition3){truecommand} else {falseCommand}}' foo.txt    if else (with and/or statements); the if must sit inside a { } action block
awk 'NR>11 {command}' foo.txt    exclude header (assuming 11 lines in header)
awk '{print $4, $7, $5, "Line4" }' infile    pick out columns 4, 7, and 5, then print "Line4" in the last column
awk 'BEGIN {max = 0} {if ($3>max) max=$3} END {print max}' foo.txt    print max of column 3
awk '{sum+=$4} END { print "Average = ",sum/NR}' infile    print average of column 4
awk 'BEGIN {min = 1000} NR>15 {if (0<$4 && $4<min) min=$4} END {print min}' infile    print min of all positive values in column 4, excluding the first 15 header lines
awk '{sum+=$4}END{print sum}' foo.txt    sum 4th column
awk '{sum+=$5}END{print sum/3600}' foo.txt    sum time [s] in 5th column and print time in hours
awk '{sum+=$5;n++}END{print (sum+30*n)/3600}' foo.txt    ? (sums the 5th column, adds 30 for every line, and divides by 3600)
awk 'FNR==NR{if(m    fill empty spots with nan
awk '{print substr($N,1,8)}' foo.txt    print Nth column's first 8 characters (substr positions start at 1)
awk use FNR==6    to refer to row 6
awk '{if (FNR==1){print} else {commands}}' infile.txt | column -t > outfile.txt    print header, but don't mess with it
awk use %3.5f    to print with 5 places after the decimal and a minimum total field width of 3 characters
awk use %s with printf    to print it as is
awk '{if (FNR==1){print} else {printf "%s %s %2.2f %2.3f %2.3f %2.3f %2.3f \n",$1,$2,$3,$4,$5,$6,$7}}' infile.txt | column -t > outfile.txt    print header unchanged, print columns 1 and 2 as they are, round column 3 to 2 decimal places and columns 4-7 to 3 decimal places
use awk ~    kinda like "in" in python (regex match; !~ is "not in")
awk '($1 !~ /^(#|word1|word2)/){print}' infile.txt    print all lines that don't begin with `#` or `word1` or `word2`
awk '{if($8~"DomeFlat"){print "nan"}else{print $8}}' infile.txt > outfile.txt    "~" is kinda like "in" in python. This prints column 8, replacing any entry containing DomeFlat with nan
awk '{if ($4=="Empty"){}else{print}}' infile.txt > outfile.txt    get rid of "Empty" filters (drops lines whose column 4 is Empty)
./ infile1.txt infile2.txt > outfile_first_vals_match.txt    paste two files, but only where the values in the first column match up; if infile1's val1 isn't in infile2, then nan's are pasted in instead
dfits foo*.fits.fz | fitsort key1 key2 key3 > key123.txt    dfits simple example
cat f1.txt f2.txt f3.txt | sort -g | column -t > f123.txt    concatenate, sort numerically, and make nice straight columns
column -t infile    align things
sort -g infile    sort things by first column entries
paste -d ';' infile1.txt infile2.txt    paste together with columns separated by ;
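As a quick way to check the awk recipes above, here is a minimal sketch on a throwaway file. The file contents and column names are invented for illustration; the awk patterns follow the table (header skipping with NR, sum/average, max, and printf with the header passed through).

```shell
# Hypothetical data file: one header line followed by three rows.
data=$(mktemp)
printf '%s\n' 'name exptime mag' 'a 100 21.5' 'b 200 23.5' 'c 300 22.0' > "$data"

# Average of column 2, skipping the 1-line header.
# Rows are counted with n because NR would also count the header.
avg=$(awk 'NR>1 {sum+=$2; n++} END {print sum/n}' "$data")

# Max of column 3, seeded from the first data row rather than a fixed 0.
max=$(awk 'NR==2 {max=$3+0} NR>2 && $3>max {max=$3+0} END {print max}' "$data")

# Print the header untouched and format every other line with printf.
formatted=$(awk 'FNR==1 {print; next} {printf "%s %d %.2f\n", $1, $2, $3}' "$data")

echo "average exptime: $avg"
echo "max mag: $max"
echo "$formatted"
rm -f "$data"
```

Seeding max from the first data row avoids the wrong answer that BEGIN {max = 0} gives when every value in the column is negative.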


Question Answer
find . -iname "*.py" -exec ls {} \;    find and execute ls on files (file name = {})
find . -type d -empty -exec rmdir {} \;    find & remove all empty directories
find . -iname "*.f90" | xargs grep "vmax"    find files in directory hierarchy that end with .f90 and print the lines in them that contain "vmax"
find /u/ki/awright/data/2010-02-12_W-C-IC/WEIGHTS -printf "%T+\n" | sort -nr | head -n 1    print the latest modification date of any of the files in this dir or any of its subdirs (lists entries in order of modification date, newest first)
find . -type d -exec mkdir -p path_to/{} \;    find directories and copy directory tree/structure to new location
find . -type f -exec ln -s path_from/{} path_to/{} \;    find files and make symbolic link to them
find . -mtime -1    find today's files (modified within the last 24 hours)
find . -mtime -2    means files that are less than 2 days old, such as a file that is 0 or 1 days old.
find . -mtime +2    means files that are more than 2 days old... {3, 4, 5, ...}
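The -mtime rows can be checked on a scratch directory with a sketch like this one. It assumes GNU touch (the -d '4 days ago' date string is a GNU extension), and the file names are made up.

```shell
# Scratch directory with one fresh file and one file back-dated by 4 days.
dir=$(mktemp -d)
touch "$dir/new.txt"
touch -d '4 days ago' "$dir/old.txt"   # assumption: GNU touch

# -mtime -1: modified less than 1 day (24 hours) ago.
recent=$(find "$dir" -type f -mtime -1)

# -mtime +2: age in whole days is greater than 2, i.e. 3 days or older.
old=$(find "$dir" -type f -mtime +2)

echo "recent: $recent"
echo "old:    $old"
rm -rf "$dir"
```

find counts age in whole 24-hour periods and drops the fraction, which is why +2 starts at 3 days rather than at 2 days and a bit.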


Example: grep import *.py -h | sort | uniq will print out a unique list of all lines containing the word "import" in all python files in the directory
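A small sketch of that pipeline on two made-up python files, which also shows why the sort is needed before uniq (the file names and contents here are hypothetical):

```shell
# Two hypothetical python files that share the same imports in a different order.
src=$(mktemp -d)
printf '%s\n' 'import os' 'import sys' 'print("hi")' > "$src/a.py"
printf '%s\n' 'import sys' 'import os' 'x = 1' > "$src/b.py"

# -h drops the "filename:" prefix, sort groups duplicates, uniq collapses them.
imports=$(grep -h import "$src"/*.py | sort | uniq)

# Without sort, uniq only removes duplicates that happen to be adjacent.
unsorted_count=$(grep -h import "$src"/*.py | uniq | wc -l)

echo "$imports"
echo "lines left without sort: $unsorted_count"
rm -rf "$src"
```

With sort in place only the two distinct import lines remain; without it, three lines survive because only the adjacent pair of "import sys" lines is collapsed.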



grep and egrep (egrep = grep -E, it uses an extended language that's closer to the regular expressions I've documented in Master_notebook):
egrep foo /path/files* = find files with 'foo' in text and print lines with 'foo'
grep cmd/extension    what it does
egrep "(this|that)" infile    match "this" or "that" (in egrep the | is not escaped; plain grep would need "\(this\|that\)")
grep -o    prints out only the string that matches, not the whole line
-h    just print lines, not filename: line
-l    just print files, not all text in all lines
-L    list files WITHOUT foo
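The flags above can be compared side by side with a sketch like this. The three text files are invented for illustration, and grep -E is used in place of egrep since the two are equivalent.

```shell
# Three hypothetical files: two contain a match, one does not.
work=$(mktemp -d)
printf '%s\n' 'this is foo' 'nothing here' > "$work/one.txt"
printf '%s\n' 'only that' > "$work/two.txt"
printf '%s\n' 'neither' > "$work/three.txt"

# -o prints only the matched text, -h suppresses the file name prefix.
matches=$(grep -E -o -h "(this|that)" "$work"/*.txt)

# -l lists files that contain a match, -L lists files that do not.
with_match=$(grep -E -l "(this|that)" "$work"/*.txt)
without_match=$(grep -E -L "(this|that)" "$work"/*.txt)

echo "matched strings:"; echo "$matches"
echo "files with a match:"; echo "$with_match"
echo "files without a match:"; echo "$without_match"
rm -rf "$work"
```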
pcregrep is a version of grep that uses perl regex:
Question Answer
-M    multiline mode
(.|\n)*    match any length string, even if it's multi-line
((.|\n)*?)    lazy search: matches only the shortest possible match


pcregrep -M "(^def|^\t'((.|\n)*?)'$|^\treturn\ )" /u/ki/awright/InstallingSoftware/pythons/ matches all function stuff like this:
def name(inputs): =tags
'description line 1
description line n'
return outputs