syntax.us Let the syntax do the talking


Linux Shell 101

Question:
In Shell101, how do I use awk on a Yahoo CSV file?

To do Machine Learning on Linux I need to process the data I get from Yahoo.

A common tool for CSV files in shell programming is awk.
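Before the full demo, here is a minimal sketch of how awk splits one CSV row into fields. The sample row is copied from the GSPC.csv output shown further down; the variable names line, first_and_fifth and field_count are my own choices for illustration.

```shell
# One sample row in the same layout as the Yahoo file.
line='2015-02-19,2099.25,2102.13,2090.79,2097.45,3247100000,2097.45'

# -F, tells awk to split each row on commas.
# $1 is the first field (Date) and $5 is the fifth field (Close).
first_and_fifth=$(printf '%s\n' "$line" | awk -F, '{print $1 "," $5}')
echo "$first_and_fifth"

# NF is a built-in awk variable holding the number of fields in the row.
field_count=$(printf '%s\n' "$line" | awk -F, '{print NF}')
echo "$field_count"
```

The first echo should show the date and the close price, and the second should show 7, one for each column in the Yahoo header.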

In this demo I want to use awk to remove columns from the GSPC.csv file:
#!/bin/bash

# ~ann/awk1_gspc.bash

# This script should awk a CSV file from Yahoo.

# I should cd to the right folder, and stop if that fails:

cd ~ann || exit 1

export TKR='GSPC'

# The vertical bar is a 'pipe'.
# A pipe is similar to the '>' redirect operator.
# A redirect takes rows from the left and writes them into a file.
# I often see pipes in shell programs.
# A pipe takes rows from the left and feeds them to the right.

# To the right of the pipe I place a shell command which wants to eat rows.
# Many shell commands want to eat rows from the left.
# For example, sed likes to eat rows.
# Usually these commands also want to feed rows to the right.

# In the syntax below cat feeds rows to awk.
# Then awk transforms each row into a short row.
# Next awk wants to feed rows to something else.

# But I am happy with the rows, so I use '>' to redirect awk's output into a file:
cat "${TKR}.csv" | awk -F, '{print $1 "," $5}' > "${TKR}2.csv"
# Now ${TKR}2.csv contains only columns 1 and 5 from ${TKR}.csv

# Did awk do what I want?
head "${TKR}.csv" "${TKR}2.csv"

exit
When I run the above script I see this:
ann@feb ~ $ 
ann@feb ~ $ 
ann@feb ~ $ date|cat|cat|cat
Fri Feb 20 11:19:05 UTC 2015
ann@feb ~ $ 
ann@feb ~ $ date
Fri Feb 20 11:19:08 UTC 2015
ann@feb ~ $ 
ann@feb ~ $ ./awk1_gspc.bash
==> GSPC.csv <==
Date,Open,High,Low,Close,Volume,Adj Close
2015-02-19,2099.25,2102.13,2090.79,2097.45,3247100000,2097.45
2015-02-18,2099.16,2100.23,2092.15,2099.68,3370020000,2099.68
2015-02-17,2096.47,2101.30,2089.80,2100.34,3361750000,2100.34
2015-02-13,2088.78,2097.03,2086.70,2096.99,3527450000,2096.99
2015-02-12,2069.98,2088.53,2069.98,2088.48,3788350000,2088.48
2015-02-11,2068.55,2073.48,2057.99,2068.53,3596860000,2068.53
2015-02-10,2049.38,2070.86,2048.62,2068.59,3669850000,2068.59
2015-02-09,2053.47,2056.16,2041.88,2046.74,3549540000,2046.74
2015-02-06,2062.28,2072.40,2049.97,2055.47,4232970000,2055.47

==> GSPC2.csv <==
Date,Close
2015-02-19,2097.45
2015-02-18,2099.68
2015-02-17,2100.34
2015-02-13,2096.99
2015-02-12,2088.48
2015-02-11,2068.53
2015-02-10,2068.59
2015-02-09,2046.74
2015-02-06,2055.47
ann@feb ~ $ 
ann@feb ~ $ 
ann@feb ~ $ 
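
The same result can be had without cat, because awk can open a file by itself. The sketch below is a variation, not a replacement for the script above. It assumes a scratch folder made by mktemp and a three-row sample copied from the head output above, so it does not touch the real GSPC.csv. The workdir variable and the width check at the end are my own additions.

```shell
# Work in a scratch folder so nothing in the home folder is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

TKR='GSPC'

# Three rows copied from the head output above, standing in for the real download.
cat > "${TKR}.csv" <<'EOF'
Date,Open,High,Low,Close,Volume,Adj Close
2015-02-19,2099.25,2102.13,2090.79,2097.45,3247100000,2097.45
2015-02-18,2099.16,2100.23,2092.15,2099.68,3370020000,2099.68
EOF

# awk reads the file named after the program, so no pipe is needed here.
# OFS is awk's output field separator: 'print $1, $5' joins the fields with it.
awk -F, -v OFS=, '{print $1, $5}' "${TKR}.csv" > "${TKR}2.csv"

# Did awk do what I want? Count rows that do not have exactly two fields.
awk -F, 'NF != 2 {bad++} END {print (bad ? "unexpected row width" : "all rows have 2 fields")}' "${TKR}2.csv"

cat "${TKR}2.csv"
```

Setting OFS saves me from typing "," between every pair of fields, which helps when I want to keep more than two columns.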


So, that is the fourth shell programming demo of shell101.



You can ask questions in Dan's Machine Learning Class Forum:
https://groups.google.com/forum/#!forum/dan101
