syntax.us Let the syntax do the talking

Question:
In Python Pandas, how do I read a CSV file?

For this demo I used some shell commands to download a CSV file from Yahoo with wget and to swap Yahoo's header line for my own:
dan@hp ~/x611 $ 
dan@hp ~/x611 $ 
dan@hp ~/x611 $ cd /tmp
dan@hp /tmp $ 
dan@hp /tmp $ echo Pandas likes a header line in my CSV file > jnk.txt
dan@hp /tmp $ 
dan@hp /tmp $ echo ydate,opn,hgh,lo,cp,vol,adjc > spydata.csv
dan@hp /tmp $ 
dan@hp /tmp $ wget --output-document=spy.csv 'http://ichart.finance.yahoo.com/table.csv?s=SPY'
--2015-01-01 02:01:37--  http://ichart.finance.yahoo.com/table.csv?s=SPY
Resolving ichart.finance.yahoo.com (ichart.finance.yahoo.com)... 206.190.61.106, 206.190.61.107, 216.115.107.206, ...
Connecting to ichart.finance.yahoo.com (ichart.finance.yahoo.com)|206.190.61.106|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘spy.csv’

    [ <=>                                   ] 292,247     --.-K/s   in 0.1s    

2015-01-01 02:01:37 (2.35 MB/s) - ‘spy.csv’ saved [292247]

dan@hp /tmp $ 
dan@hp /tmp $ echo I dont like the header which Yahoo gives me > jnk.txt
dan@hp /tmp $ echo so I remove it with grep > jnk.txt
dan@hp /tmp $ 
dan@hp /tmp $ grep -v Date spy.csv >>  spydata.csv
dan@hp /tmp $ 
dan@hp /tmp $ head spydata.csv
ydate,opn,hgh,lo,cp,vol,adjc
2014-12-31,207.99,208.19,205.39,205.54,123713700,205.54
2014-12-30,208.21,208.37,207.51,207.60,73540800,207.60
2014-12-29,208.22,208.97,208.14,208.72,79643900,208.72
2014-12-26,208.31,208.85,208.25,208.44,57326700,208.44
2014-12-24,208.02,208.34,207.72,207.77,42963400,207.77
2014-12-23,208.17,208.23,207.40,207.75,122167900,207.75
2014-12-22,206.75,207.47,206.46,207.47,148318900,207.47
2014-12-19,206.43,207.33,205.61,206.52,245084600,206.52
2014-12-18,204.74,212.97,203.92,206.78,247780600,205.64
dan@hp /tmp $ 
dan@hp /tmp $ 
Then I used the Pandas read_csv() function to load the CSV into a DataFrame object:
dan@hp /tmp $ 
dan@hp /tmp $ python
Python 2.7.9 |Anaconda 2.1.0 (64-bit)| (default, Dec 12 2014, 14:52:24) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import pandas as pd
>>> csv1 = pd.read_csv('/tmp/spydata.csv')
>>> csv1.head()
        ydate     opn     hgh      lo      cp        vol    adjc
0  2014-12-31  207.99  208.19  205.39  205.54  123713700  205.54
1  2014-12-30  208.21  208.37  207.51  207.60   73540800  207.60
2  2014-12-29  208.22  208.97  208.14  208.72   79643900  208.72
3  2014-12-26  208.31  208.85  208.25  208.44   57326700  208.44
4  2014-12-24  208.02  208.34  207.72  207.77   42963400  207.77
>>> csv1.tail()
           ydate    opn    hgh     lo     cp      vol   adjc
5517  1993-02-04  44.97  45.09  44.47  45.00   531500  29.90
5518  1993-02-03  44.41  44.84  44.38  44.81   529400  29.77
5519  1993-02-02  44.22  44.38  44.12  44.34   201300  29.46
5520  1993-02-01  43.97  44.25  43.97  44.25   480500  29.40
5521  1993-01-29  43.97  43.97  43.75  43.94  1003200  29.19
>>> 
>>> 
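The grep step in the shell session can also be handled by read_csv() itself. Here is a minimal sketch, written for Python 3 and a recent Pandas rather than the Python 2.7 session above. The header line in the sample text is an assumption about what Yahoo sent; the data rows are copied from the output above. With header=0 the first line is treated as a header and replaced by the names I pass in, and parse_dates turns ydate into real dates:

```python
import io
import pandas as pd

# Sample text standing in for spy.csv. The header line is an assumption
# about what Yahoo returned; the data rows are copied from the output above.
raw = """Date,Open,High,Low,Close,Volume,Adj Close
2014-12-31,207.99,208.19,205.39,205.54,123713700,205.54
2014-12-30,208.21,208.37,207.51,207.60,73540800,207.60
2014-12-29,208.22,208.97,208.14,208.72,79643900,208.72
"""

cols = ['ydate', 'opn', 'hgh', 'lo', 'cp', 'vol', 'adjc']

# header=0 marks the first line as a header row, which names= then replaces.
# parse_dates converts the ydate strings into datetime64 values.
csv2 = pd.read_csv(io.StringIO(raw), header=0, names=cols, parse_dates=['ydate'])

print(csv2.head())
print(csv2.dtypes)
```

To read the downloaded file instead of the sample text, pass '/tmp/spy.csv' in place of io.StringIO(raw).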
The DataFrame comes back newest-first, and I do not like that order.

I want it ordered by ydate ascending:
/posts/python_pandas_sort
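
A minimal sketch of that ordering, assuming a recent Pandas where DataFrame.sort_values() is available (the linked post may use a different call). The three rows are copied from the output above:

```python
import pandas as pd

# Three rows copied from the head() output above, newest first
df = pd.DataFrame({
    'ydate': ['2014-12-31', '2014-12-30', '2014-12-29'],
    'cp': [205.54, 207.60, 208.72],
})

# Dates in YYYY-MM-DD form sort correctly even as plain strings.
# reset_index(drop=True) renumbers the rows after sorting.
sorted_df = df.sort_values('ydate').reset_index(drop=True)

print(sorted_df)
```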

