syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me

Question:
In Python101 class, where is the Python we study?

The main idea for this class is to step through Python code and ask Google questions along the way.

The code comes in the form of two Python scripts.

The scripts should reside in ~/ann/

Also you need to install Anaconda to run the Python scripts.

Anaconda is easy to install:

http://continuum.io/downloads
# ~ann/py101.py

# This script should help you learn enough Python to interact with some of the scikit-learn API.
# If you see a question in this script,
# your homework is to type it into Google and study the search results.
# Some of the questions ask you about the data so Google cannot help you with that.

# Ref:
# http://www.meetup.com/Palo-Alto-Data-Science-Association/events/220701369/
# http://continuum.io/downloads#py34

# Demo:
# cd ~ann
# vi py101.py
# ~ann/anaconda3/bin/python py101.py

# q: In Python what is import?
# q: In Python what is subprocess?
import subprocess

# q: In Python what is subprocess.call?
# q: In Python what is []?
subprocess.call(['/bin/date'])

# q: what is tkrh?
tkrh='%5EGSPC'
tkr ='GSPC'

# q: In Python what is +?
subprocess.call(["/bin/rm", "-f", tkr+'.csv'])

# I should call a shell command like this:
# /usr/bin/wget --output-document=${TKR}.csv  http://ichart.finance.yahoo.com/table.csv?s=${TKRH}
cmd  = "/usr/bin/wget"
arg1 = "--output-document="+tkr+".csv"
arg2 = "http://ichart.finance.yahoo.com/table.csv?s="+tkrh


subprocess.call([cmd, arg1, arg2])
# q: In Python is double-quote same as single-quote?
subprocess.call(["/usr/bin/head", tkr+'.csv'])


# q: In Python how does 'as' change the action of import?
import pandas as pd

# q: In Python what does read_csv() do?
df1 = pd.read_csv(tkr+'.csv')

head1 = df1.head()

# q: In Python what is print()?
print(head1)

# q: Shell has tail, Pandas have tail?
tail1 = df1.tail()
print(tail1)

# q: In Python how to create DataFrame2 from DataFrame1?
df2 = df1[['Date','Close']]

# q: In Python How to change name of column in DataFrame?
df2.columns = ['cdate','cp']

head2 = df2.head()
print(head2)

# q: In Python what is pdb?
import pdb

# q: In Python what is a List?
mylist = [1,2,3,4,5]

# q: In Python how to call a for-loop?
for elm in mylist:
  print(elm)

list2 = []
for elm in mylist:
  # q: In Python how to append to a list?
  list2.append(elm * 2)
# That was 3 lines of code to build list2

# q: In Python what is a Comprehension?
list2  = [elm * 2 for elm in mylist]
# That was 1 line of code to build list2

# q: In Python what is an index?
mylist[0]
# q: In Python what does colon do in index?
mylist[0:3]
# q: In Python what does this do?
mylist[3:3]

# q: How I put members 1,3,4 into list134?
# Easy way:
list134  = [mylist[1], mylist[3], mylist[4] ]
# Other way:
list134  = [mylist[idx] for idx in [1,3,4]]

# q: In Python what is numpy?
import numpy as np

# q: In Pandas what does .values do?
myarray = df2[['cp']].values

# q: In Python, Pandas, NumPy how to create List from DataFrame Column?
list1  = [acp[0] for acp in myarray]

# q: In Python, how to copy a list?
list2  = list1
# Is list2 a copy or do list1, list2 simply point to the same list?

# q: In Python, what does len() do?
len1   = len(list1)

# q: In Python, what is the expression for the last index in a List?
last_idx = -1

# q: In Python, how to get last member of list?
last_mem = list1[last_idx]

# The command below should append a copy of the last element of list1 to list2:
# q: Does the command below also change list1?
list2.append(last_mem)

final_idx = len1
cplist = list1[0:final_idx]
cplag  = list2[0+1:final_idx+1]
len(cplist) == len(cplag)
len(cplist) == len1

# q: In Python, Pandas how to add List as DataFrame Column?
df2['cplag'] = cplag
# q: Does cplag column lag the cp column?
# q: How I compare cplag column to cp column?

# q: In Pandas, how to subtract two columns?
cp_col    = df2[['cp']].values
cplag_col = df2[['cplag']].values
# q: what type of object is cp_col, cplag_col?

# q: How to calculate lag_ptcg?
lag_pctg = 100.0 * (cp_col - cplag_col) / cplag_col

# q: How to add lag_pctg to df2 as a column?
lag_pctg_list = [alag_pctg[0] for alag_pctg in lag_pctg]

# pdb.set_trace()
df2['lag_pctg'] = lag_pctg_list

# q: How to create df2['lead_pctg'] from df2['lag_pctg']?
# q: In Python, how to concatenate 2 lists?
lead_pctg_list   = [0.0] + lag_pctg_list
df2['lead_pctg'] = lead_pctg_list[0:final_idx]

# q: How to create df2['lag_pctg_2day']?
cplag2        = cplag[0+1:final_idx+1]
cplag2        = cplag[0+1:final_idx+1] + [cplag[-1]]
df2['cplag2'] = cplag2
cplag2_col    = df2[['cplag2']].values
lag_pctg_2day        = 100.0 * (cp_col - cplag2_col) / cplag2_col
lag_pctg_2day_list   = [alag_pctg_2day[0] for alag_pctg_2day in lag_pctg_2day]
df2['lag_pctg_2day'] = lag_pctg_2day_list

#q: In Python, Pandas how to write DataFrame to CSV?
df2.to_csv('df2.csv', float_format='%4.3f', index=False)

# Next step: py102.py
# Done


# ~ann/py102.py

# Ref:
# http://www.meetup.com/Palo-Alto-Data-Science-Association/events/220701369/
# http://continuum.io/downloads#py34

# Demo:
# cd ~ann
# vi py102.py
# ~ann/anaconda3/bin/python py102.py

# Your homework is to type the questions into Google and study the answers.

# This script should work with a CSV file created by py101.py

# q: Where do I find good demos of Pandas and NumPy?
import pandas as pd
import numpy  as np

# q: In Python what is pdb?
import pdb

# q: In Python how to read CSV into Pandas DataFrame?
df2 = pd.read_csv('df2.csv')

# q: In Python how to copy Pandas Column into a List?
lag_pctg      = [elm[0] for elm in df2[['lag_pctg']].values     ]
lag_pctg_2day = [elm[0] for elm in df2[['lag_pctg_2day']].values]
lead_pctg     = [elm[0] for elm in df2[['lead_pctg']].values    ]

# q: In Python how to create empty NumPy array?
number_of_rows    = len(lag_pctg)
number_of_columns = 2
xvalues  = np.zeros((number_of_rows, number_of_columns))
yclasses = np.zeros((number_of_rows, 1                ))
# q: In Python how to fill NumPy array column with a List?
xvalues[:,0] = lag_pctg
xvalues[:,1] = lag_pctg_2day
yvalues      = np.array(lead_pctg)
yclasses     = yvalues > 0.0

# q: In Python what commands does pdb offer me in its UI?
# pdb.set_trace()

# q: In Python, NumPy how to get row-0 from Array?
x_oos = xvalues[0,:]

# q: In Python, NumPy how to get row-1 through row-1000 from Array?
x_is  = xvalues[1:1001,:]
y_is  = yclasses[1:1001 ]
# q: In Machine Learning what is difference between
#    Out OF Sample (OOS) and In Sample (IS) X values?

# q: In Python scikit-learn, what is KNeighborsClassifier?
from sklearn.neighbors import KNeighborsClassifier

# q: In scikit-learn, when I build a KNeighborsClassifier object,
#    what is n_neighbors? what is weights?
knn = KNeighborsClassifier(n_neighbors=len(y_is), weights='distance')

# q: In scikit-learn, when I build a KNeighborsClassifier object,
#    what does the fit() method do?
knn.fit(x_is, y_is)
# Why does the fit() method take two arguments?

print('I have just learned from '+str(len(y_is))+' observations.')

# q: In scikit-learn, when I build a KNeighborsClassifier object,
#    what does the predict_proba() method do?
upprob = knn.predict_proba(x_oos)[0,1]
# Why does it take one argument?

print('Probability that lead_pctg will be > 0')
print('For these x-values:')
print(x_oos)
print('is: ')
print(100.0 * upprob)
print('percent')

# done


syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me