Let the syntax do the talking

In Python scikit-learn KNN how to understand knn.predict_proba(x_oos)?

In Dan's Machine Learning class, how to understand this Python call:
upprob = knn.predict_proba(x_oos)[0,1]
The answer is that KNN is designed to predict multiple classes. So, predict_proba() returns its predictions as a 2-D array rather than just one number, with one row per observation and one column per class:
# ~ann/

# This script should help me understand this call:
# upprob = knn.predict_proba(x_oos)[0,1]

# Demo:
# cd ~ann
# vi ~ann/
# ~ann/anaconda3/bin/python

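import subprocess
# Remove any stale copy, download the data, then look at the first rows:
subprocess.call(["/bin/rm", "-f", 'iris.csv'])
cmd  = "/usr/bin/wget"
arg1 = "--output-document=iris.csv"
arg2 = ""  # URL of the iris CSV file
subprocess.call([cmd, arg1, arg2])
subprocess.call(["/usr/bin/head", 'iris.csv'])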

import pandas as pd
import numpy  as np
import pdb

df = pd.read_csv('iris.csv')

# In Pandas how to convert DataFrame to NumPy Array?
myarray = df[['sepal_len','sepal_wid','petal_len','petal_wid']].values

# Darn I have 1 row of bad data:
xvalues = myarray[0:-1,:]

# I should only have good data now:

classarray = df[['class']].values

# Darn I have 1 row of bad data:
yclasses = classarray[0:-1]

# I should only have good data now:

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=99, weights='distance')

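# fit() expects a 1-D array of labels, so I flatten the column vector:
# knn.fit(xvalues, yclasses)
knn.fit(xvalues, np.ravel(yclasses))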

# predict_proba() wants a 2-D array: one row per observation.
x_oos = np.array([[5.1,3.5,1.4,0.2]])

class_probabilities = knn.predict_proba(x_oos)
print(class_probabilities)


# I should see three probabilities like this:
# [[  1.00000000e+00   4.18202782e-38   3.04464021e-39]]

# One of them will be near 1.0
# KNN is predicting that x_oos is probably in the corresponding class.
# Also,
# KNN is predicting that x_oos has low probability of being in the other 2 classes.

# So,
# knn.predict_proba(x_oos)
# Is designed to deal with multi-class predictions.

# In the script that call comes from,
# I only have 2 classes: False and True.

# So, knn.predict_proba(x_oos) will return 2 probabilities.

# The result should look like this:

# [[0.48, 0.52]]

# I only need the probability that x_oos is True.

# So to get that I first need to get the 0th element from the result which is this:
# [0.48, 0.52]
# Then, I need to get the last element from that list which is this:
# [0.48, 0.52][1] == 0.52

# done
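To see the two-class case without downloading anything, here is a minimal sketch on made-up data. The toy arrays, `n_neighbors=3`, and the names `x_train`, `y_train`, and `true_col` are my assumptions for illustration and are not from Dan's class. It also looks up the True column through `knn.classes_` instead of hard-coding `1`, which should give the same answer here because scikit-learn sorts the class labels.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one feature, labels False/True.
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([False, False, False, True, True, True])

# Default weights='uniform': each of the 3 neighbors gets one vote.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)

# One observation with one feature, so the input has shape (1, 1).
x_oos = np.array([[3.2]])
proba = knn.predict_proba(x_oos)

# proba has one row per observation and one column per class.
# The columns follow the order of knn.classes_.
true_col = list(knn.classes_).index(True)
upprob = proba[0, true_col]

print(knn.classes_)
print(proba.shape)
print(upprob)
```

The three nearest neighbors of 3.2 are 3.0, 4.0, and 2.0, and two of them are True, so `upprob` comes out near 0.67. With `[0, 1]` I am asking for row 0 (my only observation) and column 1 (the True class).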