syntax.us Let the syntax do the talking

Question:
In NumPy, how do I apply WHERE?

I just wrote a post on how to implement what I call WHERE predicates for Pandas.

posts/pandas_where

How do I implement WHERE predicates for NumPy Arrays?

Let NumPy do the talking:

I have a Python Pandas DataFrame which looks like this:
(Pdb) csv1[['ydates','cp']].tail()
           ydates       cp
10095  2014-12-24  2081.88
10096  2014-12-26  2088.77
10097  2014-12-29  2090.57
10098  2014-12-30  2080.35
10099  2014-12-31  2058.90

I turned it into a NumPy Array:
(Pdb) np1 = csv1[['ydates','cp']].tail().values
(Pdb) 
(Pdb) np1
array([['2014-12-24', 2081.88],
       ['2014-12-26', 2088.77],
       ['2014-12-29', 2090.57],
       ['2014-12-30', 2080.35],
       ['2014-12-31', 2058.9]], dtype=object)
(Pdb) 

If these rows were in SQLite I could run this SELECT:
SELECT ydates,cp FROM csv1 WHERE ydates > '2014-12-24';

Then SQLite would return the last 4 rows.
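
To check that claim, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table name csv1 and the column names come from the SELECT above; the column types (TEXT, REAL) and the variable names are my assumptions.

```python
import sqlite3

# The five rows shown in the DataFrame above.
rows = [
    ("2014-12-24", 2081.88),
    ("2014-12-26", 2088.77),
    ("2014-12-29", 2090.57),
    ("2014-12-30", 2080.35),
    ("2014-12-31", 2058.90),
]

# In-memory database; the column types are assumed for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE csv1 (ydates TEXT, cp REAL)")
conn.executemany("INSERT INTO csv1 VALUES (?, ?)", rows)

# Same WHERE predicate as above, with ORDER BY added to make the order explicit.
selected = conn.execute(
    "SELECT ydates, cp FROM csv1 WHERE ydates > '2014-12-24' ORDER BY ydates"
).fetchall()
conn.close()

print(selected)
```

The dates are stored as text, so SQLite compares them as strings. That is the same comparison I will ask NumPy to do.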

How do I return the last 4 rows using NumPy?

When I use NumPy for this type of task, I first focus my mind on what I call the 'predicate'.

In this example the predicate is this:
ydates > '2014-12-24';

A proper predicate returns a Boolean.

The above predicate works well in SQL.

How do I write it in NumPy?

Answer:
np1[:,0] > '2014-12-24'

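The comparison works on strings because dates written as YYYY-MM-DD sort in the same order as the dates themselves.

The above expression returns an Array of Booleans, one per row: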
(Pdb) nppred = np1[:,0] > '2014-12-24'
(Pdb) nppred
array([False,  True,  True,  True,  True], dtype=bool)

When I look at the above Array, I visualize it as a column laid on its side, with one Boolean for each row of np1.

Also, note that the colon is a slice which selects all rows.

And note that the zero selects the 0th column, which in this case is a column full of dates.

When working with NumPy I quickly memorized that row-expressions belong on the left of the comma and column-expressions belong on the right.
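
Here is a small sketch of that rows-left, columns-right rule. I rebuild the five rows by hand so the example runs on its own; the names all_dates, first_row, and one_price are just labels I picked for this demo.

```python
import numpy as np

# Same five rows as np1 above, typed in by hand.
np1 = np.array(
    [
        ["2014-12-24", 2081.88],
        ["2014-12-26", 2088.77],
        ["2014-12-29", 2090.57],
        ["2014-12-30", 2080.35],
        ["2014-12-31", 2058.9],
    ],
    dtype=object,
)

all_dates = np1[:, 0]  # left of comma: all rows; right of comma: column 0
first_row = np1[0, :]  # left of comma: row 0; right of comma: all columns
one_price = np1[1, 1]  # row 1, column 1

print(all_dates)
print(first_row)
print(one_price)
```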

If you work with R it will seem familiar.

If you work with Pandas a lot, it may take some getting used to.

Now that I have my predicate, and from it my column full of Booleans, I can apply that column to my NumPy Array:
(Pdb) np1[nppred,:]
array([['2014-12-26', 2088.77],
       ['2014-12-29', 2090.57],
       ['2014-12-30', 2080.35],
       ['2014-12-31', 2058.9]], dtype=object)

I just demonstrated a common idea in Pandas, NumPy, and R: applying a column full of Booleans to data shaped like a table.

If I match a True to a row in the table, I get that row.

If I match a False to a row in the table, I get nothing.
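
The True/False matching can be sketched as a script that runs outside of Pdb. I rebuild np1 by hand, and I also type the Booleans in directly (hand_mask is a name I made up) to show that only the True and False values decide which rows come back.

```python
import numpy as np

# Same five rows as np1 above, typed in by hand.
np1 = np.array(
    [
        ["2014-12-24", 2081.88],
        ["2014-12-26", 2088.77],
        ["2014-12-29", 2090.57],
        ["2014-12-30", 2080.35],
        ["2014-12-31", 2058.9],
    ],
    dtype=object,
)

# Booleans computed from the predicate: one per row.
nppred = np1[:, 0] > "2014-12-24"
by_predicate = np1[nppred, :]

# The same Booleans typed by hand; the mask needs one entry per row.
hand_mask = np.array([False, True, True, True, True])
by_hand = np1[hand_mask, :]

print(by_predicate)
print((by_predicate == by_hand).all())
```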

The way I do this in NumPy is to place the Booleans on the left hand side of the comma, which is also left of the column-expression.

In this example the column-expression is just a single colon which represents 'all-columns'. It is easy to type but sometimes easy to miss.

In Pandas I place the Booleans on the right hand side of the columns.

In SQL, like Pandas, predicates also go on the right hand side of the columns.

Remember: In NumPy we place the Booleans LEFT of the columns.

Let the syntax do the talking:
(Pdb) nppred
array([False,  True,  True,  True,  True], dtype=bool)

(Pdb) np1[nppred,:]
array([['2014-12-26', 2088.77],
       ['2014-12-29', 2090.57],
       ['2014-12-30', 2080.35],
       ['2014-12-31', 2058.9]], dtype=object)

Here is another demo.
How do I apply this predicate?
WHERE ydates == '2014-12-24'

Answer:
(Pdb) nppred2 = np1[:,0] == '2014-12-24'
(Pdb) 
(Pdb) np1[nppred2,:]
array([['2014-12-24', 2081.88]], dtype=object)
(Pdb) 
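
The same pattern stretches to predicates with two conditions. This next sketch is my own extension, not part of the demos above: it mimics WHERE ydates >= '2014-12-26' AND ydates <= '2014-12-30' by joining two Boolean Arrays with the & operator. The names dates and between are ones I picked.

```python
import numpy as np

# Same five rows as np1 above, typed in by hand.
np1 = np.array(
    [
        ["2014-12-24", 2081.88],
        ["2014-12-26", 2088.77],
        ["2014-12-29", 2090.57],
        ["2014-12-30", 2080.35],
        ["2014-12-31", 2058.9],
    ],
    dtype=object,
)

dates = np1[:, 0]

# Each comparison gives a Boolean Array; & combines them element by element.
# The parentheses matter because & binds tighter than >= and <= in Python.
between = (dates >= "2014-12-26") & (dates <= "2014-12-30")

print(between)
print(np1[between, :])
```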

