syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me

Question:
In H2O Sparkling Water how to navigate DataFrame?

An H2O DataFrame is a type of object with rows and columns:
scala> 
scala> 
scala> oosDF.vec(2).at(0)
res105: Double = 0.613
scala> 
scala> 

To get the second column I use syntax like this:

scala> oosDF.vec(2)

After I get a column, I can get the cell value of a row like this:

scala> oosDF.vec(2).at(0)


Another demo:

dan@pav ~/sw $ 
dan@pav ~/sw $ 
dan@pav ~/sw $ git log -1
commit dc7d5e76083496f47012ebdeb0390605a05e1579
Author: mmalohlava <michal.malohlava@gmail.com>
Date:   Wed Mar 4 14:27:21 2015 -0800

    [PUBDEV-458] Test for the issue.
dan@pav ~/sw $ 
dan@pav ~/sw $ 
dan@pav ~/sw $ bin/sparkling-shell 

-----
  Spark master (MASTER)     : local-cluster[1,2,1024]
  Spark home   (SPARK_HOME) : /home/dan/spark
----

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/   _/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80-ea)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.

scala> Spark assembly has been built with Hive, including Datanucleus jars on classpath
scala> 
scala> 

I typed in this:

import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()
import h2oContext._
import water.fvec._
import water.fvec.DataFrame


Then I typed in this:

scala> 
scala> 
scala> val mypath = ("examples/smalldata/prostate.csv")
mypath: String = examples/smalldata/prostate.csv
scala> 
scala> 


scala> 
scala> 
scala> val jfile = new java.io.File(mypath)
jfile: java.io.File = examples/smalldata/prostate.csv
scala> 
scala> 


scala> 
scala> 
scala> val pDataFrame = new DataFrame(jfile)
pDataFrame: water.fvec.DataFrame = 
Frame prostate.hex with 380 rows and 9 cols:
          ID  CAPSULE  AGE  RACE  DPROS  DCAPS                 PSA                 VOL  GLEASON
    min    1        0   43     0      1      1                 0.3                 0.0        0
   mean  190        0   66     1      2      1  15.408631578947354  15.812921052631584        6
 stddev  109        0    6     0      1      0   19.99757266856046   18.34761996727119        1
    max  380        1   79     2      4      2  139.70000000000002   97.60000000000001        9
missing    0        0    0     0      0      0                 0.0                 0.0        0
      0    1        0   65     1      2      1  1.4000000000000001                 0.0        6
      1    2        0   72     1      3     ...
scala> 
scala> 


scala> 
scala> 
scala> pDataFrame.vec(2).at(0)
res107: Double = 65.0
scala> 
scala> 


scala> 
scala> 
scala> pDataFrame('AGE).vec(0).at(0)
res108: Double = 65.0
scala> 
scala> 


scala> 
scala> 
scala> Array(0,1,2,3,4,5).foreach(colnum => print(pDataFrame.vec(colnum).at(0)))
1.00.065.01.02.01.0
scala> 
scala> 

Here is the general idea for printing row 0:

scala> 
scala> 
scala> Array(0,1,2,3,4,5).foreach(colnum => print(pDataFrame.vec(colnum).at(0)+","))
1.0,0.0,65.0,1.0,2.0,1.0,
scala> 
scala> 


syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me