syntax.us Let the syntax do the talking

Question:
In H2O, how do I start the R demo?

I took some notes while walking through the H2O R Tutorial:

http://docs.h2o.ai/Ruser/rtutorial.html
An h2o R demo by Dan.

I start this story with a URL:

http://h2o.ai/download/

I picked the most recent nightly development build.

It served me a page with a download link, which I fetched with wget:


dan@hp ~/Downloads $ 
dan@hp ~/Downloads $ wget http://s3.amazonaws.com/h2o-release/h2o/master/1689/h2o-2.9.0.1689.zip
--2015-02-07 09:40:00--  http://s3.amazonaws.com/h2o-release/h2o/master/1689/h2o-2.9.0.1689.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.244.4
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.244.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 145574470 (139M) [application/zip]
Saving to: 'h2o-2.9.0.1689.zip'

100%[======================================>] 145,574,470 1.65MB/s   in 1m 42s 

2015-02-07 09:41:43 (1.36 MB/s) - 'h2o-2.9.0.1689.zip' saved [145574470/145574470]

dan@hp ~/Downloads $ 
dan@hp ~/Downloads $ 
dan@hp ~/Downloads $ 

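I unzipped the archive in /tmp, symlinked it to h2o, then started R and installed the bundled h2o package from source: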



dan@hp ~/Downloads $ 
dan@hp ~/Downloads $ cd /tmp/
dan@hp /tmp $ 
dan@hp /tmp $ unzip ~/Downloads/h2o-2.9.0.1689.zip
Archive:  /home/dan/Downloads/h2o-2.9.0.1689.zip
   creating: h2o-2.9.0.1689/
  inflating: h2o-2.9.0.1689/h2o-sources.jar  
   creating: h2o-2.9.0.1689/hadoop/
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_hdp2.0.6.jar  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_mapr2.1.3.jar  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_mapr4.0.1.jar  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_cdh4_yarn.jar  
  inflating: h2o-2.9.0.1689/hadoop/README.txt  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_cdh5.jar  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_hdp2.1.jar  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_cdh3.jar  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_hdp1.3.2.jar  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_hdp2.2.jar  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_cdh4.jar  
  inflating: h2o-2.9.0.1689/hadoop/h2odriver_mapr3.1.1.jar  
   creating: h2o-2.9.0.1689/ec2/
  inflating: h2o-2.9.0.1689/ec2/h2o-cluster-distribute-h2o.sh  
  inflating: h2o-2.9.0.1689/ec2/h2o-cluster-distribute-flatfile.sh  
  inflating: h2o-2.9.0.1689/ec2/h2o-cluster-print-info.py  
  inflating: h2o-2.9.0.1689/ec2/h2o-cluster-launch-instances.py  
  inflating: h2o-2.9.0.1689/ec2/h2o-cluster-test-ssh.sh  
  inflating: h2o-2.9.0.1689/ec2/h2o-cluster-distribute-aws-credentials.sh  
  inflating: h2o-2.9.0.1689/ec2/h2o-cluster-download-h2o.sh  
  inflating: h2o-2.9.0.1689/ec2/README.txt  
  inflating: h2o-2.9.0.1689/ec2/h2o-cluster-stop-h2o.sh  
  inflating: h2o-2.9.0.1689/ec2/h2o-cluster-start-h2o.sh  
   creating: h2o-2.9.0.1689/tableau/
  inflating: h2o-2.9.0.1689/tableau/ClaimsTweedieAnalysis_8.2.twb  
  inflating: h2o-2.9.0.1689/tableau/Demo_Template_8.2.twb  
   creating: h2o-2.9.0.1689/tableau/meta_data/
  inflating: h2o-2.9.0.1689/tableau/meta_data/claims_metadata.csv  
  inflating: h2o-2.9.0.1689/tableau/meta_data/airlines_meta.csv  
  inflating: h2o-2.9.0.1689/tableau/meta_data/claims_coefficients.csv  
  inflating: h2o-2.9.0.1689/tableau/Demo_Template_8.1.twb  
  inflating: h2o-2.9.0.1689/tableau/TableauTutorial.docx  
   creating: h2o-2.9.0.1689/tableau/data/
  inflating: h2o-2.9.0.1689/tableau/data/claimsdata.csv.tar.xz  
  inflating: h2o-2.9.0.1689/README.txt  
   creating: h2o-2.9.0.1689/spark/
  inflating: h2o-2.9.0.1689/spark/README.txt  
   creating: h2o-2.9.0.1689/R/
  inflating: h2o-2.9.0.1689/R/README.txt  
  inflating: h2o-2.9.0.1689/R/h2o_2.9.0.1689.tar.gz  
  inflating: h2o-2.9.0.1689/h2o.jar  
  inflating: h2o-2.9.0.1689/LICENSE.txt  
  inflating: h2o-2.9.0.1689/h2o-model.jar  
dan@hp /tmp $ 
dan@hp /tmp $ ll h2o
ls: cannot access h2o: No such file or directory
dan@hp /tmp $ 
dan@hp /tmp $ 
dan@hp /tmp $ ln -s h2o-2.9.0.1689 h2o
dan@hp /tmp $ 
dan@hp /tmp $ cd h2o
dan@hp /tmp/h2o $ cd R
dan@hp /tmp/h2o/R $ 
dan@hp /tmp/h2o/R $ ll
total 40424
drwxr-xr-x 2 dan dan     4096 Feb  6 23:03 ./
drwxr-xr-x 7 dan dan     4096 Feb  6 23:03 ../
-rw-r--r-- 1 dan dan     1624 Feb  6 23:00 README.txt
-rw-r--r-- 1 dan dan 41378429 Feb  6 23:02 h2o_2.9.0.1689.tar.gz
dan@hp /tmp/h2o/R $ 
dan@hp /tmp/h2o/R $ 
dan@hp /tmp/h2o/R $ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 
> 
> remove.packages('h2o')
Removing package from '/home/dan/rdir/lib/R/library'
(as 'lib' is unspecified)
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
> 
> 
> install.packages("h2o_2.9.0.1689.tar.gz", repos = NULL, type = "source")
* installing *source* package 'h2o' ...
** R
** demo
** inst
** preparing package for lazy loading
Creating a generic function for 'summary' from package 'base' in package 'h2o'
Creating a generic function for 'colnames' from package 'base' in package 'h2o'
Creating a generic function for 't' from package 'base' in package 'h2o'
Creating a generic function for 'colnames<-' from package 'base' in package 'h2o'
Creating a generic function for 'nrow' from package 'base' in package 'h2o'
Creating a generic function for 'ncol' from package 'base' in package 'h2o'
Creating a generic function for 'sd' from package 'stats' in package 'h2o'
Creating a generic function for 'var' from package 'stats' in package 'h2o'
Creating a generic function for 'as.factor' from package 'base' in package 'h2o'
Creating a generic function for 'is.factor' from package 'base' in package 'h2o'
Creating a generic function for 'which' from package 'base' in package 'h2o'
Creating a generic function for 'levels' from package 'base' in package 'h2o'
Creating a generic function for 'apply' from package 'base' in package 'h2o'
Creating a generic function for 'findInterval' from package 'base' in package 'h2o'
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (h2o)
> 
> 
> library(h2o)
Loading required package: statmod
Loading required package: survival
Loading required package: splines

----------------------------------------------------------------------

Your next step is to start H2O and get a connection object (named
'localH2O', for example):
    > localH2O = h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.0xdata.com

----------------------------------------------------------------------


Attaching package: 'h2o'

The following objects are masked from 'package:base':

    ifelse, max, min, strsplit, sum, tolower, toupper

> 
> 
> 
> localH2O = h2o.init()

H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpIy4qIS/h2o_dan_started_from_r.out
    /tmp/RtmpIy4qIS/h2o_dan_started_from_r.err

java version "1.7.0_60-ea"
Java(TM) SE Runtime Environment (build 1.7.0_60-ea-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)

Successfully connected to http://127.0.0.1:54321 

R is connected to H2O cluster:
    H2O cluster uptime:         6 seconds 247 milliseconds 
    H2O cluster version:        2.9.0.1689 
    H2O cluster name:           H2O_started_from_R 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.54 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 

Note:  As started, H2O is limited to the CRAN default of 2 CPUs.
       Shut down and restart H2O as shown below to use all your CPUs.
           > h2o.shutdown(localH2O)
           > localH2O = h2o.init(nthreads = -1)

I tried some syntax I found in the README:

> 
> 
> irisPath = system.file("extdata", "iris.csv", package="h2o")
> 
> irisPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/iris.csv"
> 
> 
> iris.hex = h2o.importFile(localH2O, irisPath)
  |======================================================================| 100%
> 
> 
> summary(iris.hex)
 C1              C2              C3              C4             
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.300   Median :1.300  
 Mean   :5.843   Mean   :3.054   Mean   :3.759   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
 C5                 
 Iris-setosa    :50 
 Iris-versicolor:50 
 Iris-virginica :50 
                    
                    
                    
> 
> 
> head(iris.hex)
   C1  C2  C3  C4          C5
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa
6 5.4 3.9 1.7 0.4 Iris-setosa
> 
> 
> 


I went back to the tutorial for more syntax to try:

http://docs.h2o.ai/Ruser/rtutorial.html

Then I downloaded a CSV file of my own to import:


dan@hp /tmp $ 
dan@hp /tmp $ 
dan@hp /tmp $ wget spy611.com/spy611.csv
--2015-02-07 10:05:49--  http://spy611.com/spy611.csv
Resolving spy611.com (spy611.com)... 50.63.202.7
Connecting to spy611.com (spy611.com)|50.63.202.7|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /spy611.csv [following]
--2015-02-07 10:05:49--  http://spy611.com/spy611.csv
Connecting to spy611.com (spy611.com)|50.63.202.7|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.spy611.com/spy611.csv [following]
--2015-02-07 10:05:49--  http://www.spy611.com/spy611.csv
Resolving www.spy611.com (www.spy611.com)... 54.243.122.132
Connecting to www.spy611.com (www.spy611.com)|54.243.122.132|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 773103 (755K) [text/csv]
Saving to: 'spy611.csv'

100%[======================================>] 773,103      620KB/s   in 1.2s   

2015-02-07 10:05:51 (620 KB/s) - 'spy611.csv' saved [773103/773103]

dan@hp /tmp $ 
dan@hp /tmp $ 
dan@hp /tmp $ 


> 
> spy611.hex = h2o.importFile(localH2O, path = '/tmp/spy611.csv', key = "spy611.hex")
  |======================================================================| 100%
> class(spy611.hex)
[1] "H2OParsedData"
attr(,"package")
[1] "h2o"
> 
> 
> tail(spy611.hex)
       algo close_price_date prediction pct_gain
27820 lr2lr    1422489600000     0.5110    -1.30
27821 lr2lr    1422576000000     0.5315     1.30
27822 lr2lr    1422835200000     0.5167     1.44
27823 lr2lr    1422921600000     0.5043    -0.42
27824 lr2lr    1423008000000     0.5138     1.03
27825 lr2lr    1423094400000     0.5031       NA
> 
> 
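
The close_price_date column above looks like Unix time in milliseconds. That is my assumption from the size of the numbers, not something the import reported. A minimal base R sketch that turns two of the values from the tail() output into calendar dates:

```r
# Assumption: close_price_date holds milliseconds since 1970-01-01 UTC.
ms <- c(1422489600000, 1423094400000)  # two values copied from tail(spy611.hex)

# as.POSIXct() expects seconds, so divide by 1000 first.
stamps <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
dates <- as.Date(stamps)
print(dates)
```

If the assumption holds, the same conversion could be applied after pulling the column into a local data frame with as.data.frame().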



> h2o.anyFactor(iris.hex)
[1] TRUE
> 
> prosPath <- system.file("extdata", "prostate.csv", package="h2o")
> 
> prosPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/prostate.csv"
> 
> 
> prostate.hex = h2o.importFile(localH2O, path = prosPath)
  |======================================================================| 100%
> 
> 
> head(prostate.hex)
  ID CAPSULE AGE RACE DPROS DCAPS  PSA  VOL GLEASON
1  1       0  65    1     2     1  1.4  0.0       6
2  2       0  72    1     3     2  6.7  0.0       7
3  3       0  70    1     1     2  4.9  0.0       6
4  4       0  76    2     2     1 51.2 20.0       7
5  5       0  69    1     1     1 12.3 55.9       6
6  6       1  71    1     3     2  3.3  0.0       8
> 
> summary(prostate.hex)
 ID               CAPSULE          AGE             RACE           
 Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000  
 1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000  
 Median :190.50   Median :0.0000   Median :67.00   Median :1.000  
 Mean   :190.50   Mean   :0.4026   Mean   :66.04   Mean   :1.087  
 3rd Qu.:285.25   3rd Qu.:1.0000   3rd Qu.:71.00   3rd Qu.:1.000  
 Max.   :380.00   Max.   :1.0000   Max.   :79.00   Max.   :2.000  
 DPROS           DCAPS           PSA               VOL            
 Min.   :1.000   Min.   :1.000   Min.   :  0.300   Min.   : 0.00  
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  5.000   1st Qu.: 0.00  
 Median :2.000   Median :1.000   Median :  8.725   Median :14.25  
 Mean   :2.271   Mean   :1.108   Mean   : 15.409   Mean   :15.81  
 3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 17.125   3rd Qu.:26.45  
 Max.   :4.000   Max.   :2.000   Max.   :139.700   Max.   :97.60  
 GLEASON        
 Min.   :0.000  
 1st Qu.:6.000  
 Median :6.000  
 Mean   :6.384  
 3rd Qu.:7.000  
 Max.   :9.000  
> 
> 
> 
> prostate.data.frame<- as.data.frame(prostate.hex)
> 
> 
> summary(prostate.data.frame)
       ID            CAPSULE            AGE             RACE      
 Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000  
 1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000  
 Median :190.50   Median :0.0000   Median :67.00   Median :1.000  
 Mean   :190.50   Mean   :0.4026   Mean   :66.04   Mean   :1.087  
 3rd Qu.:285.25   3rd Qu.:1.0000   3rd Qu.:71.00   3rd Qu.:1.000  
 Max.   :380.00   Max.   :1.0000   Max.   :79.00   Max.   :2.000  
     DPROS           DCAPS            PSA              VOL       
 Min.   :1.000   Min.   :1.000   Min.   :  0.30   Min.   : 0.00  
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  5.00   1st Qu.: 0.00  
 Median :2.000   Median :1.000   Median :  8.75   Median :14.25  
 Mean   :2.271   Mean   :1.108   Mean   : 15.41   Mean   :15.81  
 3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 17.12   3rd Qu.:26.45  
 Max.   :4.000   Max.   :2.000   Max.   :139.70   Max.   :97.60  
    GLEASON     
 Min.   :0.000  
 1st Qu.:6.000  
 Median :6.000  
 Mean   :6.384  
 3rd Qu.:7.000  
 Max.   :9.000  
> 
> 
> head(prostate.hex[,4])
  RACE
1    1
2    1
3    1
4    2
5    1
6    1
> 
> prostate.hex[1:4,4]
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: Last.value.5 

  RACE
1    1
2    1
3    1
4    2
> 
> 


> 
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> 
> 
> 
> class(iris)
[1] "data.frame"
> 
> 
> iris.h2o = as.h2o(localH2O, iris, key="iris.h2o")
  |======================================================================| 100%
> 
> 
> class(iris.h2o)
[1] "H2OParsedData"
attr(,"package")
[1] "h2o"
> 
> iris.h2o[1:5,]
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: Last.value.7 

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
> 
> 
> iris.h2o[0:5,]
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: Last.value.9 

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
> 
> 
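
In the session above, iris.h2o[0:5,] returned the same five rows as iris.h2o[1:5,]. That matches how ordinary R indexing treats a zero index: it is silently dropped. A small base R sketch on the built-in iris data frame (no H2O involved, so it only illustrates the R rule, not the H2O internals):

```r
# Base R: an index of 0 selects nothing, so 0:5 behaves like 1:5.
rows_from_zero <- iris[0:5, ]
rows_from_one  <- iris[1:5, ]

print(nrow(rows_from_zero))                        # row count of the 0:5 selection
print(all.equal(rows_from_zero, rows_from_one))    # compares the two selections
```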
> prosPath = system.file("extdata", "prostate.csv", package="h2o")
> 
> prosPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/prostate.csv"
> 
> 
> prostate.hex = h2o.importFile(localH2O, path = prosPath)
  |======================================================================| 100%
> 
> 
> prostate.qs = quantile(prostate.hex$PSA)
> 
> class(prostate.qs)
[1] "numeric"
> 
> 
> head(prostate.qs)
     0%     25%     50%     75%    100% 
  0.300   5.000   8.750  17.125 139.700 
> 
> 
> head(prostate.hex$PSA)
   PSA
1  1.4
2  6.7
3  4.9
4 51.2
5 12.3
6  3.3
> 
> summary(prostate.hex$PSA)
 PSA              
 Min.   :  0.300  
 1st Qu.:  5.000  
 Median :  8.725  
 Mean   : 15.409  
 3rd Qu.: 17.125  
 Max.   :139.700  
> 
> 
> predicate1 = prostate.hex$PSA <= prostate.qs[2] 
> 
> head(predicate1)
  PSA
1   1
2   0
3   1
4   0
5   0
6   1
> 
> 
> prostate.qs[2]
25% 
  5 
> 
> head(prostate.hex$PSA)
   PSA
1  1.4
2  6.7
3  4.9
4 51.2
5 12.3
6  3.3
> 
> predicate2 = prostate.hex$PSA >=   prostate.qs[10]
> 
> prostate.qs[10]
<NA> 
  NA 
> predicate2 = prostate.hex$PSA >=   prostate.qs[5]
> 
> prostate.qs[5]
 100% 
139.7 
> 
> predicate3 = predicate1 | predicate2
> 
> PSA.outers = prostate.hex[predicate3]
> 
> nrow(PSA.outers)
[1] 98
> 
> nrow(prostate.hex)
[1] 380
> 
> 
> PSA.outers = h2o.assign(PSA.outers,'PSA.outers')
> 
> nrow(PSA.outers)
[1] 98
> 
> 
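
In the session above, prostate.qs[10] came back NA because quantile() returned only five values by default (0%, 25%, 50%, 75%, 100%). Here is a base R sketch of the same filtering idea on a tiny local data frame. The six PSA values are the head() rows from the transcript. Indexing the quantiles by name and using the 75% cutoff for the upper side are my own choices here; the session used the 100% value.

```r
# Base R analogue of the PSA filtering session; no H2O cluster needed.
prostate <- data.frame(
  ID  = 1:6,
  PSA = c(1.4, 6.7, 4.9, 51.2, 12.3, 3.3)  # head(prostate.hex$PSA)
)

qs <- quantile(prostate$PSA)   # five named values: 0%, 25%, 50%, 75%, 100%
print(length(qs))
print(is.na(qs[10]))           # position 10 does not exist, so the result is NA

# Indexing by name avoids counting positions by hand.
low    <- prostate$PSA <= qs[["25%"]]
high   <- prostate$PSA >= qs[["75%"]]
outers <- prostate[low | high, ]   # rows in either tail
print(outers)
```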


> print('colnames demo')
[1] "colnames demo"
> 
> 
> colnames(prostate.hex)
[1] "ID"      "CAPSULE" "AGE"     "RACE"    "DPROS"   "DCAPS"   "PSA"    
[8] "VOL"     "GLEASON"
> 
> 
> 

> 
> min(prostate.hex$PSA)
[1] 0.3
> 
> max(prostate.hex$PSA)
[1] 139.7
> 
> quantile(prostate.hex$AGE)
  0%  25%  50%  75% 100% 
  43   62   67   71   79 
> 
> quantile(prostate.hex)
Error in quantile.H2OParsedData(prostate.hex) : 
  quantile only operates on a single column
> 
> 
> 
> summary(prostate.hex[,4:6])
 RACE            DPROS           DCAPS          
 Min.   :0.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
 Median :1.000   Median :2.000   Median :1.000  
 Mean   :1.087   Mean   :2.271   Mean   :1.108  
 3rd Qu.:1.000   3rd Qu.:3.000   3rd Qu.:1.000  
 Max.   :2.000   Max.   :4.000   Max.   :2.000  
> 
> 
> 
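
The error above says quantile() on an H2O frame works on one column at a time, which is why quantile(prostate.hex$AGE) succeeded. On an ordinary data frame the per-column idea can be written with sapply(). This is a base R sketch using the six head() rows of AGE and PSA; I have not tried sapply() on an H2OParsedData object.

```r
# Base R sketch: one quantile() call per column of a local data frame.
df <- data.frame(
  AGE = c(65, 72, 70, 76, 69, 71),
  PSA = c(1.4, 6.7, 4.9, 51.2, 12.3, 3.3)
)

# sapply() applies quantile() to each column and binds the results
# into a matrix with one row per quantile and one column per variable.
per_column <- sapply(df, quantile)
print(per_column)
```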


> 
> 
> h2o.table(prostate.hex[0:11,3])
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: Last.value.39 

  row.names Count
1        61     1
2        65     1
3        68     3
4        69     2
5        70     1
6        71     1
> 
> 
> h2o.table(prostate.hex[0:22,3])
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: Last.value.42 

  row.names Count
1        54     1
2        58     1
3        61     1
4        65     2
5        67     1
6        68     3
> h2o.table(prostate.hex[0:44,3])
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: Last.value.45 

  row.names Count
1        54     3
2        58     2
3        60     1
4        61     1
5        63     1
6        64     1
> nrow(h2o.table(prostate.hex[0:44,3]))
[1] 20
> 
> nrow(h2o.table(prostate.hex[,3]))
[1] 32
> h2o.table(prostate.hex[,3])
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: Last.value.52 

  row.names Count
1        43     1
2        47     1
3        50     2
4        51     3
5        52     2
6        53     4
> 
> 

Hello World,

In R, I am curious about h2o.table()

I start with this syntax:

> 
> h2o.clusterInfo(localH2O)
R is connected to H2O cluster:
    H2O cluster uptime:         13 hours 23 minutes 
    H2O cluster version:        2.9.0.1689 
    H2O cluster name:           H2O_started_from_R 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.54 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 
> 
> 

I load the prostate data set:


> 
> prosPath = system.file("extdata", "prostate.csv", package="h2o")
> 
> prosPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/prostate.csv"
> 
> 
> prostate.hex = h2o.importFile(localH2O, path = prosPath)
  |======================================================================| 100%
> 
> 


I look at column 3:

>
> head(prostate.hex[,3])
  AGE
1  65
2  72
3  70
4  76
5  69
6  71
> 
> summary(prostate.hex[,3])
 AGE            
 Min.   :43.00  
 1st Qu.:62.00  
 Median :67.00  
 Mean   :66.04  
 3rd Qu.:71.00  
 Max.   :79.00  
> 
> nrow(prostate.hex[,3])
[1] 380
> 
> 


Now I try h2o.table()

> 
> nrow(h2o.table(prostate.hex[,3]))
[1] 32
> 
> 

nrow() tells me I should see 32 rows returned from h2o.table()

So, I try h2o.table()

> 
> 
> h2o.table(prostate.hex[,3])
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: Last.value.61 

  row.names Count
1        43     1
2        47     1
3        50     2
4        51     3
5        52     2
6        53     4
> 
> 

I should see 32 rows returned from h2o.table()
but I only see 6 rows.

Question: 
How do I see all 32 rows that h2o.table() should return?
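
One likely explanation, which I have not verified against this H2O build, is that printing an H2OParsedData object shows only a head()-style preview of six rows. Since as.data.frame() worked on prostate.hex earlier, as.data.frame(h2o.table(prostate.hex[,3])) may be a way to pull the whole table into R and print every row. The sketch below shows the same preview-versus-full difference in base R. The 11 ages are reconstructed from the transcript output and their order is a guess.

```r
# Base R sketch: a frequency table whose preview hides some rows.
ages <- c(65, 72, 70, 76, 69, 71, 61, 68, 68, 68, 69)  # order is a guess

# table() counts each distinct age; as.data.frame() makes it a local data frame.
counts <- as.data.frame(table(AGE = ages), responseName = "Count")

print(nrow(counts))   # number of distinct ages
print(head(counts))   # six-row preview, like the H2O console output
print(counts)         # every row
```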


Generate a column of random numbers:

> 
> s = h2o.runif(prostate.hex)
> 
> summary(s)
 rnd               
 Min.   :0.001434  
 1st Qu.:0.241275  
 Median :0.496995  
 Mean   :0.489468  
 3rd Qu.:0.740592  
 Max.   :0.994894  
> 
> 

Here is a 5% sample of prostate.hex:

> my5pct = prostate.hex[s <= 5.0/100.0]
> 
> nrow(prostate.hex)
[1] 380
> 
> nrow(my5pct)
[1] 29
> 
> 
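
The sample above is random, so its size is not exactly 5% of 380 (which would be 19); the session happened to keep 29 rows. A base R sketch of the same idea on a local data frame, with an arbitrary seed of my choosing so the draw is repeatable:

```r
# Base R sketch of the runif() sampling idea; no H2O cluster needed.
set.seed(611)                 # arbitrary seed so reruns give the same draw
n <- 380                      # row count of prostate.hex in the session
frame <- data.frame(ID = seq_len(n))

s <- runif(n)                 # one uniform draw between 0 and 1 per row
keep <- s <= 5.0 / 100.0      # TRUE for roughly 5% of the rows
sample5 <- frame[keep, , drop = FALSE]

print(nrow(sample5))          # varies with the seed; n * 0.05 = 19 on average
```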
Subject: In R, how do I convert a split frame to H2OParsedData?

I start with this syntax:

> 
> h2o.clusterInfo(localH2O)
R is connected to H2O cluster:
    H2O cluster uptime:         13 hours 23 minutes 
    H2O cluster version:        2.9.0.1689 
    H2O cluster name:           H2O_started_from_R 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.54 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 
> 
> 

I load the prostate data set:

> 
> prosPath = system.file("extdata", "prostate.csv", package="h2o")
> 
> prosPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/prostate.csv"
> 
> 
> prostate.hex = h2o.importFile(localH2O, path = prosPath)
  |======================================================================| 100%
> 
> 

I call h2o.splitFrame():

> 
> prostate.split = h2o.splitFrame(data = prostate.hex , ratios = 0.75)
> 
> prostate.train = prostate.split[1]
> prostate.test = prostate.split[2]
> 

I look at it:

> class(prostate.split)
[1] "list"
> 
> 

It sort of acts like an H2OParsedData object:

> 
> class(prostate.split[1])
[1] "list"
> 
> head(prostate.split[1])
[[1]]
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: prostate2_part0.hex 

  ID CAPSULE AGE RACE DPROS DCAPS  PSA  VOL GLEASON
1  1       0  65    1     2     1  1.4  0.0       6
2  2       0  72    1     3     2  6.7  0.0       7
3  3       0  70    1     1     2  4.9  0.0       6
4  4       0  76    2     2     1 51.2 20.0       7
5  5       0  69    1     1     1 12.3 55.9       6
6  6       1  71    1     3     2  3.3  0.0       8

But not always:

> nrow(prostate.split[1])
NULL
> 
> nrow(prostate.split)
NULL
> 
> nrow(prostate.hex)
[1] 380
> 
> 

Question:
How do I convert a split-frame object into an H2OParsedData object?
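
The session shows that h2o.splitFrame() returned a plain R list. In R, single brackets on a list return another list, while double brackets return the element itself. So prostate.split[[1]] and prostate.split[[2]] are probably what I want here; I have not rerun that against H2O. A base R sketch of the difference, with two small data frames standing in for the split parts:

```r
# Base R sketch: a list of data frames standing in for h2o.splitFrame() output.
parts <- list(
  data.frame(ID = 1:3),   # stand-in for the training part
  data.frame(ID = 4L)     # stand-in for the test part
)

print(class(parts[1]))    # single brackets keep the list wrapper
print(nrow(parts[1]))     # NULL, as in the session

train <- parts[[1]]       # double brackets pull out the element itself
test  <- parts[[2]]
print(class(train))
print(nrow(train))
```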
At this URL:

docs.h2o.ai/Ruser/rtutorial.html

I see some useful syntax at the end of the page:

> mygbm_model = h2o.getModel(key = 'GBM_bb1a4664cbf6f99be853ebbdc2d4df6', h2o = localH2O)
> 
> mygbm_model
IP Address: 127.0.0.1 
Port      : 54321 
Parsed Data Key: australia.hex 

GBM Model Key: GBM_bb1a4664cbf6f99be853ebbdc2d4df6

Overall Mean-squared Error: 31462.59 
> 
> 

The above syntax is useful in an interactive context. I cannot think of a use case where my script would already have access to the key. But I can think of situations where I would see the key in the web UI and want to use the associated model in a script or an R session.

Another call which is useful in an interactive session is h2o.ls(localH2O). It lists the keys stored in the cluster and the size of each object in bytes:

> 
> 
> h2o.ls(localH2O)
                                           Key Bytesize
1  GBMPredict_f986d47542464833ba3b583b8c4ed1e5       80
2         GBM_9eab586732156bfd511a36a5015642f7     1474
3         GBM_a410fa049380b1f968c5dd46f3104fa1     1473
4          GBM_bb1a4664cbf6f99be853ebbdc2d4df6     1468
5                                Last.value.11      844
6                                Last.value.14      108
7                                Last.value.15      844
8                                Last.value.17      844
9                                Last.value.18      844
10                               Last.value.19      118
11                               Last.value.21      844
12                               Last.value.23      844
13                               Last.value.24       80
14                               Last.value.25      844
15                               Last.value.26       76
16                               Last.value.27      118
17                               Last.value.28     2881
18                               Last.value.29      844
19                                Last.value.3      448
20                               Last.value.31      844
21                               Last.value.33      448
22                               Last.value.36       73
23                               Last.value.37     1344
24                               Last.value.38       79
25                               Last.value.39      152
26                               Last.value.41       90
27                               Last.value.42      162
28                               Last.value.44      112
29                               Last.value.45      176
30                               Last.value.47      112
31                               Last.value.48      176
32                               Last.value.49      448
33                                Last.value.5       72
34                               Last.value.50      200
35                               Last.value.51      448
36                               Last.value.52      200
37                               Last.value.54      448
38                               Last.value.56      448
39                               Last.value.57      448
40                               Last.value.58      448
41                               Last.value.59      200
42                               Last.value.60      448
43                               Last.value.61      200
44                               Last.value.63     3108
45                               Last.value.64      118
46                               Last.value.65     1285
47                               Last.value.68      640
48                                Last.value.7      484
49                               Last.value.70      560
50                               Last.value.72      560
51                                Last.value.9      484
52                               PSA.nonouters     2881
53                               australia.hex     3919
54                                    iris.h2o     1154
55                                    iris.hex     1154
56                                prostate.hex     4874
57                               prostate1.hex     4874
58                               prostate2.hex     4874
59                         prostate2_part0.hex     3817
60                         prostate2_part1.hex     1624
61                                  spy611.hex   362029
> 
> 
> 

h2o.ls() is a simple call; I cannot pass it an object name:

> 
> h2o.ls(localH2O, keys = 'prostate2_part1.hex')
Error in h2o.ls(localH2O, keys = "prostate2_part1.hex") : 
  unused argument (keys = "prostate2_part1.hex")
> 
> 

On my laptop, where localH2O has only about 1.5 GB of memory, I might want to remove large objects:

> 
> h2o.rm(object= localH2O, keys= "spy611.hex")
> 
> 

Currently that is the end of my h2o R demo. I may add more syntax demos in the future.

