Question:
How do I start the H2O R demo?
I took some notes while walking through the H2O R Tutorial:
http://docs.h2o.ai/Ruser/rtutorial.html
An h2o R demo by Dan.
I start this story with a URL:
http://h2o.ai/download/
I picked the most recent nightly development build.
It served me a page with a link, which I downloaded with wget:
dan@hp ~/Downloads $
dan@hp ~/Downloads $ wget http://s3.amazonaws.com/h2o-release/h2o/master/1689/h2o-2.9.0.1689.zip
--2015-02-07 09:40:00-- http://s3.amazonaws.com/h2o-release/h2o/master/1689/h2o-2.9.0.1689.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.244.4
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.244.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 145574470 (139M) [application/zip]
Saving to: 'h2o-2.9.0.1689.zip'
100%[======================================>] 145,574,470 1.65MB/s in 1m 42s
2015-02-07 09:41:43 (1.36 MB/s) - 'h2o-2.9.0.1689.zip' saved [145574470/145574470]
dan@hp ~/Downloads $
dan@hp ~/Downloads $
dan@hp ~/Downloads $
I then unzipped the archive in /tmp and installed the bundled R package from source:
dan@hp ~/Downloads $
dan@hp ~/Downloads $ cd /tmp/
dan@hp /tmp $
dan@hp /tmp $ unzip ~/Downloads/h2o-2.9.0.1689.zip
Archive: /home/dan/Downloads/h2o-2.9.0.1689.zip
creating: h2o-2.9.0.1689/
inflating: h2o-2.9.0.1689/h2o-sources.jar
creating: h2o-2.9.0.1689/hadoop/
inflating: h2o-2.9.0.1689/hadoop/h2odriver_hdp2.0.6.jar
inflating: h2o-2.9.0.1689/hadoop/h2odriver_mapr2.1.3.jar
inflating: h2o-2.9.0.1689/hadoop/h2odriver_mapr4.0.1.jar
inflating: h2o-2.9.0.1689/hadoop/h2odriver_cdh4_yarn.jar
inflating: h2o-2.9.0.1689/hadoop/README.txt
inflating: h2o-2.9.0.1689/hadoop/h2odriver_cdh5.jar
inflating: h2o-2.9.0.1689/hadoop/h2odriver_hdp2.1.jar
inflating: h2o-2.9.0.1689/hadoop/h2odriver_cdh3.jar
inflating: h2o-2.9.0.1689/hadoop/h2odriver_hdp1.3.2.jar
inflating: h2o-2.9.0.1689/hadoop/h2odriver_hdp2.2.jar
inflating: h2o-2.9.0.1689/hadoop/h2odriver_cdh4.jar
inflating: h2o-2.9.0.1689/hadoop/h2odriver_mapr3.1.1.jar
creating: h2o-2.9.0.1689/ec2/
inflating: h2o-2.9.0.1689/ec2/h2o-cluster-distribute-h2o.sh
inflating: h2o-2.9.0.1689/ec2/h2o-cluster-distribute-flatfile.sh
inflating: h2o-2.9.0.1689/ec2/h2o-cluster-print-info.py
inflating: h2o-2.9.0.1689/ec2/h2o-cluster-launch-instances.py
inflating: h2o-2.9.0.1689/ec2/h2o-cluster-test-ssh.sh
inflating: h2o-2.9.0.1689/ec2/h2o-cluster-distribute-aws-credentials.sh
inflating: h2o-2.9.0.1689/ec2/h2o-cluster-download-h2o.sh
inflating: h2o-2.9.0.1689/ec2/README.txt
inflating: h2o-2.9.0.1689/ec2/h2o-cluster-stop-h2o.sh
inflating: h2o-2.9.0.1689/ec2/h2o-cluster-start-h2o.sh
creating: h2o-2.9.0.1689/tableau/
inflating: h2o-2.9.0.1689/tableau/ClaimsTweedieAnalysis_8.2.twb
inflating: h2o-2.9.0.1689/tableau/Demo_Template_8.2.twb
creating: h2o-2.9.0.1689/tableau/meta_data/
inflating: h2o-2.9.0.1689/tableau/meta_data/claims_metadata.csv
inflating: h2o-2.9.0.1689/tableau/meta_data/airlines_meta.csv
inflating: h2o-2.9.0.1689/tableau/meta_data/claims_coefficients.csv
inflating: h2o-2.9.0.1689/tableau/Demo_Template_8.1.twb
inflating: h2o-2.9.0.1689/tableau/TableauTutorial.docx
creating: h2o-2.9.0.1689/tableau/data/
inflating: h2o-2.9.0.1689/tableau/data/claimsdata.csv.tar.xz
inflating: h2o-2.9.0.1689/README.txt
creating: h2o-2.9.0.1689/spark/
inflating: h2o-2.9.0.1689/spark/README.txt
creating: h2o-2.9.0.1689/R/
inflating: h2o-2.9.0.1689/R/README.txt
inflating: h2o-2.9.0.1689/R/h2o_2.9.0.1689.tar.gz
inflating: h2o-2.9.0.1689/h2o.jar
inflating: h2o-2.9.0.1689/LICENSE.txt
inflating: h2o-2.9.0.1689/h2o-model.jar
dan@hp /tmp $
dan@hp /tmp $ ll h2o
ls: cannot access h2o: No such file or directory
dan@hp /tmp $
dan@hp /tmp $
dan@hp /tmp $ ln -s h2o-2.9.0.1689 h2o
dan@hp /tmp $
dan@hp /tmp $ cd h2o
dan@hp /tmp/h2o $ cd R
dan@hp /tmp/h2o/R $
dan@hp /tmp/h2o/R $ ll
total 40424
drwxr-xr-x 2 dan dan 4096 Feb 6 23:03 ./
drwxr-xr-x 7 dan dan 4096 Feb 6 23:03 ../
-rw-r--r-- 1 dan dan 1624 Feb 6 23:00 README.txt
-rw-r--r-- 1 dan dan 41378429 Feb 6 23:02 h2o_2.9.0.1689.tar.gz
dan@hp /tmp/h2o/R $
dan@hp /tmp/h2o/R $
dan@hp /tmp/h2o/R $ R
R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
>
> remove.packages('h2o')
Removing package from '/home/dan/rdir/lib/R/library'
(as 'lib' is unspecified)
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
>
>
> install.packages("h2o_2.9.0.1689.tar.gz", repos = NULL, type = "source")
* installing *source* package 'h2o' ...
** R
** demo
** inst
** preparing package for lazy loading
Creating a generic function for 'summary' from package 'base' in package 'h2o'
Creating a generic function for 'colnames' from package 'base' in package 'h2o'
Creating a generic function for 't' from package 'base' in package 'h2o'
Creating a generic function for 'colnames<-' from package 'base' in package 'h2o'
Creating a generic function for 'nrow' from package 'base' in package 'h2o'
Creating a generic function for 'ncol' from package 'base' in package 'h2o'
Creating a generic function for 'sd' from package 'stats' in package 'h2o'
Creating a generic function for 'var' from package 'stats' in package 'h2o'
Creating a generic function for 'as.factor' from package 'base' in package 'h2o'
Creating a generic function for 'is.factor' from package 'base' in package 'h2o'
Creating a generic function for 'which' from package 'base' in package 'h2o'
Creating a generic function for 'levels' from package 'base' in package 'h2o'
Creating a generic function for 'apply' from package 'base' in package 'h2o'
Creating a generic function for 'findInterval' from package 'base' in package 'h2o'
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (h2o)
>
>
> library(h2o)
Loading required package: statmod
Loading required package: survival
Loading required package: splines
----------------------------------------------------------------------
Your next step is to start H2O and get a connection object (named
'localH2O', for example):
> localH2O = h2o.init()
For H2O package documentation, ask for help:
> ??h2o
After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.0xdata.com
----------------------------------------------------------------------
Attaching package: 'h2o'
The following objects are masked from 'package:base':
ifelse, max, min, strsplit, sum, tolower, toupper
>
>
>
> localH2O = h2o.init()
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
/tmp/RtmpIy4qIS/h2o_dan_started_from_r.out
/tmp/RtmpIy4qIS/h2o_dan_started_from_r.err
java version "1.7.0_60-ea"
Java(TM) SE Runtime Environment (build 1.7.0_60-ea-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
Successfully connected to http://127.0.0.1:54321
R is connected to H2O cluster:
H2O cluster uptime: 6 seconds 247 milliseconds
H2O cluster version: 2.9.0.1689
H2O cluster name: H2O_started_from_R
H2O cluster total nodes: 1
H2O cluster total memory: 1.54 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 2
H2O cluster healthy: TRUE
Note: As started, H2O is limited to the CRAN default of 2 CPUs.
Shut down and restart H2O as shown below to use all your CPUs.
> h2o.shutdown(localH2O)
> localH2O = h2o.init(nthreads = -1)
I tried some syntax I found in the README:
>
>
> irisPath = system.file("extdata", "iris.csv", package="h2o")
>
> irisPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/iris.csv"
>
>
> iris.hex = h2o.importFile(localH2O, irisPath)
|======================================================================| 100%
>
>
> summary(iris.hex)
C1 C2 C3 C4
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.300 Median :1.300
Mean :5.843 Mean :3.054 Mean :3.759 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
C5
Iris-setosa :50
Iris-versicolor:50
Iris-virginica :50
>
>
> head(iris.hex)
C1 C2 C3 C4 C5
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa
6 5.4 3.9 1.7 0.4 Iris-setosa
>
>
>
Next I worked through the tutorial here:
http://docs.h2o.ai/Ruser/rtutorial.html
dan@hp /tmp $
dan@hp /tmp $
dan@hp /tmp $ wget spy611.com/spy611.csv
--2015-02-07 10:05:49-- http://spy611.com/spy611.csv
Resolving spy611.com (spy611.com)... 50.63.202.7
Connecting to spy611.com (spy611.com)|50.63.202.7|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /spy611.csv [following]
--2015-02-07 10:05:49-- http://spy611.com/spy611.csv
Connecting to spy611.com (spy611.com)|50.63.202.7|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.spy611.com/spy611.csv [following]
--2015-02-07 10:05:49-- http://www.spy611.com/spy611.csv
Resolving www.spy611.com (www.spy611.com)... 54.243.122.132
Connecting to www.spy611.com (www.spy611.com)|54.243.122.132|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 773103 (755K) [text/csv]
Saving to: 'spy611.csv'
100%[======================================>] 773,103 620KB/s in 1.2s
2015-02-07 10:05:51 (620 KB/s) - 'spy611.csv' saved [773103/773103]
dan@hp /tmp $
dan@hp /tmp $
dan@hp /tmp $
>
> spy611.hex = h2o.importFile(localH2O, path = '/tmp/spy611.csv', key = "spy611.hex")
|======================================================================| 100%
> class(spy611.hex)
[1] "H2OParsedData"
attr(,"package")
[1] "h2o"
>
>
> tail(spy611.hex)
algo close_price_date prediction pct_gain
27820 lr2lr 1422489600000 0.5110 -1.30
27821 lr2lr 1422576000000 0.5315 1.30
27822 lr2lr 1422835200000 0.5167 1.44
27823 lr2lr 1422921600000 0.5043 -0.42
27824 lr2lr 1423008000000 0.5138 1.03
27825 lr2lr 1423094400000 0.5031 NA
>
>
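The close_price_date values above look like Unix timestamps in milliseconds rather than formatted dates. A minimal base R sketch, assuming that interpretation (the UTC time zone is also an assumption), converts the first and last values shown by tail() back to calendar dates:

```r
# Values copied from the tail(spy611.hex) output above.
ms <- c(1422489600000, 1423094400000)

# Assumption: milliseconds since 1970-01-01 UTC; divide by 1000 to get seconds.
dates <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

# Show the calendar dates only.
print(format(dates, "%Y-%m-%d"))
```

Both dates fall shortly before the 2015-02-07 download time in the wget log, which is consistent with this reading.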
> h2o.anyFactor(iris.hex)
[1] TRUE
>
> prosPath <- system.file("extdata", "prostate.csv", package="h2o")
>
> prosPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/prostate.csv"
>
>
> prostate.hex = h2o.importFile(localH2O, path = prosPath)
|======================================================================| 100%
>
>
> head(prostate.hex)
ID CAPSULE AGE RACE DPROS DCAPS PSA VOL GLEASON
1 1 0 65 1 2 1 1.4 0.0 6
2 2 0 72 1 3 2 6.7 0.0 7
3 3 0 70 1 1 2 4.9 0.0 6
4 4 0 76 2 2 1 51.2 20.0 7
5 5 0 69 1 1 1 12.3 55.9 6
6 6 1 71 1 3 2 3.3 0.0 8
>
> summary(prostate.hex)
ID CAPSULE AGE RACE
Min. : 1.00 Min. :0.0000 Min. :43.00 Min. :0.000
1st Qu.: 95.75 1st Qu.:0.0000 1st Qu.:62.00 1st Qu.:1.000
Median :190.50 Median :0.0000 Median :67.00 Median :1.000
Mean :190.50 Mean :0.4026 Mean :66.04 Mean :1.087
3rd Qu.:285.25 3rd Qu.:1.0000 3rd Qu.:71.00 3rd Qu.:1.000
Max. :380.00 Max. :1.0000 Max. :79.00 Max. :2.000
DPROS DCAPS PSA VOL
Min. :1.000 Min. :1.000 Min. : 0.300 Min. : 0.00
1st Qu.:1.000 1st Qu.:1.000 1st Qu.: 5.000 1st Qu.: 0.00
Median :2.000 Median :1.000 Median : 8.725 Median :14.25
Mean :2.271 Mean :1.108 Mean : 15.409 Mean :15.81
3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.: 17.125 3rd Qu.:26.45
Max. :4.000 Max. :2.000 Max. :139.700 Max. :97.60
GLEASON
Min. :0.000
1st Qu.:6.000
Median :6.000
Mean :6.384
3rd Qu.:7.000
Max. :9.000
>
>
>
> prostate.data.frame<- as.data.frame(prostate.hex)
>
>
> summary(prostate.data.frame)
ID CAPSULE AGE RACE
Min. : 1.00 Min. :0.0000 Min. :43.00 Min. :0.000
1st Qu.: 95.75 1st Qu.:0.0000 1st Qu.:62.00 1st Qu.:1.000
Median :190.50 Median :0.0000 Median :67.00 Median :1.000
Mean :190.50 Mean :0.4026 Mean :66.04 Mean :1.087
3rd Qu.:285.25 3rd Qu.:1.0000 3rd Qu.:71.00 3rd Qu.:1.000
Max. :380.00 Max. :1.0000 Max. :79.00 Max. :2.000
DPROS DCAPS PSA VOL
Min. :1.000 Min. :1.000 Min. : 0.30 Min. : 0.00
1st Qu.:1.000 1st Qu.:1.000 1st Qu.: 5.00 1st Qu.: 0.00
Median :2.000 Median :1.000 Median : 8.75 Median :14.25
Mean :2.271 Mean :1.108 Mean : 15.41 Mean :15.81
3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.: 17.12 3rd Qu.:26.45
Max. :4.000 Max. :2.000 Max. :139.70 Max. :97.60
GLEASON
Min. :0.000
1st Qu.:6.000
Median :6.000
Mean :6.384
3rd Qu.:7.000
Max. :9.000
>
>
> head(prostate.hex[,4])
RACE
1 1
2 1
3 1
4 2
5 1
6 1
>
> prostate.hex[1:4,4]
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: Last.value.5
RACE
1 1
2 1
3 1
4 2
>
>
>
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
>
>
>
> class(iris)
[1] "data.frame"
>
>
> iris.h2o = as.h2o(localH2O, iris, key="iris.h2o")
|======================================================================| 100%
>
>
> class(iris.h2o)
[1] "H2OParsedData"
attr(,"package")
[1] "h2o"
>
> iris.h2o[1:5,]
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: Last.value.7
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
>
>
> iris.h2o[0:5,]
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: Last.value.9
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
>
>
> prosPath = system.file("extdata", "prostate.csv", package="h2o")
>
> prosPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/prostate.csv"
>
>
> prostate.hex = h2o.importFile(localH2O, path = prosPath)
|======================================================================| 100%
>
>
> prostate.qs = quantile(prostate.hex$PSA)
>
> class(prostate.qs)
[1] "numeric"
>
>
> head(prostate.qs)
0% 25% 50% 75% 100%
0.300 5.000 8.750 17.125 139.700
>
>
> head(prostate.hex$PSA)
PSA
1 1.4
2 6.7
3 4.9
4 51.2
5 12.3
6 3.3
>
> summary(prostate.hex$PSA)
PSA
Min. : 0.300
1st Qu.: 5.000
Median : 8.725
Mean : 15.409
3rd Qu.: 17.125
Max. :139.700
>
>
> predicate1 = prostate.hex$PSA <= prostate.qs[2]
>
> head(predicate1)
PSA
1 1
2 0
3 1
4 0
5 0
6 1
>
>
> prostate.qs[2]
25%
5
>
> head(prostate.hex$PSA)
PSA
1 1.4
2 6.7
3 4.9
4 51.2
5 12.3
6 3.3
>
> predicate2 = prostate.hex$PSA >= prostate.qs[10]
>
> prostate.qs[10]
<NA>
NA
> predicate2 = prostate.hex$PSA >= prostate.qs[5]
>
> prostate.qs[5]
100%
139.7
>
> predicate3 = predicate1 | predicate2
>
> PSA.outers = prostate.hex[predicate3]
>
> nrow(PSA.outers)
[1] 98
>
> nrow(prostate.hex)
[1] 380
>
>
> PSA.outers = h2o.assign(PSA.outers,'PSA.outers')
>
> nrow(PSA.outers)
[1] 98
>
>
> print('colnames demo')
[1] "colnames demo"
>
>
> colnames(prostate.hex)
[1] "ID" "CAPSULE" "AGE" "RACE" "DPROS" "DCAPS" "PSA"
[8] "VOL" "GLEASON"
>
>
>
>
> min(prostate.hex$PSA)
[1] 0.3
>
> max(prostate.hex$PSA)
[1] 139.7
>
> quantile(prostate.hex$AGE)
0% 25% 50% 75% 100%
43 62 67 71 79
>
> quantile(prostate.hex)
Error in quantile.H2OParsedData(prostate.hex) :
quantile only operates on a single column
>
>
>
> summary(prostate.hex[,4:6])
RACE DPROS DCAPS
Min. :0.000 Min. :1.000 Min. :1.000
1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :2.000 Median :1.000
Mean :1.087 Mean :2.271 Mean :1.108
3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.000
Max. :2.000 Max. :4.000 Max. :2.000
>
>
>
>
>
> h2o.table(prostate.hex[0:11,3])
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: Last.value.39
row.names Count
1 61 1
2 65 1
3 68 3
4 69 2
5 70 1
6 71 1
>
>
> h2o.table(prostate.hex[0:22,3])
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: Last.value.42
row.names Count
1 54 1
2 58 1
3 61 1
4 65 2
5 67 1
6 68 3
> h2o.table(prostate.hex[0:44,3])
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: Last.value.45
row.names Count
1 54 3
2 58 2
3 60 1
4 61 1
5 63 1
6 64 1
> nrow(h2o.table(prostate.hex[0:44,3]))
[1] 20
>
> nrow(h2o.table(prostate.hex[,3]))
[1] 32
> h2o.table(prostate.hex[,3])
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: Last.value.52
row.names Count
1 43 1
2 47 1
3 50 2
4 51 3
5 52 2
6 53 4
>
>
Hello World,
In R, I am curious about h2o.table()
I start with this syntax:
>
> h2o.clusterInfo(localH2O)
R is connected to H2O cluster:
H2O cluster uptime: 13 hours 23 minutes
H2O cluster version: 2.9.0.1689
H2O cluster name: H2O_started_from_R
H2O cluster total nodes: 1
H2O cluster total memory: 1.54 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 2
H2O cluster healthy: TRUE
>
>
I load the prostate data set:
>
> prosPath = system.file("extdata", "prostate.csv", package="h2o")
>
> prosPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/prostate.csv"
>
>
> prostate.hex = h2o.importFile(localH2O, path = prosPath)
|======================================================================| 100%
>
>
I look at column 3:
>
> head(prostate.hex[,3])
AGE
1 65
2 72
3 70
4 76
5 69
6 71
>
> summary(prostate.hex[,3])
AGE
Min. :43.00
1st Qu.:62.00
Median :67.00
Mean :66.04
3rd Qu.:71.00
Max. :79.00
>
> nrow(prostate.hex[,3])
[1] 380
>
>
Now I try h2o.table()
>
> nrow(h2o.table(prostate.hex[,3]))
[1] 32
>
>
nrow() tells me I should see 32 rows returned from h2o.table()
So, I try h2o.table()
>
>
> h2o.table(prostate.hex[,3])
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: Last.value.61
row.names Count
1 43 1
2 47 1
3 50 2
4 51 3
5 52 2
6 53 4
>
>
I should see 32 rows returned from h2o.table()
but I only see 6 rows.
Question:
How do I see all 32 rows that h2o.table() should return?
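One likely explanation, offered as an assumption rather than confirmed H2O behavior: printing an H2OParsedData object seems to show only its first rows, the way head() does, while nrow() reports the full count. The base R sketch below reproduces that pattern with an ordinary data frame; the ages vector is made up for illustration. In the session above, as.data.frame(h2o.table(prostate.hex[,3])) might bring all 32 rows into R, since as.data.frame() was already used on prostate.hex, but that call is untested here.

```r
# Hypothetical ages: 32 distinct values, two rows each (not the prostate data).
ages <- rep(41:72, each = 2)

# A local frequency table, converted to a data frame with one row per value.
counts <- as.data.frame(table(ages))

# head() returns only the first 6 rows by default ...
first_rows <- head(counts)
print(nrow(first_rows))

# ... while the full table still has one row per distinct age.
print(nrow(counts))

# Printing the whole data frame shows every row.
print(counts)
```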
Generate a column of random numbers:
>
> s = h2o.runif(prostate.hex)
>
> summary(s)
rnd
Min. :0.001434
1st Qu.:0.241275
Median :0.496995
Mean :0.489468
3rd Qu.:0.740592
Max. :0.994894
>
>
Here is a sample of roughly 5% of prostate.hex (a row is kept when its random number is at most 0.05, so the exact count varies):
> my5pct = prostate.hex[s <= 5.0/100.0]
>
> nrow(prostate.hex)
[1] 380
>
> nrow(my5pct)
[1] 29
>
>
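The sample size is only approximately 5% because each row is kept or dropped independently. A base R sketch of the same idea, using the built-in iris data frame instead of an H2O frame (the seed and the variable names are illustrative assumptions); note that a local data frame needs the trailing comma to select rows:

```r
# Fix the seed so the sketch is repeatable (the value 42 is arbitrary).
set.seed(42)

# One uniform random number per row, analogous to the rnd column above.
s <- runif(nrow(iris))

# Keep the rows whose random number is at most 0.05.
sample5pct <- iris[s <= 0.05, ]

# The count is random: near 5% of the rows on average, rarely exactly 5%.
print(nrow(sample5pct))
print(nrow(iris))
```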
Subject: In R, how do I convert a split frame to H2OParsedData?
I start with this syntax:
>
> h2o.clusterInfo(localH2O)
R is connected to H2O cluster:
H2O cluster uptime: 13 hours 23 minutes
H2O cluster version: 2.9.0.1689
H2O cluster name: H2O_started_from_R
H2O cluster total nodes: 1
H2O cluster total memory: 1.54 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 2
H2O cluster healthy: TRUE
>
>
I load the prostate data set:
>
> prosPath = system.file("extdata", "prostate.csv", package="h2o")
>
> prosPath
[1] "/home/dan/rdir/lib/R/library/h2o/extdata/prostate.csv"
>
>
> prostate.hex = h2o.importFile(localH2O, path = prosPath)
|======================================================================| 100%
>
>
I call h2o.splitFrame()
>
> prostate.split = h2o.splitFrame(data = prostate.hex , ratios = 0.75)
>
> prostate.train = prostate.split[1]
> prostate.test = prostate.split[2]
>
I look at it.
> class(prostate.split)
[1] "list"
>
>
It sort of acts like an H2OParsedData object:
>
> class(prostate.split[1])
[1] "list"
>
> head(prostate.split[1])
[[1]]
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: prostate2_part0.hex
ID CAPSULE AGE RACE DPROS DCAPS PSA VOL GLEASON
1 1 0 65 1 2 1 1.4 0.0 6
2 2 0 72 1 3 2 6.7 0.0 7
3 3 0 70 1 1 2 4.9 0.0 6
4 4 0 76 2 2 1 51.2 20.0 7
5 5 0 69 1 1 1 12.3 55.9 6
6 6 1 71 1 3 2 3.3 0.0 8
But not always:
> nrow(prostate.split[1])
NULL
>
> nrow(prostate.split)
NULL
>
> nrow(prostate.hex)
[1] 380
>
>
Question:
How do I convert the object returned by h2o.splitFrame() into an H2OParsedData object?
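A likely answer lies in base R list indexing rather than in H2O itself: single brackets return a sub-list, while double brackets return the element. The sketch below uses a plain list of data frames as a stand-in for the h2o.splitFrame() result; whether prostate.split[[1]] yields an H2OParsedData object in this H2O version is an assumption, though the printed [[1]] element above suggests it.

```r
# Stand-in for a split result: a list of two data frames (hypothetical split).
parts <- list(iris[1:112, ], iris[113:150, ])

# Single brackets keep the list wrapper, so nrow() has nothing to count.
print(class(parts[1]))
print(nrow(parts[1]))

# Double brackets extract the element itself.
train <- parts[[1]]
test  <- parts[[2]]
print(class(train))
print(nrow(train))
print(nrow(test))
```

Applied to the session above, that would be prostate.train = prostate.split[[1]] and prostate.test = prostate.split[[2]].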
At this URL:
docs.h2o.ai/Ruser/rtutorial.html
I see some useful syntax at the end of the page:
> mygbm_model = h2o.getModel(key = 'GBM_bb1a4664cbf6f99be853ebbdc2d4df6', h2o = localH2O)
>
> mygbm_model
IP Address: 127.0.0.1
Port : 54321
Parsed Data Key: australia.hex
GBM Model Key: GBM_bb1a4664cbf6f99be853ebbdc2d4df6
Overall Mean-squared Error: 31462.59
>
>
The above syntax is useful in an interactive context.
I cannot think of a use case where my script would already have access to the key.
But I can think of situations where I would see the key in the web UI and want
to use the associated model in a script or an R session.
Another call that is useful in an interactive session is h2o.ls(localH2O).
It lists the keys of the objects stored in the cluster along with their sizes in bytes:
>
>
> h2o.ls(localH2O)
Key Bytesize
1 GBMPredict_f986d47542464833ba3b583b8c4ed1e5 80
2 GBM_9eab586732156bfd511a36a5015642f7 1474
3 GBM_a410fa049380b1f968c5dd46f3104fa1 1473
4 GBM_bb1a4664cbf6f99be853ebbdc2d4df6 1468
5 Last.value.11 844
6 Last.value.14 108
7 Last.value.15 844
8 Last.value.17 844
9 Last.value.18 844
10 Last.value.19 118
11 Last.value.21 844
12 Last.value.23 844
13 Last.value.24 80
14 Last.value.25 844
15 Last.value.26 76
16 Last.value.27 118
17 Last.value.28 2881
18 Last.value.29 844
19 Last.value.3 448
20 Last.value.31 844
21 Last.value.33 448
22 Last.value.36 73
23 Last.value.37 1344
24 Last.value.38 79
25 Last.value.39 152
26 Last.value.41 90
27 Last.value.42 162
28 Last.value.44 112
29 Last.value.45 176
30 Last.value.47 112
31 Last.value.48 176
32 Last.value.49 448
33 Last.value.5 72
34 Last.value.50 200
35 Last.value.51 448
36 Last.value.52 200
37 Last.value.54 448
38 Last.value.56 448
39 Last.value.57 448
40 Last.value.58 448
41 Last.value.59 200
42 Last.value.60 448
43 Last.value.61 200
44 Last.value.63 3108
45 Last.value.64 118
46 Last.value.65 1285
47 Last.value.68 640
48 Last.value.7 484
49 Last.value.70 560
50 Last.value.72 560
51 Last.value.9 484
52 PSA.nonouters 2881
53 australia.hex 3919
54 iris.h2o 1154
55 iris.hex 1154
56 prostate.hex 4874
57 prostate1.hex 4874
58 prostate2.hex 4874
59 prostate2_part0.hex 3817
60 prostate2_part1.hex 1624
61 spy611.hex 362029
>
>
>
h2o.ls() is a simple call; I cannot pass it an object name:
>
> h2o.ls(localH2O, keys = 'prostate2_part1.hex')
Error in h2o.ls(localH2O, keys = "prostate2_part1.hex") :
unused argument (keys = "prostate2_part1.hex")
>
>
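Since h2o.ls() prints like a data frame with Key and Bytesize columns, one workaround is to filter its result with ordinary R subsetting. The sketch below assumes the return value really is a data frame; it rebuilds a few rows from the listing above by hand so that it runs without a cluster.

```r
# A few rows copied from the h2o.ls(localH2O) listing above.
keys <- data.frame(
  Key = c("iris.hex", "prostate.hex", "prostate2_part1.hex", "spy611.hex"),
  Bytesize = c(1154, 4874, 1624, 362029),
  stringsAsFactors = FALSE
)

# Look up a single key by name.
one_key <- keys[keys$Key == "prostate2_part1.hex", ]
print(one_key)

# Sort by size, largest first, to find candidates for removal.
by_size <- keys[order(keys$Bytesize, decreasing = TRUE), ]
print(by_size)
```

With a live cluster, keys would come from h2o.ls(localH2O) instead of the hand-built data frame.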
On my laptop, where localH2O has only about 1.5 GB of memory, I might want to remove large objects:
>
> h2o.rm(object= localH2O, keys= "spy611.hex")
>
>
Currently that is the end of my h2o R demo.
I may add more syntax demos in the future.