syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me

Question:
In H2O Sparkling Water, how do I create a DataFrame from SparkFiles?

I pulled this syntax from GitHub:

val irisTable = new DataFrame(new File(SparkFiles.get("iris.csv")))
The GitHub URL is (or was):

https://github.com/h2oai/h2o-droplets/blob/master/sparkling-water-droplet/src/main/scala/water/droplets/SparklingWaterDroplet.scala

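Note that iris.csv must be registered with the SparkContext in an earlier step; SparkFiles.get("iris.csv") then returns the local path of the registered copy.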

The syntax to register iris.csv is this:

sc.addFile("data/iris.csv")
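To see how registering and resolving fit together, here is a minimal sketch that uses only Spark (no H2O). It assumes Spark is on the classpath and runs in local mode. The object name SparkFilesSketch, the temporary directory, and the one-line placeholder CSV are illustrative assumptions, not part of the droplet.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object SparkFilesSketch {
  // Registers a file with sc.addFile, then resolves it with SparkFiles.get.
  // Returns the resolved file name and whether the resolved copy exists.
  def run(): (String, Boolean) = {
    // Placeholder CSV so the sketch is self-contained; NOT the real iris data.
    val dir = Files.createTempDirectory("sparkfiles-sketch")
    val source = new File(dir.toFile, "iris.csv")
    Files.write(source.toPath, "5.1,3.5,1.4,0.2,placeholder\n".getBytes("UTF-8"))

    val conf = new SparkConf().setAppName("sparkfiles-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Step 1: register the file under its file name ("iris.csv").
      sc.addFile(source.getAbsolutePath)
      // Step 2: resolve the registered name to a local path.
      val resolved = new File(SparkFiles.get("iris.csv"))
      (resolved.getName, resolved.exists)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (name, exists) = run()
    println(s"resolved name = $name, exists = $exists")
  }
}
```

The File built from SparkFiles.get is the kind of argument the droplet passes to new DataFrame(...).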
Note that I can also create a DataFrame from a file using H2O-only syntax, passing java.io.File objects directly instead of going through SparkFiles.

Here is a demo of that technique, which I saw at a Meetup:
//
// Load and parse bike data (year 2013) into H2O by using H2O parser
//

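// DIR_PREFIX (the directory holding the CSV files) must be defined earlier in the script
val dataFiles = Array[String](
    "2013-07.csv", "2013-08.csv", "2013-09.csv", "2013-10.csv",
    "2013-11.csv", "2013-12.csv").map(f => new java.io.File(DIR_PREFIX, f))

// Load and parse data; `dataFiles: _*` passes the array elements as varargs
val bikesDF = new DataFrame(dataFiles: _*)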
I also found the above syntax on GitHub:

https://github.com/h2oai/sparkling-water/blob/master/examples/scripts/Meetup20150226.script.scala
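The `dataFiles:_*` argument in the Meetup snippet is plain Scala: it expands a sequence into a varargs parameter list. Here is a small sketch of that mechanism using only the standard library. The describe function and the "data" directory are hypothetical stand-ins for the DataFrame constructor and DIR_PREFIX; nothing here calls H2O.

```scala
import java.io.File

object VarargsSketch {
  // Hypothetical stand-in for a constructor that accepts File* (varargs).
  def describe(files: File*): String =
    files.map(_.getName).mkString(", ")

  def main(args: Array[String]): Unit = {
    val dirPrefix = "data" // hypothetical directory, plays the role of DIR_PREFIX
    val dataFiles = Array("2013-07.csv", "2013-08.csv")
      .map(f => new File(dirPrefix, f))

    // `: _*` passes each array element as a separate argument.
    println(describe(dataFiles: _*)) // prints "2013-07.csv, 2013-08.csv"
  }
}
```

Without `: _*`, the call would not compile, because an Array[File] is not itself a File.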
