syntax.us Let the syntax do the talking

Question:
In H2O Sparkling-Water how does TimeTransform work?

I bumped into TimeTransform here:

https://github.com/h2oai/sparkling-water/blob/master/examples/scripts/Meetup20150226.script.scala

The API call looks like this:
val daysVec = bikesPerDayDF('Days)
val finalBikeDF = bikesPerDayDF.add(new TimeTransform().doIt(daysVec))
I can see that TimeTransform is being used to generate data for new DataFrame columns.

The expression new TimeTransform() creates an object, on which I can then call an instance method named doIt(). That method accepts the Days column from the bikesPerDayDF DataFrame.

I found the definition of TimeTransform at the URL listed below (near the bottom):

https://github.com/h2oai/sparkling-water/blob/master/examples/src/main/scala/org/apache/spark/examples/h2o/CitiBikeSharingDemo.scala

When I read the definition, I see it follows a pattern I noticed before when I studied another class called TimeSplit.

Although I understand Scala poorly, my guess is that TimeTransform is a subclass of MRTask.

I do not understand why MRTask is followed by the name TimeSplit in square brackets.
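My reading is that square brackets in Scala hold a type argument, so a name in brackets after MRTask suggests MRTask is a generic class and the bracketed name fills in its type parameter. Below is a minimal sketch of that idea using a made-up class named Task. It is not H2O's MRTask; the names Task, DaySplitter, runAll and label are hypothetical and exist only for illustration.

```scala
// Hypothetical stand-in for a generic task class; NOT H2O's MRTask.
// T is a type parameter: subclasses pass a class name in square brackets.
abstract class Task[T <: Task[T]] { self: T =>
  var runs: Int = 0

  // Returns the concrete subclass type T rather than the base type Task,
  // so the caller can keep using subclass methods without a cast.
  def runAll(): T = { runs += 1; this }
}

// The subclass names itself inside the brackets.
class DaySplitter extends Task[DaySplitter] {
  def label: String = s"DaySplitter ran $runs time(s)"
}

object TypeParamDemo {
  def main(args: Array[String]): Unit = {
    // runAll() is declared on Task but returns a DaySplitter here.
    val t: DaySplitter = new DaySplitter().runAll()
    println(t.label)
  }
}
```

The point of the bracketed name in this sketch is that methods inherited from the base class can return the subclass type. Whether MRTask uses its type parameter in exactly this way is an assumption on my part.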

I do see that TimeTransform defines two instance methods: doIt() and map().

The first method, doIt(), accepts the column full of days.

Then doIt() indirectly calls map().
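To picture how a method like doIt() can end up running map() without naming it, here is a small sketch of the pattern as I understand it. Everything in it is hypothetical: MiniTask, runAll, WeekdayTask and the chunk size are invented for this example and are not H2O's API. A base class owns a driver method that splits a column into chunks and calls map() on each chunk; the subclass only supplies map() and a convenience method doIt().

```scala
// Hypothetical miniature of a map-style task; none of these names are H2O's.
abstract class MiniTask {
  var mapCalls: Int = 0

  // Subclasses override map to transform one chunk of the column.
  protected def map(chunk: Vector[Long]): Vector[Long]

  // The driver: splits the column into chunks and calls map on each one.
  def runAll(column: Vector[Long], chunkSize: Int): Vector[Long] =
    column.grouped(chunkSize).flatMap { chunk =>
      mapCalls += 1
      map(chunk)
    }.toVector
}

class WeekdayTask extends MiniTask {
  // doIt never mentions map; it only hands the column to the driver.
  def doIt(days: Vector[Long]): Vector[Long] = runAll(days, chunkSize = 2)

  // Day 0 (1970-01-01) was a Thursday, so shift by 3 to get 1=Monday..7=Sunday.
  override protected def map(chunk: Vector[Long]): Vector[Long] =
    chunk.map(d => Math.floorMod(d + 3, 7L) + 1)
}

object MiniTaskDemo {
  def main(args: Array[String]): Unit = {
    val task = new WeekdayTask()
    println(task.doIt(Vector(0L, 1L, 2L))) // Vector(4, 5, 6)
    println(task.mapCalls)                 // 2, one call per chunk
  }
}
```

In this sketch the caller only ever sees doIt(); the chunking and the calls to map() stay inside the base class.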

It appears that map() uses date arithmetic to transform each day value into two values: Month and DayOfWeek.

When map() is done, doIt() returns two DataFrame columns full of appropriate data: Month and DayOfWeek.
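The date arithmetic itself can be sketched with the standard java.time library. This is only an illustration under my own assumptions: the input is a count of days since 1970-01-01, and the helper name monthAndDayOfWeek is made up. The script may use a different date library and different numbering (for example zero-based months), so the exact values it produces could differ.

```scala
import java.time.LocalDate

object DayFeatures {
  // Hypothetical helper: converts days since 1970-01-01 into
  // (month 1..12, ISO day of week 1=Monday..7=Sunday).
  def monthAndDayOfWeek(daysSinceEpoch: Long): (Int, Int) = {
    val date = LocalDate.ofEpochDay(daysSinceEpoch)
    (date.getMonthValue, date.getDayOfWeek.getValue)
  }

  def main(args: Array[String]): Unit = {
    // Day 0 is 1970-01-01; day 16492 is 2015-02-26. Both were Thursdays.
    Seq(0L, 16492L).foreach { d =>
      val (month, dayOfWeek) = monthAndDayOfWeek(d)
      println(s"$d -> month=$month dayOfWeek=$dayOfWeek")
    }
  }
}
```

Applied to every row of a days column, a function like this yields the two feature columns described above.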

Later these columns are used as ML features, which makes intuitive sense.

The use-case here is to predict bike usage in NYC.

Bikes might be used more in warm months and on weekends.

If that is true, it makes sense to transform days-past-epoch into Month and DayOfWeek.
