syntax.us Let the syntax do the talking
Blog Contact Posts Questions Tags Hire Me

Question:
In Apache Spark what is RDD?

Early in my study of Spark I encountered the acronym: 'RDD'.

RDD stands for 'Resilient Distributed Dataset'.

To run a Spark calculation on some data, I need to spice up the data so Spark will want to eat it.

In Python, the seasoning is done with two lines of code:
# In the pyspark shell, sc is a ready-made SparkContext.
mydata = [10, 20, 30, 40, 50]
myrdd  = sc.parallelize(mydata)
# I now have an RDD which Spark can chew on using a call to myrdd.reduce(lexp).
# Memo:
# lexp is an anonymous function (AKA lambda) which feeds 2 inputs to some
# calculation syntax.

# Demo:
myrdd.reduce(lambda a1, a2: a1 + a2)
# 150
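The pairwise folding that myrdd.reduce does can be seen in plain Python with functools.reduce; this is a conceptual sketch only, since Spark actually spreads the work across partitions:

```python
from functools import reduce

mydata = [10, 20, 30, 40, 50]

# The lambda takes 2 inputs; reduce applies it pairwise across the list:
# ((((10 + 20) + 30) + 40) + 50) = 150
total = reduce(lambda a1, a2: a1 + a2, mydata)
print(total)  # 150
```

The lambda handed to reduce must be associative and commutative in Spark, because partitions may be combined in any order.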

