Hadoop as a service


It’s been a fun year learning new stuff, and along the way Andy Piper helped out with a bite-sized architectural debate while I was experimenting with a Hadoop service on Bluemix. Having a short-lived/disposable memory, I thought it would be worth posting the discussion here for future reference…

@jtonline: Still pondering how a hadoop buildpack might compare to a hadoop service

@andypiper: @jtonline why would you want a buildpack for Hadoop – surely data store = service (broadly) not runtime. #cloudfoundry

@jtonline: @andypiper hmm, maybe, but you want to bring the processing to the data don’t you? Currently seems like services will hold big data in silos

@jtonline: @andypiper for example, I might want to use one of the address verification services from my map reduce job. I’m probably missing something.

@andypiper: @jtonline multiple services can be bound to multiple apps. And you can call jobs in those services from those apps.

@andypiper: @jtonline PivotalHD ships as a service in PivotalCF – obviously you may need data access libraries in the buildpack for the app.

@jtonline: @andypiper not convinced hadoop is just a data store. Do I need apps on runtime to kick off oozie jobs with details of other services?

@andypiper: @jtonline the runtime/service debate on CF has been a long one but I think fairly clean/clear. I’d see Hadoop as a shared resource.

@andypiper: @jtonline bear in mind buildpack -> droplet -> runnable containerised app instance.

@jtonline: @andypiper agreed. Maybe what I’m missing is an easy way to wire services together?

@andypiper: @jtonline yeah maybe – you end up with apps acting as service coordinators I guess.

@andypiper: @jtonline coupled with the fact that apps are intentionally short-lived and best stateless… interesting architectural debate :-)

@andypiper: @jtonline (for “short-lived” read “disposable” my bad)
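
Reading this back, the “apps acting as service coordinators” idea is easy to picture. Below is a rough sketch rather than anything we actually built: a small Python app pushed to Cloud Foundry that reads its bound services from VCAP_SERVICES and starts an Oozie workflow on a Hadoop service, handing over the credentials of an address verification service as job properties. The service names (my-hadoop-service, address-verification), credential keys (oozie_url, user, url, api_key) and the HDFS path are all made up for illustration; the only real pieces are the VCAP_SERVICES environment variable and Oozie’s standard REST API.

```python
# Hypothetical "service coordinator" app: reads Cloud Foundry service bindings
# and submits an Oozie workflow, passing another bound service's credentials
# through as job properties. Binding names and credential keys are invented.
import json
import os

import requests  # assumes the buildpack installs this dependency


def get_credentials(service_name):
    """Find the credentials block for a named bound service in VCAP_SERVICES."""
    services = json.loads(os.environ["VCAP_SERVICES"])
    for bindings in services.values():
        for binding in bindings:
            if binding.get("name") == service_name:
                return binding["credentials"]
    raise KeyError("service %s is not bound to this app" % service_name)


def submit_workflow(oozie_url, properties):
    """Start a workflow via Oozie's REST API (POST .../v2/jobs?action=start)."""
    config = "<configuration>%s</configuration>" % "".join(
        "<property><name>%s</name><value>%s</value></property>" % (k, v)
        for k, v in properties.items()
    )
    response = requests.post(
        "%s/v2/jobs?action=start" % oozie_url.rstrip("/"),
        data=config,
        headers={"Content-Type": "application/xml;charset=UTF-8"},
    )
    response.raise_for_status()
    return response.json()["id"]


if __name__ == "__main__":
    hadoop = get_credentials("my-hadoop-service")      # hypothetical binding names
    address = get_credentials("address-verification")
    job_id = submit_workflow(
        hadoop["oozie_url"],  # assumes the service exposes an Oozie endpoint
        {
            "user.name": hadoop["user"],
            "oozie.wf.application.path": "hdfs:///user/apps/verify-addresses",
            # hand the second service's endpoint to the MapReduce job
            "address.service.url": address["url"],
            "address.service.key": address["api_key"],
        },
    )
    print("started Oozie job %s" % job_id)
```

Both services would simply be bound to the app (cf bind-service, or listed in its manifest), which is pretty much Andy’s point about multiple services being bound to multiple apps; the open question is still whether the job running inside the Hadoop service can reach back out to the other services it’s been told about.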

It should be an interesting 2015.