Open Data Camp Day 1

If I don’t post a few notes from today’s Open Data Camp now, I never will, so here are a few things I scribbled down- it could be worse, I could have posted a PDF containing photos of the the actual scribbles!

So out of this choice


…I picked, Open Data for Elections, Open Addresses, Data Literacy, Designing Laws using Open Data, and Augmented Reality for Walkers.

Open Data for Elections

I’ve been following @floppy‘s crazy plan to get elected for a while, so this was the easiest decision of the day: what drives someone to embrace the gory inner workings of democracy like this?

Falling turnout it would seem, and concern for a functioning democracy.

The first step of his journey was the Open Politics Manifesto, which I’ve so far failed to edit- must try harder.

Perhaps more interesting was how this, and use of open data, fits into a political platform as a service. It would be nice to have the opportunity to see a few additions to the usual suspects at the ballot box, and Eastleigh got a rare chance to see what that could be like with a by election. Perhaps open data services for candidates could tip he balance enough to encourage more people to stand.

Things that sounded interesting:

  • Democracy Club
  • OpenCorporates
  • Data Packages
  • Open data certificates (food hygiene certificates for data?)
  • Candidates get one free leaflet delivery by Royal Mail- I wonder how big they expect those leaflets to be!

Open Addresses

@floppy and @giacecco introduced the (huge) problems they need to overcome to rebuild a large data set without polluting that data with any sources with intellectual property restrictions. Open Addresses still have a long way to go and there were comments about how long Open Street Map has been around, and it still has gaps.

They have some fun ideas about crowd sourcing address data (high vis jacket required) and there are some interesting philosophical questions around consent for addresses to be added.

It will be interesting to see whether Open Addresses can get enough data to provide real value, and what services they build.

Data Literacy

Mark and Laura led a discussion around data literacy founded in the observation that competent people, with all the skills you could reasonably expect them to have, still struggle with handling data sets.

Who needs to be data literate? Data scientists? Data professionals? Everyone?

Data plumbers? There were some analogies with actual plumbers! You might not be a plumber but it’s useful to know something about it.

If we live in a data driven society, we should know how to ask the right questions. Need domain expertise and technical expertise.

Things that sounded interesting:

Designing Laws using Open Data

@johnlsheridan pointed out that the least interesting thing to do with legislation is to publish it and went on to share some fascinating insights into the building blocks of statute law. It sounds like the slippery language used in legislation boils down to a small number of design patterns built with simple building blocks, such as a duty along with a claim right, and so on.

Knowing these building blocks makes it easier to get the gist of what laws are trying to achieve, helps navigate statutes, and could give policy makers a more reliable way to effect a goal.

For example, it’s easier to make sense of the legislation covering supply of gas, and it’s possible to identify where there may be problems. The gas regulator has a duty to protect the interests of consumers by promoting competition, but that’s a weak duty without a clear claim right to enforce it.

John also demonstrated a tool – – exploring how the language used in legislation has changed over time, for example how the use of “shall” has declined and been replaced by “is to be”.

Augmented Reality for Walkers

My choice of Android tablet was largely based on what might work reasonably well for maps and augmented reality, so I seized this opportunity!

Nick Whitelegg described the Hikar Android app he’s been working on, which is intended to help hikers follow paths by overlaying map data on a live camera feed.

The data is a combination of Open Street Map mapping data, with Ordnance Survey height data, which is downloaded and cached as tiles around your current location. Open GL is used to overlay a 3D view of the map data on the live camera feed, using the Android sensor APIs to detect the device’s rotation.

I’ve just downloaded and installed Hikar and, while my tablet is a tad slow, it works really well. I live somewhere flat and boring but the height data made a noticeable difference when Nick demonstrated the app in hilly Winchester.

Still to come: Day 2!

Why doesn’t Eclipse/Installation Manager work on Linux?

For the next time I’m grumbling about yet more incompatibilities causing problems with Eclipse on Linux, adding the following properties to the bottom of the launcher configuration file seems to help:


For example, edit the install.ini file for Installation Manager, which is where I first encountered problems after updating Red Hat Enterprise Linux last year. The problem appears to be due to incompatible GTK and Cairo versions, and there’s a related IBM technote for Installation Manager on RHEL 6.6.

Unfortunately, while that was enough to get Installation Manager working, the Eclipse IDE still seems somewhat unstable. I been exporting the following environment variable for a while but, based on the SDK Known Issues wiki page, maybe that doesn’t make any difference with recent versions:


Another suggestion I’ve seen recently, related to tooltips, is to export another environment variable, although so far I haven’t tried it:

export GRE_HOME=/dev/null

Need to experiment a bit more with those last two, and see if I can narrow down whether there are any other real problems.

Hadoop as a service

It’s been a fun year learning new stuff, and along the way Andy Piper helped out with a bite sized architectural debate while I was experimenting with a Hadoop service on Bluemix. Having a short lived/disposable memory I thought it would be worth posting the discussion here for future reference…

‏@jtonline: Still pondering how a hadoop buildpack might compare to a hadoop service

@andypiper: @jtonline why would you want a buildpack for Hadoop – surely data store = service (broadly) not runtime. #cloudfoundry

@jtonline: @andypiper hmm, maybe, but you want to bring the processing to the data don’t you? Currently seems like services will hold big data in silos

@jtonline: @andypiper for example, I might want to use one of the address verification services from my map reduce job. I’m probably missing something.

@andypiper: @jtonline multiple services can be bound to multiple apps. And you can call jobs in those services from those apps.

@andypiper: @jtonline PivotalHD ships as a service in PivotalCF – obviously you may need data access libraries in the buildpack for the app.

@jtonline: @andypiper not convinced hadoop is just a data store. Do I need apps on runtime to kick off oozie jobs with details of other services?

@andypiper: @jtonline the runtime/service debate on CF has been a long one but I think fairly clean/clear. I’d see Hadoop as a shared resource.

@andypiper: @jtonline bear in mind buildpack -> droplet -> runnable containerised app instance.

@jtonline: @andypiper agreed. Maybe what I’m missing is an easy way to wire services together?

@andypiper: @jtonline yeah maybe – you end up with apps acting as service coordinators I guess.

@andypiper: @jtonline coupled with the fact that apps are intentionally short-lived and best stateless… interesting architectural debate :-)

@andypiper: @jtonline (for “short-lived” read “disposable” my bad)

It should be an interesting 2015.

BigInsights Quicker Start

I’ve been taking a break from Liberty and JAX-RS recently to start tinkering with IBM’s BigInsights Hadoop distribution. To make things easier/more interesting my first attempts were using the Analytics for Hadoop service on BlueMix. In case it helps anyone, here’s what I ended up with before needing to install BigInsights myself:

And this is the script I used to upload data in the video (unfortunately I didn’t have any luck using the HttpFS API):



curl -iv –user ${BIUSER}:${BIPASSWORD} –insecure -X POST “${BIURI}/user/${BIUSER}/sample-data?isfile=false”

curl -iv –user ${BIUSER}:${BIPASSWORD} –insecure -X POST “${BIURI}/user/${BIUSER}/sample-data/orgdata.unl” –header “Content-Type:application/octet-stream” –header “Transfer-Encoding:chunked” -T “orgdata.unl”

curl -iv –user ${BIUSER}:${BIPASSWORD} –insecure -X POST “${BIURI}/user/${BIUSER}/sample-data/persondata.unl” –header “Content-Type:application/octet-stream” –header “Transfer-Encoding:chunked” -T “persondata.unl”


  • since recording the demo Bluemix has added a United Kingdom region, however it looks like the Analytics for Hadoop service is currently only available on the US South region.
  • there is also now a BigInsights service on Bluemix which allows you to provision multi-node Hadoop clusters.

Modelling restful properties

I’ve recently been playing with Liberty and JAX-RS and in an effort to remember some of what I’ve discovered, I’m going to try and keep a few notes and post them here, probably along with a few questions. If anyone else finds them useful, or knows the answers, that’s a bonus!!

To start with Creating an efficient REST API with HTTP provides a nice overview of REST APIs and JAX-RS basics has a great simple sample application to get going with… and break!


Armed with the basics I thought it would be interesting to model the sample application to compare the working code with what Rational Software Architect (RSA) would generate for me. Obviously that’s not the most complex model in the world but it was good to have a few examples to follow:

(Those articles made much more sense when I’d tracked down all the likely looking JAXRS, REST and UML features in the RSA installer!)

Before generating any code, it was quite nice to get some basic API documentation out of the model. Not as fancy as using Swagger UI but certainly better than no documentation!


It’s only a start but that’s all for part 1. Hopefully there’ll be more posts at some point when I get further, probably along these lines:

  • Creating an OSGi Web project and generating some code
  • Using a REST client to check it works, including a puzzling Wink problem
  • Adding some debug
  • Expanding the sample with some awkward long running processing
  • Deploying to Bluemix
  • Anything else I encounter along the way!

If you know of any good articles/books/fancy new media that would help, please leave recommendations below. If there are better ways to design REST services (Swagger looks interesting but I haven’t had a chance to investigate), please share them. And any other tips, comments, or questions of your own are also very welcome!


Hursley 3D Printing Expo

D’oh, looks like I missed a swarm of 3d printers in Hursley recently! I wonder if anyone has printed a model of the house/site yet.

I’m still looking for even a vaguely plausible excuse to splash out on a 3d printer, but printing models or new 3d printers still isn’t quite enough to justify the money (or space these days)!

Java dumps

I recently had to debug a problem with the MDM Workbench where exporting a tailoring project for Information Server didn’t do anything. In fact it didn’t even report any problems!

Unfortunately the code in question likes to put a brave face on things and just reports that everything was OK, even when something goes wrong. This was the perfect opportunity to try out some of the diagnostic tools available for the IBM Java runtime, which I’ve been meaning to try for ages. I had an idea where the problem was likely to be but to find out for sure I started the workbench using the following command line:

eclipsec -vmargs -Xdump:system:events=catch,filter=java/lang/AbstractMethodError#com/ibm/mdm/tools/export/infoserver/job/MDMDatabaseDAO.queryDatabaseWithoutFilter

Sure enough the failing export produced a dump which I could check using the Memory Analyzer tool. You can get the IBM version via IBM Support Assistant but it’s probably easier to get the standard Eclipse Memory Analyzer and add the required IBM plugins from the DTFJ update site.

I’m fortunate enough to work in Hursley so I could pester someone who works on IBM Java runtime diagnostics, but there’s also a helpful article on developerWorks with details of how to trigger dumps, and how to run queries using OQL:

Debugging from dumps: Diagnose more than memory leaks with Memory Analyzer

So mystery solved- if you have an Oracle database and want to exporting tailoring projects for Information Server, make sure you set up the database connection with a more recent JDBC driver than the defaults.