30th September 2013

Using Drupal to enable Open Data

Graeme Blackwood
Developer

For the last few years, there has been a growing move toward making data available for use, reuse and redistribution by anyone who has an interest in doing so. Open Data is increasingly being embraced, or even mandated, by governments across the globe, and is already being used in some really interesting ways.

During my time at DrupalCon Prague last week, I had the opportunity to learn more about where Drupal fits in all this by attending an informal gathering (a BoF session) called "DKAN: The Drupal distribution for open data".

CKAN and DKAN

CKAN is a non-Drupal system that has been developed to help people working with open data on the web. It's pretty powerful, but quite specific to the task in hand, so it generally needs to be integrated with additional systems if it has to run alongside other web content (though my understanding is that the CKAN team is working on some CMS capabilities).

Because of this, CKAN is used alongside Drupal for many government websites to deliver open data across the world. Yet because CKAN is a completely different system, it doesn't take advantage of Drupal's huge developer base and all the extensibility that being on Drupal would offer. Enter DKAN, a Drupal distribution designed to leverage all the benefits of Drupal and the power of CKAN in one.

DKAN means Drupal users don't need to learn another system or programming language if they want to start working with open data standards. The DKAN modules can also be taken separately and plugged into an existing website, making it very quick and easy to start working with open data right now.

What are the main differences between CKAN and DKAN?

DKAN is still in development but is modelling itself on CKAN. Most of the core open data features of CKAN have been replicated and can be used right now by experienced Drupal teams, although there is work to do before DKAN reaches full parity. Also, CKAN was specifically designed to handle huge, multi-gigabyte open datasets without much problem, whereas Drupal is designed to do pretty much whatever you want, so there is an efficiency trade-off. But dealing with performance in Drupal is a well-trodden path, and the same knowledge used to make huge Drupal sites highly performant can be applied to DKAN.
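To give a flavour of the kind of core feature in question, here is a rough Python sketch of searching for datasets through CKAN's Action API (the `package_search` action). This is only an illustration under a few assumptions: the portal address `data.example.org` is hypothetical, the sample response is made up and trimmed to the fields used here, and the helper names are my own. It builds the request URL and reads a response without touching the network.

```python
import json
from urllib.parse import urlencode


def package_search_url(portal, query, rows=5):
    """Build a CKAN Action API package_search URL for a portal base URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Return the dataset titles found in a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Hypothetical portal address, for illustration only.
url = package_search_url("https://data.example.org/", "air quality")

# Made-up response, trimmed to the fields this sketch reads.
sample = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "air-quality-2013", "title": "Air quality 2013"}],
    },
})
titles = dataset_titles(sample)
```

In real use you would fetch `url` with an HTTP client and pass the response body to `dataset_titles`. Whether a given DKAN site answers the same request depends on how far its CKAN parity has progressed, so treat that as something to check rather than assume.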

So which should you use?

Use CKAN if:

  • You just want to manage data and open it up to the world
  • You work with, or are happy to learn, Python, etc
  • You are not interested in extending the system
  • You are happy to integrate with another system (if required) and maintain two codebases
  • You are handling massive datasets, though Drupal will probably be fine with good configuration

Use DKAN if:

  • You want to start using open data on an existing Drupal site
  • Your current team works with Drupal or PHP/MySQL generally
  • You want to integrate open data seamlessly with the wider experience on the website
  • You want to get up and running quickly with open data, but also have plenty of options for extensibility in the future

Over time, open data will power many more apps and services, empower research and development, improve decision making and generally increase knowledge and understanding across the world. Exciting times!