Sunday, May 5, 2013

Data, Data Everywhere

Simplify, Encapsulate, Don't Repeat Yourself (DRY)

This is the mantra for the work that the Rice team is doing on the new KRAD data architecture. It's a sweeping project with the end-goal being support for the Java Persistence API (JPA) within Kuali. I'm really excited to tell you all about what we're up to. But first, to ensure that you have all of the context, I want to take a little trip down memory lane...

By all accounts, the early days of Kuali were very exciting. The unprecedented goal of bringing universities together to collaborate on the creation of an enterprise open source financial system for higher education was certainly groundbreaking. To kickstart the technical work, a number of developers and architects got together to design and define the architecture for Kuali. Coming out of this initial flurry of activity was the decision to use the following technologies for development of the software:
  1. Java
  2. Spring Framework
  3. Apache Struts
  4. Apache ObJectRelationalBridge (OJB)
  5. Oracle and MySQL databases
Now, I'm sad to say that I wasn't involved in Kuali at this early stage (in fact I probably wasn't even working at Indiana University yet), so hopefully I'm not misstating the above. Regardless, the Kuali Nervous System (KNS) which was part of the first release of Kuali Financials certainly incorporated all of these.

Many of you may know that KRAD is the heir-apparent to the KNS and has replaced Apache Struts with Spring MVC, effectively expanding our use of the Spring framework. But what I'd like to spend some time talking about is the Apache ObJectRelationalBridge.

For those who don't know what OJB is, it's the object-relational mapping library that Kuali uses. It takes Java objects and the data represented therein and "saves" it to tables and columns in a relational database. It also loads data from the database back into Java objects so that the application can work with it. This is pretty typical object-relational mapping (ORM) functionality.

At the time that the early technology decisions were being made for Kuali (2005-ish), there was another popular Java ORM that a lot of people were using. It was called Hibernate. So why did Kuali decide to use OJB and not Hibernate? I honestly can't say for sure since I wasn't involved in those conversations, but I suspect it had to do with the fact that OJB worked well, had good documentation, and was being used widely by a number of the technologists who were involved in those early decisions. There was no clear winner at the time, as it was relatively early in the history of Hibernate as well. So the Kuali project decided to go with OJB as the ORM and persistence technology of choice.

I think it's safe to say that we bet on the wrong horse.

Since that time, Hibernate was folded under JBoss, grew by leaps and bounds in popularity, and became the de facto standard for persistence in Java. OJB was abandoned by its creators and is now dead. I mean really dead. So dead, in fact, that Apache moved it to a special place called the "Attic" in 2011.

OJB still works and works well, but it is no longer maintained and it does not make sense for Kuali to continue to use it at the core of our applications. This is not a new revelation by any means; we've all been aware of this for quite some time. In fact, early efforts to migrate away from OJB began all the way back in 2008!

The funny thing is, getting rid of OJB has turned out to be harder than it should be.

Why is that? Well, I think it comes down to a couple of things:
  1. The semantics that Apache OJB used for things like lazy loading are totally different from those of products like Hibernate.
  2. There was a lot of special logic built into the KNS in Kuali which was tied to the semantics and behavior of OJB. Untangling this and providing a clear upgrade path for existing applications is not a trivial exercise.
These are not insurmountable obstacles, but they certainly require us to take a careful approach as we look toward sunsetting support for OJB within the Kuali technology stack. A renewed sense of urgency and vigor around this problem manifested itself when Apache announced that they were moving OJB into the Attic in 2011. However, the reality remains that we have a number of very large Kuali applications which have spent years developing on top of OJB.

This brings me to the fun stuff.

Near the end of this year, Kuali Rice version 2.4 is scheduled to be released. A little feature called "JPA Support" has top billing for that release. JPA stands for "Java Persistence API" and defines a standard for persistence in Java. There are multiple ORMs which have support for JPA, including Hibernate, EclipseLink, OpenJPA, and others.

However, years and years of code within Kuali Rice depend on OJB, and there is an awful lot of technical debt to pay down as things have been tacked on and hacked in over the years. Accordingly, we are taking a holistic approach to solving this problem through a refactoring and redesign of the way that KRAD interfaces with data. We are implementing this in a new module called krad-data.

I'm sure there are a number of people reading this who have spent hours upon hours building or modifying Kuali applications using the KNS. To those people, I just want to say this:

I think you are in for a treat.

Have you ever spent hours modifying XML data dictionary files and wondering why in the world you have to type in an attribute definition for every single property on your business object? Have you ever wondered why all of your business objects have to extend from a common base class and can't just be simple POJOs? Have you ever been annoyed by the fact that you have to provide labels and sizes for everything? Have you ever wondered why the heck you have to manually define validation patterns when the framework should be able to just figure it out? Have you ever had the need to interface with the database using plain-old JDBC? Or maybe using one of those fancy new NoSQL databases?

Lastly, have you ever wondered why you need to have novels' worth of XML in data dictionary files in your project at all? I mean, shouldn't the framework be able to figure out most things from all the hard work you did to construct the objects in Java, map them to the database using your ORM, and define the tables, columns, sizes, and constraints within the database? Nobody likes to repeat themselves.

If you answered yes to any of the questions above, then I think you are going to like what we have in store for you.

Suffice it to say that we are not just "adding support for JPA". We are improving the way that Kuali applications can access and work with data. We are also going to great pains to simplify the APIs in Kuali Rice which handle data persistence and metadata. We want to make it easy for developers to use KRAD to build their data-driven applications. Nowadays, people expect this from the rapid application development frameworks that they use. If that weren't the case, then frameworks like Ruby on Rails wouldn't be as popular as they are. There is no reason that KRAD and the toolset we use in Kuali can't be just as powerful, or even more so.

Here is a list of features that krad-data will have (in no particular order):
  • Built-in support for JPA.
  • Simplified API for handling typical CRUD operations and accessing metadata.
  • Pluggable metadata pipeline for loading and merging metadata from various sources, including:
    • database
    • ORM
    • annotations
    • reflection
    • and the old fallback of XML data dictionary files (note that these become optional!)
  • Support for "natural language" defaults on things like labels.
  • Convention-based defaults for UIF controls based on data types.
  • Data formatting using features built into Spring.
  • Ability to use POJOs for your data objects.
  • Flexible data types to allow for dynamic data objects which may not be easily representable as static Java types at compilation-time.
  • A simplified "Extension Framework" which requires no manual ORM mapping (or re-mapping).
  • Data validation support built into the persistence API, including support for JSR-303 bean validation (hopefully, if we have time).
  • A service provider interface (SPI) for plugging in any type of data store for persistence for a given type of data object. This could include:
    • JDBC
    • Web Service-backed persistence
    • Alternate frameworks like spring-data
    • Various types of NoSQL data stores
  • The above should be able to eliminate the need for a special "Externalizable Business Object" framework.
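To give a feel for the "pluggable metadata pipeline" and "natural language" label defaults in the list above, here is a minimal sketch in plain Java. It is not the krad-data API: the names MetadataProvider, ReflectionLabelProvider, MetadataPipeline, and Account are all hypothetical, and the rule that later providers override earlier ones is an assumption made for illustration.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical SPI: each metadata source returns whatever labels it knows about.
interface MetadataProvider {
    // Property name -> label for the given type; the result may be partial.
    Map<String, String> labelsFor(Class<?> type);
}

// Derives a default label for every declared field using reflection.
class ReflectionLabelProvider implements MetadataProvider {
    @Override
    public Map<String, String> labelsFor(Class<?> type) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler- or tool-generated fields
            }
            labels.put(field.getName(), toLabel(field.getName()));
        }
        return labels;
    }

    // Splits camelCase and capitalizes the first letter, e.g. "accountNumber" -> "Account Number".
    static String toLabel(String propertyName) {
        if (propertyName.isEmpty()) {
            return propertyName;
        }
        String spaced = propertyName.replaceAll("(?<=[a-z0-9])(?=[A-Z])", " ");
        return Character.toUpperCase(spaced.charAt(0)) + spaced.substring(1);
    }
}

// Merges the output of all providers; later providers win (an assumed rule).
class MetadataPipeline {
    private final List<MetadataProvider> providers;

    MetadataPipeline(List<MetadataProvider> providers) {
        this.providers = providers;
    }

    Map<String, String> labelsFor(Class<?> type) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (MetadataProvider provider : providers) {
            merged.putAll(provider.labelsFor(type));
        }
        return merged;
    }
}

// A plain POJO with no base class and no XML.
class Account {
    String accountNumber;
    String fiscalOfficerName;
}

class MetadataSketch {
    public static void main(String[] args) {
        // Stands in for an optional data dictionary entry that overrides one label.
        MetadataProvider overrides =
                type -> Collections.singletonMap("accountNumber", "Acct #");
        MetadataPipeline pipeline =
                new MetadataPipeline(Arrays.asList(new ReflectionLabelProvider(), overrides));
        System.out.println(pipeline.labelsFor(Account.class));
    }
}
```

The point of the sketch is the shape: sensible defaults come from what the code already says, and an explicit source only needs to mention the properties where the default is not good enough.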
We are also planning to do all of this work without breaking any legacy KNS or KRAD applications which are using the existing data layer and OJB support. This is no small task really, but we feel it is important that people will be able to upgrade to 2.4 without breaking existing application functionality, because this will facilitate a "phased" migration for those applications. This includes the ability to run both krad-data and the "legacy" OJB-based data layer alongside one another.

Furthermore, I should say that much of the krad-data module has already been completed on the branch we've been using to do the development. We still have a lot of work to do to finish hooking things back into the rest of KRAD and testing it to make sure it all works, but things have been progressing well thanks to the hard work of the small but talented team of developers that is working on this effort.

In my next post, I'll talk about the design and architecture of krad-data and dive into some of these features in a bit more depth. Stay tuned!


  1. This all sounds like great stuff - thank you for the update, Eric!

    As interested as I am in data dictionary defaults, I'll admit, I'm highly curious about the flexible data types proposal. I'm wondering if that will make it easier for some projects who want to use NoSQL datastores to hook into Rice (since, from what I've seen, several NoSQL datastores save data in some variation of JSON, which is incredibly lightly typed...I've really been wondering about the impedance mismatch between a Java object graph and a MongoDB document for the past week or so....). Is that part of the intent?

    Thanks Eric!

  2. James, thanks for the comment. Yes, that is part of the intent here. My inspiration and my guiding light here is actually eDocLite. I have a dream that we can eliminate the need for a separate set of code and infrastructure for eDocLite in Kuali Rice through this introduction of flexible data types. This would allow us to essentially reimplement it on top of KRAD, using all the UIF and other functionality that KRAD provides.

    The data store for eDocLite is essentially just simple key-value, but the data model and metadata (called "fields" in eDocLite) are defined in XML and ingested into the system at runtime. To support such a thing we would need to allow runtime definition and modification of dynamic data structures (something which we should be able to support easily in the new krad-data design) as well as the ability to dynamically define and ingest KRAD user interface configuration for the eDocLite form itself. This is something which is not currently possible in KRAD, but an enhancement could be done to support it. It's already possible to reload UIF configuration in KRAD at runtime, but only for development purposes.

    That said, this model could be extended to really any flexible or mostly schemaless backend data store. In this case, KRAD would not be working with concrete Java classes but rather flexible data structures (like chunks of JSON, or hash maps). We could implement accessors that would know how to traverse these structures using typical "dot" notation. One thing I didn't mention that the new framework is taking advantage of is a new "accessor" API that is built on top of Spring's BeanWrapper API.
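As a rough illustration of the "dot" notation idea, here is a small sketch that walks nested maps one path segment at a time. It is not the actual accessor API and it does not use Spring's BeanWrapper; DotPathAccessor and the sample data are hypothetical names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
class DotPathAccessor {

    // Resolves a path like "address.city" against nested Maps.
    // Returns null if a segment is missing or an intermediate value is not a Map.
    static Object get(Map<String, Object> root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new HashMap<>();
        address.put("city", "Bloomington");

        Map<String, Object> document = new HashMap<>();
        document.put("title", "Travel Request");
        document.put("address", address);

        System.out.println(get(document, "address.city")); // prints "Bloomington"
    }
}
```

The same path syntax could in principle be resolved against a parsed JSON document or a regular Java bean, which is what makes it attractive as a common way to address properties.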

    All that said, the most visible effect of this on the APIs in Rice is that the data layer does not use "Class" everywhere but rather uses "DataObjectType". A DataObjectType is just a wrapper around a concrete Java Class (which could be a String or a Map, for example) along with a "discriminator" value that allows for the specific type to be defined.

    For example, for eDocLite, we might use a DataObjectType which is constructed as follows:

    DataObjectType myEDocLiteType = DataObjectType.create(EDocLiteData.class, "myAwesomeEDocLite");

    Where EDocLiteData simply contains a Map which contains the data for an eDocLite document and "myAwesomeEDocLite" is the name of the specific eDocLite.
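To make the wrapper-plus-discriminator idea a bit more concrete, here is a hypothetical sketch of what such a type could look like. Only the create(Class, String) call comes from the line above; the fields, the getters, and the equals/hashCode behavior are assumptions for illustration, and the real class in krad-data may well differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch only, not the actual krad-data class.
final class DataObjectType {
    private final Class<?> javaClass;
    private final String discriminator; // assumed: identifies the specific type within the class

    private DataObjectType(Class<?> javaClass, String discriminator) {
        this.javaClass = Objects.requireNonNull(javaClass, "javaClass");
        this.discriminator = discriminator;
    }

    static DataObjectType create(Class<?> javaClass, String discriminator) {
        return new DataObjectType(javaClass, discriminator);
    }

    Class<?> getJavaClass() {
        return javaClass;
    }

    String getDiscriminator() {
        return discriminator;
    }

    // Two types are equal only if both the class and the discriminator match,
    // so the type can be used as a key when looking up metadata.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof DataObjectType)) {
            return false;
        }
        DataObjectType that = (DataObjectType) other;
        return javaClass.equals(that.javaClass)
                && Objects.equals(discriminator, that.discriminator);
    }

    @Override
    public int hashCode() {
        return Objects.hash(javaClass, discriminator);
    }
}

// Holds the key-value data for one eDocLite document.
class EDocLiteData {
    final Map<String, Object> values = new HashMap<>();
}
```

With something like this, two different eDocLites share the same Java class (EDocLiteData) but are still distinct types as far as the data layer is concerned, because their discriminators differ.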

    So, that's just one example of a real-world application of the flexible data typing that the framework should support in the future. I'm not sure how much of the full support for this kind of thing we will actually have in place for Rice 2.4, as it's really outside of the scope for JPA support. But we wanted to at least design the new APIs such that they will be able to support this in the future. I, for one, think it will be handy ;)