This is Part 1 of a two-part series on solving a problem that I have with Grails and with ORM in general. Part 1 describes the problem; Part 2 will describe the solution. Grails is a first-rate Web development framework for Groovy. Groovy is like Python, except that underneath Groovy is Java and every Java library you ever loved. If you’re a Java Web developer, you must give Grails a try; but if you’re a Spring Framework developer, you’re in for a special treat because Grails is built on top of everything we know and love about Spring.
In the course of using Grails to develop a handful of government Web applications, I’ve discovered a small gap between my applications and my data.
Grails includes an incredible piece of software engineering they call GORM, which is short for Grails Object Relational Mapping and is essentially Groovy syntax for creating HibernateCriteriaBuilder instances. This is the best practice for writing database queries in Grails, and it goes something like this
def courses = Course.withCriteria { // only approved courses, please eq('approved', true) // in a category matching the keyword "database" categories { ilike('name', '%database%') } // with offerings occurring in the future offerings { gt('starts', '2009/09/29 21:12') } }copy code
This Groovy code is roughly equivalent to the following SQL statement
SELECT * FROM course c INNER JOIN course_categories cc ON (cc.course_id = c.course_id) INNER JOIN category cat ON (cat.id = cc.category_id) INNER JOIN offering o ON (o.course_id = c.id) WHERE c.approved = 1 AND LCASE(cat.name) LIKE '%database%' AND o.starts > '2009-09-29 21:12'copy code
Now if you don’t appreciate the difference between these two chunks of code, it’s probably because you either have no use for ORM tools (PHP developers, I’m talking to you), have never tried to use Hibernate, or have never had to write low-level data layer code with JDBC. The rest of you are probably thinking, “Man, that’s pretty frickin’ cool.” Because it is.
As I’ve experienced it, there’s only one problem with the GORM way, and it’s actually a problem with the ORM way in general.
In our example above we reference three domain objects: Course, Category, and Offering. For the sake of our example, we will assume that there are one-to-many relationships between Courses and Categories, as well as between Courses and Offerings. That means for every one Course in the system, we’ll have zero, one, or more than one Offerings related to it.
Relationships like these are exactly what RDBMSs and ORM are all about: taking flat relationships and turning them into hierarchies of objects. The problem is that when we rely on these relationships for the purpose of querying data, we almost always introduce duplication into our result set.
Imagine a set of data in which there are two courses, each with two related offerings. Imagine that all of these records satisfy our query above. The result set for our SQL query would look something like this
COURSE.ID NAME OFFERING.ID STARTS ENDS 1 Tuning Your SQL 1 2009/10/26 2009/10/26 1 Tuning Your SQL 2 2010/01/01 2010/01/04 2 Hiring Good Help 3 2009/11/04 2009/11/05 2 Hiring Good Help 4 2009/12/06 2009/12/07copy code
Take note of the duplication: two instances of a course record for each instance of the corresponding offering record. The problem has the potential to get worse when we introduce categories.
COURSE.ID NAME CATEGORY OFFERING.ID STARTS ENDS 1 Tuning Your SQL Database Management 1 2009/10/26 2009/10/26 1 Tuning Your SQL Database Management 2 2010/01/01 2010/01/04 1 Tuning Your SQL Databases, General 1 2009/10/26 2009/10/26 1 Tuning Your SQL Databases, General 2 2010/01/01 2010/01/04 2 Hiring Good Help Database Personnel 3 2009/11/04 2009/11/05 2 Hiring Good Help Database Personnel 4 2009/12/06 2009/12/07copy code
Where once there were two copies of Course #1 now there are four!
Now, in most circumstances, this really isn’t an issue. That’s right. I said isn’t an issue.
GORM and Hibernate both provide the means for eliminating this duplication at object instantiation, and it’s called distinct. All we have to do to implement this feature is change one line in our original GORM query (the first one):
def courses = Course.createCriteria().listDistinct {copy code
Now if we were to loop over the contents of courses, we would find two objects instead of six: one for each unique Course is our result set.
One very common use case for Web applications is large result set pagination. While our own example result set doesn’t warrant pagination, one containing a thousand relationships between hundreds of courses, offerings, and categories, certainly would.
GORM includes two query features for pagination: firstResult, which should be a positive integer indicating the first row to return, and maxResults, which should be a positive integer indicating the page size. When woven into our original GORM query, the code now looks like the following
def courses = Course.createCriteria().listDistinct { // only approved courses, please eq('approved', true) // in a category matching the keyword "database" categories { ilike('name', '%database%') } // with offerings occurring in the future offerings { gt('starts', '2009/09/29 21:12') } firstResult(x) maxResults(y) }copy code
So imagine a version of our result set that when raw contains 100 records. That’s 100 instances of relationships between Courses and Categories, and Courses and Offerings. Irrespective of the total number of distinct Courses, we decide that we’d like to paginate our result set 10 records at a time. And this is where somewhere in the beautiful stack that is Grails, something goes ker-splat!
Check back soon for Part 2!