Tuesday, June 23, 2009

Lazy Hibernate

Let’s play with large collections.

ddl
CREATE TABLE popular_band (
    band_id AS INTEGER PRIMARY KEY,
    band_name AS VARCHAR
);

CREATE TABLE fan (
    fan_id AS INTEGER PRIMARY KEY,
    fan_name AS VARCHAR,
    favorite_band_id AS INTEGER
);

ALTER TABLE fan ADD fans_fk_popular_band FOREIGN KEY (favorite_band_id)
REFERENCES popular_band(band_id);

INSERT INTO popular_band VALUES (1, "Beatles");

INSERT INTO fan VALUES (1, "Jim", 1);
INSERT INTO fan VALUES (2, "Bob", 1);
INSERT INTO fan VALUES (3, "Sue", 1);
INSERT INTO fan VALUES (4, "Mary", 1);
-- 100,000 more ...
INSERT INTO fan VALUES (100005, "Alex", 1);

After reverse engineering this schema with Hibernate, your mappings should look something like this.

PopularBand.hbm.xml
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
    <class name="PopularBand" table="POPULAR_BAND">
        <id name="bandId" type="long">
            <column name="BAND_ID" />
        </id>
        <property name="bandName" type="string">
            <column name="BAND_NAME" />
        </property>
        <set name="fans" inverse="true" lazy="true" table="FAN" fetch="select">
            <key>
                <column name="FAVORITE_BAND_ID" precision="10" scale="0" />
            </key>
            <one-to-many class="Fan" />
        </set>
    </class>

    <class name="Fan" table="FAN">
        <id name="fanId" type="long">
            <column name="FAN_ID" />
        </id>
        <property name="fanName" type="string">
            <column name="FAN_NAME" />
        </property>
        <many-to-one name="favoriteBand" class="PopularBand" fetch="select">
            <column name="FAVORITE_BAND_ID" />
        </many-to-one>
    </class>
</hibernate-mapping>

The Beatles, being a very popular band, have many fans. I just want to look at 10, in no particular order. That shouldn’t be hard, right?

public static void print10fans(PopularBand pb) {
    Set<Fan> fans = pb.getFans();
    Iterator<Fan> iter = fans.iterator();
    for (int i=0; i < 10 && iter.hasNext(); i++) {
        System.out.println(iter.next().getFanName());
    }
}

And the output is

java.lang.OutOfMemoryError: Java heap space
...

Oops, our very large collection is very large. Maybe the manual can help us.

lazy (optional - defaults to true) may be used to disable lazy fetching and specify that the association is always eagerly fetched, or to enable "extra-lazy" fetching where most operations do not initialize the collection (suitable for very large collections)

Perfect! Just a quick change

<set name="fans" inverse="true" lazy="extra" table="FAN" fetch="select">
                                      ^^^^^

and we can try again.

java.lang.OutOfMemoryError: Java heap space
...

Ok, just what does this "extra-lazy" really mean anyway? The manual is very silent on such details, so we turn to other sources (emphasis added).

Extra lazy doesn’t do much on sets. It is actually quite useful only on the size method which triggers a select count on the database. However, iterating over a set will instantiate the whole collection. On maps and lists, extra lazy also enables fetching an entry by index. Still iterating over a list or a map will fetch the whole collection from database.

So when the manual says

…most operations do not initialize the collection (suitable for very large collections)

it actually means

…only the size() method does not initialize the collection, unless it’s a map or list, then you can also fetch by index (but otherwise your very large collection is destined to crash your program)

You can use "Filtering collections" to specify how many results you want, but doing that seems to turn the whole idea of “Plain-Old-Java-Objects” into “Plain-Old-Java-Objects-Unless-Your-Collections-Are-Too-Big-Then-You-Have-To-Use-Framework-Specific-API”

1 comment:

  1. Wow, this is totally outrageous! I've decided not to use Hibernate for any future projects. I have one system which, without wishing to bore you with the details, produces > 900 SQL statements when the user clicks (some large Maps are updated; but often most of the values don't change) ... I know it's over 900 because it fills up PuTTY's scrollback log :) ... Currently there is not heavy usage so (amazingly) it's fast enough, but it's on my tasklist to rewrite the program to just use plain SQL .. (There are < 10 tables) ...

    ReplyDelete