Jena Lessons - More On the Reasoner
Back when I started working with Jena, my first "ontology task" was to iterate through a subset of objects in the model, perform some operations to slightly modify these objects, and then store the objects back into the model. Sounds simple, right?
In a traditional database application, this program would consist of a SELECT statement, followed by some business logic, and conclude with a SQL UPDATE. I've probably written code like this hundreds of times. It's simple and runs quickly.
On paper, the ontology task seems even easier. Once you have opened a Jena ontology model, you can query for objects (Resources) in any number of ways (including the SQL-like SPARQL syntax). From there you simply create or modify properties on the resource. Since the ontology model is in memory, the changes happen in real time (no need for an UPDATE). And while you could persist the changes, there really isn't any need to.
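To make that concrete, here is a minimal sketch of the "on paper" version. It is not my original program: the namespace, the Widget class, and the status property are all made up for illustration, and I'm assuming the org.apache.jena package names from Jena 3.x and later (older releases used com.hp.hpl.jena). It uses the plain Model API to find the resources rather than SPARQL, just to keep it short.

```java
import java.util.List;
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class NaiveUpdateSketch {
    // Hypothetical namespace, for illustration only.
    static final String NS = "http://example.org/demo#";

    /** Creates `count` individuals, tags each one, and returns how many were tagged. */
    public static int markAll(int count) {
        // An in-memory ontology model with a reasoner attached (assumed spec).
        OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
        OntClass widget = model.createClass(NS + "Widget");
        Property status = model.createProperty(NS + "status");
        for (int i = 0; i < count; i++) {
            model.createIndividual(NS + "widget" + i, widget);
        }

        // The "SELECT": collect the matching resources into a list first,
        // so the model is not modified while an iterator over it is open.
        List<Resource> widgets = model.listSubjectsWithProperty(RDF.type, widget).toList();

        // The "UPDATE": each change goes straight into the in-memory model.
        for (Resource r : widgets) {
            r.addProperty(status, "processed");
        }
        return model.listStatements(null, status, (RDFNode) null).toList().size();
    }
}
```

Calling NaiveUpdateSketch.markAll(5) builds five individuals and tags each of them. Every write here goes through the model that has the reasoner attached.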
While this seems quite easy, I noticed severe performance problems running this program on even moderately sized data sets. Not only was the performance bad, but the application was using HUGE amounts of memory, much more than I would expect for models of this size.
I immediately started to search for the cause of these problems and all signs pointed to the Jena Reasoner. If you remember from my previous post, the Reasoner is responsible for inferencing - essentially creating relationships between objects not explicitly defined in the data. You see, for each update to the model, the Reasoner ran (traversing much of the model) to discover any new relationships that it could find. As the model became larger and more data changed, the Reasoner process took up more and more of the processing time and system resources.
What I wanted to do was simply shut off the Reasoner temporarily while I did my batch update, then reactivate it once I was done. The Jena API provides no support to "turn off" the Reasoner once the model is created. I knew that there must be some way to do this and I asked some questions.
Eventually someone suggested that I create a new model with no Reasoner and load my existing "Reasoned" model into it. I could then batch all my changes into this simplified model with no Reasoner performance hit. The only trick is to run model.rebind() once the batch changes are complete in order to refresh the main model.
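Here is a rough sketch of how I read that suggestion, not a definitive recipe. The assumptions are mine: the namespace and class names are invented, the package names are the org.apache.jena ones from Jena 3.x and later, and "load my existing model into it" is interpreted as wrapping the reasoned model's base model (via getBaseModel()) in a second ontology model that uses the reasoner-free OWL_MEM spec.

```java
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.RDF;

public class BatchUpdateSketch {
    // Hypothetical namespace, for illustration only.
    static final String NS = "http://example.org/demo#";

    /** Batch-adds `count` individuals without the reasoner, then checks an inferred fact. */
    public static boolean run(int count) {
        // The main model, with a reasoner attached (assumed spec).
        OntModel reasoned = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
        OntClass widget = reasoned.createClass(NS + "Widget");
        OntClass product = reasoned.createClass(NS + "Product");
        widget.addSuperClass(product); // every Widget is also a Product

        // A second ontology model over the same asserted data, with no reasoner.
        Model base = reasoned.getBaseModel();
        OntModel plain = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, base);

        // Batch changes go through the reasoner-free model.
        for (int i = 0; i < count; i++) {
            plain.add(plain.createResource(NS + "widget" + i), RDF.type, widget);
        }

        // Tell the main model to reconsult the underlying data.
        reasoned.rebind();

        // The inferred type should now be visible through the main model.
        return reasoned.contains(reasoned.getResource(NS + "widget0"), RDF.type, product);
    }
}
```

In this sketch, run(3) adds three widgets through the plain model and then asks the reasoned model whether the first one is also a Product, which is something only the reasoner can answer. The rebind() call is the step that makes the batch visible to it.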
Sidebar: Under the covers, I believe the two models are actually sharing the same memory space, but the changes to the "basic model" are happening "below the radar" of the main model's Reasoner. The rebind() method tells the main model to rescan its cache to get the updates.
All in all, this process works like a champ and I'd recommend it to anyone having performance problems while batch changing a Jena ontology model.