The Revised R*-Tree

The Revised R*-Tree is described in a paper by Norbert Beckmann and Bernhard Seeger [1]ACM (2009. It is a refinement of the standard R*-Tree originally described in 1990[2]Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger which in turn is based on the R-Tree [3]Antonin Guttman 1984.

I first came across the paper in 2010 when I was implementing a standard R-Tree in Java. The paper caught my interest and I converted my original R-Tree implementation into a Revised R*-Tree, which I used successfully for several years in my side project, which is a desktop map viewer[4]The current incarnation is written in Scala 2.11 with a ScalaFX user interface and a parallel tiled rendering algorithm using Akka. I started to learn Scala at the end of 2012 and after I completed the Coursera course [5]Functional Programming Principles in Scala I wanted something to keep sharpening my Scala knowledge on and decided to write a version of the Revised R*-Tree in Scala.

My original plan was to use it for multi-dimensional indexing across a range of attribute types. I soon had to abandon the non-numeric types and then the integer types due to Scala’s strong typing and some restrictions of the numeric types’ inheritance hierarchy. Eventually I settled for working with spatial bounding boxes defined with Doubles.

The original Scala implementation was quite different from the Java implementation. It was much easier to transcribe the algorithms directly into Scala code, although some simple things seemed very hard to get right in the new and unfamiliar language. The new implementation came together quite quickly, although I ended up spending a lot of time improving the insertion time by unwinding the elegant functional algorithms back into more conventional implementations, because the overhead of the functional style was very high in some places. In particular, I was never a great fan of Scala for comprehensions, but their performance makes them irrelevant for any performance-critical code. The query performance of the completed Revised R*-Tree code has never been much of an issue[6]i.e. it’s very fast with the test data set., however I spent a lot more time optimising the insertion code. My latest version should insert 5 million entries in less than two minutes. An early version ran a similar test over many hours. Overall the bulk of the work wasn’t in implementing the algorithm, but in making it run fast enough to handle data sets with entries in the half a million to five million range.

My preferred test data set comes from Minnesota Department of Transportation. In common with many public bodies in the United States, Mn/DOT publishes a comprehensive set of base map data for public use in non-commercial applications[7]Mn/DOT GIS data. I owe Mn/DOT and the people of the fine state of Minnesota a big “Thank you” for their generosity in publishing this data.

At the moment I work with county boundaries:

mndot-county-boundaries

… trunk highways:

mndot-trunk-highways

… and county roads:

mndot-county-roadsMixing the different types of data in the same index affects the performance of some of the drawing queries, so I thought it would be interesting to look at what happens to the index.

References

References
1 ACM (2009
2 Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger
3 Antonin Guttman 1984
4 The current incarnation is written in Scala 2.11 with a ScalaFX user interface and a parallel tiled rendering algorithm using Akka
5 Functional Programming Principles in Scala
6 i.e. it’s very fast with the test data set.
7 Mn/DOT GIS data

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.