
Thread-specific components as a type

In user interface programming, most of the commonly used toolkits work on a single-threaded model. In this post we will look at an example in ScalaFX, but the same principle applies to JavaFX and Swing. In these toolkits, a component must only be accessed from the dedicated event processing thread (e.g. the JavaFX Application Thread) after it has been realized, except when using methods documented as thread-safe. A component is realized when it is made ready to be painted, e.g. by calling pack(), show() or setVisible(true) either on the component or on one of its ancestors in the component hierarchy. In practice this means that components can be constructed and initialized on other threads, but as soon as a component has been realized, access to it must be confined to the small number of thread-safe methods or to code running on the event processing thread.

The rules described above are well-documented but can be hard to follow in practice. This is where the type system can help. The book Functional Programming in Scala explains how to design and build type algebras using increasingly advanced concepts; the design presented here is a very simple application of the techniques described in the book.

Our type algebra needs to represent the idea of a component that can only be accessed from the event processing thread.


class Confined[T](t: T) {}

T is the generic type of the component. We could restrict this to be a subclass of a component (e.g. T <: scalafx.scene.Node), but for now we will leave T unbounded. We want to put the component into a context that controls access to its public API:


object Confined {
  def apply[T](t: T): Confined[T] = new Confined[T](t)
}

We also want to be able to use the component’s API from our own code:


class Confined[T](t: T) {
  def run(f: T => Unit): Unit = ???
}

… but we need to ensure that our code is run on the event processing thread:


def run(f: T => Unit): Unit = {
  if (Platform.isFxApplicationThread) f(t)
  else Platform.runLater(f(t)) // runLater takes its argument by name in ScalaFX, so f(t) is deferred
}
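Since scalafx.application.Platform is hard to exercise outside a running JavaFX toolkit, here is a hedged, library-free sketch of the same run(…) logic. FakeEventLoop, isEventThread and schedule are inventions for illustration, standing in for Platform.isFxApplicationThread and Platform.runLater:

```scala
import scala.collection.mutable

// A stand-in event loop so the confinement logic can run without ScalaFX.
// isEventThread and schedule are hypothetical stand-ins for
// Platform.isFxApplicationThread and Platform.runLater.
object FakeEventLoop {
  private val queue = mutable.Queue.empty[() => Unit]
  var isEventThread: Boolean = false

  def schedule(op: => Unit): Unit = queue.enqueue(() => op)

  // Drain queued work as if we were now on the event thread.
  def drain(): Unit = {
    isEventThread = true
    while (queue.nonEmpty) queue.dequeue().apply()
    isEventThread = false
  }
}

class ConfinedSketch[T](t: T) {
  // Same shape as Confined.run: immediate on the event thread, deferred otherwise.
  def run(f: T => Unit): Unit =
    if (FakeEventLoop.isEventThread) f(t)
    else FakeEventLoop.schedule(f(t))
}

// Off the event thread, the function is queued rather than run immediately.
val sb = new StringBuilder
val confined = new ConfinedSketch(sb)
confined.run(_.append("hello"))
```

After FakeEventLoop.drain() runs the queued work, sb holds the appended text; until then it stays empty, which is exactly the deferral behaviour we want from the real class.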

We could go further and start to wonder whether this class fits one of the standard functional programming patterns, such as Monoid or Monad, but at first sight the Unit return type of our function and the possibility that we will switch threads inside the run(…) function make it appear that our class does not neatly fit either of these patterns.

Let’s leave it there for now and add some Scaladoc to our completed code:


package com.mouseforge.scalafx

import scalafx.application.Platform

/**
 * Confined class helper methods.
 */
object Confined {

  /**
   * Create a new Confined instance of generic type T.
   *
   * @param t the instance to be confined to the JavaFX Application Thread.
   * @tparam T the type of instance t.
   * @return a new Confined instance encapsulating the instance that should be
   *         confined to the JavaFX Application Thread.
   */
  def apply[T](t: T): Confined[T] = new Confined[T](t)
}

/**
 * Encapsulates a user interface component that should be confined to the JavaFX Application Thread.
 *
 * A function on a confined instance will only be performed immediately if called from the
 * JavaFX Application Thread, otherwise the function will be scheduled to run later on the
 * JavaFX Application Thread.
 *
 * Uses the idea of a type algebra described in Functional Programming in Scala.
 *
 * @param t the instance that should only be accessed on the JavaFX Application Thread.
 * @tparam T the type of the instance t.
 */
class Confined[T](t: T) {

  /**
   * Perform a function on the operand on the JavaFX Application Thread.
   *
   * @param f the function to perform on the operand.
   */
  def run(f: T => Unit): Unit = {
    if (Platform.isFxApplicationThread) f(t)
    else Platform.runLater(f(t))
  }
}

Now we can use the type system to protect our user interface component from accidental access from the wrong thread like this:


stage.show()

// Stage and all its components are now realized - confine the sceneRef.
val confinedScene = Confined(sceneRef)

// We want to add a ticker-tape style label - build the confined label.
val confinedLabel: Confined[Label] = Confined(new Label {
  text = ""
  textFill = Color.DarkOrchid
  font = Font(24.0)
})

// Add the label to the scene using the confined elements.
confinedScene.run(cs => confinedLabel.run(label => cs.content.add(label)))

// Start the ticker-tape message.
new Thread(() => adjustText(confinedLabel, "Hello from Mouseforge!!!")).start()

If the components are confined then it is very difficult to accidentally access them except from the event processing thread. It is still possible to extract the component, but it is hard to do accidentally:


// Bad code
var unsafeLabel: Label = _
confinedLabel.run(cl => unsafeLabel = cl) // Don't do this!
confinedLabel.run(cl => unsafeLabel.synchronized {
  unsafeLabel = cl // Thread-safe, but don't do this either.
})

It is also possible to access the unconfined components while they are still in scope. In general, this technique isn't foolproof, but I have found it useful for ensuring my user interface updates are not accidentally processed on my background threads. The method that catches me out most often is size(); it's very easy to call on the wrong thread and hard to spot once you've done it.

This is the full ticker-tape example:


package com.mouseforge.scalafx

import java.io.InputStream
import java.net.URL

import scalafx.application.JFXApp
import scalafx.application.Platform
import scalafx.scene.Scene
import scalafx.scene.control.Label
import scalafx.scene.image.{Image, ImageView}
import scalafx.scene.layout.StackPane
import scalafx.scene.paint.Color
import scalafx.scene.text.Font

object HelloMouseforge extends JFXApp {

  val sceneRef: Scene = new Scene {
    root = new StackPane()
  }

  stage = new JFXApp.PrimaryStage {
    title.value = "Mouseforge"
    width = 377
    height = 324
    var inputStream: InputStream = _
    try {
      inputStream = new URL("https://mouseforge.com/wp-content/uploads/2015/04/mouseforge.jpg").openStream()
      val image = new Image(inputStream)
      val imageView = new ImageView(image)
      sceneRef.content.add(imageView)
      scene = sceneRef
    } finally {
      if (inputStream != null) inputStream.close()
    }
  }

  def adjustText(confinedTextBox: Confined[Label], fullText: String): Unit = {
    for (loop <- 0 to 1000) {
      val text = fullText.take(loop % (fullText.length + 1))
      confinedTextBox.run(l => l.text = text)
      Thread.sleep(125L)
    }
  }

  stage.show()

  // Stage and all its components are now realized - confine the sceneRef.
  val confinedScene = Confined(sceneRef)

  // We want to add a ticker-tape style label - build the confined label.
  val confinedLabel: Confined[Label] = Confined(new Label {
    text = ""
    textFill = Color.DarkOrchid
    font = Font(24.0)
  })

  // Add the label to the scene using the confined elements.
  confinedScene.run(cs => confinedLabel.run(label => cs.content.add(label)))

  // Start the ticker-tape message.
  new Thread(() => adjustText(confinedLabel, "Hello from Mouseforge!!!")).start()
}

/**
 * Confined class helper methods.
 */
object Confined {

  /**
   * Create a new Confined instance of generic type T.
   *
   * @param t the instance to be confined to the JavaFX Application Thread.
   * @tparam T the type of instance t.
   * @return a new Confined instance encapsulating the instance that should be
   *         confined to the JavaFX Application Thread.
   */
  def apply[T](t: T): Confined[T] = new Confined[T](t)
}

/**
 * Encapsulates a user interface component that should be confined to the JavaFX Application Thread.
 *
 * A function on a confined instance will only be performed immediately if called from the
 * JavaFX Application Thread, otherwise the function will be scheduled to run later on the
 * JavaFX Application Thread.
 *
 * Uses the idea of a type algebra described in Functional Programming in Scala.
 *
 * @param t the instance that should only be accessed on the JavaFX Application Thread.
 * @tparam T the type of the instance t.
 */
class Confined[T](t: T) {

  /**
   * Perform a function on the operand on the JavaFX Application Thread.
   *
   * @param f the function to perform on the operand.
   */
  def run(f: T => Unit): Unit = {
    if (Platform.isFxApplicationThread) f(t)
    else Platform.runLater(f(t))
  }
}

Even More Pickling

Puzzled at my lack of success with Scala pickling, I filed an issue on GitHub [1]. I will be the first to admit that, as bug reports go, “I’ve got 20,000 lines of code and it doesn’t work with your library” is pretty much a lost cause. I hoped to get one or two tips on how I could help myself, or possibly to be flamed for asking a stupid question, but I didn’t expect to be completely ignored for two weeks. I did eventually receive a reply, which involved quite a lot of work to follow up.

The reply suggested that I build my code with Java 7 outside of the IntelliJ environment. Doing this (via a shell script) identified a few small issues [2], but increased my confidence that it was pickling that was broken, not my environment. This led me to read through all of the open issues, and #296 [3] turned out to be a real gem.

It turns out that pickling identifies any Scala class with a getter or setter method as a Java Bean. This leads it to do strange things; in particular, it will generate a pickler at compile time for traits/base classes without full knowledge of the runtime type, which results in a lot of lost or empty objects in the output. This is more or less a disaster for my scenario, with a lot of code converted from Java.
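To illustrate the kind of Java-converted code that trips pickling up, compare a class with hand-written getters and setters (the getX/setX pattern pickling spots and treats as a Java Bean) with the idiomatic Scala equivalent. The class and field names here are invented for illustration, not taken from my project:

```scala
// Java-converted style: an explicit getter/setter pair.
// A getName/setName pair like this is what makes pickling
// classify the class as a Java Bean.
class FeatureJavaStyle {
  private var name: String = ""
  def getName: String = name
  def setName(n: String): Unit = { name = n }
}

// Closer to idiomatic Scala: a public var generates the
// name/name_= accessor pair for free, with no Bean-style methods.
class FeatureScalaStyle(var name: String)

val feature = new FeatureScalaStyle("county road")
feature.name = "trunk highway"
```

Converting the first form into the second is the bulk of the getter/setter work described below.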

Fast forward another few weeks and I’ve converted most of the getters and setters to something closer to idiomatic Scala. A single feature definition produces about 12K of JSON output, which is probably not complete, but getting there.

Pickling still isn’t usable because it falls over with a strange error:

Caused by: java.lang.RuntimeException: error: cannot find class or module with type name 'scala.collection.immutable.ListMap.Node'
full type string: 'scala.collection.immutable.ListMap.Node'

I can get past this for a single instance by capturing it as a val and declaring the concrete static type. There is no ListMap in the output. If it is declared as a trait then it fails. This looks like another case of selecting the wrong static pickler at compile time, when deferring to a runtime pickler would be the safer choice.
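The compile-time versus runtime selection problem can be illustrated with an ordinary typeclass (this is an analogy for the behaviour I'm seeing, not pickling's actual internals; Describe and show are invented names): the instance is chosen from the declared static type, so annotating a val with its concrete type changes which instance the compiler picks, even though the runtime value is identical:

```scala
import scala.collection.immutable.ListMap

trait Describe[T] { def describe(t: T): String }

object Describe {
  // Generic instance: analogous to a pickler generated for the base type.
  implicit def forMap[K, V]: Describe[Map[K, V]] = new Describe[Map[K, V]] {
    def describe(m: Map[K, V]): String = s"generic Map of ${m.size}"
  }
  // Specific instance: analogous to a pickler for the concrete type.
  implicit def forListMap[K, V]: Describe[ListMap[K, V]] = new Describe[ListMap[K, V]] {
    def describe(m: ListMap[K, V]): String = s"ListMap of ${m.size}"
  }
}

def show[T](t: T)(implicit d: Describe[T]): String = d.describe(t)

// Same runtime value, different declared types: the instance is chosen
// from the static type at compile time, just like a static pickler.
val asTrait: Map[String, Int] = ListMap("a" -> 1)
val asConcrete: ListMap[String, Int] = ListMap("a" -> 1)
```

show(asTrait) resolves the generic instance while show(asConcrete) resolves the specific one, which is why capturing the value as a val with the concrete static type changes pickling's behaviour.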

It seems that the development team have the features that they need in working condition. Perhaps it is good enough to write papers about, good enough to solve some big data problems, good enough to use if you know what works and what doesn’t. I don’t think the current version is a general purpose competitor to Java Serialization. It isn’t robust enough and the documentation doesn’t point out the areas in which valid Scala code will cause it to break. If there was a bit more activity on the GitHub project I’d have waited for the library to mature, but it looks like the current version is here to stay for a while.

What puzzles me most of all is my grim determination to force it to work. To start with it looked like a lost cause. Many hours of spare-time programming later, it looks like I might eventually trick it into running this one line of code:

val pickled = featureDefinition.pickle.value

It seems I still have some good-will left, despite the obvious frustrations of spending so long getting nowhere.

References
1 https://github.com/scala/pickling/issues/326
2 I discovered a couple of Java files still lurking in the project
3 https://github.com/scala/pickling/issues/296

The Revised R*-Tree

The Revised R*-Tree is described in a paper by Norbert Beckmann and Bernhard Seeger [1]. It is a refinement of the standard R*-Tree, originally described in 1990 [2], which in turn is based on the R-Tree [3].

I first came across the paper in 2010 when I was implementing a standard R-Tree in Java. The paper caught my interest and I converted my original R-Tree implementation into a Revised R*-Tree, which I used successfully for several years in my side project, a desktop map viewer [4]. I started to learn Scala at the end of 2012 and, after I completed the Coursera course [5], I wanted something to keep sharpening my Scala knowledge on, so I decided to write a version of the Revised R*-Tree in Scala.

My original plan was to use it for multi-dimensional indexing across a range of attribute types. I soon had to abandon the non-numeric types and then the integer types due to Scala’s strong typing and some restrictions of the numeric types’ inheritance hierarchy. Eventually I settled for working with spatial bounding boxes defined with Doubles.
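Working with Doubles keeps the geometry simple. As a flavour of what that looks like, here is a minimal sketch of the kind of bounding box a spatial R-tree variant indexes; the names and exact operations are illustrative, and my actual implementation differs:

```scala
// A 2-D axis-aligned bounding box over Doubles, the kind of key
// an R*-tree indexes. Names here are illustrative.
case class BoundingBox(minX: Double, minY: Double, maxX: Double, maxY: Double) {

  // True if the two boxes overlap, edges included.
  def intersects(o: BoundingBox): Boolean =
    minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY

  // Smallest box enclosing both: used when growing a node's bounds on insert.
  def union(o: BoundingBox): BoundingBox =
    BoundingBox(minX min o.minX, minY min o.minY, maxX max o.maxX, maxY max o.maxY)

  // Area: the basic quantity behind R*-tree split and insertion heuristics.
  def area: Double = (maxX - minX) * (maxY - minY)
}

val a = BoundingBox(0.0, 0.0, 2.0, 2.0)
val b = BoundingBox(1.0, 1.0, 3.0, 3.0)
val enclosing = a.union(b) // BoundingBox(0.0, 0.0, 3.0, 3.0)
```

Everything in the tree — choosing a subtree on insert, deciding splits, answering window queries — reduces to operations like these on Double coordinates.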

The original Scala implementation was quite different from the Java implementation. It was much easier to transcribe the algorithms directly into Scala code, although some simple things seemed very hard to get right in the new and unfamiliar language. The new implementation came together quite quickly, although I ended up spending a lot of time improving the insertion time by unwinding the elegant functional algorithms back into more conventional implementations, because the overhead of the functional style was very high in some places. In particular, I was never a great fan of Scala for comprehensions, and their performance overhead rules them out of any performance-critical code. The query performance of the completed Revised R*-Tree code has never been much of an issue [6]; instead I spent far more time optimising the insertion code. My latest version should insert 5 million entries in less than two minutes; an early version ran a similar test over many hours. Overall, the bulk of the work wasn’t in implementing the algorithm, but in making it run fast enough to handle data sets with entries in the half a million to five million range.
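As a flavour of the kind of unwinding involved, here is a toy example rather than code from the tree itself: a for comprehension over candidate pairs desugars into flatMap/withFilter/map calls that allocate closures and intermediate collections, while the hand-unwound version is plain while loops over the array:

```scala
// Functional style: the for comprehension desugars to
// flatMap/withFilter/map, building intermediate collections.
def minPairSum(xs: Array[Double]): Double =
  (for {
    i <- xs.indices
    j <- xs.indices
    if i < j
  } yield xs(i) + xs(j)).min

// Unwound style: same result, no intermediate collections or closures.
def minPairSumFast(xs: Array[Double]): Double = {
  var best = Double.MaxValue
  var i = 0
  while (i < xs.length) {
    var j = i + 1
    while (j < xs.length) {
      val s = xs(i) + xs(j)
      if (s < best) best = s
      j += 1
    }
    i += 1
  }
  best
}
```

Both functions compute the same answer; in hot paths like the tree's insertion heuristics, the second shape is what the profiler kept pushing me towards.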

My preferred test data set comes from the Minnesota Department of Transportation. In common with many public bodies in the United States, Mn/DOT publishes a comprehensive set of base map data for public use in non-commercial applications [7]. I owe Mn/DOT and the people of the fine state of Minnesota a big “thank you” for their generosity in publishing this data.

At the moment I work with county boundaries:

[Image: Mn/DOT county boundaries]

… trunk highways:

[Image: Mn/DOT trunk highways]

… and county roads:

[Image: Mn/DOT county roads]

Mixing the different types of data in the same index affects the performance of some of the drawing queries, so I thought it would be interesting to look at what happens to the index.

References
1 ACM, 2009
2 Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger
3 Antonin Guttman 1984
4 The current incarnation is written in Scala 2.11 with a ScalaFX user interface and a parallel tiled rendering algorithm using Akka
5 Functional Programming Principles in Scala
6 i.e. it’s very fast with the test data set.
7 Mn/DOT GIS data

More Pickling

Following on from my experiments with Scala Pickling, I thought I would try unpickle() on the JSON file. This is where I discovered that things hadn’t gone well at all. First of all, I had to increase the maximum heap – 8GB was not enough. I used 14GB, but perhaps 9 would have been enough. It’s still a lot. At this point I started to get exceptions from the unpickle() call. It turned out that a lot of my data structure had turned into references like this:

{
  "$type": "java.lang.Object"
}

Unsurprisingly, Scala can’t unpickle this into anything useful [1].

Now I am going through my code converting the Java collections to Scala collections – mostly to mutable.Map and mutable.ArrayBuffer to start with. I’ve spent quite a few hours on this now. The code is no doubt better for it, but pickling is still losing most of my object graph. One class retains one String field but loses another, which is puzzling to say the least. If this were some random library from the internet I would give up and move on, but this is the right way to do Scala serialization, straight from EPFL. Although it’s fairly new, I’m pretty sure it should work a lot better than this.
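The conversion itself is mostly mechanical; a sketch of the before and after, with class and field names invented for illustration:

```scala
import java.util
import scala.collection.mutable

// Before: Java collections surviving from the original Java code.
class FeatureStoreJava {
  val segments = new util.ArrayList[String]()
  val byId = new util.HashMap[Int, String]()
}

// After: the Scala equivalents that pickling copes with better.
class FeatureStoreScala {
  val segments = mutable.ArrayBuffer.empty[String]
  val byId = mutable.Map.empty[Int, String]
}

val store = new FeatureStoreScala
store.segments += "county road segment"
store.byId(1) = "county road segment"
```

The tedious part is not the declarations but chasing every call site: add/get on the Java types become +=, apply and update on the Scala ones.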

References
1 I suppose it could create the Java object instance, but it wouldn’t be usable anywhere with a type declaration.

Scala Pickling

My first-ever blog post documents my experiences so far with Scala pickling.

Since I’ve started in the middle, a quick precap [1].

I’ve been learning Scala for three or four years in my spare time and have a little project built with Scala, ScalaFX and a bit of Akka. It’s a map viewer that started off as a viewer for my implementation of the Revised R*-Tree [2]. The original implementation was in Java, but I rewrote the R*-Tree in Scala and ported the rest of the Java code to Scala. I have a test data set based on data published by the Minnesota Department of Transportation which amounts to about half a million features – mostly county road segments. The data was imported from Shape files (an ESRI format) and is stored in a PostGIS database [3]. The PostGIS database lives on my home server, so for development purposes I serialize the objects to a local file and (for the most part) work from the local file instead.

At the moment the code uses Java serialization via Scala. It’s OK, but it’s a little fragile, not easily human-readable and a bit slow to work with. This is where Scala Pickling comes in. My plan was to use Scala pickling to serialize the same objects to JSON. JSON is about as human-readable as my kind of structured data is likely to get, and more resilient and human-fixable than binary data too. If the hype is to be believed, Scala pickling should also be more efficient. It sounded better all round, so I thought I’d give it a try.

So, this morning I fired up my IDE (IntelliJ IDEA) and started to add pickling. I naively thought that it might be included in Scala 2.11.5, but it turns out that it’s a short download away via SBT. I do this in a slightly clunky way, because my IntelliJ project pre-dates a solid SBT integration. In theory I use my new SBT-based project to do the download, and then copy the files over to the lib directory in the old-style project. Easy … except … I work on a MacBook Pro running OSX, and getting Finder to see files in a directory starting with a ‘.’ [4] involves twelve flavours of witchcraft. Thirteen on a bad day. After a 90 minute digression in which I discovered that:

  • a half-decent file manager for OSX costs at least $18;
  • if it’s any good it definitely isn’t in the App Store;
  • since everybody and his aunt needs a better file manager than Finder, there are thousands of unofficial file managers to choose from;
  • at $18+ a pop I definitely need the once and future king of file managers

… I really couldn’t decide which was the undisputed Joffrey the first of his name and used cp on the command line instead. 93 minutes to copy three files. At this rate … but I digress.

A few seconds later, Scala pickling was set up as a library ready to be imported and used. The import is a little confusing since it used to be “import scala.pickling._” but at some point this changed to “import scala.pickling.Defaults._”, so about half the examples on the web don’t work straight away. The next step is to choose the pickling format, which in my case was achieved with “import scala.pickling.json._”, exactly as advertised.

In theory, the next and final step should be to pickle the target object, something like this: “val pickledObject = targetObject.pickle.value”, which in my case ran, but turned my container of a Java ArrayList holding half a million objects into a JSON file of 105 bytes [5]. I don’t know why – perhaps the Java ArrayList causes a Scala-Java-Scala effect – as the fix [6] was to use a Scala Array instead.

Java serialization produces a 290MB file and runs happily in the 4GB maximum allocated to the JVM. Unfortunately, Scala pickling broke with an OutOfMemoryError after whirring away for a couple of minutes. At first I thought this might be down to cycles in my object graph, but a swift [7] Google revealed that this is supported and works. After experimenting with subsets of the original array [8], I found that increasing the maximum heap size to 8GB gave Scala enough head-room to pickle the whole array.

A working solution and, as usual, the hardest parts of the process were:

  1. getting started,
  2. losing so much time to glitches.

The end result is memory-hungry and quite slow: ~80s to pickle the objects to a JSON string and ~15s to write the string to a file. The file is about 840MB but it does compress to a very tiny 3.4MB using “tar -cvzf <output file> <input file>”. For comparison with Java serialization, the uncompressed file is 290MB which compresses to 66MB.

In conclusion, this was an interesting experiment, but Scala pickling to JSON is probably the wrong tool for this job. If I’d set out to write an export-to-JSON feature I’d have been delighted with this result, since it took so few lines of code [9] to implement the whole feature. I’ll probably stick with Java serialization for now, but if I were to dig deeper – by chunking my data, by using a different format or by exploring the customization options – I think there are ways forward to a much more promising result.

Also, by writing this post, I realised that the proportion of the effort that went into the real work of implementing this feature was a shockingly small part of the total. The rest was about getting past hurdles that shouldn’t have been there in the first place. In retrospect I’m surprised at how much each one slowed me down.


References
1 A “precap” is like a recap, but informs the reader about details that might be written in a future post. Although it sounds like it should be short for “precapitulation” that would imply being resigned to defeat and surrender, so it’s a complete word in its own right. No defeatism here … at least, not yet!
2 Beckmann and Seeger, ACM 978-1-60558-551-2/09/06
3 I also have a copy stored in Neo4J, but that’s another story
4 the character known as “period” if you’re in the US, “dot” for the rest of us
5 via a standard FileWriter.
6 when I say “fix” I obviously mean “caused it to break in a new and even more fascinating way”.
7 when I say “swift” I’m glossing over dealing with the malware on one of the dodgy answer aggregation sites that scrapes answers from StackOverflow and adds … a whole world of pain. After killing Safari, carefully reopening the tabs and killing the dodgy one before it started playing its evil tricks, I was back in control of my browser
8 courtesy of take()
9 about 5 plus about 10 more to integrate it into the rest of the program