With computing power more accessible than ever, semantic image segmentation has become a crucial approach for numerous applications, such as autonomous driving, advanced medical image analysis, object detection and many others. Often, a basic U-Net neural network already shows decent results in numerical terms, for instance when measured with the Dice coefficient, a popular image segmentation metric. Nevertheless, after careful manual examination of the prediction masks, small “islands” of mispredicted pixels become visible. Thus, the question arises: how can these minor inconsistencies be removed?
Conditional Random Fields (CRFs) are often used as a post-processing tool to improve the output of the algorithm. However, this operation can be computationally costly during inference, especially on mobile devices. It also relies on a set of parameters that need to be hardcoded, which makes it hard to find values that suit the whole test set of images. One possible solution is to add the CRF algorithm as an additional layer of the neural network, in the form of a Recurrent Neural Network (RNN), and make it trainable. While there are numerous scientific papers on this approach, most deep learning frameworks offer no out-of-the-box CRF-RNN implementation. Therefore, the main goal of this blog post is to demonstrate how to plug this additional layer into the original U-Net model using TensorFlow.
Use Case Description
The example use case is the pixel-wise detection of documents, in particular receipts. Regular detection cannot capture deformations of the receipt’s shape. Figure 1 shows one of the labelled images, which is part of the test set.
Essentially, it is a binary classification on a pixel level, where class 0 denotes the background of the image and class 1 denotes a document.
Plain U-Net Approach
The dataset is split into 3 subsets: train, dev and test. The test subset is used for the final validation of the model. The model used to classify the image pixels is a so-called U-Net model.
It consists of 2 major parts: an Encoder and a Decoder. The technical implementation is inspired by the TensorFlow image segmentation example, which can be found by following this link. To keep it short, the summary of the model can be seen in Figure 3.
Before passing through the network, images are resized down to 224×224 and normalised. Please note that the softmax layer is left out, so the network outputs raw class scores. Training for 30 epochs on 500 images has shown a pretty decent result on the test data set: a Dice coefficient of 0.983. One of the prediction mask examples can be seen in Figure 4.
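To make the metric concrete, here is a minimal NumPy sketch of how raw class scores can be turned into a mask and scored with the Dice coefficient. The function names, the smoothing term `eps` and the toy values are assumptions made for illustration; they are not taken from the original training code.

```python
import numpy as np


def logits_to_mask(logits):
    """Turn per-pixel class scores of shape (H, W, num_classes) into a label mask (H, W)."""
    # argmax over the class axis yields the same labels with or without a softmax
    return np.argmax(logits, axis=-1)


def dice_coefficient(y_true, y_pred, eps=1e-7):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for binary masks with values 0 and 1."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    # eps (an assumption of this sketch) avoids a division by zero for two empty masks
    return (2.0 * intersection + eps) / (y_true.sum() + y_pred.sum() + eps)


# Toy 2x2 "image" with two class scores per pixel (hypothetical values)
example_logits = np.array([[[2.0, -1.0], [0.1, 0.9]],
                           [[-0.5, 1.5], [3.0, 0.0]]])
example_truth = np.array([[0, 1],
                          [1, 1]])

example_pred = logits_to_mask(example_logits)
example_dice = dice_coefficient(example_truth, example_pred)
print(example_pred)
print(round(example_dice, 3))
```

In this toy case two of the three document pixels are found and no background pixel is mislabelled, so the score is 2·2 / (3 + 2) = 0.8.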
Noticeably, as described in the introduction, there are small “islands” of misclassification, circled in blue. That is where the CRF-RNN layer comes in handy. Before moving on to it, it is essential to note that, as the features have already been trained, they should be frozen (set to untrainable) before adding the new CRF-RNN layer.
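The “islands” can also be counted instead of being spotted by eye. The sketch below is only an illustration and not part of the original pipeline: it uses a plain breadth-first search to collect 4-connected regions of either class that are smaller than a chosen size. The function name `find_islands`, the threshold `max_size` and the toy mask are all hypothetical.

```python
from collections import deque


def find_islands(mask, max_size):
    """Return the sizes of 4-connected regions (of either class) with at most max_size pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    small_regions = []
    for row in range(height):
        for col in range(width):
            if seen[row][col]:
                continue
            # Flood-fill the region that contains (row, col)
            label = mask[row][col]
            seen[row][col] = True
            queue = deque([(row, col)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and mask[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size <= max_size:
                small_regions.append(size)
    return small_regions


# Toy mask: a ring-shaped "document" with a one-pixel hole and one stray pixel
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
]
example_islands = find_islands(example_mask, max_size=2)
print(example_islands)
```

Comparing the number of such regions before and after adding the CRF-RNN layer gives a rough numeric counterpart to the visual inspection; a suitable `max_size` depends on the image resolution.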
Once you have trained the features, it is time to connect the CRF-RNN layer and train the network once again. Unfortunately, there isn’t a pre-built CRF layer in TensorFlow. After a thorough search, I came across a GitHub repository by Sadeep Jayasumana. He was kind enough to build a custom class for Keras and make it public. The only limitation is that the batch size must be 1, which makes it a bit slow to train. Nevertheless, considering that the features have already been trained, the limitation does not seem to be a big deal. As a first step, clone the git repository and follow the installation instructions. Once that is done, you can connect the custom CrfRnnLayer to your network. The code snippet below demonstrates how it has been done in our use case:
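import tensorflow as tf
from crfrnn_layer import CrfRnnLayer


def add_crf_layer(original_model):
    # Freeze the already trained U-Net features
    original_model.trainable = False

    # The CRF-RNN layer takes two inputs: the unary scores produced by the
    # U-Net and the input image itself (single tensors, not lists)
    crf_layer = CrfRnnLayer(image_dims=(224, 224),
                            num_classes=2,
                            theta_alpha=3.,
                            theta_beta=160.,
                            theta_gamma=3.,
                            num_iterations=10,
                            name='crfrnn')([original_model.output, original_model.input])

    # Wrap the U-Net and the CRF-RNN layer into a new model
    new_crf_model = tf.keras.Model(inputs=original_model.input, outputs=crf_layer)
    return new_crf_model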
There are a few parameters which need to be specified, such as image_dims and the number of iterations (num_iterations). The first needs to match the output dimensions of the last layer in your feature extraction. The number of iterations is an arbitrary parameter. All the others (theta_alpha, theta_beta and theta_gamma) are subject to hyperparameter optimisation. As soon as the new model is compiled, the model summary looks as follows:
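As the last step, the model is retrained with the EPOCHS parameter set to 1, since the features are already trained and frozen. Note that num_iterations inside the custom layer sets the number of CRF iterations per forward pass; it is not a number of training epochs.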
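The retraining step could look roughly like the following sketch. The original training code is not shown in this post, so the helper name `retrain`, the Adam optimiser, the learning rate and the sparse cross-entropy loss on raw scores are all assumptions; only the batch size of 1 and the single epoch come from the text.

```python
import tensorflow as tf

EPOCHS = 1       # a single pass, as described in the post
BATCH_SIZE = 1   # the custom CRF-RNN layer only supports a batch size of 1


def retrain(crf_model, images, masks, learning_rate=1e-4):
    """Compile and fit a model on (image, mask) pairs; hyperparameters are assumptions."""
    dataset = tf.data.Dataset.from_tensor_slices((images, masks)).batch(BATCH_SIZE)
    crf_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        # from_logits=True because no softmax is applied to the model output
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
    return crf_model.fit(dataset, epochs=EPOCHS)
```

Here `images` would be the resized and normalised training images and `masks` the integer label masks with values 0 and 1. Because the U-Net weights are frozen, only the parameters of the CRF-RNN layer are updated during this pass.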
After retraining and running the validation on the test set, the numeric performance improves slightly: the Dice coefficient rises from 0.983 to 0.9857. More importantly, the tiny “islands” of misclassification have disappeared:
Now your solution is one step closer to deployment in production!
To sum up, there is no out-of-the-box CRF-RNN layer implemented in TensorFlow. Nevertheless, thanks to the open-source community, there is a custom-made option which helps with implementing it. To use it, follow these steps:
- Train your features using CNNs.
- Make the features untrainable.
- Plug in the custom CRF-RNN layer.
- Retrain the network.
- Use the new model for inference.
There are also a couple of points for further improvement of the layer. Firstly, at the moment, only a batch size of 1 is supported. This can be improved, but requires some restructuring of the layer. Secondly, this custom method does not work with TensorFlow Lite, since the operator is not registered in it. Using custom-made kernels is possible, but requires a series of C++ implementations in the core TensorFlow Lite library. The instructions for that can be found by following this link.