TensorFlow: Mutating variables and control flow
How to control the order of operations and variable mutation in TF
In this article, we are going to take a deeper look at TensorFlow's capabilities in terms of variable mutation and control flow statements.
Mutation
So far, we’ve used Variables exclusively as weights in our models, updated by an optimiser’s operation (like Adam). But optimisers are not the only way to update Variables: there is a whole set of higher-level functions to do so (again, see those functions as a way to add operations to your graph).
The most basic function to make custom updates is the tf.assign() operation. It takes a Variable and a value, and assigns the value to the Variable. Simple.
Let’s start with an example:
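Here is a minimal sketch of such an example. I am assuming a TF version that ships the v1 compatibility module (1.14 or later, including 2.x); on 1.0.0 a plain import of tensorflow should be enough. The helper name run_assign_demo is just mine, for illustration.

```python
# Assumption: TF >= 1.14 or 2.x, using the v1 compatibility API.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode with classic Variables


def run_assign_demo(steps=3):
    """Read x, then increment it with tf.assign, `steps` times."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf.Variable(0, dtype=tf.int32, name="x")
        # tf.assign only adds an operation to the graph; nothing runs yet.
        assign_op = tf.assign(x, x + 1)
        init = tf.global_variables_initializer()

    values = []
    with tf.Session(graph=graph) as sess:
        sess.run(init)
        for _ in range(steps):
            values.append(int(sess.run(x)))  # value before the update
            sess.run(assign_op)              # the mutation happens here
    return values


print(run_assign_demo())  # [0, 1, 2]
```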
Nothing fancy here. It works just like any other operation: you call it within a Session and the operation ensures that the mutation happens, so your Variable gets updated.
Compare this assign call to the usual optimiser train_op call. Both do the same thing: mutate data. The only difference is that the optimiser does a whole lot of computation before applying mutations to your Variables.
TF supports many other functions to do manual updates; see them as helper functions. All of them could be replaced by some clever tensor operations followed by a tf.assign call, but that would be cumbersome. So TF provides two kinds of mutation operations for us:
- Those to apply sparse updates (update only a subset of elements of your variables): https://www.tensorflow.org/api_guides/python/state_ops#Sparse_Variable_Updates
- Those to apply dense updates (update the whole Variable at once): https://www.tensorflow.org/api_guides/python/state_ops#Variable_helper_functions
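To give a feel for the two families, here is a small sketch using one helper of each kind: tf.assign_add for a dense update and tf.scatter_update for a sparse one. I picked these two as representatives; the values and the helper name run_helper_demo are made up, and I am again assuming the v1 compatibility API.

```python
# Assumption: TF >= 1.14 or 2.x, using the v1 compatibility API.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode with classic Variables


def run_helper_demo():
    """Apply one dense update, then one sparse update, to the same Variable."""
    graph = tf.Graph()
    with graph.as_default():
        v = tf.Variable([1.0, 2.0, 3.0], name="v")
        # Dense update: every element of v is touched.
        dense_op = tf.assign_add(v, [1.0, 1.0, 1.0])
        # Sparse update: only the element at index 0 is overwritten.
        sparse_op = tf.scatter_update(v, [0], [10.0])
        init = tf.global_variables_initializer()

    with tf.Session(graph=graph) as sess:
        sess.run(init)
        after_dense = sess.run(dense_op).tolist()
        after_sparse = sess.run(sparse_op).tolist()
    return after_dense, after_sparse


print(run_helper_demo())  # ([2.0, 3.0, 4.0], [10.0, 3.0, 4.0])
```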
I won’t dig into all those helper functions. Some of them can be hard to wrap your head around. My best advice is to experiment with them in a very simple script before using them in your models; you will save time…
One last word about mutation: what if we would like to change the shape of our Variables? For example, adding a row/column on the fly right inside our graph? So far I’ve only been talking about “assigning” new values.
That’s possible but trickier:
- First, a tf.Variable has a parameter validate_shape defaulting to True. It prevents you from updating its shape, so we have to set it to False.
- This parameter also exists in the tf.assign function itself, so we have to turn it off again.
Let’s see an example:
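The sketch below grows a one-dimensional Variable by one element per step, with validate_shape turned off in both places. It assumes classic (non-resource) Variables through the v1 compatibility API; the helper name run_grow_demo and the choice of appending a 0 are mine.

```python
# Assumption: TF >= 1.14 or 2.x, using the v1 compatibility API.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode with classic Variables


def run_grow_demo(steps=3):
    """Append one element to x per step and record its shape before each step."""
    graph = tf.Graph()
    with graph.as_default():
        # validate_shape=False leaves the static shape of x unknown.
        x = tf.Variable([], dtype=tf.int32, validate_shape=False, trainable=False)
        grown = tf.concat([x, [0]], axis=0)
        # It has to be turned off here too, or the assign rejects the new shape.
        assign_op = tf.assign(x, grown, validate_shape=False)
        init = tf.global_variables_initializer()

    shapes = []
    with tf.Session(graph=graph) as sess:
        sess.run(init)
        for _ in range(steps):
            shapes.append(sess.run(x).shape)
            sess.run(assign_op)
    return shapes


print(run_grow_demo())  # [(0,), (1,), (2,)]
```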
OK! That was not too hard, let’s move on.
Control dependency
We can update Variables, but if you start putting assign calls all around your code, you will soon end up calling sess.run multiple times to control them. This is neither practical nor efficient. Remember, the more we stay in the graph, the more efficient we are.
Welcome to the realm of control flow. TF provides a set of functions to order your operations when they are not fully dependent on each other.
Let’s start simple: we will build a graph doing a simple multiplication between a placeholder and a Variable. We would like to increment this Variable before each call we make to our multiplication. How do we actually do that?
If we start the naive way, by just adding a tf.assign call, we will end up with something like this:
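A sketch of this naive version could look as follows. The concrete numbers (y initialised to 2, x fed with 1) are my assumption, chosen so the output matches the value discussed in the text, and run_naive is a hypothetical helper name.

```python
# Assumption: TF >= 1.14 or 2.x, using the v1 compatibility API.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode with classic Variables


def run_naive(steps=3):
    """Multiply x by y while an unrelated assign_op sits unused in the graph."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, shape=[], name="x")
        y = tf.Variable(2.0, name="y")
        # The increment is defined, but nothing we fetch depends on it.
        assign_op = tf.assign(y, y + 1)
        out = x * y
        init = tf.global_variables_initializer()

    outputs = []
    with tf.Session(graph=graph) as sess:
        sess.run(init)
        for _ in range(steps):
            outputs.append(float(sess.run(out, feed_dict={x: 1.0})))
    return outputs


print(run_naive())  # [2.0, 2.0, 2.0]
```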
It doesn’t work at all: our Variable is not incremented and we keep outputting 2.
If you look at the code above and try to mentally build a computation graph, you will clearly see that this graph doesn’t need to compute the assign_op to compute the output of the multiplication between x and y: y is already perfectly defined with the initialised value 2.
To fix this, we need a way to force TF to run the assign_op.
Fortunately, that does exist! We can add what is called a control dependency. This works just like Graph or Variable scopes: we use it in conjunction with the Python with statement.
Let’s see an example:
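Here is a sketch of the same graph with the multiplication placed under tf.control_dependencies. One choice of mine: I read y with read_value() inside the block, so that the read itself is created under the dependency and cannot happen before the update. As before, the numbers and the helper name run_with_dependency are assumptions.

```python
# Assumption: TF >= 1.14 or 2.x, using the v1 compatibility API.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode with classic Variables


def run_with_dependency(steps=3):
    """Increment y before every multiplication, within a single sess.run call."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, shape=[], name="x")
        y = tf.Variable(2.0, name="y")
        assign_op = tf.assign(y, y + 1)
        # Every op created in this block only runs after assign_op has run.
        with tf.control_dependencies([assign_op]):
            out = x * y.read_value()
        init = tf.global_variables_initializer()

    outputs = []
    with tf.Session(graph=graph) as sess:
        sess.run(init)
        for _ in range(steps):
            outputs.append(float(sess.run(out, feed_dict={x: 1.0})))
    return outputs


print(run_with_dependency())  # [3.0, 4.0, 5.0]
```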
Everything works fine. TF sees a dependency, so it runs the assign_op before computing anything under the dependency scope. Here is a visualisation:
- On the left, the graph just doesn’t care to compute the assign_op
- On the right, the control dependency forces the graph to compute the assign_op before computing the multiplication operation
One pitfall
Earlier I talked about mutating the shape of a Variable. Sadly, using shape mutations with control dependencies leads us into the dark side of TF’s code optimiser.
Before trying to explain anything, here is a piece of code showing the result:
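What follows is a reconstruction sketch of such a script, not the original one. It assumes classic Variables through the v1 compatibility API, and what the print operation shows may differ between TF versions, so I deliberately give no expected output for it. The helper name run_pitfall is mine.

```python
# Assumption: TF >= 1.14 or 2.x, using the v1 compatibility API.
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode with classic Variables


def run_pitfall(steps=3):
    """Grow x, then look at it both through tf.Print and through read_value()."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf.Variable([], dtype=tf.int32, validate_shape=False, trainable=False)
        assign_op = tf.assign(x, tf.concat([x, [0]], axis=0), validate_shape=False)
        with tf.control_dependencies([assign_op]):
            # Note that the Print is on x itself, NOT on assign_op.
            print_op = tf.Print(x, data=[x], message="x seen by print: ")
            # An explicit read created under the dependency.
            fresh = x.read_value()
        init = tf.global_variables_initializer()

    fresh_sizes = []
    with tf.Session(graph=graph) as sess:
        sess.run(init)
        for _ in range(steps):
            printed, actual = sess.run([print_op, fresh])
            print("printed:", printed, "read_value:", actual)
            fresh_sizes.append(len(actual))
    return fresh_sizes


sizes = run_pitfall()  # sizes reported by read_value(): [1, 2, 3]
```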
Look closely at the code and the outputs:
- The print operation is dependent on the assign_op, so it should only be computed after x has been updated.
- Yet x looks like it has not been updated when we print it…
- But in fact it has been, since I can get the true value of x using the special read_value function.
What the hell is happening? This behaviour can be misleading and is probably closer to a bug than a feature, but TF caches aggressively to optimise your computations. This happens to be one of the drawbacks you can encounter, so be careful!
That’s it!
So, how could you use that?
One idea off the top of my head is for people in NLP having to deal with <unk> words. Now you could “technically” update the shape of your embeddings online (while learning) to add words as you encounter them!
ONE BIG REMARK: I have no idea if such a model would still learn a useful (dynamic) word embedding, but if you test this I would love to hear about your experiments!
TensorFlow best practice series
This article is part of a more complete series of articles about TensorFlow. I’ve not yet defined all the different subjects of this series, so if you want to see any area of TensorFlow explored, add a comment! So far I wanted to explore those subjects (this list is subject to change and is in no particular order):
- A primer
- How to handle shapes in TensorFlow
- TensorFlow saving/restoring and mixing multiple models
- How to freeze a model and serve it with a python API
- TensorFlow: A proposal of good practices for files, folders and models architecture
- TensorFlow howto: a universal approximator inside a neural net
- How to optimise your input pipeline with queues and multi-threading
- Mutating variables and control flow (this one :) )
- How to handle preprocessing with TensorFlow.
- How to control the gradients to create custom back-prop operations.
- How to monitor and inspect my models to gain insight into them.
Note: TF is evolving fast right now; these articles are currently written for the 1.0.0 version.
Reference
- How to do a while loop: http://stackoverflow.com/questions/38994037/tensorflow-while-loop-for-training
- A nice implementation of what we’ve seen: https://github.com/PrajitR/fast-pixel-cnn/blob/master/fast_pixel_cnn_pp/fast_nn.p
- Some explanation about the dark side of optimisations: https://github.com/tensorflow/tensorflow/issues/7782