TensorFlow: How to freeze a model and serve it with a python API

Morgan

Published in metaflow-ai, Nov 25, 2016

We are going to explore two parts of using an ML model in production:

How to export a model into a simple, self-contained file
How to build a simple Python server (using Flask) to serve it with TF

Note: if you want to see the kind of graph I save/load/freeze, you can check it here

How to freeze (export) a saved model

If you are wondering how to save a model with TensorFlow, please have a look at my previous article before going on.

Let’s start from a folder containing a model. It probably looks something like this:

Screenshot of the resulting folder before freezing our model

The important files here are the “.chkp” ones. If you remember well, they come in pairs, one pair per saved timestep: one file holds the weights (“.data”) and the other one (“.meta”) holds the graph and all its metadata (so you can retrain it, etc.).

But when we want to serve a model in production, we don’t need any special metadata cluttering our files: we just want our model and its weights nicely packaged in one file. This facilitates storage, versioning and updates of your different models.

Luckily, in TF we can easily build our own function to do it. Let’s explore the steps we have to perform:

Retrieve our saved graph: we load the previously saved meta-graph into the default graph and retrieve its graph_def (the ProtoBuf definition of our graph)
Restore the weights: we start a Session and restore the weights of our graph inside that Session
Remove everything useless for inference: here, TF helps us with a nice helper function which converts the variables into constants holding their current values, keeps only the part of the graph needed to compute the requested outputs, and returns what we will call our new “frozen graph_def”
Save it to disk: finally, we serialize our frozen graph_def ProtoBuf and dump it to disk

Note that the first two steps are the same as when we load any graph in TF; the only tricky part is the graph “freezing” itself, and TF has a built-in function to do it!

I provide a slightly different version which is simpler and that I found handy. The original freeze_graph tool provided by TF is installed in your bin dir and can be called directly if you used pip to install TF. If not, you can call it directly from its folder (see the commented import in the gist).

So let’s see:
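A minimal sketch of such a function follows. It is not the author's original gist: it is written against the `tf.compat.v1` API (an assumption: TF 1.13+ or 2.x; under TF 1.0, `import tensorflow as tf` should expose essentially the same calls), the name `freeze_latest_checkpoint` and its arguments are hypothetical, and `output_node_names` is a comma-separated string of op names from your own graph. Each of the four steps listed above is marked in a comment, and the result is written as `frozen_model.pb` next to the checkpoints.

```python
import os

import tensorflow.compat.v1 as tf  # assumption: TF 1.13+ or 2.x compat API


def freeze_latest_checkpoint(model_dir, output_node_names):
    """Freeze the newest checkpoint in model_dir into model_dir/frozen_model.pb.

    output_node_names: comma-separated op names, e.g. "y" (hypothetical).
    Returns the path of the written file.
    """
    input_checkpoint = tf.train.latest_checkpoint(model_dir)
    if input_checkpoint is None:
        raise IOError("No checkpoint found in %s" % model_dir)
    if not output_node_names:
        raise ValueError("Supply at least one output node name")
    output_graph = os.path.join(model_dir, "frozen_model.pb")

    with tf.Graph().as_default() as graph:
        with tf.Session(graph=graph) as sess:
            # Step 1: load the saved meta-graph (.meta) into this graph;
            # clear_devices drops device placements stored in the graph
            saver = tf.train.import_meta_graph(input_checkpoint + ".meta",
                                               clear_devices=True)
            # Step 2: restore the weights into the session
            saver.restore(sess, input_checkpoint)
            # Step 3: turn variables into constants and keep only the
            # subgraph needed to compute the requested outputs
            frozen_graph_def = tf.graph_util.convert_variables_to_constants(
                sess, graph.as_graph_def(), output_node_names.split(","))
            # Step 4: serialize the frozen GraphDef ProtoBuf to disk
            with tf.gfile.GFile(output_graph, "wb") as f:
                f.write(frozen_graph_def.SerializeToString())
    print("%d ops in the frozen graph" % len(frozen_graph_def.node))
    return output_graph
```

For example, `freeze_latest_checkpoint("results", "y")` would freeze the newest checkpoint in a hypothetical `results` folder whose output op is named `y`. Naming the outputs explicitly matters: the conversion only keeps what those nodes depend on, so training ops and savers are dropped.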

Now we can see a new file in our folder: “frozen_model.pb”.

Screenshot of the resulting folder after freezing our model

As expected, its size is bigger than the weights file and smaller than the sum of the two checkpoint files.
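To check this on your own folder, a small helper like the one below (plain Python, no TF needed) can compare the three sizes. The name `compare_sizes` and the `model.chkp-1000` prefix in the usage line are hypothetical; it assumes the usual checkpoint naming, where the weights may be split into several `<prefix>.data-XXXXX-of-YYYYY` shards, which it sums up.

```python
import os


def compare_sizes(checkpoint_prefix, frozen_path):
    """Return (frozen, weights, meta) file sizes in bytes for one checkpoint."""
    folder = os.path.dirname(checkpoint_prefix) or "."
    base = os.path.basename(checkpoint_prefix)
    # Weights may be sharded: sum every "<prefix>.data-..." file
    weights = sum(os.path.getsize(os.path.join(folder, name))
                  for name in os.listdir(folder)
                  if name.startswith(base + ".data-"))
    # The meta-graph holds the full (training) graph and its metadata
    meta = os.path.getsize(checkpoint_prefix + ".meta")
    frozen = os.path.getsize(frozen_path)
    return frozen, weights, meta
```

With `frozen, weights, meta = compare_sizes("model.chkp-1000", "frozen_model.pb")`, you would expect `weights < frozen < weights + meta` to hold.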

Note: In this very simple case the weights file is very small, but it is usually several MB.

How to use the frozen model

Naturally, after knowing how to freeze a model, one might wonder how to use it.

The trick to keep in mind is that what we dumped to disk was a graph_def ProtoBuf. So to import it back into a Python script, we need to:

Parse the graph_def ProtoBuf from the file first
Load this graph_def into an actual Graph

We can build a convenient function to do so:
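A minimal sketch of such a function, written against the `tf.compat.v1` API (assumption: TF 1.13+ or 2.x; under TF 1.0, `import tensorflow as tf` should work the same way). The name `load_graph` and its `prefix` argument are illustrative choices; the prefix is passed as the `name` of `tf.import_graph_def`, which prepends it to every imported op name.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.13+ or 2.x compat API


def load_graph(frozen_graph_filename, prefix="prefix"):
    """Load a frozen GraphDef file into a new tf.Graph."""
    # Step 1: read the file and parse the serialized GraphDef ProtoBuf
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    # Step 2: import it into a fresh Graph; every op becomes "<prefix>/<name>"
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name=prefix)
    return graph
```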

Now that we built our function to load our frozen model, let’s create a simple script to finally make use of it:
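A minimal sketch of such a script, wrapped in a function for reuse. It is an illustration under assumptions rather than the author's code: `run_frozen_model` is a hypothetical name, it uses the `tf.compat.v1` API (TF 1.13+ or 2.x assumed), and the tensor names you pass (for instance `prefix/x:0`, i.e. `<prefix>/<op name>:<output index>`) depend entirely on how your own graph was built. Printing the imported operations is a quick way to find them.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.13+ or 2.x compat API


def run_frozen_model(frozen_model_filename, input_name, output_name, x_value):
    """Load a frozen model and run one inference (names are hypothetical)."""
    # Parse the frozen GraphDef and import it under the "prefix" name scope
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(frozen_model_filename, "rb") as f:
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="prefix")

    # List the imported operations to discover input/output names
    for op in graph.get_operations():
        print(op.name)

    # Fetch tensors by name: "<prefix>/<op name>:<output index>"
    x = graph.get_tensor_by_name(input_name)
    y = graph.get_tensor_by_name(output_name)

    # No Saver and no initializer: the weights are constants in the graph now
    with tf.Session(graph=graph) as sess:
        return np.asarray(sess.run(y, feed_dict={x: x_value}))
```

Note that nothing is restored here: once frozen, the weights are plain constants inside the graph, so a bare `Session` is enough.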

Note: when loading the frozen model, all operations got prefixed with “prefix”. This is due to the “name” parameter of the “import_graph_def” function; by default, it prefixes everything with “import”.
This can be useful to avoid name collisions if you want to import your graph_def into an existing Graph.
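The sketch below illustrates that behaviour with a hypothetical one-constant graph: importing the same graph_def three times into one Graph, once with the default name and twice with explicit names, yields three non-overlapping sets of op names. It assumes the `tf.compat.v1` API (TF 1.13+ or 2.x).

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.13+ or 2.x compat API

# Hypothetical stand-in for a frozen model: a graph with a single constant "w"
with tf.Graph().as_default() as source:
    tf.constant(1.0, name="w")
graph_def = source.as_graph_def()

# Import it several times into one existing Graph
with tf.Graph().as_default() as target:
    tf.import_graph_def(graph_def)                  # default prefix "import"
    tf.import_graph_def(graph_def, name="model_a")  # explicit prefix
    tf.import_graph_def(graph_def, name="model_b")  # explicit prefix

names = sorted(op.name for op in target.get_operations())
print(names)
```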

How to build a (very) simple API

For this part, I will let the code speak for itself. After all, this is a series about TF and not so much about how to build a server in Python. Yet it felt kind of unfinished without it, so here you go, the final workflow:

Note: We are using Flask in this example
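A minimal sketch of such a server, not the author's original code: it assumes Flask 1.0+ and the `tf.compat.v1` API (TF 1.13+ or 2.x), and the `/api/predict` route, the `{"x": ...}` / `{"y": ...}` JSON format and the helper names `load_graph` and `create_app` are all hypothetical choices. The key design point is that the graph is loaded and the `Session` is created once, when the app is built, and then reused by every request.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.13+ or 2.x compat API
from flask import Flask, jsonify, request


def load_graph(frozen_graph_filename, prefix="prefix"):
    """Parse a frozen GraphDef file and import it into a fresh Graph."""
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name=prefix)
    return graph


def create_app(graph, input_name, output_name):
    """Build a Flask app that runs `graph` on JSON payloads like {"x": [...]}."""
    app = Flask(__name__)
    # One persistent session, created once and shared by all requests
    sess = tf.Session(graph=graph)
    x = graph.get_tensor_by_name(input_name)
    y = graph.get_tensor_by_name(output_name)

    @app.route("/api/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        y_out = sess.run(y, feed_dict={x: payload["x"]})
        # numpy arrays are not JSON-serializable: convert to nested lists
        return jsonify({"y": np.asarray(y_out).tolist()})

    return app
```

To serve a real model, something like `create_app(load_graph("frozen_model.pb"), "prefix/x:0", "prefix/y:0").run(port=5000)` would start Flask's development server (tensor names hypothetical), after which a client can POST `{"x": [[1.0, 2.0]]}` to `/api/predict`. The built-in server is meant for development; a production deployment would usually sit behind a WSGI server.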

TensorFlow best practice series

This article is part of a more complete series of articles about TensorFlow. I’ve not yet defined all the different subjects of this series, so if you want to see any area of TensorFlow explored, add a comment! So far, these are the subjects I want to explore (this list is subject to change and is in no particular order):

A primer
How to handle shapes in TensorFlow
TensorFlow saving/restoring and mixing multiple models
How to freeze a model and serve it with python (this one!)
TensorFlow: A proposal of good practices for files, folders and models architecture
TensorFlow howto: a universal approximator inside a neural net
How to optimise your input pipeline with queues and multi-threading
Mutating variables and control flow
How to handle input data with TensorFlow
How to control the gradients to create custom back-prop or fine-tune my models
How to monitor and inspect my models to gain insight into them

Note: TF is evolving fast right now; these articles are currently written for version 1.0.0.

References

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py