TensorFlow howto: a universal approximator inside a neural net

Today, let’s take a break from learning and implement something instead!

Have you heard of the “Universal approximation theorem”?

Basically, this theorem states that (without all the nitty-gritty details):

  • for any continuous function f defined in R^n …
  • … you can find a wide enough 1-hidden layer neural net …
  • … that will approximate f as well as you want on any compact (closed and bounded) set

That sounds very cool!

Let’s dive directly into the code and build an implementation with TensorFlow for the basic case where f is a function from R to R. We are going to build a 1-hidden-layer neural network with no bias on the output layer. Let’s see:
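The original snippet was embedded as a gist; here is a minimal sketch of what such an implementation could look like (the variable names, initializers and the default width are my own assumptions, not the author’s exact code):

```python
import tensorflow as tf

def univ_approx(x, hidden_dim=50):
    # Sketch of a 1-hidden-layer approximator for f: R -> R.
    input_dim = 1   # hard-coded for now
    output_dim = 1

    with tf.variable_scope('UniversalApproximator'):
        # Hidden layer: affine transform followed by a ReLU.
        ua_w = tf.get_variable('ua_w', shape=[input_dim, hidden_dim],
                               initializer=tf.random_normal_initializer(stddev=0.1))
        ua_b = tf.get_variable('ua_b', shape=[hidden_dim],
                               initializer=tf.constant_initializer(0.))
        a = tf.nn.relu(tf.matmul(x, ua_w) + ua_b)

        # Output layer: a plain weighted sum, with no bias.
        ua_v = tf.get_variable('ua_v', shape=[hidden_dim, output_dim],
                               initializer=tf.random_normal_initializer(stddev=0.1))
        y = tf.matmul(a, ua_v)

    return y
```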

Some notes:

  • x must be of rank 2 to be used by TensorFlow’s matmul function. This means that x has shape [None, 1] (None stands for the batch size: you can see it as the capacity to compute as many values as you want in a single call)
  • The input_dim and output_dim are hard-coded right now, but you could change them as you wish to handle many more kinds of functions. In our case we’ll keep it simple so we can graph our function easily.
  • Finally, notice the ReLU activation. We could have used many other functions instead of it, and the exact choice doesn’t really matter for the theorem as long as it is a non-constant increasing function: it mostly affects the speed of learning.

Now, let’s write a very simple script to evaluate this function:
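The evaluation script was also embedded as a gist; a minimal sketch (reusing the univ_approx function sketched above) could look like this:

```python
import numpy as np
import tensorflow as tf

# Quick check: feed a few values through the (still untrained) approximator.
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, shape=[None, 1], name='x')
    y = univ_approx(x)  # the function sketched above

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        inputs = np.array([[0.], [0.5], [1.]])
        print(sess.run(y, feed_dict={x: inputs}))
```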

All right! We have our “universal approximator” (UA).

Now we only need to train it to approximate any function we want on a given closed interval (you wouldn’t do it on an infinite interval, would you?).

So let’s start with a function I personally didn’t believe a neural network could approximate well: the sine function.

Sidenote: if you’re like me and wonder how this is possible, let me give you a mathematical hint:
- Any continuous function on a compact set (e.g. a closed interval) can be approximated as well as we want by a piecewise constant function
- And you can manually build a neural network that is as close as you want to this piecewise constant function, by adding as many neurons as necessary
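To make the hint concrete, here is one possible construction (my own sketch, not the author’s derivation), using the same kind of ReLU network with no output bias as above:

$$
s_{a,\delta}(x) = \frac{1}{\delta}\Big(\mathrm{ReLU}(x - a) - \mathrm{ReLU}(x - a - \delta)\Big),
\qquad
s_{a,\delta}(x) = 0 \ \text{for } x \le a, \quad s_{a,\delta}(x) = 1 \ \text{for } x \ge a + \delta,
$$

with a linear ramp in between. A piecewise constant function $g(x) = \sum_i h_i\,\mathbf{1}_{x \ge a_i}$ is then approximated by $\sum_i h_i\, s_{a_i,\delta}(x)$, which is exactly a weighted sum of ReLUs: a 1-hidden-layer network with a linear output and no output bias, using two hidden neurons per step. Shrinking $\delta$ makes the match as tight as you want.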

We will build a script to:

  • train our UA on the sine function
  • graph the resulting approximated function side by side with the sine function
  • make hidden_dim configurable from the command line so it is easy to change

I will post the whole script directly here. It contains the explanation as comments.

I believe this is more suitable for Medium, and also for you if you want to run it (don’t be put off by the length of the file: there are a lot of comments and empty lines).
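Since the embedded gist may not render everywhere, here is a condensed, hypothetical sketch of what that script does (the interval, optimizer, learning rate and number of steps are my assumptions, and the side-by-side plotting is omitted for brevity):

```python
# Hypothetical sketch of the training script: train the UA on sin(x)
# over a closed interval and log the loss to TensorBoard.
import argparse

import numpy as np
import tensorflow as tf

def univ_approx(x, hidden_dim):
    # Same 1-hidden-layer approximator as before, with a configurable width.
    with tf.variable_scope('UniversalApproximator'):
        ua_w = tf.get_variable('ua_w', shape=[1, hidden_dim],
                               initializer=tf.random_normal_initializer(stddev=0.1))
        ua_b = tf.get_variable('ua_b', shape=[hidden_dim],
                               initializer=tf.constant_initializer(0.))
        a = tf.nn.relu(tf.matmul(x, ua_w) + ua_b)
        ua_v = tf.get_variable('ua_v', shape=[hidden_dim, 1],
                               initializer=tf.random_normal_initializer(stddev=0.1))
        return tf.matmul(a, ua_v)  # no bias on the output layer

def main(nb_neurons):
    x = tf.placeholder(tf.float32, shape=[None, 1], name='x')
    y_true = tf.sin(x)                     # the target function
    y_pred = univ_approx(x, nb_neurons)    # the UA output

    # Mean squared error between the UA output and sin(x).
    loss = tf.reduce_mean(tf.square(y_pred - y_true))
    tf.summary.scalar('loss', loss)
    train_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
    summaries = tf.summary.merge_all()

    with tf.Session() as sess:
        # One sub-folder per width, so TensorBoard can compare runs.
        writer = tf.summary.FileWriter('results/%d' % nb_neurons, sess.graph)
        sess.run(tf.global_variables_initializer())

        for step in range(3000):
            # Sample a random batch from the closed interval [-3*pi, 3*pi].
            batch = np.random.uniform(-3 * np.pi, 3 * np.pi, size=(1000, 1))
            _, summ = sess.run([train_op, summaries], feed_dict={x: batch})
            writer.add_summary(summ, step)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--nb_neurons', type=int, default=50)
    args = parser.parse_args()
    main(args.nb_neurons)
```

With this layout, tensorboard --logdir results picks up one run per value of nb_neurons, which is handy for the comparison below.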

Now you can open two terminals and launch the following commands from the main directory to see the magic happen:

  • python myfile.py --nb_neurons 50
  • tensorboard --logdir results --reload_interval 5 (the default reload_interval is 120 seconds, to avoid being too hard on the computer, but in our case we can safely speed it up a little)

You can go watch your UA training in real time and see it learn the sine function.

Remember, we should see that the more hidden neurons (a larger hidden_dim) we add, the better our UA approximates the function!

Let me show you the results for 4 different values of hidden_dim: [20, 50, 100, 500]

Graphs showing the effect of the number of neurons in the UA

As expected, when we increase the number of neurons, our UA approximates the sine function better and better, and in fact we can get as close to it as we want. It’s pretty neat to see it working.

YET our UA implementation has a huge drawback: we can’t reuse it if the input_dim starts to vary…

What if, in our wildest dreams, we would like to approximate the activation function of a complex neural network? Wouldn’t that be a cool inception?

I think this is a very good exercise for you: how can you trick TensorFlow into an implementation that handles dynamic input dimensions? (The solution is on my GitHub if you want to check it, but you should try to work it out by yourself first.)
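If you get stuck, here is a hint: a sketch of one common trick (not necessarily the exact solution in the repo) is to flatten whatever tensor you receive into shape [-1, 1], push every scalar through the 1-D UA, and reshape the result back:

```python
import tensorflow as tf

def univ_approx_any_shape(t, hidden_dim=50):
    # Hypothetical helper: apply the scalar (R -> R) approximator element-wise
    # to a tensor of any shape by flattening it and reshaping the result back.
    original_shape = tf.shape(t)
    flat = tf.reshape(t, [-1, 1])          # every scalar becomes one "example"
    out = univ_approx(flat, hidden_dim)    # the 1-D UA sketched earlier
    return tf.reshape(out, original_shape)
```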

To end this article, here is a little gift: I’ve been using the second implementation to train a neural network on the MNIST dataset (so we have a neural network using another neural network as its activation function).
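For context, here is roughly how such an “inception” could be wired; this is a hedged sketch (the layer name, sizes and the shape-agnostic helper univ_approx_any_shape from the hint above are my assumptions, not the exact MNIST model used):

```python
def dense_with_ua_activation(inputs, units, name):
    # A fully connected layer whose activation function is itself a small,
    # trainable universal approximator applied element-wise.
    with tf.variable_scope(name):
        in_dim = inputs.get_shape().as_list()[1]
        w = tf.get_variable('w', shape=[in_dim, units],
                            initializer=tf.random_normal_initializer(stddev=0.1))
        b = tf.get_variable('b', shape=[units],
                            initializer=tf.constant_initializer(0.))
        z = tf.matmul(inputs, w) + b
        return univ_approx_any_shape(z)  # the UA plays the role of the activation
```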

These are the graphs of the activation functions approximated on it. Cheers!

Sidenotes:
- I’m using the ELU function inside the second UA, which is why the resulting approximations are curved.
- All those approximated shapes showed up multiple times across runs.
- I consistently reached 0.98 accuracy on the MNIST test set, which suggests that the exact activation function may not be very important for learning a task.

TensorFlow best practice series

This article is part of a more complete series of articles about TensorFlow. I haven’t yet defined all the different subjects of this series, so if you want to see any area of TensorFlow explored, add a comment! So far, these are the subjects I’d like to explore (this list is subject to change and is in no particular order):

Note: TF is evolving fast right now; these articles are currently written for version 1.0.0.