ONNX: the Open Neural Network Exchange Format
An open-source battle is being waged for the soul of artificial intelligence. It is being fought by industry titans, universities and communities of machine-learning researchers world-wide. This article chronicles one small skirmish in that fight: a standardized file format for neural networks. At stake is the open exchange of data among a multitude of tools instead of competing monolithic frameworks.
The good news is that the battleground is Free and Open. None of the big players are pushing closed-source solutions. Whether it is Keras and Tensorflow backed by Google, MXNet by Apache endorsed by Amazon, or Caffe2 or PyTorch supported by Facebook, all solutions are open-source software.
Unfortunately, while these projects are open, they are not interoperable. Each framework constitutes a complete stack that until recently could not interface in any way with any other framework. A new industry-backed standard, the Open Neural Network Exchange format, could change that.
Now, imagine a world where you can train a neural network in Keras, run the trained model through the NNVM optimizing compiler and deploy it to production on MXNet. And imagine that is just one of countless combinations of interoperable deep learning tools, including visualizations, performance profilers and optimizers. Researchers and DevOps no longer need to compromise on a single toolchain that provides a mediocre modeling environment and so-so deployment performance.
What is required is a standardized format that can express any machine-learning model and store trained parameters and weights, readable and writable by a suite of independently developed software.
Enter the Open Neural Network Exchange Format (ONNX).
To understand the drastic need for interoperability with a standard like ONNX, we first must understand the ridiculous requirements we have for existing monolithic frameworks.
A casual user of a deep learning framework may think of it as a language for specifying a neural network. For example, I want 100 input neurons, three fully connected layers each with 50 ReLU outputs, and a softmax on the output. My framework of choice has a domain language to specify this (like Caffe) or bindings to a language like Python with a clear API.
However, the specification of the network architecture is only the tip of the iceberg. Once a network structure is defined, the framework still has a great deal of complex work to do to make it run on your CPU or GPU cluster.
Python, obviously, doesn't run on a GPU. To make your network definition run on a GPU, it needs to be compiled into code for the CUDA (NVIDIA) or OpenCL (AMD and Intel) APIs or processed in an efficient way if running on a CPU. This compilation is complex and why most frameworks don't support both NVIDIA and AMD GPU back ends.
The job is still not complete though. Your framework also has to balance resource allocation and parallelism for the hardware you are using. Are you running on a Titan X card with more than 3,000 compute cores, or a GTX 1060 with far less than half as many? Does your card have 16GB of RAM or only 4? All of this affects how the computations must be optimized and run.
And still it gets worse. Do you have a cluster of 50 multi-GPU machines on which to train your network? Your framework needs to handle that too. Network protocols, efficient allocation, parameter sharing—how much can you ask of a single framework?
Now you say you want to deploy to production? You wish to scale your cluster automatically? You want a solid language with secure APIs?
When you add it all up, it seems absolutely insane to ask one monolithic project to handle all of those requirements. You cannot expect the authors who write the perfect network definition language to be the same authors who integrate deployment systems in Kubernetes or write optimal CUDA compilers.
The goal of ONNX is to break up the monolithic frameworks. Let an ecosystem of contributors develop each of these components, glued together by a common specification format.
The Ecosystem (and Politics)
Interoperability is a healthy sign of an open ecosystem. Unfortunately, until recently, it did not exist for deep learning. Every framework had its own format for storing computation graphs and trained models.
Late last year that started to change. The Open Neural Network Exchange format initiative was launched by Facebook, Amazon and Microsoft, with support from AMD, ARM, IBM, Intel, Huawei, NVIDIA and Qualcomm. Let me rephrase that as everyone but Google. The format has been included in most well known frameworks except Google's TensorFlow (for which a third-party converter exists).
This seems to be the classic scenario where the clear market leader, Google, has little interest in upending its dominance for the sake of openness. The smaller players are banding together to counter the 500-pound gorilla.
Google is committed to its own TensorFlow model and weight file format, SavedModel, which shares much of the functionality of ONNX. Google is building its own ecosystem around that format, including TensorFlow Server, Estimator and Tensor2Tensor to name a few.
The ONNX Solution
Building a single file format that can express all of the capabilities of all the deep learning frameworks is no trivial feat. How do you describe convolutions or recurrent networks with memory? Attention mechanisms? Dropout layers? What about embeddings and nearest neighbor algorithms found in fastText or StarSpace?
ONNX cribs a note from TensorFlow and declares everything is a graph of tensor operations. That statement alone is not sufficient, however. Dozens, perhaps hundreds, of operations must be supported, not all of which will be supported by all other tools and frameworks. Some frameworks may also implement an operation differently from their brethren.
There has been considerable debate in the ONNX community about what level tensor operations should be modeled at. Should ONNX be a mathematical toolbox that can support arbitrary equations with primitives such as sine and multiplication, or should it support higher-level constructs like integrated GRU units or Layer Normalization as single monolithic operations?
As it stands, ONNX currently defines about 100 operations. They range in complexity from arithmetic addition to a complete Long Short-Term Memory implementation. Not all tools support all operations, so just because you can generate an ONNX file of your model does not mean it will run anywhere.
Generation of an ONNX model file also can be awkward in some frameworks because it relies on a rigid definition of the order of operations in a graph structure. For example, PyTorch boasts a very pythonic imperative experience when defining models. You can use Python logic to lay out your model's flow, but you do not define a rigid graph structure as in other frameworks like TensorFlow. So there is no graph of operations to save; you actually have to run the model and trace the operations. The trace of operations is saved to the ONNX file.
It is early days for deep learning interoperability. Most users still pick a framework and stick with it. And an increasing number of users are going with TensorFlow. Google throws many resources and real-world production experience at it—it is hard to resist.
All frameworks are strong in some areas and weak in others. Every new framework must re-implement the full "stack" of functionality. Break up the stack, and you can play to the strengths of individual tools. That will lead to a healthier ecosystem.
ONNX is a step in the right direction.
Note: the ONNX GitHub page is here.
Braddock Gaskill is a research scientist with eBay Inc. He contributed to this article in his personal capacity. The views expressed are his own and do not necessarily represent the views of eBay Inc.
About the Author
Braddock Gaskill has 25 years of experience in AI and algorithmic software development. He also co-founded the Internet-in-a-Box open-source project and developed the libre Humane Wikipedia Reader for getting content to students in the developing world.