Musings on Deep Learning from a Mosaic Data Scientist

*posted by Mosaic Data Science*

Inspired by the spike in interest in deep learning that I’ve seen around the business forum, I thought it would be useful to share a few of my own thoughts about deep learning, what its role might be at Mosaic and other analytics consulting companies, and how it might advance in the next few years.

For those folks not breathlessly tracking the latest developments in RNNs, CNNs, LSTMs, and TGIFs (just kidding), I’ll start with a quick overview of the topic. Deep learning models are a subclass of artificial neural network (ANN) models. ANNs are mathematical models, meaning simply that quantitative data goes in one side and a quantitative result comes out the other. As the name implies, the structure of ANNs takes inspiration from the structure of biological brains. The “neurons” in a neural network are simple mathematical functions (linear, step, sigmoid, etc.), called “activation functions.” But when you connect many of these simple functions, using the outputs of one set of functions as the inputs to the next set (the “network”), you can begin to represent some very complex functions – just as the tens of billions of relatively simple cells in the human brain conspire to create complex thoughts, emotions, and behaviors.
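To make the "simple functions feeding into more simple functions" idea concrete, here is a minimal sketch in plain Python. The weights, biases, and layer sizes are made-up values chosen only for illustration; a real network would learn them from data.

```python
import math

def sigmoid(x):
    """A common activation function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases, activation):
    """One layer: each neuron applies the activation to a weighted sum of its inputs."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the outputs of each layer in as the inputs to the next one."""
    values = inputs
    for weights, biases, activation in layers:
        values = layer(values, weights, biases, activation)
    return values

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.3], [0.8, 0.2]], [0.0, -0.1], sigmoid),  # hidden layer
    ([[1.0, -1.0]], [0.0], sigmoid),                    # output layer
]

output = forward([1.0, 2.0], network)
print(output)
```

Each neuron here is trivial on its own; the expressive power comes from stacking layers, so that the output layer works on whatever the hidden layer computed rather than on the raw inputs.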

Deep learning simply refers to artificial neural networks with multiple layers of neurons between the inputs and outputs. These “hidden” layers allow for representation of complex and abstract features or patterns within the data. In building a traditional machine learning model (linear/logistic regression, classification trees, random forests, etc.), the key to a good model is often good feature engineering — the process of transforming raw data into the higher-level variables to be input into the model. Deep learning models require the data scientist to instead specify the structure of the network and to trust the optimization heuristics used for training to discover the best features that can be represented within each hidden layer. These features can occasionally be interpreted in meaningful ways (e.g., facial recognition models often develop nodes that seem to detect shapes associated with specific facial features) but are more often abstract and uninterpretable on their own. This is the real power of deep learning – the model can develop its own representation of what is important in the data, even where it goes beyond its human creator’s ability to encode or even interpret such a representation.

Artificial neural networks have been around since the 1950s, but due to computational constraints they were limited to simple structures with a small number of neurons for much of that history. However, algorithmic innovations in the last 25 years, research advances related to activation functions and network structures, and more recent hardware advances – notably, the use of graphics processing units (GPUs) to speed model training by orders of magnitude – have all allowed for much greater structural complexity to be applied to much larger training data, which in turn allows for representation of much more sophisticated and nuanced patterns. Deep learning has been behind some impressive, some flashy, and some… let’s say… interesting results.

Seems like magic! So let’s go all in with deep learning, right? Mosaic will become the next DeepMind! Not so fast. I do think that there is incredible promise in deep learning but, despite the hype, we’re still at least a few years away from a societal deep learning panacea. There are real drawbacks and pitfalls to deep learning. A few examples:

- Deep learning is data hungry. By removing the data scientist’s role in feature engineering, there is limited subject matter expertise encoded in the model. The learning model needs sufficient data to be able to determine on its own how to interpret the patterns it is seeing – to pull the signal from the noise – without any additional context.
- Deep learning is computationally expensive. Between the large amounts of training data required and the iterative optimization methods used to train the models, a single training run can take hours or days – and that’s after you’ve spun up the latest and greatest GPU on AWS.
- Building the right structure for a deep learning model is difficult. There is not one standard deep learning model. For some applications, there are well-studied structures that can give you a running start (e.g., convolutional neural nets for image recognition), but there is a reason that Google and Facebook have full-time, very well paid research teams building their next set of deep learning models. Huge amounts of effort have to go into a single model.
- Interpretability is an afterthought. While it is nice to be able to present amazing accuracy statistics for your machine learning model, many of our customers, particularly within MDS, are not willing to blindly trust an algorithm that they have no hope of understanding. It is much easier to build trust in a logistic regression model (when input A goes up, output B goes down) than a deep learning model that customers can’t even interpret for themselves.
- For many simple applications, deep learning isn’t actually that much better than other machine learning approaches – particularly for predictive modeling, where incomplete information is often the constraining factor. I came across a paper comparing different approaches to a very similar email classification problem. The key insight for me: a simple random forest model was always within a couple of percentage points of the most advanced CNN and LSTM methods that were tried, and it actually outperformed those approaches on two of the five test sets.

So what do I think the role of deep learning should be at Mosaic and for business in general? I think we need to be profoundly literate in the subject and aware of high-level developments in the field. Data in the world continues to get bigger and more complex, and advances in the field of deep learning as well as tools like TensorFlow and Keras make deep learning more and more viable. Business leaders are reading the same hype about deep learning that we are and often look to us to provide a rational take on the topic.

However, I’m not ready to put deep learning on our list of core capabilities, and I don’t think it’s time for a concentrated professional development effort in this area. Our objective is most often to make incremental improvements to a human-driven decision process rather than to automate decisions, which increases the importance of model interpretability relative to model performance. We pitch our agility and speed-to-value as a differentiator, and with our customers often starting from square one in the areas that we are working, the investment required for a single high-performing deep learning model is better spread across multiple initiatives using simpler machine learning models that can be deployed and supported on a leaner infrastructure. Deep learning brings the greatest value when there is significant investment made in solving a single, well-defined, strategically important problem with a large amount of labeled data to work from.

I also think we should be thinking of deep learning as it fits with our core strengths. I see (deep) reinforcement learning as a compelling opportunity, given our expertise in simulation and our capabilities around decision support in environments where many decisions made at many points in time under great uncertainty impact the ultimate efficiency of the system. Reinforcement learning uses a test-and-learn process, typically with a heavy dose of simulation, to assign the cost or value (“reward”) of an eventual outcome back to the individual decisions that together led to that outcome. Over a large number of replications, a deep learning model can learn to represent the state of the system at any point in time in a meaningful way. It can begin to predict the expected reward of a particular decision based only on the information available at the time the decision is made – even if that reward isn’t realized until well into the future. To optimize a particular decision, simply follow a greedy heuristic and select the option that maximizes the expected reward as predicted by the model. I see great potential in applying such an approach to decisions like runway sequencing, TMIs, and reroutes.
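As a rough sketch of the two ideas in that paragraph, the snippet below credits an eventual reward back to the earlier decisions and then picks the option with the highest predicted reward. The discount factor `gamma`, the reward values, and the option names are all assumptions made up for illustration, and the dictionary of predictions stands in for what a trained deep learning model would output.

```python
def discounted_returns(rewards, gamma=0.9):
    """Credit each decision with the discounted sum of the rewards that followed it."""
    returns = []
    running = 0.0
    # Walk backward from the final outcome so later rewards flow to earlier decisions.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def greedy_action(predicted_rewards):
    """Greedy heuristic: choose the option with the largest predicted reward."""
    return max(predicted_rewards, key=predicted_rewards.get)

# Hypothetical simulated episode: three decisions, reward only arrives at the end.
episode_rewards = [0.0, 0.0, 10.0]
credits = discounted_returns(episode_rewards, gamma=0.5)
print(credits)  # [2.5, 5.0, 10.0]

# Hypothetical model predictions for the options available right now.
predictions = {"option_a": 4.2, "option_b": 5.1, "hold": 3.0}
choice = greedy_action(predictions)
print(choice)  # option_b
```

In a full reinforcement learning setup, values like `credits` would serve as training targets for the model over many simulated replications, and the model's predictions would replace the hard-coded dictionary.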

Another interesting area to watch is transfer learning, in which a model trained for solving one problem can be used as a knowledge store to make a similar problem more tractable. The hidden layers from the model trained on the original dataset are used to generate abstract features that were deemed most relevant to the original modeling objective. One would suspect that these features would also be highly relevant to the similar task. For example, a deep learning model trained as a general image recognition engine may allow you to train a more targeted image classifier on a very small number of labeled images compared to what it would take to train such a model from scratch. For our data science customers who often come to us with hundreds or thousands rather than millions of labeled data points, a transfer learning approach could allow us to leverage the power of deep learning while still delivering manageable tools.
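The mechanics of transfer learning can be sketched without any deep learning library. In the toy example below, `pretrained_features` is a stand-in for the frozen hidden layers of a model trained on some other task (its weights are invented for illustration), and only a small logistic regression "head" is trained on a handful of labeled points. In practice the frozen layers would come from a real pretrained network, for example one loaded through Keras.

```python
import math

# Hypothetical frozen weights standing in for hidden layers learned on another task.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]

def pretrained_features(x):
    """Map a raw input to abstract features; these weights are never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]

def train_head(examples, labels, epochs=200, lr=0.5):
    """Fit only a logistic regression head on top of the frozen features."""
    features = [pretrained_features(x) for x in examples]
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(w * fi for w, fi in zip(weights, f)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            weights = [w - lr * error * fi for w, fi in zip(weights, f)]
            bias -= lr * error
    return weights, bias

def predict(x, weights, bias):
    """Classify a raw input by passing it through the frozen features, then the head."""
    f = pretrained_features(x)
    z = sum(w * fi for w, fi in zip(weights, f)) + bias
    return 1 if z > 0 else 0

# A very small labeled dataset for the new task (made-up values).
examples = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
labels = [1, 1, 0, 0]

weights, bias = train_head(examples, labels)
print([predict(x, weights, bias) for x in examples])
```

Because the head has only a few parameters, a few labeled points are enough to fit it; the heavy lifting was done when the feature layers were trained on the original, larger dataset.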

## Peter Le

2 weeks ago

Nice summary and perspective regarding deep learning! Thanks for sharing.

At the same time, technology is changing so fast, and it may take some time to get a core capacity well established and operating. So it may not be too early to start listing deep learning as a core capacity and building its customer base now. Just a thought.

## Drew Clancy

2 weeks ago

Thanks Peter, appreciate the comment and idea!