
Relating skip connections in neural networks to consciousness

On consciousness

Consciousness appears to depend heavily on short-term memory—the ability to recall recent experiences and compare them to current stimuli. When we go on “autopilot” or black out, we move through time and can even accomplish tasks, but without conscious awareness of our recent past and motivations.


Relationship to neural networks

In the autopilot, unconscious state, we resemble a standard feedforward neural network during a single forward pass: functionally capable of doing useful things (for NNs, making accurate predictions), but without being conscious or self-aware of our own reasoning process.


For a typical neural network, a multi-stage (i.e. deep) feedforward decision-making process is a chain of transformations applied in sequence. Starting from the input, h_0 = x, each stage computes

h_d = f_d(h_{d-1})

and the final prediction is simply the output of the last stage.
At each stage d > 1, the network is unaware of what led to its input at that stage—it lacks recollective memory of its own line of reasoning. For even small numbers of hidden layers (i.e. small depths), intermediate outputs diverge wildly from the original stimulus in appearance; the network is effectively operating in the aforementioned “blackout” state.
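To make this concrete, here is a minimal sketch of such a plain feedforward stack in PyTorch. The class name, dimensions, and depth are illustrative assumptions, not from the original post; the point is only that each layer sees nothing but the output of the layer before it.

```python
import torch
import torch.nn as nn

class PlainNet(nn.Module):
    """A plain (non-residual) feedforward stack: h_d = f_d(h_{d-1})."""

    def __init__(self, dim: int = 64, depth: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for layer in self.layers:
            # Each stage sees only the previous stage's output;
            # there is no direct path back to the original input x.
            h = layer(h)
        return h
```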


On residual connections

Residual connections have revolutionized deep learning by allowing effective training of increasingly deep neural networks. In residual networks (“ResNets”), intermediate outputs within the network are augmented with their own unprocessed input. Each block adds its raw input back onto its processed output:

h_d = f_d(h_{d-1}) + h_{d-1}
The input to a layer at depth d > 1 thus includes, by vector addition, the raw input to the previous layer d – 1. This is the so-called skip connection, or highway connection; the name comes from the signal skipping past a stage of processing. The term “residual” comes from the idea that the network is explicitly learning the difference between the processed signal and the input signal (i.e. the residual signal).
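The change from the plain stack above is a single addition. A minimal sketch of one residual block, under the same illustrative assumptions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A residual block: output = f(x) + x, where + x is the skip connection."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The block only has to learn the residual f(x) = output - x;
        # the unprocessed input x rides along the skip connection.
        return self.f(x) + x
```

Stacking such blocks means the original input is never more than a chain of additions away from any depth.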


Effects on training

There are practical motivations behind ResNets, namely mitigation of vanishing gradients and deep propagation of the input signal. But there is a fascinating philosophical side note: the network at a given depth d > 1 is effectively afforded some kind of implicit awareness of its own predictive tendencies, cast against an increasingly distant, foggy memory of the very first input.
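The vanishing-gradient point is easy to probe empirically. Here is a hedged sketch that reuses the illustrative PlainNet and ResidualBlock classes from above; the depth, width, and loss function are arbitrary choices for demonstration:

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(model: nn.Module, dim: int = 64) -> float:
    """Run one forward/backward pass and return the gradient norm
    at the model's earliest parameter."""
    x = torch.randn(16, dim)
    loss = model(x).pow(2).mean()  # arbitrary scalar loss for illustration
    loss.backward()
    first_param = next(model.parameters())
    return first_param.grad.norm().item()

deep_plain = PlainNet(dim=64, depth=50)
deep_residual = nn.Sequential(*[ResidualBlock(dim=64) for _ in range(50)])

# At depth 50, the plain network's earliest gradients are typically
# orders of magnitude smaller than the residual network's.
print("plain:   ", first_layer_grad_norm(deep_plain))
print("residual:", first_layer_grad_norm(deep_residual))
```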


Back to consciousness

Residual learning has an astonishing smoothing effect on the loss landscapes of very deep networks. Successfully optimizing a non-residual architecture involves wild fluctuations in loss over the course of training; the analogy to a blacked-out person is apt, as success is somehow incidental, with no clear logical decision-making path from start to finish.


Conversely, networks with residual connections exhibit something like a clear, legible connection between inputs and outputs—perhaps akin to conscious decision making supported by short-term memory.


Recurrent neural networks (RNNs) are often described as being useful for time-dependent—or “horizontal”—data. Conversely, deep networks, i.e. long series of stacked layers, are thought of as vertically oriented.


If we take a deep ResNet and imagine turning it sideways to lie horizontally for the processing of a single input-output pair, we can understand the predictive process itself as a time-varying operation, where each point in time is one step in the network’s decision making. In this process, each point in time will loosely resemble previous points, all the way back to the original input.
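Spelled out in code, the sideways reading is just the residual forward pass written as a loop over time steps. A sketch, again assuming the illustrative ResidualBlock from earlier:

```python
import torch
import torch.nn.functional as F

def unrolled_forward(blocks, x: torch.Tensor) -> torch.Tensor:
    """Treat depth as time: a running state is updated step by step,
    always retaining an additive trace of the original input x."""
    state = x
    for block in blocks:
        state = block(state)  # state_t = state_{t-1} + f_t(state_{t-1})
    return state

blocks = [ResidualBlock(dim=64) for _ in range(20)]
x = torch.randn(1, 64)
out = unrolled_forward(blocks, x)

# Because each step adds to, rather than replaces, the running state,
# the final state typically remains measurably aligned with the input.
print(F.cosine_similarity(out, x).item())
```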


In fact, ResNets are not all that different from a type of RNN called the LSTM (Long Short-Term Memory). In an LSTM, each cell is given the option to carry its previous state forward unmodified, much in the way a ResNet block’s output incorporates its unadulterated input.
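The kinship shows up directly in the update rules. A small comparative sketch; the LSTM equation below is the standard cell-state update, with gate names following common convention rather than anything in the original text:

```python
import torch

def residual_update(h_prev: torch.Tensor, f_out: torch.Tensor) -> torch.Tensor:
    # ResNet block: h_d = h_{d-1} + f(h_{d-1}).
    # The previous state is carried forward unchanged, plus a learned update.
    return h_prev + f_out

def lstm_cell_update(c_prev: torch.Tensor, forget_gate: torch.Tensor,
                     input_gate: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
    # LSTM cell state: c_t = forget_gate * c_{t-1} + input_gate * candidate.
    # The previous state is carried forward through a gate, plus a gated update;
    # when the forget gate saturates at 1, this reduces to the residual form.
    return forget_gate * c_prev + input_gate * candidate
```

In both cases the defining move is additive carryover of the past rather than wholesale replacement of it.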


Deep learning techniques are often named in a way that is clearly inspired by the human brain—the term neural network itself, attention, memory—but sometimes appropriate terminology isn’t immediately clear. In these cases we resort to more prosaic, technically descriptive coinages: convolution, recurrence, residuals, skip connections, and so on.


The origins of consciousness are unclear, its presence difficult to ascertain. While this uncertainty remains, it's tempting to consider any evidence relating our digital creations to our organic selves.


One day, perhaps sooner than we expect, we will stumble onto an algorithm so clearly awake that it can’t be mistaken for anything else. Until then, we'll continue to incorporate the latest human intelligence-inspired techniques into complex systems like AiME.
