Self-attention can be big for TinyML applications – TechTalks


This article is part of our coverage of the latest in AI research.

Advances in computing technology have been a boon to deep learning research. Almost limitless cloud computing resources, GPU clusters, and specialized AI hardware have enabled researchers to design, train, test, and tune deep neural networks at speeds previously impossible. This speed and efficiency, combined with the ingenuity of a growing number of scientists, have helped usher in some of the most successful trends in deep learning over the past decade.

At the same time, however, there is growing interest in bringing the latest deep learning innovations to TinyML, the branch of machine learning that focuses on resource-constrained environments that are disconnected from the cloud.

One of these innovations is the self-attention mechanism, which has become one of the key components in many large-scale deep learning architectures.

In a new paper, researchers from the University of Waterloo and DarwinAI present a deep learning architecture that brings highly efficient self-attention to TinyML. Dubbed the “Double-Condensing Attention Condenser,” the architecture builds on the team’s previous work and shows promise for edge AI applications.

Self-Attention in TinyML

The architecture of the attention condenser

Classic deep neural networks are designed to process one piece of information at a time. However, in many applications, the neural network must process and consider the relationships between a sequence of input data before making a prediction. The most obvious use case for this is natural language processing, where the meaning of a word depends on what comes before or after it. Self-attention is also used in other types of applications, including computer vision and speech recognition.

Self-attention is one of the most efficient and successful mechanisms for modeling relationships in sequential data. It is used in Transformers, the deep learning architecture behind major language models like GPT-3 and OPT-175B. But it can also be very useful for TinyML applications.


“With the increasing demand for TinyML to power a tremendous range of real-world applications such as Manufacturing/Industry 4.0, my teams at DarwinAI and the Vision and Image Processing Research Group have continuously strived to build smaller and faster deep neural network architectures for the edge to power such applications,” Alexander Wong, Professor at the University of Waterloo and Chief Scientist at DarwinAI, told TechTalks. “Given the exponential increase in the use of self-attention in deep learning, we were curious if we could take the idea of self-attention and implement it in a way that increases both efficiency and performance in the realm of TinyML on the edge, while reducing the typical trade-off between greater attention capability and increased complexity.”

In 2020, Wong and his colleagues introduced “attention condensers,” a mechanism that enables highly efficient self-attention in neural networks on edge devices.

“The main idea behind the original attention condensers was to create a new type of self-attention mechanism that co-models local and cross-channel activation relationships within a unified condensed embedding, thereby performing selective attention,” Wong said.

The attention condenser takes multiple feature channels (V) and compresses them (through C) into a single embedding layer (E) that represents the local and cross-channel features. This is a dimensionality reduction technique that forces the neural network to learn the most relevant features of its input space. The attention condenser block then decompresses the embedding (through X) to reproduce the input with the self-attention information embedded therein (A). The network can then combine the original features and the self-attention information for downstream layers.
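To make the flow above concrete, here is a minimal NumPy sketch of the condense-embed-expand-gate pipeline (V condensed through C into an embedding E, expanded through X into attention values A). The specific operations — average pooling for condensation, a 1×1-style channel mix for the embedding, nearest-neighbor upsampling for expansion, and a sigmoid gate for selective attention — are illustrative assumptions for this sketch, not the exact operations from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_condenser(V, W_embed, W_expand, pool=2):
    """Toy attention condenser sketch (shapes and ops are illustrative).
    V: input activations, shape (channels, H, W)."""
    C_, H, W = V.shape
    # Condense (C): spatial average pooling reduces dimensionality
    Vc = V.reshape(C_, H // pool, pool, W // pool, pool).mean(axis=(2, 4))
    # Condensed embedding (E): jointly mix channels into K < C_ dimensions
    E = np.einsum('kc,chw->khw', W_embed, Vc)
    # Expand (X): project back to channel space and upsample spatially
    A_small = np.einsum('ck,khw->chw', W_expand, E)
    A = A_small.repeat(pool, axis=1).repeat(pool, axis=2)
    # Selective attention (A): gate the original features
    return V * sigmoid(A)

C_, H, W, K = 8, 4, 4, 3          # K < C_ is the condensed channel dimension
V = rng.standard_normal((C_, H, W))
W_embed = rng.standard_normal((K, C_)) * 0.1
W_expand = rng.standard_normal((C_, K)) * 0.1
out = attention_condenser(V, W_embed, W_expand)
print(out.shape)  # (8, 4, 4) — same shape as the input
```

Note that the output has the same shape as the input, so the block can be dropped between layers; the savings come from the attention itself being computed in the small condensed space.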

“This condensation of joint local and channel attention allows for a very rich yet efficient understanding of what to focus on within the input data, allowing for much easier architectural design,” Wong said.


In the publication, the researchers also presented TinySpeech, a neural network that uses attention condensers for speech recognition. Attention condensers have shown great success in manufacturing, automotive, and healthcare applications, according to Wong. However, the original attention condensers had certain limitations.

“A shortcoming of the original attention condenser is that there was an unnecessary asymmetry between the feature branch and the attention condensation, with the feature embedding being more complex than it really needed to be,” Wong said.

Double-Condensing Attention Condensers

The double-condensing attention condenser (DC-AC)

In their new paper, Wong and his colleagues introduce a self-attention mechanism called the Double-Condensing Attention Condenser (DC-AC). This new architecture addresses the asymmetry that the original attention condenser suffered from by using attention condensers in both computational branches. The added condensation allows the neural network to learn even more representative feature embeddings.
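As a rough schematic of the “double condensing” idea — condensing the condensed embedding a second time before expanding back — here is a toy NumPy sketch. The paper's actual DC-AC block has learned operations in both branches; the parameter-free pooling, nearest-neighbor expansion, and sigmoid gate here are stand-ins chosen only to keep the sketch short and runnable.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def condense(X, pool=2):
    """Spatial condensation via average pooling (stand-in for a learned step)."""
    C, H, W = X.shape
    return X.reshape(C, H // pool, pool, W // pool, pool).mean(axis=(2, 4))

def expand(X, factor=2):
    """Nearest-neighbor expansion back toward the input resolution."""
    return X.repeat(factor, axis=1).repeat(factor, axis=2)

def dc_ac(V):
    """Schematic double-condensing attention condenser:
    condense twice, expand twice, then gate the original features."""
    E1 = condense(V)           # first condensation
    E2 = condense(E1)          # second condensation of the embedding itself
    A = expand(expand(E2))     # expand back to the input resolution
    return V * sigmoid(A)      # selective attention on the original features

rng = np.random.default_rng(1)
V = rng.standard_normal((8, 8, 8))
out = dc_ac(V)
print(out.shape)  # (8, 8, 8)
```

The second condensation is where the complexity savings come from: the attention is computed on an embedding that is a further 4x smaller spatially than in the single-condensing version.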

“By introducing this double-condensing attention condenser mechanism, we were able to significantly reduce complexity while maintaining strong representational performance, which means we can create even smaller network designs with high fidelity given the better balance,” Wong said.

The researchers designed a neural network architecture that uses DC-AC blocks for self-attention. The architecture, called AttendNeXt, consists of four computational branches, each built from a series of blocks combining DC-AC blocks, anti-aliased downsampling layers, and convolutional layers.

The AttendNeXt deep learning model

The network’s columnar architecture allows different branches to learn disentangled embeddings in the early layers. In the deeper layers of the network, the columns merge and the channel count gradually increases, allowing the self-attention blocks to cover larger areas of the original input.
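The columnar layout described above — independent early branches that later merge into a shared, wider stage — can be sketched schematically. Everything inside each column is a placeholder here (a toy nonlinearity standing in for the conv and DC-AC blocks); the sketch only illustrates the parallel-then-merge wiring and the channel growth at the merge point, not the real layers.

```python
import numpy as np

def column(x, n_blocks=2):
    """Stand-in for one early column of conv + DC-AC blocks.
    A toy elementwise nonlinearity keeps the sketch runnable."""
    for _ in range(n_blocks):
        x = np.tanh(x)
    return x

def attendnext_skeleton(x, n_columns=4):
    """Schematic columnar wiring: parallel early branches learn
    disentangled embeddings, then merge for the deeper shared stage."""
    early = [column(x) for _ in range(n_columns)]   # independent branches
    merged = np.concatenate(early, axis=0)          # columns merge; channels grow
    return np.tanh(merged)                          # deeper shared stage

x = np.ones((4, 8, 8))            # (channels, H, W) toy input
out = attendnext_skeleton(x)
print(out.shape)  # (16, 8, 8) — 4 columns of 4 channels merged
```

The merge is what lets later self-attention blocks see a wider channel space than any single early column.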

Instead of building AttendNeXt manually, the researchers used neural architecture search to explore different network configurations.

“The machine-driven design exploration algorithm is generative synthesis, a generative strategy we introduced for the powerful search of neural architectures under different operational constraints, based on the interplay between a generator trying to generate better network architectures and an inquisitor that studies these generated architectures,” Wong said.


Four constraints were imposed on the search algorithm: columnar architectures, point-wise progressive convolutions (to prevent information loss in residual blocks), use of anti-aliased downsampling, and performance comparable to current state-of-the-art TinyML architectures.

“The four constraints are what we call ‘best practices’ design constraints that drive the exploration process based on what has previously been found in the literature to improve performance and robustness, as well as the level of accuracy we want to achieve,” Wong said.

Improved performance

The researchers’ findings indicate that AttendNeXt “possesses a strong balance between accuracy, architectural complexity, and computational complexity, making such an architecture well-suited for TinyML applications at the edge.”

In particular, AttendNeXt’s throughput shows a significant improvement over other TinyML architectures, which can be crucial for applications requiring high-frequency and real-time inference.

AttendNeXt throughput performance
AttendNeXt outperforms other TinyML architectures in inference throughput

“AttendNeXt can be applied to a wide range of applications, with edge visual perception tasks being particularly well-suited (e.g., high-throughput visual manufacturing inspection, autonomous vehicles, machine vision-driven manufacturing robotics, low-cost medical imaging devices, smartphone apps, etc.),” Wong said.

The researchers hope that exploring different efficient architectural designs and self-attention mechanisms can lead to “interesting new building blocks for TinyML applications.”

“We aim to investigate how to leverage this new self-attention mechanism within the machine-driven design exploration paradigm to generate optimal network architectures for various visual inspection tasks in manufacturing, including classification, object recognition, semantic/instance segmentation, etc., running on embedded devices, as well as exploring its effectiveness on non-visual modalities (e.g., auditory),” Wong said. “And as usual, we continue to explore new attention condenser designs in search of a great balance between efficiency and accuracy.”
