Hardware-Efficient Intelligence

COHESA (Computing Hardware for Emerging Intelligent Sensory Applications) is a multi-million-dollar strategic partnership between 19 leading research groups across Canada and technology companies with a long-term interest in machine learning. The promise of machine learning and data analytics motivates the COHESA research network to pursue advanced research in artificial intelligence.

In recent years, artificial intelligence (AI) has found ubiquitous application across consumer and industrial sectors, ranging from autonomous vehicles and speech recognition to pharmaceutical research. The deployment of state-of-the-art AI algorithms, however, is often bottlenecked by the high computational cost and energy consumption they incur on existing hardware platforms. These drawbacks primarily arise from the inference and training of the unrefined graphical models that modern AI algorithms rely upon.

Some of the most influential graphical models take the form of deep neural networks (DNNs). DNNs form the backbone of many learning frameworks used in fields such as computer vision, natural language processing, and machine translation. Computational acceleration of DNNs through architectural specialization of integrated circuits is seen as a promising path toward tackling these critical bottlenecks faced by the AI community.

The Integrated Systems Laboratory (ISL), through its research collaboration with COHESA, seeks to develop specialized integrated circuits for machine learning computation from a software-hardware co-design perspective. The co-design paradigm stems from the lack of attention paid to refining algorithms before tailoring hardware architectures to them. We propose to refine neural networks from a mathematical and computer-science perspective using a generalized multi-phase methodology, as shown in the diagram below:

The crux of DNN acceleration research is to reduce both the arithmetic complexity and the memory footprint of neural networks in order to design efficient integrated circuits custom to their applications. In machine learning, arithmetic complexity is typically tied to two components: the multiplication budget of fully-connected and convolutional layers, and the weight efficacy of the network. Weight efficacy refers to the effectiveness of the weights in classifier prediction. To improve these metrics, a variety of methods have been proposed for removing weights that contribute little to overall prediction accuracy, of which magnitude-based pruning is seen as the most suitable. In the final phase of co-design, quantization is applied to obtain hardware-friendly data representation formats for the learned weights.
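The two techniques named above can be illustrated on a single weight matrix. The sketch below is a minimal, generic example, not ISL's or COHESA's actual pipeline: it assumes magnitude-based pruning at a 50% sparsity target, followed by uniform per-tensor quantization of the surviving weights to signed 8-bit integers (one common hardware-friendly format among several).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrix of one fully-connected layer.
W = rng.normal(size=(8, 8)).astype(np.float32)

# Magnitude-based pruning: zero out the weights with the smallest
# absolute value, on the assumption that they contribute least to
# the prediction. Here 50% of the weights are pruned.
sparsity = 0.5
threshold = np.quantile(np.abs(W), sparsity)
mask = np.abs(W) >= threshold
W_pruned = W * mask

# Uniform quantization of the surviving weights to int8 with a
# per-tensor scale (an illustrative scheme, not the project's own).
scale = np.abs(W_pruned).max() / 127.0
W_q = np.round(W_pruned / scale).astype(np.int8)

# At inference time the hardware works with W_q and recovers an
# approximation of the weights as W_q * scale.
print("fraction pruned:", 1.0 - mask.mean())
print("max abs quantization error:",
      np.abs(W_q * scale - W_pruned).max())
```

The pruned matrix reduces both the multiplication budget (multiplications by zero can be skipped) and the memory footprint, while the int8 representation shrinks each stored weight from 32 bits to 8 plus a single shared scale factor.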