From: Fereydoun Memarzanjany
Newsgroups: comp.lang.vhdl,comp.arch.fpga,comp.arch.embedded
Subject: Re: Innervator: Hardware Acceleration for Neural Networks
Date: Tue, 6 Aug 2024 23:02:10 -0600

Pasted below is an overview/abstract; you will find more information (including a paper, a demo video, statistics, slides, and the source code) at the following GitHub repository: https://github.com/Thraetaona/Innervator

------------------------------------------------------------------------

Artificial intelligence ("AI") is deployed in various applications, from noise cancellation to image recognition, but AI-based products often come with high hardware and electricity costs, which makes them impractical for consumer devices and small-scale edge electronics.

Inspired by biological brains, deep neural networks ("DNNs") are modeled using mathematical formulae, yet general-purpose processors treat these otherwise-parallelizable AI algorithms as step-by-step sequential logic.  In contrast, programmable logic devices ("PLDs") can be customized to the specific parameters of a trained DNN, thereby providing data-tailored computation and algorithmic parallelism at the register-transfer level.  Furthermore, one subgroup of PLDs, field-programmable gate arrays ("FPGAs"), is dynamically reconfigurable.

So, to improve AI runtime performance, I designed and open-sourced my hardware compiler: Innervator.  Written entirely in VHDL-2008, Innervator takes any DNN's metadata and parameters (e.g., the number of layers, the neurons per layer, and their weights/biases) and generates a synthesizable FPGA hardware description with the appropriate pipelining and batch processing (see the P.S. below for a simplified sketch of such a generated neuron).  Innervator is entirely portable and vendor-independent.

As a proof of concept, I used Innervator to implement a sample 8x8-pixel handwritten-digit-recognizing neural network on a low-cost AMD Xilinx Artix-7(TM) FPGA @ 100 MHz.  With 3 pipeline stages and 2 batches, at about 67% LUT utilization, the network achieved ~7.12 GOP/s, predicting the output in 630 ns while drawing under 0.25 W of power.  In comparison, an Intel(R) Core(TM) i7-12700H CPU @ 4.70 GHz would take 40,000-60,000 ns at 45 to 115 W: by those figures, the FPGA is roughly 60-95x faster at a small fraction of the power.

Ultimately, Innervator's hardware-accelerated approach bridges the inherent mismatch between current AI algorithms and the general-purpose digital hardware they run on.

------------------------------------------------------------------------

(Forgot to cross-post to c.a.fpga and c.a.embedded; adding them now.)
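
P.S.  To give a rough flavor of the kind of hardware description such a tool produces, below is a minimal VHDL-2008 sketch of a single fixed-point neuron whose weights and bias are baked in as generics at elaboration time.  This is NOT Innervator's actual code (see the repository for that); the entity name, port names, and Q4.4 fixed-point format are my own illustrative assumptions here, and pipelining, rounding, and saturation are all omitted for brevity.

    -- neuron_sketch.vhd: hypothetical, simplified stand-in for a
    -- generated neuron; NOT taken from the Innervator repository.
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    package neuron_pkg is
        constant WORD_WIDTH : positive := 8;  -- Q4.4 signed fixed-point
        constant FRAC_BITS  : natural  := 4;
        subtype word_t is signed(WORD_WIDTH - 1 downto 0);
        type word_vector_t is array (natural range <>) of word_t;
    end package neuron_pkg;

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;
    use work.neuron_pkg.all;

    entity neuron is
        generic (
            NUM_INPUTS : positive;
            WEIGHTS    : word_vector_t;  -- one trained weight per input
            BIAS       : word_t
        );
        port (
            clk    : in  std_logic;
            inputs : in  word_vector_t(0 to NUM_INPUTS - 1);
            output : out word_t
        );
    end entity neuron;

    architecture rtl of neuron is
        -- Accumulator wide enough to sum NUM_INPUTS Q8.8 products.
        constant ACC_WIDTH : positive := 2*WORD_WIDTH + 8;
    begin
        process (clk) is
            variable acc : signed(ACC_WIDTH - 1 downto 0);
        begin
            if rising_edge(clk) then
                -- Align the Q4.4 bias with the Q8.8 products.
                acc := shift_left(resize(BIAS, ACC_WIDTH), FRAC_BITS);
                for i in 0 to NUM_INPUTS - 1 loop
                    -- Each iteration unrolls into a parallel multiplier;
                    -- a real design would pipeline this adder tree.
                    acc := acc + resize(inputs(i) * WEIGHTS(i), ACC_WIDTH);
                end loop;
                -- ReLU activation; truncate back to Q4.4 (no rounding
                -- or saturation, for brevity).
                if acc < 0 then
                    output <= (others => '0');
                else
                    output <= resize(shift_right(acc, FRAC_BITS),
                                     WORD_WIDTH);
                end if;
            end if;
        end process;
    end architecture rtl;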
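
A generator in Innervator's spirit would then emit one such instantiation per neuron, with the trained parameters as elaboration-time constants.  Again, a hypothetical sketch: TRAINED_WEIGHTS_L1_N0 and TRAINED_BIAS_L1_N0 below are zero-filled placeholders standing in for whatever constants the real tool derives from the network's parameter files.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;
    use work.neuron_pkg.all;

    entity digit_layer1_demo is
        port (
            clk          : in  std_logic;
            pixel_inputs : in  word_vector_t(0 to 63);  -- 8x8 pixels
            neuron_out   : out word_t
        );
    end entity digit_layer1_demo;

    architecture rtl of digit_layer1_demo is
        -- Stand-ins for generator-emitted trained parameters.
        constant TRAINED_WEIGHTS_L1_N0 : word_vector_t(0 to 63) :=
            (others => (others => '0'));
        constant TRAINED_BIAS_L1_N0    : word_t := (others => '0');
    begin
        -- One first-layer neuron of the 64-input digit recognizer.
        neuron_l1_n0 : entity work.neuron
            generic map (
                NUM_INPUTS => 64,
                WEIGHTS    => TRAINED_WEIGHTS_L1_N0,
                BIAS       => TRAINED_BIAS_L1_N0
            )
            port map (
                clk    => clk,
                inputs => pixel_inputs,
                output => neuron_out
            );
    end architecture rtl;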