From: Mild Shock
Newsgroups: sci.physics
Subject: Memory Powering the AI Revolution
Date: Thu, 16 Jan 2025 11:06:27 +0100

I currently believe that one of the fallacies around LLMs is the
assumption that the learning generates small, light NNs (Neural
Networks), which are then subject to blurred categories and
approximative judgments.

But I guess it's quite different: the learning generates very large,
massive NNs, which can afford to represent ontologies quite precisely
and with breadth. But how is it done?

One puzzle piece could be new types of memory, so-called
High-Bandwidth Memory (HBM), an architecture where DRAM dies are
vertically stacked and connected using Through-Silicon Vias (TSVs).
It is found, for example, in NVIDIA GPUs like the A100 and H100.
Compare this to the DDR3 that might be found in your laptop or PC.
Could it give you a license to trash the L1/L2 caches with your
algorithms?

                   HBM3                        DDR3
Bandwidth          1.2 TB/s (per stack)        12.8 GB/s to 25.6 GB/s
Latency            Low, optimized for          Higher latency
                   real-time tasks
Power Efficiency   More efficient despite      Higher power consumption
                   high speeds                 than HBM3
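
As a rough sketch of why that bandwidth gap matters for large NNs
(the 70B-parameter FP16 model below is just an assumed example, not a
measured figure), here is the back-of-envelope arithmetic in Python,
using the peak bandwidths from the table above:

# Time to stream a model's weights once from memory at peak bandwidth.
# The 70B-parameter / FP16 model size is an assumption for illustration.

PARAMS = 70e9                 # assumed parameter count
BYTES_PER_PARAM = 2           # FP16
weights_bytes = PARAMS * BYTES_PER_PARAM   # about 140 GB

HBM3_BW = 1.2e12              # 1.2 TB/s per stack (table above)
DDR3_BW = 25.6e9              # 25.6 GB/s (upper DDR3 figure)

for name, bw in [("HBM3 (one stack)", HBM3_BW), ("DDR3", DDR3_BW)]:
    seconds = weights_bytes / bw
    print(f"{name}: {seconds:.2f} s per full pass over the weights")

# Prints roughly 0.12 s for HBM3 vs 5.5 s for DDR3. Since dense
# autoregressive decoding touches essentially all weights per token,
# memory bandwidth, not arithmetic, bounds tokens/second at batch 1.

The streaming access pattern is also why the cache remark above is
not absurd: a single pass over weights that are far larger than the
L1/L2 caches gets no reuse out of them anyway.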