From: Mild Shock <janburse@fastmail.fm>
Newsgroups: comp.lang.prolog
Subject: I didn't invent these things (Was: Will a decoder-only transformer also work?)
Date: Sun, 2 Mar 2025 22:39:27 +0100
Message-ID: <vq2j6f$v95h$1@solani.org>
References: <vpis5p$n6g2$1@solani.org> <vq0gvr$tuur$1@solani.org>
In-Reply-To: <vq0gvr$tuur$1@solani.org>

Thank you for thinking that I would have invented these things:

> Are you thinking that autoencoders
> could play a bigger role in tasks like
> language modeling

Nope, it is all in the papers, for example here:

> **Attention Is All You Need**
> Vaswani et al., 2017
> https://arxiv.org/abs/1706.03762

The conclusion says it is the same architecture as autoencoders:

> In this work, we presented the Transformer,
> the first sequence transduction model based
> entirely on attention, replacing the recurrent
> layers most commonly used in encoder-decoder
> architectures with multi-headed self-attention.

Same architecture, with a latent space between encoder and
decoder (see the sketch at the end of this post).

Training on my laptop GPU, the EN-DE ConvS2S Ensemble model
reported in Table 2 of the paper would take roughly:

    7.7e19 FLOPs / 3e13 FLOPs/s ≈ 2.6e6 s ≈ 1 month

If I tried to train GPT 4.5 on my laptop, at an assumed
1e23 FLOPs of training compute, it would take roughly:

    1e23 FLOPs / 3e13 FLOPs/s ≈ 3.3e9 s ≈ 100 years

(a back-of-envelope sketch of both estimates also follows at
the end of this post).

P.S.: The paper is from the same Vaswani et al., 2017 as
referenced in the Python code of the other Grokking paper.

Mild Shock schrieb:
>
> Ok, my bad. You can of course also try a decoder-only.
> Just like here in this Python code example:
>
> > **Simple PyTorch Implementation of “Grokking”**
> > We trained a standard decoder-only transformer (Vaswani et al., 2017)
> > https://github.com/teddykoker/grokking
>
> The transformer need not necessarily have an encoder and
> a latent space. It can also be a decoder-only model.
>
> Mild Shock schrieb:
>>
>> A very simple challenge conceptually: develop the idea
>> of Centipawn towards TicTacToe and implement the
>> game based on learning / training a transformer, and
>>
>> then executing it. All written in Prolog itself! Optional
>> bonus exercise: make the execution NNUE style, i.e.
>> incremental evaluation of the transformer.
>>
>> Centipawn - Chess Wiki
>> https://chess.fandom.com/wiki/Centipawn
>>
>> NNUE - Chess Programming Wiki
>> https://www.chessprogramming.org/NNUE
>
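
The architecture sketch mentioned above: a minimal PyTorch
illustration (not taken from the paper or from the Grokking repo,
sizes are made up) showing that in the encoder-decoder Transformer
the encoder output ("memory") sits between the two halves, much like
an autoencoder's latent space, while a decoder-only model drops the
encoder and keeps only causally masked self-attention:

import torch
import torch.nn as nn

# Encoder-decoder Transformer (Vaswani et al., 2017): the encoder
# output ("memory") is the intermediate sequence representation handed
# to the decoder, playing the role of the latent space between halves.
d_model = 64                                     # illustrative width
enc_dec = nn.Transformer(d_model=d_model, nhead=4,
                         num_encoder_layers=2, num_decoder_layers=2,
                         batch_first=True)

src = torch.randn(8, 12, d_model)                # source embeddings
tgt = torch.randn(8, 10, d_model)                # target embeddings
memory = enc_dec.encoder(src)                    # "latent" representation
out = enc_dec.decoder(tgt, memory)               # decoder attends to memory

# Decoder-only variant (as in the Grokking repo): no encoder, no memory,
# just causally masked self-attention over a single token stream.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                   batch_first=True)
decoder_only = nn.TransformerEncoder(layer, num_layers=2)
causal_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
out2 = decoder_only(tgt, mask=causal_mask)

print(memory.shape, out.shape, out2.shape)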
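
And the back-of-envelope timing estimates written out. The only
inputs are the total training FLOPs (7.7e19 for the EN-DE ConvS2S
Ensemble from Table 2 of the paper; 1e23 is an assumed figure for
GPT 4.5, not a published number) and an assumed sustained laptop-GPU
throughput of 3e13 FLOPs/s:

SECONDS_PER_MONTH = 30 * 24 * 3600        # about 2.6e6 s
SECONDS_PER_YEAR = 365 * 24 * 3600        # about 3.15e7 s

def training_seconds(total_flops, flops_per_second=3e13):
    # time = total work / sustained throughput
    return total_flops / flops_per_second

# EN-DE ConvS2S Ensemble, Table 2 of Vaswani et al., 2017
print(training_seconds(7.7e19) / SECONDS_PER_MONTH)   # about 1 month

# Assumed 1e23 FLOPs for a GPT-4.5-scale model
print(training_seconds(1e23) / SECONDS_PER_YEAR)      # about 100 years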