From: Mild Shock
Newsgroups: comp.lang.prolog
Subject: Most radical approach is Novacore from Dogelog Player (Was: Unicode and atom length=1)
Date: Mon, 23 Jun 2025 17:03:38 +0200

Hi,

The most radical approach is Novacore from Dogelog Player.
It consists of the following major incisions in the ISO
core standard:

- We do not forbid chars, for example lists of the form
  [a,b,c]; we also provide the char_code/2 predicate, which
  works in both directions.

- We do not provide any _chars built-in predicates, and
  there is nothing for _strings either. The Prolog system is
  clever enough not to put every atom it sees into an atom
  table. There is only a predicate table.

- Some host languages have garbage collection that
  deduplicates strings. For example, some Java versions
  have an option to do that. But we make no effort
  to deduplicate atoms, which are simply plain strings.

- Some languages have constant pools. For example, the
  Java byte code format includes a constant pool in
  every class header. We do not do that during
  transpilation, but we could of course.
  But this raises the question: why deduplicate only strings
  and not other constant expressions as well?

- We are totally happy to have only codes; chances are
  that the host languages use tagged pointers to
  represent them. So they are represented similarly to the
  tagged pointers in SWI-Prolog, which work for small integers.

- But the tagged pointer argument is moot, since atoms of
  length=1 can also be represented as tagged pointers, and
  some programming languages do that. Dogelog Player would
  use such tagged pointers without polluting the atom table.

- What else?

Bye

Mild Shock wrote:
>
> Technically SWI-Prolog doesn't prefer codes.
> Library `library(pure_input)` might prefer codes.
> But this is again an issue of improving the
> library by some nonexistent SWI-Prolog community.
>
> The ISO core standard is silent about a flag
> back_quotes, but has a lot of API requirements
> that support both codes and chars, for example it
> requires atom_codes/2 and atom_chars/2.
>
> Implementation-wise there can be an issue:
> one might decide to implement the atoms
> of length=1 more efficiently, since with Unicode
> there is now an explosion of them.
>
> Not sure whether Trealla Prolog and Scryer
> Prolog thought about this problem, that the
> atom table gets quite large, whereas codes don't
> eat the atom table. Maybe they forbid predicates
> that have an atom of length=1 as head functor:
>
> h(X) :-
>     write('Hello '), write(X), write('!'), nl.
>
> Does this still work?
>
> Mild Shock wrote:
>> Concerning library(portray_text) which is in limbo:
>>
>>  > Libraries are (often) written for either
>> and thus the libraries make the choice.
>>
>> But who writes these libraries? The SWI-Prolog
>> community. And who doesn’t improve these libraries,
>> and instead floods the web with workaround tips?
>> The SWI-Prolog community.
>>
>> Conclusion: the SWI-Prolog community has trapped
>> itself in an ancient status quo, creating an island.
>> It cannot improve its own tooling and is not willing
>> to support code from elsewhere that uses chars.
>>
>> Same with the missed AI boom.
>>
>> (*) Code from elsewhere is dangerous. People
>> might use other Prolog systems than only SWI-Prolog,
>> like for example Trealla Prolog and Scryer Prolog.
>>
>> (**) Keeping the status quo is comfy. No need to
>> think in terms of program code. It's like biology
>> teachers versus pathology staff: biology teachers
>> do not see opened corpses every day.
>>
>>
>> Mild Shock wrote:
>>>
>>> Inductive logic programming at 30
>>> https://arxiv.org/abs/2102.10556
>>>
>>> The paper contains not a single reference to autoencoders!
>>> Still they show this example:
>>>
>>> Fig. 1 ILP systems struggle with structured examples that
>>> exhibit observational noise. All three examples clearly
>>> spell the word "ILP", with some alterations: 3 noisy pixels,
>>> shifted and elongated letters. If we would be to learn a
>>> program that simply draws "ILP" in the middle of the picture,
>>> without noisy pixels and elongated letters, that would
>>> be a correct program.
>>>
>>> I guess ILP is 30 years behind the AI boom. An early autoencoder
>>> turned into transformer was already reported here (*):
>>>
>>> SERIAL ORDER, Michael I. Jordan - May 1986
>>> https://cseweb.ucsd.edu/~gary/PAPER-SUGGESTIONS/Jordan-TR-8604-OCRed.pdf
>>>
>>> Well, ILP might have its merits. Maybe we should not ask
>>> for a marriage of LLM and Prolog, but of autoencoders and ILP.
>>> But it's tricky, I am still trying to decode the da Vinci code of
>>> things like stacked tensors: are they related to k-literal clauses?
>>>
>>> The paper I referenced is found in this excellent video:
>>>
>>> The Making of ChatGPT (35 Year History)
>>> https://www.youtube.com/watch?v=OFS90-FX6pg
>>>
>>
>
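
To illustrate the chars-versus-codes point: since char_code/2 works in both directions, a _chars style conversion can be rebuilt in user code from codes alone. What follows is only a sketch. The names chars_codes/2 and atom_to_chars/2 are made up for illustration and are not built-ins of Dogelog Player or of the ISO core standard; the sketch assumes nothing beyond the ISO built-ins char_code/2 and atom_codes/2.

```prolog
% chars_codes(?Chars, ?Codes)
% Hypothetical helper: relates a list of chars to a list of codes.
% At least one of the two lists must be instantiated, because
% char_code/2 needs one of its arguments bound.
chars_codes([], []).
chars_codes([Ch|Chs], [Co|Cos]) :-
    char_code(Ch, Co),
    chars_codes(Chs, Cos).

% atom_to_chars(+Atom, -Chars)
% Hypothetical helper: emulates the atom-to-list direction of
% atom_chars/2 by going through atom_codes/2 and chars_codes/2.
atom_to_chars(Atom, Chars) :-
    atom_codes(Atom, Codes),
    chars_codes(Chars, Codes).
```

On a system with conforming char_code/2 and atom_codes/2, the query ?- atom_to_chars(abc, L). gives L = [a,b,c]. Whether the resulting one-character atoms end up in an atom table or in tagged pointers is up to the system and is not visible at this level.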