Path: nntp.eternal-september.org!news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!weretis.net!feeder8.news.weretis.net!reader5.news.weretis.net!news.solani.org!.POSTED!not-for-mail
From: Mild Shock
Newsgroups: comp.lang.prolog
Subject: Do Prologers know the Unicode Range? (Was: Most radical approach is Novacore from Dogelog Player)
Date: Fri, 27 Jun 2025 13:21:06 +0200
Message-ID: <103luqv$1cbpu$1@solani.org>
References: <103bos1$164mt$1@solani.org> <103bpdh$164t1$1@solani.org> <103bqc8$165f2$1@solani.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 27 Jun 2025 11:21:03 -0000 (UTC)
Injection-Info: solani.org; logging-data="1453886"; mail-complaints-to="abuse@news.solani.org"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.21
Cancel-Lock: sha1:icQDln4wfWSfHfdGomxZhjR5Nkg=
X-User-ID: eJwNy8EBwCAIA8CVQEmEcQRk/xHa+x82lXWMoGEw7F6eEBXWzcdTzzKX2HhgOd67yr2atke0C3FF3f4TDo3+AFK2FQA=
In-Reply-To: <103bqc8$165f2$1@solani.org>

The official replacement character is 0xFFFD:

> Replacement Character
> https://www.compart.com/de/unicode/U+FFFD

Well, that is what people did in the past: replace non-printables
by the ever same code, instead of using ‘\uXXXX’ notation.

I have studied library(portray_text) extensively, and my conclusion
is still that it is extremely ancient. For example I find:

mostly_codes([H|T], Yes, No, MinFactor) :-
    integer(H),
    H >= 0,
    H =< 0x1ffff,
    [...]
    ;   catch(code_type(H, print), error(_,_), fail),
    [...]

https://github.com/SWI-Prolog/swipl-devel/blob/eddbde61be09b95eb3ca2e160e73c2340744a3d2/library/portray_text.pl#L235

Why 0x1ffff and not 0x10ffff? This is a bug. Do you want to starve
is_text_code/1? The official Unicode range is 0x0 to 0x10ffff.
Ulrich Neumerkel often confused the range in some of his code
snippets, maybe based on a limited interpretation of Unicode.

But if one switched to chars, one could easily support any Unicode
code point even without knowing the range. Just do this (two small
sketches follow at the very end of this post):

mostly_chars([H|T], Yes, No, MinFactor) :-
    atom(H),
    atom_length(H, 1),
    [...]
    ;   /* printable check not needed */
    [...]

Mild Shock wrote:
> Hi,
>
> The most radical approach is Novacore from
> Dogelog Player. It consists of the following
> major incisions in the ISO core standard:
>
> - We do not forbid chars, like for example
>   using lists of the form [a,b,c]; we also
>   provide the char_code/2 predicate bidirectionally.
>
> - We do not provide any _chars built-in
>   predicates, and there is also nothing _strings.
>   The Prolog system is clever enough not to put
>   every atom it sees into an atom table. There
>   is only a predicate table.
>
> - Some host languages have garbage collection that
>   deduplicates strings. For example some Java
>   versions have an option to do that. But we do
>   not make any effort to deduplicate atoms,
>   which are simply plain strings.
>
> - Some languages have constant pools. For example
>   the Java byte code format includes a constant
>   pool in every class header. We do not do that
>   during transpilation, but we could of course.
>   It begs the question: why only deduplicate
>   strings and not other constant expressions as well?
>
> - We are totally happy that we have only codes;
>   there are chances that the host languages use
>   tagged pointers to represent them. So they are
>   represented similarly to the tagged pointers
>   in SWI-Prolog, which work for small integers.
>
> - But the tagged pointer argument is moot,
>   since atom length=1 entities can also be
>   represented as tagged pointers, and some
>   programming languages do that. Dogelog Player
>   would use such tagged pointers without
>   polluting the atom table.
>
> - What else?
>
> Bye
>
> Mild Shock wrote:
>>
>> Technically SWI-Prolog doesn't prefer codes.
>> Library `library(pure_input)` might prefer codes.
>> But this is again an issue of improving the
>> library by some non-existent SWI-Prolog community.
>>
>> The ISO core standard is silent about a flag
>> back_quotes, but has a lot of API requirements
>> that support both codes and chars; for example it
>> requires atom_codes/2 and atom_chars/2.
>>
>> Implementation-wise there can be an issue:
>> one might decide to implement the atoms
>> of length=1 more efficiently, since with Unicode
>> there is now an explosion.
>>
>> Not sure whether Trealla Prolog and Scryer
>> Prolog thought about this problem, that the
>> atom table gets quite large, whereas codes don't
>> eat the atom table. Maybe they forbid predicates
>> that have an atom of length=1 in the head:
>>
>> h(X) :-
>>      write('Hello '), write(X), write('!'), nl.
>>
>> Does this still work?
>>
>> Mild Shock wrote:
>>> Concerning library(portray_text), which is in limbo:
>>>
>>>  > Libraries are (often) written for either
>>> and thus the libraries make the choice.
>>>
>>> But who writes these libraries? The SWI-Prolog
>>> community. And who doesn't improve these libraries,
>>> but instead floods the web with workaround tips?
>>> The SWI-Prolog community.
>>>
>>> Conclusion: the SWI-Prolog community has trapped
>>> itself in an ancient status quo, creating an island.
>>> It cannot improve its own tooling and is not willing
>>> to support code from elsewhere that uses chars.
>>>
>>> Same with the missed AI boom.
>>>
>>> (*) Code from elsewhere is dangerous: people
>>> might use other Prolog systems than only SWI-Prolog,
>>> like for example Trealla Prolog and Scryer Prolog.
>>>
>>> (**) Keeping the status quo is comfy. No need to
>>> think in terms of program code. It's like biology
>>> teachers versus pathology staff; biology teachers
>>> do not see opened corpses every day.
>>>
>>>
>>> Mild Shock wrote:
>>>>
>>>> Inductive logic programming at 30
>>>> https://arxiv.org/abs/2102.10556
>>>>
>>>> The paper contains not a single reference to autoencoders!
>>>> Still they show this example:
>>>>
>>>> Fig. 1: ILP systems struggle with structured examples that
>>>> exhibit observational noise. All three examples clearly
>>>> spell the word "ILP", with some alterations: 3 noisy pixels,
>>>> shifted and elongated letters. If we were to learn a
>>>> program that simply draws "ILP" in the middle of the picture,
>>>> without noisy pixels and elongated letters, that would
>>>> be a correct program.
>>>>
>>>> I guess ILP is 30 years behind the AI boom. An early autoencoder
>>>> turned into a transformer was already reported here (*):
>>>>
>>>> SERIAL ORDER, Michael I. Jordan - May 1986
>>>> https://cseweb.ucsd.edu/~gary/PAPER-SUGGESTIONS/Jordan-TR-8604-OCRed.pdf
>>>>
>>>> Well, ILP might have its merits; maybe we should not ask
>>>> for a marriage of LLM and Prolog, but of autoencoders and ILP.
>>>> But it's tricky. I am still trying to decode the da Vinci code of
>>>> things like stacked tensors: are they related to k-literal clauses?
>>>> The paper I referenced is found in this excellent video:
>>>>
>>>> The Making of ChatGPT (35 Year History)
>>>> https://www.youtube.com/watch?v=OFS90-FX6pg

========== REMAINDER OF ARTICLE TRUNCATED ==========
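
PS: here is the first sketch referenced above, a range check over the
full official Unicode range. This is only a sketch, not the code from
library(portray_text); the helper name is_unicode_code/1 is made up
here purely for illustration.

% Hypothetical helper, not part of library(portray_text):
% succeeds iff Code is an integer code point inside the
% official Unicode range 0x0 .. 0x10FFFF.
is_unicode_code(Code) :-
    integer(Code),
    Code >= 0x0,
    Code =< 0x10FFFF.    % full range, not the library's 0x1ffff

% Example: U+1F642 lies above 0x1ffff, so a 0x1ffff cutoff would
% reject it, while the full range accepts it.
% ?- is_unicode_code(0x1F642).
% true.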
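And the second sketch: a self-contained char-based variant. The
interface mostly_chars/2 with a fraction threshold is simplified and
only assumed; the Yes/No bookkeeping of the real mostly_codes/4 may
differ. The point is that no code point range appears anywhere.

% Sketch only, simplified interface: succeed if at least MinFactor
% of the list elements are chars, i.e. atoms of length 1. No upper
% bound on any code point is needed.
mostly_chars(List, MinFactor) :-
    count_chars(List, 0, 0, Yes, No),
    Yes + No > 0,
    Yes >= MinFactor * (Yes + No).

% count_chars(+List, +Yes0, +No0, -Yes, -No):
% count the char elements (Yes) and the non-char elements (No).
count_chars([], Yes, No, Yes, No).
count_chars([H|T], Yes0, No0, Yes, No) :-
    (   atom(H), atom_length(H, 1)
    ->  Yes1 is Yes0 + 1, No1 = No0
    ;   Yes1 = Yes0, No1 is No0 + 1
    ),
    count_chars(T, Yes1, No1, Yes, No).

% Works for any Unicode character without knowing the range:
% ?- mostly_chars([h,e,l,l,o,'🙂'], 0.8).
% true.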