From: Mild Shock
Newsgroups: comp.lang.prolog
Subject: Can Prologers produce 100% Prolog Code? (Was: Do Prologers know the Unicode Range?)
Date: Fri, 27 Jun 2025 13:22:33 +0200
Message-ID: <103lutm$1cbpu$2@solani.org>

Somebody wrote:
> It seems that it reads in as ðŸ‘\u008D but writes out as ðŸ‘\\x8D\\.

Can one then do ‘\uXXXX’ in 100% Prolog as well?
Even including surrogates? Of course, here is a
DCG generator snippet from Dogelog Player, which
is 100% Prolog. It is from the Java backend: I
didn’t introduce ‘\uXXXX’ in my Prolog system,
because it is not part of the ISO core standard.
The ISO core standard would want '\xXX\', with
a closing backslash:

% Code points in the BMP: emit \u followed by
% the hex digits of X, zero-padded to width 4.
crossj_escape_code2(X) --> {X =< 0xFFFF}, !,
   {atom_integer(J, 16, X), atom_codes(J, H),
    length(H, N), M is 4-N},
   [0'\\, 0'u],
   cross_escape_zeros(M),
   cross_escape_codes2(H).
% Code points above 0xFFFF: emit the high and
% the low surrogate as two \uXXXX escapes.
crossj_escape_code2(X) -->
   {crossj_high_surrogate(X, Y),
    crossj_low_surrogate(X, Z)},
   crossj_escape_code2(Y),
   crossj_escape_code2(Z).

% 0xD7C0 is 0xD800 - (0x10000 >> 10).
crossj_high_surrogate(X, Y) :- Y is (X >> 10) + 0xD7C0.

crossj_low_surrogate(X, Y) :- Y is (X /\ 0x3FF) + 0xDC00.
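The snippet above relies on Dogelog Player helpers that are not shown in this post (atom_integer/3, cross_escape_zeros//1, cross_escape_codes2//1). As a rough, self-contained sketch of the same idea using only ISO-style arithmetic and DCGs: the names escape_code//1, hex_digits//2 and hex_code/2 are made up for illustration, the upper-case hex digits and the 0x10FFFF bound are my assumptions, and nothing here is the actual Dogelog Player source.

```prolog
% Sketch only: escape_code//1, hex_digits//2 and hex_code/2 are
% hypothetical names, not part of Dogelog Player or any library.

% BMP code points: backslash, u, then exactly four hex digits.
escape_code(X) -->
    { X =< 0xFFFF }, !,
    [0'\\, 0'u],
    hex_digits(4, X).
% Supplementary code points: split into a surrogate pair using the
% same arithmetic as in the post, then escape each half.
% The upper bound 0x10FFFF is an added assumption.
escape_code(X) -->
    { X =< 0x10FFFF,
      High is (X >> 10) + 0xD7C0,
      Low is (X /\ 0x3FF) + 0xDC00 },
    escape_code(High),
    escape_code(Low).

% hex_digits(N, X): the N least significant hex digits of X,
% most significant digit first.
hex_digits(0, _) --> !.
hex_digits(N, X) -->
    { N > 0,
      Shift is (N-1)*4,
      D is (X >> Shift) /\ 0xF,
      hex_code(D, C),
      N1 is N-1 },
    [C],
    hex_digits(N1, X).

% hex_code(D, C): character code C of the hex digit D (upper case).
hex_code(D, C) :- D < 10, !, C is 0'0 + D.
hex_code(D, C) :- C is 0'A + D - 10.
```

With this sketch, ?- phrase(escape_code(0x1F44D), Cs), atom_codes(A, Cs). yields the code list spelling \uD83D\uDC4D, since (0x1F44D >> 10) + 0xD7C0 is 0xD83D and (0x1F44D /\ 0x3FF) + 0xDC00 is 0xDC4D. Negative integers and lone surrogate inputs are not treated specially.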
Mild Shock wrote:
> The official replacement character is 0xFFFD:
>
>  > Replacement Character
>  > https://www.compart.com/de/unicode/U+FFFD
>
> Well, that is what people did in the past: replace
> non-printables by the ever same code, instead of
> using ‘\uXXXX’ notation. I have studied the
> library(portray_text) extensively, and my conclusion
> is still that it is extremely ancient.
>
> For example I find:
>
> mostly_codes([H|T], Yes, No, MinFactor) :-
>     integer(H),
>     H >= 0,
>     H =< 0x1ffff,
>     [...]
>    ;   catch(code_type(H, print),error(_,_),fail),
>     [...]
>
> https://github.com/SWI-Prolog/swipl-devel/blob/eddbde61be09b95eb3ca2e160e73c2340744a3d2/library/portray_text.pl#L235
>
> Why even 0x1ffff and not 0x10ffff? This is a bug,
> do you want to starve is_text_code/1? The official
> Unicode range is 0x0 to 0x10ffff. Ulrich Neumerkel
> often confused the range in some of his code snippets,
> maybe based on a limited interpretation of Unicode.
> But if one would switch to chars, one could easily
> support any Unicode code point even without
> knowing the range. Just do this:
>
> mostly_chars([H|T], Yes, No, MinFactor) :-
>     atom(H),
>     atom_length(H, 1),
>     [...]
>    ;  /* printable check not needed */
>     [...]
>
> Mild Shock wrote:
>> Hi,
>>
>> The most radical approach is Novacore from
>> Dogelog Player. It consists of the following
>> major incisions in the ISO core standard:
>>
>> - We do not forbid chars, like for example
>>    using lists of the form [a,b,c], we also
>>    provide the char_code/2 predicate bidirectionally.
>>
>> - We do not provide any _chars built-in
>>    predicates, also there is nothing _strings. The
>>    Prolog system is clever enough to not put
>>    every atom it sees in an atom table. There
>>    is only a predicate table.
>>
>> - Some host languages have garbage collection that
>>    deduplicates Strings. For example some Java
>>    versions have an option to do that.
>>    But we do not make any effort to deduplicate atoms,
>>    which are simply plain strings.
>>
>> - Some languages have constant pools. For example
>>    the Java byte code format includes a constant
>>    pool in every class header. We do not do that
>>    during transpilation, but we could of course.
>>    But it begs the question, why only deduplicate
>>    strings and not other constant expressions as well?
>>
>> - We are totally happy that we have only codes,
>>    there are chances that the host languages use
>>    tagged pointers to represent them. So they
>>    are represented similar to the tagged pointers
>>    in SWI-Prolog which works for small integers.
>>
>> - But the tagged pointer argument is moot,
>>    since atom length=1 entities can be also
>>    represented as tagged pointers, and some
>>    programming languages do that. Dogelog Player
>>    would use such tagged pointers without
>>    polluting the atom table.
>>
>> - What else?
>>
>> Bye
>>
>> Mild Shock wrote:
>>>
>>> Technically SWI-Prolog doesn't prefer codes.
>>> Library `library(pure_input)` might prefer codes.
>>> But this is again an issue of improving the
>>> library by some non-existent SWI-Prolog community.
>>>
>>> The ISO core standard is silent about a flag
>>> back_quotes, but has a lot of API requirements
>>> that support both codes and chars, for example it
>>> requires atom_codes/2 and atom_chars/2.
>>>
>>> Implementation-wise there can be an issue,
>>> like one might decide to implement the atoms
>>> of length=1 more efficiently, since with Unicode
>>> there is now an explosion.
>>>
>>> Not sure whether Trealla Prolog and Scryer
>>> Prolog thought about this problem, that the
>>> atom table gets quite large. Whereas codes don't
>>> eat the atom table. Maybe they forbid predicates
>>> that have an atom of length=1 head:
>>>
>>> h(X) :-
>>>      write('Hello '), write(X), write('!'), nl.
>>>
>>> Does this still work?
>>>
>>> Mild Shock wrote:
>>>> Concerning library(portray_text) which is in limbo:
>>>>
>>>>  > Libraries are (often) written for either
>>>>  > and thus the libraries make the choice.
>>>>
>>>> But who writes these libraries? The SWI-Prolog
>>>> community. And who doesn’t improve these libraries,
>>>> and instead floods the web with workaround tips?
>>>> The SWI-Prolog community.
>>>>
>>>> Conclusion: the SWI-Prolog community has trapped
>>>> itself in an ancient status quo, creating an island.
>>>> It cannot improve its own tooling and is not willing
>>>> to support code from elsewhere that uses chars.
>>>>
>>>> Same with the missed AI Boom.
>>>>
>>>> (*) Code from elsewhere is dangerous. People
>>>> might use other Prolog systems than only SWI-Prolog,
>>>> like for example Trealla Prolog and Scryer Prolog.
>>>>
>>>> (**) Keeping the status quo is comfy. No need to
>>>> think in terms of program code. It's like biology
>>>> teachers versus pathology staff: biology teachers
>>>> do not see opened corpses every day.
>>>>
>>>>
>>>> Mild Shock wrote:
>>>>> ========== REMAINDER OF ARTICLE TRUNCATED ==========