Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!news.szaf.org!weretis.net!feeder8.news.weretis.net!reader5.news.weretis.net!news.solani.org!.POSTED!not-for-mail
From: Mild Shock <janburse@fastmail.fm>
Newsgroups: comp.lang.prolog
Subject: Can Prologers produce 100% Prolog Code? (Was: Do Prologers know the
 Unicode Range?)
Date: Fri, 27 Jun 2025 13:22:33 +0200
Message-ID: <103lutm$1cbpu$2@solani.org>
References: <vpceij$is1s$1@solani.org> <103bos1$164mt$1@solani.org>
 <103bpdh$164t1$1@solani.org> <103bqc8$165f2$1@solani.org>
 <103luqv$1cbpu$1@solani.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 27 Jun 2025 11:22:30 -0000 (UTC)
Injection-Info: solani.org;
	logging-data="1453886"; mail-complaints-to="abuse@news.solani.org"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101
 Firefox/128.0 SeaMonkey/2.53.21
Cancel-Lock: sha1:NM4pj+txbd/6GNReZqmSRVoi+t8=
In-Reply-To: <103luqv$1cbpu$1@solani.org>
X-User-ID: eJwNy8ERACAIA8GWQCDBchC1/xL0ezsXBkXTEfC4ccGSHMZc2BAT9/pBlYvmYlNPTaUX/9CXSHxurcI4src8FZwUMg==

Somebody wrote:

 > It seems that it reads in as ðŸ‘\u008D but writes out as ðŸ‘\\x8D\\.

Can one then do ‘\uXXXX’ in 100% Prolog as
well, even including surrogates? Of course.
Here is a DCG generator snippet from Dogelog
Player, which is 100% Prolog. It is from the
Java backend, because I didn’t introduce ‘\uXXXX’
in my Prolog system, since it is not part of
the ISO core standard. The ISO core standard
would want '\xXX\' instead:

crossj_escape_code2(X) --> {X =< 0xFFFF}, !,
    {atom_integer(J, 16, X), atom_codes(J, H),
    length(H, N), M is 4-N}, [0'\\, 0'u],
    cross_escape_zeros(M),
    cross_escape_codes2(H).
crossj_escape_code2(X) --> {crossj_high_surrogate(X, Y),
    crossj_low_surrogate(X, Z)},
    crossj_escape_code2(Y),
    crossj_escape_code2(Z).

crossj_high_surrogate(X, Y) :- Y is (X >> 10) + 0xD7C0.

crossj_low_surrogate(X, Y) :- Y is (X /\ 0x3FF) + 0xDC00.
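
For readers without Dogelog Player at hand, the same idea can be
sketched with standard built-ins only. This is a minimal sketch, not
the Dogelog code: java_escape//1, hex_nibble//2 and hex_code/2 are
hypothetical names, and emitting uppercase hex digits is my assumption
(atom_integer/3 may format them differently).

```prolog
% Sketch only: java_escape//1 and its helpers are hypothetical names.
% Code points in the BMP become a single \uXXXX escape.
java_escape(X) --> { integer(X), X >= 0, X =< 0xFFFF }, !,
    [0'\\, 0'u],
    hex_nibble(X, 12), hex_nibble(X, 8),
    hex_nibble(X, 4), hex_nibble(X, 0).
% Code points above the BMP become a high and a low surrogate,
% using the same arithmetic as the snippet above.
java_escape(X) --> { integer(X), X > 0xFFFF, X =< 0x10FFFF },
    { High is (X >> 10) + 0xD7C0,
      Low is (X /\ 0x3FF) + 0xDC00 },
    java_escape(High),
    java_escape(Low).

% Emit the hex digit for the 4 bits of X starting at bit Shift.
hex_nibble(X, Shift) -->
    { N is (X >> Shift) /\ 0xF, hex_code(N, C) },
    [C].

% Map 0..15 to the character code of an uppercase hex digit.
hex_code(N, C) :- N < 10, !, C is 0'0 + N.
hex_code(N, C) :- C is 0'A + N - 10.
```

With phrase/2 and atom_codes/2 available, the query
?- phrase(java_escape(0x1F44D), Cs), atom_codes(A, Cs).
should bind A to the text \uD83D\uDC4D, the surrogate pair
for U+1F44D.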

Mild Shock wrote:
> The official replacement character is 0xFFFD:
> 
>  > Replacement Character
>  > https://www.compart.com/de/unicode/U+FFFD
> 
> Well, that is what people did in the past: replace
> non-printables by the ever same code, instead of
> using ‘\uXXXX’ notation. I have studied
> library(portray_text) extensively, and my conclusion
> is still that it is extremely ancient.
> 
> For example I find:
> 
> mostly_codes([H|T], Yes, No, MinFactor) :-
>      integer(H),
>      H >= 0,
>      H =< 0x1ffff,
>      [...]
>     ;   catch(code_type(H, print),error(_,_),fail),
>      [...]
> 
> https://github.com/SWI-Prolog/swipl-devel/blob/eddbde61be09b95eb3ca2e160e73c2340744a3d2/library/portray_text.pl#L235 
> 
> 
> Why even 0x1ffff and not 0x10ffff? This is a bug;
> do you want to starve is_text_code/1? The official
> Unicode range is 0x0 to 0x10ffff. Ulrich Neumerkel
> often confused the range in some of his code snippets,
> maybe based on a limited interpretation of Unicode.
> But if one switched to chars, one could easily
> support any Unicode code point even without
> knowing the range. Just do this:
> 
> mostly_chars([H|T], Yes, No, MinFactor) :-
>      atom(H),
>      atom_length(H, 1),
>      [...]
>     ;  /* printable check not needed */
>      [...]
> 
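
To make the two checks concrete, here is a minimal sketch of both
variants side by side. The predicate names unicode_code/1 and
unicode_char/1 are hypothetical and not taken from
library(portray_text); they only illustrate the range test versus
the length=1 atom test.

```prolog
% Hypothetical helpers, not part of library(portray_text).
% A code is acceptable if it lies in the Unicode range 0x0..0x10FFFF.
unicode_code(C) :-
    integer(C),
    C >= 0,
    C =< 0x10FFFF.

% A char is any atom of length 1; no knowledge of the range is needed.
unicode_char(C) :-
    atom(C),
    atom_length(C, 1).
```

With the bound 0x1ffff, a code such as 0x20000 (the start of CJK
Extension B) would be rejected, whereas unicode_code(0x20000)
succeeds.
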
> Mild Shock wrote:
>> Hi,
>>
>> The most radical approach is Novacore from
>> Dogelog Player. It consists of the following
>> major incisions in the ISO core standard:
>>
>> - We do not forbid chars, like for example
>>    using lists of the form [a,b,c], we also
>>    provide char_code/2 predicate bidirectionally.
>>
>> - We do not provide any _chars built-in
>>    predicates, and there is nothing for _strings either. The
>>    Prolog system is clever enough to not put
>>    every atom it sees in an atom table. There
>>    is only a predicate table.
>>
>> - Some host languages have garbage collection that
>>    deduplicates strings. For example, some Java
>>    versions have an option to do that. But we
>>    make no effort to deduplicate atoms,
>>    which are simply plain strings.
>>
>> - Some languages have constant pools. For example
>>    the Java byte code format includes a constant
>>    pool in every class header. We do not do that
>>    during transpilation, but we could of course.
>>    But it raises the question, why only deduplicate
>>    strings and not other constant expressions as well?
>>
>> - We are totally happy that we have only codes;
>>    there is a chance that the host language uses
>>    tagged pointers to represent them. So they
>>    are represented similarly to the tagged pointers
>>    in SWI-Prolog, which work for small integers.
>>
>> - But the tagged pointer argument is moot,
>>    since atom length=1 entities can be also
>>    represented as tagged pointers, and some
>>    programming languages do that. Dogelog Player
>>    would use such tagged pointers without
>>    polluting the atom table.
>>
>> - What else?
>>
>> Bye
>>
>> Mild Shock wrote:
>>>
>>> Technically SWI-Prolog doesn't prefer codes.
>>> Library `library(pure_input)` might prefer codes.
>>> But this is again an issue of improving the
>>> library by some non-existent SWI-Prolog community.
>>>
>>> The ISO core standard is silent about a flag
>>> back_quotes, but has a lot of API requirements
>>> that support both codes and chars, for example it
>>> requires atom_codes/2 and atom_chars/2.
>>>
>>> Implementation-wise there can be an issue:
>>> one might decide to implement the atoms
>>> of length=1 more efficiently, since with Unicode
>>> there is now an explosion of such atoms.
>>>
>>> Not sure whether Trealla Prolog and Scryer
>>> Prolog thought about this problem, that the
>>> atom table gets quite large, whereas codes don't
>>> eat the atom table. Maybe they forbid predicates
>>> whose head functor is an atom of length=1:
>>>
>>> h(X) :-
>>>      write('Hello '), write(X), write('!'), nl.
>>>
>>> Does this still work?
>>>
>>> Mild Shock wrote:
>>>> Concerning library(portray_text) which is in limbo:
>>>>
>>>>  > Libraries are (often) written for either
>>>>  > and thus the libraries make the choice.
>>>>
>>>> But who writes these libraries? The SWI Prolog
>>>> community. And who doesn’t improve these libraries,
>>>> instead floods the web with workaround tips?
>>>> The SWI Prolog community.
>>>>
>>>> Conclusion: the SWI-Prolog community has trapped
>>>> itself in an ancient status quo, creating an island.
>>>> It cannot improve its own tooling and is not willing
>>>> to support code from elsewhere that uses chars.
>>>>
>>>> Same with the missed AI Boom.
>>>>
>>>> (*) Code from elsewhere is dangerous. People
>>>> might use other Prolog systems than only SWI-Prolog,
>>>> like for example Trealla Prolog and Scryer Prolog.
>>>>
>>>> (**) Keeping the status quo is comfy. No need to
>>>> think in terms of program code. It's like biology
>>>> teachers versus pathology staff: biology teachers
>>>> do not see opened corpses every day.
>>>>
>>>>
>>>> Mild Shock wrote:
>>>>>