From: Mild Shock <janburse@fastmail.fm>
Newsgroups: comp.lang.prolog
Subject: Can Prologers produce 100% Prolog Code? (Was: Do Prologers know the
Unicode Range?)
Date: Fri, 27 Jun 2025 13:22:33 +0200
Message-ID: <103lutm$1cbpu$2@solani.org>
References: <vpceij$is1s$1@solani.org> <103bos1$164mt$1@solani.org>
<103bpdh$164t1$1@solani.org> <103bqc8$165f2$1@solani.org>
<103luqv$1cbpu$1@solani.org>
Somebody wrote:
> It seems that it reads in as ðŸ‘\u008D but writes out as ðŸ‘\\x8D\\.
Can one then do ‘\uXXXX’ in 100% Prolog as
well? Even including surrogates? Of course.
Here is a DCG generator snippet from Dogelog
Player, which is 100% Prolog. It is from the
Java backend, because I didn’t introduce ‘\uXXXX’
in my Prolog system, since it is not part of the
ISO core standard. The ISO core standard would want '\xXX\':
% Code points up to 0xFFFF: emit \u plus four hex digits, zero padded.
crossj_escape_code2(X) --> {X =< 0xFFFF}, !,
   {atom_integer(J, 16, X), atom_codes(J, H),
    length(H, N), M is 4-N}, [0'\\, 0'u],
   cross_escape_zeros(M),
   cross_escape_codes2(H).
% Code points above 0xFFFF: emit a surrogate pair, high then low.
crossj_escape_code2(X) --> {crossj_high_surrogate(X, Y),
    crossj_low_surrogate(X, Z)},
   crossj_escape_code2(Y),
   crossj_escape_code2(Z).

crossj_high_surrogate(X, Y) :- Y is (X >> 10) + 0xD7C0.
crossj_low_surrogate(X, Y) :- Y is (X /\ 0x3FF) + 0xDC00.
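
For readers who want to try the idea without the Dogelog Player
helpers, here is a minimal self-contained sketch of the same
technique. The uesc_* names are hypothetical and not part of
Dogelog Player, and the hex digits are produced by shifting and
masking instead of atom_integer/3, which is an assumption made
only to keep the sketch portable. It uses upper-case hex digits.

```prolog
% Hypothetical names (uesc_*), for illustration only.

% uesc_code(+CodePoint)//
% Emits \uXXXX, or a surrogate pair for code points above 0xFFFF.
uesc_code(X) --> {X =< 0xFFFF}, !,
   [0'\\, 0'u],
   uesc_hex4(X).
uesc_code(X) -->
   {H is (X >> 10) + 0xD7C0,
    L is (X /\ 0x3FF) + 0xDC00},
   uesc_code(H),
   uesc_code(L).

% uesc_hex4(+N)//
% Emits exactly four hex digits, most significant first.
uesc_hex4(N) -->
   {D3 is (N >> 12) /\ 0xF, D2 is (N >> 8) /\ 0xF,
    D1 is (N >> 4) /\ 0xF, D0 is N /\ 0xF},
   uesc_digit(D3), uesc_digit(D2), uesc_digit(D1), uesc_digit(D0).

% uesc_digit(+D)//
% Emits one hex digit for a value in 0..15.
uesc_digit(D) --> {D < 10}, !, {C is D + 0'0}, [C].
uesc_digit(D) --> {C is D - 10 + 0'A}, [C].
```

With this loaded, the query
?- phrase(uesc_code(0x1F44D), Cs), atom_codes(A, Cs).
binds A to the atom with the text \uD83D\uDC4D, which is the
surrogate pair for the thumbs-up code point.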
Mild Shock wrote:
> The official replacement character is 0xFFFD:
>
> > Replacement Character
> > https://www.compart.com/de/unicode/U+FFFD
>
> Well, that is what people did in the past: replace
> non-printables by the ever same code, instead of
> using ‘\uXXXX’ notation. I have studied
> library(portray_text) extensively, and my conclusion
> is still that it is extremely ancient.
>
> For example I find:
>
> mostly_codes([H|T], Yes, No, MinFactor) :-
>     integer(H),
>     H >= 0,
>     H =< 0x1ffff,
>     [...]
>     ;   catch(code_type(H, print),error(_,_),fail),
>     [...]
>
> https://github.com/SWI-Prolog/swipl-devel/blob/eddbde61be09b95eb3ca2e160e73c2340744a3d2/library/portray_text.pl#L235
>
>
> Why 0x1ffff and not 0x10ffff? This is a bug.
> Do you want to starve is_text_code/1? The official
> Unicode range is 0x0 to 0x10ffff. Ulrich Neumerkel
> often confused the range in some of his code snippets,
> maybe based on a limited interpretation of Unicode.
> But if one switched to chars, one could easily
> support any Unicode code point even without
> knowing the range. Just do this:
>
> mostly_chars([H|T], Yes, No, MinFactor) :-
>     atom(H),
>     atom_length(H, 1),
>     [...]
>     ;   /* printable check not needed */
>     [...]
>
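
The two element checks contrasted above can be isolated into a
small sketch. The predicate names below are hypothetical and do
not come from library(portray_text). The point is only that the
code-based test needs the correct upper bound 0x10FFFF, while
the char-based test needs no numeric bound at all.

```prolog
% Hypothetical predicate names, for illustration only.

% Code-based check: the upper bound is 0x10FFFF, not 0x1FFFF.
is_unicode_code(C) :-
   integer(C),
   C >= 0,
   C =< 0x10FFFF.

% Char-based check: no numeric range appears.
is_single_char(C) :-
   atom(C),
   atom_length(C, 1).
```

For example, is_unicode_code(0x20000) succeeds, although that
code point (the first one in CJK Extension B) lies above the
0x1ffff bound, and is_unicode_code(0x110000) fails.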
> Mild Shock wrote:
>> Hi,
>>
>> The most radical approach is Novacore from
>> Dogelog Player. It consists of the following
>> major incisions in the ISO core standard:
>>
>> - We do not forbid chars, like for example
>> using lists of the form [a,b,c], we also
>> provide char_code/2 predicate bidirectionally.
>>
>> - We do not provide any _chars built-in
>> predicates, and there is nothing _strings either. The
>> Prolog system is clever enough to not put
>> every atom it sees in an atom table. There
>> is only a predicate table.
>>
>> - Some host languages have garbage collection that
>> deduplicates Strings. For example, some Java
>> versions have an option to do that. But we
>> make no effort to deduplicate atoms,
>> which are simply plain strings.
>>
>> - Some languages have constant pools. For example,
>> the Java byte code format includes a constant
>> pool in every class header. We do not do that
>> during transpilation, but we could of course.
>> But it raises the question: why deduplicate only
>> strings and not other constant expressions as well?
>>
>> - We are totally happy that we have only codes,
>> there are chances that the host languages use
>> tagged pointers to represent them. So they
>> are represented similar to the tagged pointers
>> in SWI-Prolog which works for small integers.
>>
>> - But the tagged pointer argument is moot,
>> since atom length=1 entities can also be
>> represented as tagged pointers, and some
>> programming languages do that. Dogelog Player
>> would use such tagged pointers without
>> polluting the atom table.
>>
>> - What else?
>>
>> Bye
>>
>> Mild Shock wrote:
>>>
>>> Technically, SWI-Prolog doesn't prefer codes.
>>> The library `library(pure_input)` might prefer codes.
>>> But this is again an issue of improving the
>>> library by some nonexistent SWI-Prolog community.
>>>
>>> The ISO core standard is silent about a flag
>>> back_quotes, but has a lot of API requirements
>>> that support both codes and chars, for example it
>>> requires atom_codes/2 and atom_chars/2.
>>>
>>> Implementation-wise there can be an issue:
>>> one might decide to implement the atoms
>>> of length=1 more efficiently, since with Unicode
>>> there is now an explosion of them.
>>>
>>> Not sure whether Trealla Prolog and Scryer
>>> Prolog thought about this problem, that the
>>> atom table gets quite large, whereas codes don't
>>> eat the atom table. Maybe they forbid predicates
>>> that have an atom of length=1 as head:
>>>
>>> h(X) :-
>>>     write('Hello '), write(X), write('!'), nl.
>>>
>>> Does this still work?
>>>
>>> Mild Shock wrote:
>>>> Concerning library(portray_text) which is in limbo:
>>>>
>>>> > Libraries are (often) written for either
>>>> and thus the libraries make the choice.
>>>>
>>>> But who writes these libraries? The SWI-Prolog
>>>> community. And who doesn’t improve these libraries,
>>>> and instead floods the web with workaround tips?
>>>> The SWI-Prolog community.
>>>>
>>>> Conclusion: the SWI-Prolog community has trapped
>>>> itself in an ancient status quo, creating an island.
>>>> It cannot improve its own tooling and is not willing
>>>> to support code from elsewhere that uses chars.
>>>>
>>>> Same with the missed AI Boom.
>>>>
>>>> (*) Code from elsewhere is dangerous; people
>>>> might use other Prolog systems than only SWI-Prolog,
>>>> like for example Trealla Prolog and Scryer Prolog.
>>>>
>>>> (**) Keeping the status quo is comfy. No need to
>>>> think in terms of program code. It's like biology
>>>> teachers versus pathology staff: biology teachers
>>>> do not see opened corpses every day.
>>>>
>>>>
>>>> Mild Shock wrote:
>>>>>
========== REMAINDER OF ARTICLE TRUNCATED ==========