From: antispam@fricas.org (Waldek Hebisch)
Newsgroups: comp.lang.c
Subject: Re: else ladders practice
Date: Tue, 19 Nov 2024 01:53:05 -0000 (UTC)
Organization: To protect and to server
Message-ID: <vhgr1v$2ovnd$1@paganini.bofh.team>
References: <3deb64c5b0ee344acd9fbaea1002baf7302c1e8f@i2pn2.org> <vg2ttn$3a4lk$1@dont-email.me> <vg33gs$3b8n5$1@dont-email.me> <vg358c$3bk7t$1@dont-email.me> <vg37nr$3bo0c$1@dont-email.me> <vg3b98$3cc8q$1@dont-email.me> <vg5351$3pada$1@dont-email.me> <vg62vg$3uv02$1@dont-email.me> <vgd3ro$2pvl4$1@paganini.bofh.team> <vgdc4q$1ikja$1@dont-email.me> <vgdt36$2r682$2@paganini.bofh.team> <vge8un$1o57r$3@dont-email.me> <vgpi5h$6s5t$1@paganini.bofh.team> <vgtsli$1690f$1@dont-email.me>

Bart <bc@freeuk.com> wrote:
> On 10/11/2024 06:00, Waldek Hebisch wrote:
>> Bart <bc@freeuk.com> wrote:
>
>>> I would consider a much more elaborate one, putting the onus on external
>>> tools and still having an unpredictable result, to be the poorer of the two.
>>>
>>> You want to create a language that is easily compilable, no matter how
>>> complex the input.
>>
>> Normally the time spent _using_ a compiler should be bigger than the time
>> spent writing the compiler. If a compiler gets enough use, that
>> justifies some complexity.
>
> That doesn't add up: the more the compiler gets used, the slower it
> should get?!
More complicated does not mean slower. Binary search or hash tables are
more complicated than linear search, but for larger data they may be much
faster. Similarly, a compiler may be simplified by using simpler but
slower methods, while a more complicated compiler may use faster methods.
This is particularly relevant here: a simple compiler may keep a list of
cases or ranges and linearly scan it; a more advanced one may use, say, a
tree structure.

More generally, I want to minimize the time spent by the programmer, that
is, the _sum over all iterations leading to a correct program_ of compile
time and "think time". A compiler that compiles more slowly but allows
fewer iterations due to better diagnostics may win. Also, humans perceive
a 0.1s delay almost like no delay at all, so it does not matter whether a
single compilation step takes 0.1s or 0.1ms. Modern computers can do a
lot of work in 0.1s.

> The sort of analysis you're implying I don't think belongs in the kind
> of compiler I prefer. Even if it did, it would come later in the
> process than the point where the above restriction is checked, so
> wouldn't exist in one of my compilers anyway.

Sure, you design your compiler as you like.

> I don't like open-ended tasks like this where compilation time could end
> up being anything. If you need to keep recompiling the same module, then
> you don't want to repeat that work each time.

Yes, this may lead to some complexity. The simple approach is to avoid
obviously useless recompilation ('make' does this). A more complicated
approach may keep some intermediate data and try to "validate" it first:
if the previous analysis is still valid, it can be reused; if something
significant changed, it needs to be re-done. But many changes have only a
very local effect, so at least in theory re-using analyses could save
substantial time.
Concerning open-endedness, my attitude is that the compiler should make an
effort which is open-ended in the sense that when a new method is
discovered, the compiler may be extended to do more work. OTOH in a
"single true compiler" world, a compiler may say "this is too difficult,
giving up". Of course, when trying something very hard the compiler is
likely to run out of memory, or the user will stop it; but the compiler
may give up earlier.

Of course, this is unacceptable for a standardized language, where people
move programs between different compilers. If a compiler can legally
reject a program because of its limitations, and does so with significant
probability, then portability between compilers is severely limited. But
if there is a way to disable the extra checks, then this may work. This
is one of the reasons why 'gcc' has so many options: users that want it
can get stronger checking, but if they want, 'gcc' will accept lousy code
too.

>> I am mainly concerned with clarity and correctness of source code.
>
> So am I. I try to keep my syntax clean and uncluttered.
>
>> A dummy 'else' doing something may hide errors.
>
> So can 'unreachable'.
>
>> A dummy 'else' signalling an
>> error means that something which could be a compile-time error is
>> only detected at runtime.
>>
>> A compiler that detects most errors of this sort is IMO better than a
>> compiler which makes no effort to detect them. And clearly, once the
>> problem is formulated in a sufficiently general way, it becomes
>> unsolvable. So I do not expect a general solution, but I do expect a
>> reasonable effort.
>
> So how would David Brown's example work:
>
> int F(int n) {
>     if (n==1) return 10;
>     if (n==2) return 20;
> }
>
> /You/ know that values -2**31 to 0 and 3 to 2**31-1 are impossible; the
> compiler doesn't. It's likely to tell you that you may run off the end
> of the function.
>
> So what do you want the compiler to do here?
> If I try it:
>
> func F(int n)int =
>     if n=1 then return 10 fi
>     if n=2 then return 20 fi
> end
>
> It says 'else needed' (in that last statement). I can also shut it up
> like this:
>
> func F(int n)int =          # int is i64 here
>     if n=1 then return 10 fi
>     if n=2 then return 20 fi
>     0
> end
>
> since now that last statement is the '0' value (any int value will do).
> What should my compiler report instead? What analysis should it be
> doing? What would that save me from typing?

Currently, in the typed language that I use, a literal translation of the
example hits a hole in the checks, that is, the code is accepted.

Concerning the needed analyses: one thing needed is a representation of
the type, either a Pascal range type or an enumeration type (the example
is _very_ unnatural, because in modern programming magic numbers are
avoided and there would be some symbolic representation adding meaning to
the numbers). Second, the compiler must recognize that this is a
"multiway switch" and collect the conditions. Once you have such a
representation (which may be desirable for other reasons) it is easy to
determine the set of handled values. More precisely, in this example we
just have a small number of discrete values; a more ambitious compiler
may keep a list of ranges. If the type also specifies a list of values or
a list of ranges, then it is easy to check whether all values of the type
are handled.

>>>> normally you do not need very complex analysis:
>>>
>>> I don't want to do any analysis at all! I just want a mechanical
>>> translation as effortlessly as possible.
>>>
>>> I don't like unbalanced code within a function because it's wrong and
>>> can cause problems.
>>
>> Well, I demand more from a compiler than you do...
>
> Perhaps you're happy for it to be bigger and slower too. Most of my
> projects build more or less instantly.
> Here 'ms' is a version that runs
> programs directly from source (the first 'ms' is 'ms.exe' and subsequent
> ones are 'ms.m', the lead module):
>
> c:\bx>ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms hello
> Hello World! 21:00:45
>
> This builds and runs 15 successive generations of itself in memory
> before building and running hello.m; it took 1 second in all. (Now try
> that with gcc!)
>
> Here:
>
> c:\cx>tm \bx\mm -runp cc sql
> Compiling cc.m to <pcl>
> Compiling sql.c to sql.exe
>
> This compiles my C compiler from source but then it /interprets/ the IR
> produced. This interpreted compiler took 6 seconds to build the 250Kloc
> test file, and it's a very slow interpreter (it's used for testing and
> debugging).
>
> (gcc -O0 took a bit longer to build sql.c! About 7 seconds, but it is
> using a heftier windows.h.)
>
> If I run the C compiler from source as native code (\bx\ms cc sql) then
> building the compiler *and* sql.c takes 1/3 of a second.
>
> You can't do this stuff with the compilers David Brown uses; I'm
> guessing you can't do it with your preferred ones either.

To recompile the typed system I use (about 0.4M lines) on a new fast
machine I need about 53s. But that is a kind of cheating:
- this time is for a parallel build using 20 logical cores
- the compiler is not in the language it compiles (but in an untyped
  version of it)
- actual compilation of the compiler is a small part of the total
  compile time

========== REMAINDER OF ARTICLE TRUNCATED ==========