Article <vi7ct8$28ej$1@dont-email.me>

Path: ...!weretis.net!feeder9.news.weretis.net!news.quux.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: question about linker
Date: Wed, 27 Nov 2024 15:12:39 +0000
Organization: A noiseless patient Spider
Lines: 169
Message-ID: <vi7ct8$28ej$1@dont-email.me>
References: <vi54e9$3ie0o$1@dont-email.me> <vi56tj$3ip1o$1@dont-email.me>
 <vi583f$3ie0o$3@dont-email.me> <vi59df$3ip1o$3@dont-email.me>
 <vi5qu0$3md4n$1@dont-email.me> <vi5u16$3me78$1@dont-email.me>
 <vi71f9$7be$1@dont-email.me> <vi727c$7be$2@dont-email.me>
 <vi73sh$agj$2@dont-email.me> <vi7509$agj$3@dont-email.me>
 <vi76o2$agj$4@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 27 Nov 2024 16:12:40 +0100 (CET)
Injection-Info: dont-email.me; posting-host="94da1025fd5231eb125f80dc7027ef53";
	logging-data="74195"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19NMJUrB6gAHLfBuCgg4Csl"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:lqRF7ItUcojVvNyvtTYsO5DnRms=
Content-Language: en-GB
In-Reply-To: <vi76o2$agj$4@dont-email.me>
Bytes: 6616

On 27/11/2024 13:27, Thiago Adams wrote:
> On 27/11/2024 09:57, Thiago Adams wrote:
>> On 27/11/2024 09:38, Thiago Adams wrote:
>>> On 27/11/2024 09:10, Bart wrote:
>>>> On 27/11/2024 11:57, Bart wrote:
>>>>> On 27/11/2024 01:52, Thiago Adams wrote:
>>>>
>>>>> I also use ILs for my compilers, but I write my own backends. I've 
>>>>> worked on two different kinds. One looks like a HLL, and only 
>>>>> exists for my language. So this original source:
>>>>
>>
>>
>> I was wondering if it is possible to write C programs without struct/union?
>>
>> I did this experiment.
>>
>> struct X {
>>      int a, b;
>> };
>>
>> void F1() {
>>      struct X x;
>>      x.a = 1;
>>      x.b = 2;
>>      printf("%d, %d", x.a, x.b);
>> }
>>
>> The equivalent C89 program in a subset without structs could be:
>>
>> #define M(T, obj, OFF) *((T*)(((char*)&(obj)) + (OFF)))
>>
>> void F2() {
>>      char x[8];
>>      M(int, x, 0 /*offset of a*/) = 1;
>>      M(int, x, 4 /*offset of b*/) = 2;
>>      printf("\n");
>>      printf("%d, %d", M(int, x, 0), M(int, x, 4));
>> }
>>
>> The char array represents the struct X memory; then we have to find 
>> the offsets of the members and cast to their types.
>>
>>
>> Does your IL have structs?

No. It has a 'block' type which defines a fixed-length memory block. So 
your struct ST type below I think would be represented as the type 
'mem:824', as it is 824 bytes.

(This works fine for WinABI. But the SYS V ABI has a much more 
complex set of rules, where struct passing may depend on the types of the 
members, which may be split up amongst different registers.

I'm not too worried about that however; it will only apply to structs 
passed by value across an FFI, and most external libraries don't pass 
by-value structs. There will also be workarounds.)


>> The QBE IL has aggregate types. I think this removes the need for the 
>> front end to calculate the offsets.
>>
>> https://c9x.me/compile/doc/il.html#Aggregate-Types
>>
>>
>>
>>
> 
> I tried this sample with clang and -S -emit-llvm to see if it generates 
> structs. The answer is yes.
> 
> https://llvm.org/docs/LangRef.html#getelementptr-instruction
> 
> struct RT {
>    char A;
>    int B[10][20];
>    char C;
> };
> struct ST {
>    int X;
>    double Y;
>    struct RT Z;
> };
> 
> int *foo(struct ST *s) {
>    return &s[1].Z.B[5][13];
> }
> 
> The LLVM code generated by Clang is approximately:
> 
> %struct.RT = type { i8, [10 x [20 x i32]], i8 }
> %struct.ST = type { i32, double, %struct.RT }
> 
> define ptr @foo(ptr %s) {
> entry:
>    %arrayidx = getelementptr inbounds %struct.ST, ptr %s, i64 1, i32 2, i32 1, i64 5, i64 13
>    ret ptr %arrayidx
> }

This example is misleading. That's the output from using -O3. 
Unoptimised LLVM output is this:

---------------------------------
define dso_local ptr @foo(ptr noundef %0) #0 !dbg !10 {
   %2 = alloca ptr, align 8
   store ptr %0, ptr %2, align 8
     #dbg_declare(ptr %2, !34, !DIExpression(), !35)
   %3 = load ptr, ptr %2, align 8, !dbg !36
   %4 = getelementptr inbounds %struct.ST, ptr %3, i64 1, !dbg !36
   %5 = getelementptr inbounds nuw %struct.ST, ptr %4, i32 0, i32 2, !dbg !37
   %6 = getelementptr inbounds nuw %struct.RT, ptr %5, i32 0, i32 1, !dbg !38
   %7 = getelementptr inbounds [10 x [20 x i32]], ptr %6, i64 0, i64 5, !dbg !36
   %8 = getelementptr inbounds [20 x i32], ptr %7, i64 0, i64 13, !dbg !36
   ret ptr %8, !dbg !39
}
---------------------------------

If you are writing the IR code, then it will be up to you to combine 
that chain of constant offsets into a single offset. Otherwise it will 
still be you needing to do so on the other side of the IR!

(I don't know if the reduction above is done pre-LLVM or by LLVM.)

In my IL, it will generate multiple instructions, and there will be a 
reduction pass to combine instructions where possible. That's a WIP, 
but examples like yours are incredibly rare in my code-base, while 
the speed-up achieved is likely to be minor. Modern CPUs are good at 
running poor code fast. Mostly this just makes code more compact.

My IL for your example (I translated to my language) starts off as this:

---------------------
  proc t.foo:
     param    u64       s
     rettype  u64
     load     u64       s
     load     i64       1
     addpx    mem:824 /824/-824      # /scale factor /extra byte offset
     load     i64       20
     addpx    u64 /1
     load     i64       5
     addpx    mem:80 /80
     load     i64       13
     addpx    i32 /4
     jumpret  u64       #1
  #1:
     retfn    u64
  endproc
---------------------

The reductions could also be applied during codegen to native code. But 
as it is, no reductions are done, and the body of the function generates 
this x64 code:

     mov   rax,  [rbp + `t.foo.s]  # or mov rax, rcx with reg allocator
     lea   rax,  [rax+20]
     lea   rax,  [rax+400]
     lea   rax,  [rax+52]

Here the reduction could also be done with a peephole optimiser to 
combine the three LEAs into one instruction. With 's' in a register, 
probably the optimum code here would be one LEA instruction.