Article <vs108m$1scgt$1@dont-email.me>

Warning: mysqli::__construct(): (HY000/1203): User howardkn already has more than 'max_user_connections' active connections in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\includes\artfuncs.php on line 21
Failed to connect to MySQL: (1203) User howardkn already has more than 'max_user_connections' active connections
Warning: mysqli::query(): Couldn't fetch mysqli in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\index.php on line 66
Article <vs108m$1scgt$1@dont-email.me>
Deutsch English Français Italiano
<vs108m$1scgt$1@dont-email.me>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups: comp.lang.c
Subject: Re: Fast division (was Re: Suggested method for returning a string
 from a C program?)
Date: Wed, 26 Mar 2025 14:42:44 +0100
Organization: A noiseless patient Spider
Lines: 130
Message-ID: <vs108m$1scgt$1@dont-email.me>
References: <vrd77d$3nvtf$2@dont-email.me> <20250319115550.0000676f@yahoo.com>
 <vreuj1$1asii$4@dont-email.me> <vreve4$19klp$2@dont-email.me>
 <20250319201903.00005452@yahoo.com> <86r02roqdq.fsf@linuxsc.com>
 <vrh1br$35029$2@dont-email.me> <LRUCP.2$541.0@fx47.iad>
 <vrh71t$3be42$1@dont-email.me> <vrk8vm$2f4gc$1@paganini.bofh.team>
 <20250321113316.506@kylheku.com> <vrl6hp$2qg20$1@dont-email.me>
 <20250321210228.508@kylheku.com> <vrmg7d$2nif7$2@paganini.bofh.team>
 <vrvjr9$i7gg$1@dont-email.me> <vrvp94$3q1ah$1@paganini.bofh.team>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 26 Mar 2025 14:42:47 +0100 (CET)
Injection-Info: dont-email.me; posting-host="e6a35e9678f9481385dd1adce673b0e8";
	logging-data="1978909"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+lMOncEtN+rHBoabOHGTnD"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
Cancel-Lock: sha1:Ht4NEEgWJ7Gcp44Qh2qIrJsKWTU=
X-Enigmail-Draft-Status: N1110
In-Reply-To: <vrvp94$3q1ah$1@paganini.bofh.team>
Bytes: 6677

On 26.03.2025 03:37, Waldek Hebisch wrote:
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> On 22.03.2025 15:07, Waldek Hebisch wrote:
>>>
>>> Actually, to do fast division of N-bit number by fixed N-bit number
>>> one need 2N-bit multiplication. 
>>
>> I just stumbled across your post and above sentence. Do you mean *one*
>> multiplication of 2N bit numbers? - Could you please explain that (by
>> an example, or could you provide a reference)?
> 
> One multiplications with 2N bit result + few other operations
> like shifts and additions.  Consider for example:
> 
> unsigned int
> divv(unsigned int a) {
>     return a/1234567;
> }

Ah, I suppose you meant that (in case of constant divisors) the
compiler would just _pre-compute_ the reciprocal value (and can
then just multiply, of course). - Is that understanding correct?

Given that I don't know the semantics of the assembler below I'm
just speculating, but - given you used a constant "1234567" - it
seems the compiler used a different value ("3000869427") for the
multiplication, the reciprocal - right?

> 
> My gcc-12 at -O generates the following assembly code:
> 
> divv:
> .LFB0:
>         .cfi_startproc
>         movl    %edi, %eax
>         movl    $3000869427, %edx
>         imulq   %rdx, %rax
>         shrq    $32, %rax
>         subl    %eax, %edi
>         shrl    %edi
>         addl    %edi, %eax
>         shrl    $20, %eax
>         ret
> 
> So 1 multiplication, 3 shifts, subtraction and addtion.  When
> more bits of multiplication are available one can use smaller
> number of extra operations.  For example, when I change
> function above so that argument is 16-bit and divisor is
> 12345, the code is:
> 
> divv:
> .LFB0:
>         .cfi_startproc
>         movzwl  %di, %eax
>         imull   $43489, %eax, %eax
>         shrl    $29, %eax
>         ret
> 
> So just 1 multiplication and 1 shift.
> 
> The idea beside this is quite simple: instead of division we
> multiply by reciprocal. 

First I thought that your formulation implies that there's a trick
where _any_ division  a/b  could be expressed by a multiplication
(and some primitive binary operations). - Which would have been cool.

> Reciprocal is represented in fixed
> point aritihemtic (so normally there is at least one shift to get
> binary point in right place).  Since divisor is fixed reciprocal can
> be precomputed.  This works best when there is enough accuracy,
> otherwise one needs to add extra steps.  Working out how
> many bits of accuracy are needed is a bit tricky, in particular
> by adding extra operations one can lower needed accuracy.
> 
>> (The reason for my question is that for integer divisions of length N
>> an old DSP I used required besides shifts effectively N subtractions
>> to create the result and modulus; it didn't use any multiplications.)
> 
> Method above is good when you have fast and wide multiplier (compared
> to your numbers).  Also there is cost of precomputation,
> to gain divisor must be fixed or at least change slowly.
> If you have varying divisor then one can use Newton method
> (IIUC this is what modern CPU-s use).  Shifts and subtractions
> are good when you do not have fast multiplier.

Actually the (meanwhile very old) DSP I used (in the mid 1980's) had
a fast multiplier; + and - were done in one cycle, * (asymptotically)
also one cycle. The interesting part was that the [general] non-const
division was "constructed" (as opposed to pre-existing as an opcode).
All operations (roughly) required just one cycle but the division; it
was constructed by a 'repeat' opcode for the next opcode. So if you
had written something like  LACC addr1 ; RPTK 16 ; SUBC addr2  that
would have divided the data  *addr1 / *addr2  (informally written)
providing both results, the division and the modulus.

Out of interest I've implemented the logic in "C" to demonstrate it
(the variable names reflect the DSP terminology and the sizes of the
data words (16) and accumulator (32), only the '<< 15' is originally
part of the 'subc' opcode/"C" function (but shifts did not cost any
extra cycles, they've been done in passing, sort of):

    void subc (int32_t * acc, int32_t * dma)
    {
        int32_t acc_out;
        if ((acc_out = *acc - *dma) >= 0)
            *acc = acc_out << 1 | 1;
        else
            *acc <<= 1;
    }
....
    int16_t z = 47;
    int16_t n = 6;
    // calculate "z div n" and "z mod n"
    int32_t acc = z;
    int32_t dma = n << 15;

    for (int i=1; i<=16; i++)
        subc (&acc, &dma);

    printf ("%d div %d = %d\n", z, n, acc&0xffff);
    printf ("%d mod %d = %d\n", z, n, acc>>16);

I considered that DSP feature quite interesting. That was from a time
where the usual processors required something like 4 cycles for + and
-, around 16 cycles for *, and 80 cycles for the division. (Numbers
are just faint memories from an 68k CPU, so correct me if I'm wrong.)

Janis