Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups: comp.lang.c
Subject: Re: filling area by color atack safety
Date: Fri, 29 Mar 2024 23:58:26 -0700
Organization: A noiseless patient Spider
Lines: 97
Message-ID: <86y1a0i4tp.fsf@linuxsc.com>
References: <ut3669$21eur$1@i2pn2.org> <86h6h3nvyz.fsf@linuxsc.com> <865xxiok09.fsf@linuxsc.com> <20240319131842.00002138@yahoo.com> <utbuk3$qed7$1@dont-email.me> <20240319191859.00004bc8@yahoo.com> <86bk79m10l.fsf@linuxsc.com> <20240324202758.00001b75@yahoo.com> <86frwfjs0n.fsf@linuxsc.com> <20240325012844.0000685b@yahoo.com> <867chlk1zf.fsf@linuxsc.com> <20240329162141.00006c81@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Sat, 30 Mar 2024 06:58:29 +0100 (CET)
Injection-Info: dont-email.me; posting-host="8552662d97416c03078d7b971fbb1163";
	logging-data="911328"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18eJqKSEhl4jLRHViBKOKWDlBDGoQMWoLs="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:u+4C0FEmqOfLuatrm9ZKnzjHXQA=
	sha1:RwZ2k04dosBOYKA+L/eNshj8grQ=
Bytes: 5529

Michael S <already5chosen@yahoo.com> writes:

> On Thu, 28 Mar 2024 23:04:36 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Michael S <already5chosen@yahoo.com> writes:
>>
>>>> [..various fill algorithms and how they scale..]
>>>
>>> One thing that I were not expecting at this bigger pictures, is
>>> good performance of simple recursive algorithm.  Of course, not of
>>> original form of it, but of variation with explicit stack.  For
>>> many shapes it has quite large memory footprint and despite that
>>> it is not slow.  Probably the stack has very good locality of
>>> reference.
>>>
>>> [algorithm]
>>
>> You are indeed a very clever fellow.  I'm impressed.
>
> Yes, the use of switch is clever :(

In my view the cleverness is how "recursion" is accomplished by a
means of a combination of using a stack to store a "return address"
and restoring state by undoing a change rather than storing the
old value.  Using a switch() is just a detail (and to my way of
thinking how the switch() is done here obscures the basic idea and
makes the code harder to understand, but never mind that).

> It more resemble computed GO TO in old FORTRAN or indirect jumps
> in asm than idiomatic C switch.  But it is a legal* C.

I did program in FORTRAN briefly but don't remember ever using
computed GO TO.  And yes, I found that missing semicolon and put it
back.  Is there some reason you don't always use -pedantic?  I
pretty much always do.

>> Intrigued by your idea, I wrote something along the same lines,
>> only shorter and (at least for me) a little easier to grok.
>> If someone is interested I can post it.
>
> If non-trivially different, why not?

I hope to soon but am unable to right now (and maybe for a week
or so due to circumstances beyond my control).  For sure the
code is different;  whether it is non-trivially different I
leave for others to judge.

>> I see you have also done a revised algorithm based on the same
>> idea, but more elaborate (to save on runtime footprint?).
>> Still working on formulating a response to that one...
>
> The original purpose of enhancement was to amortize non-trivial
> and probably not very fast call stack emulation logic over more
> than one pixel.  2x2 just happens to be the biggest block that
> still has very simple in-block recoloring logic. ~4x reduction in
> the size of auxiliary memory is just a pleasant side effect.
>
> Exactly the same 4x reduction in memory size could have been
> achieved with single-pixel variant by using packed array for 2-bit
> state (==trace back) stack elements.  But it would be the same or
> slower than original while the enhanced variant is robustly faster
> than original.

An alternate idea is to use a 64-bit integer for 32 "top of stack"
elements, or up to 32 I should say, and a stack with 64-bit values.
Just an idea, it may not turn out to be useful.

The few measurements I have done don't show a big difference in
performance between the two methods.  But I admit I wasn't paying
close attention, and like I said only a few patterns of filling were
exercised.

> After implementing the first enhancement I paid attention that at
> 4K size the timing (per pixel) for few of my test cases is
> significantly worse than at smaller images.  So, I added another
> enhancement aiming to minimize cache trashing effects by never
> looking back at immediate parent of current block.  The info about
> location of the parent nicely fitted into remaining 2 bits of
> stack octet.

The idea of not going back to the originator (what you call the
parent) is something I developed independently before looking at
your latest code (and mostly I still haven't).  Seems like a good
idea.

Two further comments.

One, the new code is a lot more complicated than the previous
code.  I'm not sure the performance gain is worth the cost
in complexity.  What kind of speed improvements do you see,
in terms of percent?

Two, and more important, the new algorithm still uses O(NxN) memory
for an N by N pixel field.  We really would like to get that down to
O(N) memory (and of course run just as fast).  Have you looked into
how that might be done?