Path: ...!2.eu.feeder.erje.net!feeder.erje.net!newsfeed.bofh.team!paganini.bofh.team!not-for-mail
From: Waldek Hebisch <antispam@fricas.org>
Newsgroups: comp.lang.c
Subject: Re: Top 10 most common hard skills listed on resumes...
Date: Mon, 9 Sep 2024 23:58:45 -0000 (UTC)
Organization: To protect and to server
Message-ID: <vbo23j$2hibc$1@paganini.bofh.team>
References: <vab101$3er$1@reader1.panix.com>   <vak7c0$2ufit$1@raubtier-asyl.eternal-september.org> <vaki4u$303sg$1@dont-email.me> <vakjff$30c4f$1@raubtier-asyl.eternal-september.org> <val7d6$33e83$1@dont-email.me> <vamb0t$3btll$2@raubtier-asyl.eternal-september.org> <vamqfc$3e42u$1@dont-email.me> <20240828134956.00006aa3@yahoo.com> <van4v1$3fgjj$1@dont-email.me> <vbl591$286j0$1@paganini.bofh.team> <vbmfee$2bn2v$3@dont-email.me> <vbn15a$2fvl0$1@paganini.bofh.team> <vbn376$2empp$1@dont-email.me>
Injection-Date: Mon, 9 Sep 2024 23:58:45 -0000 (UTC)
Injection-Info: paganini.bofh.team; logging-data="2673004"; posting-host="WwiNTD3IIceGeoS5hCc4+A.user.paganini.bofh.team"; mail-complaints-to="usenet@bofh.team"; posting-account="9dIQLXBM7WM9KzA+yjdR4A";
User-Agent: tin/2.6.2-20221225 ("Pittyvaich") (Linux/6.1.0-9-amd64 (x86_64))

David Brown <david.brown@hesbynett.no> wrote:
> On 09/09/2024 16:36, Waldek Hebisch wrote:
>> David Brown <david.brown@hesbynett.no> wrote:
>>> On 08/09/2024 23:34, Waldek Hebisch wrote:
>>>> David Brown <david.brown@hesbynett.no> wrote:
>>>>>
>>>>> And while microcontrollers sometimes have a limited form of branch
>>>>> prediction (such as prefetching the target from cache), the more
>>>>> numerous and smaller devices don't even have instruction caches.
>>>>> Certainly none of them have register renaming or speculative execution.
>>>>
>>>> IIUC STM4 series has cache, and some of them are not so big.  There
>>>> are now several Chinese variants of STM32F103 and some of them have
>>>> caches (some very small like 32 words, IIRC one has 8 words and it
>>>> is hard to decide if this is a very small cache or a big prefetch buffer).
>>>
>>> There are different kinds of cache here.  Some of the Cortex-M cores
>>> have optional caches (i.e., the microcontroller manufacturer can choose
>>> to have them or not).
>>>
>>> <https://en.wikipedia.org/wiki/ARM_Cortex-M#Silicon_customization>
>> 
>> I do not see relevant information at that link.
> 
> There is a table of the Cortex-M cores, with the sizes of the optional 
> caches.
> 
>>   
>>> Flash memory, flash controller peripherals, external memory interfaces
>>> (including things like QSPI) are all specific to the manufacturer,
>>> rather than part of the Cortex M cores from ARM.  Manufacturers can do
>>> whatever they want there.
>> 
>> AFAIK a typical Cortex-M design has the core connected to a "bus matrix".
>> It is up to the chip vendor to decide what else is connected to the bus matrix.
> 
> Yes.
> 
> However, there are other things connected before these crossbar 
> switches, such as tightly-coupled memory (if any).

TCM is _not_ a cache.

>  And the cpu caches 
> (if any) are on the cpu side of the switches.

Caches are attached where the system designer thinks they are useful
(and possible).  The word "cache" has a well-established meaning and
ARM (or you) has no right to redefine it.

>  Manufacturers also have a 
> certain amount of freedom of the TCMs and caches, depending on which 
> core they are using and which licenses they have.
> 
> There is a convenient diagram here:
> 
> <https://www.electronicdesign.com/technologies/embedded/digital-ics/processors/microcontrollers/article/21800516/cortex-m7-contains-configurable-tightly-coupled-memory>
> 
>> For me it does not matter if it is ARM design or vendor specific.
>> Normal internal RAM is accessed via bus matrix, and in MCU-s that
>> I know about it is fast enough that a cache is not needed.  So caches
>> come into play only for flash (and possibly external memory, but
>> design with external memory probably will be rather large).
>> 
> 
> Typically you see data caches on faster Cortex-M4 microcontrollers with 
> external DRAM, and it is also standard on Cortex-M7 devices.  For the 
> faster chips, internal SRAM on the AXI bus is not fast enough.  For 
> example, the NXP i.mx RT106x family typically run at 528 MHz core clock, 
> but the AXI bus and cross-switch are at 133 MHz (a quarter of the 
> speed).  The tightly-coupled memories and the caches run at full core speed.

OK, if you run the core at a faster clock than the bus matrix, then a cache
attached on the core side makes a lot of sense.  And since the cache has to
compensate for the lower bus speed it must be reasonably large.  But
if you look at devices where the bus matrix runs at the same clock
as the core, then it makes sense to put the cache on the other side.

>> It seems that vendors do not like to say that they use cache, instead
>> they use misleading terms like "flash accelerator".
> 
> That all depends on the vendor, and on how the flash interface 
> controller works.  Vendors do like to use terms that sound good, of course!
> 
>> 
>>> So a "cache" of 32 words is going to be part of the flash interface, not
>>> a cpu cache
>> 
>> Well, caches never were part of CPU proper, they were part of
>> memory interface.  They could act for whole memory or only for the part
>> that needs it (like flash).  So I do not understand what "not a cpu
>> cache" is supposed to mean.  More relevant is whether such a thing acts
>> as a cache: a 32 word thing almost surely will act as a cache,
>> an 8 word thing may be a simple FIFO buffer (or may act smarter,
>> showing behaviour typical of caches).
>> 
> 
> Look at the diagram in the link I gave above, as an example.  CPU caches 
> are part of the block provided by ARM and are tightly connected to the 
> processor.  Control of the caches (such as for enabling them) is done by 
> hardware registers provided by ARM, alongside the NVIC interrupt 
> controller, SysTick, MPU, and other units (depending on the exact 
> Cortex-M model).
> 
> This is completely different from the small buffers that are often 
> included in flash controllers or external memory interfaces as 
> read-ahead buffers or write queues (for RAM), which are as external to the 
> processor core as SPI, UART, PWM, ADC, and other common blocks provided 
> by the microcontroller manufacturer.

The discussion started about possible interaction of caches
and virtual function dispatch.  This interaction does not depend
on whether you call it a cache.  It depends on cache hits/misses,
their cost and possible eviction.  And actually small caches
can give "interesting" behaviour: with a small code footprint there
may be a 100% hit ratio, but one extra memory reference may lead
to significant misses.  And even small caches behave differently
than simple buffers.

> 
>>> (which are typically 16KB - 64KB,
>> 
>> I wonder where you found this figure.  Such size is typical for
>> systems bigger than MCU-s.  It could be useful for MCU-s with
>> flash on a separate die, but with flash on the same die as CPU
>> much smaller cache is adequate.
> 
> Look at the Wikipedia link I gave.  Those are common sizes for the 
> Cortex-M7 (which is pretty high-end), and for the newer generation of 
> Cortex-M35 and Cortex-M5x parts.  I have on my desk an RT1062 with a 
> 600 MHz Cortex-M7, 1 MB internal SRAM, 32 KB I and D caches, and 
> external QSPI flash.

OK, as I wrote it makes sense for them.  But for smaller machines
much smaller caches may be adequate.

>> 
>>> and only found on bigger
>>> microcontrollers with speeds of perhaps 120 MHz or above).  And yes, it
>>> is often fair to call these flash caches "prefetch buffers" or
>>> read-ahead buffers.
>> 
>> Typical code has enough branches that simple read-ahead beyond 8
>> words is unlikely to give good results.  OTOH delivering things
>> that were accessed in the past and are still present in the cache
>> gives good results even with very small caches.
> 
> There are no processors with caches smaller than perhaps 4 KB - it is 
> simply not worth it.

Historically there were processors with small caches: 256 B in
Motorola chips and, I think, smaller ones too.  It depends on the whole
design.  Currently for "big" processors really small caches seem
to make no sense.  Microcontrollers have their own constraints.
A manufacturer may decide that a cache giving 10% average improvement
is not worth the uncertainty of execution time.  Or may decide that
a small cache is the cheapest way to get better benchmark figures.

>  Read-ahead buffers on flash accesses are helpful, 
> however, because most code is sequential most of the time.  It is common 
> for such buffers to be two-way, and to have between 16 and 64 bytes per 
> way.

If you carefully read the description of the STM "flash accelerator", it is
clear that this is a classic cache, with line size matched to flash,
something like 2-way set associativity, conflicts and eviction.
Historically there were variations: some caches only cache targets
of jumps and use a prefetch buffer for linear code.  Such caches
can be effective at very small size.

-- 
                              Waldek Hebisch