Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Mon, 3 Feb 2025 02:10:09 -0600
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <vnptl6$15pgm$1@dont-email.me>
References: <5lNnP.1313925$2xE6.991023@fx18.iad> <vnosj6$t5o0$1@dont-email.me>
 <2025Feb3.075550@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 03 Feb 2025 09:10:15 +0100 (CET)
Injection-Info: dont-email.me; posting-host="f4fe680962b51b8bca49b43582a45572";
	logging-data="1238550"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+Rz0pVBWwiTBEt/cX77eH20t+8gOmoPTA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:wlysIQCVOvxGGlQs9PFY7oYWLcE=
In-Reply-To: <2025Feb3.075550@mips.complang.tuwien.ac.at>
Content-Language: en-US
Bytes: 3710

On 2/3/2025 12:55 AM, Anton Ertl wrote:
> BGB <cr88192@gmail.com> writes:
>> On 2/2/2025 10:45 AM, EricP wrote:
>>> Digging deeper with performance counters reveals executing each unaligned
>>> load instruction results in ~505 executed instructions. P550 almost
>>> certainly doesn’t have hardware support for unaligned accesses.
>>> Rather, it’s likely raising a fault and letting an operating system
>>> handler emulate it in software."
>>>
>>
>> An emulation fault, or something similarly nasty...
>>
>>
>> At that point, even turning any potentially unaligned load or store into
>> a runtime call is likely to be a lot cheaper.
> 
> There are lots of potentially unaligned loads and stores.  There are
> very few actually unaligned loads and stores: On Linux-Alpha every
> unaligned access is logged by default, and the number of
> unaligned-access entries in the logs of our machines was relatively
> small (on average a few per day).  So trapping actual unaligned
> accesses was faster than replacing potential unaligned accesses with
> code sequences that synthesize the unaligned access from aligned
> accesses.
> 

Don't treat every C pointer as potentially unaligned.

Rather, have something like an explicit "__unaligned" keyword or 
similar, and use the runtime call only for accesses through pointers 
marked with it.


But, yeah, that assumes one can't just have hardware that handles 
unaligned accesses natively.
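A minimal sketch of what such a runtime call might look like, assuming a little-endian target. The names `rt_load_u32_unaligned` and `rt_store_u32_unaligned` are hypothetical, as is the idea that a compiler would emit calls to them for "__unaligned" pointers. The access is synthesized from byte accesses, which are always aligned and so never trap, at the cost of several instructions per access; the point is that only explicitly marked pointers pay that cost.

```c
#include <stdint.h>

/* Hypothetical runtime helpers a compiler could call for 32-bit loads and
 * stores through pointers marked "__unaligned". Assumes little-endian byte
 * order. Only single-byte accesses are used, so nothing here can raise a
 * misaligned-access trap on strict-alignment hardware. */

static uint32_t rt_load_u32_unaligned(const void *p)
{
    const unsigned char *b = (const unsigned char *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

static void rt_store_u32_unaligned(void *p, uint32_t v)
{
    unsigned char *b = (unsigned char *)p;
    b[0] = (unsigned char)v;
    b[1] = (unsigned char)(v >> 8);
    b[2] = (unsigned char)(v >> 16);
    b[3] = (unsigned char)(v >> 24);
}
```

For example, with `buf` holding the bytes `00 11 22 33 44 ...`, `rt_load_u32_unaligned(buf + 1)` returns `0x44332211`, even though `buf + 1` is not 4-byte aligned. Ordinary (unmarked) pointers would still get plain load/store instructions.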


> Of course, if the cost of unaligned accesses is that high, you will
> avoid them in cases like block copies where cheap unaligned accesses
> would otherwise be beneficial.
> 

Yeah.

Though memcpy() is usually a simple case to fix up: it knows the whole 
copy length up front, so it can align its accesses before doing the 
bulk of the copy.
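A minimal sketch of that fix-up, assuming 8-byte words and non-overlapping buffers. `copy_aligned` is a hypothetical name, not any particular libc's memcpy. When source and destination sit at the same offset within a word, it copies a few head bytes until both are aligned and then moves whole words. Real implementations also handle differing alignments, typically by merging pairs of aligned loads with shifts; this sketch simplifies that case to a byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst (non-overlapping) so
 * that every word-sized access is naturally aligned. */
static void copy_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Word copies only work if both pointers share the same offset
     * within an 8-byte word. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 7) == 0) {
        /* Head: copy bytes until both pointers are 8-byte aligned. */
        while (n > 0 && ((uintptr_t)d & 7) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: both pointers are now 8-byte aligned. The fixed-size
         * memcpy is the portable C spelling of a word load/store (no
         * strict-aliasing issues); whether it becomes one instruction
         * depends on the compiler proving the alignment. */
        while (n >= 8) {
            uint64_t w;
            memcpy(&w, s, 8);
            memcpy(d, &w, 8);
            d += 8;
            s += 8;
            n -= 8;
        }
    }
    /* Tail, or the whole copy when the alignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

The cost of the head and tail loops is amortized over the whole copy, which is why bulk copies are the easy case.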


A harder case is LZ decompression, where byte-by-byte copying is slow, 
but the source and destination of each copy are typically at more or 
less arbitrary alignment (and the vast majority of LZ matches are only 
a few bytes long).
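A minimal sketch of the kind of match copy an LZ decoder performs, to show why alignment fix-ups don't help here. `lz_copy_match` is a hypothetical name and this is not any specific decoder's code. Source and destination overlap whenever the match distance is smaller than the length, and both can be at any alignment. The wide path copies 8 bytes at a time from arbitrary addresses; on targets where the compiler assumes cheap unaligned access, these fixed-size memcpy calls typically become single unaligned loads and stores, which is exactly the kind of code that would crawl on a core that traps and emulates misaligned accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical LZ match copy: append `len` bytes to the output at `dst`,
 * copied from `dist` bytes back (dist >= 1). Returns the new end of the
 * output. */
static unsigned char *lz_copy_match(unsigned char *dst, size_t dist, size_t len)
{
    const unsigned char *src = dst - dist;

    if (dist >= 8) {
        /* Each 8-byte chunk is read entirely from bytes that are already
         * final (src + 7 < dst), so whole-chunk copies are safe even when
         * the match overlaps its source. src and dst are at arbitrary
         * alignment here. */
        while (len >= 8) {
            uint64_t w;
            memcpy(&w, src, 8);
            memcpy(dst, &w, 8);
            src += 8;
            dst += 8;
            len -= 8;
        }
    }
    /* Short matches, the tail, and overlapping matches with dist < 8 are
     * copied byte by byte, which replicates repeating patterns correctly
     * (e.g. dist = 1 repeats a single byte). */
    while (len > 0) {
        *dst++ = *src++;
        len--;
    }
    return dst;
}
```

Since most matches are short, a decoder built this way spends much of its time in either the byte loop or a handful of misaligned wide accesses, and neither is cheap on trap-and-emulate hardware.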

Granted, on most traditional systems, LZ compression is used 
infrequently (IOW: it is not something people throw at everything in 
an attempt to make IO faster).


But, apparently, the older mentality was more like "decompression is 
slow", and not so much "our media devices are slow, but a sufficiently 
fast LZ compressor can make them faster" (and then throwing LZ at a 
whole bunch of IO-related use cases...).


But, yeah, I predict that if one runs an LZ decoder written to assume 
cheap unaligned access on a CPU that traps and emulates misaligned 
accesses, it is going to be very slow.



> - anton