Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Zen Microcode
Date: Sun, 9 Mar 2025 16:20:10 -0500
Organization: A noiseless patient Spider
Lines: 154
Message-ID: <vql0ok$u5ff$1@dont-email.me>
References: <m2s3p6F12efU1@mid.individual.net>
 <memo.20250306093652.8812C@jgd.cix.co.uk>
 <dc50aed63fcdb6e73d25d0e68c52c20e@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 09 Mar 2025 22:21:25 +0100 (CET)
Injection-Info: dont-email.me; posting-host="3e2dcc3f4de19925a2c8a9285e1c9832";
	logging-data="988655"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+wMqnWDP9YnN+GOEVHYD6wu+74VqHx5Xw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:e5ADJGjF7XjXcUDjLvAPDFXitBw=
In-Reply-To: <dc50aed63fcdb6e73d25d0e68c52c20e@www.novabbs.org>
Content-Language: en-US
Bytes: 6330

On 3/7/2025 2:29 PM, MitchAlsup1 wrote:
> A "good try" at encryption is what engineers show management
> in order to claim they know what they are doing {{even when
> they really don't}}.
> 
> I was in the meetings where the AMD architecture team discussed
> this "security issue" and I can name names.


Not sure about the specifics of this case.


But, sometimes one can also use encryption mostly as a legal tool (say, 
for anti-tampering).
Like, if it is just bare data, they can't do as much.
But, if encryption or similar is involved, they can bring in the full 
force of the law...


In the latter case, the encryption would often be something like XOR'ing 
with a bit pattern or a Caesar cipher or similar.


Like, say, lazy man's encryption could be something like:
   void encode(void *dst, void *src, int sz, uint64_t key)
   {
     uint64_t *cs, *ct, *cse;
     cs=src; cse=cs+((sz+7)>>3); ct=dst;  //parens needed: '+' binds tighter than '>>'
     while(cs<cse)
       { *ct++=(*cs++)+key; }
   }
   void decode(void *dst, void *src, int sz, uint64_t key)
     { encode(dst, src, sz, (~key)+1); }

Where, in this case, the strength (or lack thereof) doesn't really matter.

If you happen to already know some of the non-encoded data, breaking 
this is trivial (and figuring out 8 bytes is enough to decode the whole 
thing). Only reason to do it 8 bytes at a time (vs 1 byte) is because 8 
bytes is faster.

But, if encoding a known format (say, PE/COFF or WAV or similar), could 
probably crack it very quickly relying on some basic knowledge of the 
file format (eg, where to find magic numbers and blobs of NUL bytes). 
Could potentially break it in under 1000 clock-cycles this way.
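As a rough sketch of the known-plaintext break described above (not 
anyone's actual tooling): the names add_encode, recover_key and 
demo_known_plaintext are made up for illustration, and the buffers are 
assumed to already be padded to a multiple of 8 bytes. One known 8-byte 
word at a known offset (here, a word of NUL bytes) gives the key with a 
single subtract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Additive encoder in the spirit of the example above,
   written over whole 64-bit words. */
static void add_encode(uint64_t *dst, const uint64_t *src,
                       size_t nwords, uint64_t key)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i] + key;
}

/* Known-plaintext recovery: if the plaintext word at index idx is
   known, the key is just ciphertext minus plaintext (mod 2^64). */
static uint64_t recover_key(const uint64_t *cipher, size_t idx,
                            uint64_t known_plain)
{
    return cipher[idx] - known_plain;
}

/* Hypothetical self-check: encode, recover the key from one word
   known to be all NUL bytes, then decode with the negated key. */
static int demo_known_plaintext(void)
{
    const uint64_t plain[4] = {
        0x0000000300905A4DULL, 0, 0x1122334455667788ULL, 0
    };
    const uint64_t key = 0x0123456789ABCDEFULL;
    uint64_t enc[4], dec[4];

    add_encode(enc, plain, 4, key);

    uint64_t k = recover_key(enc, 1, 0);   /* word 1 is known to be 0 */
    assert(k == key);

    add_encode(dec, enc, 4, (~k) + 1);     /* decode = add the negated key */
    return memcmp(dec, plain, sizeof plain) == 0;
}
```

The attacker never sees the key here, only the encoded words and a 
guess about one plaintext word.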



Or, maybe they could make it a little stronger by using PRNG...

   uint64_t permuteKey(uint64_t key)
   {
       uint64_t ckey, cklo, ckhi, cka;
       cklo=((uint32_t)(key>> 0))*0xE20B7AC6ULL;  //*1
       ckhi=((uint32_t)(key>>32))*0xE20B7AC6ULL;
       cka=(ckhi>>32)|((cklo>>32)<<32);
       ckey=key+cka;
       return(ckey);
   }
*1: Written with casts so that each multiply can be turned into a 
(faster) 32-bit widening multiply, since a full 64-bit multiply is 
unreasonably slow. In this case, the multiplies serve to mix the bits 
around somewhat.

   void encode(void *dst, void *src, int sz,
     uint64_t key1, uint64_t key2)
   {
     uint64_t ckey, cka, ckb, ckc, ckstep, v;
     uint64_t *cs, *ct, *cse;
     int n;

     cs=src; cse=cs+((sz+7)>>3); ct=dst;  //parens needed: '+' binds tighter than '>>'

     //setup cost, likely expensive, probably unavoidable
     cka=key1; ckb=key2; ckc=key1^key2;
     ckey=((uint32_t)ckc)*0xE20B7AC6ULL;
     n=(ckey>>32)&63;
     while(n--)
       cka=permuteKey(cka);
     n=(ckey>>38)&63;
     while(n--)
       ckb=permuteKey(ckb);
     n=(ckey>>44)&15;
     while(n--)
       ckc=permuteKey(ckc);

     ckey=cka+ckb; n=64;
     ckstep=ckey+ckc;
     ckey=permuteKey(ckey);      //(strength boost)
     ckstep=permuteKey(ckstep);  //?

     while(cs<cse)
     {
       v=(*cs++);
       n--;
       *ct++=v^ckey;
       ckey+=ckstep;  //weak, but cheap-ish...
       ckstep=(ckstep<<1)^(ckstep>>27); //? (strength boost)

       //permute key, stronger but slow
       if(!n)
       {
         cka=permuteKey(cka);
         ckb=permuteKey(ckb);
         ckc=permuteKey(ckc);
         ckey=cka+ckb;
         ckstep=ckey+ckc;
         ckey=permuteKey(ckey);      //? (strength boost)
         ckstep=permuteKey(ckstep);  //?
         n=64;  //so only do it rarely
       }
     }
   }

Where, it would be no longer sufficient to know N bytes of payload data 
to break it. As for whether it would be acceptably cheap/fast is unknown.

To try to limit computational cost, only permute keys once every 512 
bytes or so (though, it would still be fairly weak within each 512-byte 
block; but doing this too often could negatively affect data throughput).

Could be made faster (say, by working 32 bytes at a time), but would 
probably get a bit too bulky for use as an example here (but, I suspect 
it could be possible to get it within around 80% of memcpy speed with 
some creative unrolling).

Switched to XOR in the example (as the final data-facing step), which 
avoids needing a separate decoder function.
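To illustrate why XOR avoids the separate decoder, here is a cut-down 
sketch of just the inner loop. It assumes the key setup and the periodic 
re-permute are left out, and that ckey/ckstep are simply handed in; 
xor_stream and demo_xor_roundtrip are made-up names, not part of the 
example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the inner loop: XOR each word with ckey,
   advance ckey by ckstep, then stir ckstep. Key setup and the
   every-64-words re-permute are deliberately omitted. */
static void xor_stream(uint64_t *dst, const uint64_t *src, size_t nwords,
                       uint64_t ckey, uint64_t ckstep)
{
    for (size_t i = 0; i < nwords; i++) {
        dst[i] = src[i] ^ ckey;
        ckey += ckstep;
        ckstep = (ckstep << 1) ^ (ckstep >> 27);
    }
}

/* Running the same function twice with the same starting state
   regenerates the same key sequence, so the XORs cancel. */
static int demo_xor_roundtrip(void)
{
    const uint64_t plain[4] = { 1, 2, 3, 0xFFFFFFFFFFFFFFFFULL };
    const uint64_t ckey = 0x9E3779B97F4A7C15ULL;   /* arbitrary values */
    const uint64_t ckstep = 0x0123456789ABCDEFULL;
    uint64_t enc[4], dec[4];

    xor_stream(enc, plain, 4, ckey, ckstep);
    assert(memcmp(enc, plain, sizeof plain) != 0);  /* data was changed */

    xor_stream(dec, enc, 4, ckey, ckstep);
    return memcmp(dec, plain, sizeof plain) == 0;
}
```

The key sequence depends only on the starting state and not on the 
data, which is what makes the encoder its own inverse.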


Or, a possible faster/cheaper intermediate option being to not 
re-permute mid-stream.

Though, if one had a chunk of known data (*2), it could be possible to 
work out the step values (using the power of integer subtract), and 
break the rest. So, probably not sufficient... (Maybe passable if this 
strategy would only break a small chunk of data).

*2: Say, magic numbers or known locations where one is likely to find 
blobs of NUL bytes or similar given the file format.
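A sketch of how that subtract-based break might go for the variant 
without mid-stream re-permute. This assumes the attacker knows two 
adjacent plaintext words (say, 16 NUL bytes at a known offset); stir, 
xor_stream_noperm, break_forward and demo_break_forward are hypothetical 
names, and the key setup is again omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-word update of ckstep, as in the inner loop of the example. */
static uint64_t stir(uint64_t s) { return (s << 1) ^ (s >> 27); }

/* Variant with no mid-stream re-permute; ckey/ckstep taken as given. */
static void xor_stream_noperm(uint64_t *dst, const uint64_t *src, size_t n,
                              uint64_t ckey, uint64_t ckstep)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] ^ ckey;
        ckey += ckstep;
        ckstep = stir(ckstep);
    }
}

/* Attack: plaintext words p0, p1 at idx and idx+1 are known.
   XOR gives the two key words, their difference gives ckstep at idx,
   and from there the key state can be rolled forward to the end.
   Fills out[idx..n-1]; words before idx are not touched. */
static void break_forward(uint64_t *out, const uint64_t *cipher, size_t n,
                          size_t idx, uint64_t p0, uint64_t p1)
{
    assert(idx + 1 < n);
    uint64_t k0 = cipher[idx] ^ p0;
    uint64_t k1 = cipher[idx + 1] ^ p1;
    uint64_t step = k1 - k0;        /* ckstep used between idx and idx+1 */
    uint64_t key = k1;

    out[idx] = p0;
    for (size_t i = idx + 1; i < n; i++) {
        out[i] = cipher[i] ^ key;
        step = stir(step);
        key += step;
    }
}

static int demo_break_forward(void)
{
    /* words 1 and 2 play the role of a known blob of NUL bytes */
    const uint64_t plain[8] = { 0xDEADBEEFULL, 0, 0, 11, 22, 33, 44, 55 };
    uint64_t enc[8], out[8];

    xor_stream_noperm(enc, plain, 8,
                      0x9E3779B97F4A7C15ULL, 0x0123456789ABCDEFULL);
    break_forward(out, enc, 8, 1, 0, 0);
    return memcmp(out + 1, plain + 1, 7 * sizeof(uint64_t)) == 0;
}
```

This only rolls the state forward from the known words. With the 
periodic re-permute in place, the same trick would presumably stop 
working at the next 512-byte boundary.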


Say, it probably at least needs to look like it would be hard to break, 
and not something where someone can look at it and figure out that the 
key could be broken by subtracting pairs of values and then effectively 
having captured the key-state for the whole message...

While, ideally, also not adding too much computational overhead.

Though, not sure where exactly would be the lower bar here (probably 
needs to at least appear like it would work).

....