Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Bill Sloman <bill.sloman@ieee.org>
Newsgroups: sci.electronics.design
Subject: Re: DRAM accommodations
Date: Sat, 7 Sep 2024 15:07:29 +1000
Organization: A noiseless patient Spider
Lines: 48
Message-ID: <vbgn2u$171ul$1@dont-email.me>
References: <vbdcrs$gp01$1@dont-email.me> <vbflbe$tlhp$7@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 07 Sep 2024 07:07:43 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="dcc042547c4ea5ce34dc062a0a6e987b";
	logging-data="1279957"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/okmgFNdhWYueqVscXFZcsulB7YdMuEfc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:JlGaPOuhdFQVNcE3HVSpcXvEEag=
X-Antivirus: Norton (VPS 240906-4, 6/9/2024), Outbound message
In-Reply-To: <vbflbe$tlhp$7@dont-email.me>
Content-Language: en-US
X-Antivirus-Status: Clean
Bytes: 3378

On 7/09/2024 5:31 am, Don Y wrote:
> On 9/5/2024 3:54 PM, Don Y wrote:
>> Given the high rate of memory errors in DRAM, what steps
>> are folks taking to mitigate the effects of these?
>>
>> Or, is ignorance truly bliss?  <frown>
> 
> From discussions with colleagues, apparently, adding (external) ECC to
> most MCUs is simply not possible; too much of the memory and DRAM
> controllers are in-built (unlike older multi-chip microprocessors).
> There's no easy way to generate a bus fault to rerun the bus cycle
> or delay for the write-after-read correction.
> 
> And, among those devices that *do* support ECC, it's just a conventional
> SECDED implementation.  So, a fair number of UCEs will plague any
> design with an appreciable amount of DRAM (can you even BUY *small*
> amounts of DRAM??)
> 
> For devices with PMMUs, it's possible to address the UCEs -- sort of.
> But, this places an additional burden on the software and raises
> the problem of "If you are getting UCEs, how sure are you that
> undetected CEs aren't slipping through??"  (again, you can only
> detect the UCEs via an explicit effort so you pay the fee and take
> your chances!)
> 
> For devices without PMMUs, you have to rely on POST or BIST.  And,
> *hope* that everything works in the periods between (restart often!  :> )
> 
> Back of the napkin figures suggest many errors are (silently!) encountered
> in an 8-hour shift.  For XIP implementations, it's mainly data that is at
> risk (though that can also include control flow information from, e.g.,
> the pushdown stack).  For implementations that load their application
> into DRAM, then the code is suspect as well as the data!
> 
> [Which is likely to cause more detectable/undetectable problems?]

Typical software reaction. You design error detection and error
correction into the hardware, and the extra hardware can both correct
most errors (when they can be corrected) and report all of them -
both those corrected and those that couldn't be corrected.
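
A minimal sketch of that correct-and-report split, assuming an
extended Hamming (8,4) code in place of the wider 72/64 codes used on
ECC DIMMs. The names encode() and decode() are made up for
illustration; real hardware does this in logic, not software.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) & 1             # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    overall = sum(code) & 1
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: treat as a single flipped bit. The syndrome gives its
        # position (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped. Report only.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

Every single-bit flip comes back as "corrected" and every double-bit
flip as "uncorrectable", so both kinds can be logged. Three flips in
one word have odd parity, so this decoder would treat them as a single
error and "correct" to the wrong data.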

Data transmission systems can re-transmit damaged packets of data, and
tend to go for checksums that merely detect errors in much longer
words/packets, and reject the affected packets.
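
A rough sketch of that detect-and-retransmit approach, assuming a
CRC-32 appended to each packet. The frame(), check() and
send_with_retry() helpers and the flaky_channel() test channel are all
hypothetical, not any particular protocol.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(packet: bytes):
    """Return the payload if the CRC matches, otherwise None (packet rejected)."""
    payload, received = packet[:-4], int.from_bytes(packet[-4:], "big")
    return payload if zlib.crc32(payload) == received else None


def send_with_retry(payload: bytes, channel, max_tries: int = 3):
    """Send until a packet passes the check; return (payload, attempts used)."""
    for attempt in range(1, max_tries + 1):
        result = check(channel(frame(payload)))
        if result is not None:
            return result, attempt
    raise IOError("packet rejected on every attempt")


def flaky_channel():
    """Hypothetical channel that flips one bit in the first packet only."""
    calls = {"n": 0}

    def channel(packet: bytes) -> bytes:
        calls["n"] += 1
        if calls["n"] == 1:
            damaged = bytearray(packet)
            damaged[0] ^= 0x01
            return bytes(damaged)
        return packet

    return channel
```

Calling send_with_retry(b"hello", flaky_channel()) rejects the damaged
first packet and accepts the resend. Nothing is corrected; the
checksum only says whether to ask again, which a DRAM read cannot do
because no second copy of the data exists.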

-- 
Bill Sloman, Sydney