From: pozz
Newsgroups: comp.arch.embedded
Subject: Re: Library for save an events log in Flash
Date: Fri, 19 Apr 2024 16:58:35 +0200

On 19/04/2024 10:28, David Brown wrote:
> On 18/04/2024 22:36, pozz wrote:
>> On 18/04/2024 21:30, David Brown wrote:
>>> On 18/04/2024 16:38, pozz wrote:
>>>> The request is very common: when some interesting events occur in the
>>>> system, they, with the related timestamp, must be saved in a log.
>>>> The log must be saved in an external SPI Flash connected to an MCU.
>>>> The log has a maximum number of events. After the log is completely
>>>> filled, each new event overwrites the oldest event.
>>>>
>>>> I tried to implement such a library, but I eventually found it's not a
>>>> simple task, especially if you want a reliable library that works even
>>>> when errors occur (for example, when a write fails).
>>>>
>>>> I started by assigning 5 sectors of SPI Flash to the log. There are
>>>> 256 events in a sector (the event is a fixed struct).
>>>> In this scenario, the maximum log size is 1024 events, because the
>>>> 5th sector can be erased when the pointer reaches the end of a sector.
>>>>
>>>> The first challenge is how the system can determine, at startup, which
>>>> is the first (newest) event in the log. I solved this by saving a
>>>> 16-bit counter ID with each event. The initialization routine
>>>> reads all the IDs and takes the greatest as the last event.
>>>> However, initially the log is empty, so all the IDs are 0xFFFF, the
>>>> maximum. One solution is to stop reading events when 0xFFFF is read
>>>> and to wrap the ID around at 0xFFFE rather than 0xFFFF.
>>>
>>> Start at 0xfffe, and count down.
>>
>> And what to do when the counter reaches zero? It can wrap around back to
>> 0xfffe (which is very similar to an increasing counter over
>> 0x0000-0xFFFE).
>>
>
> How big are your log entries?  How many entries are realistic in the
> lifetime of the system?

Each entry is 16 bytes. It's difficult to reach 0xFFFF events in the log, but why limit our fantasy? :-D
With 20 events per day, that takes about 9 years. It's a very long life, but I think I have solved the problem of detecting whether the ID has wrapped around.

>>> Or xor with 0xffff for storage.  Or
>>> wrap at 0xfffe, as you suggest.  Or use 32-bit values.  Or have
>>> another way to indicate that the log entry is valid.
>>
>> I will add a CRC to each entry, which can be used to validate the
>> event. An empty/erased slot filled with 0xFF will not pass CRC
>> validation.
>>
>
> That's usually fine.
>
>>
>>> Or, since you have a timestamp, there's no need to track an ID - the
>>> timestamp will be increasing monotonically.
>>
>> I don't want to use timestamps for two reasons:
>>
>> - the system wall clock can be changed (the system is isolated)
>> - the library I'm writing doesn't know the content of "events"; for
>>    it, an event is an opaque sequence of bytes.
>>
>
> OK.
>
>>
>>>> However, there's another problem. What happens after writing 65535
>>>> events in the log? The ID restarts from 0, so the latest event
>>>> no longer has the greatest ID.
>>>>
>>>> These are the saved IDs after 65536 events:
>>>>
>>>>      1^ SECT    2^ SECT    3^ SECT    4^ SECT    5^SECT---------->
>>>>      0xFB00 ... 0xFC00 ... 0xFD00 ... 0xFE00 ... 0xFF00 ... 0xFFFF
>>>>
>>>> The rule "newest event has greatest ID" still holds. Now a new
>>>> event is written:
>>>>
>>>>      1^ SECT-------> 2^ SECT   3^ SECT   4^ SECT   5^SECT--------->
>>>>      0x0000 0xFB01.. 0xFC00 .. 0xFD00 .. 0xFE00 .. 0xFF00 .. 0xFFFF
>>>>
>>>> Now the rule doesn't work anymore. The solution I found is to detect
>>>> the discontinuity. The IDs are consecutive, so the initialization
>>>> routine continues reading while ID(n+1) = ID(n) + 1. When there's a
>>>> gap, the init function stops, having found the ID and position of the
>>>> newest event.
>>>
>>> Make your counts from 0 to 256*5 - 1, then wrap.  Log entry "n" will
>>> be at address n * sizeof(log entry), with up to 256 log entries
>>> blank. Then you don't need to store a number at all.
>>
>> What do you mean by log entry "0"? Is it the oldest or the newest? I
>> think the oldest, because that formula is imho correct in that case.
>>
>> However, it doesn't appear correct once the log has rotated, which
>> happens after writing 5x256+1 events. In this case the newest entry
>> ("n"=1024) is at address 0, not n*sizeof(entry).
>>
>
> (I misread your "5 sectors of an SPI flash chip" as "5 SPI flash chips"
> when first replying.  It makes no real difference to what I wrote, but I
> might have used "chip" instead of "sector".)
>
> You have 256 entries per flash sector, and 5 flash sectors. For the log
> entry number "n" - where "n" is an abstract count that never wraps, your
> index "i" into the flash array is (n % (5*256)). The sector number is
> then (i / 256), and the index into the sector is (i % 256). The
> position in the log is determined directly by the entry number, and you
> don't actually need to store it anywhere.
>
> Think of this a different way - dispense with the log entry numbers
> entirely.
> When you start up, scan the flash to find the next free slot.
> You do this by looking at slot 0 first.  If that is not empty, keep
> scanning until you find a free slot - that's the next free slot.  If
> slot 0 is empty, scan until you have non-empty slots, then keep going
> until you get a free one again, and that's the next free slot.  If you
> never find a used slot, or fail to find a free slot after the non-free
> slots, then your first free slot is slot 0.
>
> Any new logs are then put in this slot, moving forward.  If you need to
> read out old logs, move backwards.  When storing new logs, as you are
> nearing the end of a flash sector (how near depends on the sector erase
> time and how often events can occur), start the erase of the next sector
> in line.

Yes, that is what I already do. However, I disagree with the formula.

The higher-layer application requests log entry 0. What is it? The newest event. My eventlog library should convert 0 to the slot index in the Flash, which is directly related to the Flash address (I don't really need the sector number here).

If the log is empty, event 0 for the application is slot 0 for the eventlog library. However, if there are three events in the log, event 0 for the application is slot 2 for the eventlog library.

The application doesn't know anything about what I called the ID of the event. It's just a number used by the lower-layer eventlog module to find the first free slot at startup.

>>>> But the problems don't stop here. What happens if an event write
>>>> fails? Should I verify the values actually written, by reading them
>>>> back and comparing with the original data? In a reliable system, yes, I
>>>> should.
>>>>
>>>> I was thinking of protecting each event with a CRC, so that I can tell at
>>>> any time if an event is corrupted (badly written). However, the
>>>> possibility of isolated corrupted events increases the
>>>> complexity of the task a lot.
>>>
>>> An 8-bit or 16-bit CRC is peanuts to calculate and check.
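
For reference, here is a minimal C sketch of such a CRC check, together with the startup scan and the newest-first lookup discussed above. Everything in it is an assumption made for illustration: the RAM array flash_image stands in for the SPI flash, the layout of 14 payload bytes followed by a 2-byte CRC is hypothetical, CRC-16/CCITT-FALSE is just one common choice, and none of the function names come from an existing library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE    16u                /* 16-byte entries, as in the post */
#define PAYLOAD_SIZE (SLOT_SIZE - 2u)   /* assumed: last 2 bytes hold the CRC */
#define SLOTS_TOTAL  (5u * 256u)        /* 5 sectors x 256 entries */

/* Hypothetical stand-in for the SPI flash (a real driver would go here). */
static uint8_t flash_image[SLOTS_TOTAL][SLOT_SIZE];

typedef enum { SLOT_EMPTY, SLOT_VALID, SLOT_CORRUPT } slot_state_t;

/* Simulate a full erase: NOR flash reads back 0xFF after erasing. */
static void log_sim_erase_all(void)
{
    memset(flash_image, 0xFF, sizeof flash_image);
}

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final xor. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Store payload plus big-endian CRC in one slot (no read-back verify here). */
static void log_write_slot(uint32_t slot, const uint8_t payload[PAYLOAD_SIZE])
{
    uint16_t crc = crc16_ccitt(payload, PAYLOAD_SIZE);
    memcpy(flash_image[slot], payload, PAYLOAD_SIZE);
    flash_image[slot][SLOT_SIZE - 2u] = (uint8_t)(crc >> 8);
    flash_image[slot][SLOT_SIZE - 1u] = (uint8_t)(crc & 0xFFu);
}

/* All bytes 0xFF means erased; otherwise the CRC decides valid vs corrupt. */
static slot_state_t slot_state(uint32_t slot)
{
    const uint8_t *p = flash_image[slot];
    bool erased = true;
    for (size_t i = 0; i < SLOT_SIZE; i++) {
        if (p[i] != 0xFFu) { erased = false; break; }
    }
    if (erased) return SLOT_EMPTY;
    uint16_t stored = (uint16_t)((p[SLOT_SIZE - 2u] << 8) | p[SLOT_SIZE - 1u]);
    return (crc16_ccitt(p, PAYLOAD_SIZE) == stored) ? SLOT_VALID : SLOT_CORRUPT;
}

/* Startup scan as described in the quoted text: find the next free slot. */
static uint32_t log_find_next_free(void)
{
    uint32_t i = 0;
    if (slot_state(0) == SLOT_EMPTY) {
        /* Skip the leading erased region. */
        while (i < SLOTS_TOTAL && slot_state(i) == SLOT_EMPTY) i++;
        if (i == SLOTS_TOTAL) return 0;      /* no used slot at all */
    }
    /* Skip the run of used (valid or corrupt) slots. */
    while (i < SLOTS_TOTAL && slot_state(i) != SLOT_EMPTY) i++;
    return (i == SLOTS_TOTAL) ? 0 : i;       /* no free slot after: wrap to 0 */
}

/* Map application index k (0 = newest) to a slot, walking backwards.
 * The caller must ensure k is smaller than the number of stored events. */
static uint32_t log_newest_to_slot(uint32_t next_free, uint32_t k)
{
    assert(next_free < SLOTS_TOTAL && k < SLOTS_TOTAL);
    return (next_free + SLOTS_TOTAL - 1u - k) % SLOTS_TOTAL;
}
```

With three entries written to slots 0..2, log_find_next_free() returns 3 and log_newest_to_slot(3, 0) returns 2, which matches the three-event example above. A slot whose 16 bytes are all 0xFF is treated as erased before the CRC is looked at, so the sketch does not rely on an erased slot failing the CRC; the price is that one particular payload/CRC combination (all 0xFF) can never be stored.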
>>
>> I know. Here the increased complexity wasn't related to the CRC
>> calculation, but to the possibility of having isolated corrupted slots
>> in the buffer. Taking these corrupted slots into account isn't so
>> simple for me.
>>
>
> Think how such corruption could happen, and its consequences.  For most
> event logs, it is simply not going to occur in the lifetime of working
> products - and if it does, as an isolated error in an event log, it
> doesn't matter significantly.  Errors in a sensibly designed SPI NOR
> flash system would be an indication of serious hardware problems such as
> erratic power supplies, and then the log is the least of your concerns.
>
> The only thing to consider is a reset or power failure in the middle of
> writing a log event.

Yes, I agree with you.

>>> Write the whole log entry except for a byte or two (whatever is the
========== REMAINDER OF ARTICLE TRUNCATED ==========