Path: ...!3.eu.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.in-chemnitz.de!news.swapon.de!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: "undefined behavior"?
Date: Thu, 13 Jun 2024 15:15:55 +0200
Organization: A noiseless patient Spider
Lines: 233
Message-ID: <v4erec$29e2g$1@dont-email.me>
References: <666a095a$0$952$882e4bbb@reader.netnews.com>
 <v4d4h5$1rc9e$1@dont-email.me> <666a2146$0$950$882e4bbb@reader.netnews.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 13 Jun 2024 15:15:56 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a4019a0b35be2744ae8acc392e7d37ca";
	logging-data="2406480"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/swQA2oqhg7BCByIflm0FtxUMxOw0hNW0="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.0
Cancel-Lock: sha1:5jpC1V9QCzsweL/fv2z34AiAkh8=
Content-Language: en-GB
In-Reply-To: <666a2146$0$950$882e4bbb@reader.netnews.com>
Bytes: 10839

On 13/06/2024 00:29, DFS wrote:
> On 6/12/2024 5:38 PM, David Brown wrote:
>> On 12/06/2024 22:47, DFS wrote:
>>> Wrote a C program to mimic the stats shown on:
>>>
>>> https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
>>>
>>> My code compiles and works fine - every stat matches - except for one 
>>> anomaly: when using a dataset of consecutive numbers 1 to N, all 
>>> values  > 40 are flagged as outliers.  Up to 40, no problem.  Random 
>>> numbers dataset of any size: no problem.
>>>
>>> And values 41+ definitely don't meet the conditions for outliers 
>>> (using the IQR * 1.5 rule).
>>>
>>> Very strange.
>>>
>>> Edit: I just noticed I didn't initialize a char:
>>> before: char outliers[100];
>>> after : char outliers[100] = "";
>>>
>>> And the problem went away.  Reset it to before and problem came back.
>>>
>>> Makes no sense.  What could cause the program to go FUBAR at data 
>>> point 41+ only when the dataset is consecutive numbers?
>>>
>>> Also, why doesn't gcc just do you a solid and initialize to "" for you?
>>>
>>
>> It is /really/ difficult to know exactly what your problem is without 
>> seeing your C code!  There may be other problems that you haven't seen 
>> yet.
> 
> The outlier section starts on line 169
> =====================================================================================

<snip>

Apart from the initialisation issue, I would suggest you re-consider the 
way you add strings to the "outliers" buffer.  If there are too many of 
them, it will overflow - there's nothing to stop you putting more than 
200 characters into it.  I would recommend dropping the "temp" variable 
and instead keeping track of a pointer to the terminating null character 
of your current "outliers" string.  Use "snprintf" to "print" directly 
into the string, rather than going via "temp", and use the return value 
of "snprintf" to update your end pointer.  You will easily be able to 
avoid the risk of overrun, while also being slightly more efficient.
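
As a rough sketch of that idea (not your actual code - the helper name 
"append_outlier" and its parameters are made up for illustration, and it 
tracks the end of the string as an offset rather than a pointer, which 
amounts to the same thing):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: append "value " to the string in buf.
   size is the total size of buf, *len is the current string length
   (the caller must keep *len < size).
   Returns 1 on success, 0 if the text did not fit; on failure the
   string is left exactly as it was. */
static int append_outlier(char *buf, size_t size, size_t *len, int value)
{
    size_t room = size - *len;   /* bytes left, including the terminator */

    /* snprintf never writes more than "room" bytes, and returns the
       length the output would have had without truncation. */
    int n = snprintf(buf + *len, room, "%d ", value);

    if (n < 0 || (size_t)n >= room) {
        buf[*len] = '\0';        /* discard any truncated partial output */
        return 0;
    }
    *len += (size_t)n;           /* the end of the string has moved along */
    return 1;
}
```

The caller starts with an empty string and a length of 0, calls the 
helper once per outlier, and can stop (or report the problem) as soon as 
it returns 0.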

The line:

	outliers[strlen(outliers)] = '\0';

is completely useless.  "strlen" starts at the beginning of "outliers", 
and counts along until it finds a null character - thus either 
"outliers[strlen(outliers)]" is already equal to '\0', or your attempt 
at calculating "strlen" with an overrun buffer will lead to more 
undefined behaviour.

> 
>> Non-static local variables without initialisers have "indeterminate" 
>> value if there is no initialiser.  Trying to use these "indeterminate" 
>> values is undefined behaviour - you have absolutely no control over 
>> what might happen.  Any particular behaviour you see is down to luck 
>> from the rest of the code and what happened to be in memory at the time.
> 
> In 2024 that's surprising.  I can't be the only one to forget to 
> initialize a char[] variable.
> 

You are not - attempting to use an uninitialised variable is a common 
error.  That is why C compilers provide warnings about this kind of 
thing, along with run-time tools like the sanitizers Ben recommended, to 
help find such mistakes.  But compiler vendors can't force people to use 
such tools and warning flags, nor can the tools find /all/ cases of 
errors.  At some point, programmers have to take responsibility for 
knowing the language they are using, and writing their code correctly. 
Good tools and good use of those tools is an aid to careful coding, not 
an alternative to it.
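
To make that concrete, here is a small sketch.  The function and the 
file name "stats.c" are invented for the example, and exactly which 
diagnostics you get depends on the compiler version and optimisation 
level, so treat the comments as typical behaviour rather than a promise:

```c
#include <stddef.h>

/* Typical command lines (file name is hypothetical):
 *
 *   gcc -Wall -Wextra -O2 stats.c
 *   gcc -g -fsanitize=address,undefined stats.c
 *
 * The first asks for compile-time warnings.  The second builds with
 * run-time checks that can catch errors such as buffer overruns when
 * the program is actually run.
 */

/* Sum n values.  If the "= 0" initialiser below were removed, "total"
   would be read while still indeterminate - the kind of mistake that
   compiler warnings can often, but not always, point out. */
long sum_values(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}
```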

> 
> 
>> There is no automatic initialisation of non-static local variables, 
>> because that would often be inefficient. 
> 
> It would've saved me half an hour of frustration.

And the things you have learned as a result - from your own debugging, 
and the threads here - will save you many more hours of frustration in 
the future.

There are languages that focus on ease of use and do all the management 
of things like strings and buffers, and prevent users from mistakes like 
this, at the cost of slower run-times.  There are languages that do very 
little automatically for the programmer and have absolutely minimal 
overheads, for maximal efficiency.  C is the latter kind of language.

Remember, while you might see automatic initialisation of local 
variables as a negligible overhead, other people might not - I've worked 
on C code for microcontrollers where a wasted processor cycle or two is 
too much.  If your code does not care about such efficiencies, then you 
have to question whether C is the right language in the first place.  I 
believe most modern code that is written in C would be better if it were 
written in other higher level languages (precisely because a half hour 
of /your/ time is usually more valuable than a few microseconds of your 
computer's time).


On the subject of initialisation, I strongly suggest that you do /not/ 
get in the habit of always initialising your variables to 0 when you 
define them.  Do that only if 0 is the real, appropriate starting value.
Prefer to avoid declaring the variable at all until you need it, then 
define it with its initial value (and consider making it "const" to 
reduce the risk of other coding errors).  If the structure of the code 
requires you to define the variable before you have a value for it, 
prefer to leave it without an initial value.  Then compiler warnings 
have a much better chance of spotting mistakes.
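
A small illustration of that style, using a made-up "median_sorted" 
function (not taken from your program) that assumes the input is already 
sorted and has at least one element:

```c
#include <stddef.h>

/* Hypothetical example: median of an already-sorted array, n >= 1. */
double median_sorted(const double *sorted, size_t n)
{
    /* Defined only once its real value is known, and never changed. */
    const size_t mid = n / 2;

    if (n % 2 == 1) {
        return sorted[mid];
    }

    /* These are not declared until this branch actually needs them,
       so there is never a moment when they hold a dummy value. */
    const double lower = sorted[mid - 1];
    const double upper = sorted[mid];
    return (lower + upper) / 2.0;
}
```

None of the variables here ever needs a placeholder 0, and the "const" 
means the compiler will complain if later code modifies one by accident.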

> 
> Now I'm getting 'stack smashing detected' errors (after the program runs 
> correctly) when using datasets of consecutive numbers.
> 

I think Ben found that buffer overrun for you, and showed you how to 
find it yourself in the future.

> hmmmm 2 issues in a row using consecutives - that's a clue!
> 
> 
> 
>> The best way to avoid errors like yours, IMHO, is not to declare such 
>> variables until you have data to put in them - thus you always have a 
>> sensible initialiser of real data.  Occasionally that is not 
>> practical, but it works in most cases.
> 
> Data is definitely going in them: either the value 'none' or a list of 
> the outliers and some text.
> 

Now that I have your source code, I can see the error is the way you put 
data in - strcat() reads the existing data, it does not just write data.
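
A minimal sketch of why that matters (the helper "join_two" is 
hypothetical, and it assumes the destination is big enough for both 
strings, a space and the terminator):

```c
#include <string.h>

/* Hypothetical helper: build "a b" in dst.
   dst must have room for strlen(a) + strlen(b) + 2 bytes. */
void join_two(char *dst, const char *a, const char *b)
{
    /* strcat() first scans dst for its terminating null character and
       only then starts writing.  Making dst an empty string here gives
       it a terminator to find; without this line it would scan
       whatever happened to be in the buffer. */
    dst[0] = '\0';

    strcat(dst, a);
    strcat(dst, " ");
    strcat(dst, b);
}
```

Writing the initialiser "" on the array definition has the same effect 
as the "dst[0] = '\0'" line: it guarantees strcat() a terminator at the 
start of the buffer.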

> 
> 
>> For a data array, zero initialisation is common.  Typically you do 
>> this with :
>>
>>      int xs[100] = { 0 };
>>
>> That puts the explicit 0 in the first element of xs, and then the rest 
>> of the array is cleared with zeros.
> 
>> I recommend never using "char" as a type unless you really mean a 
>> character, limited to 7-bit ASCII.  So if your "outliers" array really
>> is an array of such characters, "char" is fine.  If it is intended to 
>> be numbers and for some reason you specifically want 8-bit values, use 
>> "uint8_t" or "int8_t", and initialise with { 0 }.
> 
> I did mean characters, limited to: 0-9a-zA-Z()

OK.

> 
> I think I'm using the char variable correctly.
>   sprintf(tempchar,"%d ",outlier);
>   strcat(char,tempchar);

Yes.  Without your source code, I could only guess.

But see earlier in this post for a suggestion to improve your use of the 
variable.

> 
> 
>> A major lesson here is to learn how to use your tools.  C is not a 
>> forgiving language.  Make use of all the help your tools can give you 
>> - enable warnings here.  "gcc -Wall" enables a range of common 
>> warnings with few false positives in normal well-written code, 
========== REMAINDER OF ARTICLE TRUNCATED ==========