From: DFS
Subject: Re: "undefined behavior"?
Newsgroups: comp.lang.c
Date: Fri, 14 Jun 2024 19:05:49 -0400
Message-ID: <666ccccb$0$973$882e4bbb@reader.netnews.com>

On 6/14/2024 1:18 PM, David Brown wrote:
> On 13/06/2024 16:38, DFS wrote:
>>
>> I write a little code every few days.  Mostly python.
>
> Certainly if I wanted to calculate some statistics from small data sets,
> I'd go for Python - I would not consider C unless it was for an
> embedded system.
>
>>
>> I like C for its blazing speed.  Very addicting.  And it's much more
>> challenging/frustrating than python.
>
> With small data sets, Python has blazing speed - /every/ language has
> blazing speed.  And for large data sets, use numpy on Python and you
> /still/ have blazing speeds - a lot faster than anything you would write
> in C (because numpy's underlying code is written in C by people who are
> much better at writing fast numeric code than you or I).
>
> The only reason to use C for something like this is for the challenge and
> fun, which is fair enough.

It was fun, especially when I got every stat to match the website exactly.

I just now ported that C stats program to python. The original C took me ~2.5 days to write and test. The port to python then took about 2 hours.
It mainly consisted of replacing printf with print, removing brackets {}, changing vars max and min to dmax and dmin, dropping the \n from the printf calls, replacing fabs() with abs(), etc. Line count dropped about 20%.

During conversion, I got a Python error I don't remember seeing in the past:

"TypeError: list indices must be integers or slices, not float"

because division returns a float, and some of the array addressing was like this: nums[i/2].

My initial fix was this clunky version (wrap every index in int()):

# median and quartiles
# quartiles divide sorted dataset into four sections
# Q1 = median of values less than Q2
# Q2 = median of the data set
# Q3 = median of values greater than Q2
if N % 2 == 0:
    Q2 = median = (nums[int((N/2)-1)] + nums[int(N/2)]) / 2.0
    i = int(N/2)
    if i % 2 == 0:
        Q1 = (nums[int((i/2)-1)] + nums[int(i/2)]) / 2.0
        Q3 = (nums[int(i + ((i-1)/2))] + nums[int(i+(i/2))]) / 2.0
    else:
        Q1 = nums[int((i-1)/2)]
        Q3 = nums[int(i + ((i-1)/2))]

if N % 2 != 0:
    Q2 = median = nums[int((N-1)/2)]
    i = int((N-1)/2)
    if i % 2 == 0:
        Q1 = (nums[int((i/2)-1)] + nums[int(i/2)]) / 2.0
        Q3 = (nums[int(i + (i/2))] + nums[int(i + (i/2) + 1)]) / 2.0
    else:
        Q1 = nums[int((i-1)/2)]
        Q3 = nums[int(i + ((i+1)/2))]

And then with some substitution:

if N % 2 == 0:
    i = int(N/2)
    Q2 = median = (nums[i - 1] + nums[i]) / 2.0
    x = int(i/2)
    y = int((i-1)/2)
    if i % 2 == 0:
        Q1 = (nums[x - 1] + nums[x]) / 2.0
        Q3 = (nums[i + y] + nums[i + x]) / 2.0
    else:
        Q1 = nums[y]
        Q3 = nums[i + y]

if N % 2 != 0:
    i = int((N-1)/2)
    Q2 = median = nums[i]
    x = int(i/2)
    y = int((i-1)/2)
    z = int((i+1)/2)
    if i % 2 == 0:
        Q1 = (nums[x - 1] + nums[x]) / 2.0
        Q3 = (nums[i + x] + nums[i + x + 1]) / 2.0
    else:
        Q1 = nums[y]
        Q3 = nums[i + z]

How would you do it? If you have an easy-to-apply formula for computing the quartiles, let's hear it!
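
One possible simplification, offered as a minimal sketch rather than a definitive answer: since the comments define Q1 and Q3 as the medians of the values below and above Q2, the index arithmetic can be replaced by slicing the sorted list into halves and taking the median of each. The function name quartiles is hypothetical (it is not from the original program); statistics.median is the standard-library function, and // is Python's floor division, which keeps indices as ints and so avoids the TypeError without any int() calls. The sketch assumes at least two values and leaves the middle element out of both halves when N is odd, as the code above does.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Hypothetical helper: Q1 is the median of the lower half, Q3 the
    median of the upper half. For an odd count the middle element
    belongs to neither half. Assumes len(values) >= 2.
    """
    nums = sorted(values)
    n = len(nums)
    half = n // 2                 # floor division yields an int index
    lower = nums[:half]           # values below the median position
    upper = nums[half + n % 2:]   # skip the middle element when n is odd
    return median(lower), median(nums), median(upper)


if __name__ == "__main__":
    print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
    print(quartiles([1, 2, 3, 4, 5, 6, 7]))
```

Note that this is only one of several quartile conventions; other tools may interpolate differently and give slightly different Q1 and Q3 for the same data.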