Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: Baby X is bor nagain
Date: Tue, 18 Jun 2024 14:48:15 +0100
Organization: A noiseless patient Spider
Lines: 172
Message-ID: <v4s370$1cgrl$1@dont-email.me>
References: <v494f9$von8$1@dont-email.me>
 <v49seg$14cva$1@raubtier-asyl.eternal-september.org>
 <v49t6f$14i1o$1@dont-email.me>
 <v4bcbj$1gqlo$1@raubtier-asyl.eternal-september.org>
 <v4bh56$1hibd$1@dont-email.me> <v4c0mg$1kjmk$1@dont-email.me>
 <v4c8s4$1lki1$4@dont-email.me> <20240613002933.000075c5@yahoo.com>
 <v4emki$28d1b$1@dont-email.me> <20240613174354.00005498@yahoo.com>
 <v4okn9$flpo$2@dont-email.me> <20240617002924.597@kylheku.com>
 <v4pddb$m5th$1@dont-email.me> <20240618115650.00006e3f@yahoo.com>
 <v4rv0o$1b7h1$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 18 Jun 2024 15:48:16 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="53025cad3a0039377ef43824bbd67ea2";
	logging-data="1459061"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18G8PLuVyN7QjNE9k12Dcxd"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:SLPlphy4NSVLpFb3rhZI9Zx86+Q=
Content-Language: en-GB
In-Reply-To: <v4rv0o$1b7h1$1@dont-email.me>
Bytes: 8823

On 18/06/2024 13:36, David Brown wrote:
> On 18/06/2024 10:56, Michael S wrote:
>> On Mon, 17 Jun 2024 15:23:55 +0200
>> David Brown <david.brown@hesbynett.no> wrote:
>>
>>> I use Python rather than C because for
>>> PC code, that can often involve files, text manipulation, networking,
>>> and various data structures, the Python code is at least an order of
>>> magnitude shorter and faster to write.  When I see the amount of
>>> faffing around in order to read and parse a file consisting of a list
>>> of integers, I find it amazing that anyone would actively choose C
>>> for the task (unless it is for the fun of it).
>>>
>>
>> The faffing (what does it mean, BTW ?) is caused by unrealistic
>> requirements. More specifically, by requirements of (A) to support
>> arbitrary line length (B) to process file line by line. Drop just one
>> of those requirements and everything becomes quite simple.
> 
> "Faffing around" or "faffing about" means messing around doing 
> unimportant or unnecessary things instead of useful things.  In this 
> case, it means writing lots of code for handling memory management to 
> read a file instead of using a higher-level language and just reading 
> the file.
> 
> Yes, dropping requirements might make the task easier in C.  But you 
> still don't get close to being as easy as it is in a higher level 
> language.  (That does not have to be Python - I simply use that as an 
> example that I am familiar with, and many others here will also have at 
> least some experience of it.)
> 
>>
>> For a task like that Python could indeed be several times shorter, but
>> only if you wrote your python script exclusively for yourself, cutting
>> all corners, like not providing short help for user, not testing that
>> input format matches expectations and most importantly not reporting
>> input format problems in potentially useful manner.
> 
> No, even if that were part of the specifications, it would still be far 
> easier in Python.  The brief Python samples I have posted don't cover 
> such user help, options, error checking, etc., but that's because they 
> are brief samples.
> 
>> OTOH, if we write our utility in a more "anal" manner, as we should if
>> we expect it to be used by other people or by ourselves a long time after
>> it was written (at my age, a couple of months is long enough and I am not
>> that much older than you) then the code size difference between the Python
>> and C variants will be much smaller, probably a factor of 2 or so.
> 
> Unless half the code is a text string for a help page, I'd expect a 
> bigger factor.  And I'd expect the development time difference to be an 
> even bigger factor - with Python you avoid a number of issues that are 
> easy to get wrong in C (such as memory management).  Of course that 
> would require a reasonable familiarity with both languages for a fair 
> comparison.
> 
> C and Python are both great languages, with their pros and cons and 
> different areas where they shine.  There can be good reasons for writing 
> a program like this in C rather than Python, but C is often used without 
> good technical reasons.  To me, it is important to know a number of 
> tools and pick the best one for any given job.
> 
>>
>> W.r.t. faster to code, it very strongly depends on familiarity.
>> You haven't done that sort of task in 'C' since your school days, right?
>> Or ever? And you are doing them in Python quite regularly? Then that is
>> a much bigger reason for the difference than the language itself.
> 
> Sure - familiarity with a particular tool is a big reason for choosing it.
> 
>> Now, for more complicated tasks Python, as a language, and even more
>> importantly, Python as a massive set of useful libraries could have
>> a very big productivity advantage over 'C'. But it does not apply to a
>> very simple thing like reading numbers from a text file.
> 
> IMHO, it does.  I have slightly lost track of which programs were being 
> discussed in which thread, but the Python code for the task is a small 
> fraction of the size of the C code.  I agree that if you want to add 
> help messages and nicer error messages, the difference will go down.
> 
> Here is a simple task - take a file name as a command-line argument, 
> then read all white-space (space, tab, newlines, mixtures) separated 
> integers.  Add them up and print the count, sum, and average (as an 
> integer).  Give a brief usage message if the file name is missing, and a 
> brief error if there is something that is not an integer.  This should 
> be a task that you see as very simple in C.
> 
> 
> #!/usr/bin/python3
> import sys
> 
> if len(sys.argv) < 2 :
>      print("Usage: sums.py <input-file>")
>      sys.exit(1)
> 
> data = list(map(int, open(sys.argv[1], "r").read().split()))
> n = len(data)
> s = sum(data)
> print("Count: %i, sum %i, average %i" % (n, s, s // n))

A rather artificial task that you have chosen so that the main body can 
be done as a Python one-liner.

Some characteristics of how it is done: the whole file is read into 
memory as effectively a single string, and all the numbers are collated 
into an in-memory array before they are processed.

Numbers are also conveniently separated by white-space (no commas!), so 
that .split can be used.

You are also relying on Python's arbitrarily large integers, which 
avoid any overflow on that sum.

A C version wouldn't have all those built-ins to draw on (presumably you 
expect the starting point to be 'int main(int n, char** args){}'; using 
existing libraries is not allowed).

Some would write it so that the file is processed serially and doesn't 
have to occupy memory, perhaps because it needs to deal with files that 
might fill up memory.

They might also try to avoid building a large data[] array that may 
need to grow in size unless the bounds are determined in advance.
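
For illustration only, here is a minimal sketch of that serial approach.
It is written in Python so it can be compared with the quoted script; it
is not a C version and not code from either poster. The name
stream_stats and the chunk_size parameter are hypothetical. It reads the
file in fixed-size chunks and keeps only a running count and sum, so
neither the whole file nor a data[] list is held in memory.

```python
import io


def stream_stats(f, chunk_size=1 << 16):
    """Return (count, total) for whitespace-separated integers in text file f.

    Hypothetical helper: reads f in chunks and never builds a list of values.
    """
    count = 0
    total = 0
    tail = ""  # partial token carried over from the previous chunk
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        parts = (tail + chunk).split()
        if not chunk[-1].isspace():
            # The chunk ends mid-token: hold the last piece back.
            tail = parts.pop() if parts else ""
        else:
            tail = ""
        for token in parts:
            total += int(token)  # raises ValueError on a non-integer token
            count += 1
    if tail:
        # The input did not end with whitespace; process the final token.
        total += int(tail)
        count += 1
    return count, total


if __name__ == "__main__":
    # A tiny chunk size is used here only to exercise the carry-over logic.
    sample = io.StringIO("10 20\t30\n-5\n")
    n, s = stream_stats(sample, chunk_size=4)
    print("Count: %i, sum %i, average %i" % (n, s, s // n))
    # prints "Count: 4, sum 55, average 13"
```

With a real file this would be called as stream_stats(open(name, "r")).
The sum is still a Python arbitrary-precision integer, so the overflow
question raised above is not addressed by this sketch.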

The C version would be doing it in a different manner, and would likely 
be more efficient.

I haven't tried it directly in C (I don't have a C 'readfile' to hand); 
I tried it in my language on a 100MB test input of 15M random numbers 
ranging up to one million.

It took just under 0.5 seconds. When I optimised it via C and gcc -O3, it 
took just over 0.3 seconds (so the C was 50% faster).

In CPython, your version took 6 seconds, and PyPy was 4.8 seconds.

With a more arbitrary input format, this would be the kind of job that a 
compiler's lexer does. But nobody seriously writes lexers in Python.


(This is the main program from my attempt; not C, but equally low level:

-------------------
proc main=
     int n:=0, x, length:=0, sum:=0

     sptr:=readfile("data.txt")
     if sptr=nil then stop fi
     eof:=0

     while x:=nextnumber(); not eof do
         ++length
         sum+:=x
     od

     println "Length  =", length
     println "Sum     =", sum
     println "Average =", sum/length
end
-------------------

Not shown is the fiddly 'nextnumber' routine. It uses 64-bit signed 
values, and handles negative numbers.
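
Since the routine itself is not shown, the following is only a guess at
the general shape of such a scanner, sketched in Python rather than the
language used above. The names next_number and scan_all, the
(value, position) return convention, and the error handling are all
assumptions, not the actual 'nextnumber'. It skips whitespace, accepts an
optional minus sign, accumulates decimal digits, and rejects values
outside the signed 64-bit range.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def next_number(buf, pos):
    """Scan one signed decimal integer from bytes buf, starting at pos.

    Returns (value, new_pos), or (None, new_pos) when the input is exhausted.
    Hypothetical stand-in for a hand-written 'nextnumber' routine.
    """
    n = len(buf)
    # Skip whitespace: space, tab, carriage return, newline.
    while pos < n and buf[pos] in b" \t\r\n":
        pos += 1
    if pos >= n:
        return None, pos
    negative = False
    if buf[pos] == ord("-"):
        negative = True
        pos += 1
    start = pos
    value = 0
    while pos < n and ord("0") <= buf[pos] <= ord("9"):
        value = value * 10 + (buf[pos] - ord("0"))
        pos += 1
    if pos == start:
        raise ValueError("expected a digit at offset %d" % pos)
    if negative:
        value = -value
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("value does not fit in a signed 64-bit integer")
    return value, pos


def scan_all(buf):
    """Return (length, total) over every integer in buf, one at a time."""
    pos = 0
    length = 0
    total = 0
    while True:
        x, pos = next_number(buf, pos)
        if x is None:
            break
        length += 1
        total += x
    return length, total


if __name__ == "__main__":
    length, total = scan_all(b"12 -7\n 400\t5\n")
    print("Length  =", length)
    print("Sum     =", total)
    print("Average =", total // length)
```

Being Python, this says nothing about the speed of the compiled
versions timed above; it only shows the kind of byte-at-a-time logic
involved.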

This is it in action, run directly from source code (tcc can do this too!):

   C:\mapps>mm -run test
   Length  = 15494902
   Sum     = 7745911799036
   Average = 499900
========== REMAINDER OF ARTICLE TRUNCATED ==========