From: Rich <rich@example.invalid>
Newsgroups: comp.lang.tcl
Subject: Re: slow fileutil::foreachLine
Date: Mon, 17 Jun 2024 15:40:29 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <v4pldd$noq7$1@dont-email.me>
References: <nY-dnZ1fQb8-QvL7nZ2dnZfqn_GdnZ2d@brightview.co.uk>
Injection-Date: Mon, 17 Jun 2024 17:40:30 +0200 (CEST)

Mark Summerfield <mark@qtrac.eu> wrote:
> I have this function:
> 
> proc ws::get_words {wordfile} {
>     set in [open $wordfile r]
>     try {
>         while {[gets $in line] >= 0} {
>             if {[regexp {^[a-z]+$} $line matched]} {
>                 lappend ::ws::Words [string tolower $matched]
>             }
>         }
>     } finally {
>         close $in
>     }
> }
> 
> It reads about 100_000 lines and ends up keeping about 65_000 of them 
> (from /usr/share/dict/words)
> 
> I tried replacing it with:
> 
> proc ws::get_words {wordfile} {
>     ::fileutil::foreachLine line $wordfile {
>         if {[regexp {^[a-z]+$} $line matched]} {
>             lappend ::ws::Words [string tolower $matched]
>         }
>     }
> }
> 
> The first version loads "instantly"; but the second version (with 
> foreachLine) takes seconds.

If you check the implementation of fileutil::foreachLine, you find:

   set code [catch {uplevel 1 $cmd} result options]

Here "$cmd" is the variable holding the loop body you passed to 
foreachLine, as a string.
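
To make the mechanism concrete, here is a minimal sketch of a command 
built around that uplevel call.  It is a hypothetical stand-in (the name 
my_foreach_line is mine), not the tcllib source, and it leaves out the 
return-code handling that the catch in the real line is there for:

```tcl
# Hypothetical stand-in for a foreachLine-style command, NOT the tcllib code.
proc my_foreach_line {varName filename script} {
    upvar 1 $varName line          ;# "line" aliases the caller's variable
    set in [open $filename r]
    try {
        while {[gets $in line] >= 0} {
            # The body arrives as a plain string and is evaluated in the
            # caller's stack frame once per line read.
            uplevel 1 $script
        }
    } finally {
        close $in
    }
}
```

Calling it as "my_foreach_line line $wordfile {...}" has the same shape as 
your second version: the braces only quote the body, so what the command 
receives is text.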

Your original version is all in a single procedure, so the whole loop is 
bytecode compiled the first time it executes, and every later execution 
runs that compiled bytecode.

In the foreachLine version the loop body ("cmd") is a string, so it 
receives little to no bytecode compiling.  The difference in time is the 
overhead of evaluating that string for every line instead of running 
compiled bytecode.
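
One way to check this on your own machine is to time the variants side by 
side.  The sketch below is only that: each_item is a hypothetical helper 
that mimics the uplevel loop over an in-memory list (so file I/O is out of 
the picture), the input is synthetic, and the third variant assumes that 
moving the work into a proc leaves only a one-command string to evaluate.  
No timings are shown because they will depend on your Tcl version.

```tcl
# Hypothetical helper: runs a script once per list element through uplevel,
# the way a foreachLine-style command evaluates its body.
proc each_item {varName items script} {
    upvar 1 $varName item
    foreach item $items {
        uplevel 1 $script
    }
}

# Per-item work in its own proc, so it is compiled as a proc body.
proc keep_word {line} {
    if {[regexp {^[a-z]+$} $line matched]} {
        lappend ::Words $matched
    }
}

# Variant A: everything inline in one proc.
proc inline_version {items} {
    set ::Words {}
    foreach line $items {
        if {[regexp {^[a-z]+$} $line matched]} {
            lappend ::Words $matched
        }
    }
}

# Variant B: the loop body is passed as a string and evaluated via uplevel.
proc uplevel_version {items} {
    set ::Words {}
    each_item line $items {
        if {[regexp {^[a-z]+$} $line matched]} {
            lappend ::Words $matched
        }
    }
}

# Variant C: the string body is just a single call to a proc.
proc proc_call_version {items} {
    set ::Words {}
    each_item line $items {keep_word $line}
}

# Synthetic input: 100000 entries, some of which fail the regexp.
set items {}
for {set i 0} {$i < 100000} {incr i} {
    lappend items [expr {$i % 3 ? "word" : "Word's"}]
}

foreach v {inline_version uplevel_version proc_call_version} {
    puts "$v: [time {$v $items} 5]"
}
```

If variant C lands close to variant A, the same trick should carry over to 
the real thing: keep using foreachLine, but make its body a single call to 
a proc that does the regexp and lappend.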