From: Rich <rich@example.invalid>
Newsgroups: comp.lang.tcl
Subject: Re: slow fileutil::foreachLine
Date: Mon, 17 Jun 2024 15:40:29 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <v4pldd$noq7$1@dont-email.me>
References: <nY-dnZ1fQb8-QvL7nZ2dnZfqn_GdnZ2d@brightview.co.uk>
Injection-Date: Mon, 17 Jun 2024 17:40:30 +0200 (CEST)
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.139 (x86_64))
Mark Summerfield <mark@qtrac.eu> wrote:
> I have this function:
>
> proc ws::get_words {wordfile} {
>     set in [open $wordfile r]
>     try {
>         while {[gets $in line] >= 0} {
>             if {[regexp {^[a-z]+$} $line matched]} {
>                 lappend ::ws::Words [string tolower $matched]
>             }
>         }
>     } finally {
>         close $in
>     }
> }
>
> It reads about 100_000 lines and ends up keeping about 65_000 of them
> (from /usr/share/dict/words)
>
> I tried replacing it with:
>
> proc ws::get_words {wordfile} {
>     ::fileutil::foreachLine line $wordfile {
>         if {[regexp {^[a-z]+$} $line matched]} {
>             lappend ::ws::Words [string tolower $matched]
>         }
>     }
> }
>
> The first version loads "instantly"; but the second version (with
> foreachLine) takes seconds.

If you check the implementation of fileutil::foreachLine, you find this
line inside the loop that reads the file:

    set code [catch {uplevel 1 $cmd} result options]

where "$cmd" is a variable holding the script (the "command" argument)
passed to foreachLine.
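
To make that concrete, here is a simplified stand-in for foreachLine. It
is a sketch, not the tcllib source: the name my_foreachLine is
hypothetical, it assumes Tcl 8.6 (for try/finally), and it only
distinguishes normal completion, break, continue and errors. What matters
is that the body arrives as the plain string "$cmd" and is handed to
uplevel once for every line read.

```tcl
# Hypothetical, simplified stand-in for ::fileutil::foreachLine (not the
# tcllib source). Assumes Tcl 8.6 for try/finally.
proc my_foreachLine {var filename cmd} {
    # Alias the caller's variable so the body sees the current line.
    upvar 1 $var line
    set fp [open $filename r]
    try {
        while {[gets $fp line] >= 0} {
            # $cmd is just a string; it is evaluated in the caller's frame
            # once per line, the same pattern as the tcllib line above.
            set code [catch {uplevel 1 $cmd} result]
            if {$code == 3} {
                break                       ;# body did "break"
            } elseif {$code == 1} {
                return -code error $result  ;# re-raise errors
            }
            # 0 (ok) and 4 (continue) move on to the next line; other
            # codes (e.g. return) are simply ignored in this sketch.
        }
    } finally {
        close $fp
    }
}
```

A call such as "my_foreachLine line $wordfile {...}" made inside
ws::get_words would run the body in the frame of ws::get_words, which is
how "line" and "matched" in the quoted foreachLine version get resolved.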
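
Your original version is all in a single procedure, so Tcl compiles the
whole body, including the while loop and its regexp/lappend body, to
bytecode when the procedure is first called; all ~100,000 iterations of
the loop then run that compiled bytecode.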

In the foreachLine version the loop body is a string that foreachLine
evaluates with uplevel once per line, so it receives little to no
bytecode compilation. The difference in time is the overhead of not
being able to bytecode compile the "command" string passed to
foreachLine.
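
One way to see the cost is to time the same filter three ways. This is a
sketch under stated assumptions: it works on an in-memory list rather
than a file so that I/O does not dominate, and the helpers eachItem and
eachItemLambda and the demo namespace are hypothetical names, not part of
tcllib. Variant (b) mimics foreachLine's per-item "catch {uplevel 1 $cmd}";
variant (c) instead passes the body as an apply lambda, whose compiled
body Tcl caches with the lambda value. Absolute timings depend on the
machine and Tcl version, so none are quoted here.

```tcl
# Hypothetical timing sketch: the same filter run three ways over an
# in-memory list, so only the evaluation strategy differs (no file I/O).
namespace eval demo { variable Words {} }

# Test data: "abc" should be kept, "Abc" rejected by the regexp.
set lines {}
for {set i 0} {$i < 100000} {incr i} {
    lappend lines [expr {$i % 3 ? "abc" : "Abc"}]
}

# Hypothetical helper mimicking foreachLine: the body is a string that is
# evaluated with uplevel once per item (break/continue handling omitted).
proc eachItem {var items cmd} {
    upvar 1 $var item
    foreach value $items {
        set item $value   ;# updates the caller's variable via upvar
        if {[catch {uplevel 1 $cmd} result] == 1} {
            return -code error $result
        }
    }
}

# Hypothetical helper taking an apply lambda instead of a script string.
proc eachItemLambda {lambda items} {
    foreach item $items {
        apply $lambda $item
    }
}

# (a) Loop and body compiled together as one procedure body.
proc demo::inline {lines} {
    foreach line $lines {
        if {[regexp {^[a-z]+$} $line matched]} {
            lappend ::demo::Words $matched
        }
    }
}

# (b) Body passed as a string, as with fileutil::foreachLine.
proc demo::viaUplevel {lines} {
    eachItem line $lines {
        if {[regexp {^[a-z]+$} $line matched]} {
            lappend ::demo::Words $matched
        }
    }
}

# (c) Body passed as a lambda; its compiled form is cached with the value.
proc demo::viaLambda {lines} {
    eachItemLambda {{line} {
        if {[regexp {^[a-z]+$} $line matched]} {
            lappend ::demo::Words $matched
        }
    }} $lines
}

foreach p {inline viaUplevel viaLambda} {
    set ::demo::Words {}
    set t0 [clock microseconds]
    demo::$p $lines
    set us [expr {[clock microseconds] - $t0}]
    puts [format "%-10s %8d us, kept %d" $p $us [llength $::demo::Words]]
}
```

All three variants keep the same words; only the way the loop body is
evaluated differs, which is what the printed timings compare.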