From: Rich
Newsgroups: comp.lang.tcl
Subject: Re: slow fileutil::foreachLine
Date: Mon, 17 Jun 2024 15:40:29 -0000 (UTC)

Mark Summerfield wrote:
> I have this function:
>
> proc ws::get_words {wordfile} {
>     set in [open $wordfile r]
>     try {
>         while {[gets $in line] >= 0} {
>             if {[regexp {^[a-z]+$} $line matched]} {
>                 lappend ::ws::Words [string tolower $matched]
>             }
>         }
>     } finally {
>         close $in
>     }
> }
>
> It reads about 100_000 lines and ends up keeping about 65_000 of them
> (from /usr/share/dict/words)
>
> I tried replacing it with:
>
> proc ws::get_words {wordfile} {
>     ::fileutil::foreachLine line $wordfile {
>         if {[regexp {^[a-z]+$} $line matched]} {
>             lappend ::ws::Words [string tolower $matched]
>         }
>     }
> }
>
> The first version loads "instantly"; but the second version (with
> foreachLine) takes seconds.

If you check the implementation of fileutil::foreachLine, you find:

    set code [catch {uplevel 1 $cmd} result options]

Where "$cmd" is a variable holding a string of the "command" passed to
foreachLine.

Your original copy is all in a single procedure, so it will be bytecode
compiled, and for all but the first execution will run that compiled
bytecode.

The foreachLine version, since "cmd" is a string, will receive little
to no bytecode compiling. The difference in time is the overhead of not
being able to bytecode compile the "command" string passed to
foreachLine.
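
If you want to see the effect for yourself, here is a rough sketch that times the same loop body both ways. It does not use tcllib: "each_item" is a hypothetical stand-in that only mimics the catch/uplevel pattern quoted above, and the input is a synthetic in-memory list rather than a real word file. Treat the timings as indicative only; they will vary with Tcl version and machine.

```tcl
# Hypothetical helper (not part of tcllib): runs a script body in the
# caller's frame once per item, using the same catch/uplevel pattern.
# Return codes handled below: 0 = ok, 4 = continue, 3 = break.
proc each_item {varName items body} {
    upvar 1 $varName item
    foreach item $items {
        set code [catch {uplevel 1 $body} result options]
        switch -exact -- $code {
            0 - 4 {}
            3 { return }
            default { return -options $options $result }
        }
    }
}

# Loop written inline: the whole proc body is compiled as one unit.
proc count_inline {items} {
    set n 0
    foreach item $items {
        if {[regexp {^[a-z]+$} $item]} { incr n }
    }
    return $n
}

# Same work, but the loop body is handed to the helper as a script.
proc count_callback {items} {
    set n 0
    each_item item $items {
        if {[regexp {^[a-z]+$} $item]} { incr n }
    }
    return $n
}

# Synthetic input: two of every three entries are all-lowercase.
set items {}
for {set i 0} {$i < 100000} {incr i} {
    lappend items [expr {$i % 3 ? "word" : "Word$i"}]
}

puts "inline:   [time {count_inline $items} 5]"
puts "callback: [time {count_callback $items} 5]"
```

Both procs should return the same count; only the elapsed time reported by "time" should differ. If the callback version is noticeably slower on your system, that is the per-iteration uplevel overhead showing up, and keeping the explicit open/gets/close loop inside the proc avoids it.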