From: Rich
Newsgroups: comp.lang.tcl
Subject: Re: Does cheating produce faster searches?
Date: Wed, 25 Sep 2024 20:36:23 -0000 (UTC)
References: <20240925170149.3463bec9@lud1.home>

Luc wrote:
> Suppose I have a large list. Very large list. Then I want to search
> for an item in that string:
>
> % lsearch $list "string"
>
> Now, suppose I have many lists instead. One list contains all the items
> that begin with the letter a, another list contains all the items that
> begin with the letter b, another list contains all the items that begin
> with the letter c, and so on. Then I see what the first character in
> "string" is and only search for it in the one corresponding list.

What you have described is, at a very high level, similar to how B-tree
indexes work:

https://en.wikipedia.org/wiki/B-tree

They are the most common index type used in databases for faster data
retrieval.

> Would that be faster? I somehow suspect the answer is 'no.'

The answer is actually "it depends".  You'd have to define "very
large".  For some value of "very large", the small extra cost of
picking which sublist to search is more than repaid, and the bucketed
search will most likely be faster than one search across a single list
containing all the items.

But, as is always the case, the algorithm matters.  Basic lsearch does
a linear search by default: it starts at index 0 and looks at each item
sequentially until it finds a match or hits the end of the list.
Against this algorithm, your "separate lists based on first character"
method becomes faster at a not-so-big value of "very large".  (There is
a rough sketch of both variants at the end of this post.)

But lsearch also has the "-sorted" option.  For a "very large" list
that you can maintain in sorted order, "-sorted" makes lsearch perform
a binary search on the list:

https://en.wikipedia.org/wiki/Binary_search

Against this algorithm, "very large" has to be much larger than in the
linear case before the "separate lists" method wins, because a binary
search over one big sorted list is already very fast.

However, "-sorted" leaves you needing the "very large list" to actually
be sorted, and sorting a very large list can itself take substantial
time.  So you'd want to either sort once and search many times, or, if
the list is being modified, insert new items at their sorted positions
so the list stays sorted instead of being re-sorted over and over.
(There is also a sketch of that below.)

> Bonus question: what about sqlite operations? Would they be faster if
> I had one separate table for each initial letter/character?

Again, it depends.  If you have no indexes on your tables, then likely
yes.  If the tables have indexes that the query you are running can
use, then the tables will need to be much, much larger before the
"split up" version might become faster.  (A sqlite sketch is below as
well.)
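
To make the "it depends" concrete, here is a rough sketch of the two
linear-search variants.  The variable names (biglist, needle) are just
made up for illustration, and note that plain lsearch does glob
matching by default, so I use -exact:

    # Build one sublist ("bucket") per leading character.
    set buckets [dict create]
    foreach item $biglist {
        dict lappend buckets [string index $item 0] $item
    }

    # Variant 1: one linear search over the whole list.
    set found [lsearch -exact $biglist $needle]

    # Variant 2: linear search over only the matching bucket.
    # (The index returned is within the bucket, not the original list.)
    set key [string index $needle 0]
    set found -1
    if {[dict exists $buckets $key]} {
        set found [lsearch -exact [dict get $buckets $key] $needle]
    }

    # 'time' runs a script N times and reports microseconds per run,
    # so you can measure both variants against your real data:
    puts [time {lsearch -exact $biglist $needle} 1000]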
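
And a sketch of the "-sorted" route, including one way to keep the
list sorted as items are added instead of re-sorting it every time.
The -bisect option needs Tcl 8.6 or later; again, the variable names
are invented:

    # Sort once, then every lookup is a binary search:
    set sortedlist [lsort $biglist]
    set found [lsearch -sorted $sortedlist $needle]

    # To add an item without re-sorting, find its insertion point with
    # -bisect (it returns the index of the last element <= the pattern,
    # or -1 if the new item belongs at the front) and splice it in:
    set at [lsearch -bisect $sortedlist $newitem]
    set sortedlist [linsert $sortedlist [expr {$at + 1}] $newitem]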
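
For the sqlite bonus question, the usual approach is one table plus an
index rather than one table per letter.  A minimal sketch using the
Tcl sqlite3 package; the file name, table name and column name are all
made up:

    package require sqlite3
    sqlite3 db /tmp/example.db

    db eval {CREATE TABLE IF NOT EXISTS items(name TEXT)}
    db eval {CREATE INDEX IF NOT EXISTS idx_items_name ON items(name)}

    # With the index in place, an equality lookup on 'name' walks the
    # index's B-tree instead of scanning the whole table:
    set needle "string"
    set rows [db eval {SELECT name FROM items WHERE name = $needle}]

    db close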