Path: Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Thu, 07 Mar 2024 16:09:58 +0000
Subject: Re: Meta: a usenet server just for sci.math
Newsgroups: sci.math
References: <8f7c0783-39dd-4f48-99bf-f1cf53b17dd9@googlegroups.com>
 <a34856ad-4487-42d0-8b3a-397eec3a46dc@googlegroups.com>
 <1b50e6d3-2e7c-41eb-9324-e91925024f90o@googlegroups.com>
 <31663ae2-a6a2-44b8-9aa3-9f0d16d24d79o@googlegroups.com>
 <6eedc16b-2c82-4aaf-a338-92aba2360ba2n@googlegroups.com>
 <51605ff6-f18f-48c5-8e83-0397632556aen@googlegroups.com>
 <b0c4589a-f222-457e-95b3-437c0721c2a2n@googlegroups.com>
 <5a48e832-3573-4c33-b9cb-d112f01b733bn@googlegroups.com>
 <8wWdnVqZk54j3Fj4nZ2dnZfqnPGdnZ2d@giganews.com>
 <MY-cnRuWkPoIhFr4nZ2dnZfqnPSdnZ2d@giganews.com>
 <NqqdnbEz-KTJTlr4nZ2dnZfqnPudnZ2d@giganews.com>
 <FqOcnYWdRfEI2lT4nZ2dnZfqn_SdnZ2d@giganews.com>
 <NVudnVAqkJ0Sk1D4nZ2dnZfqn_idnZ2d@giganews.com>
 <RuKdnfj4NM2rlkz4nZ2dnZfqn_qdnZ2d@giganews.com>
 <HfCdnROSvfir-E_4nZ2dnZfqnPWdnZ2d@giganews.com>
 <FLicnRkOg7SrWU_4nZ2dnZfqnPadnZ2d@giganews.com>
 <v7ecnUsYY7bW40j4nZ2dnZfqnPudnZ2d@giganews.com>
 <q7-dnR2O9OsAAH74nZ2dnZfqnPhg4p2d@giganews.com>
 <QrWdnaIk98Ulgnv4nZ2dnZfqnPVi4p2d@giganews.com>
From: Ross Finlayson <ross.a.finlayson@gmail.com>
Date: Thu, 7 Mar 2024 08:10:01 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <QrWdnaIk98Ulgnv4nZ2dnZfqnPVi4p2d@giganews.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <iWGdndu1WIJLe3T4nZ2dnZfqnPcAAAAA@giganews.com>
Lines: 609
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-xW7OiZQGMMK5TH5Q0NweYvVGtrrgPVbXXfRwA8GJlZoltzLMTYkMcFYRaT3kBnbMjJ/mcYsPVtTR8fY!SSYZVvhRJS6vxB+ZHYKjCBgGkEmPAW08GYC69yy+YIkUO9ZjjOOfFjfmCU5L3x0BoJVJSom23Idw
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
Bytes: 25294

On 03/04/2024 11:23 AM, Ross Finlayson wrote:
>
> So, figuring that BFF then is about designed,
> basically for storing Internet messages with
> regards to MessageId, then about ContentId
> and external resources separately, then here
> the idea again becomes how to make for
> the SFF files, what results, intermediate, tractable,
> derivable, discardable, composable data structures,
> in files of a format with regards to write-once-read-many,
> write-once-read-never, and, "partition it", in terms of
> natural partitions like time intervals and categorical attributes.
>
>
> There are some various great open-source search
> engines, here with respect to something like Lucene
> or SOLR or ElasticSearch.
>
> The idea is that there are attributes searches,
> and full-text searches, those resulting hits,
> to documents apiece, or sections of their content,
> then backward along their attributes, like
> threads and related threads, and authors and
> their cliques, while across groups and periods
> of time.
>
> There's not much of a notion of "semantic search",
> though, it's expected to sort of naturally result,
> here as for usually enough least distance, as for
> "the terms of matching", and predicates from what
> results a filter predicate, here with what I call,
> "Yes/No/Maybe".
>
> Now, what is, "yes/no/maybe", one might ask.
> Well, it's the query specification, of the world
> of results, to filter to the specified results.
> The idea is that there's an accepter network
> for "Yes" and a rejector network for "No"
> and an accepter network for "Maybe" and
> then rest are rejected.
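
The Yes/No/Maybe filter described above might be sketched as follows. This is only a minimal illustration under stated assumptions: the class name YesNoMaybe, the helper has(), the sample documents, and in particular the order of evaluation (a "Yes" match keeps the document outright, then a "No" match rejects it, then a "Maybe" match keeps it, and the rest are rejected) are hypothetical choices, not something the post specifies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Doc = Dict[str, str]            # one document as attribute -> value (assumed shape)
Term = Callable[[Doc], bool]    # one match term of the query


@dataclass
class YesNoMaybe:
    """Hypothetical accepter/rejector filter; precedence is an assumption."""
    yes: List[Term] = field(default_factory=list)
    no: List[Term] = field(default_factory=list)
    maybe: List[Term] = field(default_factory=list)

    def keep(self, doc: Doc) -> bool:
        if any(term(doc) for term in self.yes):    # accepter network for "Yes"
            return True
        if any(term(doc) for term in self.no):     # rejector network for "No"
            return False
        # accepter network for "Maybe"; anything left over is rejected
        return any(term(doc) for term in self.maybe)


def has(attr: str, needle: str) -> Term:
    """Case-insensitive substring match on one attribute (illustrative helper)."""
    needle = needle.lower()
    return lambda doc: needle in doc.get(attr, "").lower()


spec = YesNoMaybe(
    yes=[has("from", "finlayson")],
    no=[has("subject", "spam")],
    maybe=[has("newsgroups", "sci.math")],
)

docs = [
    {"from": "Ross Finlayson", "subject": "spam filters", "newsgroups": "sci.math"},
    {"from": "someone", "subject": "spam", "newsgroups": "sci.math"},
    {"from": "someone", "subject": "BFF", "newsgroups": "sci.math"},
    {"from": "someone", "subject": "BFF", "newsgroups": "alt.test"},
]
kept = [spec.keep(doc) for doc in docs]
```

Because keep() is a plain predicate over one document, the same object could in principle be evaluated at any point between backend and frontend.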
>
> The idea is that the search, is a combination
> of a bunch of yes/no/maybe terms, or,
> sure/no/yes, to indicate what's definitely
> included, what's not, and what is, then that
> the term, results that it's composable, from
> sorting the terms, to result a filter predicate
> implementation, that can run anywhere along
> the way, from the backend to the frontend,
> this way being a, "search query specification".
>
>
> There are notions like, "*", and single match
> and multimatch, about basically columns and
> a column model, of documents, that are
> basically rows.
>
>
> The idea of course is to build an arithmetic expression,
> that also is exactly a natural expression,
> for "matches", and "ranges".
>
> "AP"|Archimedes|Plutonium in first|last
>
> Here, there is a search, for various names, that
> it composes this way.
>
> AP first
> AP last
> Archimedes first
> Archimedes last
> Plutonium first
> Plutonium last
>
> As you can see, these "match terms", just naturally
> break out, then that what gets into negations,
> break out and double, and what gets into ranges,
> then, well that involves for partitions and ranges,
> duplicating and breaking that out.
>
> It results though a very fungible and normal form
> of a search query specification, that rebuilds the
> filter predicate according to sorting those, then
> has very well understood runtime according to
> yes/no/maybe and the multimatch, across and
> among multiple attributes, multiple terms.
>
>
> This sort of enriches a usual sort of query
> "exact full hit", with this sort "ranges and conditions,
> exact full hits".
>
> So, the Yes/No/Maybe, is the generic search query
> specification, overall, just reflecting an accepter/rejector
> network, with a bit on the front to reflect keep/toss,
> that it's very practical and of course totally commonplace
> and easily written broken out as find or wildmat specs.
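
The break-out of the "AP"|Archimedes|Plutonium in first|last expression into its six match terms, and the rebuilding of a filter predicate from the sorted terms, could look roughly like this. It is a sketch only: the function names break_out and to_predicate are made up, "first" and "last" are assumed to be attribute (column) names, and matching is assumed to be exact equality on one row.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

MatchTerm = Tuple[str, str]   # (value, attribute), e.g. ("AP", "first")


def break_out(spec: str) -> List[MatchTerm]:
    """Expand 'v1|v2 in a1|a2' into every (value, attribute) match term."""
    values_part, attrs_part = spec.rsplit(" in ", 1)
    values = [v.strip().strip('"') for v in values_part.split("|")]
    attrs = [a.strip() for a in attrs_part.split("|")]
    return list(product(values, attrs))


def to_predicate(terms: List[MatchTerm]) -> Callable[[Dict[str, str]], bool]:
    """Sort and de-duplicate the terms (the normal form), then OR them together."""
    normal = sorted(set(terms))

    def predicate(row: Dict[str, str]) -> bool:
        return any(row.get(attr) == value for value, attr in normal)

    return predicate


terms = break_out('"AP"|Archimedes|Plutonium in first|last')
for value, attr in terms:
    print(value, attr)

matches = to_predicate(terms)
hit = matches({"first": "Archimedes", "last": "Plutonium"})
miss = matches({"first": "Ross", "last": "Finlayson"})
```

Sorting the broken-out terms means two specifications that list the same names in a different order rebuild the same predicate, which is one way to read "normal form" here.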
>
> For then these the objects and the terms relating
> the things, there's about maintaining this, while
> refining it, that basically there's an ownership
> and a reference count of the filter objects, so
> that various controls according to the syntax of
> the normal form of the expression itself, with
> most usual English terms like "is" and "in" and
> "has" and "between", and "not", with & for "and"
> and | for "or", makes that this should be the kind
> of filter query specification that one would expect
> to be general purpose on all such manners of
> filter query specifications and their controls.
>
> So, a normal form for these filter objects, then
> gets relating them to the SFF files, because, an
> SFF file of a given input corpus, satisfies some
> of these specifications, the queries, or for example
> doesn't, about making the language and files
> first of the query, then the content, then just
> mapping those to the content, which are built
> off extractors and summarizers.
>
> I already thought about this a lot. It results
> that it sort of has its own little theory,
> thus what can result its own little normal forms,
> for making a fungible SFF description, what
> results for any query, going through those,
> running the same query or as so filtered down
> the query for the partition already, from the
> front-end to the back-end and back, a little
> noisy protocol, that delivers search results.
>
>
>
>
> The document is an element of the corpus.
> Here each message is a corpus. Now,
> there's a convention in Internet messages,
> not always followed, being that the ignorant
> or lacking etiquette or just plain different,
> don't follow it or break it, there's a convention
> of attribution in Internet messages the
> content that's replied to, and, this is
> variously "block" or "inline".
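
As a rough illustration of the "block" versus "inline" attribution convention just mentioned, the following sketch classifies a message body by counting runs of quoted lines. The assumptions are loud ones: quoted material is taken to be marked with leading ">" characters, one run of quoted lines is called "block", several runs interleaved with new text are called "inline", and the function names are hypothetical.

```python
from itertools import groupby


def quote_depth(line: str) -> int:
    """Count leading '>' quote markers, allowing spaces between them."""
    depth = 0
    rest = line
    while rest.startswith(">"):
        depth += 1
        rest = rest[1:].lstrip(" ")
    return depth


def attribution_style(body: str) -> str:
    """Return 'none', 'block', or 'inline' for a plain-text message body."""
    # Blank lines are ignored so they do not split a quoted run.
    flags = [quote_depth(line) > 0 for line in body.splitlines() if line.strip()]
    quoted_runs = sum(1 for is_quoted, _ in groupby(flags) if is_quoted)
    if quoted_runs == 0:
        return "none"
    return "block" if quoted_runs == 1 else "inline"


block_body = "On Monday, A wrote:\n> one\n> two\n\nMy reply.\n"
inline_body = "> one\nReply one.\n> two\nReply two.\n"
styles = (attribution_style(block_body), attribution_style(inline_body))
```

An extractor along these lines would be one way to derive a per-message attribute from the body rather than from the headers.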
>
> From the outside though, the document here
> has the "overview" attributes, the key-value
> pairs of the headers those being, and the
> "body" or "document" itself, which can as
> well have extracted attributes, vis-a-vis
> otherwise its, "full text".
>
> https://en.wikipedia.org/wiki/Search_engine_indexing
>
>
> The key thing here for partitioning is to
> make for date-range partitioning, while,
> the organization of the messages by ID is
> essentially flat, and constant rate to access one
> but linear to trawl through them, although parallelizable,
========== REMAINDER OF ARTICLE TRUNCATED ==========