From: Richard Owlett
Newsgroups: comp.editors
Subject: Re: Automating an atypical search & replace
Date: Sun, 14 Jul 2024 02:51:45 -0500
Organization: A noiseless patient Spider

On 07/14/2024 01:13 AM, Stan Brown wrote:
> On Sat, 13 Jul 2024 23:39:14 -0000 (UTC), Lawrence D'Oliveiro wrote:
>> On Sat, 13 Jul 2024 11:08:48 -0500, Richard Owlett wrote:
>>
>>> These occurrences are consistently of the form
>>> arbitrary_text
>>>
>>> I wish to delete "" and *ASSOCIATED* "".
>>
>> This is beyond the abilities of regular expressions. This is the point
>> where you need to use an actual HTML/XML-parsing library.
>>
>
> In general I'd agree with you. But the OP made a big deal -- in a
> different thread, for some reason -- about wanting to use minimal
> HTML, so I doubt very much there will be nested ...
> sequences.

I'd compare using minimal HTML to learning to crawl before attempting
to run a marathon ;}

>
> Also, the OP quite rightly wanted to confirm each change before it is
> made, so presumably if there are any nested sequences he will say no
> to that particular edit and fix it manually.
>