From: Thomas Passin <list1@tompassin.net>
Newsgroups: comp.lang.python
Subject: Re: Correct syntax for pathological re.search()
Date: Sat, 12 Oct 2024 09:06:54 -0400
Message-ID: <mailman.24.1728750786.4695.python-list@python.org>

On 10/11/2024 8:37 PM, MRAB via Python-list wrote:
> On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show
>> what the regular expression you typed in will look like by the time
>> it is ready to be used?
>>
>> Obviously, life is not that simple as it can go through multiple
>> layers with each dealing with a layer of backslashes.
>>
>> But for simple cases, ...
>>
> Yes. It's called 'print'. :-)

There is a section in the Python docs about this backslash subject.
It's titled "The Backslash Plague" in

https://docs.python.org/3/howto/regex.html

You can also inspect the compiled expression to see what string it
received after all the escaping:

>>> import re
>>>
>>> re_string = '\\w+\\\\sub'
>>> re_pattern = re.compile(re_string)
>>>
>>> # Should look as if we had used r'\w+\\sub'
>>> print(re_pattern.pattern)
\w+\\sub

>> -----Original Message-----
>> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org>
>> On Behalf Of Gilmeh Serda via Python-list
>> Sent: Friday, October 11, 2024 10:44 AM
>> To: python-list@python.org
>> Subject: Re: Correct syntax for pathological re.search()
>>
>> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
>>
>>> I'm trying to discard lines that include the string "\sout{" (which is
>>> TeX, for those who are curious.
>>> I have tried:
>>>   if not re.search("\sout{", line):
>>>   if not re.search("\sout\{", line):
>>>   if not re.search("\\sout{", line):
>>>   if not re.search("\\sout\{", line):
>>>
>>> But the lines with that string keep coming through. What is the right
>>> syntax to properly escape the backslash and the left curly bracket?
>>
>> $ python
>> Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import re
>> >>> s = r"testing \sout{WHADDEVVA}"
>> >>> re.search(r"\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You want a literal backslash, hence, you need to escape everything.
>>
>> It is not enough to escape the "\s" as "\\s", because that only takes
>> care of Python's demands for escaping "\". You also need to escape the
>> "\" for the RegEx as well, or it will read it like it means "\s", which
>> is the RegEx for a space character and therefore your search doesn't
>> match, because it reads it like you want to search for " out{".
>>
>> Therefore, you need to escape it either as per my example, or by using
>> four "\" and no "r" in front of the first quote, which also works:
>>
>> >>> re.search("\\\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You don't need to escape the curly braces. We call them "seagull wings"
>> where I live.
>>
>

========== REMAINDER OF ARTICLE TRUNCATED ==========
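
For anyone who wants to see the two escaping layers from the quoted discussion side by side, here is a minimal sketch. The sample line and the variable names are made up for illustration; only re.search and re.escape come from the standard library. It is one way to check the behaviour, not the only one.

```python
import re

# Hypothetical sample line containing the TeX command \sout{...}.
line = r"keep \sout{this} out"

# Layer 1: the Python string literal.
# Both spellings below produce the same pattern text: two backslashes
# followed by sout{ .
raw_pattern = r"\\sout{"
plain_pattern = "\\\\sout{"

# re.escape builds a pattern from the literal text we are looking for.
# It escapes the backslash and, as a side effect, the curly brace too.
escaped_pattern = re.escape("\\sout{")

# This literal only satisfies Python's layer. The regex engine receives
# a single backslash followed by s, which it reads as the whitespace
# class, so it looks for whitespace followed by out{ .
one_layer_only = "\\sout{"

# Layer 2: the regex engine.
for name, pattern in [
    ("raw_pattern", raw_pattern),
    ("plain_pattern", plain_pattern),
    ("escaped_pattern", escaped_pattern),
    ("one_layer_only", one_layer_only),
]:
    found = re.search(pattern, line) is not None
    print(f"{name:16} pattern text: {pattern:10} matches: {found}")
```

Printing the pattern text, as MRAB suggests, shows what the regex engine actually receives after Python has processed the string literal.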