<mailman.24.1728750786.4695.python-list@python.org>

From: Thomas Passin <list1@tompassin.net>
Newsgroups: comp.lang.python
Subject: Re: Correct syntax for pathological re.search()
Date: Sat, 12 Oct 2024 09:06:54 -0400
Lines: 82
Message-ID: <mailman.24.1728750786.4695.python-list@python.org>
References: <ve0o34$1nep4$1@dont-email.me>
 <MQaOO.3313338$EVn.2054758@fx04.ams4>
 <011301db1c22$5e7519c0$1b5f4d40$@gmail.com>
 <fdb8dc77-daa2-41ba-8aec-010b80702eba@mrabarnett.plus.com>
 <b75b7177-47b7-4aad-ba9a-6078417572de@tompassin.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 10/11/2024 8:37 PM, MRAB via Python-list wrote:
> On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show
>> what the regular expression you typed in will look like by the time
>> it is ready to be used?
>>
>> Obviously, life is not that simple as it can go through multiple
>> layers with each dealing with a layer of backslashes.
>>
>> But for simple cases, ...
>>
> Yes. It's called 'print'. :-)

There is a section in the Python docs about this backslash subject.  It's
titled "The Backslash Plague", in the Regular Expression HOWTO:

https://docs.python.org/3/howto/regex.html
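
As a small sketch of the two layers that section describes (the sample
line and variable names here are made up for illustration): Python's
string-literal parser consumes one level of backslashes, and the regex
parser consumes the next.

```python
import re

# Two spellings of a pattern meant to match the literal text \sout{
non_raw = "\\\\sout{"   # Python collapses each "\\" pair to one backslash
raw = r"\\sout{"        # raw string: backslashes reach re unchanged

# Both literals produce the same 7-character string: \\sout{
assert non_raw == raw

# Hypothetical sample line containing the TeX command
line = r"some \sout{struck} text"

match = re.search(raw, line)
print(raw)             # what the re module receives
print(match.group(0))  # what it matched in the line
```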

You can also inspect the compiled expression to see what string it 
received after all the escaping:

>>> import re
>>>
>>> re_string = '\\w+\\\\sub'
>>> re_pattern = re.compile(re_string)
>>>
>>> # Should look as if we had used r'\w+\\sub'
>>> print(re_pattern.pattern)
\w+\\sub
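
If the goal is only to find a fixed piece of text, one way to sidestep
the counting is to let re.escape() build the pattern and then inspect
it the same way.  This is just a sketch; the sample lines are invented
for the example.

```python
import re

literal = r"\sout{"                       # the exact text to look for
pattern = re.compile(re.escape(literal))  # re.escape adds the regex-level escapes

# Same inspection as above: the string the regex engine actually received
print(pattern.pattern)

# Hypothetical input lines; the last one contains a tab followed by "out{",
# which an under-escaped pattern such as "\\sout{" would wrongly match.
lines = ["keep this line", r"drop \sout{this} line", "tab\tout{ stays"]
kept = [ln for ln in lines if not pattern.search(ln)]
print(kept)
```

For a plain substring like this, `literal in ln` would do the same job
without involving the re module at all.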


>> -----Original Message-----
>> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org>
>> On Behalf Of Gilmeh Serda via Python-list
>> Sent: Friday, October 11, 2024 10:44 AM
>> To: python-list@python.org
>> Subject: Re: Correct syntax for pathological re.search()
>>
>> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
>>
>>> I'm trying to discard lines that include the string "\sout{" (which is
>>> TeX, for those who are curious). I have tried:
>>>    if not re.search("\sout{", line):
>>>    if not re.search("\sout\{", line):
>>>    if not re.search("\\sout{", line):
>>>    if not re.search("\\sout\{", line):
>>>
>>> But the lines with that string keep coming through. What is the right
>>> syntax to properly escape the backslash and the left curly bracket?
>>
>> $ python
>> Python 3.12.6 (main, Sep  8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import re
>> >>> s = r"testing \sout{WHADDEVVA}"
>> >>> re.search(r"\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You want a literal backslash, hence, you need to escape everything.
>>
>> It is not enough to escape the "\s" as "\\s", because that only takes
>> care of Python's demands for escaping "\". You also need to escape the
>> "\" for the RegEx as well, or it will read it as "\s", which is the
>> RegEx for any whitespace character, and therefore your search doesn't
>> match, because it reads it as if you want to search for " out{".
>>
>> Therefore, you need to escape it either as per my example, or by using
>> four "\" and no "r" in front of the first quote, which also works:
>>
>> >>> re.search("\\\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You don't need to escape the curly braces. We call them "seagull wings"
>> where I live.
>>
> 
========== REMAINDER OF ARTICLE TRUNCATED ==========