From: Thomas Passin <list1@tompassin.net>
Newsgroups: comp.lang.python
Subject: Re: Correct syntax for pathological re.search()
Date: Sat, 12 Oct 2024 09:06:54 -0400
Lines: 82
Message-ID: <mailman.24.1728750786.4695.python-list@python.org>
References: <ve0o34$1nep4$1@dont-email.me>
<MQaOO.3313338$EVn.2054758@fx04.ams4>
<011301db1c22$5e7519c0$1b5f4d40$@gmail.com>
<fdb8dc77-daa2-41ba-8aec-010b80702eba@mrabarnett.plus.com>
<b75b7177-47b7-4aad-ba9a-6078417572de@tompassin.net>
On 10/11/2024 8:37 PM, MRAB via Python-list wrote:
> On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
>> Is there some utility function out there that can be called to show
>> what the regular expression you typed in will look like by the time
>> it is ready to be used?
>>
>> Obviously, life is not that simple as it can go through multiple
>> layers with each dealing with a layer of backslashes.
>>
>> But for simple cases, ...
>>
> Yes. It's called 'print'. :-)
There is a section in the Python docs about this backslash subject. It's
titled "The Backslash Plague" in
https://docs.python.org/3/howto/regex.html
You can also inspect the compiled expression to see what string it
received after all the escaping:
>>> import re
>>>
>>> re_string = '\\w+\\\\sub'
>>> re_pattern = re.compile(re_string)
>>>
>>> # Should look as if we had used r'\w+\\sub'
>>> print(re_pattern.pattern)
\w+\\sub
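
As a rough sketch of the same idea, the two layers can be shown side by side: repr() gives the string the way Python would write it as a literal, while the compiled pattern's .pattern attribute holds the text the regex engine actually received. The helper name show_layers is made up for this illustration; it is not a library function.

```python
import re

# Hypothetical helper for illustration only (not part of the re module).
def show_layers(pattern):
    """Return (Python-literal form, text handed to the regex engine)."""
    compiled = re.compile(pattern)
    return repr(pattern), compiled.pattern

source_form, engine_form = show_layers('\\w+\\\\sub')
print(source_form)   # the string as Python would write it in source code
print(engine_form)   # the characters the regex engine sees

# The engine-level text matches one or more word characters, a literal
# backslash, then "sub".
match = re.search(engine_form, r'abc\sub')
print(match is not None)
```

Comparing the two printed forms makes it easier to see which backslashes were consumed by the string literal and which ones are left for the regex engine.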
>> -----Original Message-----
>> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org>
>> On Behalf Of Gilmeh Serda via Python-list
>> Sent: Friday, October 11, 2024 10:44 AM
>> To: python-list@python.org
>> Subject: Re: Correct syntax for pathological re.search()
>>
>> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
>>
>>> I'm trying to discard lines that include the string "\sout{" (which is
>>> TeX, for those who are curious). I have tried:
>>>   if not re.search("\sout{", line):
>>>   if not re.search("\sout\{", line):
>>>   if not re.search("\\sout{", line):
>>>   if not re.search("\\sout\{", line):
>>>
>>> But the lines with that string keep coming through. What is the right
>>> syntax to properly escape the backslash and the left curly bracket?
>>
>> $ python
>> Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import re
>> >>> s = r"testing \sout{WHADDEVVA}"
>> >>> re.search(r"\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You want a literal backslash, hence, you need to escape everything.
>>
>> It is not enough to escape the "\s" as "\\s", because that only takes
>> care of Python's demands for escaping "\". You also need to escape the
>> "\" for the RegEx as well, or it will read it like it means "\s", which
>> is the RegEx for a whitespace character, and therefore your search
>> doesn't match, because it reads it like you want to search for a
>> whitespace character followed by "out{".
>>
>> Therefore, you need to escape it either as per my example, or by using
>> four "\" and no "r" in front of the first quote, which also works:
>>
>> >>> re.search("\\\\sout{", s)
>> <re.Match object; span=(8, 14), match='\\sout{'>
>>
>> You don't need to escape the curly braces. We call them "seagull wings"
>> where I live.
>>
>
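
For a fixed string like this one, another option is to let re.escape() add the backslashes. The following is only a sketch; the sample line is taken from the session quoted above, and raw strings are used so that Python's own escaping stays out of the picture.

```python
import re

line = r"testing \sout{WHADDEVVA}"

# re.escape() returns a pattern that matches its argument literally.
literal = re.escape(r"\sout{")
print(literal)
print(re.search(literal, line))

# Regex-level equivalent of the original attempts: \s is read as
# "a whitespace character", so this looks for whitespace followed by "out{".
print(re.search(r"\sout{", line))

# With the backslash escaped for the regex engine, the literal text is found.
print(re.search(r"\\sout{", line))
```

Printing the escaped pattern is also a quick way to check how many backslashes the regex engine really needs.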
========== REMAINDER OF ARTICLE TRUNCATED ==========