Path: ...!news.roellig-ltd.de!open-news-network.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: MRAB <python@mrabarnett.plus.com>
Newsgroups: comp.lang.python
Subject: Re: Correct syntax for pathological re.search()
Date: Sat, 12 Oct 2024 01:37:55 +0100
Lines: 57
Message-ID: <mailman.20.1728693664.4695.python-list@python.org>
References: <ve0o34$1nep4$1@dont-email.me> <MQaOO.3313338$EVn.2054758@fx04.ams4> <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> <fdb8dc77-daa2-41ba-8aec-010b80702eba@mrabarnett.plus.com>
In-Reply-To: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
User-Agent: Mozilla Thunderbird

On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
> Is there some utility function out there that can be called to show what the
> regular expression you typed in will look like by the time it is ready to be
> used?
>
> Obviously, life is not that simple as it can go through multiple layers with
> each dealing with a layer of backslashes.
>
> But for simple cases, ...
>
Yes. It's called 'print'. :-)

>
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
> Behalf Of Gilmeh Serda via Python-list
> Sent: Friday, October 11, 2024 10:44 AM
> To: python-list@python.org
> Subject: Re: Correct syntax for pathological re.search()
>
> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
>
>> I'm trying to discard lines that include the string "\sout{" (which is
>> TeX, for those who are curious. I have tried:
>>   if not re.search("\sout{", line):
>>   if not re.search("\sout\{", line):
>>   if not re.search("\\sout{", line):
>>   if not re.search("\\sout\{", line):
>>
>> But the lines with that string keep coming through. What is the right
>> syntax to properly escape the backslash and the left curly bracket?
>
> $ python
> Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> s = r"testing \sout{WHADDEVVA}"
> >>> re.search(r"\\sout{", s)
> <re.Match object; span=(8, 14), match='\\sout{'>
>
> You want a literal backslash, hence, you need to escape everything.
>
> It is not enough to escape the "\s" as "\\s", because that only takes care
> of Python's demands for escaping "\". You also need to escape the "\" for
> the RegEx as well, or it will read it like it means "\s", which is the
> RegEx for a whitespace character, and therefore your search doesn't match,
> because it reads it like you want to search for whitespace followed by
> "out{".
>
> Therefore, you need to escape it either as per my example, or by using
> four "\" and no "r" in front of the first quote, which also works:
>
> >>> re.search("\\\\sout{", s)
> <re.Match object; span=(8, 14), match='\\sout{'>
>
> You don't need to escape the curly braces. We call them "seagull wings"
> where I live.
>
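
A minimal sketch of the 'print' suggestion, for the simple case only: printing a pattern shows the characters left after Python has processed the string literal, which is what the regex engine receives. The variable names and the tab test string are illustrative assumptions, not from the thread; re.escape() and the plain substring test are included as possible alternatives rather than as anyone's recommended fix.

```python
import re

line = r"testing \sout{WHADDEVVA}"

# Two spellings of the same pattern. Printing them shows what is left
# once Python has handled the string literal's own backslash escapes.
raw_pattern = r"\\sout{"
plain_pattern = "\\\\sout{"
print(raw_pattern)        # \\sout{
print(plain_pattern)      # \\sout{

# With only two backslashes and no r prefix, the engine receives \sout{,
# i.e. "one whitespace character followed by out{".
whitespace_pattern = "\\sout{"
print(whitespace_pattern)  # \sout{

print(re.search(raw_pattern, line))               # match at span (8, 14)
print(re.search(whitespace_pattern, line))        # None
print(re.search(whitespace_pattern, "x\tout{"))   # matches the tab + out{

# re.escape() builds a pattern that matches its argument literally.
escaped = re.escape("\\sout{")
print(escaped)
print(re.search(escaped, line))

# If no regex features are needed, a substring test avoids the second
# layer of escaping altogether.
print("\\sout{" in line)   # True
```

For anything that goes through further layers (shell quoting, config files, templating), each layer would need the same kind of check at the point where its output is available.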