From: <avi.e.gross@gmail.com>
Newsgroups: comp.lang.python
Subject: RE: Correct syntax for pathological re.search()
Date: Fri, 11 Oct 2024 17:13:07 -0400
Message-ID: <mailman.19.1728681189.4695.python-list@python.org>
References: <ve0o34$1nep4$1@dont-email.me> <MQaOO.3313338$EVn.2054758@fx04.ams4> <011301db1c22$5e7519c0$1b5f4d40$@gmail.com>
In-Reply-To: <MQaOO.3313338$EVn.2054758@fx04.ams4>

Is there some utility function out there that can be called to show what the
regular expression you typed in will look like by the time it is ready to be
used?

Obviously, life is not that simple as it can go through multiple layers with
each dealing with a layer of backslashes.

But for simple cases, ...
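
For simple cases, one rough way to look at both layers is sketched below. This is not a utility mentioned anywhere in this thread, just two standard pieces put together: print() shows the pattern after Python has processed the string literal, which is what the re module receives, and the re.DEBUG flag makes re.compile() print how the regex engine parsed it. The dictionary name and its labels are made up for illustration.

```python
import re

# Three ways of writing the pattern in source code (labels are illustrative).
patterns = {
    "two backslashes, no r": "\\sout{",
    "two backslashes, with r": r"\\sout{",
    "four backslashes, no r": "\\\\sout{",
}

for label, pattern in patterns.items():
    # print() shows the characters left after Python has processed the
    # string literal, i.e. the text that the re module actually receives.
    print(f"{label}: re receives {pattern}")
    # re.DEBUG makes re.compile() print its parse of the pattern to stdout.
    re.compile(pattern, re.DEBUG)
```

The exact layout of the re.DEBUG dump differs between Python versions, so no output is shown here. The thing to look for is whether the first element is a literal backslash or a whitespace category.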
-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Gilmeh Serda via Python-list
Sent: Friday, October 11, 2024 10:44 AM
To: python-list@python.org
Subject: Re: Correct syntax for pathological re.search()

On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:

> I'm trying to discard lines that include the string "\sout{" (which is
> TeX, for those who are curious). I have tried:
>   if not re.search("\sout{", line):
>   if not re.search("\sout\{", line):
>   if not re.search("\\sout{", line):
>   if not re.search("\\sout\{", line):
>
> But the lines with that string keep coming through. What is the right
> syntax to properly escape the backslash and the left curly bracket?

$ python
Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = r"testing \sout{WHADDEVVA}"
>>> re.search(r"\\sout{", s)
<re.Match object; span=(8, 14), match='\\sout{'>

You want a literal backslash, hence, you need to escape everything.

It is not enough to escape the "\s" as "\\s", because that only takes care
of Python's demands for escaping "\". You also need to escape the "\" for
the RegEx as well, or it will read it like it means "\s", which is the RegEx
for a whitespace character. Your search then doesn't match, because it reads
it like you want to search for " out{".

Therefore, you need to escape it either as per my example, or by using four
"\" and no "r" in front of the first quote, which also works:

>>> re.search("\\\\sout{", s)
<re.Match object; span=(8, 14), match='\\sout{'>

You don't need to escape the curly braces. We call them "seagull wings"
where I live.

-- 
Gilmeh

Sometimes I simply feel that the whole world is a cigarette and I'm the
only ashtray.
-- 
https://mail.python.org/mailman/listinfo/python-list
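
Applied to the original goal of discarding lines, a minimal sketch might look like the following. The sample lines and variable names are invented for illustration. re.escape() and the plain substring test are alternatives that the messages above do not mention; they are shown only as options that sidestep hand-written escaping.

```python
import re

# Invented sample input; only the second line contains \sout{
lines = [
    "keep this line",
    r"testing \sout{WHADDEVVA}",
    "keep this too: sout{ has no backslash",
]

# Raw string, so re receives \\sout{ and matches a literal backslash.
pattern = re.compile(r"\\sout{")
kept = [line for line in lines if not pattern.search(line)]

# re.escape() builds an equivalent pattern from the literal text.
escaped = re.escape(r"\sout{")
kept_escaped = [line for line in lines if not re.search(escaped, line)]

# For a fixed string, a substring test needs no regex at all.
kept_plain = [line for line in lines if r"\sout{" not in line]

print(kept)
print(kept == kept_escaped == kept_plain)
```

If the text to find is fixed, as it is here, the substring test is probably the least error-prone of the three, since it involves only one layer of escaping.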