Article <101hfq7$22v3c$1@dont-email.me>

Warning: mysqli::__construct(): (HY000/1203): User howardkn already has more than 'max_user_connections' active connections in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\includes\artfuncs.php on line 21
Failed to connect to MySQL: (1203) User howardkn already has more than 'max_user_connections' active connections
Warning: mysqli::query(): Couldn't fetch mysqli in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\index.php on line 66
Article <101hfq7$22v3c$1@dont-email.me>

Deutsch English Français Italiano

<101hfq7$22v3c$1@dont-email.me>

View for Bookmarking (what is this?)
Look up another Usenet article

Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups: comp.unix.shell
Subject: Bad performance of back-references
Date: Sun, 1 Jun 2025 14:07:33 +0200
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <101hfq7$22v3c$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 01 Jun 2025 14:07:36 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="5efe03dbd7af97f43c3764a2772b692a";
	logging-data="2194540"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+Jc/EG6Of8yNOIbBIukuCH"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
Cancel-Lock: sha1:WmJxYlDvSTCYxcCCfoURo8WKI+4=
X-Enigmail-Draft-Status: N1110
X-Mozilla-News-Host: news://news.eternal-september.org:119

In a recent application case I used back-references to find duplicate
strings in large data sets. I tested that with grep as in

  grep -E -o '(.{42}).*\1'

While this is obviously a neat way to formulate and solve the task in
principle it is impractical for performance reasons.[*]

Applied on my MB sized data the task did not terminate and I killed
the process after a day.

I also implemented the desired function explicitly (using two nested
loops) in a couple of languages (interpreted or compiled). All those
hand-crafted and non-optimized implementations terminated and did
that within minutes or up to only few hours (depending on the pattern
length).

My astonishment is why the back-reference implementation performs so
badly here with 'grep'.

Janis

[*] Note: Back-references are not from the Regular Expression functions
class so they cannot be done in O(N) or O(N+K); so I don't expect this
complexity where I use them. This is not the question here, just to be
clear.