Article <mailman.47.1729852305.4695.python-list@python.org>

Path: ...!news.roellig-ltd.de!open-news-network.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: Albert-Jan Roskam <sjeik_appie@hotmail.com>
Newsgroups: comp.lang.python
Subject: Re: Chardet oddity
Date: Fri, 25 Oct 2024 12:31:25 +0200
Lines: 80
Message-ID: <mailman.47.1729852305.4695.python-list@python.org>
References: <CALk2KRX=pSzA-+zQ1LPcPwUBLdU=_wXtvZtrn73+0fw-2X_w1g@mail.gmail.com>
 <DB9PR10MB6689557635AD6999D9C5BDE4834F2@DB9PR10MB6689.EURPRD10.PROD.OUTLOOK.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Bytes: 12814

   On Oct 24, 2024 17:51, Roland Mueller via Python-list
   <python-list@python.org> wrote:

     On Wed, 23 Oct 2024 at 20:11, Albert-Jan Roskam via Python-list
     (python-list@python.org) wrote:

     >    Today I used chardet.detect in the REPL and it returned windows-1252
     >    (incorrect, because it later resulted in a UnicodeDecodeError). When I
     >    ran chardet as a script (which uses UniversalDetector) this returned
     >    MacRoman. Isn't chardet.detect the correct way? I've used this method
     >    many times.
     >    # Interpreter
     >    >>> contents = open(FILENAME, "rb").read()
     >    >>> chardet.detect(contents)
     >    {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401,
     >    'language': ''}
     >    # Terminal
     >    $ python -m chardet FILENAME
     >    FILENAME: MacRoman with confidence 0.7167379080370483
     >    Thanks!
     >    Albert-Jan
     >
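
The UnicodeDecodeError mentioned in the quoted message can be reproduced with the standard library alone. This is a minimal sketch, not taken from the thread: the helper name try_decodings and the sample bytes are made up for illustration. It relies on the fact that Python's cp1252 codec leaves a few byte values (0x8D among them) unassigned, while mac_roman assigns a character to every byte, so a wrong Windows-1252 guess can fail on data that MacRoman decodes without complaint.

```python
def try_decodings(data: bytes, encodings=("cp1252", "mac_roman", "utf-8")):
    """Return {encoding: decoded text}, with None where decoding fails."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: 0x8D is unassigned in Python's cp1252 codec,
# but it is an ordinary letter in MacRoman.
sample = b"gar\x8don"
outcome = try_decodings(sample)
for name, text in outcome.items():
    print(f"{name:10} -> {text!r}")
```

A successful decode does not prove that a guess is right, since mac_roman accepts any byte string. A failed decode does prove that the guess is wrong.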

     The entry point for the module chardet is chardet.cli.chardetect:main,
     and main() calls the function description_of(lines, name).
     'lines' is a file opened in mode 'rb' and 'name' holds the filename.

     I tried this in interactive mode, as shown further down. I think the
     crucial difference is that description_of(lines, name) reads the opened
     file line by line and stops as soon as the detector reports a result
     for some line.

     Reading the whole file into the variable contents probably gives a
     different result, depending on the input.
     I was not able to reproduce this behaviour.
     I am assuming that you used the same Python for both tests.

     >>> from chardet.cli import chardetect
     >>> chardetect.description_of(open('/tmp/DATE', 'rb'), 'some file')
     'some file: ascii with confidence 1.0'
     >>>

     Your approach:
     >>> from chardet import detect
     >>> detect(open('/tmp/DATE','rb').read())
     {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
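
To compare the two strategies on the same bytes, something like the following could be used. It is only a sketch: the function names and the make_detector parameter are my own, not part of chardet. make_detector is any callable that returns an object with the feed() / done / close() / result interface of chardet's UniversalDetector. As far as I can tell, chardet.detect() amounts to a single feed() of the whole byte string followed by close(), which is what detect_whole mimics.

```python
import io


def detect_whole(data: bytes, make_detector):
    """Feed all bytes in a single call, then close the detector."""
    detector = make_detector()
    detector.feed(data)
    detector.close()
    return detector.result


def detect_by_line(data: bytes, make_detector):
    """Feed line by line and stop early once the detector says it is done,
    mirroring the loop in chardetect.description_of()."""
    detector = make_detector()
    # io.BytesIO iterates over lines the same way a file opened in 'rb' does.
    for line in io.BytesIO(data):
        detector.feed(line)
        if detector.done:
            break
    detector.close()
    return detector.result


def compare(data: bytes, make_detector):
    """Return both results side by side so a disagreement is easy to spot."""
    return {
        "whole": detect_whole(data, make_detector),
        "by_line": detect_by_line(data, make_detector),
    }
```

With chardet installed, one would import UniversalDetector from chardet.universaldetector and call compare(open(FILENAME, "rb").read(), UniversalDetector). If the two entries differ for a given file, the difference comes from how the bytes were fed rather than from the Python version.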

     From /usr/lib/python3/dist-packages/chardet/cli/chardetect.py:

     def description_of(lines, name='stdin'):
         u = UniversalDetector()
         for line in lines:
             line = bytearray(line)
             u.feed(line)
             # shortcut out of the loop to save reading further -
             # particularly useful if we read a BOM.
             if u.done:
                 break
         u.close()
         result = u.result

   =============
========== REMAINDER OF ARTICLE TRUNCATED ==========