From: Albert-Jan Roskam <sjeik_appie@hotmail.com>
Newsgroups: comp.lang.python
Subject: Chardet oddity
Date: Wed, 23 Oct 2024 19:07:14 +0200
Message-ID: <mailman.31.1729703240.4695.python-list@python.org>
References: <DB9PR10MB668924668A3BA86F698C6E42834D2@DB9PR10MB6689.EURPRD10.PROD.OUTLOOK.COM>

   Today I used chardet.detect in the REPL and it returned Windows-1252,
   which is incorrect, because decoding with it later raised a
   UnicodeDecodeError. When I ran chardet as a script (which uses
   UniversalDetector), it returned MacRoman. Isn't chardet.detect the
   correct way? I've used this method many times.
   # Interpreter
   >>> import chardet
   >>> contents = open(FILENAME, "rb").read()
   >>> chardet.detect(contents)
   {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
   # Terminal
   $ python -m chardet FILENAME
   FILENAME: MacRoman with confidence 0.7167379080370483
   Thanks!
   Albert-Jan
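
   One way to sanity-check a detector's guess is to attempt a strict decode
   with each candidate encoding, using only the standard library. The sketch
   below is illustrative and not part of chardet: the helper name
   strict_decodings and the sample bytes are assumptions made up for this
   example. It relies on the fact that Python's cp1252 codec leaves a few
   bytes (such as 0x81) unassigned, while mac_roman assigns all 256 byte
   values.

```python
def strict_decodings(data: bytes, candidates: list[str]) -> dict[str, bool]:
    """Map each candidate encoding to whether it decodes `data` without error.

    Hypothetical helper for illustration; not part of chardet.
    """
    results = {}
    for name in candidates:
        try:
            data.decode(name)  # errors="strict" is the default
        except UnicodeDecodeError:
            results[name] = False
        else:
            results[name] = True
    return results


# Made-up sample bytes: 0x81 has no mapping in Python's cp1252 codec.
sample = b"caf\x8e \x81"
print(strict_decodings(sample, ["cp1252", "mac_roman"]))
# → {'cp1252': False, 'mac_roman': True}
```

   A successful decode does not prove the guess is right, since mac_roman
   will decode any byte sequence. A failed strict decode, however, does rule
   a candidate out.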