From: Albert-Jan Roskam <sjeik_appie@hotmail.com>
Newsgroups: comp.lang.python
Subject: Chardet oddity
Date: Wed, 23 Oct 2024 19:07:14 +0200
Message-ID: <mailman.31.1729703240.4695.python-list@python.org>
References: <DB9PR10MB668924668A3BA86F698C6E42834D2@DB9PR10MB6689.EURPRD10.PROD.OUTLOOK.COM>

Today I used chardet.detect in the REPL and it returned Windows-1252, which was incorrect: decoding with it later raised a UnicodeDecodeError. When I ran chardet as a script (which uses UniversalDetector), it returned MacRoman. Isn't chardet.detect the correct way? I've used this method many times.

# Interpreter
>>> import chardet
>>> contents = open(FILENAME, "rb").read()
>>> chardet.detect(contents)
{'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}

# Terminal
$ python -m chardet FILENAME
FILENAME: MacRoman with confidence 0.7167379080370483

Thanks!

Albert-Jan
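
Since the Windows-1252 guess only showed itself to be wrong when decoding failed, one way to sanity-check any detector's answer is to try the candidate encodings directly. The sketch below uses only the standard library and does not explain why the two chardet entry points disagree. The helper name decodable_with and the sample bytes are hypothetical, not taken from the post. It relies on the fact that Python's cp1252 codec leaves a few bytes (such as 0x8D) unmapped, while mac_roman maps every byte.

```python
def decodable_with(data: bytes, candidates):
    """Return the candidate encodings that decode `data` without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: a byte has no mapping in this codec.
            # LookupError: Python does not know the codec name.
            continue
        ok.append(name)
    return ok


# Hypothetical sample: 0x8D is undefined in Python's cp1252 codec.
sample = b"caf\x8d"
print(decodable_with(sample, ["cp1252", "mac_roman"]))
```

Decoding without error does not prove an encoding is the right one, only that it is possible. A single-byte codec that maps all 256 values, such as mac_roman or latin-1, will accept any input, so this check can rule a guess out but cannot confirm it.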