Path: ...!2.eu.feeder.erje.net!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.misc
Subject: Mini-Language for hyphenation
Date: 21 Jan 2025 11:53:55 GMT
Organization: Stefan Ram
Lines: 83
Expires: 1 Jan 2026 11:59:58 GMT
Message-ID: <hyphenation-20250121121804@ram.dialup.fu-berlin.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de WvepD3NEtcShxTdQSlxb+wrU6/fOzKd9f2KQlPL5rEMN/w
Cancel-Lock: sha1:ckYKU329DuHYt13hRF8nyOgPlZk= sha256:ykm0iCoUc6NmZzh3Ec+cK71oKa79I35ZxoRdmbZYb40=
X-Copyright: (C) Copyright 2025 Stefan Ram. All rights reserved. Distribution through any means other than regular usenet channels is forbidden. It is forbidden to publish this article in the Web, to change URIs of this article into links, and to transfer the body without this notice, but quotations of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some services to mirror the article in the web. But the article may be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: de-DE-1901
Bytes: 3937

I foresee the need for a mini-language for hyphenation in my
current plain-text paragraph wrapper project. Here are my plans,
comments are welcome:

example

This is an unadorned word "example". The system might
automatically insert possibilities for hyphenation from
a hyphenation dictionary.

ex[[-]]am[[-]]ple

Here, possibilities for hyphenation have been inserted.

It is assumed that nested brackets occur so rarely in natural
texts, that this possibility is negligible. But means for
escaping will be discussed below.

ba[ck[k-|k]]en

This is a hyphenation of a German word according to the rules
from 1973. It's either "backen" or "bak- ken".

Bett[[-|t]]uch

"Bettuch" or "Bett- tuch", according to spelling rules from 1973.
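
As an illustration of how the examples above can be read, here is a
minimal Python sketch, not the author's implementation. It assumes
each break point has the shape [plain[pre|post]], where "plain" is
what appears when no break is taken and "|post" may be omitted. The
names BREAK, unbroken and break_variants are made up for this sketch.

```python
import re

# Basic form only: [plain[pre|post]], where "|post" is optional.
# (Assumed shape, derived from the three examples above.)
BREAK = re.compile(r"\[([^\[\]|]*)\[([^\[\]|]*)(?:\|([^\[\]|]*))?\]\]")


def unbroken(text):
    """Render the text with no break taken: keep only the 'plain' part."""
    return BREAK.sub(lambda m: m.group(1), text)


def break_variants(text):
    """Return (line_end, next_line_start) for each break point taken alone."""
    variants = []
    for m in BREAK.finditer(text):
        # Everything before the chosen break is rendered unbroken,
        # followed by the pre-break text of this break point.
        head = unbroken(text[:m.start()]) + m.group(2)
        # The post-break text (possibly empty) starts the next line.
        tail = (m.group(3) or "") + unbroken(text[m.end():])
        variants.append((head, tail))
    return variants


if __name__ == "__main__":
    for word in ("ex[[-]]am[[-]]ple", "ba[ck[k-|k]]en", "Bett[[-|t]]uch"):
        print(unbroken(word), break_variants(word))
```

The sketch deliberately ignores everything beyond this basic form, so
a wrapper would need more than this to cover the full notation.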

So, the general pattern in my mini-language is:

[no-hyphenation text[pre-break text|post-break text]]

Bett[t]uch

When brackets occur in the text that do not satisfy the syntax
of my mini-language, they will simply be left alone. I.e., this
is just literally "Bett[t]uch" with a "t" to be "typeset" in
literal brackets.

ba[ck[k-|k][-|ck@-99]]

Here, two possibilities for hyphenation are given. The second
one has a value of -99 added to the quality of the break, which
means that "[k-|k]" will be preferred.

backen[[#]]

This inserts an invisible marker of width zero that then may be
found in the wrapped paragraph to learn on which line the "n"
has ended.

b[[#97]]cken

Here, the "a" is given by its code point number.

b[[#u61]]cken

Here, the "a" is given by its code point number in hex notation.

Escape Mechanisms

In programming languages, we may indeed have nested brackets as
in "a[ b[ 20 ]]". Using the above notation, this can be written
as "a[[#91]] b[[#91]] 20 [[#93]][[#93]]".

My mini-language is intended to be a low-level mechanism for the
specification of hyphenation rules. Higher-level formatting
languages may be built on top of it, which may automatically
convert "a[ b[ 20 ]]" into "a[[#91]] b[[#91]] 20 [[#93]][[#93]]"
when it appears in the context of source code.

However, as a last resort, one may use a special notation to
redefine the characters of the mini-language:

[[#40=#91]]
[[#91=]]

Above, the parenthesis "(" (40) is given the role of the bracket
"[" (91), and then the bracket is defined to have no special
role in the mini-language. (The value right of "=" always
represents the role this symbol has in the /original/
mini-language.)
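
The conversion described under "Escape Mechanisms" can be sketched in
Python as follows. This is only an illustration under assumptions: the
function names escape_brackets and decode_codepoints are invented
here, and accepting both upper- and lower-case hex digits after "u" is
a guess, since the post only shows "u61". The role-redefinition
notation ("[[#40=#91]]") is not handled.

```python
import re


def escape_brackets(text):
    """Replace each literal '[' or ']' by its decimal code point reference."""
    return "".join(f"[[#{ord(c)}]]" if c in "[]" else c for c in text)


# Matches [[#97]] (decimal) or [[#u61]] (hex); the bare marker [[#]]
# has no digits and is therefore left untouched.
CODEPOINT = re.compile(r"\[\[#(u[0-9A-Fa-f]+|[0-9]+)\]\]")


def decode_codepoints(text):
    """Replace each code point reference by the character it denotes."""
    def repl(m):
        ref = m.group(1)
        code = int(ref[1:], 16) if ref.startswith("u") else int(ref)
        return chr(code)
    return CODEPOINT.sub(repl, text)


if __name__ == "__main__":
    escaped = escape_brackets("a[ b[ 20 ]]")
    print(escaped)
    print(decode_codepoints(escaped))
    print(decode_codepoints("b[[#97]]cken"), decode_codepoints("b[[#u61]]cken"))
```

A higher-level formatter could call something like escape_brackets on
source-code passages before handing the text to the wrapper, and the
wrapper would apply the decoding step when it typesets the result.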