Article <hyphenation-20250121121804@ram.dialup.fu-berlin.de>

Path: ...!2.eu.feeder.erje.net!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.misc
Subject: Mini-Language for hyphenation
Date: 21 Jan 2025 11:53:55 GMT
Organization: Stefan Ram
Lines: 83
Expires: 1 Jan 2026 11:59:58 GMT
Message-ID: <hyphenation-20250121121804@ram.dialup.fu-berlin.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de WvepD3NEtcShxTdQSlxb+wrU6/fOzKd9f2KQlPL5rEMN/w
Cancel-Lock: sha1:ckYKU329DuHYt13hRF8nyOgPlZk= sha256:ykm0iCoUc6NmZzh3Ec+cK71oKa79I35ZxoRdmbZYb40=
X-Copyright: (C) Copyright 2025 Stefan Ram. All rights reserved.
	Distribution through any means other than regular usenet
	channels is forbidden. It is forbidden to publish this
	article in the Web, to change URIs of this article into links,
        and to transfer the body without this notice, but quotations
        of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
	services to mirror the article in the web. But the article may
	be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: de-DE-1901
Bytes: 3937

  I foresee the need for a mini-language for hyphenation in
  my current plain-text paragraph wrapper project. Here are
  my plans; comments are welcome:

example

  This is the unadorned word "example". The system might automatically
  insert possibilities for hyphenation from a hyphenation dictionary.

ex[[-]]am[[-]]ple

  Here, possibilities for hyphenation have been inserted.
  Nested brackets are assumed to occur so rarely in natural
  texts that the risk of a clash is negligible. Still, means
  for escaping are discussed below.

ba[ck[k-|k]]en

  This is a hyphenation of a German word according to the
  rules from 1973. It's either "backen" or "bak-
  ken".

Bett[[-|t]]uch

  "Bettuch" or "Bett-
  tuch", according to spelling rules from 1973.

  So, the general pattern in my mini-language is:

[no-hyphenation text[pre-break text|post-break text]]

  .
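
  The pattern above can be sketched in Python roughly as follows.
  This is only an illustration, not the implementation of the
  wrapper project: the names "unbroken" and "break_options" are
  made up here, an alternative without "|" is taken to have empty
  post-break text (as in "[[-]]"), and the "@" quality suffix and
  the "#" forms described further down are left out.

```python
import re

# One break alternative: "[pre]" or "[pre|post]". A leading "#" is skipped
# here on the assumption that it is reserved for marker/code-point forms.
ALT = r"\[(?!#)[^\[\]|]*(?:\|[^\[\]]*)?\]"
# Whole construct: "[", no-hyphenation text, one or more alternatives, "]".
CONSTRUCT = re.compile(r"\[([^\[\]]*)((?:" + ALT + r")+)\]")
ALT_PARTS = re.compile(r"\[([^\[\]|]*)(?:\|([^\[\]]*))?\]")


def unbroken(text):
    """Render text as if no break is taken at any construct."""
    return CONSTRUCT.sub(lambda m: m.group(1), text)


def break_options(text):
    """List (line_end, next_line_start) pairs, one per break alternative."""
    options = []
    for m in CONSTRUCT.finditer(text):
        head = unbroken(text[:m.start()])
        tail = unbroken(text[m.end():])
        for alt in ALT_PARTS.finditer(m.group(2)):
            pre, post = alt.group(1), alt.group(2) or ""
            options.append((head + pre, post + tail))
    return options


print(unbroken("ba[ck[k-|k]]en"))        # backen
print(break_options("ba[ck[k-|k]]en"))   # [('bak-', 'ken')]
print(break_options("Bett[[-|t]]uch"))   # [('Bett-', 'tuch')]
```

  Brackets that do not fit the pattern, such as a single "[t]",
  are simply not matched by this sketch and stay in the text.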

Bett[t]uch

  When brackets occur in the text that do not satisfy the
  syntax of my mini-language, they will simply be left alone.
  I.e., this is just literally "Bett[t]uch" with a "t" to
  be "typeset" in literal brackets.

ba[ck[k-|k][-|ck@-99]]

  Here, two possibilities for hyphenation are given. The second one
  has a value of -99 added to the quality of the break, which means
  that "[k-|k]" will be preferred.
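
  A small Python sketch of how such quality values might be read
  and ranked. The function name "ranked_alternatives" is
  hypothetical, and the default quality of 0 for an alternative
  without "@" is an assumption, since no base value is stated.

```python
import re

# "[pre]", "[pre|post]" or "[pre|post@quality]".
ALT = re.compile(r"\[([^\[\]|@]*)(?:\|([^\[\]@]*))?(?:@([+-]?\d+))?\]")


def ranked_alternatives(alternatives):
    """Return (pre, post, quality) tuples, best quality first."""
    parsed = [
        # Assumption: a missing "@quality" suffix counts as quality 0.
        (m.group(1), m.group(2) or "", int(m.group(3) or 0))
        for m in ALT.finditer(alternatives)
    ]
    return sorted(parsed, key=lambda alt: alt[2], reverse=True)


print(ranked_alternatives("[k-|k][-|ck@-99]"))
# [('k-', 'k', 0), ('-', 'ck', -99)]
```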

backen[[#]]

  This inserts an invisible zero-width marker that can then be located
  in the wrapped paragraph to learn on which line the "n" ended up.

b[[#97]]cken

  Here, the "a" is given by its code point number.

b[[#u61]]cken

  Here, the "a" is given by its code point number in hex notation.
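
  The two code point notations could be decoded along these lines
  in Python. "expand_code_points" is a name chosen only for this
  sketch. It returns a plain string, whereas a real processor
  would have to keep the decoded characters literal so that, for
  example, a decoded "[" is not parsed again.

```python
import re

# "[[#97]]" (decimal) or "[[#u61]]" (hexadecimal).
# The bare marker "[[#]]" carries no number and is not matched.
CODE_POINT = re.compile(r"\[\[#(u[0-9A-Fa-f]+|[0-9]+)\]\]")


def expand_code_points(text):
    """Replace each code point form by the character it denotes."""
    def decode(m):
        spec = m.group(1)
        value = int(spec[1:], 16) if spec.startswith("u") else int(spec)
        return chr(value)
    return CODE_POINT.sub(decode, text)


print(expand_code_points("b[[#97]]cken"))   # backen
print(expand_code_points("b[[#u61]]cken"))  # backen
```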

  Escape Mechanisms

  In programming languages, we may indeed have nested brackets as
  in "a[ b[ 20 ]]". Using the above notation, this can be written
  as "a[[#91]] b[[#91]] 20 [[#93]][[#93]]".

  My mini-language is intended to be a low-level mechanism
  for the specification of hyphenation rules. Higher-level
  formatting languages may be built on top of it, which may
  automatically convert "a[ b[ 20 ]]" into "a[[#91]] b[[#91]] 20
  [[#93]][[#93]]" when it appears in the context of source code.
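
  Such an automatic conversion might look like this in Python.
  The function name "escape_brackets" is hypothetical. A single
  pass with str.translate is used because two chained replace()
  calls would also rewrite the "]" characters introduced by the
  first replacement.

```python
# Map each bracket to its code point form in one pass.
BRACKET_ESCAPES = {ord("["): "[[#91]]", ord("]"): "[[#93]]"}


def escape_brackets(source):
    """Escape every "[" and "]" so that they are taken literally."""
    return source.translate(BRACKET_ESCAPES)


print(escape_brackets("a[ b[ 20 ]]"))
# a[[#91]] b[[#91]] 20 [[#93]][[#93]]
```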

  However, as a last resort, one may use a special notation to
  redefine the characters of the mini-language:

[[#40=#91]]
[[#91=]]

  Above, the parenthesis "(" (40) is given the role of the bracket
  "[" (91), and then the bracket is defined to have no special role
  in the mini-language. (The value to the right of "=" always
  represents the role this symbol has in the /original/ mini-language.)
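
  One way to picture these redefinitions is a table that maps
  each character to the role it currently plays. The following
  Python sketch rests on several assumptions: the name
  "apply_redefinitions" is invented, the set of special
  characters "[]|#@=" is a guess, and the directives themselves
  are always read in the original notation.

```python
import re

# "[[#40=#91]]" assigns a role, "[[#91=]]" removes one.
DIRECTIVE = re.compile(r"\[\[#(\d+)=(?:#(\d+))?\]\]")


def apply_redefinitions(directives, roles=None):
    """Return a dict: character -> character whose original role it plays."""
    if roles is None:
        # Assumed initial table: each special character plays its own role.
        roles = {c: c for c in "[]|#@="}
    else:
        roles = dict(roles)
    for m in DIRECTIVE.finditer(directives):
        char = chr(int(m.group(1)))
        if m.group(2) is None:
            roles.pop(char, None)             # no special role any more
        else:
            roles[char] = chr(int(m.group(2)))
    return roles


roles = apply_redefinitions("[[#40=#91]]\n[[#91=]]")
print(roles["("])    # [
print("[" in roles)  # False
```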