Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.python
Subject: SLRTR 000000: ZWJ in Sphinx
Date: 5 May 2025 11:34:52 GMT
Organization: Stefan Ram
Lines: 97
Expires: 1 Mar 2026 11:59:58 GMT
Message-ID: <SLRTR-000000-20250505123051@ram.dialup.fu-berlin.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de EcQpZRFeciKjf+LMGY1khQBX5v6RUIu6KDUFiiEwjsDDsy
Cancel-Lock: sha1:wSFuGZlyUQoRHWG8BU3SuMDUptE= sha256:Y92AeZHuAAGS61odzQI1gqgBIhzMg42aLzBKcmprcJk=
X-Copyright: (C) Copyright 2025 Stefan Ram. All rights reserved.
Distribution through any means other than regular usenet
channels is forbidden. It is forbidden to publish this
article in the Web, to change URIs of this article into links,
and to transfer the body without this notice, but quotations
of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
services to mirror the article in the web. But the article may
be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US
The Use of the U+200D "ZERO WIDTH JOINER" (ZWJ) Character in
reStructuredText Input for "Sphinx"
(Technical Report SLRTR 000000)
(This technical report was prepared by the author during his spare
time.)
Stefan Ram
2025
Abstract - The character U+200D "ZERO WIDTH JOINER" (ZWJ) may be
employed in inputs written in the "reStructuredText" (rst) markup
notation for the software documentation tool "Sphinx" in order to permit
the inclusion of special characters within embedded code segments.
To ensure that Sphinx's automatic line breaking continues to function
correctly, two minor adjustments to Sphinx are required.
I. Introduction
The software documentation tool "Sphinx" accepts texts composed in the
"reStructuredText" (rst) notation. Within paragraphs, code segments are
denoted by enclosing the relevant text between pairs of grave accents
(``) as illustrated in Figure 1.
Figure 1: A Code Segment within a Paragraph
|... the expression ``x[ 2 ]`` may be used ...
Such segments are, however, subject to two restrictions:
- They must not begin or end with a space character (" ").
- They must not contain pairs of grave accents.
II. Versions of the Software Considered
This report pertains to Sphinx, version 8.2.3.
III. The U+200D ZERO WIDTH JOINER (ZWJ) Character as a Workaround
It is nevertheless possible to include a space at the beginning of
an embedded code segment by prefixing it with the invisible character
U+200D "ZERO WIDTH JOINER" (ZWJ). Similarly, a space may be appended
to the end of such a segment by suffixing it with a ZWJ. Furthermore,
a sequence of multiple grave accents within an embedded code segment
can be achieved by interposing a ZWJ between the grave accents.
The ZWJ character is invisible in Sphinx's output, or it may be
removed by means of post-processing if so desired.
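When rst input is produced by a program, the workaround described above
could be automated. The following is only a minimal sketch under stated
assumptions: the helper names rst_literal and strip_zwj are hypothetical
and are not part of Sphinx or Docutils.

```python
import re

ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER


def rst_literal(code: str) -> str:
    """Wrap code in double grave accents, guarding it with ZWJ.

    Hypothetical helper (an assumption of this sketch, not a Sphinx API).
    """
    # Interpose a ZWJ between any two adjacent grave accents.
    guarded = re.sub(r"`(?=`)", "`" + ZWJ, code)
    # Shield a leading or trailing space with a ZWJ.
    if guarded.startswith(" "):
        guarded = ZWJ + guarded
    if guarded.endswith(" "):
        guarded = guarded + ZWJ
    return "``" + guarded + "``"


def strip_zwj(output: str) -> str:
    """Remove every ZWJ from rendered output (optional post-processing)."""
    return output.replace(ZWJ, "")
```

For instance, rst_literal(" x ") places a ZWJ on both sides of the
segment. Note that strip_zwj removes every ZWJ, including any that are
intentionally present in the output (for example, in emoji sequences),
so it should be applied with care.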
IV. Consideration of ZWJ in Line Breaking and Word Division
Sphinx interprets a ZWJ as a character of width one and regards it as
a potential break point within words. Consequently, the formatting of
output text may be affected. This behavior can be modified by two
changes to the Sphinx source code.
A. Adjustment of Character Width
Within the file "docutils\utils\__init__.py" (which belongs to the
Docutils package used by Sphinx), the number of ZWJ characters should
be subtracted from the total text width, so that a ZWJ is not counted
as a character of width one. This is accomplished by inserting the
following line immediately before the "return width" statement in the
definition of the column_width function:
Figure 2: The line to be inserted
|width -= text.count('\u200d')
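The effect of this line can be tried in isolation. The sketch below is
a simplified stand-in written for this illustration, not the Docutils
source; the name column_width_sketch and the parameter discount_zwj are
assumptions introduced here.

```python
import unicodedata

ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER


def column_width_sketch(text: str, discount_zwj: bool = True) -> int:
    """Approximate the column width of text (simplified stand-in)."""
    # Wide and fullwidth East Asian characters take two columns,
    # every other character is counted as one column.
    width = sum(
        2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
        for c in text
    )
    # Combining characters occupy no column of their own.
    width -= sum(1 for c in text if unicodedata.combining(c))
    # The adjustment discussed above: do not count ZWJ characters.
    if discount_zwj:
        width -= text.count(ZWJ)
    return width
```

With discount_zwj=False, a ZWJ followed by " x" is measured as three
columns; with the adjustment enabled, it is measured as two.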
B. Adjustment of Break Point Determination
(This adjustment is likely unnecessary for ZWJ within embedded code
segments, but may be required if ZWJ is used within words of running
text for any reason.)
In the Sphinx source file "sphinx\writers\text.py", words should not be
split where a ZWJ occurs within a word. To this end, the definition
shown in Figure 3 may be inserted below the definition of the split
function (which is itself nested within the definition of the _split
function in the TextWrapper class). The indentation of the new col_width
function should match that of the preceding split function.
Figure 3: The definition to be inserted
|def col_width(t: str) -> int:
|    '''For the purpose of word splitting, treat
|    zero-width characters just as characters
|    of width one.'''
|    width = column_width(t)
|    if width == 0:
|        width = 1
|    return width
The source code should further be modified such that this new col_width
function is invoked in the call to "groupby" three lines below,
replacing the previous use of column_width.
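Why the key function passed to "groupby" matters can be seen in a small
self-contained sketch. It does not import Sphinx: column_width below is
a deliberately crude stand-in that mimics the state after adjustment A
(zero width for ZWJ), and group_by_width is a hypothetical helper that
only reproduces the grouping step.

```python
from itertools import groupby

ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER


def column_width(t: str) -> int:
    """Stand-in: one column per character, none for ZWJ."""
    return len(t) - t.count(ZWJ)


def col_width(t: str) -> int:
    """Treat zero-width characters as width one for word splitting."""
    width = column_width(t)
    if width == 0:
        width = 1
    return width


def group_by_width(chunk: str, key) -> list[str]:
    """Group consecutive characters for which key returns the same width."""
    return ["".join(g) for _, g in groupby(chunk, key)]
```

Grouping "ab", ZWJ, "cd" with column_width as the key yields three
separate groups, so the word could be broken at the ZWJ; with col_width
as the key, the whole word stays in a single group.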
(End of Technical Report)