Eltanin: Phil’s document processor (version 2.1)

Phil Brooke, Green Pike Ltd

11 Jan 2023


Copyright © 2022-2023 Phil Brooke & Green Pike Ltd. Released under GPL v3; see the copyright section.

Outline of document processing

1. Write slightly-enhanced Pandoc markdown into a file in input/whatever.txt. As with Markdown generally, the intention is that the source files remain readable without excessive use of tags or formatting instructions (although some are inevitable). This is the main rationale for not using DocBook (XML), although its assemblies have some potential for organising a knowledge base.

2. https://garrettgman.github.io/rmarkdown/authoring_pandoc_markdown.html provides reasonable detail of Pandoc’s markdown notation.

Major changes


core.5 to core.6

3. This is substantially revised from the core.5 version, which used multiple separate files. In particular, the use of Make is minimal and optional, and the main driver is (from core.6) a Python 3 program.

core.19 to core.20

4. The input/output file structure was reorganised so that the input and output directories mirror each other. The YAML configuration was changed to move common configuration stanzas into their own list. The local variable output_stem was renamed to file_stem.

Summary of directives for document authors

5. This section is aimed at those updating documents where the configuration has already been prepared.

General

6. The directives and macros are case-sensitive.

7. Directives are bracketed with dollar signs, i.e., $…$ (with a few exceptions). They are generally a single letter, a colon, then a key.
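As an illustration of that general shape, the following is a minimal sketch of a scanner for single-letter directives. The regular expression and the function name find_directives are assumptions made for this example; the real processor has exceptions to this form that are not modelled here.

```python
import re

# Assumed shape, taken from the prose: "$", one letter, ":", a key, "$".
# The real processor has exceptions that this pattern does not cover.
DIRECTIVE = re.compile(r"\$([A-Za-z]):([^$\s][^$]*)\$")


def find_directives(text):
    """Return (letter, key) pairs for every $x:key$ occurrence, in order."""
    return [(m.group(1), m.group(2)) for m in DIRECTIVE.finditer(text)]


sample = "See $h:intro$ and paragraph $V:limits$. Prices like $5 are ignored."
print(find_directives(sample))
# [('h', 'intro'), ('V', 'limits')]
```

The letter is captured without case folding, which matches the statement that directives are case-sensitive.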

Setting references to sections, paragraph and other locations


8. Targets for a cross-reference (“inbound xrefs”):

  • p: assigns the current paragraph as target and its number as value
  • s: assigns the current section as target and its number as value
  • t: assigns a target (but no value)

9. x is a special directive that adds a prefix. This is useful when the same document source produces multiple outputs. Otherwise, multiple output files from a single input file would result in duplicate cross-reference targets.
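The interaction between the target directives and the x prefix can be sketched as a small registry. This is only a model: the class and method names are hypothetical, and combining prefix and key by plain concatenation is an assumption rather than documented behaviour.

```python
class DuplicateTarget(Exception):
    """Raised when two directives try to define the same target."""


class TargetRegistry:
    def __init__(self):
        self.prefix = ""      # set by the x directive
        self.targets = {}     # full key -> value (None for t targets)

    def set_prefix(self, prefix):
        self.prefix = prefix

    def add(self, kind, key, number=None):
        """Register a p, s or t target under the current prefix."""
        if kind not in ("p", "s", "t"):
            raise ValueError(f"unknown target directive: {kind}")
        full_key = self.prefix + key  # assumption: simple concatenation
        if full_key in self.targets:
            raise DuplicateTarget(full_key)
        self.targets[full_key] = None if kind == "t" else number
        return full_key


# One source paragraph rendered into two outputs: the prefix keeps the
# two targets distinct.
registry = TargetRegistry()
for variant in ("staff-", "public-"):
    registry.set_prefix(variant)
    registry.add("p", "scope", number="3")

print(sorted(registry.targets))
# ['public-scope', 'staff-scope']
```

Without the calls to set_prefix, the second add would raise DuplicateTarget, which is the situation the x directive exists to avoid.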

10. l sets local variables. These are not visible outside that particular document, so the same names can be reused across documents. This is useful for building titles, and the default structure and reports expect a small number of them to be set. The syntax is a variable name, an equals sign, and its value.
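A minimal sketch of how the body of an l directive might be split, assuming the text after the colon is name=value and that only the first equals sign separates the two. The function name parse_local and the example values are hypothetical.

```python
def parse_local(body):
    """Split the body of an l directive into (name, value).

    Assumption: only the first "=" separates name from value, so the
    value itself may contain further equals signs.
    """
    name, sep, value = body.partition("=")
    name = name.strip()
    if not sep or not name:
        raise ValueError(f"expected name=value, got {body!r}")
    return name, value


# Local variables live in a per-document mapping, so another document
# can reuse the same names without a clash.
local_vars = {}
for body in ("title=Annual report", "file_stem=report-2023"):
    name, value = parse_local(body)
    local_vars[name] = value

print(local_vars)
# {'title': 'Annual report', 'file_stem': 'report-2023'}
```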

Overriding paragraph numbering occasionally

11. The npn directive, written between exclamation marks rather than dollar signs (!npn!) and placed at the very start of a paragraph, causes numbering of that paragraph to be omitted completely; internal cross-referencing is also disabled for that paragraph.

Setting anchor name for a paragraph

12. Markdown provides for a section id to be set using {#…}. The prefix directive !np!… at the very start of a paragraph replaces the default anchor for a numbered paragraph with the remainder of that string.
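The two paragraph prefixes can be sketched as a small classifier. This is an illustration only: the function name is hypothetical, and treating the anchor name as the text up to the first space is an assumption, since the exact extent of the anchor string is not spelled out here.

```python
def classify_paragraph(text):
    """Return (numbered, anchor, body) for one paragraph of source text.

    Assumption: after "!np!", the anchor name runs up to the first
    space and the paragraph body follows it.
    """
    if text.startswith("!npn!"):
        return False, None, text[len("!npn!"):]
    if text.startswith("!np!"):
        anchor, _, body = text[len("!np!"):].partition(" ")
        return True, anchor, body
    return True, None, text


print(classify_paragraph("!npn!An unnumbered aside."))
# (False, None, 'An unnumbered aside.')
print(classify_paragraph("!np!scope The scope is narrow."))
# (True, 'scope', 'The scope is narrow.')
```

Both prefixes are only recognised at the very start of the paragraph, so the same characters appearing later in the text are left alone.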

Reference to values and targets

13. Referring to a cross-reference (“outbound xrefs”) and to values (including those set by l directives):

  • v: shows the value (from a p, s or l directive; it is an error if the key is a t target)
  • b: produces the URL in brackets (parentheses, not the usual delimiter)
  • a: produces the URL in angle brackets
  • h: produces the value (as for v) wrapped in an anchor pointing to it (as if b)
  • V, B, A and H: upper-case variants of v, b, a and h. Too many back-references from a single (part of a) document can be messy, so these behave as their lower-case variants but without creating a back-reference.
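The behaviour of these directives can be sketched as a single lookup function. The target table, the function name render_xref, and the choice of a Markdown link as the output of h are all assumptions for illustration; the processor's actual output format may differ.

```python
def render_xref(letter, key, targets, backrefs, source):
    """Expand one outbound directive against a table of known targets.

    targets maps key -> (url, value); value is None for a t target.
    backrefs collects key -> list of referring sources.
    """
    if key not in targets:
        raise KeyError(f"missing target: {key}")
    kind = letter.lower()
    if kind not in ("v", "b", "a", "h"):
        raise ValueError(f"unknown directive: {letter}")
    url, value = targets[key]
    if kind in ("v", "h") and value is None:
        raise ValueError(f"{key} is a t target and has no value")
    if letter.islower():
        # Only lower-case directives leave a back-reference.
        backrefs.setdefault(key, []).append(source)
    if kind == "v":
        return value
    if kind == "b":
        return f"({url})"
    if kind == "a":
        return f"<{url}>"
    return f"[{value}]({url})"  # h, assumed to be a Markdown link


targets = {"scope": ("policy.html#scope", "3"), "top": ("policy.html#top", None)}
backrefs = {}
print(render_xref("v", "scope", targets, backrefs, "guide.txt"))
# 3
print("[the policy]" + render_xref("B", "scope", targets, backrefs, "guide.txt"))
# [the policy](policy.html#scope)
print(render_xref("h", "scope", targets, backrefs, "guide.txt"))
# [3](policy.html#scope)
print(backrefs)
# {'scope': ['guide.txt', 'guide.txt']}
```

The B call contributes nothing to backrefs, which is the only difference from b in this sketch.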

Longer details

14. Configuration is available via a YAML file. Quite a lot can be configured: see settings within the Python source. Consider using yamllint to validate configuration files.

15. Customisation is also available via

16. All the outputs should magically appear in the output directory.

17. The document sources and auxiliary files are stored in a Git-controlled repository. This means that tags and branches can be used to manage development versions and proposed versions, and to record an audit trail of changes and releases. Because the build instructions are included with each version, it should remain possible to replicate the outputs of any commit (provided that the dependencies in section 14 continue to provide the same output).

Processor passes

18. The Python program drives a series of incremental changes to the files. These intermediate steps are normally hidden, but can be shown with the --steps command line option.
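The idea of a chain of passes with optional intermediate output can be sketched as follows. Only the --steps option name comes from the text above; the pass names, the run_passes function and the toy transformations are hypothetical.

```python
import argparse


def run_passes(text, passes, steps=False):
    """Apply each (name, function) pass in order.

    When steps is true, keep a snapshot of the text after every pass so
    the intermediate stages can be inspected.
    """
    snapshots = []
    for name, transform in passes:
        text = transform(text)
        if steps:
            snapshots.append((name, text))
    return text, snapshots


# Two toy passes standing in for the real ones.
passes = [
    ("strip-trailing-space",
     lambda t: "\n".join(line.rstrip() for line in t.split("\n"))),
    ("capitalise-title", lambda t: t.replace("title", "Title", 1)),
]

parser = argparse.ArgumentParser()
parser.add_argument("--steps", action="store_true",
                    help="show the intermediate result of each pass")
args = parser.parse_args(["--steps"])

final, shown = run_passes("title  \nbody ", passes, steps=args.steps)
for name, snapshot in shown:
    print(name, repr(snapshot))
```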

19. In rough order, the passes:

Configuration structure

20. The YAML file comprises a series of keys. The most notable are

21. The configs block gives:

22. The outputs block gives:

Macros/definitions


Simple

23. These definitions are contained in a file starting with the line simple. Thereafter, the first word of each line is the directive to be searched for, and the rest of the line (after a single space) is the replacement text.
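The file layout just described can be sketched as a parser plus a naive expander. How a simple macro is written in the document source is not stated here, so the $name$ form used by expand_simple is an assumption, as are the function names and the sample definitions.

```python
def parse_simple(text):
    """Parse a simple-macro file into {directive: replacement}."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "simple":
        raise ValueError("a simple macro file must start with the line 'simple'")
    macros = {}
    for line in lines[1:]:
        if not line.strip():
            continue  # assumption: blank lines are ignored
        # First word is the directive; everything after ONE space is the text.
        name, _, replacement = line.partition(" ")
        macros[name] = replacement
    return macros


def expand_simple(text, macros, delim="$"):
    """Replace each delimited directive with its text (assumed $name$ form)."""
    for name, replacement in macros.items():
        text = text.replace(delim + name + delim, replacement)
    return text


definitions = "simple\nco Green Pike Ltd\nwarn **Warning:**\n"
macros = parse_simple(definitions)
print(expand_simple("$warn$ contact $co$.", macros))
# **Warning:** contact Green Pike Ltd.
```

Splitting on the first space only means that any additional leading spaces in the replacement text are preserved.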

24. These are most useful for short imperatives and applying consistent formatting with Pandoc.

Complex

25. The first line is complex, followed by a series of definitions.

26. Each definition takes the name of the directive/macro, the number of arguments (0 to 10) and then its definition (in [{…}]).

27. These are particularly useful for

  • applying variation, e.g., a macro that only provides output for some variants and not others
  • inserting common blocks of text

Built-in

28. There is a single built-in macro, $include, which includes another file. The path is relative to the master working directory, not the including file’s location.
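The path rule is easy to get wrong, so here is a small sketch contrasting it with the more common "relative to the including file" convention. The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_include(master_dir, including_file, target):
    """Resolve an included path against the master working directory.

    including_file is accepted only to make the point that it plays no
    part in the result.
    """
    return PurePosixPath(master_dir) / target


master = "/work/docs"
including = "/work/docs/input/policies/access.txt"

print(resolve_include(master, including, "include/footer.txt"))
# /work/docs/include/footer.txt

# For contrast, resolving against the including file would give:
print(PurePosixPath(including).parent / "include/footer.txt")
# /work/docs/input/policies/include/footer.txt
```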

Choice of delimiters

29. The default delimiter is $. The alternative ! works reasonably well, but is less visually clear.

30. @ would be better but repeatedly conflicts with existing features, particularly citation support.

31. The macro expansions and their arguments are quoted with [{…}] to make collisions less likely.

Choice of macro method

32. A simplistic build of simple and slightly more complex macros was implemented directly in Python for the following reasons.

Choice of Pandoc numbering

33. Initially, Lua filters were used. However, these are slightly harder to manipulate in general for this purpose than directly mangling the Pandoc AST via its JSON export.
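To give a flavour of working on the JSON form of the AST, the sketch below numbers top-level paragraphs by prepending a Span to each Para block. The element shapes (Para, Span, Str, Space and the [id, classes, attributes] triple) follow Pandoc's JSON representation; the function name, the paranum class, the pN identifiers and the sample document are assumptions and are not taken from Eltanin's source.

```python
import json


def number_paragraphs(doc):
    """Prepend a numbered Span to every top-level Para block, in place.

    Paragraphs nested inside lists or block quotes are not visited in
    this simplified version.
    """
    count = 0
    for block in doc["blocks"]:
        if block["t"] != "Para":
            continue
        count += 1
        marker = {
            "t": "Span",
            "c": [[f"p{count}", ["paranum"], []],
                  [{"t": "Str", "c": str(count)}]],
        }
        block["c"] = [marker, {"t": "Space"}] + block["c"]
    return doc


# A hand-written document in the shape produced by "pandoc -t json"
# (the API version shown is illustrative).
doc = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]},
        {"t": "Para", "c": [{"t": "Str", "c": "First."}]},
        {"t": "Para", "c": [{"t": "Str", "c": "Second."}]},
    ],
}

numbered = number_paragraphs(doc)
print(json.dumps(numbered["blocks"][1]))
```

In a real pipeline the input would come from pandoc -t json and the result would be passed back with pandoc -f json.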

34. Alternatives considered and rejected:

Cross-references (xrefs)

35. A set of directives (see the summary of directives for document authors above) can be inserted into the input files.

36. The requirement this addresses is to be able to build consistent cross-references across a range of documents. The final destinations of these documents may be across multiple directories (or even servers) despite being built from a single input directory. Additionally, being able to trace backwards is valuable for seeing which documents (and sections or paragraphs within them) are being referred to.

37. If the capabilities of the v directive (to show a value rather than build a link) were not needed, then Pandoc spans and inline references might have been viable. However, these identifiers are set too late for a post-processor to handle (or at least would need a restructure), and cross-document referencing would still require some external assistance.

38. Any identifiers set for sections in Pandoc using {…} will be preserved.

39. Duplicate cross-references result in errors, as do missing targets/values.

Resolving cross-references

40. Note that a single space, LF, or CR-LF after an x, p, s or t directive is also removed. This makes the original markdown source easier to read by allowing some whitespace. If whitespace is desired after one of these directives, just use (at least) two items of whitespace.
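The whitespace rule can be expressed as one regular expression. This is a sketch that assumes the $letter:key$ form for all four directives and simply deletes them; in the real processor the directive would be recorded before being removed.

```python
import re

# An x, p, s or t directive plus at most ONE following space, LF or CR-LF.
# CR-LF is listed first so that it is consumed as a single item.
TARGET_AND_SPACE = re.compile(r"\$[xpst]:[^$\n]*\$(?:\r\n|\n| )?")


def remove_target_directives(text):
    """Delete target directives together with one trailing whitespace item."""
    return TARGET_AND_SPACE.sub("", text)


print(repr(remove_target_directives("$p:scope$\nThe scope is narrow.")))
# 'The scope is narrow.'
print(repr(remove_target_directives("$s:intro$  Two spaces follow.")))
# ' Two spaces follow.'
print(repr(remove_target_directives("See $v:scope$ for details.")))
# 'See $v:scope$ for details.'
```

The second call shows the "two items of whitespace" rule: one space is consumed with the directive and one survives.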

Render HTML

41. The (near) final pass generates HTML using Pandoc. CSS is applied from the default base.css file and the customisable custom.css file. A further CSS file positions and styles the paragraph numbers. The --self-contained option is set so that any resources such as images are embedded in the HTML file rather than referenced.
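A sketch of how such a Pandoc command line might be assembled is given below. The --css, --self-contained and --output options are real Pandoc options; the function name, the file paths and the name paranum.css for the third stylesheet are hypothetical, and the actual invocation used by the processor may carry further options.

```python
def html_command(source, output, css_files):
    """Build an argument list for rendering one file to embedded HTML."""
    command = ["pandoc", "--self-contained"]
    for css in css_files:
        # Pandoc accepts --css repeatedly; stylesheets apply in order.
        command += ["--css", css]
    command += ["--output", output, source]
    return command


command = html_command(
    "work/whatever.md",
    "output/whatever.html",
    ["base.css", "custom.css", "paranum.css"],  # third name is made up
)
print(" ".join(command))
```

Building the command as a list rather than a string means it can be passed to subprocess.run without shell quoting concerns.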

42. The default structure also includes a trailing block of HTML which uses the script include/git-get-status to generate a footer with Git commit information.

43. The body element is also marked with data-dir and data-file attributes to enable the use of per-directory and/or per-file styling.

Dependencies


44. Some dependencies are absolute:

Eltanin document suite - Git VCS commit 6bcdaeed37fff80a69399ae5dd1c87b3470ac787 - publish - Wed, 11 Jan 2023 07:43:12 +0000
Generated in part by the Eltanin package