This is a very short and simple guide for people who want to start working on digital editions, but are completely unfamiliar with XML and TEI.
1. Mark-up
The ML of XML stands for Mark-up Language. That is to say, XML is used to mark up a text—in other words, to add comments in and around it—without displacing the text itself.
This is quite like how a proof-reader might add mark-up to a printed text in red pen.
2. Tags
2.1 Opening and closing tags
In XML, the mark-up is represented in tags, which encode information between angle brackets:
<tag>
Tags often come in pairs, with an opening tag matched by a closing tag. A closing tag has a forward slash directly after the first angle bracket:
<tag></tag>
We can use different tags to mark up a text with editorial notes, for example as follows:
cum omnis <surplus>omnis</surplus> eloquentiae doctrinam et omne studio<ex>rum</ex> genus sapientiae <unclear>luce praefulgens</unclear> a Graecorum fontibus deriuatum
These tags could indicate that the second "omnis" is surplus and should be ignored, that "-rum" at the end of "studiorum" is an editorial expansion of a manuscript abbreviation, and that "luce praefulgens" is difficult to read in a manuscript source.
2.2 Single (self-closing) tags
If only a single tag is required, we can use what is called a ‘self-closing’ tag, i.e. one with a closing forward slash before the last angle bracket. For example, this self-closing tag might indicate that there appears to be a gap in the text:
cum omnis <gap/> studiorum genus
2.3 Nesting tags
It is perfectly fine for tags to be nested (contained) within other tags. In the following example, eloquentiae is marked as the lemma of a critical apparatus, with eloquentia marked as a manuscript reading. Both tags are grouped together within an <app> tag.
cum omnis
<app>
<lem>eloquentiae</lem>
<rdg>eloquentia</rdg>
</app>
doctrinam
(In this example, the additional space and line breaks are not problematic. Software that processes XML normally collapses or ignores extra space, unless instructed to do otherwise.)
While nested tags are very common, XML does not allow overlapping tags. So the following example would cause an error, because <app> overlaps with <rdg>:
cum omnis <app><rdg>eloquentia</app></rdg> doctrinam
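If you would like to see the difference for yourself, the following is a minimal sketch in Python (using only its built-in XML parser) that checks whether a snippet is well-formed. The helper name is_well_formed and the wrapping <l> element are assumptions added for illustration: XML requires a single outermost element, so the examples above are wrapped in one here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, otherwise False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for problems such as overlapping or unclosed tags.
        return False


# <l> is a hypothetical wrapper element: XML needs exactly one root element.
nested = "<l>cum omnis <app><lem>eloquentiae</lem><rdg>eloquentia</rdg></app> doctrinam</l>"
overlapping = "<l>cum omnis <app><rdg>eloquentia</app></rdg> doctrinam</l>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```

A code editor performs essentially the same check as you type, so in practice you rarely need to run anything like this by hand.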
3. Attributes
If we want to provide additional information with a tag, we can use attributes. These are contained within an opening or self-closing tag, in the form name="value":
<note type="translation">in every kind of knowledge and wisdom</note>
<rdg wit="#ms-T">eloquentia</rdg>
cum omnis <gap extent="4 words"/> studiorum genus
<milestone unit="folio" n="7v"/>
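Attributes are what make mark-up easy for software to query. As a rough illustration, here is a short Python sketch that reads the attributes from two of the examples above. The wrapping <p> element is an assumption added so that the snippet has a single outermost element; "reason" is simply an attribute name that happens not to be present.

```python
import xml.etree.ElementTree as ET

# <p> is a hypothetical wrapper element added for this illustration.
snippet = (
    '<p>cum omnis <gap extent="4 words"/> studiorum genus '
    '<milestone unit="folio" n="7v"/></p>'
)
root = ET.fromstring(snippet)

gap = root.find("gap")
milestone = root.find("milestone")

# All attributes of a tag are available as a dictionary of names and values.
print(gap.attrib)           # {'extent': '4 words'}

# A single attribute can be looked up by name.
print(milestone.get("n"))   # 7v

# Asking for an attribute that is not present returns None.
print(gap.get("reason"))    # None
```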
4. TEI
The X of XML stands for eXtensible. This means that you can invent names for tags and attributes as you wish. However, if everyone devised their own system of mark-up, it would be difficult to read each other’s files and to develop shareable software to process digital editions.
The Text Encoding Initiative (TEI) offers guidelines for encoding that are now a de facto standard. These are extremely extensive: the PDF of the current guidelines runs to over 2,600 pages! In practice, however, most projects require only a very small subset of the recommended tags and attributes. By following TEI, we can hope to make our XML code more accessible now and in the future.
In TEI documents, the basic structure is often as follows:
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>…</titleStmt>
<publicationStmt>…</publicationStmt>
<sourceDesc>…</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>…</body>
</text>
</TEI>
The TEI header (<teiHeader>) provides information about the document: its title, publication information, and source. The <body> of the document contains the main content.
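To show how software can find its way around this structure, here is a minimal Python sketch that reads the title and the body text from a tiny document built on the skeleton above. The title, the paragraph contents and the variable names are invented for illustration. Note one addition: real TEI files normally declare the TEI namespace (xmlns="http://www.tei-c.org/ns/1.0") on the <TEI> tag, so the sketch includes it and uses the prefix "tei" when searching.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, mapped to a prefix of our own choosing for searches.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A hypothetical, minimal document following the skeleton above.
document = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample edition</title></titleStmt>
      <publicationStmt><p>Unpublished draft</p></publicationStmt>
      <sourceDesc><p>Transcribed from a manuscript</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>cum omnis eloquentiae doctrinam</p></body>
  </text>
</TEI>"""

root = ET.fromstring(document)

# Paths mirror the nesting of the tags, one step per level.
title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
paragraphs = root.findall("tei:text/tei:body/tei:p", TEI_NS)

print(title.text)                    # Sample edition
print([p.text for p in paragraphs])  # ['cum omnis eloquentiae doctrinam']
```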
Since the TEI guidelines are complicated, however, beginners will generally find it easier to obtain and modify an existing template than to work out how to apply the guidelines from scratch.
5. XML editors
XML files (with the file extension .xml) are plain-text files: they contain no formatting such as fonts or styles. This makes the file structure extremely simple and ideal for long-term archiving.
You can therefore edit XML files in any plain-text editor, for example Notepad on Windows or TextEdit (in plain-text mode) on macOS.
For working with more complex files, e.g. TEI-compliant files, it can be helpful to use code-editing software. This will usually automatically colour-code tags, attributes and values to make them easier to read. It will also alert you to any errors in your code, such as missing closing tags or overlapping tags.
There are many code editors available, but the following two are the best known:
- Oxygen XML Editor. A very powerful tool with many features, but it requires a paid licence.
- Visual Studio Code (VS Code). A free tool that is extremely popular among software developers. A free extension named “XML Tools” provides XML validation, including for TEI documents. (See screenshot below.)
6. Publishing your XML edition
This is the difficult part. XML is a format designed primarily for machines to process, not for people to read comfortably. To turn your XML into nicely presented, readable text, you will need to transform it into a different format. For web publication, this is normally HTML.
Transforming XML requires knowledge of other technologies, such as XSLT and XPath, which probably present a learning curve that is too steep for most humanities scholars.
There are some publishing tools (e.g. TEI Publisher) available, but these too require technical expertise to set up and customise.
Even after you have transformed your XML into HTML, you will need to organise web hosting in order to make your content available online. This also requires some technical knowledge and involves ongoing costs.
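To give a flavour of what the transformation step involves, here is a deliberately small sketch that turns the editorial tags from section 2.1 into HTML. It is written in Python rather than XSLT, purely to keep the example short and runnable. The display choices (italics for expansions, a "unclear" class, dropping surplus words), the function name render and the wrapping <p> element are all assumptions for illustration, not TEI recommendations.

```python
import xml.etree.ElementTree as ET
from html import escape


def render(element: ET.Element) -> str:
    """Recursively convert a small XML fragment into an HTML string."""
    # Start with the text before the first child tag, then add each child
    # followed by the text that comes after it.
    inner = escape(element.text or "")
    for child in element:
        inner += render(child) + escape(child.tail or "")

    if element.tag == "surplus":
        return ""                                       # leave surplus words out
    if element.tag == "ex":
        return f"<em>{inner}</em>"                      # expansions in italics
    if element.tag == "unclear":
        return f'<span class="unclear">{inner}</span>'  # styled later with CSS
    if element.tag == "p":
        return f"<p>{inner}</p>"
    return inner                                        # unknown tags: keep the text only


# <p> is a hypothetical wrapper element; the content comes from section 2.1.
source = (
    "<p>cum omnis <surplus>omnis</surplus> eloquentiae doctrinam et omne "
    "studio<ex>rum</ex> genus sapientiae <unclear>luce praefulgens</unclear></p>"
)
page = render(ET.fromstring(source))
print(page)
```

Even this toy version has to decide how every tag should be displayed, which is why real editions usually rely on dedicated tools for this step.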
To remove the barriers to publishing XML editions online, I am developing two new publishing platforms:
- Gloss Corpus is intended for digital editions of manuscript glosses. (See the About page for XML templates and mark-up guidelines.)
- Armarium is a platform for publication of simple digital editions. More details to follow soon.
7. How to learn more
The best way to learn is by experimenting with a simple project of your own. The following resources can help you with that:
- ChatGPT is incredibly useful for learning all kinds of technical skills. You can submit your code and ask for corrections, suggestions, explanations, etc. ChatGPT is also very familiar with TEI and can make useful recommendations.
- For a more structured approach, the XML Tutorial at W3Schools is very accessible.