Available courses
User Experience Design
This course will introduce students to the foundations of user experience design. Students may be surprised to learn that this course is less about technology and software and more about humans: humans are the users of technology, and without them there is no technology, only penguins and icebergs. User experience design (UXD) is rooted in human psychology. A good understanding of design principles and guidelines, and of their effective application, requires knowledge of their scientific underpinnings. Therefore, a portion of the course will be devoted to theoretical topics and their implications for UXD. We will briefly review psychological theories of human cognition and perception. These theories play an essential role in generative design and explanatory evaluation, and they support the codification of knowledge.
- Enrolled students: 5
Data Management & Visualization
- Enrolled students: 1
Digitisation of Cultural Heritage
- Enrolled students: 1
Information Organization and Access
- Enrolled students: 1
News Literacy: Spotting and Fighting with Disinformation
- Enrolled students: 0
New Trends in Knowledge Organization Competencies for Information Professionals
- Teacher: Ana Lúcia Terra
- Enrolled students: 50
Open Science: Transforming and Democratizing Academic Knowledge
- Teacher: Nevzat Ozel
- Enrolled students: 50
Plenary Session
- Teacher: Silviu Bors
- Teacher: Mina Bounoua
- Teacher: Joumana Boustany
- Teacher: Nevzat Ozel
- Teacher: Jamie Rose Johnston
- Teacher: Ana Lúcia Terra
- Enrolled students: 50
Urban Knowledge Hubs: Engaging Users to Transform Public Spaces and Services
- Teacher: Jamie Rose Johnston
- Enrolled students: 50
Digital Humanities in LIS Context: Creating Electronic Editions from Printed Old Books
Structured Writing or Information Architecture?
Before explaining the process of creating the digital matrix of the Liturghier, it is worth making an incursion into the "world" of structured text, especially in its electronic format. Repositioning any text from the classic printed format into a new electronic format intelligible in machine language means, in fact, restructuring it into machine code. It also means ensuring its remote, real-time distribution according to the principles of information technology and, last but not least, ensuring the easy retrieval of the new form of the text in the web environment and the preservation of the original.
We start from the assertion that any writing is structured. Writing without grammatical structure would be incomprehensible, and all writing programs are likewise structured: software that did not produce reliable and consistent data structures would be unreliable and non-functional[^3]. When a text is simply presented on an electronic device, the ordinary reader understands and reads it through a subconscious mechanism, a visual grammar applied in an equally ordinary, natural way. Before we reach this performance, however, the mechanism that makes the device usable by humans, the computer, needs specific instructions. Even in the case of scanned pages converted into word-processor files, the computer can only determine that something in a block of text is possibly a paragraph; it cannot distinguish a paragraph from a note or a quotation. Therefore, in order to render content usefully, it is necessary to indicate the order and intention of the parts of a document, thus ensuring its long-term use[^4]. Generally speaking, structured writing means writing approaches that add a bit more structure, beyond the basic requirements of grammar, in order to exercise some control over content; it also means using software whose more specific data structures support particular writing structures and processes, such as publishing, single sourcing, or content reuse[^5].
At the machine code level, the way characters are represented by the underlying data stream is called file encoding. The specific encoding used is often signalled by the first few bytes of the file; an application checks these bytes when opening the file and thus "knows" how to display and manipulate the data, falling back on a default encoding if these first bytes are absent[^6]. To make the passage of data between different components possible and easier to achieve, different formats have been designed over time, ultimately arriving at the point where, in the case of documents, the content of a piece of writing can be structured in several ways; most frequently, this is done by applying descriptive, coded markup (Extensible Markup Language (XML) or other semantic markup[^7]) or by storing content in named fields in a database. By properly structuring content, information can be easily transformed into knowledge, instructions into automations, concepts into knowledge units, and so on[^8]. In practical terms, the XML specification provides standard rules and conventions for converting data and information into a form that can be used, stored, and transmitted by and between computer applications. In other words, XML is an open standard used to encode and describe structured data and to facilitate the maintenance, organization, sharing, and reuse of that data by computer applications[^9]. With the help of XML, not only the text itself can be captured, but also information about the text and the relationships between its components. A rich and intelligent content base opens up sophisticated possibilities for content manipulation, such as customizing information based on reader demographics or automatically linking product references to corresponding 3D images[^10].
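To make the idea concrete, here is a hedged, schematic XML fragment; the element and attribute names are invented for this illustration and belong to no particular schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The declaration above tells an application which character
     encoding was used for the underlying byte stream. -->
<document>
  <title>A structured fragment</title>
  <paragraph>Ordinary body text, explicitly labelled as a paragraph.</paragraph>
  <note resp="editor">A note, now distinguishable from a paragraph.</note>
  <quote source="original">A quotation, likewise made explicit.</quote>
</document>
```

An application processing such a file no longer has to guess which block is a note and which is a quotation; the markup states it.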
Everything relates to structured writing. Even the ordinary word processors and desktop editing tools used every day are structured writing tools. In the web era, organizations produce and deliver more and more content in ever shorter timeframes. To keep pace and maintain quality, they need tools and techniques that support the rapid and reliable delivery of consistent, high-quality rhetoric. Many organizations turn to structured writing solutions (that is, to differently structured solutions) to keep up with demand. However, without a clear and comprehensive understanding of what is possible, they often choose solutions that are suboptimal or even worse than what they were doing before[^11].
The Text Encoding Initiative (TEI) is a consortium that collectively develops and maintains a standard for representing texts in digital form. Its main product is a set of guidelines that specify encoding methods for machine-readable texts, especially in the humanities, social sciences, and linguistics. Since 1994, the TEI Guidelines have been widely used by libraries, museums, publishers, and individual researchers to present texts for research, teaching, and online preservation[^12]. The purpose of TEI is to provide guidance for creating and managing, in digital form, every type of data created and used by researchers in the humanities, such as source texts, manuscripts, archival documents, ancient inscriptions, and many others. As its name suggests, it focuses primarily on text rather than sound or video, but it can be usefully applied to any form of digital data. Essentially, it was created and is maintained by the scholarly community for the use of that community, especially for creating digital resources that lie beyond the easy reach of WYSIWYG applications[^13]. TEI emphasizes what is common to every type of document, whether it is physically represented in digital form on a disk or a memory card, printed as a book or a newspaper, written as a manuscript or a codex, or inscribed on a stone or a wax tablet. This continuity facilitates the migration of text from older manifestations, such as print or manuscript, to newer ones, such as disk or screen. The TEI's vision of what text actually is remains, therefore, largely conditioned by what text has been in the past, without however compromising too much what text might become in the future [author's emphasis]. It attempts to treat all types of digital documents in the same way, regardless of whether or not they were "born digital". Consequently, the TEI framework provides a useful way of thinking about the nature of text: it constitutes a kind of encyclopedia of generally accepted textual notions. Currently, TEI documents in digital form are expressed using XML, which provides a simple way to represent structured data as a linear stream of character data and to tag parts of this stream with named labels indicating structural function or semantics[^14].
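As a minimal sketch of this XML expression (the header values below are placeholders, not the project's actual metadata), every TEI document pairs a <teiHeader> of metadata with the encoded <text>:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- Placeholder title for this sketch. -->
        <title>Liturghierul lui Macarie (1508)</title>
      </titleStmt>
      <publicationStmt>
        <p>Placeholder publication statement.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Placeholder description of the printed source.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><!-- encoded content goes here --></p>
    </body>
  </text>
</TEI>
```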
In the noisy marketplace of digital humanities, TEI is a kind of senior member: an annoying parental figure for some, a benevolent one for others, and for still others something too outdated even to be considered. However, in the last decade it has become increasingly clear that TEI is part of what makes digital humanities happen: it has become part of the infrastructure with which everyone must engage, both technically and socially, once they begin to think about text or other forms of cultural resources in digital form. TEI provides a set of tools with which this reflection can be carried out and, most importantly, it also reflects the thinking that has already been done, both through its concerns and through its occasional oddities[^15]. This sense of TEI as information architecture is the one we have used for most of this text. It is evident that this digital humanities marketplace is a very small one compared to the rest of the market, which is almost exclusively commercial. We would like to mention some of the successful projects from both the European and the non-European space: American Memory from the Library of Congress (https://www.loc.gov/collections/), British National Corpus (http://www.natcorp.ox.ac.uk/), Les Bibliothèques Virtuelles Humanistes (http://www.bvh.univ-tours.fr/), Cambridge Digital Library (https://cudl.lib.cam.ac.uk/), The Corpus of Electronic Texts (https://celt.ucc.ie/), Chinese Buddhist Electronic Text Association (https://www.cbeta.org/cbreader/help/index_e.htm), The Canadian Confederation Debates (https://hcmc.uvic.ca/confederation/), Croatian National Corpus (https://www.hr4eu.hr/croatia/resources/), Deutsches Textarchiv (https://www.deutschestextarchiv.de/), Biblioteca Virtual Miguel de Cervantes (https://cervantesvirtual.com/), Corpus Coranicum (https://corpuscoranicum.de/en), Altägyptische Kursivschriften (https://aku.uni-mainz.de/die-digitale-palaeographie/), The Japanese Text Initiative (http://jti.lib.virginia.edu/japanese/), Persian Digital Humanities (https://sllc.umd.edu/fields/persian/roshan-institute/digital-humanities), The Electronic Text Corpus of Sumerian Literature (https://etcsl.orinst.ox.ac.uk/) and many, many others.
Procedures for Creating the Digital Matrix
The text we used was taken from a translated and printed edition of the Liturghier published by the Archdiocese of Târgoviște in 2008. For our project, to obtain the best possible result when applying the TEI rules, we took the following resources as examples of good practice: Manuscriptorium (https://www.manuscriptorium.com/), Incunabula and Blockbooks (https://digital.bodleian.ox.ac.uk/collections/incunabula/), Bibliothèques Virtuelles Humanistes (https://www.bvh.univ-tours.fr/). One of the best examples for guiding the encoding effort was found in Manuscriptorium, in the form of Aemilivs Macer De Herbarvm Virtutibus, accessible at the following link: https://www.manuscriptorium.com/apps/index.php?direct=record&pid=NKCR__-NKCR__6_J_000159__0TTKC38-cs. Several other examples were also consulted, the main selection criterion being the same publication period.
Regarding the encoding structure, an important step was establishing the taxonomy and a list of persons. Surfaces, in the TEI sense, were added to allow details of the original image to be carried into the transformation result. The envisioned deliverables included, besides the encoded text itself, XSLT transformation stylesheets which in turn make every kind of output possible, from HTML5 documents to EPUBs. The text was first written in Markdown, for the simplicity of the format and for the ability to process it further with the tools made available by TEI. Text presenting OCR artifacts was corrected and, where necessary, normalized with regard to punctuation and diacritics. See Figure 1 for the problems related to the original text.
Figure 1. Processing of raw text and initial encoding in Markdown (display in Typora)
After the errors were corrected and the characters rendered with Turkish diacritics were replaced, the next stage was to set up a project for creating the digital document.
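Returning to the taxonomy and the list of persons mentioned above, the following is a minimal sketch of how such declarations can sit in the TEI header; the category names and identifiers are placeholders rather than the project's actual values:

```xml
<encodingDesc>
  <classDecl>
    <!-- A project-specific taxonomy declared in the header;
         the categories below are illustrative placeholders. -->
    <taxonomy xml:id="parts">
      <category xml:id="rubric"><catDesc>Rubric</catDesc></category>
      <category xml:id="prayer"><catDesc>Prayer</catDesc></category>
    </taxonomy>
  </classDecl>
</encodingDesc>
<profileDesc>
  <particDesc>
    <!-- Persons referenced by the encoded text. -->
    <listPerson>
      <person xml:id="macarie">
        <persName>Macarie</persName>
      </person>
    </listPerson>
  </particDesc>
</profileDesc>
```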
Purpose
The purpose of this exercise is to demonstrate the potential of creating electronic editions dynamically, adapted to different purposes of presentation and consumption. Encoding is done in XML following the rules of the TEI encoding scheme, which offers maximum flexibility in describing and transforming the original text as well as its graphic elements. Moreover, the encoding model based on the TEI schema allows the addition of descriptive elements that in traditional-format documents are only visible on the page, such as marginalia, corrections, editorial indications, or modifications to the spelling or to functional parts of a text.
Encoding the Liturghier
To carry out the actual encoding of the Liturghier, we opted for the Oxygen XML Editor software package (trial license) for its editing tools, which offer native support for TEI. Once it was installed, a new project titled Macarie's Liturghier was created.
Figure 2. Initiating the project for creating the digital edition (Oxygen)
Because the effort of TEI-encoding the Liturghier exceeded the trial period offered by the Oxygen package, another working solution had to be found. After some investigation, the most suitable environment for continuing the work proved to be Visual Studio Code (VSCode), adapted for XML work and enriched with a series of necessary extensions.
Encoding Details
Because the Liturghier is of crucial importance for the written culture of the Romanians and of the entire Balkan Orthodox space, and because it is not only a heritage object but also a living liturgical instrument, the encoding effort relied on the guidance of P5: Guidelines for Electronic Text Encoding and Interchange, chapter "Critical Apparatus"[^16].
According to the TEI documentation on Digital Facsimiles, this project works with a set of digital representations of the original (a digital facsimile) which we wish to invest with additional value in the digital sphere. For this reason, each page delimited inside the <text> element carries a facs (facsimile) attribute on its <pb> (page beginning) element, which links the digital facsimile to the encoded page. In some cases, as on the very first page of the Liturghier, the scanned page contains several graphic elements. These were individualized inside <facsimile>, where each detail was treated as a <surface> element hosting the zones of the image, pointed to through <graphic> elements.
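A hedged sketch of this arrangement, with placeholder file names, identifiers, and zone coordinates:

```xml
<facsimile>
  <surface xml:id="page1">
    <!-- Image of the full scanned page (placeholder file name). -->
    <graphic url="liturghier-p001.jpg"/>
    <!-- A zone isolating one graphic detail, e.g. a decorated
         initial; the coordinates are placeholders. -->
    <zone ulx="120" uly="340" lrx="380" lry="610">
      <graphic url="liturghier-p001-initial.jpg"/>
    </zone>
  </surface>
</facsimile>
<text>
  <body>
    <!-- The facs attribute links the encoded page to its facsimile. -->
    <pb n="1" facs="#page1"/>
    <!-- encoded text of page 1 -->
  </body>
</text>
```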
The module for creating a critical apparatus allows the encoding of different variants, and the encoding of the Liturghier can be considered a possible starting point for a critical edition that attracts contributions from specialists. Because the text has passed through several stages of transformation over time, we chose encoding with <app> (apparatus entry), which according to TEI introduces one entry in the critical apparatus. The <lem> (lemma) is reserved for the text in Middle Bulgarian, which will be added as the transliteration effort progresses. For the text variants, such as the 2009 variant, we chose encoding with <rdg> (reading). The wit attribute indicates which witnesses of the text are included in this critical digital variant. Instead of "landmarks", we will call them witnesses, to align with the technical terminology recommended by TEI.
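A hedged sketch of this apparatus structure, with placeholder witness sigla (the actual identifiers used in the project may differ):

```xml
<!-- Witnesses are first declared in a witness list. -->
<listWit>
  <witness xml:id="M1508">Macarie's Liturghier, Târgoviște, 1508</witness>
  <witness xml:id="E2009">The 2009 text variant</witness>
</listWit>

<!-- One apparatus entry: the lemma slot is reserved for the
     Middle Bulgarian text and will be filled in as the
     transliteration progresses. -->
<app>
  <lem wit="#M1508"/>
  <rdg wit="#E2009"><!-- reading as given in the 2009 variant --></rdg>
</app>
```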
The decorated initials were also marked, with the help of the <witDetail> element, by mentioning the witness corresponding to the edition in which they appeared or were maintained. In cases where parts of the text appear on a single line, separated from the main body for reasons of typographic choice or of significance, these were marked using <l> (line).
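A small sketch of both devices, again with placeholder identifiers and wording:

```xml
<app>
  <lem wit="#M1508" xml:id="lem-init-1"/>
  <!-- witDetail records a detail specific to one witness, here
       the presence of a decorated initial. -->
  <witDetail target="#lem-init-1" wit="#M1508">Decorated
    initial present in this witness.</witDetail>
</app>

<!-- A piece of text set apart on its own line in the original,
     marked with the l (line) element. -->
<l><!-- text of the isolated line --></l>
```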
The encoding effort was accompanied by the creation of a web project that serves as the transformation target for the XSLT stylesheet under preparation. To create the web project, we used the existing TEI project resources, to which we added what was necessary from a visual point of view to arrive at the result in Figure 3. The web project was needed to establish which elements are useful from an informational point of view and which design will be used to generate the final web pages.
For the XSLT transformation, the xslt3 software package was used, which installs locally an XSLT 3.0 processor (SaxonJS). SaxonJS is developed by Saxonica and makes XSLT and XPath processing available in JavaScript environments, which allows the work to be carried out from within the Visual Studio Code editor. For the editor configuration and the transformation stages, consult the dedicated document in the project's repository: https://github.com/kosson/Liturghierul-Macarie-1508-editions/blob/main/DE-INSTALAT.md.
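A minimal sketch of such a transformation stylesheet, assuming a TEI source document; the template rules and the file names in the comment are illustrative only, not the project's actual stylesheet:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Can be run locally with the xslt3 command-line tool, e.g.:
     xslt3 -xsl:to-html.xsl -s:liturghier.xml -o:liturghier.html -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">
  <xsl:output method="html" html-version="5"/>
  <!-- Wrap the encoded body in a minimal HTML5 page. -->
  <xsl:template match="/">
    <html>
      <body><xsl:apply-templates select="//tei:body"/></body>
    </html>
  </xsl:template>
  <!-- Render each TEI paragraph as an HTML paragraph. -->
  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```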
To make the electronic resources involved in this project easier to manage, they were gathered into a repository on the GitHub platform, which can be accessed at: https://github.com/kosson/Liturghierul-Macarie-1508-editions.
Figure 3. Web project for establishing a transformation target for the XSLT stylesheet.
Future Developments
Having a data model created through TEI encoding, and as a first step toward results usable in various editorial processes, an XSLT (Extensible Stylesheet Language Transformations) complement was piloted: a document that drives the transformation of the XML data into formats suitable for display in the browser. At this moment, attention is focused on creating an XSLT transformation file that produces a digital edition identical to the target created through the web project. To replicate the working environment in VSCode, a list of prerequisites and necessary extensions can be consulted at the following link: https://github.com/kosson/Liturghierul-Macarie-1508-editions/blob/main/DE-INSTALAT.md.
Conclusions
This endeavor aimed to initiate the effort of creating digital editions of the first printed book from the medieval Romanian space. The encoding borrowed elements commonly used for manuscripts, as well as elements used for encoding early printed works. The encoding technology is XML, following the prescriptions and recipes of the Text Encoding Initiative. The final purpose is to stimulate research in this sphere of Digital Humanities technologies and to create a multidisciplinary framework for work and the exchange of experience.
Practice has shown that TEI is a flexible framework, with the available examples easy to access and consult. The results will take shape along two axes. The first concerns the activity of encoding texts with all the attention needed for details that capture the particularities of the texts, their formatting, features specific to an individual copy, and so on.
The second axis is the constitution of a corpus: Macarie's Liturghier becomes the first part of a corpus that the authors wish to build, with the purpose of recontextualizing and valorizing these texts in a new paradigm, one built around the additional value that structured text brings. The entire project and its different stages of elaboration are available for consultation in a dedicated digital repository on GitHub at the following link: https://github.com/kosson/Liturghierul-Macarie-1508-editions.
Bibliography
Abel, Bailie 2023. Abel, Scott; Bailie, Rahel Anne. The Language of Content Strategy. XML Press, 2023, (EPUB).
Baker 2018. Baker, Mark. Structured Writing. Rhetoric and Process. XML Press, 2018, (EPUB).
BRV [2023]. Bibliografia românească veche - BRV - Vol. I (1508-1716). 1508, Liturghier (Macarie), slavoneşte - Târgovişte (?). Available at web address https://biblacad.ro/bnr/brv.php, accessed in May 2023.
Burnard 2014. Burnard, Lou. What is the Text Encoding Initiative? How to add intelligent markup to digital resources. Marseille: OpenEdition Press, 2014.
Chițulescu [2023]. Chițulescu, Policarp (archim.). Liturghierul lui Macarie (Târgovişte, 1508). Article available at web address https://web.archive.org/web/20210727060339/https://www.tipariturivechi.ro/articol/liturghierul-lui-macarie-targoviste, accessed in May 2023.
Cole, Han 2013. Cole, Timothy W.; Han, Myung-Ja K. XML for Catalogers and Metadata Librarians. Libraries Unlimited, 2013, (EPUB).
Fawcett, Quin, Ayers 2012. Fawcett, Joe; Quin, Liam R.E.; Ayers, Danny. Beginning XML. Wiley, 2012.
TEI [2023]. Text Encoding Initiative, article available at web address https://tei-c.org/, accessed in May 2023.
TEI 12 [2023]. TEI: Guidelines for Electronic Text Encoding and Interchange, chapter 12, "Critical Apparatus". Available at web address https://www.tei-c.org/release/doc/tei-p5-doc/en/html/TC.html, accessed in May 2023.
[^1]: Chițulescu [2023].
[^2]: BRV [2023].
[^3]: Baker 2018, 3 (EPUB).
[^4]: Abel, Bailie 2023, 79 (EPUB).
[^5]: Baker, op. cit.
[^6]: Fawcett, Quin, Ayers 2012, p. 43.
[^7]: The term semantic is commonly used to characterize markup languages. Semantics is the study of the meaning of terms, so semantic markup is markup that tells us what the content of a text means. People can and do understand the expression "semantic markup" in different ways, which leads to confusion about what is and what is not semantic markup. Therefore, I do not use the term semantic to describe markup languages, although I use it in other contexts. Baker 2018, 12.
[^8]: Abel, Bailie 2023, 79.
[^9]: Cole, Han 2013, 6.5 (EPUB).
[^10]: Abel, Bailie 2023, 81.
[^11]: Baker, op. cit.
[^12]: TEI [2023].
[^13]: Burnard 2014, 'Introduction', 1 (OpenEdition).
[^14]: Ibidem, 'The TEI and XML', 1-3.
[^15]: Ibidem, 'Conclusion: what is the TEI?', 1.
[^16]: TEI 12 [2023].
- Teacher: Silviu Bors
- Enrolled students: 50
How to Use OSINT to Debunk Disinformation
- Teacher: Joumana Boustany
- Enrolled students: 50