Data model

The following section provides an initial overview of the data model of edition humboldt digital. The comprehensive documentation of the data model can be found in the Editionsrichtlinien (German only).

Historical sources

The metadata and texts of the edited manuscripts (travel journals, letters and documents) are encoded according to the guidelines of the Text Encoding Initiative (TEI). Through the use of ediarum.BASE.edit, the encoding largely follows the Base Format of the German Text Archive (DTABf) (Haaf/Geyken/Wiegand 2015), which has been extended for the encoding of handwritten texts (Haaf/Thomas 2018). For a few project-specific concerns, additions and extensions to the tagset were made that are compliant with the TEI guidelines. These extensions relate in particular to project-specific requirements arising from the material, such as the tagging of measurements as well as inserted or appended notes. However, the extensions have been made in such a way that they follow the systematics of the DTABf as closely as possible. Thus, in many cases, attribute values were added instead of new elements or attributes.

All names of persons, places and institutions as well as bibliographic references in the source texts (and their metadata) are linked to their respective index entries.

Paratexts

All other texts, such as the introductory research articles on the diary fragment "Isle de Cube. Antilles en général," are also encoded in TEI-XML according to the Base Format of the German Text Archive.

Indexes

The indexes on persons, places and institutions as well as the glossary are edited and held in TEI-XML. Each entry has its own unique and permanent ID. In addition to the basic data of an entry, a short description is also provided.

The entries are also provided with one or more URIs from different authority files, if an entry exists there. This makes it possible to identify persons, places and institutions across projects, both in ehd itself and when using the APIs and data publication (see also Stadler 2012 on authority files in editions). For person entries, the GND is primarily used, alternatively or additionally also VIAF. For places, on the other hand, URIs from the free geographic database GeoNames are used, since this achieves a high level of coverage worldwide and entries can be added by the project itself if necessary. Thus, a few localities missing in GeoNames (such as the former locations of the Berlin Observatory) were added there.

Furthermore, the project retrodigitized the index entries of 25 editions (correspondence, documents, diary excerpts) from Alexander von Humboldt research published between 1973 and 2016 and made them available in the digital index. The index information of these printed editions comes from the series Beiträge zur Alexander-von-Humboldt-Forschung, first published by the Akademie-Verlag and later by De Gruyter, and edited by the Alexander-von-Humboldt-Forschungsstelle (1970-2014) of the BBAW.

Links between index entries and the edited materials are automatically generated from the database using the ehd-ID.

Humboldt's tokens

On manuscripts of Alexander von Humboldt, so-called "Siglen" (tokens) can sometimes be found, i.e. combinations of one or two characters (e.g. "Ad") that relate different documents, letters and diary entries to each other. These tokens were not only transcribed, but also entered into a separate TEI-XML register, to which the text passages are linked. In this way, the relationships can also be traced in the digital edition.

Plant index

For the plant index, no separate entries are created in a TEI-XML index file (as with the other indexes). Instead, the scientific plant names in the edited texts are annotated using the appropriate TEI encoding, normalized if necessary, and then retrieved automatically. The plant index is thus created completely dynamically. Each distinct plant name has a list of references to the texts and is automatically linked to various taxonomic databases (see below, section "Global indices for scientific names"). Because this index is created dynamically, no permalinks can be provided for it yet.

Drawings and sketches

Since version 9 of the ehd, a virtual "index" provides an overview of the drawings and sketches in the edited texts, whether by Humboldt's own hand or by that of a contemporary. Like the plant index, it is compiled automatically, in this case from the illustrations marked up with the TEI element figure.
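
The automatic compilation can be illustrated with a minimal sketch. It assumes that each drawing is encoded as a TEI figure element with an optional figDesc child; the sample document, its ID and the function name are hypothetical and do not reproduce the edition's own implementation.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample document; real ehd files are far richer.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="H0000001">
  <text><body>
    <p>Text <figure><figDesc>Sketch of a mountain profile</figDesc></figure></p>
    <p>More text <figure/></p>
  </body></text>
</TEI>"""


def collect_figures(tei_xml: str) -> list[dict]:
    """Return one index entry per figure element found in a TEI document."""
    root = ET.fromstring(tei_xml)
    doc_id = root.get(XML_ID)
    entries = []
    for fig in root.iterfind(".//tei:figure", TEI_NS):
        desc = fig.find("tei:figDesc", TEI_NS)
        entries.append({
            "document": doc_id,
            # A figure without a description is still listed.
            "description": desc.text if desc is not None else None,
        })
    return entries


figure_index = collect_figures(SAMPLE)
```

Running the same function over all edited texts and concatenating the results would yield a list that can be regenerated at any time, which is what makes such an index "virtual".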

Bibliography

The bibliography of the edition is maintained in the literature management software Zotero. The publicly viewable Zotero group allows collaborative maintenance of the data and can be accessed by all interested parties - also in various citation styles and export formats (see below).

Documentation

The data model is documented in DITA, more precisely in the DITA files of ediarum.BASE.manual. This creates synergies and makes it possible to document specifications or modifications relative to the data model of ediarum.BASE.edit. DITA (and not ODD) was chosen to enable this combination of ediarum.BASE.manual and the guidelines of edition humboldt digital. Furthermore, the internal DITA documentation includes not only the encoding guidelines based on the concrete material, but also the operating instructions for the editors working in ediarum.AVHR.edit.

The edition guidelines each refer to the corresponding parts of the DTABf documentation.

Schema

While the schema was initially formulated in RelaxNG, it is now maintained as a TEI ODD file (Viglianti 2019) (from which, however, RNG derivatives continue to be created). It is derived using ODD chaining from the ediarum.BASE.schema, which in turn is derived from the DTABf. This makes it possible to maintain and track the respective changes to the next "higher" schema at each level (ediarum, edition humboldt digital etc.). The schema will be published together with the data soon.

Software & Technologies

The publication edition humboldt digital is created in ediarum. The digital working environment ediarum is a solution that has been developed since 2012 by the DH initiative TELOTA. It allows scholars to edit transcriptions of manuscripts, comments, and index entries in TEI-compliant XML, add a critical apparatus and notes, and then publish them on the web and as PDFs (Dumont/Fechner 2014). Ediarum consists of several software modules. For the input and editing of data, edition humboldt digital uses the module ediarum.BASE.edit, which - as is usual with ediarum - is supplemented by a project-specific module, here ediarum.AVHR.edit. The module ediarum.REGISTER.edit is used to create and maintain the index entries. An exception is the bibliography, which is maintained in the literature management software Zotero. For this, ediarum.DB offers an interface for synchronization.

The data is stored in the free XML database eXistdb; the module ediarum.DB is used to manage the data. Together with the web server Jetty, eXistdb also serves as the basis of the digital edition, which was implemented with XQuery, XSLT and XPath and has since been packaged as an eXistdb app in accordance with the EXPath packaging specification, which simplifies deployment and development. For the search functions, the Lucene-based faceting available since eXistdb 5.0 is used. In addition, the edition has several specially programmed caches that increase performance, especially for complex queries. For the display of facsimiles, drawings and illustrations, the software digilib, developed at the MPI for the History of Science, is used.

Layout & Web Design

The design of edition humboldt digital originates from drafts that the author originally developed in 2014 for the project "Schleiermacher in Berlin 1808-1834". Due to the similarity of the edition type and the source genres, the designs could be reused and further developed for the ehd.

The design is based on principles of "flat design", i.e. simplicity, minimalism and a strong focus on typography. The latter in particular is a central point, since this digital edition is about one thing above all: text. Therefore, an Antiqua (PT Serif) with a "true" italic cut was chosen as the main typeface. It is accompanied by a grotesque (PT Sans) from the same typeface family, which is mainly used in subnavigations, smaller text and meta information. Classic typographical conventions, as far as they could reasonably be transferred to the web as a digital medium, were also taken into account in the further design. For example, a flexible text width oriented to the width of the viewport was dispensed with in favor of a fixed one that corresponds roughly to the typographically recommended line length.

The design approach also does not use a dedicated page header: in order to give the texts as much space as possible, only a low navigation bar was placed at the top, which is clearly visible thanks to its black color. The page header is taken up by the document title or the title of the individual page instead of the website title and the logos of sponsors and the like (which were quite common in 2014). At the same time, this space also provides room for further meta information and sub-navigation (chronological scrolling between letters; sub-section navigation; letters in the index, etc.).

The design has to master two challenges. The first is the abundance of different text types (edited letters, travel journals, documents; research contributions and indexes) and information that need to be accommodated. Here the design of the ehd follows the principle of not showing everything immediately, but revealing certain information only on user interaction. There should always be enough white space to let the eye rest or to group and prioritize the different information in a meaningful way. The second challenge was (and still is) to adapt the design to the changing requirements and the constantly growing amount of material and information. For such a long-running academy project (2015-2032), not every type of information and function was foreseeable at the beginning. Thus, the design has been adapted again and again - from the introduction of a sub-navigation, to the redesign of the homepage and the introduction of page-based text and facsimile display, to the ever more in-depth and complexly distinguished edited texts. This has succeeded to varying degrees.

The implementation of the design in HTML uses the 960 Grid System and relies heavily on CSS. JavaScript is only used for special functions of the interface that can be replaced if necessary; the HTML pages are thus largely generated server-side, which facilitates archiving in the Web Archive or the BBAW Web Archive.

External data & web services used

Figure: The networked edition humboldt digital. Also published as the poster "The networked edition humboldt digital", DH2023 in Graz. Abstract: https://zenodo.org/record/8107834.

For the digital edition, data from third-party projects were reused or external web services were used in several places.

Cascaded Analysis Broker of the German Text Archive (DTA::CAB)

The normal search can be extended by a function that also finds historical spellings and other word forms. For this purpose, the edited texts are linguistically analyzed and annotated using the DTA::CAB web service. Among other things, all words are lemmatized so that searches can be performed on the basis of this lemma. DTA::CAB was developed by Bryan Jurish as part of the German Text Archive.

For more information, see the documentation or Jurish 2012.

Humboldt's writings in the German Text Archive

In the German Text Archive, over 180 of Alexander von Humboldt's writings are available as TEI-XML encoded full texts. A function has been implemented in the chronology that displays or searches the title data of these writings. For this purpose, the D* OpenSearch API (see: OpenSearch Description) provided by the German Text Archive is used. Thus, the full texts of Humboldt's writings can not only be displayed, but also searched: the search function makes it possible to show the number of hits and to link directly to the hit list.

Digitized bibliography on avhumboldt.de

Within the portal avhumboldt.de, a digitized bibliography of Alexander von Humboldt's independently published writings has been provided since 2009 under the direction of Tobias Kraft. The data of this bibliography was converted into XML and integrated into the database of edition humboldt digital to be displayed in the chronology.

correspSearch

The web service correspSearch aggregates machine-readable letter indexes from printed or digital letter editions and makes them centrally searchable (Dumont et al. 2023; Dumont 2018). In addition, it provides an interface that enables automated retrieval of these data and their reuse in one's own programs. Since Humboldt's correspondence (Schwarz 2018) was already considered too extensive to be edited in a complete edition in the 1960s, it has since been published scattered across individual correspondence editions or even essays (especially in the journal Humboldt im Netz) (Schröder 2008). In correspSearch, almost all of the more than 6000 published letters to and from Alexander von Humboldt have been brought together for the first time and made searchable for researchers.

In edition humboldt digital, this data is queried via the API of correspSearch in two places. The first is the chronology (if this option is activated), which brings together the more than 1600 entries on Humboldt's life with his published correspondence. The second is the individual view of a letter, under "Explore letter network". There, letters from and to Alexander von Humboldt from other editions are queried in order to show which other correspondence partners Humboldt was in contact with in the respective period. In addition, the letters received and sent by the respective correspondence partner during the corresponding period are displayed; the query is based on the GND or VIAF URI stored in the register. In this way, the "extended correspondence context" of the letter network is made visible (Dumont 2023). This function was originally prototyped in edition humboldt digital on the basis of XQuery and then newly implemented in the DFG project correspSearch as the freely reusable JavaScript widget csLink (Müller-Laackman / Dumont 2022). The ehd now uses this widget as well.

Practices of the Monarchy: Court Calendar

Also integrated into the chronology are events at the Prussian court in which Alexander von Humboldt participated and which are recorded in the court calendar, which is part of the publication "Practices of the Monarchy". For this purpose, the TEI-XML data of the Hofkalendarium are transferred to edition humboldt digital and the personal mentions are mapped to the ehd register using the GND URIs (if available, otherwise they link to the register entries of Praktiken der Monarchie).

GeoNames & OpenStreetMap

The places and institution locations listed in the register are usually linked to a URI from the free geographic database GeoNames. This allows the locations to be identified across projects, which greatly simplifies subsequent use of the data. The GeoNames URI is also used to obtain the geographic coordinates, on the basis of which a map showing the location and institution entries can be displayed using the free map service OpenStreetMap. This is especially helpful for the smaller locations of Alexander von Humboldt's various travels.
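
As a rough illustration of the last step, the following sketch extracts the numeric identifier from a GeoNames URI and builds an OpenStreetMap link with a marker at given coordinates. How the edition actually retrieves the coordinates and renders the map is not described here; the function names and the sample values are assumptions made for the example.

```python
import re
from urllib.parse import urlencode


def geonames_id(uri: str) -> str:
    """Extract the numeric GeoNames identifier from a GeoNames URI."""
    match = re.search(r"geonames\.org/(\d+)", uri)
    if match is None:
        raise ValueError(f"not a GeoNames URI: {uri}")
    return match.group(1)


def osm_marker_url(lat: float, lon: float, zoom: int = 12) -> str:
    """Build an openstreetmap.org link that shows a marker at lat/lon."""
    query = urlencode({"mlat": lat, "mlon": lon})
    return f"https://www.openstreetmap.org/?{query}#map={zoom}/{lat}/{lon}"


# Sample values for illustration only.
place_id = geonames_id("https://sws.geonames.org/2950159/")
map_link = osm_marker_url(52.52437, 13.41053)
```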

Integrated Authority File (GND) & BEACON

Based on the URI from the Integrated Authority File (Gemeinsame Normdatei, GND) recorded in an entry of the register of persons, further data can be obtained via the GND web service "Entity Facts". With its help, portraits are displayed directly from Wikimedia Commons, and information on relationships between persons (family, friendship, etc.) is obtained. These relationships are also automatically evaluated in the ehd on the basis of the GND URIs and then displayed to the users accordingly (see, for example, the entry on Samuel Thomas Soemmerring).

The GND-ID also allows linking to other digital editions, encyclopedias and projects that are relevant to the subject area of edition humboldt. The BEACON interfaces of these projects are used for this purpose. For example, it is possible to identify thematic overlaps with other projects of the Center Prussia-Berlin at the BBAW via the links in the register data and to make them available for research (for example, Wilhelm von Humboldt or Friedrich Schleiermacher). External resources such as Hidden Kosmos or Die deutsche Biographie are also linked automatically in this way.

Global indices for scientific names

In the plant index (see above), various web services and APIs are used to automatically link scientific plant names from the texts of edition humboldt to matching entries in taxonomic databases. Using the Global Names Resolver web service, the plant index links scientific names to entries in the Encyclopedia of Life, Tropicos (Missouri Botanical Garden) and the International Plant Names Index (IPNI). In addition, the Catalogue of Life, the Biodiversity Heritage Library, and the Global Biodiversity Information Facility are queried using their own interfaces and are also linked. Other databases may be added in the future, provided they have appropriate technical interfaces.

The query of the different databases and automated linking is based on the scientific plant name, which is coded accordingly in the edited text and normalized if necessary.

Data provision & APIs

Licensing

edition humboldt digital not only reuses external data and web services, but in turn makes its data available under the free Creative Commons license CC BY-SA 4.0 via an interface and as an independent data publication.

TEI-XML interface

All edited texts, research articles and chronology and register entries of edition humboldt digital can be retrieved via the TEI-XML interface http://edition-humboldt.de/api/v1.2/tei-xml.xql (note version 1.2).

When called without parameters, the interface returns a list of all data with the title and permalink of the respective current version. When called with the type parameter, it returns a list of the documents of the respective type (see table below). When called with the id parameter, it returns the respective document.

Parameter type

Value            Description
[not set]        all records in the person register; set by default unless otherwise specified
correspondents   all correspondents
personMentioned  all persons mentioned
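
The three ways of calling the interface can be sketched as follows. Only the endpoint and the parameter names type and id come from the description above; the helper function is hypothetical, and no assumption is made about the structure of the response.

```python
from urllib.parse import urlencode

TEI_API = "http://edition-humboldt.de/api/v1.2/tei-xml.xql"


def tei_api_url(doc_type=None, doc_id=None):
    """Build a request URL for the TEI-XML interface.

    No arguments: list of all data; doc_type: list for one type;
    doc_id: a single document.
    """
    params = {}
    if doc_type is not None:
        params["type"] = doc_type
    if doc_id is not None:
        params["id"] = doc_id
    return f"{TEI_API}?{urlencode(params)}" if params else TEI_API


list_all = tei_api_url()
by_type = tei_api_url(doc_type="correspondents")
single_document = tei_api_url(doc_id="H0002656")
```

Any of these URLs could then be fetched with urllib.request.urlopen or a comparable HTTP client.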

Data publication

All edited texts, research contributions as well as the indexes of persons, places, institutions and tokens (Siglen) are also published digitally as a complete TEI-XML dataset of the edition humboldt. For this purpose, the texts and entries are not simply exported from eXistdb, but are retrieved with the help of a Python script via the TEI-XML-API 2.0 of ehd, so that the published data matches the data accessible via the API. This makes use of the enrichments that take place there (e.g. GNDs; URIs instead of IDs), the harmonizations with the DTABf as well as the division of the register lists into individual TEI-XML files. In the process, the directory structure is also changed so that the data is organized by type (i.e. letters, diaries, research contributions, register entries, etc.).

For an initial, purely technical versioning, the TEI-XML files retrieved in this way are committed to a Git repository, which is also publicly viewable on GitHub. After being checked against and supplemented with the TEI-XML schema of ehd (as RNG), the dataset is exported from there to Zenodo, where it is freely available in a long-term archived form.

Further APIs

OAI-PMH

The metadata of the edited texts as well as the research articles is provided via the interface http://edition-humboldt.de/api/v1.1/oai-pmh.xql according to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Thus, these texts are also automatically indexed in the Bielefeld Academic Search Engine (BASE). Dublin Core is currently the only metadata format supported for OAI-PMH.
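
A harvesting request could look like the following sketch. The verb ListRecords and the metadata prefix oai_dc are defined by the OAI-PMH standard; the sample response is made up for illustration, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_ENDPOINT = "http://edition-humboldt.de/api/v1.1/oai-pmh.xql"
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Made-up response fragment with the standard OAI-PMH / Dublin Core namespaces.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sample title</dc:title>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""


def list_records_url(metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request for the Dublin Core metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{OAI_ENDPOINT}?{query}"


def record_titles(response_xml: str) -> list[str]:
    """Collect all dc:title values from an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [title.text for title in root.iterfind(".//dc:title", NS)]


request_url = list_records_url()
sample_titles = record_titles(SAMPLE_RESPONSE)
```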

CMIF interface

Via this interface the correspondence metadata of all letters available in this edition can be retrieved in the Correspondence Metadata Interchange Format (CMIF). As a result, the letters edited in this edition will be referenced in correspSearch.

URL: http://edition-humboldt.de/api/v1.1/cmif.xql
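
To show what the retrieved metadata looks like from a consumer's point of view, here is a minimal sketch that reads sender, recipient and date from a CMIF document. CMIF is based on the TEI correspDesc element; the sample record with its example.org URIs is invented, and the function is not part of the edition.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented CMIF fragment; all URIs are placeholders.
SAMPLE_CMIF = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc>
    <correspDesc ref="https://example.org/letter/1">
      <correspAction type="sent">
        <persName ref="https://example.org/person/a">Sender A</persName>
        <date when="1800-01-01"/>
      </correspAction>
      <correspAction type="received">
        <persName ref="https://example.org/person/b">Recipient B</persName>
      </correspAction>
    </correspDesc>
  </profileDesc></teiHeader>
</TEI>"""


def parse_cmif(cmif_xml: str) -> list[dict]:
    """Return one dict per letter with its URI, sender, recipient and date."""
    root = ET.fromstring(cmif_xml)
    letters = []
    for desc in root.iterfind(".//tei:correspDesc", TEI_NS):
        letter = {"letter": desc.get("ref")}
        for action in desc.iterfind("tei:correspAction", TEI_NS):
            role = action.get("type")  # "sent" or "received"
            person = action.find("tei:persName", TEI_NS)
            letter[role] = person.get("ref") if person is not None else None
            date = action.find("tei:date", TEI_NS)
            if role == "sent" and date is not None:
                letter["date"] = date.get("when")
        letters.append(letter)
    return letters


letters = parse_cmif(SAMPLE_CMIF)
```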

BEACON files

The persons available in the ehd data stock and marked with the GND-URI can be retrieved via http://edition-humboldt.de/api/v1.1/beacon.xql as a list in BEACON format and can be linked automatically in external digital offers. It is possible to limit the list to persons who are mentioned in the letter text or who are correspondence partners (see table below).

Parameter type

Value            Description
[not set]        all records in the persons register; set by default unless otherwise specified
correspondents   all correspondents
personMentioned  all persons mentioned

Parameter authority

All persons with an authority ID from a specific authority file (restricted by type if necessary).

Value  Description
gnd    Integrated Authority File (GND) of the German National Library; set by default unless otherwise specified
viaf   Virtual International Authority File
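
The two parameters can be combined in one request, as the following sketch shows. It also includes a small reader for the BEACON text format, in which lines starting with # carry meta fields and every other line starts with an identifier. The helper names are hypothetical and the identifiers in the sample are placeholders, not real GND numbers.

```python
from urllib.parse import urlencode

BEACON_ENDPOINT = "http://edition-humboldt.de/api/v1.1/beacon.xql"

# Placeholder BEACON file; the identifiers are not real GND numbers.
SAMPLE_BEACON = """#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
000000001
000000002|2
"""


def beacon_url(person_type=None, authority=None):
    """Build a BEACON request, optionally restricted by type and authority."""
    pairs = (("type", person_type), ("authority", authority))
    params = {key: value for key, value in pairs if value is not None}
    return f"{BEACON_ENDPOINT}?{urlencode(params)}" if params else BEACON_ENDPOINT


def parse_beacon(text: str):
    """Split a BEACON file into its meta fields and its list of identifiers."""
    meta, identifiers = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            meta[key.strip()] = value.strip()
        else:
            # Anything after the first "|" is an optional annotation or target.
            identifiers.append(line.split("|", 1)[0])
    return meta, identifiers


url = beacon_url(person_type="correspondents", authority="viaf")
meta, identifiers = parse_beacon(SAMPLE_BEACON)
```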

Context Objects in span (COinS) & Zotero API

The entries of the bibliography are also accessible in machine-readable form. On the one hand, they are embedded as ContextObjects in Spans (COinS) in the HTML pages of edition humboldt digital. This allows them to be transferred directly into common literature management systems at the click of a mouse. On the other hand, the complete bibliography is also accessible as a publicly viewable Zotero group via the Zotero API at https://api.zotero.org/groups/667230/items.
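
A paged request against the Zotero group could be sketched like this. The query parameters format, limit and start belong to the Zotero Web API; the sample payload is invented and only mirrors the general shape of a response, in which each item carries its fields under data.

```python
import json
from urllib.parse import urlencode

ZOTERO_ITEMS = "https://api.zotero.org/groups/667230/items"

# Invented payload that mirrors the shape of a Zotero API JSON response.
SAMPLE_PAYLOAD = json.dumps([
    {"key": "AAAA1111", "data": {"itemType": "book", "title": "Sample book"}},
    {"key": "BBBB2222", "data": {"itemType": "note"}},
])


def zotero_items_url(limit: int = 25, start: int = 0) -> str:
    """Build a request for one page of items from the group library."""
    query = urlencode({"format": "json", "limit": limit, "start": start})
    return f"{ZOTERO_ITEMS}?{query}"


def item_titles(payload: str) -> list[str]:
    """Return the title of every item; items without a title yield ''."""
    return [item["data"].get("title", "") for item in json.loads(payload)]


first_page = zotero_items_url()
titles = item_titles(SAMPLE_PAYLOAD)
```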

Versioning, permalinks and citation notes

The texts and data made available in edition humboldt digital are versioned, i.e. each published version (usually one per year) of the content is kept available for retrieval. The entire published data stock is always versioned. Thus, the register entries also reflect the links of the respective version (cf. e.g. the entry on Georg Forster in version 1 compared to version 9). An overview of the changes between the versions of edition humboldt digital is provided by the version history introduced with version 8. It also evaluates the number of pages edited, chronology and register entries created, and entity links coded.

All texts are provided with a citation note as well as permalinks that reference the respective version (e.g.: http://edition-humboldt.de/v1/H0002656; on citation and referencing in digital editions, see Bleier 2021). If necessary, the respective folio can also be referenced for edited texts by simply adding the folio information to the path, e.g. https://edition-humboldt.de/v9/H0002656/2v. Research articles that do not have page numbers due to their "digital born" character can be cited paragraph by paragraph. For this purpose, the paragraph number (which is always displayed at the top left of each paragraph) is appended as a so-called fragment identifier with a #, e.g. https://edition-humboldt.de/v9/H0016432#3. In addition to the texts and register entries, certain subsections (main topics and various correspondences), which - like a volume - are actually composed of several texts, are also provided with their own citation notes including permalinks (e.g. https://edition-humboldt.de/X0000003).

In addition, the individual texts and data sets are also provided with a canonical URL that always redirects to the most current version. This simply omits the version reference in the path, e.g. http://edition-humboldt.de/H0002656.
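
The URL patterns described in this section can be summarized in a short sketch. The helper function is hypothetical; it only reproduces the examples given above and uses the https form of the address throughout.

```python
BASE = "https://edition-humboldt.de"


def permalink(ehd_id, version=None, folio=None, paragraph=None):
    """Compose a link to a text or data set.

    Without a version, the canonical URL is returned. A folio is appended
    to the path, a paragraph number as a fragment identifier.
    """
    parts = [BASE]
    if version is not None:
        parts.append(f"v{version}")
    parts.append(ehd_id)
    if folio is not None:
        parts.append(folio)
    url = "/".join(parts)
    if paragraph is not None:
        url += f"#{paragraph}"
    return url


canonical = permalink("H0002656")
folio_link = permalink("H0002656", version=9, folio="2v")
paragraph_link = permalink("H0016432", version=9, paragraph=3)
```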

The canonical and uniform URLs ("H" and a seven-digit number) have also made it possible for the printed volumes of edition humboldt print to link back to the digital counterpart for edited texts.

The interface itself, i.e. the XQL, XSLT and JS scripts as well as CSS and other files, is currently not publicly versioned. However, it is versioned in a Git repository for development purposes. For the future, it is also planned to additionally store each version of the digital edition (as a user interface) in the web archive of the BBAW and to keep it there. The edition humboldt digital is currently being technically prepared so that it can be archived in the best possible way.

Independently of this, the data will also be published and archived permanently on Zenodo (see section "Data publication").