Tagging Guidelines

Tagging

An important part of the rationale for digitizing the Henry Barnard Papers is to make the large collection more accessible to scholarly investigation into the lives of Barnard and his colleagues in Common School Reform, as well as their theories and ideals regarding education. To accomplish this goal, the online collection must support fairly detailed search queries regarding particular people, places, organizations, and ideas. To that end, the project will encode the transcribed text from the digitized manuscripts in eXtensible Markup Language (XML) using the Text Encoding Initiative’s latest Guidelines, P5. Adopting the TEI standard spares the project from having to develop its own encoding system and keeps the resulting online collection consistent with the numerous digitization efforts that have already made use of the Guidelines. Furthermore, because the TEI has a large user base, a number of resources are available to assist in implementing it, including introductory guides and technical support from the TEI-L discussion list. Another advantage of the TEI is the modular nature of the Guidelines: the project will be able to select only those portions of the system that are relevant to its documents and goals. The Guidelines are also tolerant of customization should there be aspects of the Henry Barnard manuscripts that the TEI standard cannot adequately describe.

The tags employed in the encoding will provide users with the means to search through the collection and sort relevant papers according to their research topic. Each of the digital transcripts will be structured to reflect something of the original format. Correspondence, drafts, printed articles, and speeches will share one document structure, though each type will have a few specialized tags, while diaries and notebooks will be assigned a second, and visual materials a third. The specific format of the original will be explicitly identified with a “document type” tag, allowing searches to focus upon or exclude particular forms. Within the three structures, all of the digitized manuscripts will have a heading to provide metadata including the author, date of creation, and the location of the original source within the physical Henry Barnard Papers collection at Fales. If the document does not indicate the date of its creation, or the date is incomplete, an approximate date enclosed in square brackets in the transcription will be assigned and tagged as “uncertain” so that the affected items can be excluded from, or relegated to the end of, search results. The heading will also include applicable keywords, derived from a list developed by the project’s selection committee, that describe topics within the text. Some of the headings will include specialized information for particular document formats. For example, drafts and any printed articles that are digitized will have bibliographic information describing the published version or source of the document, while for correspondence the heading will specify the addressee and include the individual’s or entity’s address if it is indicated. Diaries and notebooks, which will be structured so that each entry is distinct within the larger document, will have links from the heading to anchor tags within the transcription to allow users to move directly to the segment that interests them.

Following the heading will be the “body” of the document, namely the transcribed text of the manuscript. The transcription will be divided into paragraphs according to the structure of the source. Gaps in the text will be tagged with a footnote explaining the break, such as a lost page, damage to the document, or illegible writing. In cases where the transcriber is unsure of the author’s writing but can provide a “best guess,” the “unclear” tag will be used to mark that uncertainty. If the author has scratched out a portion of text or made additions to it, the “Deletion” and “Addition” tags will cause the words to render differently from their neighbors to highlight the changes. Any notations made upon a document by Barnard will be transcribed and tagged as “annotations” in a similar fashion. Finally, in the event that the author misspelled a word or used an outdated form of it, the transcription will reflect the original, but the word in question will be tagged with a “correction” tag that provides the modern conventional form of the word in order to facilitate full-text searches.
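
As a rough illustration of how the body-level tags could support full-text search, the sketch below parses a short fragment and builds the text a search index might use. The sentence is invented, and the element names (del, add, unclear, gap, and choice with orig and reg) are TEI P5 elements chosen here as assumptions; the project's final tag set may differ. The TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Invented sample sentence, not a transcription from the collection.
# Element names follow TEI P5, but their use here is an assumption.
SAMPLE = (
    '<p>I shall <del>go</del><add>travel</add> to <unclear>Hartford</unclear> '
    'to <choice><orig>shew</orig><reg>show</reg></choice> the report '
    '<gap reason="illegible"/></p>'
)

# Elements left out of the text that is indexed for searching:
# deletions, original (outdated) spellings, and gaps.
SKIP_FOR_SEARCH = {"del", "orig", "gap"}


def searchable_text(elem):
    """Collect text in document order, keeping additions, best guesses,
    and regularized spellings while dropping the skipped elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in SKIP_FOR_SEARCH:
            parts.append(searchable_text(child))
        # Text following a child belongs to the parent and is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
search_text = " ".join(searchable_text(root).split())  # normalize whitespace
gap_reasons = [gap.get("reason") for gap in root.iter("gap")]

print(search_text)
print(gap_reasons)
```

In this sketch a search for the modern spelling "show" would match even though the manuscript reads "shew", while the displayed transcription could still present the original form.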

Similarly, names referred to in the document, such as those of people, places, publications, and organizations, will be assigned a regular appellation in the surrounding tag to allow more effective searches for specific information within the online collection. For example, many of Barnard’s correspondents signed their letters with their initials or simply a first initial and surname. Thus, to clarify the identity of an author, or merely a reference to a person within the text, the name will be tagged with the person’s full first name, middle initial, and surname. If the individual’s name changed over time, say as the result of marriage, then the tag will regularize it to the designation by which the person was best known. Therefore, in the letters Mary Mann wrote before her marriage to Horace Mann, her maiden name, Mary Peabody, will be tagged with her married name, Mary P. Mann. References to particular places will be tagged with not only the regular name but also the city, state, and country in which the place is located, to avoid confusion should the same name refer to multiple areas. In addition, each instance of a reference to a name of interest will be tagged so that the documents most relevant to a user’s inquiry rise to the top of a results list.
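
The regularization described above can be sketched as follows. The two sample fragments are invented, and carrying the regular form in a key attribute on persName is only one of the options TEI P5 offers; treat both the attribute choice and the document identifiers as assumptions. Counting tagged references per document gives a simple relevance ranking.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented fragments; the document ids and sentences are placeholders.
DOCS = {
    "letter_a": (
        '<p><persName key="Mary P. Mann">Mary Peabody</persName> wrote to '
        '<persName key="Horace Mann">H. Mann</persName>.</p>'
    ),
    "letter_b": (
        '<p><persName key="Mary P. Mann">M. Peabody</persName> and later '
        '<persName key="Mary P. Mann">Mrs. Mann</persName> are mentioned.</p>'
    ),
}


def count_references(xml_text):
    """Count tagged person references by their regularized name."""
    root = ET.fromstring(xml_text)
    return Counter(
        el.get("key") for el in root.iter("persName") if el.get("key")
    )


def rank_documents(docs, regular_name):
    """Return (document id, reference count) pairs, most references first."""
    hits = []
    for doc_id, xml_text in docs.items():
        n = count_references(xml_text)[regular_name]
        if n > 0:
            hits.append((doc_id, n))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


ranking = rank_documents(DOCS, "Mary P. Mann")
print(ranking)
```

A search for "Mary P. Mann" finds both letters here even though neither contains that exact string in its transcribed text.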

The digital calendar of documents held in the Henry Barnard Papers that are not selected for digitization will also be encoded as an XML document, though a much simpler one. Within the record, each entry will provide the same bibliographic information as the standard headings. However, the “body” of each entry will simply be an abstract describing the general content of the document and will therefore lack the more sophisticated tags of the transcriptions. Where possible, the project will also include links to any pre-existing digital versions of the document.

Sample Document Structures

Correspondence and Manuscripts

Document Type: Main entry; specifies the type of document (letter, draft, printed article, or speech)
Heading: Basic metadata for the document
  Author: Name will be entered as written in the document and regularized in the tag
  Date: Enter as written
  Uncertainty: Use to indicate an approximate date, if written date absent or incomplete
  Addressee: Name and address; enter as written, regularize to street, city, state, country in tag
  Publication: Bibliographic information for any published versions of the document
  Keywords: To indicate topics that appear in the text
  Reference Number: The number assigned to the letter in the alphabetical index of the correspondence
  Location of Source Document: Where the physical original can be found
  Link: To page images
Body: Text of the document
  Paragraphs: As needed to reflect something of the original structure of the text
  References: To people, places, organizations, and published works mentioned in the text
    Names: Enter as written, regularize in tag to first name, middle initial, surname format
    Places: Enter as written, regularize in tag with city, state, and country
    Publications: Enter as written, tag with basic bibliographic information
    Organizations: Enter as written, regularize in tag
  Gaps: To indicate and explain breaks in the text
  Unclear: To denote instances where a “best guess” was made in the transcription
  Additions: Text inserted into a line by the author
  Deletions: Text lined out by the author
  Annotation: Notes written in the margin of the document by Henry Barnard
  Correction: To regularize the spelling of a word to its modern form
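
To show how the Date and Uncertainty entries in this structure might be used, the following sketch sorts search results so that items with approximate dates fall to the end, or are excluded altogether. The headings are invented placeholders, the heading wrapper element is a simplification, and the use of the TEI when and cert attributes on date is an assumption about how the project would record the information.

```python
import xml.etree.ElementTree as ET

# Invented headings; ids and dates are placeholders, not collection items.
HEADINGS = [
    '<heading id="doc1"><date when="1840-03-02">March 2, 1840</date></heading>',
    '<heading id="doc2"><date when="1839" cert="low">[1839]</date></heading>',
    '<heading id="doc3"><date when="1838-11-15">Nov. 15, 1838</date></heading>',
]


def parse_heading(xml_text):
    """Extract the id, machine-readable date, and uncertainty flag."""
    root = ET.fromstring(xml_text)
    date = root.find("date")
    return {
        "id": root.get("id"),
        "when": date.get("when"),
        "as_written": date.text,
        "uncertain": date.get("cert") == "low",
    }


def order_results(records, exclude_uncertain=False):
    """Sort chronologically, with uncertain dates after all certain ones."""
    if exclude_uncertain:
        records = [r for r in records if not r["uncertain"]]
    # False sorts before True, so certain dates come first; ISO-style
    # date strings then sort chronologically as plain text.
    return sorted(records, key=lambda r: (r["uncertain"], r["when"]))


records = [parse_heading(h) for h in HEADINGS]
ordered_ids = [r["id"] for r in order_results(records)]
certain_ids = [r["id"] for r in order_results(records, exclude_uncertain=True)]
print(ordered_ids, certain_ids)
```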

Diary or Notebook

Document Type: Indicate document as a diary or notebook
Heading: Bibliographic information
  Author: Henry Barnard
  Dates: Inclusive
  Uncertainty: Use to indicate an approximate date, if written date absent or incomplete
  Anchor Links: To individual entries
  Location of Source Document: The series, box, and folder in which the original can be found
Entry: Individual entries within larger document
  Title: Enter as written or assign based on subject
  Date: Of the entry
  Uncertainty: Use to indicate an approximate date, if written date absent or incomplete
  Place: Where the document was written, if applicable
  Keywords: Topic of the entry
  Anchor: To allow links from heading
  Link: To page images
  Body: Substantive text of the entry
    Paragraphs: As needed to reflect the original structure of the text
    References: To people, places, organizations, and published works mentioned in the text
      Names: Enter as written, regularize in tag to first name, middle initial, surname format
      Places: Enter as written, regularize in tag with city, state, and country
      Publications: Enter as written, tag with basic bibliographic information
      Organizations: Enter as written, regularize in tag
    Gaps: To indicate and explain breaks in the text
    Unclear: To denote instances where a “best guess” was made in the transcription
    Additions: Text inserted into a line by the author
    Deletions: Text lined out by the author
    Annotation: Notes written in the margin of the document by Henry Barnard
    Correction: To regularize the spelling of a word to its modern form
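
The anchor links between the heading and individual entries could work along the lines of the sketch below. The diary wrapper, the entry text, and the identifiers are all invented; using xml:id on each entry and a ref with a target attribute in the heading is an assumption based on common TEI practice, not a decision recorded in these guidelines.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml:id attribute to this qualified name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented diary skeleton; ids and text are placeholders.
DIARY = """
<diary>
  <heading>
    <ref target="#entry1">First entry</ref>
    <ref target="#entry2">Second entry</ref>
  </heading>
  <div type="entry" xml:id="entry1"><p>Text of the first entry.</p></div>
  <div type="entry" xml:id="entry2"><p>Text of the second entry.</p></div>
</diary>
"""


def index_entries(root):
    """Map each entry's anchor id to its element."""
    return {
        div.get(XML_ID): div
        for div in root.iter("div")
        if div.get("type") == "entry"
    }


def dangling_links(root):
    """List heading links whose target matches no entry anchor."""
    entries = index_entries(root)
    return [
        ref.get("target")
        for ref in root.iter("ref")
        if ref.get("target", "").lstrip("#") not in entries
    ]


root = ET.fromstring(DIARY)
entries = index_entries(root)
second_text = entries["entry2"].find("p").text
broken = dangling_links(root)
print(second_text, broken)
```

A check like dangling_links could be run before publication to confirm that every link in a heading leads to an existing entry.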

Visual Materials

Document Type: Photograph, lithograph, woodcut, engraving, or cabinet card
Heading: Bibliographic information
  Name: The individual depicted in the image
  Author: The photographer and/or studio that created the image
  Date: When the image was created
  Uncertainty: Use to indicate an approximate date, if written date absent or incomplete
  Location of Source: Physical location of the original
Image: The individual picture
  Reverse Side of Image: If it bears annotation or an additional graphic
Body: If there is any text written on the item
  Paragraphs: As needed to reflect something of the original structure of the text
  References: To people, places, organizations, and published works mentioned in the text
    Names: Enter as written, regularize in tag to first name, middle initial, surname format
    Places: Enter as written, regularize in tag with city, state, and country
    Publications: Enter as written, tag with basic bibliographic information
    Organizations: Enter as written, regularize in tag
  Gaps: To indicate and explain breaks in the text
  Unclear: To denote instances where a “best guess” was made in the transcription
  Additions: Text inserted into a line by the author
  Deletions: Text lined out by the author
  Annotation: Notes written in the margin of the document by Henry Barnard
  Correction: To regularize the spelling of a word to its modern form

Calendar

Document Type: Calendar
Entry: For documents not included in digitization project
  Heading: Bibliographic information regarding un-digitized documents
    Author: Name will be entered as written in the document and regularized in the tag
    Date: Enter as written
    Uncertainty: Use to indicate an approximate date, if written date absent or incomplete
    Addressee: Name and address; enter as written, regularize to street, city, state, country in tag
    Publication: Bibliographic information for any published versions of the document
    Keywords: If applicable
    Location of Source Document: Where the physical original can be found
    Link: To a digital version if one is already available
  Abstract: Brief summary of document’s content
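
A calendar entry is simple enough that it could be generated from a few fields, as in the minimal sketch below. The element names (entry, heading, author, date, location, ref, abstract) and all sample values are placeholders chosen for illustration; they are not the project's schema.

```python
import xml.etree.ElementTree as ET


def calendar_entry(author, date_written, location, abstract, link=None):
    """Build one calendar entry: a heading plus an abstract, with an
    optional link to an existing digital version."""
    entry = ET.Element("entry")
    heading = ET.SubElement(entry, "heading")
    ET.SubElement(heading, "author").text = author
    ET.SubElement(heading, "date").text = date_written
    ET.SubElement(heading, "location").text = location
    if link is not None:
        # Only emitted when a digital version is already available.
        ET.SubElement(heading, "ref", target=link)
    ET.SubElement(entry, "abstract").text = abstract
    return entry


# Placeholder values, not a real item from the collection.
plain = calendar_entry(
    "Author Name", "Date as written", "Box and folder", "Summary of content."
)
linked = calendar_entry(
    "Author Name", "Date as written", "Box and folder", "Summary of content.",
    link="https://example.org/item",
)
print(ET.tostring(linked, encoding="unicode"))
```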