American Textbooks Draft Tagging Guidelines And Sample

TEI Encoding

I plan to use TEI encoding because my project is largely text-based, and what I really hope to get at is how language is used and how it changes over time. I will have each textbook scanned in full and available for viewing (72 dpi seems more than enough, as the books are still easy to read). This way, anyone who is not a high school student but is interested in browsing the textbooks can get a feel for what they looked like and said in full, without having to navigate through the lesson plans.

As individual chapters are selected for highlighting, I will transcribe them with OCR and proofread the results. These transcriptions will sit behind the chapter images and allow for in-depth encoding.

I think TEI will provide the flexibility my project needs. For instance, when certain concepts (e.g., schooling, integration) or larger categories (e.g., the Civil War) are mentioned, I think tagging them at first mention, so they show up in a search, will be adequate; not every mention within a given chapter sub-heading needs to be tagged. In other instances, however, tagging every use of a certain word or phrase might be useful: for instance, how many times does the author mention "blacks" (or some variation thereof) when discussing the Civil War? I will try to let my encoding schema reflect both approaches.
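To make the difference between the two approaches concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that concepts are tagged with TEI's term element, that groups and events are tagged with name plus a type attribute, and that each tagged term carries a key attribute; the key values and the sample sentences are invented and not taken from any textbook.

```python
import xml.etree.ElementTree as ET

# Hypothetical sub-chapter: "schooling" is tagged at first mention only,
# while every variation of the group term is tagged so it can be counted.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="subchapter">
  <head>The <name type="event" key="civil_war">Civil War</name></head>
  <p>The war changed <term key="schooling">schooling</term> in the South.
     Many <name type="group" key="black_americans">blacks</name> enlisted,
     and <name type="group" key="black_americans">black</name> regiments
     fought in several battles. Schooling resumed slowly afterward.</p>
</div>
"""

def count_mentions(div, key):
    """Count every tagged element in the div whose key attribute matches."""
    return sum(1 for el in div.iter() if el.get("key") == key)

def first_mention(div, key):
    """Return the text of the first tagged mention, or None if none is tagged."""
    for el in div.iter():
        if el.get("key") == key:
            return el.text
    return None

div = ET.fromstring(SAMPLE)
print(count_mentions(div, "black_americans"))
print(first_mention(div, "schooling"))
```

Note that the second, untagged "Schooling" is invisible to the count, which is the trade-off of tagging at first mention only.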

Moreover, because the word used to describe a group, event, or concept may change over time, I like that TEI gives me the sameAs attribute to connect those different words and show that they all point to the same thing. (For instance, the earlier, now-outdated terms "Negro" or "Colored" were later replaced by "Black" or "African American," or, in pre-1865 history, "slaves.") I can also use this capability for date ranges; for instance, the Civil War could also be searched as "1861-65."
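As a rough sketch of how such links might be followed, the Python below groups every variant under the element it points to. It assumes that one mention carries an xml:id and the others point to it through TEI's global sameAs attribute; the id values and sentences are invented, and whether sameAs or another pointer (such as ref or corresp) is the better fit for the date-range case is still an open question for me.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: ids, attribute choices, and sentences are assumptions.
SAMPLE = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <p><name type="group" xml:id="grp_african_american">African American</name> soldiers served.</p>
  <p>The <name type="group" sameAs="#grp_african_american">Negro</name> regiments marched.</p>
  <p><name type="group" sameAs="#grp_african_american">Colored</name> troops arrived.</p>
  <p>The <name type="event" xml:id="evt_civil_war">Civil War</name> began.</p>
  <p>During <date from="1861" to="1865" sameAs="#evt_civil_war">1861-65</date> the nation fought.</p>
</body>
"""

def variants_by_target(root):
    """Group the text of each tagged variant under the id it refers to."""
    groups = defaultdict(list)
    for el in root.iter():
        own_id = el.get(XML_ID)
        target = el.get("sameAs")
        if own_id:
            groups[own_id].append(el.text)
        elif target:
            # sameAs holds a pointer such as "#evt_civil_war"; drop the "#".
            groups[target.lstrip("#")].append(el.text)
    return dict(groups)

for target, variants in variants_by_target(ET.fromstring(SAMPLE)).items():
    print(target, variants)
```

A search for any one variant could then be expanded to all the others listed under the same id.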

Additionally, I have gone back and forth on whether I should include a glossary of important people and events. Obviously, American history is chock-full of names, dates, and places. However, the whole point of the textbook is to define these terms, so I don't want to provide one alternate definition when what I want students to do is compare the many differing definitions provided. (Also, high school students know better than almost anyone who Henry Clay was and when he ran for president.) If I did provide a list like this, what I would want it to do is report how many times each term was used, possibly along a timeline (wouldn't it be interesting to see if we mention Abraham Lincoln less in 1970 than in 1900, but Robert E. Lee more?). So I think any list of important people or places would concern only usage, not definitions. For non-high school students, there are plenty of other places online where they can find descriptions of these things at any length.
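A usage-over-time list of this kind could be generated along the following lines. This is only a sketch: it assumes people are tagged with persName and a key attribute, and the two-entry corpus, its years, and its sentences are invented to mirror the hypothetical question above, so the counts are not findings.

```python
import xml.etree.ElementTree as ET
from collections import Counter

PERS_NAME = "{http://www.tei-c.org/ns/1.0}persName"

# Hypothetical mini-corpus: (publication year, encoded fragment).
CORPUS = [
    (1900, """<p xmlns="http://www.tei-c.org/ns/1.0">
        <persName key="lincoln">Abraham Lincoln</persName> spoke.
        <persName key="lincoln">Lincoln</persName> was elected.
        <persName key="lee">Robert E. Lee</persName> commanded.</p>"""),
    (1970, """<p xmlns="http://www.tei-c.org/ns/1.0">
        <persName key="lee">Robert E. Lee</persName> commanded.
        <persName key="lee">Lee</persName> retreated.
        <persName key="lincoln">Lincoln</persName> was elected.</p>"""),
]

def mentions_by_year(corpus, key):
    """Return {publication year: number of tagged mentions of key}."""
    counts = Counter()
    for year, fragment in corpus:
        root = ET.fromstring(fragment)
        counts[year] += sum(
            1 for el in root.iter(PERS_NAME) if el.get("key") == key
        )
    return dict(sorted(counts.items()))

print(mentions_by_year(CORPUS, "lincoln"))
print(mentions_by_year(CORPUS, "lee"))
```

Because the count relies on the key attribute rather than the printed name, "Lincoln" and "Abraham Lincoln" are tallied together.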


For visible metadata:

author (there will be a link back to the book's main page, with a short bio on the author)
chapter title
sub-chapter heading
publisher (there will be a link back to the book's main page, with a short bio on the publishing house)
publisher location (city)
publication date
physical information about the text, such as paragraphs, headers, sub-headers, sidebars, captions, and question-and-answer sections
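As an illustration of where the bibliographic items in this list might live, the sketch below reads them out of a TEI header. It assumes the source book is described in a bibl element inside sourceDesc; that placement is one option among several, and every value shown is a placeholder.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical header: the structure is a guess and all values are placeholders.
HEADER = """
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt><title>Placeholder chapter title</title></titleStmt>
    <publicationStmt><p>Placeholder publication statement</p></publicationStmt>
    <sourceDesc>
      <bibl>
        <author>Placeholder Author</author>
        <title level="a">Placeholder chapter title</title>
        <publisher>Placeholder Publisher</publisher>
        <pubPlace>Placeholder City</pubPlace>
        <date when="1900">1900</date>
      </bibl>
    </sourceDesc>
  </fileDesc>
</teiHeader>
"""

def visible_metadata(header_xml):
    """Collect the fields to display from the first bibl in the header."""
    bibl = ET.fromstring(header_xml).find(f".//{TEI}bibl")
    fields = ("author", "title", "publisher", "pubPlace", "date")
    return {name: bibl.findtext(f"{TEI}{name}") for name in fields}

for field, value in visible_metadata(HEADER).items():
    print(f"{field}: {value}")
```

The physical features (paragraphs, headers, sidebars, captions) would be encoded in the text body rather than the header, so they are left out of this sketch.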

For terms used in text:

person name
place name (divided up hierarchically: city, state, region, country)
date (which can be included within date range)
event name (annotated for specifics like battles)
group name (which, as noted above, might use sameAs; intended mostly to reflect ethnic or minority groups)
organization name (for instance, religious orders, political parties, etc.)
legislation (annotated as state, federal, etc)
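One possible mapping from these categories to TEI elements is sketched below. The choices are assumptions rather than a settled schema: persName, placeName, orgName, and date are standard TEI elements, while using name with a type attribute for events, groups, and legislation, and a subtype attribute for the state or federal annotation, is just one way to do it. The tag_term helper and the TAG_MAP name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: category -> (TEI element name, fixed attributes).
TAG_MAP = {
    "person": ("persName", {}),
    "place": ("placeName", {}),
    "date": ("date", {}),
    "event": ("name", {"type": "event"}),
    "group": ("name", {"type": "group"}),
    "organization": ("orgName", {}),
    "legislation": ("name", {"type": "legislation"}),
}

def tag_term(category, text, **extra):
    """Wrap text in the element mapped to category, adding extra attributes."""
    tag, attrs = TAG_MAP[category]
    element = ET.Element(tag, {**attrs, **extra})
    element.text = text
    return ET.tostring(element, encoding="unicode")

print(tag_term("person", "Henry Clay"))
print(tag_term("date", "1861-65", to="1865", **{"from": "1861"}))
print(tag_term("legislation", "Fugitive Slave Act", subtype="federal"))
```

The hierarchical place levels (city, state, region, country) could be nested inside placeName using TEI's settlement, region, and country elements, which this flat helper does not attempt.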

Please note: the attached sample file is not from an actual NYPL textbook, as I was not able to scan any of those; for the initial scan I used a book of roughly the same size. Thus, the sample itself will not contain all the TEI encoding I hope to use, but I will also provide a comprehensive list of these tags below.
