American Textbooks Metadata And Added Value

Metadata

TEI seems to offer the best way to move from the larger whole to the smaller individual parts of a text. Because I am working with the components of a book (first the book itself, then chapters, then pictures and sentences, then words), TEI makes the most sense. I relied heavily on "TEI by Example" to come up with these standards.

Visible metadata:

First is the most obvious metadata: the data about the scanned book itself. This will include the title of the book, the author, publisher, year published, chapter names, and number of pages in each chapter. These would all be tags as well (for more information on searchable tags, see Search and Tagging below).

Each chapter scan will contain the book title, author name, publisher, and year published, as well as the chapter name (if given) and/or its sequential order within the book. Any subheadings and departures from the main text (for instance, study questions provided at the end) will also be noted.

If the chapter contains pictures (drawings in the earlier years, photographs more commonly later, plus charts, graphs, diagrams, etc.), these will be included within the chapter in the same order as in the book. The title, if given, will be kept (sometimes an image is just referred to as "Figure 1.3" or something to that effect), along with any other information given about the image, such as a caption. Each image will of course carry information on the book, chapter, page number, etc., but will also be searchable by tagged keywords.
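As a rough illustration of how the visible metadata could nest from book to chapter to image, here is a minimal Python sketch that builds a TEI-style fragment. The element names (teiHeader, fileDesc, sourceDesc, bibl, div, head, figure, p) are standard TEI, but the function name, the dictionary keys, and every sample value are placeholders I made up for the example. The result is not a schema-valid TEI document, and it leaves out page numbers and keyword tags.

```python
import xml.etree.ElementTree as ET


def build_chapter_tei(book, chapter):
    """Return a TEI-style element tree for one chapter of a scanned book."""
    tei = ET.Element("TEI")

    # Book-level metadata, repeated on every chapter scan.
    header = ET.SubElement(tei, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    bibl = ET.SubElement(ET.SubElement(file_desc, "sourceDesc"), "bibl")
    for tag in ("title", "author", "publisher", "date"):
        ET.SubElement(bibl, tag).text = book[tag]

    # Chapter-level metadata: sequential order, plus the name if one is given.
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    div = ET.SubElement(body, "div", {"type": "chapter", "n": str(chapter["order"])})
    if chapter.get("name"):
        ET.SubElement(div, "head").text = chapter["name"]

    # Images are appended in list order, which should match the book's order.
    for image in chapter.get("images", []):
        figure = ET.SubElement(div, "figure")
        if image.get("title"):
            ET.SubElement(figure, "head").text = image["title"]
        if image.get("caption"):
            ET.SubElement(figure, "p").text = image["caption"]
    return tei


# Placeholder values only; none of these describe a real book.
sample_book = {
    "title": "Example Textbook",
    "author": "Example Author",
    "publisher": "Example Publisher",
    "date": "1900",
}
sample_chapter = {
    "order": 1,
    "name": "Example Chapter",
    "images": [
        {"title": "Figure 1.1", "caption": "First example caption"},
        {"title": "Figure 1.2", "caption": "Second example caption"},
    ],
}

if __name__ == "__main__":
    print(ET.tostring(build_chapter_tei(sample_book, sample_chapter), encoding="unicode"))
```

Putting the print book's details in sourceDesc is one reasonable reading of the TEI header; the actual project might split them differently.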

Administrative-only metadata:

This will include copyright information, including who granted the rights and when.

Next there would be the TEI manuscript description of the book, such as the call number, date acquired, and location in the archives. Then there would be the physical information about the book, including the number of pages and the condition of the book.

Scanning information would include the date scanned (start and completion, if different), the type of scanner used, the initials of the person who scanned and of the person who approved the scan, the file type and size, and any changes or corrections made to the scan, in the order they were made. There would also be information about the location and size of the original scans (hosted at NYPL). Any later updates to the material itself (like cleaning up or moving a scan) will of course be noted as they happen.
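The scanning fields listed above could be kept as one record per scan. The following is only a sketch of that idea in Python; the class name ScanRecord, its field names, and the sample values in the usage lines are all hypothetical and not part of any existing system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ScanRecord:
    """Administrative metadata for one scan (hypothetical field names)."""
    started: date
    scanner_type: str
    scanned_by: str       # initials of the person who scanned
    approved_by: str      # initials of the person who approved the scan
    file_type: str
    file_size_bytes: int
    completed: Optional[date] = None  # only set when it differs from the start date
    corrections: List[str] = field(default_factory=list)

    def add_correction(self, note: str) -> None:
        """Append a change or correction, preserving the order it was made in."""
        self.corrections.append(note)

    def finished_on(self) -> date:
        """Completion date, falling back to the start date for same-day scans."""
        return self.completed or self.started


# Usage with placeholder values.
record = ScanRecord(
    started=date(2000, 1, 3),
    scanner_type="flatbed",
    scanned_by="AB",
    approved_by="CD",
    file_type="TIFF",
    file_size_bytes=1_000_000,
)
record.add_correction("deskewed page")
record.add_correction("cropped margins")
```

Keeping corrections as an append-only list is the simplest way to honor the "in the order they were made" requirement.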

User data will be added after the site is up and will be updated monthly. It will include the number of times each file is downloaded, the most popular search terms, etc.

Search and Tagging:

Search will be available via the OCR transcription "hidden" behind the scanned page image. Each chapter will be tagged with terms from the metadata (author, title, publisher, date published) as well as further search terms determined by the panel of teacher consultants. These would include names of major figures, groups, and events, as well as date ranges and concepts. As in the Martha Ballard diaries, certain terms would include multiple names for that term if there is a common alternative or if usage changed over time (for instance, referring to African Americans as "colored people" or to Native Americans as "Indians" or "savages"). However, I want to be careful to keep the search value-free as well as comprehensive.
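One way the alternate-name idea could work is to expand each search term into its whole group of names before matching against the OCR text. The sketch below assumes a simple hand-maintained table; the names ALTERNATES, expand_term, and search, and the sample chapters, are invented for illustration, and plain substring matching stands in for whatever search engine the site would really use.

```python
# Each modern term maps to the period terms found in the textbooks.
# The two groups come from the examples above; a real table would be
# built with the teacher consultants.
ALTERNATES = {
    "african americans": ["colored people"],
    "native americans": ["indians", "savages"],
}


def expand_term(term):
    """Return every name in the term's group, or just the term if it has none."""
    key = term.strip().lower()
    for modern, period_terms in ALTERNATES.items():
        group = [modern, *period_terms]
        if key in group:
            return group
    return [key]


def search(chapters, term):
    """Return ids of chapters whose OCR text contains any name in the group."""
    terms = expand_term(term)
    return [
        chapter["id"]
        for chapter in chapters
        if any(t in chapter["text"].lower() for t in terms)
    ]


# Placeholder chapters standing in for OCR transcriptions.
sample_chapters = [
    {"id": "book1-ch2", "text": "The settlers traded with the Indians."},
    {"id": "book2-ch5", "text": "Native Americans lived across the continent."},
    {"id": "book2-ch6", "text": "The railroad reached the coast."},
]
```

Because the lookup works from any name in a group, a search on either the modern or the period term returns the same chapters, which keeps the results from depending on which vocabulary the user happens to choose.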

There will be a search suggestion function where users can propose alternative search terms for the subject they are looking for.

Added Value

There will be short biographies of the textbook authors (some, like Susan Pendleton Lee and Charles Beard, are fairly noteworthy among historians).

There will also be links to current textbook publishing houses and their descriptions of what they currently offer as American history texts.

I would also like to link to some academic criticism on the act of writing American history (Peter Hoffer is a great source), but of course this comes with a certain point of view. It is also usually directed at college or graduate-level students of history.

Additionally, there will be a link returning to the New York Public Library main site.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License