Historians On Trial Digitization

Who is the exhibit designed for? How will it be used?
The documents in this exhibit (court documents, newspaper and scholarly articles, and correspondence concerning Rosenberg and Kessler-Harris) will most likely be used for scholarly or perhaps legal research. It is therefore essential that all of these documents be accurately searchable. They also need to be available in PDF form, which will allow users to save, print and email the material. Moreover, it is imperative that sources are cited properly and that the credentials of the creators are clearly stated.
Because some users are expected to consult the material for legal purposes, preserving the authenticity and accuracy of the court documents in particular (ensuring that the pages are in order, that no document is mistitled or inaccurately described, etc.) is also of utmost importance.

Which is more important: preservation or access?
Digitization is focused more on access than on preservation, mainly because the material is in very stable condition and will probably remain so for many years. Moreover, transcription will be the primary mode of digitization, and images of the vast majority of the documents will not be captured. Thus, the only preservation that will result from this project will be the preservation of textual information.

File format
Documents will be transcribed (keyed) by hand. The only documents that will be scanned are those that transcription does not represent well, such as documents with marginalia or handwritten correspondence; only a handful of documents are expected to need scanning. These scanned images will be saved as TIFFs, stored on the university's server and backed up on hard drives. These files are archival copies and will remain untouched. Two derivative copies will be made from each TIFF: a JPEG for quick-loading display (and for quick, easy reference by staff), and a PDF for downloading, printing and emailing. The scanned documents will also be transcribed.
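Because the archival TIFFs must stay untouched while copies live on both the server and backup drives, one common way to confirm that the copies still match is a checksum manifest. The sketch below is an illustration only, not part of the original plan: it swaps in SHA-256 from Python's standard `hashlib`, assumes Python 3.9+ and that the masters use the `.tif` extension, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each archival TIFF's file name to its checksum (assumes *.tif names)."""
    return {p.name: sha256_of(p) for p in sorted(folder.glob("*.tif"))}


def find_changed(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List TIFFs that are missing or whose checksum no longer matches the manifest."""
    changed = []
    for name, expected in manifest.items():
        path = folder / name
        if not path.exists() or sha256_of(path) != expected:
            changed.append(name)
    return changed
```

In use, a manifest built from the server copy (for example `build_manifest(Path("/server/tiff"))`, a hypothetical path) could be checked against a backup drive with `find_changed(Path("/backup/tiff"), manifest)`; an empty list means every master is still byte-for-byte identical.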
All transcriptions will be stored as XML files (.xml), also on the university's server and backed up on hard drives. These files will be available for downloading, printing or emailing.
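A transcription stored as XML can carry the citation details the plan asks for (title, collection, owning institution) next to the text, with each page numbered so that page order is explicit. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element names (`document`, `source`, `page`, etc.) and the function name are hypothetical and do not follow a standard schema such as TEI.

```python
import xml.etree.ElementTree as ET


def transcription_to_xml(doc_id: str, title: str, collection: str,
                         institution: str, pages: list[str]) -> str:
    """Serialize one transcribed document as an XML string.

    Element and attribute names here are illustrative, not a standard schema.
    """
    root = ET.Element("document", id=doc_id)
    ET.SubElement(root, "title").text = title
    source = ET.SubElement(root, "source")
    ET.SubElement(source, "collection").text = collection
    ET.SubElement(source, "institution").text = institution
    # Number pages explicitly so their order survives any later processing.
    for number, text in enumerate(pages, start=1):
        ET.SubElement(root, "page", n=str(number)).text = text
    # ElementTree escapes characters such as "&" and "<" automatically.
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a `.xml` file gives one self-describing file per document; the same fields could later feed the captions shown under each image.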

Transcription rules will be created, modeled on existing transcription guidelines.

File naming/structure
Each transcribed document will be named with a unique file name that references the original document and file type. For example, the first page of the written testimony of Rosenberg and Kessler-Harris will be named something like:

Those documents that are scanned will be named with the same structure. A letter written by Kessler-Harris might be named something like:

Sample of JPEG
First page of written testimony JPEG

Sample of PDF
First page of written testimony PDF

Text transcription
All text will be transcribed by student workers, and each transcription will then be proofread by two additional student workers.
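With two proofreaders per transcription, the places where their corrected copies disagree are exactly the lines worth checking against the original. A minimal sketch, assuming each proofreader saves a separate corrected copy (an assumption; the plan does not say how corrections are recorded), using the standard library's `difflib.SequenceMatcher`; the function name is hypothetical.

```python
import difflib


def disagreements(copy_a: str, copy_b: str) -> list[tuple[int, list[str], list[str]]]:
    """Compare two proofread copies of the same transcription line by line.

    Returns (line_in_a, lines_from_a, lines_from_b) for every block that differs,
    where line_in_a is the 1-based line in copy A at which the difference starts.
    """
    a_lines = copy_a.splitlines()
    b_lines = copy_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    issues = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            issues.append((i1 + 1, a_lines[i1:i2], b_lines[j1:j2]))
    return issues
```

An empty result means the two proofreaders agree; otherwise each tuple points to a passage to resolve against the source document.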

Resolution/color/file size/pixel dimensions
Documents will be scanned at 300 dpi in 16-bit grayscale. Most documents are 8.5 x 11 inches and will be scanned at 100% scale. Most TIFFs will be about 16 MB, JPEGs about 800 KB, and PDFs about 1 MB per page. The JPEG pixel dimensions are about 2550 x 3300 (8.5 x 11 inches at 300 dpi).
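The TIFF size estimate follows from simple arithmetic: page size times resolution gives the pixel dimensions, and pixels times bytes per pixel gives the raw data size. The sketch below works this through for a full 8.5 x 11 inch page; it ignores TIFF headers and compression, and the function names are illustrative.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page scanned at 100% scale."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_size_bytes(width_px: int, height_px: int,
                            bits_per_sample: int, channels: int = 1) -> int:
    """Raw pixel data size in bytes; ignores TIFF headers and any compression."""
    return width_px * height_px * channels * bits_per_sample // 8


w, h = pixel_dimensions(8.5, 11, 300)        # 2550 x 3300 pixels
size = uncompressed_size_bytes(w, h, 16)     # 16-bit grayscale: 16,830,000 bytes
print(w, h, f"{size / 1_000_000:.1f} MB")
```

At 2 bytes per pixel, a full page comes to roughly 16.8 million bytes, which matches the "about 16 MB" estimate for the archival TIFFs.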

Quality control
Each image will be examined in Photoshop by a student worker to ensure that no lint or hair appears in the final image. Images will be cropped to leave a very small border and saved as JPEGs. Filters such as unsharp mask or brightening may be applied if necessary to improve web viewing.

Digitization equipment
The equipment needed for digitization comprises one standard-sized Epson scanner attached to a Mac with Adobe Photoshop, one PC with Adobe Photoshop, and two PCs dedicated to transcription. Ideally, the host institution will lend the scanner and the computers with Photoshop to the project, since they are needed for only a small portion of it.

Both the Mac and PC monitors, as well as the scanner, will be calibrated before any documents are scanned. A representative sample of documents, scanned at the correct resolution and color values, will serve as models for ideal scanning. The settings of these model scans will be documented and clearly displayed at each workstation.
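Recording the model settings in a structured form makes it easy both to print the reference sheet for each workstation and to check a scan's settings against it. A minimal sketch using the standard `dataclasses` module; the reference values come from the resolution section of this plan, while the class, field and function names are hypothetical, and it assumes an operator copies each scan's settings from the scanning software.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScanSettings:
    """Scanner settings for one scan; field names are illustrative."""
    dpi: int
    bit_depth: int
    color_mode: str
    scale_percent: int


# Model values from the plan: 300 dpi, 16-bit grayscale, 100% scale.
MODEL_SETTINGS = ScanSettings(dpi=300, bit_depth=16, color_mode="grayscale", scale_percent=100)


def settings_mismatches(actual: ScanSettings,
                        model: ScanSettings = MODEL_SETTINGS) -> list[str]:
    """Return one message per setting that differs from the documented model."""
    problems = []
    for setting in fields(model):
        expected = getattr(model, setting.name)
        found = getattr(actual, setting.name)
        if found != expected:
            problems.append(f"{setting.name}: expected {expected}, found {found}")
    return problems


# Print the reference sheet to post at a workstation.
for setting in fields(MODEL_SETTINGS):
    print(f"{setting.name}: {getattr(MODEL_SETTINGS, setting.name)}")
```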

Online display
Transcribed documents will appear online in a legible, easy-to-navigate format. The long court documents will require a significant amount of scrolling, so efforts will be made to break them into shorter sections. Images of scanned documents will be displayed as JPEGs alongside their transcriptions. PDFs of the images will be available for saving, printing and emailing, and users can choose whether to download a range of pages or the entire document. PDFs of transcriptions will not be provided, since the transcribed text is already legible. A caption containing the title of the document, the name of the collection it came from, and the institution that owns the collection will be displayed under each image.
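Letting users pick a range of pages or the entire document means turning their request into a list of page numbers before a PDF is assembled. A minimal sketch, assuming a simple text format such as "all" or "2-5, 9" (the input format and function name are hypothetical; building the PDF itself is not shown):

```python
def parse_page_selection(selection: str, page_count: int) -> list[int]:
    """Turn a request such as "all", "3", or "2-5, 9" into sorted page numbers.

    Raises ValueError for malformed input or pages outside 1..page_count.
    """
    if selection.strip().lower() == "all":
        return list(range(1, page_count + 1))
    pages: set[int] = set()
    for part in selection.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    # Return pages in document order with duplicates removed.
    return sorted(pages)
```

Returning pages in document order keeps a downloaded excerpt in the same sequence as the original, which matters for the court documents.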

