This project is very fortunate to enjoy the support of New York University in carrying out its goals. The university will provide office space for the project and allow a measure of access to additional resources as it progresses. The first task for the project will be to determine which of the Henry Barnard papers should be digitized and in what order, guided by the above selection policy. To carry out this step, a small committee composed of experts on the nineteenth-century common school reform movement, researchers versed in using a variety of online collections, and the archivists in charge of the physical collection will be brought together to review the collection. They will be assisted in their deliberations by a lawyer specializing in copyright law, who will be consulted to ensure that all of the materials selected are free from legal concerns. The committee will also be able to request advice from the preservation department in Bobst Library regarding conservation issues pertaining to the physical documents and to ask New York University’s Information Technology Services team for recommendations on digitizing any unusually sized manuscripts. The committee members will also be asked to develop a list of keywords to assign to the digital documents as search terms. While the committee deliberates, a project manager will be hired to begin setting up the practical aspects of the project. The manager will assist the project director in acquiring equipment, hiring staff, laying out the physical work space, establishing benchmarks of scan quality and time, and documenting the project as it progresses.

Once the sequence of the documents has been determined, the digital capture work can begin. The project will employ students, preferably with work-study, on a part-time basis to carry out this and other parts of the process. Once a document has been scanned, the digitizer will name the new digital file and input basic metadata such as its author, original date of creation, a general description of the object, the location of the original within the physical collection at Fales Library, the date it was digitized, and the time and place it was published, if applicable. The scanning will be carried out in color, which is necessary because many of the manuscripts were written on colored paper, at a resolution of six hundred dpi, and the scans will be saved in the TIFF file format. The high-resolution image will serve as an archival copy from which lower-resolution scans and publication-quality images can be made. Online, the image will be accessible as a seventy-five dpi document in JPEG format.
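The capture settings above imply a substantial difference in size between the archival and online copies. The following Python sketch illustrates the arithmetic under stated assumptions: a letter size page of 8.5 by 11 inches, 24-bit color (three bytes per pixel), and an uncompressed archival TIFF. The function names are hypothetical and the figures are estimates only, since actual file sizes will depend on the scanner software and any compression applied.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return the (width, height) in pixels of a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bytes_per_pixel=3):
    """Estimate the raw size of an image, assuming 24-bit color by default."""
    return width_px * height_px * bytes_per_pixel


# Assumed page size: a letter size sheet, 8.5 x 11 inches.
archival = pixel_dimensions(8.5, 11, 600)   # archival TIFF at 600 dpi
online = pixel_dimensions(8.5, 11, 75)      # online JPEG at 75 dpi

archival_mb = uncompressed_bytes(*archival) / 1_000_000
print(f"Archival scan: {archival[0]} x {archival[1]} pixels, "
      f"about {archival_mb:.0f} MB uncompressed")
print(f"Online copy:   {online[0]} x {online[1]} pixels")
```

Under these assumptions a single letter size page occupies roughly one hundred megabytes as an archival image, which is one reason the storage and backup requirements discussed later in this plan are significant.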

The digital versions of the Henry Barnard papers will be assigned file names that include the series number, subseries letter, the number of the box containing the original, the original author’s first initials and surname, and the year, or an approximation of the year, in which the item was created, with each piece of information separated by an underscore. If no date information is available, the file name will carry the letters “nd” in place of the year. The file name will also have an indicator of the type of document that was scanned. The file names for correspondence will include the index number assigned to the item, diaries will simply have the word “diary” in the file name as there are only three of them, and the draft materials will be assigned their published title or a reasonable abridgement thereof. The images of Henry Barnard’s correspondents will be named similarly, though the name of the picture’s subject will be provided instead of the author and the term “photo” will follow the year. If the item has multiple pages, the page number will be indicated at the end of the file name. For example, the image of the first page of a hypothetical two-page letter from Horace Mann to Henry Barnard, written in 1845, drawn from series 1B in box 1 of the collection, and assigned the index number 1135, would receive the file name 1B_b1_H_Mann_1845_no1135_p1. The inclusion of all this information will allow the files to be searched and organized by a variety of factors, making them easy to locate within the vast number of files this project will create.
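The naming convention described above could be applied consistently by a small helper that digitizers run when saving a scan. The Python sketch below is illustrative only: the function name and its parameters are hypothetical, and it assumes that the document type indicator (an index number, “diary”, an abridged title, or “photo”) always follows the year, as in the example file name.

```python
from typing import Optional


def build_file_name(series: str, box: int, initials: str, surname: str,
                    year: Optional[int], kind: str,
                    page: Optional[int] = None) -> str:
    """Assemble a file name from its components, separated by underscores.

    series   -- series number and subseries letter, e.g. "1B"
    box      -- number of the box holding the original
    initials -- first initials of the author (or of a photo's subject)
    surname  -- surname of the author (or of a photo's subject)
    year     -- year of creation, or None when no date is known
    kind     -- type indicator, e.g. "no1135", "diary", or "photo"
    page     -- page number for multi-page items, or None
    """
    year_part = str(year) if year is not None else "nd"
    parts = [series, f"b{box}", initials, surname, year_part, kind]
    if page is not None:
        parts.append(f"p{page}")
    return "_".join(parts)


# The hypothetical Horace Mann letter from the example above.
print(build_file_name("1B", 1, "H", "Mann", 1845, "no1135", page=1))
# 1B_b1_H_Mann_1845_no1135_p1
```

Generating names from structured fields rather than typing them by hand would reduce the chance of inconsistent file names and would let the same fields be reused when the basic metadata is entered.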

Once a textual document has been scanned and entered into the database, it will be transcribed and proofread. Both of these tasks will be carried out by student workers as well. Since the vast majority of the Henry Barnard manuscripts at Fales are handwritten, and thus inscrutable to many OCR programs, most of the transcription work will need to be done manually. The transcribers will work from the digital document, which they will first compare to the original to confirm the image quality, fidelity, and completeness of the scan, thereby acting as the first quality control check on the digital document as it proceeds toward publication. Transcribed documents will then be examined by two other part-time students, who will check the accuracy of the transcription against the original. The proofreaders will also check the metadata that has been assigned to the item. Towards the end of the project, when all of the selected documents have been scanned, transcribed, and proofread, the students assigned to those tasks will work together to create the calendar of manuscript materials that were excluded from the digitization process.

The last four steps in the process will be carried out by full-time employees rather than student workers. After the transcribed document has been proofread, it will be cataloged according to Encoded Archival Description standards and assigned keywords to make it more accessible to users’ topic search queries. After cataloging, the digital documents will be reviewed by the educational historians and the archivists on the selection committee to check the quality of the scan as well as the completeness of the keywords and metadata. Once a document has been cleared, it will be encoded as an Extensible Markup Language (XML) document, which must link the transcribed manuscript to the digital images of the original. After the XML document has been validated, it will be reviewed one final time by the researchers hired for the selection committee. Once all of the manuscripts assigned a particular level of priority have been processed, the digital content will be published to the web.
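To illustrate what linking a transcription to its page images might look like, the Python sketch below builds a small XML document with the standard library. The element and attribute names (document, metadata, page, image, transcription) are hypothetical placeholders and do not follow Encoded Archival Description or any other published schema; the project’s actual encoding would be determined by the standards it adopts. The sample transcription text is a placeholder, not a quotation from the collection.

```python
import xml.etree.ElementTree as ET


def build_document_xml(doc_id, author, year, pages):
    """Return an XML string tying each page image to its transcription.

    pages -- a list of (image_file_name, transcription_text) pairs,
             given in page order.
    """
    root = ET.Element("document", {"id": doc_id})
    metadata = ET.SubElement(root, "metadata")
    ET.SubElement(metadata, "author").text = author
    ET.SubElement(metadata, "date").text = year
    for number, (image_file, text) in enumerate(pages, start=1):
        # Each page element carries the name of the scanned image it describes.
        page = ET.SubElement(root, "page",
                             {"n": str(number), "image": image_file})
        ET.SubElement(page, "transcription").text = text
    return ET.tostring(root, encoding="unicode")


xml_text = build_document_xml(
    "1B_b1_H_Mann_1845_no1135", "Horace Mann", "1845",
    [("1B_b1_H_Mann_1845_no1135_p1.tif", "[transcription of page 1]"),
     ("1B_b1_H_Mann_1845_no1135_p2.tif", "[transcription of page 2]")],
)

# Parsing the result confirms only that it is well-formed XML; validation
# against a schema would be a separate step with a dedicated tool.
parsed = ET.fromstring(xml_text)
print(len(parsed.findall("page")), "pages linked to images")
```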


Managing the digital material created by the project will require a complex and powerful content management program. Since the project will enjoy the technical support of New York University’s Information Technology Services department, it will take advantage of the flexibility and opportunities for innovation offered by open source content management software. The chosen program will be required to manage a large number of image and encoded text files. Thus it will need to be able to link related files together, tying transcribed versions of Henry Barnard’s manuscripts to digital images of them, and retrieve them reliably. The program will need to support searching and be able to organize files according to a variety of inquiries. For instance, it must be able to flag and list digital files according to their current status in the progress of the project, for example by displaying upon request all the files that have not yet been proofread or that have failed a validation test. It must also be able to run scheduled checksum verifications on the digital files to inspect their integrity and identify those objects that have been corrupted. The content management program must also be able to log all changes made to the digital objects it manages and, in addition to making scheduled backups, create backup copies of the project’s files on a New York University server whenever such changes are made. In addition, the content management software must be able to publish the digital versions of the manuscripts and their transcriptions as XML documents. Finally, the program will need to support tiered access to the digital documents, allowing staff to adjust the product’s content as necessary while prohibiting users from doing the same.
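The integrity checks described above are typically done by recording a checksum for each file when it is created and recomputing it later to detect changes. The Python sketch below shows the idea using SHA-256 from the standard library. It is a minimal illustration rather than a feature of any particular content management program; the function names and the folder name are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Record a checksum for every file directly inside the folder."""
    return {p.name: sha256_of(p)
            for p in sorted(Path(folder).iterdir()) if p.is_file()}


def find_corrupted(folder, manifest):
    """List files that are missing or whose checksum no longer matches."""
    corrupted = []
    for name, expected in manifest.items():
        path = Path(folder) / name
        if not path.is_file() or sha256_of(path) != expected:
            corrupted.append(name)
    return corrupted


# Hypothetical usage: record checksums once, then re-run the check on a schedule.
# manifest = build_manifest("archival_tiffs")
# print(find_corrupted("archival_tiffs", manifest))
```

Any file reported by such a check could then be restored from one of the backup copies.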

The complexity of the data management program, combined with the large quantity of data this project will generate, demands platforms capable of supporting them. This project will need computers that can transmit and receive large numbers of files using high-speed network connections, which are available through New York University. These computers will therefore need strong processors, a good deal of memory, large hard disks, and adequate graphics cards. The project will also need external hard drives to create additional backup files. To carry out the actual digital capture, the project will require basic flatbed color document scanners, since most of the manuscripts in the Henry Barnard Papers collection are letter size or smaller. However, since a sizable number of the manuscripts are legal size, the project will need access to at least one larger device, such as the twelve-inch by seventeen-inch Epson 10000XL that New York University’s Digital Studio uses in Bobst Library.

