Staten Island Tagging Guidelines

Documents for this project will be tagged to make them more useful and usable for researchers. The two fundamental purposes of tagging are to enable easier searching and to provide metadata to researchers. Researchers will likely want to know certain facts about the material presented to them that are not readily apparent from the document itself but that can be provided by those working on the project. Useful pieces of information include dates; geographical locations; organizations; document type, size, and format; persons; objects within images; odd or unusual aspects of written documents; and much more. All of this material, if available, can and will be tagged. We will try to identify as many tags as possible for each document, perhaps simply by having staff members examine complex items and say what they believe a useful tag would be. However, we will also make it clear to researchers that any suggestions they have for further additions will be welcomed and used. This applies to both search terms and metadata. If a user feels that a better, newer, or more extensive tag is necessary, we can add it.

If possible, search terms for tagging will be derived from the Library of Congress authority names, available on its website. In this way, we can try to ensure a standardized set of index terms and avoid creating too many phrases. However, it is inevitable that some of the terms necessary for the project will not be available from the LOC, and we will need to generate new or more applicable terms ourselves. Terms will be kept as straightforward as possible to prevent confusion and to minimize the effort on the part of the staff. In other words, synonyms and multiple tenses of words will be kept to a minimum wherever possible to make searching easier. This does not apply when several very similar terms are necessary for the document to be described correctly or completely. For example, one collection likely to be included in this project is material from the Protectors of Pine Oak Woods. This group is also known by its acronym, PPOW, as well as by the shortened term “protectors.” Because all of these are possible search terms for this group, all of them will be included instead of a single name. Similar situations will be handled in the same manner.
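The handling of variant names described above could be sketched as follows. This is only an illustration, not a prescribed tool: the function name `expand_tags`, the `VARIANT_GROUPS` list, and the sample tag "wetlands" are assumptions made for the example. Only the three names for the Protectors of Pine Oak Woods come from these guidelines.

```python
# Hypothetical sketch: when an item carries any one variant of a group's name,
# attach every variant so that all of them work as search terms.

# Each set holds names that refer to the same group. Only this one group is
# named in the guidelines; others would be added as they come up.
VARIANT_GROUPS = [
    {"Protectors of Pine Oak Woods", "PPOW", "protectors"},
]


def expand_tags(tags):
    """Return the given tags plus all known variants of any tag present."""
    expanded = set(tags)
    for group in VARIANT_GROUPS:
        lowered = {variant.lower() for variant in group}
        # Compare case-insensitively so "ppow" and "PPOW" are treated alike.
        if any(tag.lower() in lowered for tag in tags):
            expanded |= group
    # Sort case-insensitively for a stable, readable order.
    return sorted(expanded, key=str.lower)


if __name__ == "__main__":
    print(expand_tags(["PPOW", "wetlands"]))
```

Keeping the variants in one shared list means a staff member enters whichever name appears on the document, and the other names follow automatically.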

Tags for metadata will be standardized to ensure that the same information shows up for each item. Every written document or image will receive identical treatment in terms of basic information. An entry for the title of the material, its creators, and its date will always be present, even if there is no information to enter for one of these fields. This will provide a consistent format and show our users what we do and do not know about our materials. Less standardized information, such as provenance, will be handled on a case-by-case basis. All available information for these fields will be provided as succinctly as possible to reduce the effort and space needed for the entries. It is imperative that tags be accurate and clear, especially when there is no transcription to provide search terms or explanations. It will be necessary to establish a quality control system to ensure that information about individual items is not confused, despite the large amount of data that will likely be available. This can be done by assigning a number to each item, which will already be necessary during the scanning and posting phases, and adding an identifier to indicate which item a piece of metadata belongs to. For example, the same number used to identify a certain document can be used for its metadata, with the addition of a letter. This way, workers will not confuse information when tagging and thereby provide incorrect material to our users.
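A minimal sketch of such a standardized entry is given below. It assumes details these guidelines leave open: the field names, the "(unknown)" placeholder, the four-digit numbering, and the letter "M" as the metadata suffix are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical placeholder shown when a required field has no information.
UNKNOWN = "(unknown)"


@dataclass
class ItemMetadata:
    """Basic metadata entry; title, creators, and date are always displayed."""

    item_number: int                  # number assigned at scanning/posting
    title: Optional[str] = None
    creators: List[str] = field(default_factory=list)
    date: Optional[str] = None
    provenance: Optional[str] = None  # less standardized; case by case

    @property
    def metadata_id(self) -> str:
        # Same number as the item plus a letter; "M" and the zero padding
        # are assumptions, not part of the guidelines.
        return f"{self.item_number:04d}M"

    def display_fields(self) -> dict:
        """Return the three basic fields, filling gaps with the placeholder."""
        return {
            "Title": self.title or UNKNOWN,
            "Creators": "; ".join(self.creators) or UNKNOWN,
            "Date": self.date or UNKNOWN,
        }


if __name__ == "__main__":
    record = ItemMetadata(item_number=27, title="Meeting minutes")
    print(record.metadata_id)
    print(record.display_fields())
```

Because the three basic fields always appear, an empty field is shown as explicitly unknown, which tells researchers what is and is not known about the item.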

Presentation is a somewhat important aspect of metadata tags. Although appearance does not affect the content of the site or the information itself, it will affect the judgment of users. If a site does not appear to be authoritative, clear, and usable, people will not rely on it for research. As mentioned above, there should be a clear, standardized format for this information for each item to ensure that researchers can always view the basic information in a consistent manner. Other information, because it will be written as prose, cannot be easily managed in terms of space. An adaptable format of some sort will be necessary to ensure that the text does not physically extend beyond the borders of the other tags and thereby look disorganized or sloppy. Again, the information is more important and takes precedence, but it is necessary to give some consideration to appearance.

In terms of technology, the simplest available program for tagging will be used. In this way, any staff member should be able to add or edit tags. This will reduce costs, as a specialist will not be necessary, and save time, as any worker can work on this aspect of the project at any time. However, if the work becomes too complex, an additional staff member or training of some sort will be necessary. Once again, it will be necessary at all times to maintain high quality and a good level of control over tags, as they will be the first informational source researchers use for searching and metadata.

