Metadata & Information Retrieval: 2008

Tuesday, July 8, 2008

Concluding Thoughts

All in all, major initiatives are currently being taken to enhance interoperability among the various metadata schemas. Interoperability is a major issue, especially among digital collections, since metadata tends to be stored for use only by the institution that creates and maintains it. Thus, the driving force behind the development of metadata standards in the future will most likely be a desire for uniform access methodology across collections (Smiraglia, 2005). Union catalogs, cross-system searches, crosswalks, and metadata registries each attempt to address and overcome the prevalent barriers to semantic and/or syntactic interoperability. Nevertheless, it is important to acknowledge that some degree of inconsistency or discrepancy will continue to exist because different institutions create and use different schemas. For now, there is still no “one size fits all” universal bibliographic control (Younger, 1997).

References:

Smiraglia, R.P. (Ed.). (2005). Metadata: A cataloger’s primer. New York: The Haworth Information Press.
Younger, J. A. (1997). Resources description in the digital age. Library Trends, 45(3), 462-488. Retrieved July 3, 2008, from InfoTrac OneFile database.

Saturday, July 5, 2008

Metadata Registries

Metadata registries have also facilitated interoperability among schemas in that they record authoritative information about metadata elements from multiple sources. A metadata registry is a “…database used to organize, store, manage, and share metadata schemas. [They] provide information about metadata schemas, elements, profiles, definitions, and relationships, using a standard structure” (Taylor, 2004, p.153).

Thus, metadata registries facilitate interoperability since they provide a record of names, definitions, and properties of metadata elements. Metadata creators can consult a registry to clarify meaning and usage as well as exchange information. Additionally, metadata registries can help prevent duplication of effort (Taylor, 2004).
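As a rough illustration of how a registry might be consulted, the sketch below models a registry as a simple lookup table of element names and definitions. The table, the `lookup` function, and the `is_registered` check are hypothetical and do not reflect the interface of any real registry; only the two Dublin Core elements shown are assumed as sample entries.

```python
# Hypothetical in-memory registry: (schema, element) -> recorded information.
# The entries are sample data for this sketch only.
REGISTRY = {
    ("dc", "title"): {
        "label": "Title",
        "definition": "A name given to the resource.",
    },
    ("dc", "creator"): {
        "label": "Creator",
        "definition": "An entity primarily responsible for making the resource.",
    },
}


def lookup(schema, element):
    """Return the registered entry for an element, or None if it is not registered."""
    return REGISTRY.get((schema.lower(), element.lower()))


def is_registered(schema, element):
    """Check for an existing element before defining a new one (avoids duplicated effort)."""
    return lookup(schema, element) is not None


entry = lookup("DC", "Title")
print(entry["definition"])
print(is_registered("dc", "subject"))
```

A metadata creator could consult `lookup` to clarify the meaning of an element and `is_registered` to see whether an equivalent element already exists.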

The Dublin Core Metadata Registry Lite is one example of a metadata registry, available at http://wip.dublincore.org/dcregistrylt/

References:

Taylor, A.G. (2004). The organization of information (2nd ed.). Westport, CT: Libraries Unlimited.

Wednesday, July 2, 2008

Crosswalks

Interoperability, particularly semantic interoperability, can also be facilitated through the use of crosswalks, which are authoritative mappings of metadata elements from one schema to another (Caplan, 2003). By analyzing the metadata elements in separate schemas and correlating the similar fields, metadata creators can “map” the equivalent relationships between the schemas. Thus, crosswalks are the “maps” that show these relationships (Woodley, 2000). Crosswalks are most useful when the schemas are relatively simple, are for similar communities or types of materials, and have overlapping concepts. Mapping becomes much more difficult when it involves cross-domains, schemas of different complexities, and schemas with great semantic differences (Taylor, 2004).

Crosswalks are primarily used as the basis for specifications that govern the physical conversion of records from one metadata schema to another for record exchange. However, since a crosswalk provides only a lateral, one-way mapping from one schema to another, separate crosswalks are required to map from schema A to schema B and then from schema B to schema A. As a result, some information can become distorted or lost between the two crosswalks of a pair. This means that the information retrieved after a reversion may not be identical to the original (Taylor, 2004).
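The one-way nature of crosswalks, and the loss that can occur on a round trip, can be sketched in a few lines. The field pairs below are simplified assumptions made for this illustration and are not copied from the Library of Congress crosswalk; the sample record and the `convert` function are likewise hypothetical.

```python
# Simplified, illustrative crosswalks (assumptions for this sketch, not the
# official tables). Each dictionary maps in one direction only.
MARC_TO_DC = {"245": "title", "100": "creator", "700": "creator"}
DC_TO_MARC = {"title": "245", "creator": "720"}


def convert(record, crosswalk):
    """Convert a record, given as (field, value) pairs, using a one-way crosswalk.

    Fields that have no mapping in the crosswalk are dropped.
    """
    return [(crosswalk[field], value) for field, value in record if field in crosswalk]


# Hypothetical source record in a MARC-like form.
marc_record = [
    ("245", "An example resource"),
    ("100", "Smith, Jane A."),
    ("300", "200 p."),  # physical description: no target element in this sketch
]

dc_record = convert(marc_record, MARC_TO_DC)   # schema A -> schema B
round_trip = convert(dc_record, DC_TO_MARC)    # schema B -> schema A

print(dc_record)
print(round_trip)
print(round_trip == marc_record)
```

In this sketch the physical description is dropped on the way out, and the creator comes back under a different field than the one it started in, so the reverted record does not match the original.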

The Library of Congress provides a good example of the MARC to Dublin Core Crosswalk, which is available at http://www.loc.gov/marc/marc2dc.html.

References:

Caplan, P. (2003). Metadata fundamentals for all libraries. Chicago: American Library Association.
Taylor, A.G. (2004). The organization of information (2nd ed.). Westport, CT: Libraries Unlimited.
Woodley, M.S. (2000). Crosswalks: The path to universal access? In Introduction to metadata: Pathways to digital information. Retrieved July 1, 2008, from http://www.getty.edu/

Monday, June 23, 2008

Cross-system Searches

Another approach to reduce the barriers to interoperability is through the use of cross-system searches. Unlike a union catalog where a union database is maintained and a central search is used to retrieve data, the cross-system search stores metadata records in multiple databases, which are retrieved using the search facilities associated with each individual database system. ANSI/NISO Z39.50 is an example of an international standard protocol that allows one client system to request a search to be performed within another target system (Caplan, 2003). Here, the client receives the results back in a format that it can display. This cross-system search requires that the search be expressed in a common syntax so that every system only needs to comprehend its own search language and that of the international standard protocol.
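The idea of a common search syntax can be sketched as follows. This is not an implementation of Z39.50; the `TargetSystem` class, the field names, and the sample records are all hypothetical. Each target only needs to know how to translate the common field name into its own local field.

```python
class TargetSystem:
    """A hypothetical database with its own records and its own field names."""

    def __init__(self, name, field_names, records):
        self.name = name
        self.field_names = field_names  # common field name -> local field name
        self.records = records

    def search(self, common_field, term):
        """Translate a common-syntax query into local terms and run it locally."""
        local_field = self.field_names[common_field]
        term = term.lower()
        return [r for r in self.records if term in r.get(local_field, "").lower()]


def cross_system_search(targets, common_field, term):
    """Send one common-syntax query to every target and collect results per system."""
    return {t.name: t.search(common_field, term) for t in targets}


library = TargetSystem(
    "library",
    {"title": "245"},
    [{"245": "Introduction to metadata"}, {"245": "Cataloging basics"}],
)
museum = TargetSystem(
    "museum",
    {"title": "object_title"},
    [{"object_title": "Metadata in museums"}],
)

results = cross_system_search([library, museum], "title", "metadata")
print(results)
```

The records stay in their own databases; only the query and the results travel between systems.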

References:

Caplan, P. (2003). Metadata fundamentals for all libraries. Chicago: American Library Association.

Thursday, June 19, 2008

Union Catalogs

Although interoperability among diverse sets of metadata records can be problematic, there are several current approaches to address these issues. One approach is through the use of a union catalog, a centralized database of metadata from multiple sources. Among libraries, for example, a union catalog would typically be a MARC-based library catalog. Union catalogs can exist at any level, from a local institutional level to an international level. In libraries, OCLC’s WorldCat is one example of an international union catalog (Caplan, 2003).

There are several methods of implementing union catalogs. One method is for participating institutions to submit copies of their own cataloging records to an organization that maintains the centralized search catalog. Another method is for records to be created directly in the union catalog database and then copied into the institution’s local system. In either of these two methods, records for the same resource contributed by different institutions can either be maintained as duplicate records or consolidated into a single master record presenting multiple holding locations. A third method is the creation of a false union catalog via a union index over multiple catalog files, instead of maintaining a compiled database. This approach displays records from the source catalogs when entries from the index are selected.
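Consolidating duplicate contributions into a master record with multiple holding locations might look something like the sketch below. Matching records on a shared identifier (the `id` field here) is an assumption made for illustration; the source does not specify how duplicates are detected, and the institutions and records are hypothetical.

```python
def consolidate(contributed):
    """Merge contributed records that share a match key into master records.

    `contributed` is a list of (institution, record) pairs. Matching on the
    record's "id" value is an assumption of this sketch.
    """
    masters = {}
    for institution, record in contributed:
        key = record["id"]
        # The first contribution for a key supplies the master description.
        master = masters.setdefault(key, {"title": record["title"], "holdings": []})
        if institution not in master["holdings"]:
            master["holdings"].append(institution)
    return masters


contributed = [
    ("Library A", {"id": "id-1", "title": "An example resource"}),
    ("Library B", {"id": "id-1", "title": "An example resource"}),
    ("Library B", {"id": "id-2", "title": "Another resource"}),
]

masters = consolidate(contributed)
print(masters["id-1"]["holdings"])
```

Keeping duplicates instead would simply mean storing each contributed record as its own entry rather than merging on the key.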

In general, union catalogs work best when the participating institutions share a common data format and common set of cataloging rules. For example, libraries tend to use similar data formats and cataloging rules, which contributes to the effectiveness of OCLC’s WorldCat. When the records in the central database and local contributing catalogs are relatively homogeneous, the familiarity of the search will facilitate retrievals. Although it is more complicated, it is possible to create union catalogs from non-homogeneous metadata sources. Non-homogeneous contributions usually result when a variety of institutions, as opposed to just one type of institution, participate in the union catalog. These institutions can include archives, libraries, historical societies, museums, and so on. Typically, the creation of a union catalog from non-homogeneous sources would require a conversion of the various metadata schemas submitted into a common format for storage and indexing before loading the records into the union catalog (Caplan, 2003).

References

Caplan, P. (2003). Metadata fundamentals for all libraries. Chicago: American Library Association.

Friday, June 6, 2008

Metadata Interoperability Part 2

Extensibility also affects interoperability semantics. Extensibility refers to the ability to include additional metadata elements specific to the needs of a community. The individual metadata creators subjectively determine these inclusions and exclusions. Consequently, extensibility usually exhibits an inverse relationship to interoperability in that the additional metadata elements often cause the metadata to become less understandable to other systems (Taylor, 2004, p. 144).

Incompatible vocabularies are another common factor affecting interoperability, one that is most apparent when users try to search across metadata sets or among different institutions such as libraries, archives, and museums. Different organizations often use different or highly specialized vocabularies. For example, one institution, such as a public library, may index a resource using common names whereas another institution, such as a medical lab, may index using scientific names. As a result, the use of more specialized vocabularies must be taken into consideration when working with metadata. In addition to vocabulary, multiple languages also affect interoperability, especially when searching the World Wide Web. Controlled vocabularies and translations via multilingual thesauri are effective yet limited in their ability to remedy discrepancies (Caplan, 2003, p. 42).

The representation of the metadata elements can also differ, even when the element definitions are identical, since data can be recorded various ways. For example, one set of metadata records may depict an author’s name as “Smith, Jane A.” whereas another set of metadata records may use “Smith, J.A.” for the same author. Consequently, a keyword search on “Jane Smith” would only retrieve records from the first set of metadata records, not the second (Caplan, 2003, p. 42).
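The effect of differing name representations on a keyword search can be shown with a minimal sketch. The matching rule used here (every query word must appear as a whole word in the field) is an assumption about how a simple keyword search behaves; real systems vary.

```python
import re


def keywords(text):
    """Split text into a set of lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_match(query, value):
    """Return True when every word in the query appears as a word in the value."""
    return keywords(query) <= keywords(value)


first_set = "Smith, Jane A."
second_set = "Smith, J.A."

print(keyword_match("Jane Smith", first_set))   # full forename is present
print(keyword_match("Jane Smith", second_set))  # only initials are present
```

The second record describes the same author, but because "Jane" is recorded only as an initial, the keyword search has nothing to match against.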

References

Caplan, P. (2003). Metadata fundamentals for all libraries. Chicago: American Library Association.
Taylor, A.G. (2004). The organization of information (2nd ed.). Westport, CT: Libraries Unlimited.

Tuesday, June 3, 2008

Metadata Interoperability Part 1

Interoperability refers to the ability of various systems to interact with one another. There are two fundamental forms of interoperability: semantic and syntactic. Semantic interoperability refers to the compatibility of the meanings assigned to the metadata elements of a schema, such as whether or not the term "author" in one schema corresponds in meaning with the term "creator" in another schema. Different applications, databases, and institutions may assign disparate meanings to the same terms or use distinct terms to express the same meaning (Gruninger & Kopena, 2005). Syntactic interoperability refers to the ability to extract and use metadata from other systems, which requires the use of a common language or encoding format. In general, metadata interoperability commonly refers to search interoperability, the ability to process various metadata records and retrieve desired results.

Differences in the semantics and syntax of metadata schemas usually cause difficulties in retrieving desired materials. The greater the dissimilarities, the more problematic the retrieval process can become. In terms of semantic differences, there is a wide range of possible variation and misinterpretation in meanings. For example, when comparing two schemas, one schema may require a more precise or well-defined set of rules in determining the meaning of a particular element than the other. For instance, the Dublin Core schema considers the Title element to be any name given to the resource whereas AACR2/MARC follows a strict set of guidelines when assigning what should be considered the Title Proper (Caplan, 2003, p. 41). As a result, there can be various degrees of misinterpretation between the two records. An even more obvious discrepancy would be if one record did not provide a corresponding element at all.

References

Caplan, P. (2003). Metadata fundamentals for all libraries. Chicago: American Library Association.

Gruninger, M., & Kopena, J.B. (2005). Semantic integration through invariants. AI Magazine, 26(1), 11-21. Retrieved May 20, 2008, from InfoTrac OneFile database.

Thursday, May 22, 2008

Metadata Schemas

A metadata schema is the underlying organizational pattern or framework for the metadata, which consists of pre-defined elements representing the specific characteristics of an information resource. Examples of pre-defined metadata elements include title, creator, creation date, and other related bibliographic features. It is important to recognize that there is no single, comprehensive schema for metadata. Instead, there are different types of schemas, each created for specific types of information according to their own set of standardized guidelines.

Although individual schemas are controlled and standardized, there can be a significant amount of variation among the different schemas created by different institutions. Flexibility refers to the ability of the metadata creators to determine the level of detail contained within a record. Consequently, not all schemas possess the same levels of detail. As a result, schemas can vary in the number and types of metadata elements used, in the use of controlled vocabularies, and in encoding into machine-readable form (Taylor, 2004, p. 142). Additionally, it has become increasingly difficult to distinguish the bibliographic features of resources of different types of media and formats when compared to traditional print materials (Kim, 2003, p. 103).

Most schemas, however, do tend to exhibit three common traits: structure, semantics, and syntax. Structure refers to the model that coordinates the data, ultimately arranging how the data is presented. Semantics refers to the meaning associated with the pre-defined metadata elements that compose the schema. For example, does the meaning of the term “author” used in one schema correspond to the meaning of the term “creator” used in another? Syntax refers to how the metadata elements are to be encoded into machine-readable form. The encoding allows the metadata to be processed by a computer program. Unless the encoding scheme understands the semantics of the metadata schema, the data will be unusable (Taylor, 2003). Discrepancies among element meanings and incompatibility among encoding formats of different schemas usually result in interoperability issues.
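To make the distinction between semantics and syntax more concrete, the sketch below takes one record (the semantics: a title and a creator) and encodes it in XML (the syntax) so that a program can read it back. The wrapping `record` element, the sample values, and the `encode` and `decode` functions are assumptions of this sketch; the namespace is the one used for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def encode(record):
    """Encode a dict of Dublin Core element names and values as an XML string."""
    root = ET.Element("record")  # hypothetical wrapper element
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


def decode(xml_text):
    """Read the XML back into a dict keyed by element name (namespace removed)."""
    root = ET.fromstring(xml_text)
    return {child.tag.split("}", 1)[1]: child.text for child in root}


record = {"title": "An example resource", "creator": "Smith, Jane A."}
xml_text = encode(record)
print(xml_text)
print(decode(xml_text) == record)
```

A program that can parse the XML but does not know what "creator" means in this schema can read the record without being able to use it properly, which is the gap between syntactic and semantic compatibility.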

References

Kim, K. (2003). Recent work in cataloging and classification, 2000-2002. Library Resources and Technical Services, 47(3), 96-109. Retrieved May 20, 2008, from InfoTrac OneFile database.
Taylor, A.G. (2004). The organization of information (2nd ed.). Westport, CT: Libraries Unlimited.
Taylor, C. (2003). An introduction to metadata. Retrieved May 20, 2008, from http://www.library.uq.edu.au/iad/ctmeta4.html

Wednesday, May 21, 2008

Metadata Defined

Metadata can be broadly defined as “data about data.” Metadata is structured data that describes the bibliographic attributes of an information resource. Any organizable unit of information can be considered an information resource, such as a book, a website, an audio file, a video, an image, and so on (Taylor, 2004, p. 139). Regardless of the medium or format for an information resource, metadata serves to facilitate the discovery, description, management, retrieval, and preservation of a unit of information.

According to the International Federation of Library Associations' (IFLA) Functional Requirements for Bibliographic Records, metadata has four primary objectives:
1. To assist users in finding desired information resources
2. To help users in identifying similar information resources and distinguishing them from one another
3. To aid users in selecting the appropriate materials suitable to their needs
4. To provide users with the information necessary to obtain or access the desired resource
Based on these objectives, usability in information retrieval systems as well as user needs must be considered when creating metadata (Taylor, 2004, p. 146).

Accordingly, metadata is used in conjunction with information retrieval tools to identify, discover, manage, and retrieve information resources. Using this definition, a very simple and familiar example of metadata would be a card catalog record describing information about a book, since the primary purpose of the descriptive data contained in the record is to facilitate the discovery, description, and retrieval of the book.

In this example, it is evident that the idea of metadata is not a new concept to library and information science. However, modern technology has enabled bibliographic items to be published in a variety of formats. Therefore, the term metadata is applicable to all information resources regardless of media or format. Consequently, the term is now commonly used in reference to digital and electronic information, in addition to print resources (Yousefi & Yousefi, 2007). Moreover, with the advent of computer technology, metadata is now applied to electronic and online information retrieval systems (Herner, 1984, p. 162).

References

Herner, S. (May 1984). Brief history of information science. Journal of the American Society for Information Science, 35(3), 157-163. Retrieved May 18, 2008, from Wiley Interscience database.
Taylor, A.G. (2004). The organization of information (2nd ed.). Westport, CT: Libraries Unlimited.
Yousefi, A., & Yousefi, S. (2007). Metadata: A new word for an old concept. Library Philosophy and Practice. Retrieved May 20, 2007, from InfoTrac OneFile database.

Wednesday, May 14, 2008

Introduction

The development of computers revolutionized the way in which information can be retrieved. In 1945, Vannevar Bush envisioned, among other things, “a machine for the storage and retrieval of documents” in his article “As We May Think” (Rubin, 2004, p. 34). With the increasing use of computers throughout the twentieth century, the field of information science experienced a change in the way information was accessed: “a shift in emphasis away from the item that held the information to an emphasis on accessing the content of the information” (Rubin, 2004, p. 34). But how does one decipher which content is relevant to one’s information needs? Simply stated, metadata provides the descriptive information about resources to aid in the discovery, identification, management, retrieval, and preservation of units of information (Taylor, 2004, p. 139). While the concept of metadata is not new to library and information science, it continues to evolve with computer technology into the twenty-first century.

References

Bush, V. (1945). As we may think. Atlantic Monthly, 176, 101-108. Retrieved May 14, 2008, from http://www.ps.uni-sb.de/~duchier/pub/vbush/vbush-all.shtml.
Rubin, R.E. (2004). Foundations of library and information science (2nd ed.). New York: Neal-Schuman.
Taylor, A.G. (2004). The organization of information (2nd ed.). Westport, CT: Libraries Unlimited.
