Vocabulary Translation System

Product — The Distributed Resources Network (DRN), an open, heterogeneous, fully distributed, and single-runtime middleware solution for enterprise services integration and business activity monitoring.

Project — Vocabulary Translation System (VTS) to manage the vocabulary and ontological differences in peer-to-peer communications among heterogeneous resources on the DRN.

 

résumé & CV   |    career projects   |    personal projects    |    photos   |    Contact Information

 

Introduction

Cytaq's product, The KnowledgePath DRN, was an Enterprise Application Integration (EAI) platform. The platform was based on an innovative model -- The Resource-Property (RP) Model -- for representing any information source or sink. The RP Model could homogeneously represent heterogeneous information resources – e.g., file systems, databases, LDAP servers, desktop applications (e.g., word processors, spreadsheets) and enterprise applications (e.g., CRM). With homogeneous representations on the DRN, these information resources could now communicate, enabling coordination and cooperation.

However, communication does not guarantee understanding. One resource's "Employee/Spouse" could be another resource's "Person/Partner/Name". To realize its potential, the DRN needed a Translation System that could handle the vocabulary and ontological differences among any set of resources. It was my responsibility to design and build this translation system.

Design Process

The design process began with an extended series of interviews with the inventor of the DRN technology, because there was little clear documentation of the DRN architecture, nothing on the translation problem, and nothing on how a translation module would interact with the rest of the DRN and its users. The product of these interviews was a set of white papers describing the architecture of the DRN and defining the RP Model and the Universal Query Language (UQL) based on it. Design of the VTS hinged upon accurately assessing these critical components of the product platform. These white papers were also used as marketing and sales collateral.

A set of use case scenarios established a base case for VTS performance.

Because the DRN was evolving, the VTS design had to have a broad scope and vision, so that it would continue to operate within the changing platform. However, the product release schedule did not allow for a full implementation of the VTS. To satisfy these constraints, the following documents were produced:

  1. A Functional Specification of an idealized VTS module established the parameters within which any VTS module must operate. This specification outlined the infrastructure of a fully realized VTS.
  2. A detailed design specified the initial VTS module, concentrating on the functionality necessary for the initial product release. This included UML diagrams of the class structure and process flows, an API to the VTS module for access from any point in the DRN, and specifications for two GUI administration tools.

Design Solution

The main features of the VTS design are:

  • Because all communication on the DRN uses UQL, the VTS operates via linguistic rules that manipulate UQL statements. Translation, therefore, is the application of linguistic rules to convert a UQL statement in the source vocabulary and ontology domain into an equivalent UQL statement in the target vocabulary and ontology domain. The set of rules from one vocabulary and ontology domain to another is called a vocabulary-to-vocabulary mapping.
  • To prevent an explosion of vocabulary-to-vocabulary mappings, all translation passes through the DRN system's own vocabulary domain (see Figure 1). Thus, n resources do not need n-1 mappings each. Instead, each resource needs only one mapping, to the DRN system vocabulary.
  • Since all communication is translated into the DRN system domain, its vocabulary and ontology limit all communication. This is less than ideal – if the system ontology is unable to represent a concept, then that concept can never be communicated across the DRN. Two mechanisms are included in the design to deal with this problem.
    1. A GUI system vocabulary administration tool that provides mechanisms to create, import, export and edit the system vocabulary. This tool allows new vocabularies to be imported and edited to match the needs of the DRN environment.
    2. Division of the system vocabulary into multiple namespace domains (see Figure 2). In essence, there is not one system vocabulary. However, a resource on the DRN need not know that. It only needs to map to the one system namespace it communicates through. If a new resource has ontological needs that are not met by the system, then a new namespace can be created or imported that meets the resource's ontological needs. Of course, a mapping from the new namespace vocabulary to one or more of the system's original namespaces is now required.
  • Linguistic rules are divided into three types: Structure rules defining vocabulary elements and their data; Formatting rules using regular expressions defining elements’ data; and Semantic rules defining possibly complex ontological relationships between vocabulary elements. See Figure 3 and Figure 4.
  • A GUI vocabulary mapping tool aids the creation and management of vocabulary-to-vocabulary mappings.
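The routing of all translation through the system vocabulary can be illustrated with a small sketch. This is not the VTS implementation: UQL syntax is not reproduced here, so plain element paths stand in for UQL statements, and the resource names (`hr_db`, `crm`), the `sys:` system path, and all function names are hypothetical. It also assumes each mapping is a one-to-one renaming that can be inverted, which real semantic rules need not be.

```python
# Hypothetical sketch of translation through a shared system vocabulary.
# Element paths stand in for UQL statements; all names are illustrative.

# One mapping per resource, always into the (assumed) system vocabulary.
TO_SYSTEM = {
    "hr_db": {"Employee/Spouse": "sys:Person/Spouse/Name"},
    "crm": {"Person/Partner/Name": "sys:Person/Spouse/Name"},
}


def invert(mapping):
    """Reverse a one-to-one mapping (assumes no two keys share a value)."""
    return {system: local for local, system in mapping.items()}


def translate(path, source, target):
    """Translate a path from source to target via the system vocabulary."""
    system_path = TO_SYSTEM[source][path]            # source -> system
    return invert(TO_SYSTEM[target])[system_path]    # system -> target


def direct_mappings(n):
    """Mappings needed if each of n resources maps to every other one."""
    return n * (n - 1)


def pivot_mappings(n):
    """Mappings needed if each resource maps only to the system vocabulary."""
    return n


print(translate("Employee/Spouse", "hr_db", "crm"))
print(direct_mappings(10), pivot_mappings(10))
```

With ten resources, the direct approach needs 90 mappings while the system-vocabulary approach needs 10. The cost is the one described in the list above: a concept that the system vocabulary cannot express has no path between resources.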

 

Figure 1: Diagram from VTS Functional Specification showing information flows through the DRN and the necessary translation actions

 

Figure 2: Demonstration of DRN system vocabulary namespaces. Translation occurs whenever messages pass from one namespace into another.

 

Figure 3: The three types of linguistic translation rules.

 

Figure 4: The three types of translation rules seen as layers of abstraction in the translation process.
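As a rough illustration of how the three rule types might cooperate, the sketch below applies a structure check, then a formatting rule, then a semantic rule to a single element. It is a guess at the general shape, not the VTS design: the rule representations, the "Last, First" name format, the order in which the layers are applied, and every identifier are assumptions made for the example.

```python
import re

# Structure rules (hypothetical): which elements exist and what data they hold.
SOURCE_STRUCTURE = {"Employee/Spouse": str}
TARGET_STRUCTURE = {"Person/Partner/Name": str}

# Formatting rule (hypothetical): a regular expression describing the source
# data ("Last, First") and a template for the target data ("First Last").
FORMAT_RULES = {
    "Employee/Spouse": (
        re.compile(r"^(?P<last>[^,]+),\s*(?P<first>.+)$"),
        r"\g<first> \g<last>",
    ),
}

# Semantic rules (hypothetical): relationships between vocabulary elements.
SEMANTIC_RULES = {"Employee/Spouse": "Person/Partner/Name"}


def translate(element, value):
    """Apply structure, formatting, and semantic rules in that assumed order."""
    # Structure: the element must exist and carry the expected data type.
    expected = SOURCE_STRUCTURE.get(element)
    if expected is None or not isinstance(value, expected):
        raise ValueError(f"structure rule failed for {element!r}")

    # Formatting: the data must match the pattern, then is rewritten.
    pattern, template = FORMAT_RULES[element]
    match = pattern.match(value)
    if match is None:
        raise ValueError(f"formatting rule failed for {value!r}")
    new_value = match.expand(template)

    # Semantic: map the element onto its counterpart in the target vocabulary.
    target = SEMANTIC_RULES[element]
    if target not in TARGET_STRUCTURE:
        raise ValueError(f"no target element for {element!r}")
    return target, new_value


print(translate("Employee/Spouse", "Doe, Jane"))
```

Keeping the three kinds of rule in separate tables means a formatting change (say, a different name layout) does not touch the semantic mapping, which is one plausible reading of the layering shown in Figure 4.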
 


dworman@acm.org Last Updated: 14-Feb-2003