How a Topic Maps view enriches relational data sources


Relational databases are important data sources in an enterprise knowledge integration pipeline. How can we improve knowledge integration by providing a Topic Maps view on relational databases?

Let's take a quick look at the development cycle of a traditional enterprise application.

During the design phase the development team creates a conceptual model of the future application, quite often in UML. What is important is that the model explicitly represents the semantics of the application domain. A typical model includes descriptions of domain classes, relationships and constraints.

Later this conceptual model is mapped to a relational database. During this transformation a lot of domain semantic information is lost or presented in compressed form. On the good side, this compression helps to build an efficient application. On the bad side, it limits the integration of information from different applications, because the same conceptual information can be mapped to different relational structures in various applications. Just by looking at tables from two different applications, it is difficult to tell whether they reference the same subjects and whether they contain the same kind of information.

Service Oriented Architecture (SOA) promotes the concept of information providers and information consumers. A typical SOA enterprise application is a provider of some well-defined information and can consume information from other data sources. In the SOA world there is a shift from an application-centric view to a service-based view. Each service is responsible for managing its own set of assertions about subjects. Two service providers should not hold the same assertions about the same subjects, except for the assertions used to identify those subjects. In the SOA world we should know the service provider for any assertion. The shift to a service-based architecture helps to minimize data overlap between different applications.
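The "no overlapping assertions" rule can be checked mechanically once every assertion carries its provider. Here is a minimal sketch, assuming assertions are plain (provider, subject, property, value) tuples; the provider names, the property names and the `subject-identifier` property are made up for illustration and do not come from any standard.

```python
from collections import defaultdict

# Properties that only identify a subject; several providers may share these.
# The property name is a hypothetical example.
IDENTITY_PROPERTIES = {"subject-identifier"}


def find_overlaps(assertions):
    """Return {(subject, property): [providers]} for every non-identifying
    property that more than one provider makes assertions about."""
    providers = defaultdict(set)
    for provider, subject, prop, _value in assertions:
        if prop not in IDENTITY_PROPERTIES:
            providers[(subject, prop)].add(provider)
    return {
        key: sorted(names)
        for key, names in providers.items()
        if len(names) > 1
    }


# Two hypothetical services describing the same employee.
assertions = [
    ("hr-service", "employee/42", "subject-identifier", "http://psi.example.org/employee/42"),
    ("crm-service", "employee/42", "subject-identifier", "http://psi.example.org/employee/42"),
    ("hr-service", "employee/42", "salary-grade", "B2"),
    ("hr-service", "employee/42", "phone", "555-0100"),
    ("crm-service", "employee/42", "phone", "555-0199"),
]

overlaps = find_overlaps(assertions)
```

In this made-up data both services identify the employee, which is allowed, but both also assert a phone number, so `phone` is reported as an overlap that one of the services should give up.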

How can we create a topic map view on a relational database (or data service)? First we identify the primary information of this database. In the SOA world this is already done by definition. Then we retrieve our UML diagrams (or create new ... :) and explicitly represent the taxonomy of classes and relationships in an "ontology" topic map. After that we define an export procedure which produces a "factual" topic map based on assertions from the database or data service. The "factual" topic map is then merged with the "ontology" topic map.

Subject identification is an extremely important issue here. For each basic class we should clarify the subject identification scheme used by the application or service and define a procedure for generating Public Subject Identifiers (PSIs). When we generate the "factual" topic map we use these PSIs to reference subjects.
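The export and merge steps can be sketched in a few lines. This is only an illustration under stated assumptions: topic maps are modelled as plain dictionaries keyed by PSI rather than with a real Topic Maps engine, and the `employee` table, the `works-in` association type and the `http://psi.example.org/` base URI are all hypothetical.

```python
import sqlite3

# Hypothetical base URI for generated PSIs (an assumption for this sketch).
PSI_BASE = "http://psi.example.org/"


def make_psi(class_name, key):
    """Generate a PSI from a class name and the application's own key."""
    return f"{PSI_BASE}{class_name}/{key}"


def export_factual_topics(conn):
    """Export rows of the hypothetical `employee` table as topics keyed by PSI."""
    topics = {}
    rows = conn.execute("SELECT id, name, dept_id FROM employee ORDER BY id")
    for emp_id, name, dept_id in rows:
        topics[make_psi("employee", emp_id)] = {
            "names": {name},
            "types": {make_psi("class", "employee")},
            "associations": {("works-in", make_psi("department", dept_id))},
        }
    return topics


def merge_topic_maps(*maps):
    """Merge topic maps: topics that share a PSI become one topic."""
    merged = {}
    for topic_map in maps:
        for psi, topic in topic_map.items():
            target = merged.setdefault(
                psi, {"names": set(), "types": set(), "associations": set()}
            )
            for field in target:
                target[field] |= topic.get(field, set())
    return merged


# A tiny in-memory database standing in for the application's data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ann Lee', 10)")

# The "ontology" topic map: classes and their taxonomy, taken from the model.
ontology = {
    make_psi("class", "employee"): {
        "names": {"Employee"},
        "types": set(),
        "associations": {("subclass-of", make_psi("class", "person"))},
    }
}

topic_map = merge_topic_maps(ontology, export_factual_topics(conn))
```

Because every topic is keyed by its PSI, merging comes down to a union of names, types and associations for topics with the same identifier. That is why a stable PSI generation procedure matters so much.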

In an ideal situation we would have only one identification scheme for subjects of the basic classes across enterprise applications. But this does not happen often these days, so we have to build mapping topic maps which define the mapping between different identification schemes.
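A mapping topic map can be thought of as a list of "these two identifiers mean the same subject" statements. Here is a minimal sketch of how such statements could be applied, assuming topics are just sets of facts keyed by identifier; the identifiers and the union-find approach are our illustration, not a prescribed Topic Maps mechanism.

```python
def build_canonical_index(equivalences):
    """Map every identifier to one canonical identifier per subject.

    `equivalences` is an iterable of (identifier_a, identifier_b) pairs.
    Groups are resolved with a small union-find structure, so chains such
    as a = b and b = c end up in the same group.
    """
    parent = {}

    def find(ident):
        parent.setdefault(ident, ident)
        while parent[ident] != ident:
            parent[ident] = parent[parent[ident]]  # path compression
            ident = parent[ident]
        return ident

    for a, b in equivalences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Pick the lexicographically smaller identifier as canonical
            # so the result does not depend on the order of the pairs.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {ident: find(ident) for ident in list(parent)}


def merge_by_mapping(topics, canonical):
    """Merge topics whose identifiers map to the same canonical identifier."""
    merged = {}
    for ident, facts in topics.items():
        key = canonical.get(ident, ident)
        merged.setdefault(key, set()).update(facts)
    return merged


# Hypothetical identifiers from two applications describing the same person.
HR_ID = "http://psi.example.org/hr/employee/42"
CRM_ID = "http://psi.example.org/crm/contact/A-17"

canonical = build_canonical_index([(HR_ID, CRM_ID)])
topics = {
    HR_ID: {("name", "Ann Lee")},
    CRM_ID: {("email", "ann.lee@example.org")},
}
merged = merge_by_mapping(topics, canonical)
```

After the merge there is a single topic carrying both the name from the HR application and the email address from the CRM application.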

Another interesting question concerns the enterprise ontology. Quite often UML diagrams are created specifically for each application. In that case application designers "reinvent" parts of the enterprise ontology again and again. With Topic Maps we try to define and use the enterprise ontology explicitly. We also try to reuse existing standard ontologies.

When we try to verbalize the ontology for an application we should attempt to reuse existing parts of the enterprise ontology and extend or refine it as needed. "Enterprise ontology" can sound scary. But even simple taxonomies help a lot in knowledge integration.

We do not need to export all information from databases to topic maps. A lot of factual information can be delivered to users in the form of "reports" or "dynamic resources". Topic Maps are ideal for representing relationships between subjects, summary facts about subjects and references to static and dynamic resources. With this approach users can navigate efficiently between different subjects, and when they need details they can "jump" to dynamic or static resources.
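To make this concrete, here is a minimal sketch of a topic that keeps only summary facts and points to a report for the details. The report endpoint, the parameter names and the `customer` class are hypothetical; the point is only that the topic stores a link to the dynamic resource rather than the detailed rows themselves.

```python
from urllib.parse import urlencode

# Hypothetical endpoint of a reporting service (an assumption for this sketch).
REPORT_BASE = "http://reports.example.org/run"


def report_link(report_name, **params):
    """Build the URL of a dynamic resource (a parameterized report)."""
    return f"{REPORT_BASE}?{urlencode({'report': report_name, **params})}"


def customer_topic(customer_id, name, order_count):
    """Create a topic with summary facts and a link to the full details."""
    return {
        "psi": f"http://psi.example.org/customer/{customer_id}",
        "name": name,
        # Summary fact kept in the topic map itself.
        "summary": {"order-count": order_count},
        # Occurrence pointing to a dynamic resource with the details.
        "occurrences": [
            {"type": "order-history-report",
             "href": report_link("orders", customer=customer_id)},
        ],
    }


topic = customer_topic(42, "Acme Ltd", order_count=17)
```

The individual orders stay in the database. The topic map only knows that this customer has 17 orders and where the user can "jump" to see them.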

As a result we have a "virtual" enterprise-wide topic map with a shared ontology and shared (or mapped) identification schemes. This topic map effectively represents the classes of subjects and resources that are important for enterprise business processes. It holds summary information about all important subjects, relationships between subjects and cross-references between subjects and resources.

Posted: Thu - August 12, 2004 at 11:37 PM      

