How a Topic Maps view enriches relational data sources
Relational databases are important data sources in the enterprise knowledge
integration pipeline. How can we improve knowledge integration by providing a
Topic Maps view of relational databases?
Let's take a quick look at the development cycle of a traditional enterprise
application. During the design phase, the development team creates a conceptual
model of the future application; UML is quite often used for this. What is
important is that the model explicitly represents the semantics of the
application domain. A typical model includes descriptions of domain classes,
relationships and constraints.
Later, this conceptual model is mapped to a relational database. During this
transformation, a lot of domain semantics is lost or represented in compressed
form. On the good side, this compression helps to build an efficient
application. On the bad side, it limits information integration across
applications, because the same conceptual information can be mapped to
different relational structures in different applications. Just by looking at
tables from two different applications, it is difficult to tell whether they
reference the same subjects and whether they contain the same kind of
information.
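To make this concrete, here is a small Python sketch (standard `sqlite3` module, in-memory databases) of two hypothetical applications that store the same fact, "Jane Doe works for Acme", in different relational structures. All table and column names are invented for illustration.

```python
import sqlite3

# Hypothetical application A: employment stored as a foreign key on the employee row.
app_a = sqlite3.connect(":memory:")
app_a.executescript("""
    CREATE TABLE org (org_id INTEGER PRIMARY KEY, org_name TEXT);
    CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, full_name TEXT,
                      org_id INTEGER REFERENCES org(org_id));
    INSERT INTO org VALUES (1, 'Acme');
    INSERT INTO emp VALUES (10, 'Jane Doe', 1);
""")

# Hypothetical application B: the same fact in a generic party/relation design,
# with different names and a different key (a staff code instead of a number).
app_b = sqlite3.connect(":memory:")
app_b.executescript("""
    CREATE TABLE party (code TEXT PRIMARY KEY, label TEXT, kind TEXT);
    CREATE TABLE relation (from_code TEXT, to_code TEXT, rel TEXT);
    INSERT INTO party VALUES ('S-0010', 'Jane Doe', 'P'), ('C-01', 'Acme', 'O');
    INSERT INTO relation VALUES ('S-0010', 'C-01', 'WORKS');
""")


def employment_a(conn):
    """Who works for whom, as stored by application A."""
    return conn.execute(
        "SELECT e.full_name, o.org_name FROM emp e JOIN org o ON e.org_id = o.org_id"
    ).fetchall()


def employment_b(conn):
    """The same question asked against application B's schema."""
    return conn.execute(
        "SELECT p.label, c.label FROM relation r "
        "JOIN party p ON r.from_code = p.code "
        "JOIN party c ON r.to_code = c.code WHERE r.rel = 'WORKS'"
    ).fetchall()


print(employment_a(app_a))  # [('Jane Doe', 'Acme')]
print(employment_b(app_b))  # [('Jane Doe', 'Acme')]
```

Both queries return the same pair, but nothing in either schema says that `emp_no = 10` in one application and `code = 'S-0010'` in the other denote the same person.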
Service Oriented Architecture (SOA) promotes the concept of information
providers and information consumers. A typical SOA enterprise application
provides some well-defined information and can consume information from other
data sources. In the SOA world there is a shift from an application-centric view
to a service-based view. Each service is responsible for managing its own set of
assertions about subjects. Two service providers should not make the same
assertions about the same subjects, except for the assertions that identify
those subjects. In the SOA world we should know the service provider for any
assertion. The shift to a service-based architecture helps to minimize data
overlap between different applications.
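One way to picture the "know the provider of every assertion" rule is a registry that maps each kind of assertion to the services that supply it. The sketch below is hypothetical: the service names, classes and assertion types are invented. It flags non-identifying assertion types that more than one provider claims.

```python
from collections import defaultdict

# Hypothetical registry: service -> set of (subject class, assertion type) it provides.
PROVIDES = {
    "hr-service": {("Person", "employee-id"), ("Person", "name"), ("Person", "works-for")},
    "payroll-service": {("Person", "employee-id"), ("Person", "salary")},
    "crm-service": {("Person", "name"), ("Organization", "name")},
}

# Assertions used to identify subjects may legitimately be shared between providers.
IDENTIFYING = {("Person", "employee-id")}


def owners_by_assertion(provides):
    """Invert the registry: (class, assertion type) -> set of providing services."""
    owners = defaultdict(set)
    for service, assertion_types in provides.items():
        for assertion_type in assertion_types:
            owners[assertion_type].add(service)
    return owners


def conflicts(provides, identifying):
    """Non-identifying assertion types that more than one service claims to provide."""
    return {
        assertion_type: services
        for assertion_type, services in owners_by_assertion(provides).items()
        if len(services) > 1 and assertion_type not in identifying
    }


# Reports ('Person', 'name') as provided by both hr-service and crm-service.
print(conflicts(PROVIDES, IDENTIFYING))
```

The shared `employee-id` is tolerated because it identifies subjects; the `name` overlap is the kind of duplication a service-based design tries to remove by choosing one provider.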
How can we create a topic map view of a relational database (or data service)?
First, we identify the primary information of this database; in the SOA world
this is already done by definition. Then we retrieve our UML diagrams (or create
new ones ... :) and explicitly represent the taxonomy of classes and
relationships in an "ontology" topic map. After that, we define an export
procedure that produces a "factual" topic map from the assertions in the
database/data service. The "factual" topic map is merged with the "ontology"
topic map. There is an extremely important issue regarding subject
identification: for each basic class, we should clarify the subject
identification schema used by the application/service and define a procedure
for generating Public Subject Identifiers (PSIs). When we generate the "factual"
topic map, we use these PSIs to reference subjects.
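The steps above can be sketched in Python. This is a minimal illustration, not a Topic Maps engine: a topic map is modelled as a plain dictionary keyed by subject identifier, associations are simplified to role-less tuples, and the PSI namespace `http://psi.example.com/`, the `psi` helper and the `org`/`emp` tables are all hypothetical. Merging follows the Topic Maps rule that topics with the same subject identifier are the same topic; a real system would use a Topic Maps engine and a standard syntax such as XTM.

```python
import sqlite3

PSI_BASE = "http://psi.example.com/"  # hypothetical PSI namespace


def psi(kind, key):
    """Generate a PSI for a subject from its class name and database key."""
    return f"{PSI_BASE}{kind}/{key}"


def new_map():
    return {"topics": {}, "associations": set()}


def add_topic(tm, sid, name, topic_type=None):
    topic = tm["topics"].setdefault(sid, {"names": set(), "types": set()})
    topic["names"].add(name)
    if topic_type:
        topic["types"].add(topic_type)


# "Ontology" topic map, written by hand from the UML model.
ontology = new_map()
add_topic(ontology, psi("class", "person"), "Person")
add_topic(ontology, psi("class", "organization"), "Organization")
add_topic(ontology, psi("assoc", "works-for"), "works for")


def export_factual(conn):
    """Produce a "factual" topic map from database rows, referencing subjects by PSI."""
    tm = new_map()
    for org_id, name in conn.execute("SELECT org_id, org_name FROM org"):
        add_topic(tm, psi("organization", org_id), name, psi("class", "organization"))
    for emp_no, name, org_id in conn.execute("SELECT emp_no, full_name, org_id FROM emp"):
        add_topic(tm, psi("person", emp_no), name, psi("class", "person"))
        # Simplified association: (type, member, member), without role types.
        tm["associations"].add(
            (psi("assoc", "works-for"), psi("person", emp_no), psi("organization", org_id)))
    return tm


def merge(*maps):
    """Merge topic maps: topics sharing a subject identifier become one topic."""
    out = new_map()
    for tm in maps:
        for sid, topic in tm["topics"].items():
            target = out["topics"].setdefault(sid, {"names": set(), "types": set()})
            target["names"] |= topic["names"]
            target["types"] |= topic["types"]
        out["associations"] |= tm["associations"]
    return out


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org (org_id INTEGER PRIMARY KEY, org_name TEXT);
    CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, full_name TEXT, org_id INTEGER);
    INSERT INTO org VALUES (1, 'Acme');
    INSERT INTO emp VALUES (10, 'Jane Doe', 1);
""")
full = merge(ontology, export_factual(conn))
```

Because the ontology and the factual map use the same generated PSIs, merging reduces to a union keyed by subject identifier, and every topic type used in the factual map resolves to a class topic from the ontology.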
In an ideal situation, we would have only one identification schema for
subjects of basic classes across enterprise applications. But this rarely
happens these days, so we have to build mapping topic maps that define mappings
between different identification schemas.
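A mapping topic map can be as simple as a set of topics that each carry two subject identifiers, one from each schema. When it is merged with the factual maps, the "same subject identifier means same topic" rule applies transitively, which amounts to a union-find over identifiers. The sketch below assumes two hypothetical namespaces, one for HR employee numbers and one for CRM customer codes.

```python
HR = "http://psi.example.com/person/"     # hypothetical HR identification schema
CRM = "http://crm.example.com/customer/"  # hypothetical CRM identification schema

# Mapping topic map: each topic carries one identifier from each schema.
mapping = [{HR + "10", CRM + "S-0010"}]

# Subject identifiers of topics exported separately from each application.
hr_topics = [{HR + "10"}, {HR + "11"}]
crm_topics = [{CRM + "S-0010"}]


def unify(identifier_sets):
    """Group identifiers that denote the same subject (union-find with path halving)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each topic's identifiers all denote one subject, so join them.
    for ids in identifier_sets:
        first, *rest = ids
        root = find(first)
        for other in rest:
            parent[find(other)] = root

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


subjects = unify(hr_topics + crm_topics + mapping)
# Two subjects: person 10 (known to both HR and CRM) and person 11 (HR only).
```

Without the `mapping` entry, the same call would report three unrelated subjects.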
Another interesting question concerns the enterprise ontology. Quite often, UML
diagrams are created specifically for each application. As a result,
application designers "reinvent" parts of the enterprise ontology again and
again. With Topic Maps we try to define and use the enterprise ontology
explicitly. We also try to reuse existing standard ontologies.
When we verbalize the ontology for an application, we should try to reuse
existing parts of the enterprise ontology and extend/refine it as needed.
"Enterprise ontology" can sound scary, but even simple taxonomies help a lot in
knowledge integration.
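Even a flat class-to-superclass table is enough to show the idea of reusing and refining a shared taxonomy. In this hypothetical sketch, two applications extend the same enterprise taxonomy; `extend` refuses to redefine classes that already exist, and `is_a` lets a query phrased in enterprise terms ("all parties") cover application-specific classes. The class names and both helpers are invented for illustration.

```python
# Hypothetical enterprise taxonomy: class -> superclass (None for a root class).
ENTERPRISE = {"Party": None, "Person": "Party", "Organization": "Party"}


def extend(base, additions):
    """Refine a taxonomy by adding subclasses; existing classes may not be redefined."""
    redefined = sorted(c for c in additions if c in base and additions[c] != base[c])
    if redefined:
        raise ValueError(f"classes already defined differently: {redefined}")
    merged = {**base, **additions}
    missing = sorted(p for p in merged.values() if p is not None and p not in merged)
    if missing:
        raise ValueError(f"unknown superclasses: {missing}")
    return merged


def is_a(taxonomy, cls, ancestor):
    """True if cls equals ancestor or is a transitive subclass of it (assumes no cycles)."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = taxonomy[cls]
    return False


hr = extend(ENTERPRISE, {"Employee": "Person", "Contractor": "Person"})
crm = extend(ENTERPRISE, {"Customer": "Organization"})

# Facts typed with application classes, queried with an enterprise class.
facts = [("Jane Doe", "Employee", hr), ("Acme", "Customer", crm)]
parties = [name for name, cls, taxonomy in facts if is_a(taxonomy, cls, "Party")]
```

Neither application had to know about the other's classes; sharing the top of the taxonomy is what makes the cross-application query possible.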
We do not need to export all information from databases to topic maps. A lot of
factual information can be delivered to users in the form of "reports" or
"dynamic resources". Topic Maps are ideal for representing relationships between
subjects, summary facts about subjects, and references to static and dynamic
resources. With this approach, users can navigate efficiently between subjects
and, when they need details, "jump" to dynamic or static resources.
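A sketch of this idea: rather than exporting every order row, the export places a few summary facts on a customer topic as occurrences and adds one occurrence that points to a report URL where details are generated on demand. The `customer`/`orders` tables, the occurrence type names and the report URL are hypothetical.

```python
import sqlite3
from urllib.parse import urlencode

REPORT_URL = "http://reports.example.com/orders"  # hypothetical report service


def customer_topic(conn, customer_id, subject_identifier):
    """Build a topic with summary facts and a link to a dynamic report,
    instead of exporting every order row into the topic map."""
    (name,) = conn.execute(
        "SELECT name FROM customer WHERE id = ?", (customer_id,)).fetchone()
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
        (customer_id,)).fetchone()
    report = f"{REPORT_URL}?{urlencode({'customer': customer_id})}"
    return {
        "subject_identifiers": {subject_identifier},
        "names": {name},
        "occurrences": [
            ("order-count", count),    # summary fact
            ("order-total", total),    # summary fact
            ("order-report", report),  # dynamic resource with the details
        ],
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (7, 'Acme');
    INSERT INTO orders VALUES (1, 7, 100), (2, 7, 50);
""")
topic = customer_topic(conn, 7, "http://psi.example.com/customer/7")
```

The topic map stays small and stable, while the order details remain in the database and are always current when the user follows the report link.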
As a result, we have a "virtual" enterprise-wide topic map with a shared
ontology and shared (or mapped) identification schemas. This topic map
effectively represents the classes of subjects and resources that are important
for enterprise business processes. It contains summary information about all
important subjects, relationships between subjects, and cross-references
between subjects and resources.
Posted: Thu - August 12, 2004 at 11:37 PM