Sun - December 11, 2005

Scalability issues and ... MyTopics


It becomes challenging to work with a topic map when the number of topics grows.

I am going to experiment with the following approach:
- allow the user to maintain a list of personal roles
- allow the user to mark any topic as being in "MyTopics" for a specific role
- allow switching between roles and navigating to a role-specific list of "MyTopics"
- make "MyTopics" available through RSS
- do data mining on MyTopics across different users
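
A minimal sketch of the first three items, just to make the idea concrete. The class and method names are hypothetical and are not taken from any existing Topic Maps engine; topics are represented by plain string identifiers.

```python
from collections import defaultdict


class MyTopics:
    """Per-user registry of role-specific topic lists (hypothetical structure)."""

    def __init__(self):
        # role name -> topic identifiers, in the order they were marked
        self._by_role = defaultdict(list)
        self.current_role = None

    def add_role(self, role):
        self._by_role[role]  # touching the key creates an empty list
        if self.current_role is None:
            self.current_role = role

    def mark(self, topic_id, role):
        """Put a topic into "MyTopics" for the given role (no duplicates)."""
        if role not in self._by_role:
            raise KeyError(f"unknown role: {role}")
        if topic_id not in self._by_role[role]:
            self._by_role[role].append(topic_id)

    def switch_role(self, role):
        if role not in self._by_role:
            raise KeyError(f"unknown role: {role}")
        self.current_role = role

    def topics(self):
        """Return the "MyTopics" list for the currently selected role."""
        return list(self._by_role.get(self.current_role, []))
```

Switching roles then only changes which list `topics()` returns. An RSS feed per role could be generated from the same list.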

Posted at 04:21 PM


Sat - October 1, 2005

Agile knowledge management or why I still choose Topic Maps


Recently I have been looking around and investigating different technologies and products which can be used for knowledge management...

Oracle just announced support for RDF in 10g Release 2. I really like the way RDF is implemented in the Oracle database: a clean and elegant design, and beautiful integration with traditional relational and XML data.

This announcement made me think again about RDF and Topic Maps. I was toying with the idea of using RDF for the projects I am involved in...

The first interesting observation is about modeling dynamic worlds. In my domains objects change names and properties, they move around, they create and delete relationships. With Topic Maps I can create a type "TimeInterval" with occurrences "DateStart" and "DateEnd", and use instances of this type in the scope of occurrences and associations. Using TMQL I can easily create a projection of a topic map for a specific moment in time. With RDF... replace simple values with objects which have DateStart, DateEnd properties?... hmmmm... use reification for each time-sensitive assertion?... brrrr....

Scopes in Topic Maps make it possible to represent context-sensitive knowledge very nicely. And contexts are not limited to time, of course. We can define and use any dimensions which are useful for modeling.
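
The time-scoped projection can be sketched in a few lines. This is plain Python, not TMQL, and the `Assertion` structure is a simplification I assume for illustration; only the names TimeInterval, DateStart and DateEnd come from the description above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimeInterval:
    date_start: date
    date_end: Optional[date] = None  # None means "still valid"

    def contains(self, moment: date) -> bool:
        if moment < self.date_start:
            return False
        return self.date_end is None or moment <= self.date_end


@dataclass(frozen=True)
class Assertion:
    """A simplified occurrence or association, scoped by a time interval."""
    subject: str
    predicate: str
    value: str
    scope: TimeInterval


def project(assertions, moment):
    """Keep only the assertions whose scope covers the given moment."""
    return [a for a in assertions if a.scope.contains(moment)]


facts = [
    Assertion("acme", "name", "Acme Ltd",
              TimeInterval(date(2000, 1, 1), date(2003, 12, 31))),
    Assertion("acme", "name", "Acme Corp", TimeInterval(date(2004, 1, 1))),
]
names_in_2002 = [a.value for a in project(facts, date(2002, 6, 1))]
```

The factual assertions are never rewritten when a name changes. A new assertion with a new scope is added, and the projection picks the right one for any moment.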

The second observation is about agile ontology development. I found myself refactoring ontologies for topic maps in "production" again and again. Why Topic Maps in this case? Topic Maps can work without schemas or additional ontology definitions such as "inverse" or "symmetric" properties. The basic semantic model of Topic Maps is rich enough to represent useful information. We can easily modify type hierarchies and add or delete constraints without changing factual information. The basic semantic model of Topic Maps, combined with TMCL (the pattern-based Topic Maps Constraint Language), supports this agile ontology development style better than heavy OWL, from my perspective.

Posted at 01:18 PM


Thu - December 2, 2004

Instant messaging, subject centric group chats, topic maps and... goodbye email (... almost)


I recently saw a presentation of Parlano MindAlign for the Microsoft Live Communications Server instant messaging platform.

I have enjoyed using IM, IRC and enterprise group chats for many years. I use them for person-to-person communication, for getting quick answers from peers and providing them, and for notifications about important events. MindAlign introduces a new trend, I think: real-time subject-centric communication. The MindAlign smart client makes it possible to manage and participate effectively in hundreds of subject-centric channels at the same time. It also lets users see the history of all conversations and search message archives. None of these features is new. But effective support for hundreds of channels on the client side changes the rules of the game and moves group chats to a new level.

There is something extremely powerful in the combination of real-time subject-centric communication with the ability to access message history and search it. I think that this kind of system can replace about 80% of emails in the future.

What is the next step? We can connect channel topics with a topic map and allow users to reference subtopics in a real-time conversation using an analog of WikiWords. In this case we have a topic map which is modified by users in real time. This topic map has references to group chat messages. But like any other topic map it can also have information about associations between topics and references to other resources.

We can add the ability for users to provide Wiki-like occurrences in this topic map and the ability to add links to resources (an analog of the social bookmark manager del.icio.us).

The result is a live topic map which integrates summary information about subjects, associations between subjects, real-time messages and links to resources connected with subjects.

Posted at 05:24 PM


Tue - October 12, 2004

Jabber, publish & subscribe architecture and ... Topic Maps


One of the main goals of Topic Maps is to facilitate development of a distributed network of providers and consumers of "subject centric information maps". Existing Topic Maps standards provide basic support for building this kind of environment: an XML-based interchange syntax (XTM), a scalable identification schema based on the concept of Public Subject Identifiers (PSIs), the concept of topic map merging, and the use of HTTP as a standard access protocol.

In a "simplified world" we can create a topic map and publish it to a public web server, and everyone can view it using a topic map viewer (such as Omnigator) or reference (and reuse) it in other topic maps. The problem with this approach is that it is static. If we later change our topic map, nobody will know about it. Only when topic map consumers refresh their topic maps will they be able to reload our topic map and remerge it with their own information.

Building real-life Topic Maps-based applications, however, often requires more sophisticated protocols and architectures. First, I would like to reference several interesting ideas and approaches:
- TMShare
- TMRAP and here
- Virtual and Federated Topic Maps

We are now ready to jump into a discussion about possible use of the Jabber messaging architecture for building distributed networks of topic maps...

What we really need is a near real time messaging / notification mechanism which distributes updates to all interested parties and/or allows running asynchronous queries to multiple information providers.

Jabber, from my perspective, is a very good candidate for this kind of infrastructure. Why?

- Jabber is an XML-based messaging infrastructure with an extensible architecture.
- It supports query-response scenarios.
- It also supports notification scenarios.
- It supports a store-and-forward mechanism (important for occasionally connected clients).
- It is relatively firewall friendly (it uses outgoing client-to-server connections on a predefined port).
- It is platform and language neutral.
- Jabber's protocol, XMPP, was recently published by the IETF as RFCs.
- It is a proven (and already running) messaging infrastructure for communication between humans and applications in all combinations.

The most interesting part is probably JEP-0060: Publish-Subscribe. This specification defines a generic publish/subscribe framework for use by Jabber entities. I think it can be reused for building publish/subscribe infrastructure for topic maps.

With this approach topic map providers can announce which topics are available for subscription, and consumers can subscribe to specific topics. As soon as new information about a topic is available, it can be distributed to all subscribers. Inside the messages we can use topic map fragments as defined by TMRAP, for example.
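
To show the shape of the interaction, here is a small in-process stand-in for such a publish/subscribe node. It does not speak XMPP and does not use any Jabber library; the `TopicBroker` class, the PSI and the fragment string are all hypothetical placeholders.

```python
from collections import defaultdict


class TopicBroker:
    """In-memory stand-in for a publish/subscribe service keyed by topic PSI."""

    def __init__(self):
        # PSI -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, psi, callback):
        """Register interest in updates about one topic."""
        self._subscribers[psi].append(callback)

    def publish(self, psi, fragment):
        """Deliver a topic map fragment to every subscriber of this topic.

        Returns the number of subscribers that were notified.
        """
        delivered = 0
        for callback in self._subscribers.get(psi, []):
            callback(psi, fragment)
            delivered += 1
        return delivered


received = []
broker = TopicBroker()
broker.subscribe("http://psi.example.org/jabber",
                 lambda psi, fragment: received.append((psi, fragment)))
broker.publish("http://psi.example.org/jabber", "<topic id='jabber'/>")
```

In a real deployment the broker role would be played by the Jabber server, and the callbacks would be remote consumers that merge each received fragment into their own topic maps.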

An interesting example of JEP-0060 usage can be found on PubSub.com.

Posted at 12:26 PM


Sat - September 25, 2004

Subject oriented computing, new approaches to user interface and... Topic Maps


In the last few years we have seen several interesting attempts to implement a new user interface. The trick here is that just moving the existing application-centric computing model to a 3D world will not do it...

It is nice to have shrinking/extending/rotating application windows or a "3D room paradigm", but the true paradigm shift is a shift to subject-based computing.

I am thinking about several technologies and ideas which can help in building a new user interface for subject-oriented computing.

1. Marvin Minsky's concept of K-Lines ("The Society of Mind"). This theory of memory tries to explain how people can remember and use memories in solving new problems and addressing new situations.

"...Whenever you "get a good idea", solve a problem, or have a memorable experience, you activate a K-line to "represent" it. A K-line is a wirelike structure that attaches itself to whichever mental agents are active when you solve a problem or have a good idea.

When you activate that K-line later, the agents attached to it are aroused, putting you into a "mental state" much like the one you were in when you solved that problem or got that idea...."

This theory can provide some insight into the dynamic nature of subject proxy maps. According to this theory, activation of one subject proxy leads to activation of other subject proxies based on the connections between proxies. At any moment different proxies can have different activation levels. When we change our focus from one subject to another, the activation levels change as well.

2. Treemaps. "... Treemap is a space-constrained visualization of hierarchical structures. It is very effective in showing attributes of leaf nodes using size and color coding..." If we combine K-Lines with treemaps we can get a very interesting result, I think. We can connect the size of treemap items with the subject proxy activation level and use color to represent changes in activation level. Treemaps can also be very good for representing a hierarchy of subject proxies.

3. Jef Raskin's concept of an "applicationless" user interface. "...The idea of an application is an artificial one, convenient to the programmer but not to the user. From a user's point of view there is a content (a set of objects created or obtained by the user) and there are commands that can operate on objects....". The concept of "Zooming" is also extremely interesting from the perspective of subject-oriented computing.
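
Returning to the first two ideas, the link between activation levels and treemap item sizes can be sketched as follows. The decay factor, the depth limit and the function names are my own assumptions; this is a simple spreading-activation toy, not a model taken from Minsky's book or from any treemap library.

```python
def spread_activation(links, focus, decay=0.5, depth=2):
    """Return activation levels after focusing on one subject proxy.

    links maps a proxy to the proxies it is connected to. Activation starts
    at 1.0 on the focus and is multiplied by `decay` on every hop.
    """
    activation = {focus: 1.0}
    frontier = [focus]
    for _ in range(depth):
        next_frontier = []
        for proxy in frontier:
            for neighbour in links.get(proxy, []):
                level = activation[proxy] * decay
                # keep the strongest activation seen for each proxy
                if level > activation.get(neighbour, 0.0):
                    activation[neighbour] = level
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return activation


def treemap_shares(activation):
    """Convert activation levels into relative treemap areas that sum to 1."""
    total = sum(activation.values())
    return {proxy: level / total for proxy, level in activation.items()}


links = {
    "project-x": ["topic-maps", "alice"],
    "topic-maps": ["xtm"],
}
levels = spread_activation(links, "project-x")
shares = treemap_shares(levels)
```

Changing the focus and recomputing gives a new set of shares. The difference between the old and new share of each proxy is what the color coding could represent.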



Posted at 02:02 PM


Wed - September 8, 2004

Managing subject proxies and Topic Maps: Enterprise perspective


If we would like to extend the subject-centric computing environment to the Enterprise level, we need to support multiple levels of relevancy and security for subject proxies and information resources.

In the Enterprise case we deal not only with "personal" subjects, but also with subjects relevant to the Enterprise, a department or a team. Some subjects can be available only to specific groups of users based on security policies. The same, of course, is true for information resources.

Applications often have their own security models. If we want to export and merge subject proxies from different applications, we need a "generalized" security model for the subject proxy map.

In an Enterprise environment information workers consume and produce information resources. They can wear different "hats" during a day. It is important to implement the concept of subject relevancy based on context (role-based, workflow-based).

I see the subject proxy map as a very active substance. At any time I can work with the subject proxies which I have access to. But my "subject views" are optimized based on the current context. I can specify the context manually, for example: "Researching technology", "Testing application". If I start an application, the application can introduce an additional dynamic context. If I open a resource (document, report, web page etc.), the resource also creates a new context and activates the subject proxies which are relevant to this resource.

One of the challenges is that most existing Enterprise applications are not transparent in terms of subject proxies. Try asking what kinds of objects an application deals with. Typically the "main" objects can be found in application forms and reports. Some secondary objects often do not have explicit representations. Application design documents (UML diagrams?) can help at this step.

Posted at 12:10 AM


Mon - September 6, 2004

Subject Oriented Computing - Topic Maps and management of subject proxies


It is surprising that modern desktop operating systems continue to ignore a fundamental aspect of information processing: its subject orientation.

Computers become better and better at helping people to create/edit/transmit information resources. Today we can easily manipulate resources of different types, including pictures and music. But we still have minimal support for managing the subjects we are interested in. The main computing paradigm continues to be resource- and application-centric.

Let's say that I participate in projects and I need to keep track of different information resources related to these projects. How can I do it now? I will probably create a spreadsheet and list project names with some summary information (start and finish dates, project manager, team members). I can also create subfolders on my hard drive for each project and try to keep documents related to each project in the corresponding subfolder. But what if a document is related to several projects? What if I would also like to keep track of the technologies used in each project? And I am also interested in managing information resources about different technologies (news items, industry reports, reviews, predictions, rumors etc.). Should I create a new spreadsheet with a list of technologies and subfolders? Hmmm....

In the world of resource- and application-centric computer environments it is "expensive" to manage the subjects we are interested in. We are forced to use tools which were not designed for this task.

Now let's switch to a subject-oriented computer environment. In this environment subjects (more precisely, subject proxies) are the basis of the user experience. It is easy to create proxies for the subjects which we are interested in. It is easy to describe relationships between subjects. It is easy to connect resources and subject proxies.

For example, in the case of projects, I can type "projects" in a search box and get a list of the projects which I am interested in. I can also click on a "new" button to create a subject proxy for a project which is new or which I have just started monitoring. If I click on a project name I get a representation of a subject proxy, which can include summary information and relationships with other subject proxies. I can see, for example, who the project manager and project members are, what technologies are used, what tasks are involved etc. I can also see all resources on my hard drive which are related to this project: documents, plans, emails, schedules, links. I can easily navigate to other subject proxies or resources.

The most important thing is that when I create a subject proxy for a project it becomes available to all applications on my computer. I can connect any resource managed by any application with this subject proxy. When I am working with a resource my computer helps me to identify relevant subject proxies. I can also manually connect a resource and a subject proxy (for example, by dragging and dropping a subject proxy onto a resource or vice versa). At any time I can jump between resources and subject proxies, create new connections and modify existing ones.

Applications can register "actions" available for subjects of specific classes. So when I am looking at a project proxy I can easily jump to specific actions which are related to this project: create a new task, schedule a meeting, prepare a status report etc.
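
A minimal sketch of such an action registry, assuming a desktop-level service that applications call at install or startup time. The `ActionRegistry` name, its methods and the handlers are hypothetical and do not correspond to any existing desktop API.

```python
from collections import defaultdict


class ActionRegistry:
    """Desktop-level registry where applications publish actions per subject class."""

    def __init__(self):
        # subject class -> {action label: handler taking a proxy identifier}
        self._actions = defaultdict(dict)

    def register(self, subject_class, label, handler):
        self._actions[subject_class][label] = handler

    def available(self, subject_class):
        """List the action labels offered for proxies of this class."""
        return sorted(self._actions.get(subject_class, {}))

    def invoke(self, subject_class, label, proxy_id):
        return self._actions[subject_class][label](proxy_id)


registry = ActionRegistry()
# Two different applications register actions for the same subject class.
registry.register("project", "schedule a meeting",
                  lambda proxy_id: f"calendar: meeting for {proxy_id}")
registry.register("project", "create a new task",
                  lambda proxy_id: f"tracker: task for {proxy_id}")
```

When a project proxy is displayed, the environment only needs `registry.available("project")` to build the list of jumps, without knowing which application stands behind each action.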

Applications typically do not "own subjects". They own and manage some partial information about subjects. But all these different pieces of information are combined together at "desktop level".

Topic Maps technology provides a basis for building subject-oriented environments. Some of the ideas described above are implementable right now with Topic Maps. Other ideas require deeper integration with desktop operating systems.




Posted at 05:39 AM


Sun - August 15, 2004

"SOA Challenges: Entity Aggregation" from .Net Architecture Center and ... Topic Maps


In my journey to understand the strength of Topic Maps for enterprise knowledge integration I came across this article at the Microsoft .Net Architecture Center: "SOA Challenges: Entity Aggregation".

I love this article! It explains well the challenge of integrating information about the same subject (called "entity aggregation" in the article). It also introduces the concept of an "Entity Aggregation Service" which is responsible for presenting a unified view of entities across multiple enterprise applications.

It also demonstrates that SOA in fact does not provide a solution for entity aggregation by itself. (That's already my interpretation :-) This is exactly the place where database replication, queries across SQL servers, XQuery-based integration, XML schemas etc. should be lifted to the "knowledge level" using the concept of ontology-based knowledge integration. And ... Topic Maps is a technology which can help enterprise system/software architects to implement this concept of "entity aggregation".

Posted at 08:03 PM


Thu - August 12, 2004

How Topic Maps view enriches relational data sources


Relational databases are important data sources in the enterprise knowledge integration pipeline. How can we improve knowledge integration by providing a Topic Maps view of relational databases?

Let's take a quick look at a development cycle of a traditional enterprise application.

During the design phase the development team creates a conceptual model of the future application. UML is used for this quite often. What is important is that the model explicitly represents the semantics of the application domain. A typical model includes descriptions of domain classes, relationships and constraints.

Later this conceptual model is mapped to a relational database. During this transformation a lot of domain semantic information is lost or presented in compressed form. On the good side, this compression helps to build an efficient application. On the bad side, it limits integration of information from different applications (the same conceptual information can be mapped to different relational structures in various applications). Just by looking at tables from two different applications it is difficult to tell whether they reference the same subjects and whether they contain the same kind of information.

Service Oriented Architecture (SOA) promotes the concept of information providers and information consumers. A typical SOA enterprise application is a provider of some well-defined information and can consume information from other data sources. In the SOA world there is a shift from an application-centric view to a service-based view. Each service is responsible for managing its own set of assertions about subjects. Two service providers should not hold the same assertions about the same subjects, except for the assertions responsible for identifying subjects. In the SOA world we should know the service provider for any assertion. The shift to a service-based architecture helps to minimize data overlap between different applications.

How can we create a topic map view of a relational database (or data service)? First we identify the primary information of this database. In the SOA world this is already done by definition. Then we retrieve our UML diagrams (or create new ones ... :) and explicitly represent the taxonomy of classes and relationships using an "ontology" topic map. After that we define an export procedure which produces a "factual" topic map based on assertions from the database or data service. The "factual" topic map is merged with the "ontology" topic map. There is an extremely important issue regarding subject identification. We should clarify the subject identification schema used by the application or service for each basic class and define a procedure for generating Public Subject Identifiers (PSIs). When we generate the "factual" topic map we use these PSIs to reference subjects.

In an ideal situation we would have only one identification schema for subjects of basic classes across enterprise applications. But this does not happen often these days, so we have to build mapping topic maps which define the mapping between different identification schemas.
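
The export procedure and the PSI generation step can be sketched like this. Rows are plain dictionaries standing in for a database result set; the base URI, the function names and the shape of the resulting "factual" topic map are assumptions made for illustration only.

```python
from urllib.parse import quote

PSI_BASE = "http://psi.example.org/"  # hypothetical base URI for enterprise PSIs


def make_psi(class_name, local_id):
    """Generate a PSI from a basic class and the application's own identifier."""
    return f"{PSI_BASE}{quote(class_name)}/{quote(str(local_id))}"


def export_rows(class_name, rows, id_column):
    """Turn relational rows into a simplified "factual" topic map keyed by PSI."""
    topics = {}
    for row in rows:
        psi = make_psi(class_name, row[id_column])
        properties = {k: v for k, v in row.items() if k != id_column}
        topics[psi] = {"type": class_name, "properties": properties}
    return topics


def canonical_psi(psi, mapping):
    """Apply a "mapping topic map", reduced here to a PSI-to-PSI dictionary."""
    return mapping.get(psi, psi)


rows = [{"emp_id": 42, "name": "Ann Lee", "dept": "R&D"}]
factual = export_rows("employee", rows, "emp_id")
```

The "type" value is the hook for merging with the "ontology" topic map: it would refer to a class topic defined there, not to a bare string.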

Another interesting question is about the enterprise ontology. Quite often UML diagrams are created specifically for each application. What happens in this case is that application designers "reinvent" parts of the enterprise ontology again and again. With Topic Maps we try to explicitly define and use an enterprise ontology. We also try to reuse existing standard ontologies.

When we try to verbalize the ontology for an application we should attempt to reuse existing parts of the enterprise ontology and extend/refine it as needed. "Enterprise ontology" can sound scary. But even simple taxonomies help a lot in knowledge integration.

We do not need to export all information from databases to topic maps. A lot of factual information can be delivered to users in the form of "reports" or "dynamic resources". Topic Maps are ideal for representing relationships between subjects, summary facts about subjects and references to static and dynamic resources. With this approach users will be able to navigate efficiently between different subjects, and when they need details they can "jump" to dynamic or static resources.

As a result we have a "virtual" enterprise-wide topic map with a shared ontology and shared (or mapped) identification schemas. This topic map effectively represents the classes of subjects and resources important for enterprise business processes. It has summary information about all important subjects, relationships between subjects and cross-references between subjects and resources.

Posted at 11:37 PM


Wed - August 11, 2004

XQuery-based data integration - one step forward, can we do two?


An XQuery engine can be used to query and combine results from multiple data sources. This is a step forward in enterprise data integration. Can we do more?

Of course we can do better integration using Topic Maps.

XQuery-based integration tries to address a real business problem: How does an organization get a consolidated view of its information?

XQuery-based solution:
- get information from diverse sources in XML
- use XQuery adapters if XML is not supported natively by data sources
- create aggregated views using XQuery transformations, joins and filtering

Can we do better with Topic Maps?

XQuery does information integration at the data level nicely. But it cannot handle integration at the knowledge level.

The main problem with pure XML-based information integration is that semantic information is hidden (lost?) in XML. The lack of explicit semantics limits the ability to use general information merging rules. Each XML-based query in fact implements its own information merging procedure.

XML-based integration also ignores two other important problems:
- providing a standard mechanism for identity mapping between different information sources
- providing a standard mechanism for checking data integrity

Topic Maps technology helps to deal with the same problem, "How does an organization get a consolidated view of its information?", at a different level. Instead of an XQuery view we introduce a Topic Maps view of data sources. Topic Maps technology promotes the use of an ontology as the basis of information integration. A Topic Maps view of a specific data source contains not only factual information but is also enriched with information about classes and taxonomies (which is typically lost in relational databases).

Topic Maps technology defines a standard mechanism for merging information from different sources. It takes care of identity management and mapping, and suggests best practices to minimize problems with different identification schemas.
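
The core of identity-based merging fits in a few lines. This is a deliberately reduced sketch, not the merging procedure from the Topic Maps standards: each topic map is assumed to be a dictionary from a subject identifier to property values, and the `same_as` dictionary stands in for an identity mapping between sources.

```python
def merge_topic_maps(topic_maps, same_as=None):
    """Merge simplified topic maps on subject identity.

    topic_maps: iterable of dicts, subject identifier -> {property: set of values}
    same_as:    optional dict mapping an identifier to its canonical identifier
    """
    same_as = same_as or {}
    merged = {}
    for topic_map in topic_maps:
        for identifier, properties in topic_map.items():
            canonical = same_as.get(identifier, identifier)
            target = merged.setdefault(canonical, {})
            for name, values in properties.items():
                # union of values; duplicates from different sources collapse
                target.setdefault(name, set()).update(values)
    return merged


hr = {"hr:42": {"name": {"Ann Lee"}}}
crm = {"crm:A7": {"name": {"Ann Lee"}, "phone": {"555-0100"}}}
merged = merge_topic_maps([hr, crm], same_as={"crm:A7": "hr:42"})
```

The merging rule is the same for every source, which is the contrast with XQuery-based integration, where each query encodes its own way of joining records.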

Using TMCL it will be possible to check the integrity of the resulting "virtual topic map". TMCL will also make it possible to monitor business-related constraints using a powerful rule-based language.

It will be very easy to create XML views based on the "virtual topic map" using TMQL and/or templates. The difference from a pure XML solution is that we can have semantically enriched XML. For each item we can keep a reference back to topic map constructs. We know WHAT is represented using XML.

Creating XML views still makes sense because it allows us to use standard, well-developed technologies (such as XSLT, and later maybe XForms?) at the end of the integration pipeline.

And ... of course, a Topic Maps-based integration hub is not a replacement for an XQuery hub. An XQuery hub nicely implements the concept of "virtual resources". A Topic Maps hub helps to organize these and other resources in a semantically enriched "knowledge map".

Posted at 09:21 AM


Sat - July 10, 2004

Topic Maps - based information integration


I was recently involved in the implementation of an intranet portal. A Topic Maps engine was out of the question from the beginning...

The project team had to do a lot of traditional portal-based information integration. The portal combines information about topics of several classes with cross-references between topics. The project team implemented several portlets which present various pieces of information. These portlets are combined into templates (one per class). Portlets extract information from several databases, a simple document management system and several collaborative applications. Documents are tagged with references to portal topics. An integrated full-text/metadata search engine makes it possible to find all resources relevant to a specific topic. A typical portal project.

This project helped me a lot to understand how Topic Maps-based information integration differs from traditional portal integration.

I would like to describe an "ideal knowledge integration scenario" which can be partially implemented with existing Topic Maps software. Some features are in fact on my "wish list" for next-generation Topic Maps engines and development infrastructure.

Hmmm...

I would like to start a new portal project by designing and implementing an ontology. I would like to describe the classes of objects which the portal deals with. In the next phase I would like to define the mapping between existing data sources and the ontology. I would like to specify where I can get different pieces of information and what kind of transformation is required. I should be able to see "mapping gaps" at any time.

I prefer a "conceptual language" as the target of these transformations. As a portal developer I do not think in terms of relational tables or document forms or even XML trees. I think in terms of objects, classes, properties and relationships. Mapping to a conceptual language automatically adds metadata to the factual information from data sources.

I should be able to define caching/update strategies for data transformations. In some cases I would like to do "just in time" transformation and integration. In other cases I can use some kind of caching. I would like to have a smart agent which manages updates from different data sources based on declared policies. This agent preferably uses asynchronous communication with data sources for improved performance. The agent creates a virtual, ontology-enriched knowledge base.

One of the key issues in knowledge integration is identity management. Quite often different data sources use different identification schemas for the same subjects. An "ideal integration tool" should help to define, implement and monitor identity mappings.

The knowledge base agent also uses the ontology and rule-based constraints to identify knowledge conflicts. I would like to be able to check for conflicts at any time. I would also like to have a conflict notification mechanism and a conflict resolution workflow. Some conflicts can be resolved automatically based on defined policies. Knowledge conflicts are a natural feature of any open information system. We are not "afraid" of conflicts. We should have an infrastructure which helps to deal with them. Conflict identification and resolution mechanisms should improve knowledge base consistency.
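
One simple kind of conflict check can be sketched as follows, assuming the ontology declares some properties as single-valued. The tuple layout, the function name and the sample data are hypothetical; a real agent would take its constraints from the ontology and a rule language, not from a hard-coded set.

```python
from collections import defaultdict


def find_conflicts(assertions, single_valued):
    """Find single-valued properties for which sources disagree.

    assertions:    iterable of (source, subject, property, value) tuples
    single_valued: set of property names that may hold only one value
    Returns {(subject, property): {value: [sources asserting it]}}.
    """
    seen = defaultdict(dict)
    for source, subject, prop, value in assertions:
        seen[(subject, prop)].setdefault(value, []).append(source)
    return {
        key: values
        for key, values in seen.items()
        if key[1] in single_valued and len(values) > 1
    }


assertions = [
    ("hr", "emp:42", "birth_date", "1970-01-01"),
    ("crm", "emp:42", "birth_date", "1971-01-01"),
    ("hr", "emp:42", "phone", "555-0100"),
    ("crm", "emp:42", "phone", "555-0199"),
]
conflicts = find_conflicts(assertions, single_valued={"birth_date"})
```

Because every value keeps the list of sources that asserted it, the result can feed a notification or resolution workflow directly, for example a policy that prefers one source over another for a given property.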

It is important to note that the knowledge base includes not only facts about business objects but also facts about different resources (documents, reports, diagrams etc.). Unstructured and semi-structured information is integrated with other data sources.

The knowledge base should support a query language, which makes it possible to query the virtual knowledge base at the 'conceptual level'.

At this stage we in fact already have an "information portal". It just does not have a visual representation.

It is much easier to define and implement the visual part of a portal in this scenario. All knowledge integration is already done.

What we need is to define some views and templates. When we define views we rely on the knowledge base query language.
View design can be guided by the portal ontology.




Posted at 02:21 PM


Sat - May 1, 2004

It is time for "Save as XTM" initiative


More and more applications can produce an XML representation of their internal information and save it to shared storage. This helps users to synchronize information across several computers. An XML representation also helps to create user communities based on sharing of information. Think about shared calendars, music and picture mixes, blogs, recipes. It's nice, but it can be much better... with topic maps.

Topic Maps provide "out of the box" support for information sharing and merging. This support is based on the ability to explicitly represent subjects and the ability to connect any piece of information with subjects.

If we have a blog entry, for example, we have a standard mechanism to express that this entry is related to specific subjects. And we have a standard way to merge information from several blogs. As a result we can easily find all blog entries related to the same subject.

"Pure" XML solutions can encode relationships between information pieces and subjects. But these solutions are based on custom schemas. Each time we need to define custom merging rules which also can include transformations between various XML schemas.

It is time... it is time to promote the XTM format as a "save as" option for various applications. Applications can use optimized internal data models to implement a specific set of functions. But applications can also publish Topic Maps-based representations of their internal information to shared storage. Other applications can "subscribe" to external topic maps and merge external and internal information. Of course, applications remember the source of information so users can keep track of "who said what".

With "save as XTM" support it will be possible to use "universal topic map browsers" to explore information from different applications. Users also will be able to rely on specific applications with optimized views.

Posted at 01:18 PM