Fri - June 10, 2005

Yahoo, Mindset ... and subject centric computing


I really like the idea behind the Mindset research project.

It is great that we can easily change the ranking of search results by specifying our intention. I would suggest extending the idea of dimensions and introducing a "Document - Subject" dimension.

I think we should have one "Subject" node on the right side. On the left side we can have different nodes which correspond to different dimensions (Shopping, News, Entertainment, etc.). I think that the other end of each dimension is "Subject", so all these dimensions are connected to one subject node.

So if we position our intention close to the subject node, we will get something like "Answers.com": the results are subject centric pages. But we can add different "flavors" to a search request by adding more "News" or "Shopping", for example. The search results become less subject centric in this case.
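To make the idea a bit more concrete, here is a minimal Python sketch of how such a ranking could work. It is only my illustration: I do not know how Mindset actually ranks results, and the `Result` class, the score fields and the `rank` function are all hypothetical names. Each result carries a "subject centric" score plus optional flavor scores, and whatever weight the user does not give to flavors stays with the Subject node.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    subject: float                                # 0..1, how subject centric the page is (assumed score)
    flavors: dict = field(default_factory=dict)   # e.g. {"News": 0.9} (assumed scores)


def rank(results, intent):
    """Order results by intent.

    intent maps a dimension name (e.g. "News") to a weight in 0..1.
    Weight that is not given to any flavor stays with the Subject node.
    """
    flavor_weight = min(1.0, sum(intent.values()))
    subject_weight = 1.0 - flavor_weight

    def score(result):
        flavor_score = sum(w * result.flavors.get(d, 0.0) for d, w in intent.items())
        return subject_weight * result.subject + flavor_score

    return sorted(results, key=score, reverse=True)


pages = [
    Result("encyclopedia-page", subject=0.9, flavors={"News": 0.1}),
    Result("news-headline", subject=0.2, flavors={"News": 0.9}),
]
subject_first = rank(pages, {})             # intention sits on the Subject node
news_first = rank(pages, {"News": 0.8})     # intention moved towards News
```

With an empty intent the subject centric page comes first. Moving the intention towards "News" pushes the headline to the top.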

Posted at 11:44 AM


Wed - October 27, 2004

Resources and Subject pages ... towards better web


One of the most interesting aspects of subject centric computing (and smart search) is the relationship between resources and subjects.

I use several news web sites, such as EWeek and ZDNet, to check news in the computer industry. Each time I read an article, I (automatically) evaluate it from the perspective of subject centric computing. The general practice is to reference other news articles on the same web site and to provide links to company web sites and conference web sites.

That's not how it should be in a subject centric world. In a subject centric world we should have a special kind of resource: subject pages. A subject page provides summary information about a subject, with links to other subjects and (regular) resources.

Several examples of subject page collections:

Wikipedia (sample pages: Topic Maps, Treaty establishing a constitution for Europe, Search Engine, WiFi, Microsoft, IBM)
Internet Movie Database (sample pages: Troy, Brad Pitt, Wolfgang Petersen)
Yahoo Finance (sample pages: Microsoft, IBM)
Italian Opera Topic Map (sample pages: Tosca, Giacomo Puccini, Teatro alla Scala)

When I read an article on an IT news web site, I would like to have references to subject pages for companies, events (such as company mergers, spin-offs and product announcements), technologies, people, products, etc. In this case I can easily jump from the article to subject pages where I have links to other related subjects and resources.

I think the web can be much more user friendly if we have an additional resource layer: subject pages.

Posted at 11:45 PM


Tue - October 19, 2004

Google desktop search or ... "Where are my subjects?"


I installed Google desktop search (beta) last week. I have mixed feelings about it.

I really like the concept of desktop and Internet search integration. What I do not like is that I cannot organize search results around subjects and subject categories. So now, in addition to all those links on the Internet, I also have thousands of links from my hard drive and web page cache.

I can only repeat my comments and recommendations from "Apple's Spotlight, What do we search for and ... Topic Maps" and "Smart search, blinkx and ... Topic Maps", with probably one addition.

In the short term it would help if Google desktop search allowed users to define virtual folders/categories. When users define a virtual folder/category, they can specify which URI patterns should go to this folder. Folder rules can also use metadata. So, for example, if I trust Wikipedia as a source of my "subject centric" pages, I can specify that pages from Wikipedia under a specific category should go to a specific virtual folder in my desktop search. The same is true for "local subjects". I can run a desktop Wiki (and/or topic map engine) and use it to manage my own subjects and produce local subject centric pages. Now, if I have virtual folders/categories defined and I type keywords in the search text field, I should get search results organized by virtual folders.
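Here is a small Python sketch of what such folder rules could look like. It is only an illustration of the idea, not anything Google desktop search offers: the folder names, the URI patterns and the `folder_for` / `group_hits` functions are my own assumptions, and metadata-based rules are left out.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical rules: virtual folder -> shell-style URI patterns.
FOLDER_RULES = {
    "Subjects/Wikipedia": ["http://en.wikipedia.org/wiki/*"],
    "Subjects/Local": ["http://localhost/wiki/*"],
}


def folder_for(uri, rules=FOLDER_RULES, default="Other"):
    """Return the first virtual folder whose pattern matches the URI."""
    for folder, patterns in rules.items():
        if any(fnmatchcase(uri, pattern) for pattern in patterns):
            return folder
    return default


def group_hits(uris, rules=FOLDER_RULES):
    """Organize a flat list of search hits by virtual folder."""
    grouped = defaultdict(list)
    for uri in uris:
        grouped[folder_for(uri, rules)].append(uri)
    return dict(grouped)


hits = [
    "http://en.wikipedia.org/wiki/Topic_Maps",
    "http://localhost/wiki/MyProjects",
    "http://www.example.com/some-article.html",
]
grouped = group_hits(hits)
```

The search engine itself stays as it is. Only the presentation of the hits changes: instead of one long list I get one list per virtual folder.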

Posted at 07:33 PM


Sun - October 10, 2004

Smart search ... and specifics of Topic Maps approach


I read several interesting materials regarding search improvement last week: comments about the Microsoft Search Champs conference, notes from the Web 2.0 conference and Thomas B. Passin's book "Explorer's Guide to the Semantic Web". A lot of different ideas and approaches! These thought-provoking readings inspired me to elaborate on the Topic Maps approach to solving the search problem.

I would like to start with a reference to Lars Marius Garshol's work "Metadata? Thesauri? Taxonomies? Topic maps!" and his summary of the benefits of using Topic Maps for search. What is probably not obvious is how Topic Maps can be used by "traditional" search/directory engines such as Google, Yahoo and MSN to implement a new generation of search, and also how Topic Maps can be used for building an Internet-scale search infrastructure.

1. The Topic Maps based approach shares the understanding that people are searching mostly for information, not for "documents/resources".

2. Information is "hidden" in resources. Resources are optimized for reading by humans, not computers.

3. Automatic extraction of information from resources is an expensive operation with limited reliability.

4. Reliable automatic matching of queries with information in resources is an expensive operation.

5. Context plays an important role in search. People can play different roles and can switch areas of interest.

6. Personalization is important. Each person can have their own topics of interest and requirements for information retrieval. Personal interests are relatively stable.

Instead of concentrating on advanced general algorithms for 3 and 4 using 5 and 6, the Topic Maps approach breaks with the tradition of working in a "resource world" and suggests shifting efforts to a world of "topics of our interest", or subject proxies.

The Topic Maps approach concentrates more on the question of how to create a distributed network of information providers and consumers, based on an interchange standard for managing "maps" of subject proxies linked with resources.

The Topic Maps approach is based on explicit management of subject proxies which represent "topics of our interest". With Topic Maps we also explicitly represent a summary of information about subjects and their relationships.

We also connect the "world of subjects" with the "world of resources" using explicit links. The Topic Maps approach does not really define how these links are created. It can be done manually by a person, by sophisticated linguistic or statistical algorithms, or by a combination of available methods.

The Topic Maps approach is supported by an ISO standard which helps to create, interchange and merge topic maps and, in the future, to query and constrain topic maps.

Any person, organization or company can be a provider of information using the Topic Maps interchange standard.

The Topic Maps standard does not force information providers to use topic maps for internal representation. Information suppliers can use relational, XML or object databases with different schemas to represent information. The only requirement is to provide a "topic map view" using the interchange standard.

If topic map views are available, information from multiple suppliers can be aggregated. This aggregation can be done by aggregators (such as Google, Yahoo, MSN) and/or directly on the desktop. This reminds us of the world of RSS, except that we are interested in distributing and aggregating topic maps instead of RSS feeds. An inspiring preview of desktop topic map aggregation can be found in Steve Pepper's presentation "Seamless Knowledge with TMRAP".
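A rough Python sketch of such aggregation is shown below. It is a deliberately simplified illustration, not the full merging procedure of the standard: it only applies the rule that topics sharing a subject identifier represent the same subject, and the dictionary layout, the `merge_views` function and the example URIs are all assumptions made for this sketch.

```python
def merge_views(*views):
    """Merge topic map views from several suppliers.

    Each view is a list of topics shaped like
    {"identifiers": set of URIs, "names": set, "occurrences": set}.
    Topics that share at least one subject identifier are merged into one.
    """
    merged = []
    for view in views:
        for topic in view:
            ids = set(topic["identifiers"])
            names = set(topic.get("names", ()))
            occurrences = set(topic.get("occurrences", ()))
            remaining = []
            for other in merged:
                if other["identifiers"] & ids:
                    # Same subject: absorb the topic that was merged earlier.
                    ids |= other["identifiers"]
                    names |= other["names"]
                    occurrences |= other["occurrences"]
                else:
                    remaining.append(other)
            remaining.append({"identifiers": ids, "names": names, "occurrences": occurrences})
            merged = remaining
    return merged


# Hypothetical views from two suppliers (example.org URIs are placeholders).
supplier_a = [{
    "identifiers": {"http://psi.example.org/puccini"},
    "names": {"Giacomo Puccini"},
    "occurrences": {"http://a.example.org/puccini-biography"},
}]
supplier_b = [
    {
        "identifiers": {"http://psi.example.org/puccini"},
        "names": {"Puccini"},
        "occurrences": {"http://b.example.org/puccini-works"},
    },
    {
        "identifiers": {"http://psi.example.org/tosca"},
        "names": {"Tosca"},
        "occurrences": set(),
    },
]
aggregated = merge_views(supplier_a, supplier_b)
```

After merging there is one subject proxy for Puccini which carries the names and resource links from both suppliers, and a separate proxy for Tosca.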

The problem of search against a network of resources is replaced by the problem of search against a network of subject proxies and resources. The second approach can provide a better user experience because it effectively bridges the gap between resources, subjects and users.

Posted at 12:52 PM


Fri - October 1, 2004

Smart search, Yahoo, iTunes and... Topic Maps


I was thinking about who could provide the kind of subject proxy service which I described in my previous posting.

The first company which came to my mind was Yahoo. They already manage a huge directory of subject proxies. So, for example, if I am interested in philosophy, there is already a subject proxy for this topic and a subject proxy for the class Philosophers. The web page which represents this subject proxy has a list of philosophers with some comments. Very subject centric! (Almost: there are some links to resources there.) Let's click on the Martin Heidegger link, for example. Hmmm... I see links to several resources related to Martin Heidegger, with some comments. In a subject centric environment I expect more. I would like to see a summary of facts about this philosopher, with links to other subject proxies. These facts can be aggregated from several different sources. This set of facts is the first thing I would like to know and see. After that I should have links to different resources: original publications, comments, reviews, related works, news, pictures, blogs, RSS feeds, etc. And with Yahoo personalization I should be able to specify that Martin Heidegger is a topic of interest to me, so I can easily get access to facts and resources about this topic.

What about Google (directory)? From the perspective of subject centric computing it is very close to the Yahoo directory. For example, there is a page for Philosophers and a page for Martin Heidegger. Unfortunately, again, there is a mixture of "subject" and resource links, and no "facts".

My second thought was about Apple's iTunes music store. True, it has a limited "ontology". But it is very subject centric. We can find "subject pages" for artists, albums and genres. We can get a biography and links to influencers and contemporaries. Search provides results grouped by subject classes: albums, artists, songs.

I would not mind having an iTunes-like subject centric service as part of an extended .Mac. .Mac can also provide some interesting ideas about subject proxy synchronization. Think about iDisk extended to the idea of subject proxies. I can use my "subject proxies" in a disconnected scenario. I can add new proxies, facts and links to resources. They can be synchronized with a "subject proxy server" and with local copies on my different computers. And with Apple's seamless network connectivity I can even share some of my subject proxies, comments and resources with my friends sitting somewhere in Starbucks.

Posted at 08:46 PM


Tue - September 28, 2004

Smart search, blinkx and ... Topic Maps


I was experimenting with blinkx recently, and I tried to understand how close to (or far from) the subject centric computing model it is.

Blinkx nicely "attaches" itself to Internet Explorer, Word and several other programs. When you look at a resource using these programs, you can select a phrase and blinkx will try to find available resources which are related to the concept(s) in the selected phrase. Blinkx has several channels for resources: local drive, internet, news, products, video clips and web logs. The list of channels can be extended. That's nice.

What is not so nice, I think, is that the concept of subjects is hidden and not available to users. When I select a phrase, blinkx tries to find "ideas not keywords" behind this phrase. But I cannot really see what blinkx's guess is. I can only see resources which are somehow related to blinkx's guess.

I think better results can be achieved if we introduce subject proxies explicitly and allow users to manage the subjects of their interest. I would split channels into two groups. The first group represents subjects which I am interested in: People, Projects, Technologies, Products, etc. The second group represents resources such as News, Reports, Video Clips, etc.

When I select a phrase on a web page or in a document, entries in all channels (resources and subjects) can be activated with different "relevancy" levels. I personally will go in most cases to the "subject" channels and will jump to a "subject page", which is a summary of information about a specific subject combined from different sources. So when I select the word "Troy" on a web page, I would like to have activated the subject proxy for the city "Troy" in my City channel, the subject proxy for the movie "Troy" in my Movie channel, and in my DVD channel... well... the DVD "Troy". Relevancy can be assigned based on whatever algorithms are available for web page "subject scanning".

As a next step I can jump, for example, to Troy-movie. My smart search client (with my help) now "knows" better what I am really interested in. It can go to the server(s) and retrieve summary information about Troy-movie. Troy-movie is in my focus now, and it changes the activation level of different resources and subject proxies.

So if I look at the People channel, I will probably see several names related to this movie. I can jump to one of these names, and the activation level of resources and subject proxies will change again.

Now, let's say I selected the name of a person on a web page and my smart search client cannot give me any reasonable suggestions. Well... I drag and drop this name to the People channel and a new local subject proxy is created. As a next step, my smart search client goes to the server environment and tries to find a well-known subject matching my local subject. If there are some suggestions and I agree with one of them, my local proxy becomes connected with the "world wide" subject proxy network.
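The selection flow described so far could be sketched in Python roughly as follows. This is only a toy model with assumed names (`SubjectChannels`, `activate`, `drop`) and a naive name-matching "relevancy". It has nothing to do with how blinkx works internally, and the server lookup step is left out.

```python
from collections import defaultdict


class SubjectChannels:
    """Toy model of subject channels in a smart search client."""

    def __init__(self):
        # channel name -> {subject id: set of lowercase names}
        self.channels = defaultdict(dict)

    def add(self, channel, subject_id, *names):
        self.channels[channel][subject_id] = {name.lower() for name in names}

    def activate(self, phrase):
        """Return (channel, subject id, relevancy) for each matching proxy."""
        text = phrase.lower()
        words = set(text.split())
        hits = []
        for channel, subjects in self.channels.items():
            for subject_id, names in subjects.items():
                if text in names:
                    hits.append((channel, subject_id, 1.0))   # exact name match
                elif any(words & set(name.split()) for name in names):
                    hits.append((channel, subject_id, 0.5))   # partial name match
        return sorted(hits, key=lambda hit: hit[2], reverse=True)

    def drop(self, phrase, channel):
        """Drag and drop: create a new local subject proxy in a channel."""
        subject_id = "local:" + phrase.lower().replace(" ", "-")
        self.add(channel, subject_id, phrase)
        return subject_id


client = SubjectChannels()
client.add("City", "troy-city", "Troy")
client.add("Movie", "troy-movie", "Troy")
client.add("People", "brad-pitt", "Brad Pitt")

troy_hits = client.activate("Troy")
unknown_hits = client.activate("Wolfgang Petersen")   # no suggestions yet
new_id = client.drop("Wolfgang Petersen", "People")   # local proxy is created
```

Selecting "Troy" activates proxies in both the City and the Movie channel. The unknown name gives no suggestions until I drop it into the People channel, after which it is an ordinary local subject proxy that could later be matched against public ones.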

The server environment can monitor "false" subject requests and create new public subject proxies for subjects which become "popular".

That's how I see smart search...

Posted at 08:25 PM


Wed - September 1, 2004

Apple's Spotlight, What do we search for and ... Topic Maps


I recently enjoyed watching the "Tiger" presentation, and specifically the presentation of Apple's new search technology, "Spotlight".

Like many other people, I would like to have this kind of search now on OS X, Windows and Linux computers. I would also like to have this kind of search for enterprise document repositories.

What I cannot find in this demonstration is an explicit concept of "subjects" or "topics". If I select the name of a person in an email, for example, I can find all emails, presentations, calendar entries, documents, images, etc. which reference this name in a file name, metatags or document content. But can I find all projects which I manage? Can I find all applications which I am responsible for? Can I find all servers which I have to check from time to time, or all technologies which I am interested in? Projects, applications, servers and technologies are subjects in my area of interest.

When I search, I would like to search not only for resources which reference my favorite subjects, but also for other subjects which are connected with the subject in focus.

So I will probably add a topic map engine to Spotlight on my OS X computer as soon as Tiger is available. How will I use the topic map engine? I will use it to define subjects which are not covered by standard OS X applications. I will use it to manage relationships between subjects in my area of interest. I will also create a script which creates pseudo-documents (in HTML format?) for each subject. Each pseudo-document will have all names, inline occurrences and associations. I can also create document proxies for external resources which are not located on my hard drive (if Spotlight/Safari do not allow attaching custom metatags to bookmarked URIs).
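Such a script could look roughly like the Python sketch below. The dictionary layout of a subject and the `subject_page` function are my assumptions, not the export format of any particular topic map engine, and whether Spotlight would pick up the `subject-class` meta tag is also just an assumption.

```python
from html import escape


def subject_page(subject):
    """Render one subject as a small HTML pseudo-document."""
    names = subject["names"]
    parts = [
        "<html><head>",
        f"<title>{escape(names[0])}</title>",
        # Hypothetical meta tag carrying the subject class.
        f'<meta name="subject-class" content="{escape(subject["class"])}">',
        "</head><body>",
        f"<h1>{escape(names[0])}</h1>",
    ]
    if names[1:]:
        parts.append("<p>Also known as: " + escape(", ".join(names[1:])) + "</p>")
    parts.append("<ul>")
    # Inline occurrences: simple (label, value) facts about the subject.
    for label, value in subject.get("occurrences", []):
        parts.append(f"<li>{escape(label)}: {escape(value)}</li>")
    # Associations: links to the pseudo-documents of related subjects.
    for role, other, href in subject.get("associations", []):
        parts.append(f'<li>{escape(role)}: <a href="{escape(href)}">{escape(other)}</a></li>')
    parts.append("</ul></body></html>")
    return "\n".join(parts)


page = subject_page({
    "names": ["Intranet portal", "Portal"],
    "class": "projects",
    "occurrences": [("status", "in progress")],
    "associations": [("runs on", "web01", "web01.html")],
})
```

Because associations become ordinary links between pseudo-documents, a plain browser is enough to navigate from one subject to another.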

It seems that Spotlight allows defining custom document categories/types. So I can define pseudo-document types for my subject classes, such as "projects", "applications", "people", "servers", "companies", "technologies", etc. Now I can use the standard system-wide Spotlight engine to search subjects and resources. And I can use Safari to navigate between different subjects.

Posted at 09:03 PM


Tue - January 27, 2004

(About) IBM's content management strategy


I recently had a chance to look at IBM's direction in content management. There are several solutions for content management in IBM's portfolio right now. Some of them can work together. Some of them overlap. And some of them do not have good integration with others...

The strategic plan (as I understood it) is to leverage JSR 170 to provide a general unified interface to all content stores/products. In addition to JSR 170, WebDAV can be used to some extent.

It looks like JSR 170 will address basic features important for content management in the Java world. I am personally interested in a comparison of JSR 170 with Microsoft's WinFS (and of both with Topic Maps).

Relevancy to Topic Maps?

JSR 170 (and WinFS) allows representing metadata and relationships between various subjects.
There is a basic (content) schema and the ability to extend the basic schema.

What I think is interesting for the Topic Maps community is the ability in the future to leverage metadata initiatives supported by IBM and Microsoft. I hope that Topic Map engines will be able to import (maybe virtually) metadata from both systems. Stay tuned for Topic Map Thoughts about some improvements required for TMDM to support this kind of import.

I am sure it will also be possible to represent topic maps as a "content/metadata provider" for JSR 170 (and WinFS) based systems.




Posted at 02:45 PM

