Attended 8th Search Engine Meeting in Boston
Abstract
Day Two of the 8th Search Engine Meeting, organized by Infonortics, featured a number of interesting talks on search engines and knowledge management. Presentations by Michael McGlashen of Dow Chemical and Susan Feldman from IDC were especially fascinating.
Unfortunately, I missed Day One of the meeting (yet another instance where that pesky day job, the one that keeps my wife and son in health insurance, got in the way), which featured what looked like some good talks on text mining and taxonomy issues from Vivisimo, Elron Software (why anyone would pick a name so close to Enron is beyond me) and Convera. The staff at Infonortics promise that the presentations will be made available on their website soon, so hopefully I will be able to review them there.
Let’s focus on three of the presentations that I particularly enjoyed and that may have some bearing on the patent analysis space.
The first presentation was from Matthew Koll, of the Wondir Foundation, on a community approach to answering questions on the web. This reminded me a little of Google Answers (click here for an article from a former searcher on this service), except that instead of hiring searchers to answer the questions, the system opens each question up to the community at large. The hope is that the majority of these interactions will take place in real time using instant messaging, so the user can get an immediate response. There is even a ticker that scrolls by with the questions being posed. People are encouraged to register with the organization as experts and accept questions in their area of expertise. If the user likes the answer, they can make a donation to the foundation and “tip” the person who provided it. The idea is to facilitate a virtual community where collective knowledge and experience can be shared. This approach also reminded me of the “Ask the Audience” lifeline on “Who Wants to Be a Millionaire”. How often do you think the majority of the audience was flat wrong on a particular question? (I didn’t watch very often, but it seems to me that the audience got the answer right a lot of the time.) I think I will sign up as a patent analysis expert and see if any questions come my way. It would also be terrific if some PIUG members signed up so that patent information questions can be adequately addressed. As a matter of fact, I believe there will also be an option where PIUG as a community can be represented and given credit for its participation.
Susan Feldman from IDC presented a case for merging text search engines (which she refers to as content engines) with data engines (your typical relational data warehouse) so that mixed queries can be handled. Her premise is that content engines work well on questions such as “who was the first President of the United States?”, which a data engine would be unable to process since it is not a standard SQL-type query. Content engines, on the other hand, don’t do a good job on data questions such as “what were the gross sales for Gatorade in 2002?”. Combining the two would allow a much more complex range of questions to be asked. A system of this sort would understand which portion of the question should be directed to the data engine and which should be answered using the content engine. Since patents represent a mixed collection of data elements (the bibliographic data) and content elements (claims, abstracts, etc.), a simple query system built on this type of approach makes a good deal of sense for a patent database.
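To make the mixed-query idea a little more concrete, here is a minimal sketch of how it might look for patents. It is my own toy illustration, not anything Susan presented: the records, field names and function names are all hypothetical, the "data engine" is just an in-memory SQLite table of bibliographic fields, and the "content engine" is a crude term-overlap ranking over abstracts. Note that the hard part she described, deciding automatically which portion of a question goes to which engine, is not attempted here; the caller does that split by hand through the function arguments.

```python
import sqlite3

# Hypothetical toy records: (publication number, assignee, year, abstract).
PATENTS = [
    ("US-0001", "Acme Corp", 2001, "A polymer coating that resists corrosion on steel pipes."),
    ("US-0002", "Acme Corp", 2002, "Method for indexing text documents with a search engine."),
    ("US-0003", "Globex", 2002, "Corrosion resistant polymer film for marine use."),
]


def build_data_engine():
    """Load the bibliographic (structured) fields into an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patents (pub_no TEXT PRIMARY KEY, assignee TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO patents VALUES (?, ?, ?)",
        [(pub_no, assignee, year) for pub_no, assignee, year, _ in PATENTS],
    )
    return conn


def content_engine(terms):
    """Score each abstract by how many of the query terms it contains."""
    scores = {}
    for pub_no, _, _, abstract in PATENTS:
        words = set(abstract.lower().replace(".", "").split())
        hits = sum(1 for term in terms if term.lower() in words)
        if hits:
            scores[pub_no] = hits
    return scores


def mixed_query(conn, assignee=None, year=None, terms=()):
    """Filter with the data engine, then rank the survivors with the content engine."""
    clauses, params = [], []
    if assignee is not None:
        clauses.append("assignee = ?")
        params.append(assignee)
    if year is not None:
        clauses.append("year = ?")
        params.append(year)
    sql = "SELECT pub_no FROM patents"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    candidates = {row[0] for row in conn.execute(sql, params)}

    if not terms:
        return sorted(candidates)
    scores = content_engine(terms)
    matched = [pub_no for pub_no in candidates if pub_no in scores]
    # Best text match first; ties broken by publication number.
    return sorted(matched, key=lambda pub_no: (-scores[pub_no], pub_no))
```

A question like "which 2002 patents talk about corrosion-resistant polymers?" would then be posed as `mixed_query(build_data_engine(), year=2002, terms=["corrosion", "polymer"])`, with the year going to the SQL side and the subject terms going to the text side.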
Perhaps my favorite talk was from Michael McGlashen of Dow Chemical, who talked about Knowledge Management for Science and Technology R&D. When Dow acquired Union Carbide, they were faced with the problem of assimilating and distributing the vast collection of knowledge (being kept in old file boxes) that was inherited with the purchase. Michael described the process Dow went through to make the archives available electronically and how they indexed the collection (along with the Dow archives). Since these were technical reports, they wanted the information to be structure searchable as well. They were able to accomplish this project with the help of Chemical Abstracts (registry services), Xerox (scanning and text-searchable PDF creation), ClearForest (information extraction and tagging), Battelle SPIRE (information visualization) and IFI (machine-assisted indexing). Michael claims that they were able to accomplish this project at half the cost that Chemical Abstracts had proposed for doing it. Having done some work of this type, I think the folks at Dow made some excellent choices on which vendors to work with.
Harry Collier and the staff at Infonortics are to be congratulated on an excellent meeting that generated a number of provocative ideas.
Posted: Wed - April 9, 2003 at 07:31 PM