ACS Meeting Talk on Comparing Source Titles and Abstracts to Enhanced Titles and Abstracts
Original Abstract:
In recent years, text-mining software has been developed that allows analysts to organize and visualize large collections of documents without having to read and manually place each individual document. In particular, ThemeScape from Aurigin Systems allows large collections of documents to be clustered based on co-occurrence of subject topics or themes. Once similar documents are grouped based on shared content, they are visualized using a topographical map representation. These maps allow relative document density to be measured based on the height of the content peaks, and they allow secondary relationships within a set to be identified by observing the relative distance between document clusters that are spatially close together. In previous work, a map created from original patent titles and abstracts was compared to a map created using intellectually assigned, hierarchical classification. These maps were similar to one another, but it appeared that the original titles and abstracts were a better analysis source for clustering documents by their function or use. The current study will continue to explore this area by comparing a map created from the original titles and abstracts to a map of the identical collection of chemical patent documents created using enhanced titles and abstracts produced by Chemical Abstracts Service and Derwent Information. The discussion will revolve around differences and similarities in each approach and will attempt to provide information on which source provides the most valuable insight under different circumstances.
I gave this talk at the 223rd American Chemical Society meeting in Orlando, FL, and the slides are attached.
Visual_Chem_Pats.ppt
The funny thing about this talk is that I wrote the abstract for it in the fall of 2001. By the time 2002 came along and I actually had to write the presentation, I had left Aurigin Systems (which had declared bankruptcy), and I ended up using software from OmniViz instead of ThemeScape from Aurigin. I probably could have gotten someone from Aurigin to let me use ThemeScape, but it seemed a lot easier to use OmniViz, which is a close cousin to ThemeScape (both were developed by Battelle) and had recently been introduced to me by the session organizer, Lou O'Korn. As it turned out, I learned a great deal about OmniViz and have been a supporter ever since.
I was genuinely curious about what value the so-called "value added indexing" providers added to patent abstracts when they rewrote them. It has been my experience that, most of the time, the rewritten abstracts are very similar to the ones found in the original documents and therefore provide very little extra value. This study was done on a very small set of data and is in no way statistically relevant, but it appeared to me that I was just as likely to get interesting results from text mining patent documents with the source titles and abstracts as I was with the ones from CAS and Derwent. I will leave it to the reader to look at the limited data I included and draw their own conclusions.
Posted: Sun - March 30, 2003 at 07:11 PM