Flawed “Growth of Embedded Software Related Patents” Paper Appears in June 2003 Issue of the Journal Technovation


Abstract
Authors Douglas McQueen and Henrik Olsson have published a study of the growth of embedded software related patents in the June 2003 issue of the journal Technovation. Unfortunately, their methods are highly questionable and the dataset they used to perform the analysis is probably highly flawed.

I have to admit getting a little excited when this paper first came across my alert screen (the abstract is available from ScienceDirect). I was anticipating an interesting study and another example I could point to in order to demonstrate the utility of patent analysis for relevant business questions. However, in the back of my mind a nagging little voice warned, “Don’t get too excited, there is a good chance the assumptions that went into their search are way off the mark.” Have a look at the abstract and you will see what I mean.

“The distribution of embedded software related patent applications across 118 IPC patent classes has been determined for 1988, 1993 and 1998 using a bibliometric technique. The patent applications were identified using search words based on patent claims concerning various aspects of embedded software. The number of these patent application assignments each year increased at an annual rate of about 17% while the distribution over IPC patent classes narrowed only slightly. In terms of patent classification sections, in 1998 over 41% of all embedded software related patent application assignments were in section G (Physics) while section H (Electricity) contained another 36% of them. In terms of patent classes, the most important patent classes are H04 (Electric communication technique) and G06 (Computing, calculating, counting) accounting for a little more than 40% of all embedded software patent application assignments. In only two patent classes, H04 and G06, is there a majority of embedded software related patent applications. Since patent application assignments are mainly according to the character of the problem to be solved and its solution, rather than the field of the application of the solution found, these statistics relate to the technical problems solved by the inventions.
This is in contrast to economic statistics on the distribution of embedded software over branches of industry that characterizes the application of the problem solutions.”

It turns out that the little voice was right. The article is flawed on several levels, and because of the way the search was done, the conclusions of the analysis are almost certainly way off.

Even before getting to the search, I totally disagree with another of the authors’ assumptions. They write, “We have chosen to study patent applications rather than granted patents because the applications are nearer the respective inventions than are the granted patents.” There are a number of problems with this statement. First and foremost, granted patents have been examined, and regardless of your opinion on patent quality, they are going to be better indicators than applications that may never grant for any number of reasons, including lack of commercial application, lack of inventive step, etc. Some useful information can certainly be gleaned from applications, but ignoring granted patents out of hand seems like a poor decision.

Getting back to the search itself: the authors started by looking at words found in the patent claims of 24 US and 22 EP patent documents and attempted to identify words “with a very strong exclusive relation to embedded software”. They did this by stripping away obvious non-content-bearing terms and words that could have multiple meanings. The remaining words were treated as follows: “For each candidate search word the number of patent documents in which the word was found at least once in 1993 was generated as well as a list of the first 500 such documents. (In a few cases this latter list was less than 500 documents long.)
The first 10 and last 10 patent documents were checked to determine whether they indeed related to embedded software.” Thus, by sampling only 20 documents, about 4% of each 500-document list, they claim to have chosen the correct words for a search strategy. This approach might be acceptable if one expected to retrieve a large percentage of false hits and then refined the set further to enhance its relevance. For instance, one of the words used was “data”; in their examination of 20 documents all 20 were on target, but “data” could just as easily apply to any piece of discovered information, whether it came from software, a computer or human observation. The authors, however, stopped working on their search strategy at this point and used these terms alone to build their query.

The fact that they then started counting occurrences of IPC codes within this collection, assuming that every document in it discussed embedded software, might be the most egregious thing about this paper. So the authors started with a data set that probably contains a large number of irrelevant documents and then assigned relative importance based on the distribution of IPC codes within this set over time.

As if this were not bad enough, they counted the number of patent applications in each patent class by summing the occurrences of the sub-classes. They understand that documents can be assigned more than one IPC code and that this will skew their numbers. What they might not have realized is that Derwent, whom they used for this part of the study, adds IPC classes from family members when they differ from what is listed on the basic patent, so there are even more IPC codes to choose from, some assigned by relatively dubious sources. In several cases they end up with a ratio of embedded software documents to total number of patent applications within the IPC class of greater than 1.
In other words, there were more codes than actual documents in these cases. Why they did not simply use only the main IPC code is beyond me, since this would eliminate the problem of additional IPC codes and would represent the essence of what was supposed to be captured by the document.

I could go on and on about this study, but I will leave it to readers to have a look at the paper, if they like, and come up with more problems themselves. The bottom line is that studies like this give the field of patent analysis a bad name, and one has to be concerned that the editors of Technovation could not tell that this study was severely flawed. Even scarier is the thought of a business leader using this study to make decisions.
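
For anyone who wants to see the counting problem described above in concrete terms, here is a minimal sketch in Python. The three records are invented purely for illustration, and the function names are hypothetical; nothing here comes from the paper or from Derwent. It simply compares summing every assigned IPC code against counting each document once under its main IPC code.

```python
from collections import Counter

# Hypothetical toy records: (document id, main IPC code, all assigned IPC codes).
# These are made-up examples, not data from the paper.
documents = [
    ("doc1", "G06F 9/44", ["G06F 9/44", "G06F 11/36", "H04L 12/28"]),
    ("doc2", "H04L 12/28", ["H04L 12/28", "H04L 29/06", "G06F 13/00"]),
    ("doc3", "G06F 13/00", ["G06F 13/00", "G06F 9/44"]),
]


def ipc_class(code):
    """Return the three-character IPC class, e.g. 'G06' from 'G06F 9/44'."""
    return code[:3]


def count_all_assignments(docs):
    """Count every code occurrence, so one document can add several to a class."""
    counts = Counter()
    for _doc_id, _main, codes in docs:
        for code in codes:
            counts[ipc_class(code)] += 1
    return counts


def count_main_only(docs):
    """Count each document exactly once, under the class of its main IPC code."""
    return Counter(ipc_class(main) for _doc_id, main, _codes in docs)


all_counts = count_all_assignments(documents)
main_counts = count_main_only(documents)

for cls in sorted(all_counts):
    print(cls, "all assignments:", all_counts[cls], "main code only:", main_counts[cls])
```

In this toy set, summing every assignment credits class G06 with five "applications" even though only three documents exist, which is how a ratio greater than 1 can arise. Counting by main code alone always adds up to the number of documents.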
Posted: Mon - April 14, 2003 at 08:48 PM


© Anthony Trippe