On OSS as prior art, tagging coding semantics, and folksonomies
This will probably be my most buzzword-filled blog post title of the year!
On Jan 10 IBM announced that, in collaboration with the US Patent and Trademark Office (USPTO) and the global open source software development community, it is initiating three projects to "improve patent quality":
- the creation of an Open Patent Review process, which will allow experts in industry to contribute to the review of pending patent applications
- a project to make it easier to use the body of available Open Source Software as prior art in patent applications
- the creation of a Patent Quality Index, which will be used to help tune the patent approval process and (hopefully) incent people to draft better patent applications.
All three of these initiatives are interesting, but it's the second (the project to make OSS searchable as prior art) that I think is particularly so from a technology point of view.
The idea is that IBM will work with OSDL, Novell, Red Hat, SourceForge and others to develop a system that "stores source code in an electronically searchable format [that] satisfies legal requirements to qualify as prior art". The core of this scheme will be a taxonomy that can be used by developers, patent examiners and others to "describe and help locate relevant source code and documentation".
This is interesting for a number of reasons.
First off, it's a fascinating potential application of tagging. Of course, annotation has been applied to source code for a long time (both automatically, in the compilation process, and more recently in languages like C# and Java and in vendor-specific SQL hints). But this proposal is about people annotating, on a grand scale, the semantics and patterns of the design that lies behind the source code. It's hugely ambitious.
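To make the idea of tagging design semantics a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `design_tags` decorator, the `TAG_INDEX` dictionary, the `find_by_tag` helper and the tag names themselves are hypothetical, and none of it reflects how the proposed IBM/USPTO system would actually work.

```python
from collections import defaultdict

# Hypothetical in-memory index: tag -> names of the functions carrying it.
TAG_INDEX = defaultdict(set)


def design_tags(*tags):
    """Attach free-form design tags to a function and record them in TAG_INDEX."""
    def decorator(func):
        # Normalise case so "Cache" and "cache" count as the same tag.
        func.design_tags = frozenset(tag.lower() for tag in tags)
        for tag in func.design_tags:
            TAG_INDEX[tag].add(func.__name__)
        return func
    return decorator


@design_tags("cache", "eviction-policy")
def evict_oldest(cache):
    """Remove the first-inserted entry (dicts preserve insertion order)."""
    oldest_key = next(iter(cache))
    del cache[oldest_key]


@design_tags("eviction-policy", "time-to-live")
def evict_expired(cache, now):
    """Remove entries whose expiry timestamp has passed."""
    for key in [k for k, (_, expires) in cache.items() if expires <= now]:
        del cache[key]


def find_by_tag(tag):
    """Return the sorted names of all functions carrying the given tag."""
    return sorted(TAG_INDEX.get(tag.lower(), ()))
```

In this toy version, someone looking for prior art on cache eviction could call `find_by_tag("eviction-policy")` and get both functions back. The hard part, of course, is not the lookup but agreeing on what the tags should be.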
A mass of questions springs to mind, but a couple of key ones are: who will create the taxonomy, and what kind of process will be involved in its creation? As things like the Dewey Decimal System show, taxonomies can be great for "putting information in" to a system but lousy at helping regular punters get information out.
Given the organic, open and participatory nature of the open source software lifecycle, it strikes me that a folksonomy might work better than a taxonomy imposed from the top down. Although (as Clay Shirky points out) we are likely to have the kind of "expert catalogers and expert users" that make a formal ontology usable, what we don't have (I believe) is the kind of "stable and restricted entities, and clear edges" which would be required. Moreover, we don't have "coordinated users".
However, the challenge with a folksonomy is that the community-generated categorisation scheme – whatever it might be – would have to evolve to a relatively stable point before it could be useful in something like patent prior art searching.
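One rough way to think about "a relatively stable point" is to compare the most popular tags at two moments in time and see how much they overlap. The sketch below is purely illustrative: the `vocabulary_overlap` function, the choice of Jaccard overlap on the top three tags, and the sample data are all my own assumptions, not part of any announced scheme.

```python
from collections import Counter


def top_tags(assignments, k):
    """Return the set of the k most frequently applied tags."""
    return {tag for tag, _ in Counter(assignments).most_common(k)}


def vocabulary_overlap(earlier, later, k=3):
    """Jaccard overlap of the top-k tags in two snapshots (1.0 means identical)."""
    a, b = top_tags(earlier, k), top_tags(later, k)
    if not a and not b:
        return 1.0  # two empty vocabularies are trivially identical
    return len(a & b) / len(a | b)


# Made-up tag assignments for the same body of code, six months apart.
january = ["cache"] * 5 + ["caching"] * 4 + ["lru"] * 3 + ["memoize"]
june = ["cache"] * 9 + ["lru"] * 6 + ["eviction"] * 4 + ["caching"]

print(vocabulary_overlap(january, june))  # 0.5
```

A score that stays close to 1.0 from one snapshot to the next would suggest the community's vocabulary has settled; a score that keeps jumping around would suggest it is still too fluid for a patent examiner to rely on.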
Perhaps we need some professional tag gardeners to help chart a middle ground between the "folk" and the "tax"...?