Posted to java-user@lucene.apache.org by Jos van der Meer <jm...@aidministrator.nl> on 2003/07/01 16:16:16 UTC

Lucene + controlled vocabulary + guided exploration

We use Lucene to offer end-users a controlled vocabulary, plus
continuous feedback about all combinations of concepts that occur
in the documents.

Creating a controlled vocabulary is no harder for you than constructing
a good, selective query is for the average user. However, the benefits of
having a controlled vocabulary are much greater.

This is the setup:

	- have a lot of unstructured documents (locally or on the web)
	- index them with Lucene.
	
So far nothing new, but now:

	- create a controlled vocabulary:
		* For each concept:
		  	write down a corresponding Lucene query
		* Group related concepts
		
An example controlled vocabulary is "abnamro.xml"
http://spectacle.aidministrator.nl/spectacle/channel/lucabn/files/abnamro.xml
(by default the concept name itself is the Lucene query.)
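
To make the structure concrete, here is a minimal sketch in Python of what
such a vocabulary boils down to: concepts collected in groups, each with an
optional Lucene query string that defaults to the concept name. The class,
the group names and the queries are made up for illustration; this is not
the actual layout of abnamro.xml.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Concept:
    name: str
    # Lucene query string for this concept; None means "use the name itself".
    query: Optional[str] = None

    def effective_query(self) -> str:
        return self.query if self.query is not None else self.name


# Hypothetical vocabulary: group name -> related concepts.
VOCABULARY: Dict[str, List[Concept]] = {
    "Products": [
        Concept("mortgage", "mortgage OR hypotheek"),
        Concept("insurance"),  # no query given, so the name is the query
    ],
    "Regions": [
        Concept("Europe", 'europe OR "european union"'),
        Concept("Asia"),
    ],
}


def queries_by_concept(vocabulary: Dict[str, List[Concept]]) -> Dict[str, str]:
    """Flatten the groups into a concept name -> query string mapping."""
    return {
        concept.name: concept.effective_query()
        for concepts in vocabulary.values()
        for concept in concepts
    }
```

The flattened mapping is what you would hand to Lucene, one query per
concept; the groups only matter for presentation.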

Download Spectacle:Server from http://spectacle.aidministrator.nl/ and
offer guided exploration (see below) over the results. 

You and your end-users will be more successful in exploration 
("find valuable results you were not looking for") and regular finding 
("find what you were looking for").
		
The results on a small set of documents are at:

	http://spectacle.aidministrator.nl/spectacle/channel/lucabn
	
It's not the best demo ever, but it is a start. Try it on your own data,
create your own controlled vocabulary and see for yourself. You'll find
all the source code you need in the Spectacle:Server download.

You're invited to forward this mail, or to reply to:

	
jos.van.der.meer@aidministrator.nl
aidministrator nederland bv  -  http://www.aidministrator.nl/
prinses julianaplein 14-b, 3817 cs amersfoort, the netherlands
tel. +31-(0)33-4659987   fax. +31-(0)33-4659987


=============================================================================
Background: 
=============================================================================

	* In fact we are using Lucene as a poor man's classifier.
	  A document is considered to contain a concept "A" if the document
	  matches the query specified for "A".
	  This mechanism is cheap not only because Lucene is free,
	  but also because creating the characteristics of a concept
	  (in this case a query) is simple and elegant. It takes a lot
	  less time than e.g. assembling a training set for each concept.
	  
	* Even though it's a poor man's solution, it performs
	  surprisingly well. The reason is that the query can deviate a lot
	  from the concept itself and a lot of human expertise can be
	  'implemented' in the query definition.
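
The classification rule above can be sketched in a few lines of Python.
Note the assumptions: matches() is only a stand-in for a real Lucene
search (it treats a query as a list of alternative terms), and the
concepts and documents are invented for the example.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query_terms: List[str], text: str) -> bool:
    """Stand-in for a Lucene search: true if any query term occurs in the text."""
    return bool(tokenize(text) & set(query_terms))


def classify(documents: Dict[str, str],
             concept_queries: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """A document contains concept A if it matches the query specified for A."""
    return {
        doc_id: {name for name, terms in concept_queries.items()
                 if matches(terms, text)}
        for doc_id, text in documents.items()
    }


# Hypothetical concepts; each query encodes some expertise (synonyms, Dutch terms).
CONCEPT_QUERIES = {
    "mortgage": ["mortgage", "mortgages", "hypotheek"],
    "insurance": ["insurance", "insurer"],
}

# Hypothetical documents.
DOCUMENTS = {
    "d1": "New hypotheek rates announced",
    "d2": "The insurer raised premiums on mortgages",
    "d3": "Quarterly results",
}

labels = classify(DOCUMENTS, CONCEPT_QUERIES)
```

Document d1 ends up under "mortgage" although the word itself never
appears in it, which is the point of letting the query deviate from the
concept name.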

=============================================================================
Guided exploration consists of: 
=============================================================================

    - continuous feedback to the user about the
      options which limit the number of matching documents.
        -- there is no need for the user to type in anything.
           Spectacle shows what it knows.
        -- eliminates the famous "no hits" behaviour of
           so many traditional sites
        -- enables guided specialization whenever the current
           result set is too large.
    - continuous feedback to the user about the
      options which lead to alternatives for the current set of matching
      documents.
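
The two kinds of feedback can be illustrated with a small Python sketch.
It is one possible reading of the description above, not Spectacle's
actual implementation; the function names and the document-to-concept
table are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Set


def matching_docs(doc_concepts: Dict[str, Set[str]], selected: Set[str]) -> Set[str]:
    """Documents that contain every selected concept."""
    return {doc for doc, concepts in doc_concepts.items() if selected <= concepts}


def refinements(doc_concepts: Dict[str, Set[str]], selected: Set[str]) -> Dict[str, int]:
    """Concepts that narrow the current result set, with the hit count each would give.

    Only concepts that co-occur with the selection are counted, so an
    offered option can never lead to "no hits".
    """
    counts: Counter = Counter()
    for doc in matching_docs(doc_concepts, selected):
        counts.update(doc_concepts[doc] - selected)
    return dict(counts)


def alternatives(doc_concepts: Dict[str, Set[str]],
                 selected: Set[str]) -> Dict[str, Dict[str, int]]:
    """For each selected concept, the concepts that could replace it and still give hits."""
    result = {}
    for concept in selected:
        options = refinements(doc_concepts, selected - {concept})
        options.pop(concept, None)  # the concept is not an alternative to itself
        result[concept] = options
    return result


# Hypothetical classification result: document id -> concepts it contains.
DOC_CONCEPTS = {
    "d1": {"mortgage", "Europe"},
    "d2": {"mortgage", "insurance", "Europe"},
    "d3": {"insurance", "Asia"},
    "d4": {"mortgage", "Asia"},
}
```

With "mortgage" and "Europe" selected, refinements() offers only
"insurance"; "Asia" is left out because no document combines it with the
current selection.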



