Posted to dev@lucene.apache.org by Chris Wildgoose <ch...@knowledge-stream.com> on 2006/07/20 20:19:50 UTC

Using Lucene for Semantic search

I have been working with Lucene for some time, and I am interested in developing a semantic search solution. I was looking into extending Lucene for this. I know this would involve some significant re-engineering of the indexing procedure to support assigning words to nodes within an ontology. In addition, query processing would need to be modified. Has anyone out there gone down this path?


Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Using Lucene for Semantic search

Posted by Chuck Williams <ch...@manawiz.com>.
I have built such a system, although not with Lucene at the time.  I
doubt you need to modify anything in Lucene to achieve this.

You may want to index words, stems and/or concepts from the ontology. 
Concepts from the ontology may relate to words or phrases.  Lucene's
token structure is flexible, supporting all of these.  E.g., you can
create your own Analyzer that looks up words and phrases in your
ontology and then generates appropriate concept tokens that supplement
the word/stem tokens.  Concept tokens can similarly span phrases. 
Presuming you want some kind of word sense disambiguation through
context, you can either integrate your model into the Analyzer or create
a separate pre-processor.
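
As a rough illustration of the concept-token idea, here is a minimal sketch
in plain Java. It deliberately does not use the Lucene API: the Tok class,
the ONTOLOGY map, the concept ids and the analyze() method are all
hypothetical names made up for this example. The only Lucene notion it
borrows is the position increment, where a token with increment 0 sits at
the same position as the token before it. A real version would presumably
live inside a custom Analyzer or token filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ConceptTokenSketch {
    /** Hypothetical stand-in for an analyzer token (not a Lucene class). */
    static final class Tok {
        final String text;   // word or concept id
        final String type;   // "word" or "concept"
        final int posInc;    // 0 = same position as the previous token
        final int span;      // number of words the token covers

        Tok(String text, String type, int posInc, int span) {
            this.text = text;
            this.type = type;
            this.posInc = posInc;
            this.span = span;
        }

        @Override
        public String toString() {
            return text + "[" + type + ",posInc=" + posInc + ",span=" + span + "]";
        }
    }

    // Assumed toy ontology: lower-cased word or phrase -> concept id.
    static final Map<String, String> ONTOLOGY = new LinkedHashMap<>();
    static {
        ONTOLOGY.put("heart attack", "C:myocardial_infarction");
        ONTOLOGY.put("aspirin", "C:acetylsalicylic_acid");
    }

    // Longest phrase (in words) that is looked up in the ontology.
    static final int MAX_PHRASE = 2;

    /** Emits every word token, plus a concept token wherever a word or phrase matches. */
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        String trimmed = text.toLowerCase(Locale.ROOT).trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            out.add(new Tok(words[i], "word", 1, 1));
            // Try the longest phrase starting at this word first.
            for (int len = Math.min(MAX_PHRASE, words.length - i); len >= 1; len--) {
                String phrase = String.join(" ", Arrays.copyOfRange(words, i, i + len));
                String concept = ONTOLOGY.get(phrase);
                if (concept != null) {
                    // Stack the concept on the position of the phrase's first word.
                    out.add(new Tok(concept, "concept", 0, len));
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : analyze("Aspirin after heart attack")) {
            System.out.println(t);
        }
    }
}
```

Word sense disambiguation is left out here; it would decide which concept
to emit when a word or phrase maps to more than one.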

The same Analyzer or a variant of it could be used to map the Query into
tokens to search.  This would support concept-->concept searches, useful
for example in cross-language search.
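
To make the concept-->concept case concrete, here is a small sketch, again
in plain Java rather than the Lucene API. The LEXICON map, the concept ids
and the toConcepts()/matches() helpers are assumptions invented for
illustration. The point is only that documents and queries go through the
same mapping, so a French query can match an English document.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ConceptSearchSketch {
    // Assumed toy bilingual lexicon: word -> language-independent concept id.
    static final Map<String, String> LEXICON = Map.of(
            "dog", "C:dog",
            "chien", "C:dog",
            "house", "C:house",
            "maison", "C:house");

    /** Maps text to the set of concepts it mentions; unknown words are ignored. */
    static Set<String> toConcepts(String text) {
        Set<String> concepts = new LinkedHashSet<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            String concept = LEXICON.get(word);
            if (concept != null) {
                concepts.add(concept);
            }
        }
        return concepts;
    }

    /** True if the document mentions every concept found in the query. */
    static boolean matches(String query, String document) {
        Set<String> queryConcepts = toConcepts(query);
        if (queryConcepts.isEmpty()) {
            return false; // nothing recognisable to search for
        }
        return toConcepts(document).containsAll(queryConcepts);
    }

    public static void main(String[] args) {
        System.out.println(matches("chien", "the dog sleeps in the house"));
        System.out.println(matches("maison chien", "the dog barks"));
    }
}
```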

Word sense disambiguation is generally more difficult in typically short
queries, so there are alternatives worth considering.  E.g., you could
expand queries (or index tokens) into the full set of possibilities
(synonym words or concepts).  If you have an a priori or contextual
ranking of those possibilities, you can generate boosts in Lucene to
reflect that.
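
One possible shape for the expansion-with-boosts approach, sketched in
plain Java. The EXPANSIONS table, its weights and the expand() helper are
hypothetical. The output string follows the classic Lucene query parser
syntax, where term^N applies a boost; this sketch does no escaping of
special characters, so it assumes plain alphabetic terms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class QueryExpansionSketch {
    // Assumed expansion table: word -> (alternative -> ranking used as boost).
    static final Map<String, Map<String, Double>> EXPANSIONS = new LinkedHashMap<>();
    static {
        Map<String, Double> car = new LinkedHashMap<>();
        car.put("automobile", 0.8);
        car.put("vehicle", 0.3);
        EXPANSIONS.put("car", car);
    }

    /** Expands each query word into an OR group of boosted alternatives. */
    static String expand(String query) {
        List<String> clauses = new ArrayList<>();
        for (String word : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            List<String> alternatives = new ArrayList<>();
            alternatives.add(word); // the original term keeps the default boost
            Map<String, Double> expansions = EXPANSIONS.getOrDefault(word, Map.of());
            for (Map.Entry<String, Double> e : expansions.entrySet()) {
                alternatives.add(String.format(Locale.ROOT, "%s^%.1f", e.getKey(), e.getValue()));
            }
            clauses.add(alternatives.size() == 1
                    ? word
                    : "(" + String.join(" OR ", alternatives) + ")");
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand("red car"));
    }
}
```

The same weights could instead be applied at index time if you expand
index tokens rather than queries.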

If all you want is ontological search, those are your hooks.  If you want
more sophisticated query transformations, e.g. for natural language Q&A,
you probably want a custom query pre-processor to generate the specific
queries you want.

Hope these thoughts are useful,

Chuck






Re: Using Lucene for Semantic search

Posted by karl wettin <ka...@gmail.com>.
On Thu, 2006-07-20 at 14:19 -0400, Chris Wildgoose wrote:
> I have been working with Lucene for some time, and I have an interest
> in developing a Semantic Search solution. I was looking into extending
> lucene for this. I know this would involve some significant
> re-engineering of the indexing procedure to support the ability to
> assign words to nodes within an ontology. In addition the query would
> need to be modified. I was wondering whether anyone out there had gone
> down this path? 

I'm not sure what you mean; could you elaborate a bit more? Do you want
to index an RDFS (or similar) store? Do you want to use Lucene as the
primary storage? Or perhaps you just want to classify your documents
along lots and lots of dimensions? Something else?


