You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Nicolas Maisonneuve <n....@gmail.com> on 2005/03/09 18:29:12 UTC

lucene index with structured fields

hy everybody,

I would like use a index with structured search field.

- flat index (lucene type)
searchfield1
searchfield2
searchffield3
...
-structured index 
search1
   search2
   	search4
   search3
	search5

to allow simple extensions of some search features: 
- the query TermQuery("search2", "coco" )  search in search2 and
search4 fields,
- The score depend of the depth where the word is found : A document
where "coco" is found in search4 field has a score lower than a
document with "coco" found in search2

How {do with,hack} lucene to integrated easily this notion of
structured field  ? (no fuzzy methods allow because of the
performance) ?

thanks in advance,

Nicolas Maisonneuve

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene index with structured fields

Posted by Nicolas Maisonneuve <n....@gmail.com>.
oups .. 

2- According to the table , a new Query is created with just non zero
boost search field search2:coco^search2_boost
search4:coco^search4_boost


On Wed, 9 Mar 2005 18:53:41 +0100, Nicolas Maisonneuve
<n....@gmail.com> wrote:
> hmm a idea  would be to create dynamically a boost table  (field,
> boostvalue) in a extended version of a TermQuery depending on the
> field:
> ex: StructuredTermQuery ("search2", "coco")
> 
> 1-> Update a boost table according to:
> 
> *  the virtual structure :
> 
> > -structured index
> > search1
> >    search2
> >         search4
> >    search3
> >         search5
> 
> *  the fact the score is decrease with the dept :
> 
> *  the search field argument, here "search2"
> 
> So for this exemple the table would be =  (search1_boost=0,
> search2_boost=1, search3_boost=0, search4_boost=0.7)
> 
> 2- According to the table , a new Query is created with just non zero
> boost search field search2:coco^search1_boost
> search4:coco^search2_boost
> 
> WDYT ?
> 
> other proposals more flexible, clever, faster ?
> 
> nicolas maisonneuve
> 
> On Wed, 9 Mar 2005 18:29:12 +0100, Nicolas Maisonneuve
> <n....@gmail.com> wrote:
> > hy everybody,
> >
> > I would like use a index with structured search field.
> >
> > - flat index (lucene type)
> > searchfield1
> > searchfield2
> > searchffield3
> > ...
> 
> >
> > to allow simple extensions of some search features:
> > - the query TermQuery("search2", "coco" )  search in search2 and
> > search4 fields,
> > - The score depend of the depth where the word is found : A document
> > where "coco" is found in search4 field has a score lower than a
> > document with "coco" found in search2
> >
> > How {do with,hack} lucene to integrated easily this notion of
> > structured field  ? (no fuzzy methods allow because of the
> > performance) ?
> >
> > thanks in advance,
> >
> > Nicolas Maisonneuve
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene index with structured fields

Posted by Nicolas Maisonneuve <n....@gmail.com>.
hmm a idea  would be to create dynamically a boost table  (field,
boostvalue) in a extended version of a TermQuery depending on the
field:
ex: StructuredTermQuery ("search2", "coco")

1-> Update a boost table according to:

*  the virtual structure :

> -structured index
> search1
>    search2
>         search4
>    search3
>         search5

*  the fact the score is decrease with the dept :

*  the search field argument, here "search2"

So for this exemple the table would be =  (search1_boost=0,
search2_boost=1, search3_boost=0, search4_boost=0.7)

2- According to the table , a new Query is created with just non zero
boost search field search2:coco^search1_boost 
search4:coco^search2_boost

WDYT ?

other proposals more flexible, clever, faster ? 

nicolas maisonneuve

On Wed, 9 Mar 2005 18:29:12 +0100, Nicolas Maisonneuve
<n....@gmail.com> wrote:
> hy everybody,
> 
> I would like use a index with structured search field.
> 
> - flat index (lucene type)
> searchfield1
> searchfield2
> searchffield3
> ...

> 
> to allow simple extensions of some search features:
> - the query TermQuery("search2", "coco" )  search in search2 and
> search4 fields,
> - The score depend of the depth where the word is found : A document
> where "coco" is found in search4 field has a score lower than a
> document with "coco" found in search2
> 
> How {do with,hack} lucene to integrated easily this notion of
> structured field  ? (no fuzzy methods allow because of the
> performance) ?
> 
> thanks in advance,
> 
> Nicolas Maisonneuve
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene index with structured fields

Posted by Nicolas Maisonneuve <n....@gmail.com>.
thanks Miles, yes... i missed that the queryParser could transform a query.. 

but i don't like very much my idea: the integration is too high level.
it would be great if a low level of integration will be possible
because of the scorer.

In fact the calcul of score will certainly more complex. Actually it's
for XML searching/ranking,   tf*idf would be computed differently not
in all the index but only in the set of search field, with different
variations depending of the query element location,  the structure of
the xml and the semantic of elements.).

i thought a lower integration of this notions would be needed but
maybe not if just a query extendor and a customised similarity class
are suffisant to do that. i'll try this kind of idea first !

If somebody are another solution, just to compare ... 
 

On Wed, 09 Mar 2005 18:00:21 +0000, Miles Barr
<mi...@runtime-collective.com> wrote:
> On Wed, 2005-03-09 at 18:29 +0100, Nicolas Maisonneuve wrote:
> > I would like use a index with structured search field.
> >
> > - flat index (lucene type)
> > searchfield1
> > searchfield2
> > searchffield3
> > ...
> > -structured index
> > search1
> >    search2
> >       search4
> >    search3
> >       search5
> >
> > to allow simple extensions of some search features:
> > - the query TermQuery("search2", "coco" )  search in search2 and
> > search4 fields,
> > - The score depend of the depth where the word is found : A document
> > where "coco" is found in search4 field has a score lower than a
> > document with "coco" found in search2
> >
> > How {do with,hack} lucene to integrated easily this notion of
> > structured field  ? (no fuzzy methods allow because of the
> > performance) ?
> 
> If you know the structure of the index ahead of time and the weights you
> want to place on the different levels I'd do a query expansion. i.e.
> 
> search2:coco
> 
> would become
> 
> search2:coco^4 OR search4:coco
> 
> but actually creating the query objects rather than generating the
> string to be parsed by the QueryParser.
> 
> --
> Miles Barr <mi...@runtime-collective.com>
> Runtime Collective Ltd.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene index with structured fields

Posted by Miles Barr <mi...@runtime-collective.com>.
On Wed, 2005-03-09 at 18:29 +0100, Nicolas Maisonneuve wrote:
> I would like use a index with structured search field.
> 
> - flat index (lucene type)
> searchfield1
> searchfield2
> searchffield3
> ...
> -structured index 
> search1
>    search2
>    	search4
>    search3
> 	search5
> 
> to allow simple extensions of some search features: 
> - the query TermQuery("search2", "coco" )  search in search2 and
> search4 fields,
> - The score depend of the depth where the word is found : A document
> where "coco" is found in search4 field has a score lower than a
> document with "coco" found in search2
> 
> How {do with,hack} lucene to integrated easily this notion of
> structured field  ? (no fuzzy methods allow because of the
> performance) ?

If you know the structure of the index ahead of time and the weights you
want to place on the different levels I'd do a query expansion. i.e.

search2:coco

would become

search2:coco^4 OR search4:coco

but actually creating the query objects rather than generating the
string to be parsed by the QueryParser.


-- 
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org