Posted to java-user@lucene.apache.org by Nicolas Maisonneuve <n....@gmail.com> on 2005/03/09 18:29:12 UTC
lucene index with structured fields
Hi everybody,
I would like to use an index with structured search fields.
- flat index (lucene type)
searchfield1
searchfield2
searchfield3
...
- structured index
search1
search2
search4
search3
search5
This would allow simple extensions of some search features:
- the query TermQuery("search2", "coco") searches both the search2 and
search4 fields,
- the score depends on the depth at which the word is found: a document
where "coco" is found in the search4 field scores lower than a
document where "coco" is found in search2.
How can I use (or hack) Lucene to integrate this notion of structured
fields easily? (Fuzzy methods are not an option because of
performance.)
thanks in advance,
Nicolas Maisonneuve
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: lucene index with structured fields
Posted by Nicolas Maisonneuve <n....@gmail.com>.
Oops..
2- According to the table, a new query is created with only the
search fields that have a non-zero boost:
search2:coco^search2_boost
search4:coco^search4_boost
On Wed, 9 Mar 2005 18:53:41 +0100, Nicolas Maisonneuve
<n....@gmail.com> wrote:
> 2- According to the table , a new Query is created with just non zero
> boost search field search2:coco^search1_boost
> search4:coco^search2_boost
Re: lucene index with structured fields
Posted by Nicolas Maisonneuve <n....@gmail.com>.
Hmm, one idea would be to dynamically create a boost table (field,
boost value) in an extended version of TermQuery, depending on the
queried field:
e.g. StructuredTermQuery("search2", "coco")
1- Update a boost table according to:
* the virtual structure:
> -structured index
> search1
> search2
> search4
> search3
> search5
* the fact that the score decreases with depth,
* the search field argument, here "search2".
So for this example the table would be (search1_boost=0,
search2_boost=1, search3_boost=0, search4_boost=0.7).
2- According to the table, a new query is created with only the
search fields that have a non-zero boost:
search2:coco^search1_boost
search4:coco^search2_boost
WDYT?
Any other proposals that are more flexible, cleverer, or faster?
nicolas maisonneuve
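Editor's sketch: step 1 above can be expressed with a small tree of field names. This is a minimal sketch in plain Java, not Lucene code. FieldTree, nest and boostTable are hypothetical names. The only nesting assumed is search4 under search2 (the one relation the thread states), and the per-level decay of 0.7 is chosen only because it reproduces search4_boost=0.7 from the example table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: a tree of field names plus a depth-based boost table. */
final class FieldTree {
    // parent field -> child fields, in declaration order (assumed acyclic)
    private final Map<String, List<String>> children = new LinkedHashMap<>();

    /** Declares that child is nested directly under parent. */
    FieldTree nest(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
        return this;
    }

    /**
     * Returns field -> boost for the queried field and its descendants.
     * The queried field gets 1.0; each level below is multiplied by decay.
     * Fields outside the subtree are absent, i.e. treated as boost 0.
     */
    Map<String, Double> boostTable(String queried, double decay) {
        Map<String, Double> table = new LinkedHashMap<>();
        fill(queried, 1.0, decay, table);
        return table;
    }

    // Depth-first walk that records the boost of each field in the subtree.
    private void fill(String field, double boost, double decay,
                      Map<String, Double> table) {
        table.put(field, boost);
        for (String child : children.getOrDefault(field, Collections.<String>emptyList())) {
            fill(child, boost * decay, decay, table);
        }
    }
}
```

Calling new FieldTree().nest("search2", "search4").boostTable("search2", 0.7) yields entries for search2 and search4 only. Leaving the other fields out of the table corresponds to dropping the zero-boost fields in step 2.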
Re: lucene index with structured fields
Posted by Nicolas Maisonneuve <n....@gmail.com>.
Thanks Miles, yes... I missed that the QueryParser could transform a query.
But I don't much like my idea: the integration is too high-level.
It would be great if a lower level of integration were possible,
because of the scorer.
In fact the score calculation will certainly be more complex. It's actually
for XML searching/ranking: tf*idf would be computed differently, not
over the whole index but only over the set of search fields, with different
variations depending on the location of the query element, the structure of
the XML and the semantics of the elements.
I thought a lower-level integration of these notions would be needed, but
maybe not if a query expander and a customised similarity class
are sufficient to do that. I'll try this kind of idea first!
If somebody has another solution, just to compare...
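Editor's sketch: one possible reading of computing idf only over the set of search fields, as a minimal sketch in plain Java. FieldSetIdf and its toy postings map are hypothetical, and the formula only borrows the shape of Lucene's classic default idf, ln(numDocs / (docFreq + 1)) + 1. How the XML structure and the element semantics would vary it is left open in the message.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: idf computed over a chosen set of fields only. */
final class FieldSetIdf {
    /**
     * postings maps a field to the ids of documents containing the term
     * in that field. A document is counted once even if the term occurs
     * in several of the selected fields.
     */
    static int docFreq(Map<String, Set<Integer>> postings, List<String> fields) {
        Set<Integer> docs = new HashSet<>();
        for (String field : fields) {
            docs.addAll(postings.getOrDefault(field, Collections.<Integer>emptySet()));
        }
        return docs.size();
    }

    /** Classic-style idf: ln(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }
}
```

With the fields restricted to search2 and search4, occurrences of the term in search1 no longer affect the document frequency, so the same term can weigh differently depending on which part of the structure is queried.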
On Wed, 09 Mar 2005 18:00:21 +0000, Miles Barr
<mi...@runtime-collective.com> wrote:
> If you know the structure of the index ahead of time and the weights you
> want to place on the different levels I'd do a query expansion. i.e.
>
> search2:coco
>
> would become
>
> search2:coco^4 OR search4:coco
>
> but actually creating the query objects rather than generating the
> string to be parsed by the QueryParser.
Re: lucene index with structured fields
Posted by Miles Barr <mi...@runtime-collective.com>.
If you know the structure of the index ahead of time, and the weights you
want to place on the different levels, I'd do a query expansion, i.e.
search2:coco
would become
search2:coco^4 OR search4:coco
but by actually creating the query objects rather than generating a
string to be parsed by the QueryParser.
--
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.
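Editor's sketch: the expansion described above might look roughly like this. It is a minimal sketch in plain Java. BoostedClause and QueryExpander are hypothetical stand-ins rather than Lucene classes, and the string rendering is for display only. In real Lucene code each clause would presumably become a boosted TermQuery added as an optional clause to a BooleanQuery, in line with the advice to build query objects directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one boosted term clause (not a Lucene class). */
final class BoostedClause {
    final String field;
    final String text;
    final double boost;

    BoostedClause(String field, String text, double boost) {
        this.field = field;
        this.text = text;
        this.boost = boost;
    }

    /** Renders in query-syntax form, omitting a boost of exactly 1. */
    @Override
    public String toString() {
        if (boost == 1.0) {
            return field + ":" + text;
        }
        // Print whole-number boosts without a trailing ".0".
        String b = (boost == Math.rint(boost))
                ? String.valueOf((long) boost)
                : String.valueOf(boost);
        return field + ":" + text + "^" + b;
    }
}

final class QueryExpander {
    /** Turns one term into one clause per field with a non-zero boost. */
    static List<BoostedClause> expand(String text, Map<String, Double> boosts) {
        List<BoostedClause> clauses = new ArrayList<>();
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            if (e.getValue() > 0.0) {
                clauses.add(new BoostedClause(e.getKey(), text, e.getValue()));
            }
        }
        return clauses;
    }

    /** Joins the clauses with OR, for display only. */
    static String render(List<BoostedClause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (BoostedClause c : clauses) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Keeping the expansion as a list of clauses, and rendering a string only for inspection, avoids a round trip through the parser and makes it easy to change the weights per level.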