You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Livia Hauser <li...@web.de> on 2011/02/07 20:58:55 UTC

How to implement a proximity search using LINES as slop

Hi All,

I use solr 3.x and put excel documents into an index.
I have my own query parser and use SpanQueries to provide a proximity search feature. It works really good.
Most often than not its better to limit the proxmity to one or two line's, not to X words.
I try to find a NewLine indicator... unsuccessfully.
How can use the number of lines as proximity?

Thx!
___________________________________________________________
WEB.DE DSL Doppel-Flat ab 19,99 &euro;/mtl.! Jetzt mit 
gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to implement a proximity search using LINES as slop

Posted by Doron Cohen <cd...@gmail.com>.
IIUC what you are trying to achieve I think the following could help,
without setting all words in a line to be in the same position:
At indexing, set a position increment of N (e.g. 100) at line start tokens.
This would set a position gap of N between last token of line x to first
token of line x+1.
At search, if using span near queries (or sloppy phrase queries), set the
acceptable span as N, to only accept matches from the same line.
If this makes sense, take a look at Analyzer.getPositionIncrementGap().
If you also want scores in these cases to treat distances based on line
numbers, in your custom Similarity change sloppyFreq(distance) to be aware
of N, e.g. use (1+distance/N) instead of distance for the computation.
HTH,
Doron

On Wed, Feb 9, 2011 at 10:15 AM, Pierre GOSSE <pi...@arisem.com>wrote:

> I've just read about payload, and I'm not sure it would be easy to use that
> feature in calculating distances for SpanQueries. I guess You would have to
> build your own SpanQuery, using payload instead of term position. But it
> would be much lighter for index size if it were possible, indeed.
>
> I hope some lucene expert can give his insight about all this. :)
>
> Pierre
>
> -----Message d'origine-----
> De : Livia Hauser [mailto:livia.hauser@web.de]
> Envoyé : mardi 8 février 2011 22:51
> À : java-user@lucene.apache.org
> Objet : RE: How to implement a proximity search using LINES as slop
>
> Hi Pierre,
>
> many thanks for your idear.
> I had a look to Payloads ... should it possible to store the line number as
> payload?
>
> Best Regards,
> Livia
>
>
> -----Ursprüngliche Nachricht-----
> Von: "Pierre GOSSE" <pi...@arisem.com>
> Gesendet: 08.02.2011 09:37:53
> An: "java-user@lucene.apache.org" <ja...@lucene.apache.org>
> Betreff: RE: How to implement a proximity search using LINES as slop
>
> >Hi Livia,
> >
> >One way of doing this line slope would be to implement a custom tokenizer
> that could tokenize on new line, and split each token into the words it
> contains. I.e. Each word of a line would be seen as being at the same
> position (and having same offset and length as the complete line).
> >
> >I don't think usual queries would respond well with that field, so maybe
> you will have to have two fields, one standard for searching and scoring and
> one custom for filtering your results.
> >
> >But maybe that's overkill, and there's a simpler manner to achieve this
> line slope, I'm quite new to solr. :)
> >
> >Pierre
> >
> >-----Message d'origine-----
> >De : Livia Hauser [mailto:livia.hauser@web.de]
> >Envoyé : lundi 7 février 2011 20:59
> >À : java-user@lucene.apache.org
> >Objet : How to implement a proximity search using LINES as slop
> >
> >Hi All,
> >
> >I use solr 3.x and put excel documents into an index.
> >I have my own query parser and use SpanQueries to provide a proximity
> search feature. It works really good.
> >Most often than not its better to limit the proxmity to one or two line's,
> not to X words.
> >I try to find a NewLine indicator... unsuccessfully.
> >How can use the number of lines as proximity?
> >
> >Thx!
> >___________________________________________________________
> >WEB.DE DSL Doppel-Flat ab 19,99 €/mtl.! Jetzt mit
> >gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> ___________________________________________________________
> Neu: WEB.DE De-Mail - Einfach wie E-Mail, sicher wie ein Brief!
> Jetzt De-Mail-Adresse reservieren: https://produkte.web.de/go/demail02
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

RE: How to implement a proximity search using LINES as slop

Posted by Pierre GOSSE <pi...@arisem.com>.
I've just read about payload, and I'm not sure it would be easy to use that feature in calculating distances for SpanQueries. I guess You would have to build your own SpanQuery, using payload instead of term position. But it would be much lighter for index size if it were possible, indeed.

I hope some lucene expert can give his insight about all this. :)

Pierre

-----Message d'origine-----
De : Livia Hauser [mailto:livia.hauser@web.de] 
Envoyé : mardi 8 février 2011 22:51
À : java-user@lucene.apache.org
Objet : RE: How to implement a proximity search using LINES as slop

Hi Pierre,

many thanks for your idear.
I had a look to Payloads ... should it possible to store the line number as payload?

Best Regards,
Livia


-----Ursprüngliche Nachricht-----
Von: "Pierre GOSSE" <pi...@arisem.com>
Gesendet: 08.02.2011 09:37:53
An: "java-user@lucene.apache.org" <ja...@lucene.apache.org>
Betreff: RE: How to implement a proximity search using LINES as slop

>Hi Livia,
>
>One way of doing this line slope would be to implement a custom tokenizer that could tokenize on new line, and split each token into the words it contains. I.e. Each word of a line would be seen as being at the same position (and having same offset and length as the complete line). 
>
>I don't think usual queries would respond well with that field, so maybe you will have to have two fields, one standard for searching and scoring and one custom for filtering your results.
>
>But maybe that's overkill, and there's a simpler manner to achieve this line slope, I'm quite new to solr. :)
>
>Pierre
>
>-----Message d'origine-----
>De : Livia Hauser [mailto:livia.hauser@web.de] 
>Envoyé : lundi 7 février 2011 20:59
>À : java-user@lucene.apache.org
>Objet : How to implement a proximity search using LINES as slop
>
>Hi All,
>
>I use solr 3.x and put excel documents into an index.
>I have my own query parser and use SpanQueries to provide a proximity search feature. It works really good.
>Most often than not its better to limit the proxmity to one or two line's, not to X words.
>I try to find a NewLine indicator... unsuccessfully.
>How can use the number of lines as proximity?
>
>Thx!
>___________________________________________________________
>WEB.DE DSL Doppel-Flat ab 19,99 €/mtl.! Jetzt mit 
>gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
___________________________________________________________
Neu: WEB.DE De-Mail - Einfach wie E-Mail, sicher wie ein Brief!  
Jetzt De-Mail-Adresse reservieren: https://produkte.web.de/go/demail02

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: How to implement a proximity search using LINES as slop

Posted by Livia Hauser <li...@web.de>.
Hi Pierre,

many thanks for your idear.
I had a look to Payloads ... should it possible to store the line number as payload?

Best Regards,
Livia


-----Ursprüngliche Nachricht-----
Von: "Pierre GOSSE" <pi...@arisem.com>
Gesendet: 08.02.2011 09:37:53
An: "java-user@lucene.apache.org" <ja...@lucene.apache.org>
Betreff: RE: How to implement a proximity search using LINES as slop

>Hi Livia,
>
>One way of doing this line slope would be to implement a custom tokenizer that could tokenize on new line, and split each token into the words it contains. I.e. Each word of a line would be seen as being at the same position (and having same offset and length as the complete line). 
>
>I don't think usual queries would respond well with that field, so maybe you will have to have two fields, one standard for searching and scoring and one custom for filtering your results.
>
>But maybe that's overkill, and there's a simpler manner to achieve this line slope, I'm quite new to solr. :)
>
>Pierre
>
>-----Message d'origine-----
>De : Livia Hauser [mailto:livia.hauser@web.de] 
>Envoyé : lundi 7 février 2011 20:59
>À : java-user@lucene.apache.org
>Objet : How to implement a proximity search using LINES as slop
>
>Hi All,
>
>I use solr 3.x and put excel documents into an index.
>I have my own query parser and use SpanQueries to provide a proximity search feature. It works really good.
>Most often than not its better to limit the proxmity to one or two line's, not to X words.
>I try to find a NewLine indicator... unsuccessfully.
>How can use the number of lines as proximity?
>
>Thx!
>___________________________________________________________
>WEB.DE DSL Doppel-Flat ab 19,99 €/mtl.! Jetzt mit 
>gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
___________________________________________________________
Neu: WEB.DE De-Mail - Einfach wie E-Mail, sicher wie ein Brief!  
Jetzt De-Mail-Adresse reservieren: https://produkte.web.de/go/demail02

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: How to implement a proximity search using LINES as slop

Posted by Pierre GOSSE <pi...@arisem.com>.
Hi Livia,

One way of doing this line slope would be to implement a custom tokenizer that could tokenize on new line, and split each token into the words it contains. I.e. Each word of a line would be seen as being at the same position (and having same offset and length as the complete line). 

I don't think usual queries would respond well with that field, so maybe you will have to have two fields, one standard for searching and scoring and one custom for filtering your results.

But maybe that's overkill, and there's a simpler manner to achieve this line slope, I'm quite new to solr. :)

Pierre

-----Message d'origine-----
De : Livia Hauser [mailto:livia.hauser@web.de] 
Envoyé : lundi 7 février 2011 20:59
À : java-user@lucene.apache.org
Objet : How to implement a proximity search using LINES as slop

Hi All,

I use solr 3.x and put excel documents into an index.
I have my own query parser and use SpanQueries to provide a proximity search feature. It works really good.
Most often than not its better to limit the proxmity to one or two line's, not to X words.
I try to find a NewLine indicator... unsuccessfully.
How can use the number of lines as proximity?

Thx!
___________________________________________________________
WEB.DE DSL Doppel-Flat ab 19,99 &euro;/mtl.! Jetzt mit 
gratis Handy-Flat! http://produkte.web.de/go/DSL_Doppel_Flatrate/2

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org