Posted to java-user@lucene.apache.org by "Fielder, Todd Patrick" <tp...@sandia.gov> on 2015/03/30 19:07:30 UTC

general question

Hello,

I'm new to Lucene and am looking for advice.  I want to search the entire DB (or almost the entire DB) for a keyword.  The users also want to know which field the string occurred in.

I can think of two ways to do this, but neither is ideal, and I'm looking for suggestions:

1)      Search the entire DB, add all the text I want to search to a single string, and store that string.  Then create a query against that string.  Using this approach, is there any way to know which field the match is in?  Context highlighting is not sufficient; they want the DB column of the match.  We are using EclipseLink/JPA (but are considering switching to Hibernate).

2)      Query every single field, checking for results with each query.  This seems slow (and tedious!).

Any help is greatly appreciated.

-Todd

RE: [EXTERNAL] Re: general question

Posted by "Fielder, Todd Patrick" <tp...@sandia.gov>.
I can't get either suggested approach to work (the child scorers or the query wrappers), so I may end up running a query on each field.  I'm just not sure how expensive that will end up being...

Additional thoughts?

-Todd


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




Re: [EXTERNAL] Re: general question

Posted by Sanne Grinovero <sa...@gmail.com>.
Hello all,
I don't need to do the same, but the suggestions got me curious.

Why would you consider it more efficient to iterate on the child
scorers, rather than performing an independent Query on each field?
(assuming he indexes each {table,column} content in a different field)

Thanks,
Sanne
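
For readers weighing the per-field option raised in this message, here is a minimal sketch of the idea in plain Java. It is not Lucene code: the class PerFieldSearchSketch, its methods, and the customer.name / customer.notes field names are all hypothetical, and a map from terms to document ids stands in for a real index with one field per {table,column}. It only illustrates running one lookup per field and merging the results into a map from document id to the fields that matched.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy stand-in for an index with one field per DB column; this is NOT the Lucene API. */
class PerFieldSearchSketch {
    // field name -> (term -> ids of documents containing that term in that field)
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();

    /** Index one column value, lowercasing and splitting on non-word characters. */
    void add(int docId, String field, String text) {
        Map<String, Set<Integer>> postings = index.computeIfAbsent(field, f -> new HashMap<>());
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Run one lookup per field and record, for each hit, the fields that matched. */
    Map<Integer, Set<String>> search(String keyword) {
        String term = keyword.toLowerCase(Locale.ROOT);
        Map<Integer, Set<String>> matchedFields = new TreeMap<>();
        for (Map.Entry<String, Map<String, Set<Integer>>> e : index.entrySet()) {
            Set<Integer> hits = e.getValue().getOrDefault(term, Collections.emptySet());
            for (int docId : hits) {
                matchedFields.computeIfAbsent(docId, d -> new TreeSet<>()).add(e.getKey());
            }
        }
        return matchedFields;
    }

    public static void main(String[] args) {
        PerFieldSearchSketch idx = new PerFieldSearchSketch();
        idx.add(1, "customer.name", "Todd Fielder");
        idx.add(1, "customer.notes", "Asked about Lucene");
        idx.add(2, "customer.notes", "Todd called back");
        // prints {1=[customer.name], 2=[customer.notes]}
        System.out.println(idx.search("todd"));
    }
}
```

In a real index the inner lookup would be an actual query against each field, so the cost would presumably grow with the number of fields; whether that is acceptable likely depends on how many columns are indexed.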




Re: [EXTERNAL] Re: general question

Posted by Michael McCandless <lu...@mikemccandless.com>.
Indeed LUCENE-6229 is very related here ...

Failing Scorer.getChildren, I think you'd have to make your own query
wrappers that wrapped the whole tree (Query, Weight, Scorer) and then
kept track, using those wrappers?

Mike McCandless

http://blog.mikemccandless.com
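
As a rough illustration of the wrapper idea in this message, the sketch below uses plain Java rather than Lucene's Query, Weight, and Scorer classes. Everything in it is an assumption made for illustration: TrackingClause, TrackingDemo, and matchesAny are hypothetical names, and an IntPredicate stands in for "does this clause match this document". The point it tries to show is that each wrapper knows which field it belongs to and records a match itself, and that the enclosing OR must evaluate every clause (no short-circuit) or some matching fields would go unrecorded.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntPredicate;

/** Hypothetical wrapper: delegates the match test and logs hits under its field name. */
class TrackingClause implements IntPredicate {
    private final String field;
    private final IntPredicate delegate;
    private final Map<Integer, Set<String>> log; // doc id -> fields that matched

    TrackingClause(String field, IntPredicate delegate, Map<Integer, Set<String>> log) {
        this.field = field;
        this.delegate = delegate;
        this.log = log;
    }

    @Override
    public boolean test(int docId) {
        boolean hit = delegate.test(docId);
        if (hit) {
            log.computeIfAbsent(docId, d -> new TreeSet<>()).add(field);
        }
        return hit;
    }
}

class TrackingDemo {
    /** OR over all clauses; deliberately not short-circuited so every wrapper sees the doc. */
    static boolean matchesAny(List<? extends IntPredicate> clauses, int docId) {
        boolean any = false;
        for (IntPredicate clause : clauses) {
            any |= clause.test(docId); // |= always evaluates the right-hand side
        }
        return any;
    }
}
```

After calling TrackingDemo.matchesAny for each candidate document, the shared log map holds the matched fields per document id. A short-circuiting OR would stop at the first matching clause, which is the same kind of gap Terry describes for Scorer.getChildren().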




Re: [EXTERNAL] Re: general question

Posted by Terry Smith <sh...@gmail.com>.
Mike,

Your suggestion seems related to LUCENE-6229
<https://issues.apache.org/jira/browse/LUCENE-6229>. My understanding is
that we shouldn't rely on Scorer.getChildren() as you won't always get all
child scorers (just the minimum needed to match) and their positions aren't
necessarily reliable.

Should we be suggesting a different approach to Todd's question?

--Terry



RE: [EXTERNAL] Re: general question

Posted by "Fielder, Todd Patrick" <tp...@sandia.gov>.
I am attempting to loop through the ChildScorer objects returned by scorer.getChildren() inside my collect() call, but the collection is empty.

Is there something else I should do or some setup that I am missing?

Thanks

@Override
public void collect(int docID) throws IOException {
  for (ChildScorer child : scorer.getChildren()) {
    System.out.println("relationship: " + child.relationship);
  }
}



Re: general question

Posted by Michael McCandless <lu...@mikemccandless.com>.
You could do this with a custom Collector which, for every hit, visits
all child scorers, asking each one whether it matched the current hit.
Your collector would have to somehow store this information away so
that once the search is done and you pull the top N hits, you know
which fields those hits had matched.

Mike McCandless

http://blog.mikemccandless.com
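
The bookkeeping described in this message might look roughly like the plain-Java sketch below. It does not use Lucene's Collector or Scorer classes: FieldClause and FieldTrackingCollector are hypothetical stand-ins, and the sketch assumes that hits arrive in increasing document id order and that a clause "matched" a hit when its current document id equals the hit's id.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stand-in for a per-field child scorer; not a Lucene class. */
class FieldClause {
    final String field;
    private final int[] docs; // sorted ids of documents that match in this field
    private int pos = 0;

    FieldClause(String field, int... docs) {
        this.field = field;
        this.docs = docs;
    }

    /** Advance to the first matching doc >= target; true if it is exactly target. */
    boolean matches(int target) {
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length && docs[pos] == target;
    }
}

/** Records, for every collected hit, which field clauses matched it. */
class FieldTrackingCollector {
    private final List<FieldClause> clauses;
    private final Map<Integer, Set<String>> fieldsByDoc = new TreeMap<>();

    FieldTrackingCollector(List<FieldClause> clauses) {
        this.clauses = clauses;
    }

    /** Called once per hit, in increasing doc id order. */
    void collect(int docId) {
        Set<String> fields = new TreeSet<>();
        for (FieldClause clause : clauses) {
            if (clause.matches(docId)) {
                fields.add(clause.field);
            }
        }
        fieldsByDoc.put(docId, fields);
    }

    /** After the search, look up the matched fields for one of the top N hits. */
    Set<String> matchedFields(int docId) {
        return fieldsByDoc.getOrDefault(docId, Collections.emptySet());
    }
}
```

Once the search finishes, calling matchedFields(docId) for each of the top N hits returns the stored field names. Storing the set for every hit costs memory proportional to the number of hits, so a real collector might keep this only for documents that can still make the top N.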

