Posted to solr-user@lucene.apache.org by "Vauthrin, Laurent" <La...@disney.com> on 2009/11/24 18:34:04 UTC

SolrPlugin Guidance

Hello,

 

Our team is trying to make a Solr plugin that needs to parse/decompose a
given query into potentially multiple queries.  The idea is that we're
trying to abstract a complex schema (with different document types) from
the users so that their queries can be simpler.

 

So basically, we're trying to do the following:

 

1. Decompose query A into query B and query C

2. Send query B to all shards and plug query B's results into query C

3. Send query C to all shards and pass the results back to the client

 

I started trying to implement this by subclassing the SearchHandler but
realized that I would not have access to HttpCommComponent.  Then I
tried to replicate the SearchHandler class but realized that I might not
have access to fields I would need in ShardResponse.  So I figured I
should step back and get advice from the mailing list now :).  What is
the best plugin point for decomposing a query into multiple queries so
that all resultant queries can be sent to each shard?

 

Thanks,
Laurent Vauthrin

RE: SolrPlugin Guidance

Posted by Chris Hostetter <ho...@fucit.org>.

: Our QParser plugin will perform queries against directory documents and
: return any file document that has the matching directory id(s).  So the
: plugin transforms the query to something like 
: 
: q:+(directory_id:4 directory_id:10) +directory_id:(4)
	...
: Currently the parser plugin is doing the lookup queries via the standard
: request handler.  The problem with this approach is that the look up
: queries are going to be analyzed twice.  This only seems to be a problem

...you lost me there.  if you are taking part of the query, and using it 
to get directory ids, and then using those directory ids to build a new 
query, why are you ever passing the output from one query parser to 
another query parser?

You take the input string, you let the LuceneQParser parse it and use it 
to search against "Directory" documents, and then you iterate over the 
results, and get an ID from them.  You should be using those IDs directly 
to build your new query.

Honestly: even if you were using those ids to build a query string, and 
then passing that string to the analyzer, i don't see why stemming would 
cause any problems for you if the ids are numbers (like in your example)

-Hoss

RE: SolrPlugin Guidance

Posted by "Vauthrin, Laurent" <La...@disney.com>.

It looks like the SolrQueryParser constructor accepts an analyzer as a
parameter.  That seems to do the trick.  Although feel free to respond
anyway if you have a comment on the approach :)


RE: SolrPlugin Guidance

Posted by "Vauthrin, Laurent" <La...@disney.com>.

Ok, looks like I may not be taking the right approach here.  I'm running
into a problem.

Let's say a user is looking for all files in any directory 'foo' with a
directory description 'bar' 

q:+directory_name:foo +directory_description:bar

Our QParser plugin will perform queries against directory documents and
return any file document that has the matching directory id(s).  So the
plugin transforms the query to something like 

q:+(directory_id:4 directory_id:10) +directory_id:(4)

Note: directory_id is only in file documents.  The query above assumes
that two directories had the name 'foo' but only one had the description
'bar'

Currently the parser plugin is doing the lookup queries via the standard
request handler.  The problem with this approach is that the look up
queries are going to be analyzed twice.  This only seems to be a problem
because we're using stemming.  For example, stemming 'franchise' gives
'franchis' and stemming it again gives 'franchi'.  The second stemming
will cause the query not to match anymore.
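The double-analysis problem described above can be reproduced with a toy example. This is only a sketch: `toy_stem` is a hypothetical one-rule suffix stripper, not the stemmer Solr actually uses, chosen so that it reproduces the 'franchise' behaviour mentioned in this message.

```python
def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a stemmer: drop one trailing 'e' or 's'.

    Real stemmers are far more involved; the point is only that the
    function is not idempotent, so applying it twice changes the term.
    """
    if len(token) > 3 and token[-1] in ("e", "s"):
        return token[:-1]
    return token


# Term as it would be stored at index time (analyzed once).
indexed_term = toy_stem("franchise")

# Query term analyzed once, then analyzed a second time by mistake.
analyzed_once = toy_stem("franchise")
analyzed_twice = toy_stem(analyzed_once)

print(indexed_term, analyzed_once, analyzed_twice)
print("once matches index: ", analyzed_once == indexed_term)
print("twice matches index:", analyzed_twice == indexed_term)
```

Any analysis step that is not idempotent would show the same effect, which is why already-analyzed terms should not be passed through the analyzer again.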

So basically my questions are:
1. Should I not be passing my lookup queries back to the request
handler, but instead to some lower level component?  If so, which
component would be good to look at?
2. Is there a way to tell the SolrQueryParser not to analyze, or a
different way to run the query so that the query analysis won't happen?

Thanks again,
Laurent Vauthrin


RE: SolrPlugin Guidance

Posted by "Vauthrin, Laurent" <La...@disney.com>.

Thanks for the response.  I went ahead and gave it a shot.  In my case,
the directory name may not be unique so if I get multiple ids back then
I create a BooleanQuery (Occur.SHOULD) to substitute the directory name
query.  This seems to work at the moment so hopefully that's the right
approach. 
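For readers following along, the substitution described here can be modelled roughly as below. This is a sketch in plain Python, not Lucene code: `TermQuery`, `BooleanQuery`, `MUST` and `SHOULD` are hypothetical stand-ins named after the Lucene concepts, and `directory_id_clause` is an illustrative helper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MUST, SHOULD = "MUST", "SHOULD"


@dataclass(frozen=True)
class TermQuery:
    """Hypothetical model of an exact term match (no analysis applied)."""
    field_name: str
    value: str

    def render(self) -> str:
        return f"{self.field_name}:{self.value}"


@dataclass
class BooleanQuery:
    """Hypothetical model of a boolean query holding (occur, query) clauses."""
    clauses: List[Tuple[str, object]] = field(default_factory=list)

    def add(self, query, occur: str) -> "BooleanQuery":
        self.clauses.append((occur, query))
        return self

    def render(self) -> str:
        parts = []
        for occur, query in self.clauses:
            text = query.render()
            if isinstance(query, BooleanQuery):
                text = f"({text})"
            parts.append(("+" if occur == MUST else "") + text)
        return " ".join(parts)


def directory_id_clause(directory_ids) -> BooleanQuery:
    """Replace a directory-name clause with 'any of these ids' (all SHOULD)."""
    any_of = BooleanQuery()
    for directory_id in directory_ids:
        any_of.add(TermQuery("directory_id", str(directory_id)), SHOULD)
    return any_of


# Two directories matched the name lookup, so both ids become SHOULD clauses.
rewritten = BooleanQuery()
rewritten.add(directory_id_clause([4, 10]), MUST)
rewritten.add(TermQuery("file_name", "myFile"), MUST)
print(rewritten.render())
```

Building the clauses as query objects from the ids, rather than as a query string, also means nothing has to be parsed or analyzed a second time.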

Thanks,
Laurent Vauthrin



RE: SolrPlugin Guidance

Posted by Chris Hostetter <ho...@fucit.org>.

: e.g. For the following query that looks for a file in a directory:
: q=+directory_name:"myDirectory" +file_name:"myFile"
: 
: We'd need to decompose the query into the following two queries:
: 1. q=+directory_name:"myDirectory"&fl=directory_id
: 2. q=+file_name:"myFile" +directory_id:(results from query #1)
: 
: I guess I'm looking for the following feedback:
: - Does this sound crazy?  

it's a little crazy, but not absurd.

: - Is the QParser the right place for this logic?  If so, can I get a 
: little more guidance on how to decompose the queries there (filter 
: queries maybe)?

a QParser could work. (and in general, if you can solve something with a 
QParser that's probably for the best, since it allows the most reuse). but 
exactly how to do it depends on how many results you expect from your 
first query:  if you are going to structure things so they have to 
uniquely id a directory, and you'll have a single ID, then this is 
something that could easily make sense in a QParser (you are essentially 
just rewriting part of the query from string to id -- you just happen to 
be using solr as a lookup table for those strings).

but if you plan to support any arbitrary "N" directories, then you may 
need something more complicated ... straight filter queries won't help 
much because you'll want the union instead of the intersection, so for 
every directoryId you find, use it as a query to get a DocSet and then 
maintain a running union of all those DocSets to use as your final filter 
(hmm... that may not actually be possible with the QParser API ... i 
haven't looked at it in a while, but for an approach like this you may need 
to subclass QueryComponent instead)
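A rough illustration of the running-union idea, using plain Python sets in place of Solr DocSets. Everything here is assumed for the sake of the example: `FILES_BY_DIRECTORY_ID` is a made-up in-memory index, and `doc_set_for` and `union_filter` are hypothetical helpers, not Solr APIs.

```python
from typing import Dict, Iterable, Set

# Hypothetical index: directory id -> ids of the file documents inside it.
FILES_BY_DIRECTORY_ID: Dict[int, Set[int]] = {
    4: {101, 102},
    10: {102, 205},
    11: {300},
}


def doc_set_for(directory_id: int) -> Set[int]:
    """Stand-in for running 'directory_id:<id>' and collecting a DocSet."""
    return set(FILES_BY_DIRECTORY_ID.get(directory_id, ()))


def union_filter(directory_ids: Iterable[int]) -> Set[int]:
    """Maintain a running union of the per-directory doc sets."""
    running: Set[int] = set()
    for directory_id in directory_ids:
        running |= doc_set_for(directory_id)
    return running


# Documents matched by the rest of the user's query (made-up ids).
main_query_matches = {102, 205, 300}

final_filter = union_filter([4, 10])
intersected = doc_set_for(4) & doc_set_for(10)  # what stacked filters would give

print("union filter:       ", sorted(final_filter))
print("stacked filters:    ", sorted(intersected))
print("final result (union):", sorted(main_query_matches & final_filter))
```

The comparison with `intersected` shows why separate filter queries are not enough: they would keep only documents that sit in every matched directory, whereas the union keeps documents from any of them.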




-Hoss

RE: SolrPlugin Guidance

Posted by "Vauthrin, Laurent" <La...@disney.com>.

Thanks for the response but I'm still confused.  I don't see how a QParser will create multiple queries that need to be sent to shards sequentially.

Here's a more detailed example of what we're doing:

We're indexing documents in Solr that are somewhat equivalent to files.  We want users to be able to search by a file's directory.  We're shying away from the approach of storing the directory as an attribute because renaming a directory could mean re-indexing tens of thousands of file documents.  There are other file attributes that would have the same effect if they are modified.

So in an effort to avoid many large reindex jobs, we're trying to index both file documents and directory documents.  We don't want search users to have to deal with this implementation detail so we're looking to write a plugin that would do this for them.

e.g. For the following query that looks for a file in a directory:
q=+directory_name:"myDirectory" +file_name:"myFile"

We'd need to decompose the query into the following two queries:
1. q=+directory_name:"myDirectory"&fl=directory_id
2. q=+file_name:"myFile" +directory_id:(results from query #1)
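The two-step decomposition above can be sketched as follows. This is a toy model in plain Python with an in-memory list standing in for the index; the document layout, `lookup_directory_ids` and `find_files` are all assumptions made for illustration and do not correspond to any Solr API.

```python
from typing import Dict, List, Set

# Hypothetical mixed index holding both directory and file documents.
DOCUMENTS: List[Dict] = [
    {"type": "directory", "directory_id": 4, "directory_name": "myDirectory"},
    {"type": "directory", "directory_id": 10, "directory_name": "other"},
    {"type": "file", "file_name": "myFile", "directory_id": 4},
    {"type": "file", "file_name": "myFile", "directory_id": 10},
    {"type": "file", "file_name": "notes", "directory_id": 4},
]


def lookup_directory_ids(directory_name: str) -> Set[int]:
    """Query #1: find directory documents by name, return only their ids."""
    return {
        doc["directory_id"]
        for doc in DOCUMENTS
        if doc["type"] == "directory" and doc["directory_name"] == directory_name
    }


def find_files(file_name: str, directory_ids: Set[int]) -> List[Dict]:
    """Query #2: find file documents by name, restricted to the given ids."""
    return [
        doc
        for doc in DOCUMENTS
        if doc["type"] == "file"
        and doc["file_name"] == file_name
        and doc["directory_id"] in directory_ids
    ]


ids = lookup_directory_ids("myDirectory")
matches = find_files("myFile", ids)
print(ids)
print(matches)
```

In this layout, renaming a directory touches only the single directory document, which is the re-indexing saving the design is after.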

I guess I'm looking for the following feedback:
- Does this sound crazy?  
- Is the QParser the right place for this logic?  If so, can I get a little more guidance on how to decompose the queries there (filter queries maybe)?

Thanks,
Laurent Vauthrin


Re: SolrPlugin Guidance

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Tue, Nov 24, 2009 at 11:04 PM, Vauthrin, Laurent <
Laurent.Vauthrin@disney.com> wrote:

>
> Our team is trying to make a Solr plugin that needs to parse/decompose a
> given query into potentially multiple queries.  The idea is that we're
> trying to abstract a complex schema (with different document types) from
> the users so that their queries can be simpler.
>
>
>
> So basically, we're trying to do the following:
>
>
>
> 1.       Decompose query A into query B and query C
>
> 2.       Send query B to all shards and plug query B's results into
> query C
>
> 3.       Send Query C to all shards and pass the results back to the
> client
>
>
>
> I started trying to implement this by subclassing the SearchHandler but
> realized that I would not have access to HttpCommComponent.  Then I
> tried to replicate the SearchHandler class but realized that I might not
> have access to fields I would need in ShardResponse.  So I figured I
> should step back and get advice from the mailing list now :).  What is
> the best plugin point for decomposing a query into multiple queries so
> that all resultant queries can be sent to each shard?
>
>
>
All queries are sent to all shards? If yes, it sounds like a job for a
custom QParser.

-- 
Regards,
Shalin Shekhar Mangar.