Posted to solr-user@lucene.apache.org by "Vauthrin, Laurent" <La...@disney.com> on 2009/11/24 18:34:04 UTC
SolrPlugin Guidance
Hello,
Our team is trying to make a Solr plugin that needs to parse/decompose a
given query into potentially multiple queries. The idea is that we're
trying to abstract a complex schema (with different document types) from
the users so that their queries can be simpler.
So basically, we're trying to do the following:
1. Decompose query A into query B and query C
2. Send query B to all shards and plug query B's results into query C
3. Send query C to all shards and pass the results back to the client
I started trying to implement this by subclassing the SearchHandler but
realized that I would not have access to HttpCommComponent. Then I
tried to replicate the SearchHandler class but realized that I might not
have access to fields I would need in ShardResponse. So I figured I
should step back and get advice from the mailing list now :). What is
the best plugin point for decomposing a query into multiple queries so
that all resultant queries can be sent to each shard?
Thanks,
Laurent Vauthrin
RE: SolrPlugin Guidance
Posted by Chris Hostetter <ho...@fucit.org>.
: Our QParser plugin will perform queries against directory documents and
: return any file document that has the matching directory id(s). So the
: plugin transforms the query to something like
:
: q:+(directory_id:4 directory_id:10) +directory_id:(4)
...
: Currently the parser plugin is doing the lookup queries via the standard
: request handler. The problem with this approach is that the look up
: queries are going to be analyzed twice. This only seems to be a problem
...you lost me there. if you are taking part of the query, and using it
to get directory ids, and then using those directory ids to build a new
query, why are you ever passing the output from one query parser to
another query parser?
You take the input string, you let the LuceneQParser parse it and use it
to search against "Directory" documents, and then you iterate over the
results, and get an ID from them. You should be using those IDs directly
to build your new query.
Honestly: even if you were using those ids to build a query string, and
then passing that string to the analyzer, i don't see why stemming would
cause any problems for you if the ids are numbers (like in your example).
-Hoss
RE: SolrPlugin Guidance
Posted by "Vauthrin, Laurent" <La...@disney.com>.
It looks like the SolrQueryParser constructor accepts an analyzer as a
parameter. That seems to do the trick. Although feel free to respond
anyway if you have a comment on the approach :)
-----Original Message-----
From: solr-user-return-30215-Laurent.Vauthrin=disney.com@lucene.apache.org [mailto:solr-user-return-30215-Laurent.Vauthrin=disney.com@lucene.apache.org] On Behalf Of Vauthrin, Laurent
Sent: Thursday, December 10, 2009 11:44 AM
To: solr-user@lucene.apache.org
Subject: RE: SolrPlugin Guidance
...
RE: SolrPlugin Guidance
Posted by "Vauthrin, Laurent" <La...@disney.com>.
Ok, looks like I may not be taking the right approach here. I'm running
into a problem.
Let's say a user is looking for all files in any directory 'foo' with a
directory description 'bar'
q:+directory_name:foo +directory_description:bar
Our QParser plugin will perform queries against directory documents and
return any file document that has the matching directory id(s). So the
plugin transforms the query to something like
q:+(directory_id:4 directory_id:10) +directory_id:(4)
Note: directory_id is only in file documents. The query above assumes
that two directories had the name 'foo' but only one had the description
'bar'.
Currently the parser plugin is doing the lookup queries via the standard
request handler. The problem with this approach is that the lookup
queries are going to be analyzed twice. This only seems to be a problem
because we're using stemming. For example, stemming 'franchise' gives
'franchis' and stemming it again gives 'franchi'. The second stemming
will cause the query not to match anymore.
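The double-analysis effect can be reproduced with a toy stemmer. The sketch below is only an illustration in Python, not Solr's stemming filter: `toy_stem` is a hypothetical one-pass suffix stripper, chosen solely because it reproduces the 'franchise' example above. It shows why an analysis step that is not idempotent has to be applied exactly once to the query text.

```python
def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a stemming filter: strips one trailing 'e' or 's'."""
    for suffix in ("e", "s"):
        if term.endswith(suffix) and len(term) > 3:
            return term[: -len(suffix)]
    return term


indexed = toy_stem("franchise")            # term stored in the index (analyzed once)
once = toy_stem("franchise")               # query term analyzed once
twice = toy_stem(toy_stem("franchise"))    # query term analyzed twice

print(indexed, once, twice)                # franchis franchis franchi
print(once == indexed, twice == indexed)   # True False
```

The twice-analyzed term no longer equals the indexed term, which is the mismatch described above.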
So basically my questions are:
1. Should I not be passing my lookup queries back to the request
handler, but instead to some lower level component? If so, which
component would be good to look at?
2. Is there a way to tell the SolrQueryParser not to analyze, or a
different way to run the query so that the query analysis won't happen?
Thanks again,
Laurent Vauthrin
-----Original Message-----
From: solr-user-return-30170-Laurent.Vauthrin=disney.com@lucene.apache.org [mailto:solr-user-return-30170-Laurent.Vauthrin=disney.com@lucene.apache.org] On Behalf Of Vauthrin, Laurent
Sent: Wednesday, December 09, 2009 2:53 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrPlugin Guidance
...
RE: SolrPlugin Guidance
Posted by "Vauthrin, Laurent" <La...@disney.com>.
Thanks for the response. I went ahead and gave it a shot. In my case,
the directory name may not be unique so if I get multiple ids back then
I create a BooleanQuery (Occur.SHOULD) to substitute the directory name
query. This seems to work at the moment so hopefully that's the right
approach.
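The substitution described above can be pictured outside of Lucene. The snippet below is only an illustration in Python, not the plugin's code: `should_group` is a hypothetical helper that renders the looked-up ids as one required group of optional clauses in Lucene query syntax, which is roughly the string form of a BooleanQuery whose clauses all use Occur.SHOULD. It assumes the default operator is OR, and raising an error on an empty lookup is an arbitrary choice made for the sketch.

```python
def should_group(field, ids):
    """Render ids as a required group of optional clauses, e.g. +(f:4 f:10)."""
    if not ids:
        # Assumption for this sketch: an empty lookup is treated as an error.
        raise ValueError("lookup returned no ids; nothing can match")
    clauses = " ".join(f"{field}:{i}" for i in ids)
    return f"+({clauses})"


# Two directories share the same name, so two ids come back from the lookup.
print(should_group("directory_id", [4, 10]))   # +(directory_id:4 directory_id:10)
```

Inside a real plugin the clauses would be built as query objects rather than as a string, so that the ids never pass through analysis.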
Thanks,
Laurent Vauthrin
-----Original Message-----
From: solr-user-return-30054-Laurent.Vauthrin=disney.com@lucene.apache.org [mailto:solr-user-return-30054-Laurent.Vauthrin=disney.com@lucene.apache.org] On Behalf Of Chris Hostetter
Sent: Monday, December 07, 2009 12:17 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrPlugin Guidance
...
RE: SolrPlugin Guidance
Posted by Chris Hostetter <ho...@fucit.org>.
: e.g. For the following query that looks for a file in a directory:
: q=+directory_name:"myDirectory" +file_name:"myFile"
:
: We'd need to decompose the query into the following two queries:
: 1. q=+directory_name:"myDirectory"&fl=directory_id
: 2. q=+file_name:"myFile" +directory_id:(results from query #1)
:
: I guess I'm looking for the following feedback:
: - Does this sound crazy?
it's a little crazy, but not absurd.
: - Is the QParser the right place for this logic? If so, can I get a
: little more guidance on how to decompose the queries there (filter
: queries maybe)?
a QParser could work. (and in general, if you can solve something with a
QParser that's probably for the best, since it allows the most reuse). but
exactly how to do it depends on how many results you expect from your
first query: if you are going to structure things so they have to
uniquely id a directory, and you'll have a single ID, then this is
something that could easily make sense in a QParser (you are essentially
just rewriting part of the query from string to id -- you just happen to
be using solr as a lookup table for those strings).
but if you plan to support any arbitrary "N" directories, then you may
need something more complicated ... straight filter queries won't help
much because you'll want the union instead of the intersection, so for
every directoryId you find, use it as a query to get a DocSet and then
maintain a running union of all those DocSets to use as your final filter
(hmm... that may not actually be possible with the QParser API ... i
haven't looked at it in a while, but for an approach like this you may need
to subclass QueryComponent instead)
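The running union described above can be illustrated without Solr. In this Python sketch, plain sets stand in for DocSets, and `docs_for_directory` is a hypothetical lookup table replacing the per-id query; none of these names come from Solr's API.

```python
# Hypothetical index: directory id -> internal ids of the file documents in it.
docs_for_directory = {4: {101, 102}, 10: {250}, 11: {300}}


def union_filter(directory_ids):
    """Accumulate one doc set per directory id into a single filter set."""
    result = set()
    for dir_id in directory_ids:
        result |= docs_for_directory.get(dir_id, set())
    return result


file_filter = union_filter([4, 10])
name_matches = {102, 300}                    # docs matching the file_name clause

print(sorted(file_filter))                   # [101, 102, 250]
print(sorted(name_matches & file_filter))    # [102]
```

The per-directory sets are combined with a union, and only the finished filter is intersected with the rest of the query. Separate filter queries are always intersected with each other, which is why they do not fit this case.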
-Hoss
RE: SolrPlugin Guidance
Posted by "Vauthrin, Laurent" <La...@disney.com>.
Thanks for the response but I'm still confused. I don't see how a QParser will create multiple queries that need to be sent to shards sequentially.
Here's a more detailed example of what we're doing:
We're indexing documents in Solr that are somewhat equivalent to files. We want users to be able to search by a file's directory. We're shying away from the approach of storing the directory as an attribute because renaming a directory could mean re-indexing tens of thousands of file documents. There are other file attributes that would have the same effect if they are modified.
So in an effort to avoid many large reindex jobs, we're trying to index both file documents and directory documents. We don't want search users to have to deal with this implementation detail so we're looking to write a plugin that would do this for them.
e.g. For the following query that looks for a file in a directory:
q=+directory_name:"myDirectory" +file_name:"myFile"
We'd need to decompose the query into the following two queries:
1. q=+directory_name:"myDirectory"&fl=directory_id
2. q=+file_name:"myFile" +directory_id:(results from query #1)
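The two-step decomposition can be sketched against a tiny in-memory "index". This is a Python illustration only, not Solr code: the `type` field, the sample documents, and the `search` and `find_files` helpers are all assumptions made for the example.

```python
# Hypothetical documents: directories and files live in the same index.
docs = [
    {"type": "directory", "directory_id": 4, "directory_name": "myDirectory"},
    {"type": "directory", "directory_id": 7, "directory_name": "other"},
    {"type": "file", "file_name": "myFile", "directory_id": 4},
    {"type": "file", "file_name": "myFile", "directory_id": 7},
]


def search(predicate):
    """Stand-in for running one query against the index."""
    return [d for d in docs if predicate(d)]


def find_files(directory_name, file_name):
    # Query 1: resolve the directory name to the matching directory ids.
    dir_ids = {
        d["directory_id"]
        for d in search(lambda d: d["type"] == "directory"
                        and d["directory_name"] == directory_name)
    }
    # Query 2: match files by name, restricted to the ids from query 1.
    return search(lambda d: d["type"] == "file"
                  and d["file_name"] == file_name
                  and d["directory_id"] in dir_ids)


print(find_files("myDirectory", "myFile"))
```

Renaming a directory in this layout touches only the one directory document, which is the motivation given above for keeping the two document types separate.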
I guess I'm looking for the following feedback:
- Does this sound crazy?
- Is the QParser the right place for this logic? If so, can I get a little more guidance on how to decompose the queries there (filter queries maybe)?
Thanks,
Laurent Vauthrin
-----Original Message-----
From: solr-user-return-29672-Laurent.Vauthrin=disney.com@lucene.apache.org [mailto:solr-user-return-29672-Laurent.Vauthrin=disney.com@lucene.apache.org] On Behalf Of Shalin Shekhar Mangar
Sent: Wednesday, November 25, 2009 5:42 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrPlugin Guidance
...
Re: SolrPlugin Guidance
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Nov 24, 2009 at 11:04 PM, Vauthrin, Laurent <
Laurent.Vauthrin@disney.com> wrote:
>
> Our team is trying to make a Solr plugin that needs to parse/decompose a
> given query into potentially multiple queries. The idea is that we're
> trying to abstract a complex schema (with different document types) from
> the users so that their queries can be simpler.
>
> So basically, we're trying to do the following:
>
> 1. Decompose query A into query B and query C
> 2. Send query B to all shards and plug query B's results into query C
> 3. Send query C to all shards and pass the results back to the client
>
> I started trying to implement this by subclassing the SearchHandler but
> realized that I would not have access to HttpCommComponent. Then I
> tried to replicate the SearchHandler class but realized that I might not
> have access to fields I would need in ShardResponse. So I figured I
> should step back and get advice from the mailing list now :). What is
> the best plugin point for decomposing a query into multiple queries so
> that all resultant queries can be sent to each shard?
>
All queries are sent to all shards? If yes, it sounds like a job for a
custom QParser.
--
Regards,
Shalin Shekhar Mangar.