
Posted to solr-user@lucene.apache.org by Gerald Blanck <ge...@barometerit.com> on 2012/11/02 16:32:17 UTC

Nested Join Queries

At a high level, I have a need to be able to execute a query that joins
across cores, and that query during its joining may join back to the
originating core.

Example:
Find all Books written by an Author who has written a best selling Book.

In Solr query syntax:
A) against the book core - bestseller:true
B) against the author core - {!join fromIndex=book from=id to=bookid}bestseller:true
C) against the book core - {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true

A - returns results
B - returns results
C - does not return results
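
To make the expected result of query C concrete, here is a minimal plain-Java model of the three queries. It is not Solr code: the Book and Author records, the sample data, and the single-valued authorid field are assumptions made for illustration, and only the field names (id, bookid, authorid, bestseller) come from the queries above. Each join collects the "from" field values of the matching documents and keeps the documents whose "to" field contains one of them.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class NestedJoinSketch {
    // Hypothetical document shapes; only the field names come from the queries.
    public record Book(String id, String authorid, boolean bestseller) {}
    public record Author(String id, List<String> bookid) {}

    // Query A: bestseller:true against the book core.
    public static List<Book> queryA(List<Book> books) {
        return books.stream().filter(Book::bestseller).collect(Collectors.toList());
    }

    // Query B: {!join fromIndex=book from=id to=bookid}bestseller:true
    // Collect book.id from the matches of A, keep authors whose bookid hits.
    public static List<Author> queryB(List<Book> books, List<Author> authors) {
        Set<String> fromValues =
            queryA(books).stream().map(Book::id).collect(Collectors.toSet());
        return authors.stream()
            .filter(a -> a.bookid().stream().anyMatch(fromValues::contains))
            .collect(Collectors.toList());
    }

    // Query C: {!join fromIndex=author from=id to=authorid} wrapped around B.
    // Collect author.id from the matches of B, keep books whose authorid hits.
    public static List<Book> queryC(List<Book> books, List<Author> authors) {
        Set<String> fromValues =
            queryB(books, authors).stream().map(Author::id).collect(Collectors.toSet());
        return books.stream()
            .filter(b -> fromValues.contains(b.authorid()))
            .collect(Collectors.toList());
    }

    // Made-up sample data: author a1 wrote a bestseller (b1) and one other book.
    public static List<Book> sampleBooks() {
        return List.of(
            new Book("b1", "a1", true),
            new Book("b2", "a1", false),
            new Book("b3", "a2", false));
    }

    public static List<Author> sampleAuthors() {
        return List.of(
            new Author("a1", List.of("b1", "b2")),
            new Author("a2", List.of("b3")));
    }

    public static void main(String[] args) {
        List<Book> result = queryC(sampleBooks(), sampleAuthors());
        // Both books by a1 are expected, including the non-bestseller b2.
        System.out.println(result.stream().map(Book::id).collect(Collectors.toList()));
    }
}
```

With this sample data, C should return every book by author a1, not just the bestseller, which is the behaviour the nested join is meant to deliver.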

Given that A and C use the same core, I started looking for join code that
compares the originating core to the fromIndex and found this
in JoinQParserPlugin (line #159).

        if (info.getReq().getCore() == fromCore) {
          // if this is the same core, use the searcher passed in... otherwise we could be warming and
          // get an older searcher from the core.
          fromSearcher = searcher;
        } else {
          // This could block if there is a static warming query with a join in it, and if useColdSearcher is true.
          // Deadlock could result if two cores both had useColdSearcher and had joins that used eachother.
          // This would be very predictable though (should happen every time if misconfigured)
          fromRef = fromCore.getSearcher(false, true, null);

          // be careful not to do anything with this searcher that requires the thread local
          // SolrRequestInfo in a manner that requires the core in the request to match
          fromSearcher = fromRef.get();
        }

I found that if I were to modify the above code so that it always follows
the logic in the else block, I get the results I expect.

Can someone explain to me why the code is written as it is?  And if we were
to run with only the else block being executed, what type of adverse
impacts we might have?
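
The comment in the if branch hints at one possible trade-off, which the following toy model tries to illustrate. It is only a sketch under assumptions: Searcher and Core here are hypothetical stand-ins, not Solr classes, the "generation" number is invented to tell searchers apart, and the reference counting implied by fromRef is left out. It shows how, during warming, always taking the else branch could hand back the core's older registered searcher instead of the one passed in.

```java
public class SearcherChoiceSketch {
    /** Hypothetical stand-in for a searcher; generation marks the index view it sees. */
    public record Searcher(int generation) {}

    /** Hypothetical stand-in for a core that exposes its currently registered searcher. */
    public static final class Core {
        private final Searcher registered;

        public Core(Searcher registered) {
            this.registered = registered;
        }

        public Searcher getRegisteredSearcher() {
            return registered;
        }
    }

    /** Shape of the quoted logic: reuse the passed-in searcher when the cores match. */
    public static Searcher original(Core requestCore, Core fromCore, Searcher passedIn) {
        if (requestCore == fromCore) {
            return passedIn;
        }
        return fromCore.getRegisteredSearcher();
    }

    /** Shape of the modification described in the post: always ask the from core. */
    public static Searcher alwaysElse(Core fromCore) {
        return fromCore.getRegisteredSearcher();
    }

    public static void main(String[] args) {
        Core book = new Core(new Searcher(1));  // generation 1 is registered
        Searcher warming = new Searcher(2);     // generation 2 is still being warmed

        // Same-core join issued while warming:
        System.out.println(original(book, book, warming).generation()); // newer searcher
        System.out.println(alwaysElse(book).generation());              // older searcher
    }
}
```

Whether this matters in practice, and whether it relates to query C returning no results, is not something the model can answer; it only restates what the quoted comments say.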

Does anyone have other ideas on how to solve this issue?

Thanks in advance.
-Gerald

Re: Nested Join Queries

Posted by Erick Erickson <er...@gmail.com>.

Gerald:
Here's the place to start: http://wiki.apache.org/solr/HowToContribute

But the basic setup is:
1> create a JIRA login (anyone can)
2> create a JIRA if one doesn't exist
3> generate the patch. From your root level (the one that contains the "solr"
and "lucene" dirs), run "svn diff > SOLR-###.patch", where ### is the Solr
JIRA number from <2>
4> upload the patch to the JIRA
5> prompt the JIRA occasionally if nobody picks it up <G>...

Git patches work too, but I'm Git-naive.

Happy Hacking!
Erick



Re: Nested Join Queries

Posted by Gerald Blanck <ge...@barometerit.com>.

Mikhail-

Let me know how to contribute a test case and I will put it on my to do
list.

When your many-to-many BlockJoin solution matures I would love to see it.

Thanks.
-Gerald




-- 

*Gerald Blanck*

baro*m*eter*IT*

1331 Tyler Street NE, Suite 100
Minneapolis, MN 55413


612.208.2802

gerald.blanck@barometerit.com

Re: Nested Join Queries

Posted by Mikhail Khludnev <mk...@griddynamics.com>.

Gerald,
Nice to hear that your problem is solved. Can you contribute a test case
to reproduce this issue?

FWIW, my team successfully deals with Many-to-Many in BlockJoin. It works,
but the solution is still a little immature.




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Nested Join Queries

Posted by Gerald Blanck <ge...@barometerit.com>.

Thank you Mikhail.  Unfortunately BlockJoinQuery is not an option we can
leverage.

- We have modeled our document types as different indexes/cores.
- Our relationships which we are attempting to join across are not
single-parent to many-children relationships.  They are in fact many to
many.
- Additionally, memory usage is a concern.

FYI.  After making the code change I mentioned in my original post, we have
completed a full test cycle and did not experience any adverse impacts from
the change.  Our join query functionality now returns the results we
wanted.  I would still be interested in hearing an explanation as to why
the code is written as it is in v4.0.0.

Thanks.




On Tue, Nov 13, 2012 at 8:31 AM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Please find reference materials
>
>
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
> http://blog.griddynamics.com/2012/08/block-join-query-performs.html


-- 

*Gerald Blanck*

baro*m*eter*IT*

1331 Tyler Street NE, Suite 100
Minneapolis, MN 55413


612.208.2802

gerald.blanck@barometerit.com

Re: Nested Join Queries

Posted by Mikhail Khludnev <mk...@griddynamics.com>.

Please find the reference materials below:

http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
http://blog.griddynamics.com/2012/08/block-join-query-performs.html



On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck <
gerald.blanck@barometerit.com> wrote:

> Thank you.  I've not heard of BlockJoin.  I will look into it today.
>  Thanks.
>
>
> On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> Replied. pls check maillist.
>>
>>
>>
>> On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev <
>> mkhludnev@griddynamics.com> wrote:
>>
>>> Gerald,
>>>
>>> I wonder if you tried to approach BlockJoin for your problem? Can you
>>> afford less frequent updates?



Re: Nested Join Queries

Posted by Mikhail Khludnev <mk...@griddynamics.com>.

Gerald,

I wonder whether you have tried BlockJoin for your problem. Can you
afford less frequent updates?


On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck
<gerald.blanck@barometerit.com> wrote:

> Thank you Erick for your reply.  I understand that search is not an RDBMS.
> Yes, we do have a huge combinatorial explosion if we de-normalize and
> duplicate data.  In fact, I believe our use case is exactly what the Solr
> developers were trying to solve with the addition of the Join query.  And
> while the example I gave illustrates the problem we are solving with the
> Join functionality, it is simplistic in nature compared to what we have in
> actuality.
>
> Am still looking for an answer here if someone can shed some light.
> Thanks.




Re: Nested Join Queries

Posted by Gerald Blanck <ge...@barometerit.com>.

Thank you, Erick, for your reply.  I understand that search is not an RDBMS.
Yes, we do have a huge combinatorial explosion if we de-normalize and
duplicate data.  In fact, I believe our use case is exactly what the Solr
developers were trying to solve with the addition of the Join query.  And
while the example I gave illustrates the problem we are solving with the
Join functionality, it is simplistic compared to what we actually have.
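
For readers following along, here is a minimal sketch of what query C from the original post is meant to compute, written against in-memory lists rather than Solr. The class name, the document shapes, the multi-valued fields (chosen to reflect a many-to-many relationship), and the sample data are all assumptions for illustration; only the field names id, bookid, authorid and bestseller come from the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** In-memory stand-in for the two-step join in query C; this is not Solr code. */
class NestedJoinSketch {

    // Hypothetical document shapes; only the field names come from the example.
    static final class Book {
        final String id;
        final List<String> authorid;
        final boolean bestseller;
        Book(String id, List<String> authorid, boolean bestseller) {
            this.id = id;
            this.authorid = authorid;
            this.bestseller = bestseller;
        }
    }

    static final class Author {
        final String id;
        final List<String> bookid;
        Author(String id, List<String> bookid) {
            this.id = id;
            this.bookid = bookid;
        }
    }

    /** Query B: authors whose bookid matches the id of a best-selling book. */
    static Set<String> authorsWithBestseller(List<Book> books, List<Author> authors) {
        Set<String> bestsellerIds = new HashSet<>();
        for (Book b : books) {
            if (b.bestseller) {
                bestsellerIds.add(b.id); // values of the "from" field (id)
            }
        }
        Set<String> authorIds = new HashSet<>();
        for (Author a : authors) {
            for (String bookId : a.bookid) { // the "to" field (bookid)
                if (bestsellerIds.contains(bookId)) {
                    authorIds.add(a.id);
                    break;
                }
            }
        }
        return authorIds;
    }

    /** Query C: books whose authorid matches the id of an author found by B. */
    static List<String> booksByBestsellingAuthors(List<Book> books, List<Author> authors) {
        Set<String> authorIds = authorsWithBestseller(books, authors);
        List<String> result = new ArrayList<>();
        for (Book b : books) {
            for (String authorId : b.authorid) {
                if (authorIds.contains(authorId)) {
                    result.add(b.id);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Book> books = List.of(
            new Book("b1", List.of("a1"), true),
            new Book("b2", List.of("a1"), false),
            new Book("b3", List.of("a2"), false));
        List<Author> authors = List.of(
            new Author("a1", List.of("b1", "b2")),
            new Author("a2", List.of("b3")));
        System.out.println(booksByBestsellingAuthors(books, authors)); // prints [b1, b2]
    }
}
```

The second step reads the same book collection that the first step started from, which is the "join back to the originating core" described in the original post.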

I am still looking for an answer here if someone can shed some light.  Thanks.


On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson <er...@gmail.com> wrote:

> I'm going to go a bit sideways on you, partly because I can't answer the
> question <G>...
>
> But, every time I see someone doing what looks like substituting "core" for
> "table" and then trying to use Solr like a DB, I get on my soap-box and
> preach......
>
> In this case, consider de-normalizing your DB so you can ask the query in
> terms of search rather than joins. e.g.
>
> Make each document a combination of the author and the book, with an
> additional field "author_has_written_a_bestseller". Now your query becomes
> a really simple search,
> "author:name AND author_has_written_a_bestseller:true". True, this kind of
> approach isn't as flexible as an RDBMS, but it's a _search_ rather than a
> query. Yes, it replicates data, but unless you have a huge combinatorial
> explosion, that's not a problem.
>
> And the join functionality isn't called "pseudo" for nothing. It was
> written for a specific use-case. It is often expensive, especially when the
> field being joined has many unique values.
>
> FWIW,
> Erick




Re: Nested Join Queries

Posted by Erick Erickson <er...@gmail.com>.

I'm going to go a bit sideways on you, partly because I can't answer the
question <G>...

But, every time I see someone doing what looks like substituting "core" for
"table" and then trying to use Solr like a DB, I get on my soap-box and
preach...

In this case, consider de-normalizing your DB so you can ask the query in
terms of search rather than joins. e.g.

Make each document a combination of the author and the book, with an
additional field "author_has_written_a_bestseller". Now your query becomes
a really simple search,
"author:name AND author_has_written_a_bestseller:true". True, this kind of
approach isn't as flexible as an RDBMS, but it's a _search_ rather than a
query. Yes, it replicates data, but unless you have a huge combinatorial
explosion, that's not a problem.
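
A minimal sketch of that idea, assuming the flag is computed once at index time and using plain in-memory lists in place of a Solr index. The class name, the record shapes, and the sample data are hypothetical; only the author field and the author_has_written_a_bestseller flag come from the suggestion above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrates the de-normalized document layout; this is not Solr code. */
class DenormalizeSketch {

    // Hypothetical source row: one entry per (author, book) pair.
    static final class Book {
        final String title;
        final String author;
        final boolean bestseller;
        Book(String title, String author, boolean bestseller) {
            this.title = title;
            this.author = author;
            this.bestseller = bestseller;
        }
    }

    // De-normalized document: the book plus a flag derived from all of the author's books.
    static final class Doc {
        final String title;
        final String author;
        final boolean authorHasWrittenABestseller;
        Doc(String title, String author, boolean authorHasWrittenABestseller) {
            this.title = title;
            this.author = author;
            this.authorHasWrittenABestseller = authorHasWrittenABestseller;
        }
    }

    /** Index-time step: compute the flag per author and copy it onto every book document. */
    static List<Doc> denormalize(List<Book> books) {
        Set<String> bestsellingAuthors = new HashSet<>();
        for (Book b : books) {
            if (b.bestseller) {
                bestsellingAuthors.add(b.author);
            }
        }
        List<Doc> docs = new ArrayList<>();
        for (Book b : books) {
            docs.add(new Doc(b.title, b.author, bestsellingAuthors.contains(b.author)));
        }
        return docs;
    }

    /** Query-time step: author:name AND author_has_written_a_bestseller:true. */
    static List<String> search(List<Doc> docs, String author) {
        List<String> titles = new ArrayList<>();
        for (Doc d : docs) {
            if (d.author.equals(author) && d.authorHasWrittenABestseller) {
                titles.add(d.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Doc> docs = denormalize(List.of(
            new Book("Book A", "smith", true),
            new Book("Book B", "smith", false),
            new Book("Book C", "jones", false)));
        System.out.println(search(docs, "smith")); // prints [Book A, Book B]
        System.out.println(search(docs, "jones")); // prints []
    }
}
```

The cost shows up at update time: when any book's bestseller status changes, every document for that author has to be rewritten so the flag stays correct.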

And the join functionality isn't called "pseudo" for nothing. It was
written for a specific use-case. It is often expensive, especially when the
field being joined has many unique values.

FWIW,
Erick

On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck <
gerald.blanck@barometerit.com> wrote:

> At a high level, I have a need to be able to execute a query that joins
> across cores, and that query during its joining may join back to the
> originating core.
>
> Example:
> Find all Books written by an Author who has written a best selling Book.
>
> In Solr query syntax
> A) against the book core - bestseller:true
> B) against the author core - {!join fromIndex=book from=id to=bookid}bestseller:true
> C) against the book core - {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
>
> A - returns results
> B - returns results
> C - does not return results
>
> Given that A and C use the same core, I started looking for join code that
> compares the originating core to the fromIndex and found this
> in JoinQParserPlugin (line #159).
>
>         if (info.getReq().getCore() == fromCore) {
>           // if this is the same core, use the searcher passed in...
>           // otherwise we could be warming and get an older searcher from the core.
>           fromSearcher = searcher;
>         } else {
>           // This could block if there is a static warming query with a
>           // join in it, and if useColdSearcher is true.
>           // Deadlock could result if two cores both had useColdSearcher
>           // and had joins that used eachother.
>           // This would be very predictable though (should happen every
>           // time if misconfigured)
>           fromRef = fromCore.getSearcher(false, true, null);
>
>           // be careful not to do anything with this searcher that requires
>           // the thread local SolrRequestInfo in a manner that requires the
>           // core in the request to match
>           fromSearcher = fromRef.get();
>         }
>
> I found that if I were to modify the above code so that it always follows
> the logic in the else block, I get the results I expect.
>
> Can someone explain to me why the code is written as it is?  And if we were
> to run with only the else block being executed, what type of adverse
> impacts we might have?
>
> Does anyone have other ideas on how to solve this issue?
>
> Thanks in advance.
> -Gerald
>