Posted to solr-user@lucene.apache.org by Jack Park <ja...@topicquests.org> on 2013/03/27 17:53:02 UTC

Querying a transitive closure?

This is a question about "isA?"

We want to know if M isA B   isA?(M,B)

For some M, one might be able to look into M to see its type or the
class(es) of which it is a subclass. We're talking taxonomic queries
now.
But, for some M, one might need to ripple up the "transitive closure",
looking at all the superclasses, etc., recursively.

It seems unreasonable to do that over HTTP; it seems more reasonable
to grab a core and write a custom isA query handler. But, how do you
do that in a SolrCloud?

Really curious...

Many thanks in advance for ideas.
Jack
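
A minimal sketch of the "ripple up" lookup described above, assuming a toy in-memory taxonomy in which each node stores only its direct parents. The `PARENTS` table and the `is_a` helper are hypothetical names used for illustration; nothing here is Solr API. The counter shows why the approach is costly when every parent fetch is a separate remote request.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy taxonomy: each node maps to its direct parents only.
PARENTS: Dict[str, List[str]] = {
    "M": ["K"],
    "K": ["D", "E"],
    "D": ["B"],
    "E": ["B"],
    "B": ["Root"],
    "Root": [],
}


def is_a(node: str, target: str, parents: Dict[str, List[str]]) -> Tuple[bool, int]:
    """Return (answer, lookups); lookups counts how many records were fetched."""
    seen: Set[str] = set()
    stack = [node]
    lookups = 0
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a class reachable by two paths is fetched only once
        seen.add(current)
        lookups += 1  # one record fetch; one round trip if the store is remote
        for parent in parents.get(current, []):
            if parent == target:
                return True, lookups
            stack.append(parent)
    return False, lookups


if __name__ == "__main__":
    print(is_a("M", "B", PARENTS))
    print(is_a("B", "M", PARENTS))
```

The number of fetches grows with the distance between M and B, which is the cost the question is worried about.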

Re: Querying a transitive closure?

Posted by Jack Park <ja...@topicquests.org>.
Thank you for this. I had thought about it but reasoned in a naive
way: who would do such a thing?

Doing so makes the query local: once the object has been retrieved, no
further HTTP queries are required. Implementation perhaps entails one
request to fetch the presumed parent in order to harvest its
transitive closure.  I need to think about that.

Many thanks
Jack
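
The "one request to fetch the presumed parent" idea can be sketched as follows, assuming every stored document already carries its full ancestor list. The `index` dictionary stands in for the document store, and `add_node` and the `ancestors` field are hypothetical names chosen for this example.

```python
from typing import Dict, List

Document = Dict[str, object]


def add_node(index: Dict[str, Document], node_id: str, parent_ids: List[str]) -> Document:
    """Store a new node whose ancestor list is derived from its direct parents."""
    ancestors = set()
    for parent_id in parent_ids:
        parent_doc = index[parent_id]  # one fetch per direct parent
        ancestors.add(parent_id)
        ancestors.update(parent_doc["ancestors"])  # harvest the parent's closure
    doc = {"id": node_id, "ancestors": sorted(ancestors)}
    index[node_id] = doc
    return doc


if __name__ == "__main__":
    index: Dict[str, Document] = {}
    add_node(index, "Root", [])
    add_node(index, "B", ["Root"])
    add_node(index, "D", ["B"])
    print(add_node(index, "M", ["D"]))
```

One caveat of this design: if a class is later moved to a different parent, the stored ancestor lists of all its descendants become stale and need to be rebuilt.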

On Thu, Mar 28, 2013 at 5:06 AM, Jens Grivolla <j+...@grivolla.net> wrote:
> Exactly, you should usually design your schema to fit your queries, and if
> you need to retrieve all ancestors then you should index all ancestors so
> you can query for them easily.
>
> If that doesn't work for you then either Solr is not the right tool for the
> job, or you need to rethink your schema.
>
> The description of doing lookups within a tree structure doesn't sound at
> all like what you would use a text retrieval engine for, so you might want
> to rethink why you want to use Solr for this. But if that "transitive
> closure" is something you can calculate at indexing time then the correct
> solution is the one Upayavira provided.
>
> If you want people to be able to help you, you need to actually describe your
> problem (i.e. what is my data, and what are my queries) instead of diving
> into technical details like "reducing HTTP roundtrips". My guess is that if
> you need to "reduce HTTP roundtrips" you're probably doing it wrong.
>
> HTH,
> Jens

Re: Querying a transitive closure?

Posted by Jens Grivolla <j+...@grivolla.net>.
Exactly, you should usually design your schema to fit your queries, and 
if you need to retrieve all ancestors then you should index all 
ancestors so you can query for them easily.

If that doesn't work for you then either Solr is not the right tool for 
the job, or you need to rethink your schema.

The description of doing lookups within a tree structure doesn't sound 
at all like what you would use a text retrieval engine for, so you might 
want to rethink why you want to use Solr for this. But if that 
"transitive closure" is something you can calculate at indexing time 
then the correct solution is the one Upayavira provided.

If you want people to be able to help you, you need to actually describe 
your problem (i.e. what is my data, and what are my queries) instead of 
diving into technical details like "reducing HTTP roundtrips". My guess 
is that if you need to "reduce HTTP roundtrips" you're probably doing it 
wrong.

HTH,
Jens

On 03/28/2013 08:15 AM, Upayavira wrote:
> Why don't you index all ancestor classes with the document, as a
> multivalued field? Then you could get it in one hit. Am I missing
> something?
>
> Upayavira

Re: Querying a transitive closure?

Posted by Upayavira <uv...@odoko.co.uk>.
Why don't you index all ancestor classes with the document, as a
multivalued field? Then you could get it in one hit. Am I missing
something?

Upayavira
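
A minimal sketch of this suggestion, assuming the taxonomy is known at indexing time. The `PARENTS` table, the `ancestors` field name, and the helper functions are hypothetical; the documents are plain dictionaries rather than real Solr documents. The closure is computed once per document when it is indexed, so the isA check at query time is a single membership test.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy: each node maps to its direct parents only.
PARENTS: Dict[str, List[str]] = {
    "M": ["K"],
    "K": ["D", "E"],
    "D": ["B"],
    "E": ["B"],
    "B": ["Root"],
    "Root": [],
}


def ancestors_of(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every class reachable by following parent links upward."""
    result: Set[str] = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(parents.get(current, []))
    return result


def build_documents(parents: Dict[str, List[str]]) -> List[dict]:
    """Produce one document per node with a multivalued 'ancestors' field."""
    return [
        {"id": node, "ancestors": sorted(ancestors_of(node, parents))}
        for node in parents
    ]


def is_a(doc: dict, target: str) -> bool:
    """Query-time check: no traversal, just a lookup in the stored field."""
    return target in doc["ancestors"]


if __name__ == "__main__":
    docs = {d["id"]: d for d in build_documents(PARENTS)}
    print(docs["M"])
    print(is_a(docs["M"], "B"))
```

With documents shaped like this in Solr, and assuming `ancestors` is declared as a multivalued string field, the check could be expressed as one query such as `q=id:M AND ancestors:B`, which matches a document only if M isA B.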

On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote:
> Hi Otis,
> That's essentially the answer I was looking for: each shard (are we
> talking master + replicas?) has the plug-in custom query handler.  I
> need to build it to find out.
> 
> What I mean is that there is a taxonomy, say one with a single root
> for sake of illustration, which grows all the classes, subclasses, and
> instances. If I have an object that is somewhere in that taxonomy,
> then it has a zigzag chain of parents up that tree (I've seen that
> called a "transitive closure"). If class B is way up that tree from M,
> no telling how many queries it will take to find it.  Hmmm...
> recursive ascent, I suppose.
> 
> Many thanks
> Jack

Re: Querying a transitive closure?

Posted by Jack Park <ja...@topicquests.org>.
Hi Otis,
That's essentially the answer I was looking for: each shard (are we
talking master + replicas?) has the plug-in custom query handler.  I
need to build it to find out.

What I mean is that there is a taxonomy, say one with a single root
for sake of illustration, which grows all the classes, subclasses, and
instances. If I have an object that is somewhere in that taxonomy,
then it has a zigzag chain of parents up that tree (I've seen that
called a "transitive closure"). If class B is way up that tree from M,
no telling how many queries it will take to find it.  Hmmm...
recursive ascent, I suppose.

Many thanks
Jack

On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic
<ot...@gmail.com> wrote:
> Hi Jack,
>
> I don't fully understand the exact taxonomy structure and your needs,
> but in terms of reducing the number of HTTP round trips, you can do it
> by writing a custom SearchComponent that, upon getting the initial
> request, does everything "locally", meaning that it talks to the
> local/specified shard before returning to the caller.  In a SolrCloud
> setup with N shards, each of these N shards could be queried in such a
> way in parallel, running query/queries on their local shards.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/

Re: Querying a transitive closure?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Jack,

I don't fully understand the exact taxonomy structure and your needs,
but in terms of reducing the number of HTTP round trips, you can do it
by writing a custom SearchComponent that, upon getting the initial
request, does everything "locally", meaning that it talks to the
local/specified shard before returning to the caller.  In a SolrCloud
setup with N shards, each of these N shards could be queried in such a
way in parallel, running query/queries on their local shards.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/
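
A toy model of the shard-local idea above, written as a sketch rather than as Solr code: it does not use the SearchComponent API, and `SHARDS`, `local_is_a`, and `fan_out_is_a` are hypothetical names. Each shard is a dictionary holding parent links for its own documents only, and the caller asks all shards in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Shard = Dict[str, List[str]]

# Hypothetical shards: each holds parent links only for its own documents.
SHARDS: List[Shard] = [
    {"M": ["K"], "K": ["B"], "B": []},  # shard 0: the whole chain M -> K -> B is local
    {"X": ["Y"]},                       # shard 1: the record for Y lives elsewhere
    {"Y": ["B"]},                       # shard 2
]


def local_is_a(shard: Shard, node: str, target: str) -> bool:
    """Walk parent links using only the records stored on this shard."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen or current not in shard:
            continue  # a record held by another shard cannot be followed here
        seen.add(current)
        for parent in shard[current]:
            if parent == target:
                return True
            stack.append(parent)
    return False


def fan_out_is_a(shards: List[Shard], node: str, target: str) -> bool:
    """Ask every shard in parallel and combine the local answers."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(lambda s: local_is_a(s, node, target), shards))
    return any(results)


if __name__ == "__main__":
    print(fan_out_is_a(SHARDS, "M", "B"))
    print(fan_out_is_a(SHARDS, "X", "B"))
```

In this model the second query returns False even though X reaches B through Y, because the chain crosses shards. A purely local walk only works if a document and all of its ancestors are placed on the same shard, or if each document carries its ancestors with it.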





On Wed, Mar 27, 2013 at 3:11 PM, Jack Park <ja...@topicquests.org> wrote:
> Hi Otis,
>
> I fully expect to grow to SolrCloud -- many shards. For now, it's
> solo. But, my thinking relates to cloud. I look for ways to reduce the
> number of HTTP round trips through SolrJ. Maybe you have some ideas?
>
> Thanks
> Jack

Re: Querying a transitive closure?

Posted by Jack Park <ja...@topicquests.org>.
Hi Otis,

I fully expect to grow to SolrCloud -- many shards. For now, it's
solo. But, my thinking relates to cloud. I look for ways to reduce the
number of HTTP round trips through SolrJ. Maybe you have some ideas?

Thanks
Jack

On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
<ot...@gmail.com> wrote:
> Hi Jack,
>
> Is this really about HTTP and Solr vs. SolrCloud or more whether
> Solr(Cloud) is the right tool for the job and if so how to structure
> the schema and queries to make such lookups efficient?
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/

Re: Querying a transitive closure?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Jack,

Is this really about HTTP and Solr vs. SolrCloud or more whether
Solr(Cloud) is the right tool for the job and if so how to structure
the schema and queries to make such lookups efficient?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Mar 27, 2013 at 12:53 PM, Jack Park <ja...@topicquests.org> wrote:
> This is a question about "isA?"
>
> We want to know if M isA B   isA?(M,B)
>
> For some M, one might be able to look into M to see its type or the
> class(es) of which it is a subclass. We're talking taxonomic queries
> now.
> But, for some M, one might need to ripple up the "transitive closure",
> looking at all the superclasses, etc., recursively.
>
> It seems unreasonable to do that over HTTP; it seems more reasonable
> to grab a core and write a custom isA query handler. But, how do you
> do that in a SolrCloud?
>
> Really curious...
>
> Many thanks in advance for ideas.
> Jack