Posted to solr-user@lucene.apache.org by "N. Tucker" <nt...@august20th.com> on 2012/04/06 03:00:01 UTC

schema design question

Apologies if this is a very straightforward schema design problem that
should be fairly obvious, but I'm not seeing a good way to do it.
Let's say I have an index that models Albums and Tracks, and
they all have arbitrary tags attached to them (represented by
multivalued string fields).  Tracks also have an album id field
which can be used to associate them with an album.  I'd like to
perform a query which shows both Track and Album results, but
suppresses Tracks that are associated with Albums in the result set.

I am tempted to use a "join" here, but I have reservations because it
is my understanding that joins cannot work across shards, and I'd
rather not limit myself in that way if I can avoid it.  Any
suggestions?  Is there a standard solution to this type of problem
where you've got hierarchical items and you don't want children shown
in the same result as the parent?

Re: schema design question

Posted by Lance Norskog <go...@gmail.com>.
(albums:query OR tracks:query) AND NOT(tracks:query -> albums:query)

Is this it? That last clause does sound like a join.

How do you shard? Is it possible to put all associated albums and
tracks in one shard? You can then do a join query against each shard
and merge the output yourself.
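
A minimal sketch of the approach described above, under stated assumptions:
the field names "type" and "tags", the helper names, and the choice of a
CRC32 hash for routing are all hypothetical and not from this thread. It
routes each track to the same shard as its album, builds a per-shard
request that uses Solr's join query parser inside a negated nested query,
and merges the per-shard results on the client.

```python
import zlib
from itertools import chain


def shard_for(doc, num_shards):
    """Route an album and all of its tracks to the same shard."""
    # Tracks are keyed by their album id; albums (and tracks that have
    # no album) are keyed by their own id.
    key = doc.get("album_id") or doc["id"]
    return zlib.crc32(key.encode("utf-8")) % num_shards


def join_params(tag):
    """Per-shard request: docs with the tag, minus tracks whose album also has it."""
    # Assumed schema: albums have type:album, tracks carry album_id.
    inner = f"type:album AND tags:{tag}"
    return {
        "q": f"tags:{tag}",
        # The join selects tracks whose album_id equals the id of a
        # matching album; the leading '-' excludes them.
        "fq": f'-_query_:"{{!join from=id to=album_id}}{inner}"',
    }


def merge_results(per_shard_docs, rows=10):
    """Merge per-shard result lists into one list ordered by descending score."""
    merged = sorted(
        chain.from_iterable(per_shard_docs),
        key=lambda d: d["score"],
        reverse=True,
    )
    return merged[:rows]
```

Each shard would be queried separately with the parameters from
join_params(), and the returned documents passed to merge_results().
Scores computed on different shards are only roughly comparable, so the
merged ordering is approximate.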

On Fri, Apr 6, 2012 at 9:59 AM, Neal Tucker <nt...@august20th.com> wrote:
> Thanks, but I don't want to exclude all tracks that are associated
> with albums, I want to exclude tracks that are associated with albums
> *which match the query* (tracks and their associated albums may have
> different tags).  I don't think your suggestion covers that.

-- 
Lance Norskog
goksron@gmail.com

Re: schema design question

Posted by Neal Tucker <nt...@august20th.com>.
Thanks, but I don't want to exclude all tracks that are associated
with albums, I want to exclude tracks that are associated with albums
*which match the query* (tracks and their associated albums may have
different tags).  I don't think your suggestion covers that.
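
To make the requirement concrete, here is a small client-side sketch of the
intended semantics. It is only an illustration: the "type" field and the
function name are assumptions, and it operates on plain dictionaries rather
than on any Solr API. A track is dropped only when its own album is also
among the matches.

```python
def suppress_children(docs):
    """Drop tracks whose album is itself present in the matched documents."""
    # Ids of every album that matched the query.
    album_ids = {d["id"] for d in docs if d["type"] == "album"}
    return [
        d for d in docs
        # Keep albums, and keep tracks whose album did not match.
        if not (d["type"] == "track" and d.get("album_id") in album_ids)
    ]


matches = [
    {"id": "a1", "type": "album"},
    {"id": "t1", "type": "track", "album_id": "a1"},  # album a1 matched: dropped
    {"id": "t2", "type": "track", "album_id": "a2"},  # album a2 did not match: kept
]
visible = suppress_children(matches)
```

This only behaves correctly when it sees the full set of matching albums,
not a single page of results, which is why doing the filtering inside the
search engine is attractive.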

Re: schema design question

Posted by Erick Erickson <er...@gmail.com>.
I'd consider adding a field like "associated_with_album", plus a
field that identifies the kind of record ("track" or "album").

Then you can form a query like -associated_with_album:true
(where '-' is the Lucene NOT operator).

And then group by kind to get separate groups of albums and
tracks.

Hope this helps
Erick
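
As a rough sketch of the request this suggestion implies, the snippet below
builds a query string that combines the negative filter with Solr's result
grouping parameters (group=true, group.field). The field names
"associated_with_album" and "kind" come from the message above; the
function name and the sample user query are assumptions for illustration.

```python
from urllib.parse import urlencode


def grouped_query(user_query):
    """Build a query string that hides album-linked tracks and groups by kind."""
    params = [
        ("q", user_query),
        # Exclude every record flagged as belonging to an album.
        ("fq", "-associated_with_album:true"),
        # Return separate groups for albums and tracks.
        ("group", "true"),
        ("group.field", "kind"),
    ]
    return urlencode(params)


query_string = grouped_query("tags:rock")
```

Note that this filter removes every track that has an album, whether or not
that album matched the query, so it does not cover the case where a track
should stay visible because its album did not match.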
