Posted to solr-user@lucene.apache.org by naleiden <na...@gmail.com> on 2012/06/05 18:48:51 UTC

Boost by Nested Query / Join Needed?

Hi,

First off, I'm about a week into all things Solr, and still trying to figure
out how to fit my relational-shaped peg through a denormalized hole. Please
forgive my ignorance below :-D

I need to store a one-to-N relationship and boost results by a field on the
related record.

Let's say I want to index a number of different types of candy, and also a
customer's preference for each type of candy (which I index/update when a
customer makes a purchase), and then boost by that preference on search.

Here is my pared-down attempt at a denormalized schema:

<!-- Common Fields -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="type" type="string" indexed="true" stored="true" required="true" />

<!-- Fields for 'candy' -->
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>

<!-- Fields for Customer-Candy Preference ('preference') -->
<field name="user" type="integer" indexed="true" stored="true"/>
<field name="candy" type="integer" indexed="true" stored="true"/>
<field name="weight" type="integer" indexed="true" stored="true" default="0"/>

I am indexing 'candy' and 'preferences' separately, and when indexing one, I
leave the fields of the other empty (with the exception of the required 'id'
and 'type').

Ignoring the query score, this is effectively what I'm looking to do in SQL:

SELECT candy.id, candy.name, candy.description
FROM candy
LEFT JOIN preference
  ON (preference.candy = candy.id AND preference.customer = 'someCustomerID')
-- WHERE some match is made on the query against candy.name or candy.description
ORDER BY preference.weight DESC

My questions are:

1.) Am I making any assumptions with respect to what are effectively
different document types in the schema that will not scale well? I don't
think I want to be duplicating each 'candy' entry for every customer, or
maybe that wouldn't be such a big deal in Solr.

2.) Can someone point me in the right direction on how to perform this type
of boost in a Solr query?

Thanks in advance,
Nick


--
View this message in context: http://lucene.472066.n3.nabble.com/Boost-by-Nested-Query-Join-Needed-tp3987818.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Boost by Nested Query / Join Needed?

Posted by Chris Hostetter <ho...@fucit.org>.

: For posterity, I think we're going to remove 'preference' data from Solr
: indexing and go in the custom Function Query direction with a key-value
: store.

that would be my suggestion.

Assuming you really are modeling candy & users, my guess is the number of
distinct candies you have is "very large" and the number of distinct users
you have is "very large", but the number of preferences per user is "small
to medium".

You can probably go very far by just storing your $user->[candy,weight]
preference data in the key+val store of your choice, and then whenever a
$user does a $search, augment the $search with the boost params based on
the $user->[candy,weight] prefs.

If you find that you have too many prefs for some users, put a cap on the
number of preferences you let influence the query (i.e., only the top N
weights, or only the N most confident weights, or the N most recent prefs),
or aggregate some prefs into category/manufacturer prefs instead of specific
$candies, etc...
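
The approach above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the `prefs` dictionary stands in for
whatever the key+val store returns, the helper names `top_prefs` and
`boost_params` are made up, and `qf=name description` assumes the candy fields
from the schema in the original post. It caps the preferences to the top N
weights and turns them into an edismax `bq` parameter with one optional boosted
clause per preferred candy id.

```python
from urllib.parse import urlencode


def top_prefs(prefs, limit):
    """Keep only the `limit` highest-weighted (candy_id, weight) pairs."""
    return sorted(prefs.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def boost_params(user_query, prefs, limit=10):
    """Build edismax params whose bq boosts the user's preferred candy ids."""
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": "name description",  # assumed search fields from the schema
    }
    clauses = [
        f"id:{candy_id}^{weight}"
        for candy_id, weight in top_prefs(prefs, limit)
        if weight > 0  # skip zero weights, they add nothing to the score
    ]
    if clauses:
        params["bq"] = " ".join(clauses)
    return params


# Hypothetical preference data, as it might come back from a key+val store.
prefs = {"101": 5, "202": 2, "303": 9}
params = boost_params("chocolate", prefs, limit=2)
print(urlencode(params))
```

Ids containing query-syntax characters (spaces, colons, and so on) would need
escaping before being placed in `bq`; the sketch assumes simple numeric ids.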

Having said all that: with the new Solr NRT stuff and the /get handler's
real-time gets, you can treat another Solr core/server as your key+val
store if you want, but a straight Solr join won't let you take advantage
of the weights for boosting.
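
As a rough sketch of using a second core as the key+val store: the core name
`prefs`, the document id `user-42`, and the document layout (one document per
user with a multi-valued `prefs` field holding "candyId:weight" strings) are
all assumptions made for illustration. Only the `/get` handler with its `id`
parameter comes from Solr itself; with `wt=json` it returns the document under
a `doc` key, or `null` there when the id is unknown.

```python
import json
from urllib.parse import urlencode


def realtime_get_url(base_url, core, doc_id):
    """URL for Solr's real-time get handler (/get) for a single document id."""
    return f"{base_url}/{core}/get?" + urlencode({"id": doc_id, "wt": "json"})


def parse_prefs(response_text):
    """Turn a /get JSON response into {candy_id: weight}.

    Assumes the hypothetical layout described above: a multi-valued
    `prefs` field of "candyId:weight" strings.
    """
    doc = json.loads(response_text).get("doc") or {}
    prefs = {}
    for entry in doc.get("prefs", []):
        candy_id, _, weight = entry.partition(":")
        prefs[candy_id] = int(weight)
    return prefs


url = realtime_get_url("http://localhost:8983/solr", "prefs", "user-42")
# Hand-written sample response, not captured from a real server.
sample = '{"doc": {"id": "user-42", "prefs": ["101:5", "303:9"]}}'
print(url)
print(parse_prefs(sample))
```

The resulting dictionary can then be turned into boost params on the search
request, since the join itself does not carry the weights over.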


-Hoss

Re: Boost by Nested Query / Join Needed?

Posted by jp <gj...@ramco.com>.

Hi,

I have a similar need to boost specific records based on the user profile.
We have a master table with details about warehouses, and another table that
holds each user's preferred warehouses. When the user searches for
warehouses, we need to boost the preferred warehouses to the top of the
list, followed by the other warehouses.

I tried to use a join in the function query, but it gives an "invalid
boolean value" error.

URL used:
http://localhost:8983/solr/stores/select?q=*%3A*&wt=xml&defType=edismax&query=if(exists({!join from WAREHOUSE_MASTER to WAREHOUSE_USER_PREF}USER_NAME:ABCD);20;1)

Response:
status: 400
QTime: 0
params: wt=xml, q=*:*, query=if(exists({!join from WAREHOUSE_MASTER to WAREHOUSE_USER_PREF}USER_NAME:ABCD);20;1), defType=edismax
error: invalid boolean value: if(exists({!join from WAREHOUSE_MASTER to WAREHOUSE_USER_PREF}USER_NAME:ABCD);20;1)
code: 400

Any inputs to meet the requirement appreciated.

Thanks,
JP




Re: Boost by Nested Query / Join Needed?

Posted by naleiden <na...@gmail.com>.

For posterity, I think we're going to remove 'preference' data from Solr
indexing and go in the custom Function Query direction with a key-value
store.


Re: Boost by Nested Query / Join Needed?

Posted by naleiden <na...@gmail.com>.

Thanks for your reply.

I think the number could eventually get very large (~1B) as our
customer-base grows, since each customer could possibly have a preference
for each candy, but currently we're looking at around 50M.

I've looked at the SOLR-2272 patch for joins, which looks as though it might
fit the bill, but I don't want to ignore an underlying scalability issue if my
schema organization doesn't make sense.

Also, it has recently been brought to my attention that it might be
problematic if preferences are updated frequently, which they will be
('candy' records will not be). If it helps things at all, I never have to do
any *direct* searches (just indirect/join-type referencing) on the
preference data.

Does it make more sense to try to index preference data in a separate core
and use another (non-nested) query to obtain them?

I had thought of trying a nested query with the query() function query, but I
need the 'candy' id from the initial query, which amounts to join-like
behavior.

Thanks again for your guidance,
-Nick


Re: Boost by Nested Query / Join Needed?

Posted by Erick Erickson <er...@gmail.com>.

Generally, you just have to bite the bullet and denormalize. Yes, it
really runs counter to your DB mindset <G>....

But before jumping that way, how many denormalized records are we
talking here? 1M? 100M? 1B?

Solr (4.x) has some join capability, but it makes a lousy general-purpose
database.

You might want to look at Function Queries as a way to boost results
based on numeric fields. If you want a strict ordering, you're looking
at sort, but note that sorts only work on a single-valued field.
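
To make the two options concrete, here is a minimal sketch of the request
parameters, assuming a fully denormalized index with one document per
(user, candy) pair that carries the `user` and `weight` fields next to the
candy's `name` and `description`. The helper name `preference_params` is made
up. The edismax `bf` parameter adds the value of a function query (here the
numeric `weight` field) to the score, while `sort` imposes a strict ordering
on the single-valued `weight` field.

```python
def preference_params(q, user_id, strict_order=False):
    """Build edismax params that favor a user's preferred candy.

    Assumes one denormalized document per (user, candy) pair.
    """
    params = {
        "defType": "edismax",
        "q": q,
        "qf": "name description",  # assumed search fields from the schema
        "fq": f"user:{user_id}",   # restrict to this user's documents
    }
    if strict_order:
        # Strict ordering by weight, relevance only breaks ties.
        params["sort"] = "weight desc, score desc"
    else:
        # Additive boost: the field value is added to the relevance score.
        params["bf"] = "field(weight)"
    return params


print(preference_params("chocolate", 42))
print(preference_params("chocolate", 42, strict_order=True))
```

With the boost variant, a strong text match can still outrank a highly
preferred candy; with the sort variant it cannot.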

Best
Erick
