Posted to java-user@lucene.apache.org by "Sharma, Siddharth" <Si...@Staples.com> on 2005/10/08 16:43:11 UTC

Is Lucene right for me?

Hi
I am a complete newbie to Lucene. In fact I'm not even a search guy. I looked
up terms such as stemming just yesterday. So this is going to be so much fun
;)

Here's the problem I am trying to solve:
I work in the B2B space at Staples (an office supplies company in the US).
We sell office products to companies with whom we have a contract. The
contract defines what we can sell (or not sell) to a company's employees.
One contract may be shared by numerous small and medium sized companies and
one company may also have more than one contract based on their ship-to
location.

Currently, we have a very bad system of doing this. We do blocking at the
SKU level. In other words, the sales people go in and mark individual SKUs
as blocked or available for sale. Given that we sell upwards of 90000 SKUs,
this is a nightmare and unwanted SKUs do become available for sale. A new
project called 'Assortment View' is funded to address this problem. The
sales people will define blocking/unblocking rules in the catalog at the
category, subcategory and even individual SKU level to create a customer's
product assortment. With blocking at a much higher level than individual
SKUs, we hope the problem will be alleviated.
I am evaluating Lucene for this purpose. I realize I am attempting to use
primarily a search engine as an inclusion/exclusion index solution, where
data about contracts, customers and blocking rules is in the index, and
Lucene provides the class of products available/forbidden for sale.

Questions:
1. Is this the right use of Lucene?

2. Has anybody done something like this before?

3. When there are two high level entities viz. catalog and contract, what is
the best index structure?
    a. Two separate indices which are 'joined' (my relational database 
       background shows) at runtime to provide the customer's product 
       assortment.
    b. One index with two tables (db shows again), if there is such a 
       concept in Lucene
    c. One flattened index containing catalog and contract information as 
       tokenized fields in the same document (now I'm using Lucene terms ;)

4. I need to ask two kinds of questions from the index:
    a. Given a list of customer's contracts, give me the customer's product 
       assortment. In other words, give me counts of products within a 
       category, subcategory available for sale. 
    b. Given this list of SKUs, tell me which ones are blocked and which 
       are not by looking at all the blocking rules defined at the category 
       or individual SKU level.
Is Lucene's Query API flexible enough to support such different queries?
Will it scale for query b, where the list of SKUs may be large (thousands)?

Fun Fun!!
Thanks in advance
Sid


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Is Lucene right for me?

Posted by Chris Hostetter <ho...@fucit.org>.
: I am evaluating Lucene for this purpose. I realize I am attempting to use
: primarily a search engine as an inclusion/exclusion index solution, where
: data about contracts, customers and blocking rules is in the index, and
: Lucene provides the class of products available/forbidden for sale.
:
: Questions:
: 1. Is this the right use of Lucene?

I don't think it's too much of a stretch to say that Lucene can easily be
used for applying inclusion/exclusion rules to generate a product catalog
-- that's exactly what I do with it -- but it doesn't particularly
facilitate it.

If you haven't seen this post, you should start by reading it...
http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html#a748420

...it strikes me as a very similar situation to what you are describing.

I would suggest you store each product as a document in your index, and
implement your exclusion rules as Filters.  When a user comes to your
site, identify what set of exclusion rules applies to them, and generate a
Filter to correspond to those rules.  Then let them search/browse as
needed.  How you store those exclusion rules is up to you.  In my case I
put them in stored fields in special metadata documents in my index, using
an XML representation that I can use to easily construct Filter objects.
I can imagine it might make sense for you to store "contract" documents in
your index, with "companyId" as an indexed field.  When a user comes to
your site and says they are with companyId#1234, you do one search to get
all the "contracts" associated with that company, and then parse those
documents to generate a list of Filters to compose together on all
searches done by the user.
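
As a rough sketch of that idea, the snippet below turns a customer's
blocking rules into a single "allowed" bit set.  It deliberately uses plain
java.util.BitSet instead of Lucene's Filter classes, so that it runs on its
own; ContractFilterSketch, Product, BlockRule and the field names
"sku"/"category"/"subcategory" are all hypothetical, and a list position
stands in for a Lucene document id.  With Lucene, each rule would
presumably become a Filter over an indexed field instead.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class ContractFilterSketch {

    /** Hypothetical product; its position in the list stands in for a doc id. */
    static final class Product {
        final String sku;
        final String category;
        final String subcategory;

        Product(String sku, String category, String subcategory) {
            this.sku = sku;
            this.category = category;
            this.subcategory = subcategory;
        }

        String get(String field) {
            switch (field) {
                case "sku": return sku;
                case "category": return category;
                case "subcategory": return subcategory;
                default: throw new IllegalArgumentException("unknown field: " + field);
            }
        }
    }

    /** Hypothetical rule parsed from a "contract" document: block field == value. */
    static final class BlockRule {
        final String field;
        final String value;

        BlockRule(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    /** One bit per product, set when the rule matches that product. */
    static BitSet matching(List<Product> products, BlockRule rule) {
        BitSet bits = new BitSet(products.size());
        for (int docId = 0; docId < products.size(); docId++) {
            if (rule.value.equals(products.get(docId).get(rule.field))) {
                bits.set(docId);
            }
        }
        return bits;
    }

    /** Start with everything allowed, then clear whatever each rule blocks. */
    static BitSet allowedFor(List<Product> products, List<BlockRule> rules) {
        BitSet allowed = new BitSet(products.size());
        allowed.set(0, products.size());
        for (BlockRule rule : rules) {
            allowed.andNot(matching(products, rule));
        }
        return allowed;
    }

    public static void main(String[] args) {
        List<Product> catalog = Arrays.asList(
                new Product("SKU-1", "furniture", "chairs"),
                new Product("SKU-2", "furniture", "desks"),
                new Product("SKU-3", "paper", "copy-paper"),
                new Product("SKU-4", "paper", "notebooks"));
        List<BlockRule> rules = Arrays.asList(
                new BlockRule("category", "furniture"),  // category-level block
                new BlockRule("sku", "SKU-4"));          // single-SKU block
        // Bit i set means catalog.get(i) may be sold to this customer.
        System.out.println(allowedFor(catalog, rules));
    }
}
```

The resulting bit set would be computed once per customer (or per contract
set) and reused for every search or browse request in that session.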

One last comment...

:    b. Given this list of SKUs, tell me which ones are blocked and which
:       are not by looking at all the blocking rules defined at the category
:       or individual SKU level.

: Is Lucene's Query API flexible enough to support such different queries?
: Will it scale for query b, where the list of SKUs may be large (thousands)?

What made it possible for me to implement my project at all (let alone
with Lucene) was the ability to deal with sets of documents as BitSets.  I
didn't have to worry about finding the intersections of large sets of
product SKUs - I just had to AND some BitSets.  In short, I don't
think you'll ever find a need to query for a list of thousands of SKUs, I
think you'll find it easier to query for various rules/categories and get
a BitSet which identifies all of the documents for the products those SKUs
represent in your index.
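
To make the BitSet point concrete, here is a minimal sketch using only
java.util.BitSet.  It assumes you already have one bit set of allowed
documents for the customer and one bit set per category; the class name
BitSetAssortmentSketch, the method names and the SKU-to-document-id map are
hypothetical, not anything Lucene provides.  Counting the sellable products
in a category is one AND plus a cardinality() call, and checking a long
list of SKUs is one bit lookup per SKU.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BitSetAssortmentSketch {

    /** Number of products in a category that the customer may buy. */
    static int availableCount(BitSet categoryDocs, BitSet allowedDocs) {
        BitSet both = (BitSet) categoryDocs.clone(); // and() mutates, so work on a copy
        both.and(allowedDocs);
        return both.cardinality();
    }

    /**
     * For each requested SKU, report whether it is blocked.
     * Assumption: a SKU with no known document id is treated as blocked.
     */
    static Map<String, Boolean> blockedStatus(List<String> skus,
                                              Map<String, Integer> docIdBySku,
                                              BitSet allowedDocs) {
        Map<String, Boolean> blocked = new LinkedHashMap<>();
        for (String sku : skus) {
            Integer docId = docIdBySku.get(sku);
            blocked.put(sku, docId == null || !allowedDocs.get(docId));
        }
        return blocked;
    }

    public static void main(String[] args) {
        // Hypothetical data: doc ids 0..4, of which 2 and 3 are allowed.
        BitSet allowed = new BitSet();
        allowed.set(2, 4);
        BitSet paper = new BitSet();
        paper.set(2, 5); // docs 2, 3 and 4 are in the "paper" category

        Map<String, Integer> docIdBySku = new HashMap<>();
        docIdBySku.put("SKU-1", 0);
        docIdBySku.put("SKU-3", 2);

        System.out.println(availableCount(paper, allowed));
        System.out.println(blockedStatus(
                Arrays.asList("SKU-1", "SKU-3", "SKU-9"), docIdBySku, allowed));
    }
}
```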



-Hoss

