
Posted to java-user@lucene.apache.org by Anton Potehin <an...@orbita1.ru> on 2006/02/28 08:59:41 UTC

search problem

I have a problem. 

There is an index that contains about 6,000,000 records (soon to be
15,000,000); its size is 4GB. The index is optimized and consists of only
one segment. It stores products. Each product has a brand, a price, and
about 10 additional fields. I want not only to find something in the
index but also to get the lists of all brands and prices. The list of
brands is needed for displaying all of the products and the quantity of
products of each brand for a given search request.

 

Here is the code fragment that collects the results:

 

            // ...the search code itself is omitted, because it is not
            // important here...

            int count = hits.length();
            TreeMap brands = new TreeMap();
            float[] priceArr = new float[count];

            ArrayList results = new ArrayList();
            for (int i = 0; i < count; i++) {
                try {
                    Document doc = hits.doc(i);
                    String brand = doc.getField("manufacturer").stringValue();
                    int number = 1;
                    try {
                        number = Integer.parseInt((String) brands.get(brand));
                        number++;
                    } catch (Exception ex) {
                        // brand not seen yet: keep number = 1
                    }
                    brands.put(brand, number + "");
                    float price = PriceUtils.priceFromStringToFloat(
                            doc.getField("price").stringValue());
                    float maxPriceCurrent = PriceUtils.priceFromStringToFloat(
                            doc.getField("max_price").stringValue());
                    float minPriceCurrent = PriceUtils.priceFromStringToFloat(
                            doc.getField("min_price").stringValue());

                    priceArr[i] = price;

                    //start----------------------Brand Filter
                    if (!brandFilter.equals("") &&
                            !brandFilter.toLowerCase().equals(brand.toLowerCase())) {
                        continue;
                    }
                    //end------------------------Brand Filter

                    //start----------------------Price Filter
                    if (minPrice != -1 && price < minPrice) {
                        continue;
                    }
                    if (maxPrice != -1 && price > maxPrice) {
                        continue;
                    }
                    //end------------------------Price Filter

                    if (startAt <= numberOfResults
                            && results.size() < maxNumberResults) {
                        String name = StringUtils.stringLimitation(
                                MAX_LENGTH_OF_NAME,
                                doc.getField("name").stringValue());
                        String description = StringUtils.stringLimitation(
                                MAX_LENGTH_OF_DESCRIPTION,
                                doc.getField("description").stringValue());
                        String idProduct =
                                doc.getField("id_product").stringValue();
                        Element item = documentFactory.createElement("item");

                        // ...reading the other needed fields from the
                        // document is omitted...

                        results.add((threadNumber - 1), item);
                    }
                    numberOfResults++;
                } catch (IOException e) {
                    System.err.println(e.toString());
                }
            }

 

As can be seen, to fill brands and priceArr I have to go through all of
the results from hits. The search may return up to 100,000 results, and
checking them all takes too much time and memory. Does anybody have any
ideas on how to speed up this process?

RE: search problem

Posted by Chris Hostetter <ho...@fucit.org>.

: Where can I see the example using HitCollector ?

if you mean examples of writing a HitCollector, then the javadocs for the
HitCollector class are a good place to start, as is the source code for
TopDocCollector and TopFieldDocCollector.

If you mean an example of using a HitCollector when you search...

     Searcher s = ...;
     Query q = ...;
     MyHitCollectorSubClass h = ...;
     s.search(q, h);
     // now pull whatever data out of h that you want.

LIA Chapter 6.2 has a lot of good info on HitCollectors as well.


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: search problem

Posted by an...@orbita1.ru.

> 2) assuming what you want is not all brands and all prices, but just the
> prices and brands of the documents in your results, then i would strongly
> recommend doing your search twice -- once to get the Hits object you need
> and once using a HitCollector.  Within the HitCollector, use FieldCaches
> to lookup the values of the fields you want.  this requires that those
> fields are indexed and not tokenized, but it should be *much* faster than
> your current approach.

Where can I see an example of using HitCollector?




Re: search problem

Posted by Chris Hostetter <ho...@fucit.org>.

: price and about 10 more additional fields. I want to not just find
: something in the index also I want to get the lists of all brands and
: price. The list of brands is needed for displaying all of the products
: and the quantity of products of this brand for certain search request.

1) iterating over all the docs in an instance of Hits can be very
expensive for large result sets.  partly because under the covers Hits
will reexecute your search many times, and partly because you are
constantly pulling back *all* of the stored fields of all matching
documents.

2) assuming what you want is not all brands and all prices, but just the
prices and brands of the documents in your results, then i would strongly
recommend doing your search twice -- once to get the Hits object you need
and once using a HitCollector.  Within the HitCollector, use FieldCaches
to lookup the values of the fields you want.  this requires that those
fields are indexed and not tokenized, but it should be *much* faster than
your current approach.
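
A rough sketch of the collector idea in point 2, written in plain modern Java so that it runs without Lucene on the classpath. Everything here is a stand-in: `BrandPriceCollector` is a hypothetical name, its `collect(int doc, float score)` method mirrors the HitCollector callback, and the two arrays stand in for what `FieldCache.DEFAULT.getStrings(reader, "manufacturer")` and `FieldCache.DEFAULT.getFloats(reader, "price")` would return (one value per document number). It also assumes the price field can be read directly as a float, which may not hold if it needs the custom parsing from the original post.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: plain-Java stand-ins, not the Lucene classes themselves.
class BrandCountSketch {

    /** Stand-in for a HitCollector subclass. */
    static class BrandPriceCollector {
        // Per-document values, indexed by document number
        // (assumed to come from FieldCache in a real setup).
        private final String[] brandByDoc;
        private final float[] priceByDoc;

        final Map<String, Integer> brandCounts = new TreeMap<>();
        float minPrice = Float.POSITIVE_INFINITY;
        float maxPrice = Float.NEGATIVE_INFINITY;
        int totalHits = 0;

        BrandPriceCollector(String[] brandByDoc, float[] priceByDoc) {
            this.brandByDoc = brandByDoc;
            this.priceByDoc = priceByDoc;
        }

        /** Called once per matching document; no stored fields are loaded. */
        public void collect(int doc, float score) {
            totalHits++;
            brandCounts.merge(brandByDoc[doc], 1, Integer::sum);
            float price = priceByDoc[doc];
            if (price < minPrice) minPrice = price;
            if (price > maxPrice) maxPrice = price;
        }
    }

    public static void main(String[] args) {
        String[] brands = {"Acme", "Bolt", "Acme", "Cord", "Bolt"};
        float[] prices = {10.0f, 25.5f, 7.25f, 99.0f, 12.0f};
        BrandPriceCollector c = new BrandPriceCollector(brands, prices);

        // Pretend the query matched documents 0, 2 and 4; with Lucene,
        // s.search(q, c) would drive these calls instead.
        c.collect(0, 1.0f);
        c.collect(2, 0.8f);
        c.collect(4, 0.5f);

        // prints: {Acme=2, Bolt=1} min=7.25 max=12.0
        System.out.println(c.brandCounts + " min=" + c.minPrice + " max=" + c.maxPrice);
    }
}
```

The point of the design is that each hit costs two array lookups rather than a stored-document fetch, so the brand counts and price range come out of one pass over the matching document numbers.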


-Hoss



Re: search problem

Posted by "Michael D. Curtin" <mi...@curtin.com>.

Anton Potehin wrote:

> I have a problem. 
> 
> There is an index, which contains about 6,000,000 records (15,000,000
> will be soon) the size is 4GB. Index is optimized and consists of only
> one segment. This index stores the products. Each product has brand,
> price and about 10 more additional fields. I want to not just find
> something in the index also I want to get the lists of all brands and
> price. The list of brands is needed for displaying all of the products
> and the quantity of products of this brand for certain search request.
> ...

Try using IndexReader.terms() to enumerate over all the values in the index 
for a given field.  Should be a LOT faster.
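
A rough sketch of what enumerating one field's terms amounts to, in plain modern Java so that it runs without Lucene. The `TermEntry` list is a hypothetical stand-in for the index's term dictionary, which is ordered by field and then by term text; `valuesForField` is an illustrative helper, not a Lucene method. With Lucene itself, the enumeration returned by IndexReader.terms() would play the role of the loop, and each term's document frequency would play the role of `docFreq`. Note that these counts cover the whole index, not the documents matching a particular query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a hand-built, sorted term dictionary instead of a real index.
class TermEnumSketch {

    /** One dictionary entry: a (field, text) term and how many documents contain it. */
    static final class TermEntry {
        final String field;
        final String text;
        final int docFreq;

        TermEntry(String field, String text, int docFreq) {
            this.field = field;
            this.text = text;
            this.docFreq = docFreq;
        }
    }

    /** Walks the sorted dictionary and returns text -> docFreq for one field. */
    static Map<String, Integer> valuesForField(List<TermEntry> sortedDictionary, String field) {
        Map<String, Integer> values = new LinkedHashMap<>();
        for (TermEntry e : sortedDictionary) {
            int cmp = e.field.compareTo(field);
            if (cmp < 0) continue; // still before our field
            if (cmp > 0) break;    // walked past our field: nothing more to find
            values.put(e.text, e.docFreq);
        }
        return values;
    }

    public static void main(String[] args) {
        List<TermEntry> dictionary = new ArrayList<>();
        dictionary.add(new TermEntry("manufacturer", "Acme", 3));
        dictionary.add(new TermEntry("manufacturer", "Bolt", 1));
        dictionary.add(new TermEntry("name", "drill", 2));
        dictionary.add(new TermEntry("price", "10.00", 1));

        // prints: {Acme=3, Bolt=1}
        System.out.println(valuesForField(dictionary, "manufacturer"));
    }
}
```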

Good luck!

--MDC
