Posted to user@lucenenet.apache.org by "Koga, Diego" <di...@gmail.com> on 2016/12/26 23:05:46 UTC

Facet Performance

Guys, Happy Holidays!!!

I am tuning an existing Lucene.Net index at the company where I currently work.

The performance of the search itself has improved greatly.

But when it comes to facets, I'm going crazy!

This is the scenario:

- 1.8M products on a SQL Server.
- Basically we concatenate all analyzed fields into one (this made the
search work much better).
- We create a specific price field holding a price range, in order to group
on it later in a facet. (The ranges are configured in the system and we
save only the range id. The idea is to minimize the number of possible
facet values.)
- 16k categories (maybe I can do a tree here and minimize it too, need
opinions)
- 2k manufacturers
- Facets needed: On Sale (boolean), Review (0 - 5), Price (let's say 10
ranges), Free Shipping (boolean), Manufacturer (string) and Category
(string).
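
A minimal sketch of the price-range idea from the list above, assuming the ranges are defined by sorted upper bounds. The class name PriceRanges, the method RangeIdFor and the bound values are hypothetical and not taken from the original system; only the resulting range id would be indexed, as a single untokenized term.

```csharp
using System;

// Hypothetical helper: maps a price to the id of a configured price range.
public static class PriceRanges
{
    // Assumed exclusive upper bounds, sorted ascending. Nine bounds give
    // ten ranges with ids 0..9; the last range is open-ended.
    private static readonly decimal[] UpperBounds =
        { 10m, 25m, 50m, 100m, 250m, 500m, 1000m, 2500m, 5000m };

    public static int RangeIdFor(decimal price)
    {
        if (price < 0m) throw new ArgumentOutOfRangeException(nameof(price));

        int index = Array.BinarySearch(UpperBounds, price);

        // Exact hit on a bound: bounds are exclusive, so the price belongs
        // to the next range. Otherwise BinarySearch returns the bitwise
        // complement of the insertion point, which is the range id.
        return index >= 0 ? index + 1 : ~index;
    }
}
```

With ten ranges the price facet has at most ten distinct values, whatever the number of distinct prices in the catalog.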

Question 1: What would you guys improve when writing the index?

Lucene is embedded in an ASP.NET app, and clients make a REST HTTP
request. A single request returns both the search results and the facets.
To make it faster, I am processing the search and each facet in parallel,
which also means the search has to wait until the facets are done (I found
that processing them in parallel is faster than passing an array).
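
The parallel approach described above could look roughly like the following sketch. ParallelFacets, runSearch and countFacet are hypothetical stand-ins for the real search and per-facet counting calls; nothing here is Lucene.Net API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelFacets
{
    // runSearch and countFacet are assumed delegates wrapping the actual
    // search call and the per-facet counting call.
    public static async Task<Tuple<TSearch, Dictionary<string, TFacet>>> RunAsync<TSearch, TFacet>(
        Func<TSearch> runSearch,
        IReadOnlyList<string> facetNames,
        Func<string, TFacet> countFacet)
    {
        // Start the search and every facet count on the thread pool.
        Task<TSearch> searchTask = Task.Run<TSearch>(runSearch);
        Task<TFacet>[] facetTasks = facetNames
            .Select(name => Task.Run<TFacet>(() => countFacet(name)))
            .ToArray();

        // The response is only complete once all of them have finished.
        await Task.WhenAll(facetTasks).ConfigureAwait(false);
        TSearch searchResult = await searchTask.ConfigureAwait(false);

        var facets = new Dictionary<string, TFacet>();
        for (int i = 0; i < facetNames.Count; i++)
            facets[facetNames[i]] = facetTasks[i].Result;

        return Tuple.Create(searchResult, facets);
    }
}
```

Note that in ASP.NET every Task.Run borrows a thread-pool thread for the duration of the work, so under load this trades throughput for per-request latency.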

This is the facet search:

    using (var simpleFacetedSearch = new SimpleFacetedSearch(_indexReader, facetName))
    {
        var hits = simpleFacetedSearch.Search(query, 1);
        return hits;
    }

Question 2: What do you guys recommend to make this search faster without
freezing the server because of high memory usage?


Thanks....


Att.,
------------------
Koga, Diego



Re: Facet Performance

Posted by Itamar Syn-Hershko <it...@code972.com>.
I'd assume 4.8 is much better in this regard, but the design is mostly the
same (aka FieldCache). I haven't used it extensively myself - Shad may be
able to shed more light on it? (no pun intended!)

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Thu, Dec 29, 2016 at 6:07 PM, Koga, Diego <di...@gmail.com> wrote:

> Hi Itamar,
>
> Is there any difference between 3.0.3 and 4.8 on this matter?

Re: Facet Performance

Posted by "Koga, Diego" <di...@gmail.com>.
Hi Itamar,

Is there any difference between 3.0.3 and 4.8 on this matter?



Att.,
------------------
Koga, Diego


On Thu, Dec 29, 2016 at 11:01 AM, Itamar Syn-Hershko <it...@code972.com> wrote:
> This makes sense, yes. Faceted search is by definition a high-memory
> consumer.

Re: Facet Performance

Posted by Itamar Syn-Hershko <it...@code972.com>.
This makes sense, yes. Faceted search is by definition a high-memory
consumer.

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Thu, Dec 29, 2016 at 5:53 PM, Koga, Diego <di...@gmail.com> wrote:

> Hi all,
>
> I figured something out. Since I keep the reader and searcher as
> singletons, I also need to keep the facet engine as a singleton.

Re: Facet Performance

Posted by "Koga, Diego" <di...@gmail.com>.
Hi all,

I figured something out. Since I keep the reader and searcher as
singletons, I also need to keep the facet engine as a singleton.

However, SimpleFacetedSearch was consuming a huge amount of memory;
SparseFacetedSearcher is much more memory-efficient. I am creating one
for each facet that I have and, while searching, I pick the right facet
searcher to use.

This is what my code looks like:


private void CreateReader()
{
    // Initialize the shared reader, searcher and facet searchers only once.
    if (_indexReader == null)
    {
        var dir = FSDirectory.Open(IndexDirectories.WorkDirectory);

        if (!IndexReader.IndexExists(dir))
        {
            dir.Dispose();
            throw new FileNotFoundException(
                "Index not found. Before initializing the reader, make sure to create the index.");
        }

        _indexReader = IndexReader.Open(dir, true); // true = read-only
        _indexSeacher = new IndexSearcher(_indexReader);
        _facetsSearchers = CreateFacetsSearchers(dir);
    }
}

private Dictionary<string, SparseFacetedSearcher> CreateFacetsSearchers(FSDirectory directory)
{
    var facetReaders = new Dictionary<string, SparseFacetedSearcher>();

    // AvailableFacets is a comma-separated list of facet field names.
    foreach (var facet in AvailableFacets.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
    {
        // Each facet searcher gets its own read-only reader over the same directory.
        var indexReader = IndexReader.Open(directory, true);
        var facetSeacher = new SparseFacetedSearcher(indexReader, facet);
        facetReaders.Add(facet, facetSeacher);
    }

    return facetReaders;
}
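
The null check in CreateReader is not synchronized, so unless it is called under a lock elsewhere, two concurrent ASP.NET requests could both run the initialization. A minimal sketch of one way to build each per-facet object exactly once is shown below; PerFacetCache and its factory delegate are hypothetical, and in the real code the factory would construct the facet searcher for the given field.

```csharp
using System;
using System.Collections.Concurrent;

// Caches one expensive object per facet name, built at most once per name
// even when several requests ask for it at the same time.
public sealed class PerFacetCache<T>
{
    private readonly ConcurrentDictionary<string, Lazy<T>> _items =
        new ConcurrentDictionary<string, Lazy<T>>();
    private readonly Func<string, T> _factory;

    public PerFacetCache(Func<string, T> factory)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));
        _factory = factory;
    }

    public T Get(string facetName)
    {
        // GetOrAdd may create several Lazy wrappers in a race, but only the
        // stored one ever has its Value read, so the factory runs once per
        // facet name (Lazy<T> defaults to ExecutionAndPublication mode).
        return _items
            .GetOrAdd(facetName, name => new Lazy<T>(() => _factory(name)))
            .Value;
    }
}
```

This also makes initialization lazy per facet: a facet that is never requested never pays its memory cost.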



Att.,
------------------
Koga, Diego

