Posted to dev@lucene.apache.org by "Young, Cody" <Co...@move.com> on 2011/11/29 22:25:41 UTC

Grouping on Long type uses function query?

Hi All,

 

I'm new to Solr development. Since I'm new to the code base, I thought
I'd double-check here before opening a JIRA issue. We're trying to use
grouping on a field of type long (on trunk):

    <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
               omitNorms="true" positionIncrementGap="0"/>

The performance wasn't what we were looking for, so I took a quick
look at the grouping code in Solr. I noticed that a string field uses
the Term grouping classes (CommandField in
/trunk/solr/core/src/java/org/apache/solr/search/Grouping.java),
whereas a long field uses the Function grouping classes (CommandFunc,
in the same file). When I switch the long type over to CommandField
instead of CommandFunc, QTime decreases (I only did light testing with
simple queries, but it seemed to drop by 50% or so).

The functionality appears to still work and the grouping tests pass, but
as I'm not very familiar with the Solr code I wasn't sure whether there
was a reason for long fields to use CommandFunc instead of CommandField.

I'm happy to take a stab at filing a JIRA issue and a patch if this is
indeed an issue, but I'll need some guidance on the best way to fix it
(perhaps there is a better way to check than instanceof StrFieldSource
or instanceof LongFieldSource?).

The change I made to test this was very simple. I just added:

    import org.apache.lucene.queries.function.valuesource.LongFieldSource;

and, at line 176 of Grouping.java:

    } else if (valueSource instanceof LongFieldSource) {
        // Route long fields to the Term-based grouping command
        String field = ((LongFieldSource) valueSource).getField();
        CommandField commandField = new CommandField();
        commandField.groupBy = field;
        gc = commandField;
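
For anyone who wants to play with the idea outside of Solr, here is a minimal, self-contained sketch of the dispatch described above. Every class in it is a simplified stand-in named after the Solr/Lucene class it mimics, and the shared FieldSource base class is purely hypothetical; none of this is the actual code in Grouping.java.

```java
// Sketch only: simplified stand-ins, not the real Solr/Lucene classes.
final class GroupingDispatchSketch {

    /** Stand-in for the function-query value source hierarchy. */
    interface ValueSource {}

    /** Hypothetical base class for value sources backed by one indexed field. */
    static abstract class FieldSource implements ValueSource {
        private final String field;
        FieldSource(String field) { this.field = field; }
        String getField() { return field; }
    }

    static final class StrFieldSource extends FieldSource {
        StrFieldSource(String field) { super(field); }
    }

    static final class LongFieldSource extends FieldSource {
        LongFieldSource(String field) { super(field); }
    }

    /** Any other function, for example an arithmetic expression over fields. */
    static final class OtherFunctionSource implements ValueSource {}

    /** Stand-ins for the two grouping commands. */
    static abstract class Command {}
    static final class CommandField extends Command { String groupBy; }
    static final class CommandFunc extends Command { ValueSource groupBy; }

    /** Mirrors the patched if/else chain: string and long fields get term-based grouping. */
    static Command chooseCommand(ValueSource valueSource) {
        if (valueSource instanceof StrFieldSource || valueSource instanceof LongFieldSource) {
            CommandField commandField = new CommandField();
            commandField.groupBy = ((FieldSource) valueSource).getField();
            return commandField;
        }
        // Everything else keeps using function-based grouping.
        CommandFunc commandFunc = new CommandFunc();
        commandFunc.groupBy = valueSource;
        return commandFunc;
    }

    public static void main(String[] args) {
        System.out.println(chooseCommand(new StrFieldSource("city")).getClass().getSimpleName());
        System.out.println(chooseCommand(new LongFieldSource("listing_id")).getClass().getSimpleName());
        System.out.println(chooseCommand(new OtherFunctionSource()).getClass().getSimpleName());
    }
}
```

The field names "city" and "listing_id" are made up for the example.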

 

Thanks,

Cody


Re: Grouping on Long type uses function query?

Posted by Martijn v Groningen <ma...@gmail.com>.
Actually, a DocTermsIndex entry can take quite some memory. I believe
that when you have a lot of unique strings, more memory is used for
the DocTermsIndex than when you have a small number of unique field
values with many documents per value.
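
A back-of-envelope model can make this comparison concrete. The sketch below assumes (as a simplification, not a measurement of Lucene) that a DocTermsIndex entry costs one bit-packed ordinal per document, one bit-packed offset per distinct term, and the raw term bytes, while a long entry costs 8 bytes per document. Object headers, paging and any other overhead are ignored, and the 8-byte average term length is made up for the example.

```java
// Rough memory model; all formulas are simplifying assumptions, not Lucene internals.
final class FieldCacheMemorySketch {

    /** Number of bits needed to represent values in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Estimated bytes for a term-index style entry (ordinals + offsets + term bytes). */
    static long termsIndexBytes(long maxDoc, long uniqueTerms, int avgTermBytes) {
        long termBytes = uniqueTerms * avgTermBytes;      // bytes of the distinct terms
        long ordBits = maxDoc * bitsRequired(uniqueTerms); // one ordinal per document
        long offsetBits = uniqueTerms * bitsRequired(termBytes); // one offset per term
        return (ordBits + offsetBits + 7) / 8 + termBytes;
    }

    /** Estimated bytes for a plain long[] entry: 8 bytes per document. */
    static long longArrayBytes(long maxDoc) {
        return maxDoc * Long.BYTES;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L;
        System.out.println("long[]:                  " + longArrayBytes(maxDoc));
        System.out.println("terms, 1,000 uniques:    " + termsIndexBytes(maxDoc, 1_000L, 8));
        System.out.println("terms, all docs unique:  " + termsIndexBytes(maxDoc, maxDoc, 8));
    }
}
```

With these assumed inputs the term index comes out smaller than the long array when there are few distinct values and larger when every document has its own value, which is consistent with both points raised in this thread.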

I do think that an option that decides whether a double cache entry is
added to the FieldCache (FC) is desirable. The default should be false,
and users who want fast grouping for non-string fields can set this
option to true. I think group.method is a bit vague; it isn't
descriptive about what exactly it is doing. It should be an expert option.
Maybe something like group.moreRamFasterGroupingNonStringFields=[true|false]
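
As a rough illustration of how such an opt-in switch could behave, here is a minimal sketch. The parameter name is the one floated above and is not an existing Solr parameter; the Strategy enum, the choose method and the plain Map of request parameters are all hypothetical.

```java
import java.util.Collections;
import java.util.Map;

// Sketch only: shows the proposed default (false) and opt-in behaviour.
final class GroupingOptionSketch {

    /** Name suggested in this thread; NOT an existing Solr parameter. */
    static final String PARAM = "group.moreRamFasterGroupingNonStringFields";

    enum Strategy { TERM_BASED, FUNCTION_BASED }

    /**
     * String fields always use term-based grouping. Other field types use it
     * only when the expert option is explicitly set to true.
     */
    static Strategy choose(boolean isStringField, Map<String, String> params) {
        if (isStringField) {
            return Strategy.TERM_BASED;
        }
        boolean optIn = Boolean.parseBoolean(params.getOrDefault(PARAM, "false"));
        return optIn ? Strategy.TERM_BASED : Strategy.FUNCTION_BASED;
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        Map<String, String> optIn = Collections.singletonMap(PARAM, "true");
        System.out.println(choose(false, none));   // long field, option absent
        System.out.println(choose(false, optIn));  // long field, option enabled
        System.out.println(choose(true, none));    // string field
    }
}
```

Keeping the default at false preserves the current single-entry FieldCache behaviour for anyone who does not ask for the faster path.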

Having the BlockGroupingCollector in Solr would be great. However, the
collector depends on block indexing, which Solr currently doesn't
support, so that needs to be implemented first. I think using the
BlockGroupingCollector would need just two parameters: one that tells
Solr to actually use the BlockGroupingCollector and one that tells Solr
how to query for the parent documents. Maybe something like:
group.block=[true|false] and group.parent.query=[query]
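
To show why block indexing matters here, the following is a minimal sketch of grouping over pre-blocked documents. It is not the BlockGroupingCollector itself: it assumes each group's documents were indexed contiguously and that the end of every block is flagged. The BitSet is a stand-in for whatever the parent query would select, and treating the flagged document as the last one in its block is an assumption of this sketch.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: groups hits by block without reading any field values.
final class BlockGroupingSketch {

    /**
     * @param matchingDocs   doc ids that matched the query, in ascending order
     * @param lastDocOfBlock one bit per block, set on the block's final doc id
     * @return map from a block's final doc id to the number of hits in that block
     */
    static Map<Integer, Integer> hitsPerBlock(int[] matchingDocs, BitSet lastDocOfBlock) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (int doc : matchingDocs) {
            // The group of a hit is found from its position alone.
            int blockEnd = lastDocOfBlock.nextSetBit(doc);
            if (blockEnd < 0) {
                throw new IllegalArgumentException("doc " + doc + " is not inside any block");
            }
            counts.merge(blockEnd, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three blocks: docs 0-2, 3-5 and 6-9.
        BitSet ends = new BitSet();
        ends.set(2);
        ends.set(5);
        ends.set(9);
        System.out.println(hitsPerBlock(new int[] {0, 1, 4, 9}, ends));
    }
}
```

Because group membership follows from document order, no FieldCache entry for the group field is needed in this scheme, at the price of having to index each group as one block.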

Martijn

On 30 November 2011 00:30, Young, Cody <Co...@move.com> wrote:
> Hi Martijn,
>
> Thanks for the response!
>
> Doesn't it take a lot more memory to hold a string field in the FieldCache than a long field?
>
> In our grouping scenario, we have many unique values with a small number of documents per group. I would think that even the double FieldCache memory hit on a long would be less than using a string.
>
> Would this be a suitable place to have a grouping parameter to control the behavior? group.method? I'm looking at using the BlockGroupingCollector as well, perhaps "block" could be another choice?
> The downside being that there are invalid combinations. (You wouldn’t change group.method to anything else if you were using a function to group)
>
> Thanks,
> Cody

-- 
Kind regards,

Martijn van Groningen

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: Grouping on Long type uses function query?

Posted by "Young, Cody" <Co...@move.com>.
Hi Martijn,

Thanks for the response!

Doesn't it take a lot more memory to hold a string field in the FieldCache than a long field? 

In our grouping scenario, we have many unique values with a small number of documents per group. I would think that even the double FieldCache memory hit on a long would be less than using a string. 
 
Would this be a suitable place to have a grouping parameter to control the behavior? group.method? I'm looking at using the BlockGroupingCollector as well, perhaps "block" could be another choice?
The downside being that there are invalid combinations. (You wouldn’t change group.method to anything else if you were using a function to group)

Thanks,
Cody

-----Original Message-----
From: martijn.is.hier@gmail.com [mailto:martijn.is.hier@gmail.com] On Behalf Of Martijn v Groningen
Sent: Tuesday, November 29, 2011 2:09 PM
To: dev@lucene.apache.org
Subject: Re: Grouping on Long type uses function query?

If I remember correctly this was done to avoid insane FieldCache usage.

If Term based grouping implementation is used then for that field an entry is created in the FieldCache of type DocTermsIndex. It might then happen that for other search features like sorting and faceting a second entry is created in the FieldCache. Sorting for example will put in your case a new entry for this field in the FieldCache of type long. When the Function based grouping implementations are used this is not the case. Only one cache entry of type long is put in the FieldCache and sorting or faceting will reuse these entries.

The downside of the Function based grouping implementations is that they are slower than the Term based implementation.
At the time this feature was integrated into Solr the decision was made to not have double FieldCache usage per field and use the slower Function based implementation for non string fields.

The workaround that doesn't involve coding is to make a copy field of type string, but then you add more fields / data to your index...


Re: Grouping on Long type uses function query?

Posted by Martijn v Groningen <ma...@gmail.com>.
If I remember correctly this was done to avoid insane FieldCache usage.

If the Term-based grouping implementation is used, an entry of type
DocTermsIndex is created in the FieldCache for that field. Other search
features such as sorting and faceting may then create a second
FieldCache entry for the same field. Sorting, for example, will in your
case add a new entry of type long for this field. With the
Function-based grouping implementations this is not the case: only one
cache entry of type long is put in the FieldCache, and sorting or
faceting will reuse it.
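
The double-entry effect can be sketched with a toy cache that is keyed by field name and entry type. This is only a model of the behaviour described above, not the FieldCache API; the class, the enum and the field name "listing_id" are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: one cache entry per distinct (field, entry type) pair.
final class FieldCacheUsageSketch {

    enum EntryType { DOC_TERMS_INDEX, LONG_ARRAY }

    private final Set<String> entries = new HashSet<>();

    /** Requesting an existing (field, type) pair reuses the entry. */
    void request(String field, EntryType type) {
        entries.add(field + '#' + type);
    }

    int entryCount() {
        return entries.size();
    }

    /** Number of entries after grouping and then sorting on the same long field. */
    static int entriesFor(boolean termBasedGrouping) {
        FieldCacheUsageSketch cache = new FieldCacheUsageSketch();
        // Grouping: term-based asks for a term index, function-based for long values.
        cache.request("listing_id",
                termBasedGrouping ? EntryType.DOC_TERMS_INDEX : EntryType.LONG_ARRAY);
        // Sorting on a long field asks for long values.
        cache.request("listing_id", EntryType.LONG_ARRAY);
        return cache.entryCount();
    }

    public static void main(String[] args) {
        System.out.println("term-based grouping + sort:     " + entriesFor(true));
        System.out.println("function-based grouping + sort: " + entriesFor(false));
    }
}
```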

The downside of the Function-based grouping implementations is that
they are slower than the Term-based implementation.
At the time this feature was integrated into Solr, the decision was
made not to have double FieldCache usage per field and to use the slower
Function-based implementation for non-string fields.

The workaround that doesn't involve coding is to make a copy field
of type string, but then you add more fields / data to your index...


-- 
Kind regards,

Martijn van Groningen
