You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@storm.apache.org by "Matthias J. Sax" <mj...@apache.org> on 2015/10/01 11:08:22 UTC

Re: Field Group Hash Computation

The hash code will only be computed on the fields specified as grouping
attributes

Thus, Values(str2,str3) will be used.

The code is basically, Tuple.selectFields(groupingFiels).hashValue()

-Matthias

On 09/30/2015 04:05 PM, Kashyap Mhaisekar wrote:
> Thanks Matthias. My question was this -
> If am emitting out str1,str2,str3 but field grouped on str2,str3 only
> then will the hash be on Values(str1,str2,str3) or on Values(str2,str3)
> alone?. In my case str1,str2 are changing but I see the values go to
> same bolt instance. Can we debug what is the hash generated?
> 
> Thanks you!
> 
> Kashyap
> 
> On Sep 30, 2015 5:14 AM, "Matthias J. Sax" <mjsax@apache.org
> <ma...@apache.org>> wrote:
> 
>     Yes. That's right.
> 
>     "Values" extends ArrayList and does not overwrite .hashCode().
> 
>     -Matthias
> 
>     On 09/30/2015 11:21 AM, Kashyap Mhaisekar wrote:
>     > Is the computation right for hash? ArrayList(str1,str2...).hashcode()
>     > where str1,str2 etc are fields being grouped?
>     >
>     > Thanks
>     > Kashyap
>     >
>     > On Sep 29, 2015 18:04, "Kashyap Mhaisekar" <kashyap.m@gmail.com
>     <ma...@gmail.com>
>     > <mailto:kashyap.m@gmail.com <ma...@gmail.com>>> wrote:
>     >
>     >     Thanks guys. From what I understand, partial key grouping is used
>     >     when you know your grouping will create imbalance. In my case,
>     most
>     >     of my field groups to one bolt thereby causing it to be a
>     >     bottleneck. Since I emit string, I guess the hash is on
>     >     ArrayList(str1,str2...).hashcode(). This hashcode is coming
>     out same
>     >     for different string combinations...
>     >
>     >     Thanks
>     >     Kashyap
>     >
>     >     On Sep 29, 2015 17:51, "Matthias J. Sax" <mjsax@apache.org
>     <ma...@apache.org>
>     >     <mailto:mjsax@apache.org <ma...@apache.org>>> wrote:
>     >
>     >         If you can use "partial key grouping" depends on your use
>     case.
>     >         Think
>     >         careful before you apply it...
>     >
>     >         Maybe you want to read the research paper about it. It clearly
>     >         describes
>     >         when you can use it and when not:
>     >       
>      https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
>     >
>     >
>     >         -Matthias
>     >
>     >         On 09/30/2015 12:18 AM, Ken Danniswara wrote:
>     >         > Hi,
>     >         >
>     >         > From what I read, the default FieldGrouping did not balance
>     >         the load as
>     >         > like ShuffleGrouping do. In this case, there is a
>     discussion about
>     >         > custom Grouping implementation called partial key grouping
>     >         where it have
>     >         > better balancing problem. Maybe it
>     >         > helps. https://github.com/gdfm/partial-key-grouping
>     >         >
>     >         > On Wed, Sep 30, 2015 at 12:11 AM, Kashyap Mhaisekar
>     >         <kashyap.m@gmail.com <ma...@gmail.com>
>     <mailto:kashyap.m@gmail.com <ma...@gmail.com>>
>     >         > <mailto:kashyap.m@gmail.com <ma...@gmail.com>
>     <mailto:kashyap.m@gmail.com <ma...@gmail.com>>>> wrote:
>     >         >
>     >         >     Thanks Derek. I use strings and I still end up with
>     some bolts
>     >         >     having the maximum requests :(
>     >         >
>     >         >     On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit
>     >         <derekd@yahoo-inc.com <ma...@yahoo-inc.com>
>     <mailto:derekd@yahoo-inc.com <ma...@yahoo-inc.com>>
>     >         >     <mailto:derekd@yahoo-inc.com
>     <ma...@yahoo-inc.com>
>     >         <mailto:derekd@yahoo-inc.com
>     <ma...@yahoo-inc.com>>>> wrote:
>     >         >
>     >         >         The code that hashes the field values is here:
>     >         >
>     >         >
>     >         
>     https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
>     >         >
>     >         >
>     >         >         You can write a little java program, something like:
>     >         >
>     >         >         public static void main(String[] args) {
>     >         >           ArrayList<String> myList = new
>     ArrayList<String>();
>     >         >              myList.add("first field value");
>     >         >           myList.add("second field value");
>     >         >
>     >         >           int hash =
>     Arrays.deephashCode(myList.toArray()); //
>     >         as in
>     >         >         tuple.clj
>     >         >
>     >         >
>     >         >           System.out.println("hash is "+hash);
>     >         >           int numTasks = 32;
>     >         >
>     >         >           System.out.println("task index is " + hash %
>     numTasks);
>     >         >
>     >         >         }
>     >         >
>     >         >
>     >         >         There are certain types of values that may not hash
>     >         >         consistently.  If you are using String values,
>     then it
>     >         should be
>     >         >         fine. Other types may or may not, depending on
>     how the
>     >         class
>     >         >         implements hashCode().
>     >         >
>     >         >
>     >         >         --
>     >         >         Derek
>     >         >
>     >         >
>     >         >         ________________________________
>     >         >         From: Kashyap Mhaisekar <kashyap.m@gmail.com
>     <ma...@gmail.com>
>     >         <mailto:kashyap.m@gmail.com <ma...@gmail.com>>
>     >         >         <mailto:kashyap.m@gmail.com
>     <ma...@gmail.com> <mailto:kashyap.m@gmail.com
>     <ma...@gmail.com>>>>
>     >         >         To: user@storm.apache.org
>     <ma...@storm.apache.org>
>     >         <mailto:user@storm.apache.org
>     <ma...@storm.apache.org>> <mailto:user@storm.apache.org
>     <ma...@storm.apache.org>
>     >         <mailto:user@storm.apache.org <ma...@storm.apache.org>>>
>     >         >         Sent: Tuesday, September 29, 2015 4:28 PM
>     >         >         Subject: Field Group Hash Computation
>     >         >
>     >         >
>     >         >
>     >         >         Hi,
>     >         >         I have a field grouping based on 2 fields. I have 32
>     >         consumers
>     >         >         for the tuple and I see most of the times, out of 64
>     >         bolts, the
>     >         >         field group is always on 8 of them. Of the 8, 2 have
>     >         more than
>     >         >         60% of the data. The data for the field grouping can
>     >         have 20
>     >         >         different combinations.
>     >         >
>     >         >         Do you know what is the way to compute the Hash
>     of the
>     >         fields
>     >         >         used for computing? One of the groups mails indicate
>     >         that the
>     >         >         approach is -
>     >         >
>     >         >         It calls "hashCode" on the list of selected
>     values and
>     >         mods it
>     >         >         by the
>     >         >         number of consumer tasks. You can play around with
>     >         that function
>     >         >         to see if
>     >         >         something about your data is causing something
>     >         degenerative to
>     >         >         happen and
>     >         >         cause skew
>     >         >
>     >         >         I saw the clojure code but not sure how to
>     understand
>     >         this.
>     >         >
>     >         >         Thanks
>     >         >         Kashyap
>     >         >
>     >         >
>     >         >
>     >
>

Re: Field Group Hash Computation

Posted by Kashyap Mhaisekar <ka...@gmail.com>.

THis is interesting. We were mitigating it by avoid power of 2. Thanks
Florian.

On Tue, Oct 6, 2015 at 5:12 AM, Florian Hussonnois <fh...@gmail.com>
wrote:

> Hi Kashyap,
>
> You could improve your tuples distribution by implementing a
> CustomStreamGrouping.
> I have tried yours example with Murmur3 algorithm and the result looks
> better.
>
> Arrays.deepHashCode : [-35, -35, -35, -3, -3, -3, -3, 29, 29, 29, 29, 41,
> 51, 61, 61, 61, 61, 61]
> Murmur3 : [-61, -58, -57, -48, -37, -31, -15, -7, -4, 3, 6, 12, 20, 27,
> 45, 49, 56, 57]
>
> You can find my implementation here :
> https://github.com/fhussonnois/storm-cassandra/blob/master/src/main/java/com/github/fhuss/storm/cassandra/Murmur3StreamGrouping.java
>
> Hope this help.
>
> 2015-10-01 11:08 GMT+02:00 Matthias J. Sax <mj...@apache.org>:
>
>> The hash code will only be computed on the fields specified as grouping
>> attributes
>>
>> Thus, Values(str2,str3) will be used.
>>
>> The code is basically, Tuple.selectFields(groupingFiels).hashValue()
>>
>> -Matthias
>>
>> On 09/30/2015 04:05 PM, Kashyap Mhaisekar wrote:
>> > Thanks Matthias. My question was this -
>> > If am emitting out str1,str2,str3 but field grouped on str2,str3 only
>> > then will the hash be on Values(str1,str2,str3) or on Values(str2,str3)
>> > alone?. In my case str1,str2 are changing but I see the values go to
>> > same bolt instance. Can we debug what is the hash generated?
>> >
>> > Thanks you!
>> >
>> > Kashyap
>> >
>> > On Sep 30, 2015 5:14 AM, "Matthias J. Sax" <mjsax@apache.org
>> > <ma...@apache.org>> wrote:
>> >
>> >     Yes. That's right.
>> >
>> >     "Values" extends ArrayList and does not overwrite .hashCode().
>> >
>> >     -Matthias
>> >
>> >     On 09/30/2015 11:21 AM, Kashyap Mhaisekar wrote:
>> >     > Is the computation right for hash?
>> ArrayList(str1,str2...).hashcode()
>> >     > where str1,str2 etc are fields being grouped?
>> >     >
>> >     > Thanks
>> >     > Kashyap
>> >     >
>> >     > On Sep 29, 2015 18:04, "Kashyap Mhaisekar" <kashyap.m@gmail.com
>> >     <ma...@gmail.com>
>> >     > <mailto:kashyap.m@gmail.com <ma...@gmail.com>>> wrote:
>> >     >
>> >     >     Thanks guys. From what I understand, partial key grouping is
>> used
>> >     >     when you know your grouping will create imbalance. In my case,
>> >     most
>> >     >     of my field groups to one bolt thereby causing it to be a
>> >     >     bottleneck. Since I emit string, I guess the hash is on
>> >     >     ArrayList(str1,str2...).hashcode(). This hashcode is coming
>> >     out same
>> >     >     for different string combinations...
>> >     >
>> >     >     Thanks
>> >     >     Kashyap
>> >     >
>> >     >     On Sep 29, 2015 17:51, "Matthias J. Sax" <mjsax@apache.org
>> >     <ma...@apache.org>
>> >     >     <mailto:mjsax@apache.org <ma...@apache.org>>> wrote:
>> >     >
>> >     >         If you can use "partial key grouping" depends on your use
>> >     case.
>> >     >         Think
>> >     >         careful before you apply it...
>> >     >
>> >     >         Maybe you want to read the research paper about it. It
>> clearly
>> >     >         describes
>> >     >         when you can use it and when not:
>> >     >
>> >
>> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
>> >     >
>> >     >
>> >     >         -Matthias
>> >     >
>> >     >         On 09/30/2015 12:18 AM, Ken Danniswara wrote:
>> >     >         > Hi,
>> >     >         >
>> >     >         > From what I read, the default FieldGrouping did not
>> balance
>> >     >         the load as
>> >     >         > like ShuffleGrouping do. In this case, there is a
>> >     discussion about
>> >     >         > custom Grouping implementation called partial key
>> grouping
>> >     >         where it have
>> >     >         > better balancing problem. Maybe it
>> >     >         > helps. https://github.com/gdfm/partial-key-grouping
>> >     >         >
>> >     >         > On Wed, Sep 30, 2015 at 12:11 AM, Kashyap Mhaisekar
>> >     >         <kashyap.m@gmail.com <ma...@gmail.com>
>> >     <mailto:kashyap.m@gmail.com <ma...@gmail.com>>
>> >     >         > <mailto:kashyap.m@gmail.com <mailto:kashyap.m@gmail.com
>> >
>> >     <mailto:kashyap.m@gmail.com <ma...@gmail.com>>>> wrote:
>> >     >         >
>> >     >         >     Thanks Derek. I use strings and I still end up with
>> >     some bolts
>> >     >         >     having the maximum requests :(
>> >     >         >
>> >     >         >     On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit
>> >     >         <derekd@yahoo-inc.com <ma...@yahoo-inc.com>
>> >     <mailto:derekd@yahoo-inc.com <ma...@yahoo-inc.com>>
>> >     >         >     <mailto:derekd@yahoo-inc.com
>> >     <ma...@yahoo-inc.com>
>> >     >         <mailto:derekd@yahoo-inc.com
>> >     <ma...@yahoo-inc.com>>>> wrote:
>> >     >         >
>> >     >         >         The code that hashes the field values is here:
>> >     >         >
>> >     >         >
>> >     >
>> >
>> https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
>> >     >         >
>> >     >         >
>> >     >         >         You can write a little java program, something
>> like:
>> >     >         >
>> >     >         >         public static void main(String[] args) {
>> >     >         >           ArrayList<String> myList = new
>> >     ArrayList<String>();
>> >     >         >              myList.add("first field value");
>> >     >         >           myList.add("second field value");
>> >     >         >
>> >     >         >           int hash =
>> >     Arrays.deephashCode(myList.toArray()); //
>> >     >         as in
>> >     >         >         tuple.clj
>> >     >         >
>> >     >         >
>> >     >         >           System.out.println("hash is "+hash);
>> >     >         >           int numTasks = 32;
>> >     >         >
>> >     >         >           System.out.println("task index is " + hash %
>> >     numTasks);
>> >     >         >
>> >     >         >         }
>> >     >         >
>> >     >         >
>> >     >         >         There are certain types of values that may not
>> hash
>> >     >         >         consistently.  If you are using String values,
>> >     then it
>> >     >         should be
>> >     >         >         fine. Other types may or may not, depending on
>> >     how the
>> >     >         class
>> >     >         >         implements hashCode().
>> >     >         >
>> >     >         >
>> >     >         >         --
>> >     >         >         Derek
>> >     >         >
>> >     >         >
>> >     >         >         ________________________________
>> >     >         >         From: Kashyap Mhaisekar <kashyap.m@gmail.com
>> >     <ma...@gmail.com>
>> >     >         <mailto:kashyap.m@gmail.com <ma...@gmail.com>>
>> >     >         >         <mailto:kashyap.m@gmail.com
>> >     <ma...@gmail.com> <mailto:kashyap.m@gmail.com
>> >     <ma...@gmail.com>>>>
>> >     >         >         To: user@storm.apache.org
>> >     <ma...@storm.apache.org>
>> >     >         <mailto:user@storm.apache.org
>> >     <ma...@storm.apache.org>> <mailto:user@storm.apache.org
>> >     <ma...@storm.apache.org>
>> >     >         <mailto:user@storm.apache.org <mailto:
>> user@storm.apache.org>>>
>> >     >         >         Sent: Tuesday, September 29, 2015 4:28 PM
>> >     >         >         Subject: Field Group Hash Computation
>> >     >         >
>> >     >         >
>> >     >         >
>> >     >         >         Hi,
>> >     >         >         I have a field grouping based on 2 fields. I
>> have 32
>> >     >         consumers
>> >     >         >         for the tuple and I see most of the times, out
>> of 64
>> >     >         bolts, the
>> >     >         >         field group is always on 8 of them. Of the 8, 2
>> have
>> >     >         more than
>> >     >         >         60% of the data. The data for the field
>> grouping can
>> >     >         have 20
>> >     >         >         different combinations.
>> >     >         >
>> >     >         >         Do you know what is the way to compute the Hash
>> >     of the
>> >     >         fields
>> >     >         >         used for computing? One of the groups mails
>> indicate
>> >     >         that the
>> >     >         >         approach is -
>> >     >         >
>> >     >         >         It calls "hashCode" on the list of selected
>> >     values and
>> >     >         mods it
>> >     >         >         by the
>> >     >         >         number of consumer tasks. You can play around
>> with
>> >     >         that function
>> >     >         >         to see if
>> >     >         >         something about your data is causing something
>> >     >         degenerative to
>> >     >         >         happen and
>> >     >         >         cause skew
>> >     >         >
>> >     >         >         I saw the clojure code but not sure how to
>> >     understand
>> >     >         this.
>> >     >         >
>> >     >         >         Thanks
>> >     >         >         Kashyap
>> >     >         >
>> >     >         >
>> >     >         >
>> >     >
>> >
>>
>>
>
>
> --
> Florian HUSSONNOIS
>

Re: Field Group Hash Computation

Posted by Florian Hussonnois <fh...@gmail.com>.

Hi Kashyap,

You could improve your tuples distribution by implementing a
CustomStreamGrouping.
I have tried yours example with Murmur3 algorithm and the result looks
better.

Arrays.deepHashCode : [-35, -35, -35, -3, -3, -3, -3, 29, 29, 29, 29, 41,
51, 61, 61, 61, 61, 61]
Murmur3 : [-61, -58, -57, -48, -37, -31, -15, -7, -4, 3, 6, 12, 20, 27, 45,
49, 56, 57]

You can find my implementation here :
https://github.com/fhussonnois/storm-cassandra/blob/master/src/main/java/com/github/fhuss/storm/cassandra/Murmur3StreamGrouping.java

Hope this help.

2015-10-01 11:08 GMT+02:00 Matthias J. Sax <mj...@apache.org>:

> The hash code will only be computed on the fields specified as grouping
> attributes
>
> Thus, Values(str2,str3) will be used.
>
> The code is basically, Tuple.selectFields(groupingFiels).hashValue()
>
> -Matthias
>
> On 09/30/2015 04:05 PM, Kashyap Mhaisekar wrote:
> > Thanks Matthias. My question was this -
> > If am emitting out str1,str2,str3 but field grouped on str2,str3 only
> > then will the hash be on Values(str1,str2,str3) or on Values(str2,str3)
> > alone?. In my case str1,str2 are changing but I see the values go to
> > same bolt instance. Can we debug what is the hash generated?
> >
> > Thanks you!
> >
> > Kashyap
> >
> > On Sep 30, 2015 5:14 AM, "Matthias J. Sax" <mjsax@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Yes. That's right.
> >
> >     "Values" extends ArrayList and does not overwrite .hashCode().
> >
> >     -Matthias
> >
> >     On 09/30/2015 11:21 AM, Kashyap Mhaisekar wrote:
> >     > Is the computation right for hash?
> ArrayList(str1,str2...).hashcode()
> >     > where str1,str2 etc are fields being grouped?
> >     >
> >     > Thanks
> >     > Kashyap
> >     >
> >     > On Sep 29, 2015 18:04, "Kashyap Mhaisekar" <kashyap.m@gmail.com
> >     <ma...@gmail.com>
> >     > <mailto:kashyap.m@gmail.com <ma...@gmail.com>>> wrote:
> >     >
> >     >     Thanks guys. From what I understand, partial key grouping is
> used
> >     >     when you know your grouping will create imbalance. In my case,
> >     most
> >     >     of my field groups to one bolt thereby causing it to be a
> >     >     bottleneck. Since I emit string, I guess the hash is on
> >     >     ArrayList(str1,str2...).hashcode(). This hashcode is coming
> >     out same
> >     >     for different string combinations...
> >     >
> >     >     Thanks
> >     >     Kashyap
> >     >
> >     >     On Sep 29, 2015 17:51, "Matthias J. Sax" <mjsax@apache.org
> >     <ma...@apache.org>
> >     >     <mailto:mjsax@apache.org <ma...@apache.org>>> wrote:
> >     >
> >     >         If you can use "partial key grouping" depends on your use
> >     case.
> >     >         Think
> >     >         careful before you apply it...
> >     >
> >     >         Maybe you want to read the research paper about it. It
> clearly
> >     >         describes
> >     >         when you can use it and when not:
> >     >
> >
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> >     >
> >     >
> >     >         -Matthias
> >     >
> >     >         On 09/30/2015 12:18 AM, Ken Danniswara wrote:
> >     >         > Hi,
> >     >         >
> >     >         > From what I read, the default FieldGrouping did not
> balance
> >     >         the load as
> >     >         > like ShuffleGrouping do. In this case, there is a
> >     discussion about
> >     >         > custom Grouping implementation called partial key
> grouping
> >     >         where it have
> >     >         > better balancing problem. Maybe it
> >     >         > helps. https://github.com/gdfm/partial-key-grouping
> >     >         >
> >     >         > On Wed, Sep 30, 2015 at 12:11 AM, Kashyap Mhaisekar
> >     >         <kashyap.m@gmail.com <ma...@gmail.com>
> >     <mailto:kashyap.m@gmail.com <ma...@gmail.com>>
> >     >         > <mailto:kashyap.m@gmail.com <ma...@gmail.com>
> >     <mailto:kashyap.m@gmail.com <ma...@gmail.com>>>> wrote:
> >     >         >
> >     >         >     Thanks Derek. I use strings and I still end up with
> >     some bolts
> >     >         >     having the maximum requests :(
> >     >         >
> >     >         >     On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit
> >     >         <derekd@yahoo-inc.com <ma...@yahoo-inc.com>
> >     <mailto:derekd@yahoo-inc.com <ma...@yahoo-inc.com>>
> >     >         >     <mailto:derekd@yahoo-inc.com
> >     <ma...@yahoo-inc.com>
> >     >         <mailto:derekd@yahoo-inc.com
> >     <ma...@yahoo-inc.com>>>> wrote:
> >     >         >
> >     >         >         The code that hashes the field values is here:
> >     >         >
> >     >         >
> >     >
> >
> https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
> >     >         >
> >     >         >
> >     >         >         You can write a little java program, something
> like:
> >     >         >
> >     >         >         public static void main(String[] args) {
> >     >         >           ArrayList<String> myList = new
> >     ArrayList<String>();
> >     >         >              myList.add("first field value");
> >     >         >           myList.add("second field value");
> >     >         >
> >     >         >           int hash =
> >     Arrays.deephashCode(myList.toArray()); //
> >     >         as in
> >     >         >         tuple.clj
> >     >         >
> >     >         >
> >     >         >           System.out.println("hash is "+hash);
> >     >         >           int numTasks = 32;
> >     >         >
> >     >         >           System.out.println("task index is " + hash %
> >     numTasks);
> >     >         >
> >     >         >         }
> >     >         >
> >     >         >
> >     >         >         There are certain types of values that may not
> hash
> >     >         >         consistently.  If you are using String values,
> >     then it
> >     >         should be
> >     >         >         fine. Other types may or may not, depending on
> >     how the
> >     >         class
> >     >         >         implements hashCode().
> >     >         >
> >     >         >
> >     >         >         --
> >     >         >         Derek
> >     >         >
> >     >         >
> >     >         >         ________________________________
> >     >         >         From: Kashyap Mhaisekar <kashyap.m@gmail.com
> >     <ma...@gmail.com>
> >     >         <mailto:kashyap.m@gmail.com <ma...@gmail.com>>
> >     >         >         <mailto:kashyap.m@gmail.com
> >     <ma...@gmail.com> <mailto:kashyap.m@gmail.com
> >     <ma...@gmail.com>>>>
> >     >         >         To: user@storm.apache.org
> >     <ma...@storm.apache.org>
> >     >         <mailto:user@storm.apache.org
> >     <ma...@storm.apache.org>> <mailto:user@storm.apache.org
> >     <ma...@storm.apache.org>
> >     >         <mailto:user@storm.apache.org <mailto:
> user@storm.apache.org>>>
> >     >         >         Sent: Tuesday, September 29, 2015 4:28 PM
> >     >         >         Subject: Field Group Hash Computation
> >     >         >
> >     >         >
> >     >         >
> >     >         >         Hi,
> >     >         >         I have a field grouping based on 2 fields. I
> have 32
> >     >         consumers
> >     >         >         for the tuple and I see most of the times, out
> of 64
> >     >         bolts, the
> >     >         >         field group is always on 8 of them. Of the 8, 2
> have
> >     >         more than
> >     >         >         60% of the data. The data for the field grouping
> can
> >     >         have 20
> >     >         >         different combinations.
> >     >         >
> >     >         >         Do you know what is the way to compute the Hash
> >     of the
> >     >         fields
> >     >         >         used for computing? One of the groups mails
> indicate
> >     >         that the
> >     >         >         approach is -
> >     >         >
> >     >         >         It calls "hashCode" on the list of selected
> >     values and
> >     >         mods it
> >     >         >         by the
> >     >         >         number of consumer tasks. You can play around
> with
> >     >         that function
> >     >         >         to see if
> >     >         >         something about your data is causing something
> >     >         degenerative to
> >     >         >         happen and
> >     >         >         cause skew
> >     >         >
> >     >         >         I saw the clojure code but not sure how to
> >     understand
> >     >         this.
> >     >         >
> >     >         >         Thanks
> >     >         >         Kashyap
> >     >         >
> >     >         >
> >     >         >
> >     >
> >
>
>


-- 
Florian HUSSONNOIS