You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Vladimir Sitnikov <si...@gmail.com> on 2020/07/17 09:36:06 UTC

DISCUSS: add linq4j.Hasher API instead of java.util.Objects#hash

Hi,

java.lang.Objects.hash is convenient, however, it does result in memory
allocation for varargs.

In practice, it can easily be visible in the benchmarks (at least for Java
1.8), so I suggest we create our own API
and use it instead of java.util.Objects#hash

We can have "up to 10 overloads" (or whatever we need), so the hashcode
won't need to allocate arrays.

If no objections within a week or so, I will commit that fix, and I would
add `java.util.Objects#hash` to the list of forbidden APIs.

I'm ok if somebody else wants to pick this.

---
Sample with a4fa05458840cfd:

@Benchmark
public RexNode baseline_eq() {
  return rex.makeCall(
      SqlStdOperatorTable.EQUALS,
      rex.makeInputRef(intType, 0),
      rex.makeInputRef(intType, 1));
}

@Benchmark
public int equals(Blackhole blackhole) {
  RexNode node = rex.makeCall(
      SqlStdOperatorTable.EQUALS,
      rex.makeInputRef(intType, 0),
      rex.makeInputRef(intType, 1));
  blackhole.consume(node);
  return node.hashCode();
}

...

Benchmark                            Mode  Cnt  Score Error  Units
baseline_equals                      avgt    5   222 ± 121   ns/op
equals                               avgt    5   248 ±  68   ns/op
greater_than                         avgt    5   275 ±  54   ns/op
less_than                            avgt    5   224 ±  27   ns/op

baseline_equals:·gc.alloc.rate.norm  avgt    5   496 ±   0    B/op
equals:·gc.alloc.rate.norm           avgt    5   568 ±   0    B/op
greater_than:·gc.alloc.rate.norm     avgt    5   624 ±   0    B/op
less_than:·gc.alloc.rate.norm        avgt    5   520 ±   0    B/op


Vladimir

Re: DISCUSS: add linq4j.Hasher API instead of java.util.Objects#hash

Posted by Julian Hyde <jh...@apache.org>.
> Set#hashCode is specified as "the sum of the hash codes of the element".

Yes, I know. There's only one call to Set.hashCode() in the code base
(and that is in a test). So we won't need such a method.

If people did ever need to hash sets we'd encourage them to hash a
deterministically ordered (e.g. sorted) list view of the set.

On Fri, Jul 17, 2020 at 12:31 PM Vladimir Sitnikov
<si...@gmail.com> wrote:
>
> >Hashes
>
> Ok
>
> >I would imagine that there would be various functions
> >static int hash(Object v0, int v1)
>
> I was not planning to add `int` overloads. At least, I don't want to add
> 2^N overloads like (Object, int, int, Object).
> I'll check if (Object, Object, Object) is enough.
> I hope that escape analysis eliminates boxing.
>
> >And maybe there is also a set of functions that return results consistent
> with new HashSet
>
> Set#hashCode is specified as "the sum of the hash codes of the element".
>
> Vladimir

Re: DISCUSS: add linq4j.Hasher API instead of java.util.Objects#hash

Posted by Vladimir Sitnikov <si...@gmail.com>.
>Hashes

Ok

>I would imagine that there would be various functions
>static int hash(Object v0, int v1)

I was not planning to add `int` overloads. At least, I don't want to add
2^N overloads like (Object, int, int, Object).
I'll check if (Object, Object, Object) is enough.
I hope that escape analysis eliminates boxing.

>And maybe there is also a set of functions that return results consistent
with new HashSet

Set#hashCode is specified as "the sum of the hash codes of the element".

Vladimir

Re: DISCUSS: add linq4j.Hasher API instead of java.util.Objects#hash

Posted by Julian Hyde <jh...@apache.org>.
I suggest Hashes.

(Hasher in English is a noun - 'one who hashes’ - and therefore suggests that it would be instantiated.)

I would imagine that there would be various functions

  static int hash(Object v0, Object v1)
  static int hash(Object v0, int v1)
  static int hash(int v0, Object v1)
  static int hash(Object v0, long v1)

but they would all produce a result consistent with all of

  Arrays.hashCode(v0, v1)
  Arrays.asList(v0, v1).hashCode()
  Objects.hash(v0, v1)

People would be allowed to add more overloaded functions as long as they adhere to that contract.

And maybe there is also a set of functions that return results consistent with new HashSet(Arrays.asList(v0, v1)).hashCode() (even though, yeah, it isn’t a great hash function).

Julian


> On Jul 17, 2020, at 11:34 AM, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
>> The name Hasher implies that it would be an object. Could it not be just a
> set of static methods?
> 
> Do you have naming suggestions?
> I think Objects would be worse as it would conflict with j.u.Objects.
> 
>> Is there a library that does this for us?
> 
> I am afraid it would be easier/faster to just write it instead of search
> for an existing implementation.
> On top of that, it would be a natural fit for unorderedHash which is way
> less likely to be available in existing libraries.
> 
> Vladimir


Re: DISCUSS: add linq4j.Hasher API instead of java.util.Objects#hash

Posted by Vladimir Sitnikov <si...@gmail.com>.
>The name Hasher implies that it would be an object. Could it not be just a
set of static methods?

Do you have naming suggestions?
I think Objects would be worse as it would conflict with j.u.Objects.

>Is there a library that does this for us?

I am afraid it would be easier/faster to just write it instead of search
for an existing implementation.
On top of that, it would be a natural fit for unorderedHash which is way
less likely to be available in existing libraries.

Vladimir

Re: DISCUSS: add linq4j.Hasher API instead of java.util.Objects#hash

Posted by Julian Hyde <jh...@apache.org>.
The name Hasher implies that it would be an object. Could it not be just a set of static methods?

Is there a library that does this for us?

> On Jul 17, 2020, at 7:26 AM, Haisheng Yuan <hy...@apache.org> wrote:
> 
> +1
> I am not sure the difference will be observable in real Calcite application, because object allocation in Java is pretty efficient. But it is always good to avoid unnecessary object allocation and reduce GC pressure if we can achieve it easily.
> 
> On 2020/07/17 09:36:06, Vladimir Sitnikov <si...@gmail.com> wrote: 
>> Hi,
>> 
>> java.lang.Objects.hash is convenient, however, it does result in memory
>> allocation for varargs.
>> 
>> In practice, it can easily be visible in the benchmarks (at least for Java
>> 1.8), so I suggest we create our own API
>> and use it instead of java.util.Objects#hash
>> 
>> We can have "up to 10 overloads" (or whatever we need), so the hashcode
>> won't need to allocate arrays.
>> 
>> If no objections within a week or so, I will commit that fix, and I would
>> add `java.util.Objects#hash` to the list of forbidden APIs.
>> 
>> I'm ok if somebody else wants to pick this.
>> 
>> ---
>> Sample with a4fa05458840cfd:
>> 
>> @Benchmark
>> public RexNode baseline_eq() {
>>  return rex.makeCall(
>>      SqlStdOperatorTable.EQUALS,
>>      rex.makeInputRef(intType, 0),
>>      rex.makeInputRef(intType, 1));
>> }
>> 
>> @Benchmark
>> public int equals(Blackhole blackhole) {
>>  RexNode node = rex.makeCall(
>>      SqlStdOperatorTable.EQUALS,
>>      rex.makeInputRef(intType, 0),
>>      rex.makeInputRef(intType, 1));
>>  blackhole.consume(node);
>>  return node.hashCode();
>> }
>> 
>> ...
>> 
>> Benchmark                            Mode  Cnt  Score Error  Units
>> baseline_equals                      avgt    5   222 ± 121   ns/op
>> equals                               avgt    5   248 ±  68   ns/op
>> greater_than                         avgt    5   275 ±  54   ns/op
>> less_than                            avgt    5   224 ±  27   ns/op
>> 
>> baseline_equals:·gc.alloc.rate.norm  avgt    5   496 ±   0    B/op
>> equals:·gc.alloc.rate.norm           avgt    5   568 ±   0    B/op
>> greater_than:·gc.alloc.rate.norm     avgt    5   624 ±   0    B/op
>> less_than:·gc.alloc.rate.norm        avgt    5   520 ±   0    B/op
>> 
>> 
>> Vladimir
>> 


Re: DISCUSS: add linq4j.Hasher API instead of java.util.Objects#hash

Posted by Haisheng Yuan <hy...@apache.org>.
+1
I am not sure the difference will be observable in real Calcite application, because object allocation in Java is pretty efficient. But it is always good to avoid unnecessary object allocation and reduce GC pressure if we can achieve it easily.

On 2020/07/17 09:36:06, Vladimir Sitnikov <si...@gmail.com> wrote: 
> Hi,
> 
> java.lang.Objects.hash is convenient, however, it does result in memory
> allocation for varargs.
> 
> In practice, it can easily be visible in the benchmarks (at least for Java
> 1.8), so I suggest we create our own API
> and use it instead of java.util.Objects#hash
> 
> We can have "up to 10 overloads" (or whatever we need), so the hashcode
> won't need to allocate arrays.
> 
> If no objections within a week or so, I will commit that fix, and I would
> add `java.util.Objects#hash` to the list of forbidden APIs.
> 
> I'm ok if somebody else wants to pick this.
> 
> ---
> Sample with a4fa05458840cfd:
> 
> @Benchmark
> public RexNode baseline_eq() {
>   return rex.makeCall(
>       SqlStdOperatorTable.EQUALS,
>       rex.makeInputRef(intType, 0),
>       rex.makeInputRef(intType, 1));
> }
> 
> @Benchmark
> public int equals(Blackhole blackhole) {
>   RexNode node = rex.makeCall(
>       SqlStdOperatorTable.EQUALS,
>       rex.makeInputRef(intType, 0),
>       rex.makeInputRef(intType, 1));
>   blackhole.consume(node);
>   return node.hashCode();
> }
> 
> ...
> 
> Benchmark                            Mode  Cnt  Score Error  Units
> baseline_equals                      avgt    5   222 ± 121   ns/op
> equals                               avgt    5   248 ±  68   ns/op
> greater_than                         avgt    5   275 ±  54   ns/op
> less_than                            avgt    5   224 ±  27   ns/op
> 
> baseline_equals:·gc.alloc.rate.norm  avgt    5   496 ±   0    B/op
> equals:·gc.alloc.rate.norm           avgt    5   568 ±   0    B/op
> greater_than:·gc.alloc.rate.norm     avgt    5   624 ±   0    B/op
> less_than:·gc.alloc.rate.norm        avgt    5   520 ±   0    B/op
> 
> 
> Vladimir
>