Posted to commits@beam.apache.org by "Gleb Kanterov (JIRA)" <ji...@apache.org> on 2018/10/04 09:18:00 UTC
[jira] [Created] (BEAM-5646) Equality is broken for Rows with BYTES field
Gleb Kanterov created BEAM-5646:
-----------------------------------
Summary: Equality is broken for Rows with BYTES field
Key: BEAM-5646
URL: https://issues.apache.org/jira/browse/BEAM-5646
Project: Beam
Issue Type: Bug
Components: dsl-sql
Affects Versions: 2.7.0
Reporter: Gleb Kanterov
Assignee: Xu Mingmin
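The problem is with `org.apache.beam.sdk.values.Row#equals` and `hashCode`. Java arrays use reference equality (and an identity-based `hashCode`) instead of comparing contents, and Row stores fields of type BYTES as `byte[]`. As a result, two Rows that hold identical bytes in different array instances compare as unequal.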
These failing tests illustrate the problem:
{code:java}
import java.nio.ByteBuffer;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.values.Row;
import org.junit.Assert;
import org.junit.Test;

@Test
public void testByteArrayEquality() {
  // Two distinct arrays with identical contents (all zeros).
  byte[] a0 = new byte[16];
  byte[] b0 = new byte[16];
  Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
  Row a = Row.withSchema(schema).addValue(a0).build();
  Row b = Row.withSchema(schema).addValue(b0).build();
  Assert.assertEquals(a, b);
}

@Test
public void testByteBufferEquality() {
  // Same check, with the values passed in as ByteBuffer wrappers.
  byte[] a0 = new byte[16];
  byte[] b0 = new byte[16];
  Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
  Row a = Row.withSchema(schema).addValue(ByteBuffer.wrap(a0)).build();
  Row b = Row.withSchema(schema).addValue(ByteBuffer.wrap(b0)).build();
  Assert.assertEquals(a, b);
}
{code}
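The underlying Java behavior can be reproduced without Beam. The following is a minimal sketch using only the JDK; the class and method names (`ArrayEqualityDemo`, `referenceEquals`, `contentEquals`) are made up for illustration and are not part of Beam.

```java
import java.util.Arrays;

// Hypothetical demo class, JDK only; not part of Beam.
final class ArrayEqualityDemo {

    // Arrays inherit Object#equals, so this is a reference comparison.
    static boolean referenceEquals(byte[] a, byte[] b) {
        return a.equals(b);
    }

    // Arrays.equals compares length and element values.
    static boolean contentEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        byte[] b = new byte[16];

        System.out.println(referenceEquals(a, b)); // false
        System.out.println(contentEquals(a, b));   // true
        // Arrays.hashCode is content-based, so it agrees with Arrays.equals.
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b)); // true
    }
}
```

Any `equals` implementation that delegates to the elements' own `equals` inherits the reference comparison for `byte[]` values.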
Option 1. Store BYTES values as `ByteBuffer` instead of `byte[]`, or as something simpler that doesn't carry offsets. `Row#getValue` would then return this type. For consistency, it would be preferable to change `Row#getBytes` to return it as well, even though that is an incompatible change, because the other typed getters already behave the same way as `Row#getValue`.
Option 2. Do what Spark does and add an `if (x instanceof byte[])` check to `equals`. The problem in Spark is that its `hashCode` implementation isn't consistent with `equals`; see SPARK-25122.
Option 3. Treat the current behavior as intended, and fix the `RowCoder#consistentWithEquals` implementation.
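As a rough illustration of Option 2, the sketch below special-cases `byte[]` in both `equals` and `hashCode`, so the two stay consistent. `ValueList` is a hypothetical stand-in for a row's list of field values, not Beam's `Row` class. The helper names `fieldEquals` and `fieldHashCode` are likewise assumptions. Arrays nested inside collections are not handled here.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a row's field values; not Beam's Row.
final class ValueList {
    private final Object[] values;

    ValueList(Object... values) {
        this.values = values.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ValueList)) {
            return false;
        }
        Object[] other = ((ValueList) o).values;
        if (values.length != other.length) {
            return false;
        }
        for (int i = 0; i < values.length; i++) {
            if (!fieldEquals(values[i], other[i])) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int hashCode() {
        int result = 1;
        for (Object v : values) {
            result = 31 * result + fieldHashCode(v);
        }
        return result;
    }

    // Compare byte[] fields by content; everything else by its own equals.
    static boolean fieldEquals(Object a, Object b) {
        if (a instanceof byte[] && b instanceof byte[]) {
            return Arrays.equals((byte[]) a, (byte[]) b);
        }
        return Objects.equals(a, b);
    }

    // Must mirror fieldEquals: content-based hash for byte[] fields.
    static int fieldHashCode(Object v) {
        if (v instanceof byte[]) {
            return Arrays.hashCode((byte[]) v);
        }
        return Objects.hashCode(v);
    }
}
```

With this in place, `new ValueList("id", new byte[16])` equals another instance built the same way, and both produce the same hash code. Dropping the `byte[]` branch from `fieldHashCode` alone would leave `equals` and `hashCode` inconsistent, which is the kind of mismatch the issue attributes to Spark.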