You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Peter Rozsa (Code Review)" <ge...@cloudera.org> on 2023/02/21 08:13:13 UTC

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Peter Rozsa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19507


Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed from ImpalaStringWritable
as these operations are now handled in the byte array.

Tests:
 - Test UDFs added as CachedWritablesUdf and Generic CachedWritableUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
---
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
A fe/src/main/java/org/apache/impala/hive/executor/Resettable.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
13 files changed, 287 insertions(+), 93 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/19507/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 18: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 18
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 19:54:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed as these operations are now
handled in the byte array. ImpalaStringWritable class is also removed,
writables that used it before now store the data directly.

Tests:
 - Test UDFs added as BufferAlteringUdf and GenericBufferAlteringUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
D fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/JavaUdfDataType.java
A fe/src/main/java/org/apache/impala/hive/executor/Reloadable.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/BufferAlteringUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericBufferAlteringUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
16 files changed, 546 insertions(+), 425 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/19507/17
-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 17
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Hello Daniel Becker, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19507

to look at the new patch set (#8).

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed from ImpalaStringWritable
as these operations are now handled in the byte array.

Tests:
 - Test UDFs added as CachedWritablesUdf and Generic CachedWritableUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
A fe/src/main/java/org/apache/impala/hive/executor/Resettable.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
14 files changed, 371 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/19507/8
-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed from ImpalaStringWritable
as these operations are now handled in the byte array.

Tests:
 - Test UDFs added as CachedWritablesUdf and Generic CachedWritableUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
---
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
A fe/src/main/java/org/apache/impala/hive/executor/Resettable.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
13 files changed, 320 insertions(+), 94 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/19507/5
-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 5
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12414/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 5
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 10:19:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19507/12/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/12/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@89
PS12, Line 89:     if (newCap <= array_.length) {
The original Hadoop version has different semantics for setCapacity:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/BytesWritable.java#L157

I think that we can keep the current logic, as it is similar to the old Impala logic, but a TODO could be added.


http://gerrit.cloudera.org:8080/#/c/19507/12/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@104
PS12, Line 104:   // Updates the underlying array's size
              :   public void setSize(int s) {
              :     setCapacity(s);
              :   }
Shouldn't this truncate the array if the new size is smaller?
The old logic did this by setting the underlying string's length.

I think that we need a separate size member similarly to the Hadoop implementation to be able to handle capacity and size separately - so the capacity would be the size of the array, but it is possible that the size is less than that.

It would be also nice to have a test for this.



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Feb 2023 12:56:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java@29
PS7, Line 29: or, now every data manipulation
            :  * operation on array-backed writables loads the native heap data j
> It is a bit unclear to me, could you make it clearer that we load it once a
Rephrased


http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@25
PS7, Line 25: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
> line too long (96 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@75
PS7, Line 75: incre
> 'bytes' could be declared and assigned here, no need to forward declare it.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 14:36:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/8/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/8/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@25
PS8, Line 25: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 14:36:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/5/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/5/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@8
PS5, Line 8: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 5
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 10:00:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@25
PS4, Line 25: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 4
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Feb 2023 09:16:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 7:

> Uploaded patch set 7.

ST_SetSRID workaround removed


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 12:15:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 8: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Feb 2023 10:12:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12418/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 14:53:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 17: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 17
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 14:45:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Hello Daniel Becker, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19507

to look at the new patch set (#7).

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed from ImpalaStringWritable
as these operations are now handled in the byte array.

Tests:
 - Test UDFs added as CachedWritablesUdf and Generic CachedWritableUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
A fe/src/main/java/org/apache/impala/hive/executor/Resettable.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
14 files changed, 372 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/19507/7
-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19507/14/fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/14/fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java@35
PS14, Line 35:     super.set(bytes, 0, bytes.length);
> Changed ordering, setCapacity is called first.
Do we actually need the setCapacity() call? Calling set() will handle increasing capacity if needed, so calling setCapacity() is only useful if the new size is smaller than the old one - we can reclaim memory then but at the expense of an allocation. Is it worth it? Maybe we should only reset the capacity if the size is significantly smaller?


http://gerrit.cloudera.org:8080/#/c/19507/15/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/15/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java@77
PS15, Line 77:     return new BytesWritable(target.getBytes(), target.getLength());
Can't we return 'target' here?



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 15
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Mar 2023 10:18:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19507/12/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/12/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@85
PS12, Line 85:   public void setCapacity(int newCap) {
We have pairs that do the same:
  setCapacity - setSize
  getCapacity - getLength
Can't we delete one of them?


http://gerrit.cloudera.org:8080/#/c/19507/12/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@89
PS12, Line 89:     if (newCap <= array_.length) {
Instead of returning here, we could invert the condition and have L92 in the body of the IF.



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Feb 2023 12:59:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12518/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 15
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 16:40:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/17/java/test-hive-udfs/src/main/java/org/apache/impala/GenericBufferAlteringUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericBufferAlteringUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/17/java/test-hive-udfs/src/main/java/org/apache/impala/GenericBufferAlteringUdf.java@25
PS17, Line 25: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 17
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 12:10:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/19507/4/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/4/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@79
PS4, Line 79: length
'length' is superfluous, could use bufferCapacity_ directly.


http://gerrit.cloudera.org:8080/#/c/19507/4/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@104
PS4, Line 104: equals
If initialize() has already been called but the array is actually empty, this condition will be true, so we will call initialize() again.
We could use EMPTY_ARRAY == array_ here to check for identity, not equality.


http://gerrit.cloudera.org:8080/#/c/19507/4/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@117
PS4, Line 117: equals
See L104.


http://gerrit.cloudera.org:8080/#/c/19507/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
File fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java:

http://gerrit.cloudera.org:8080/#/c/19507/4/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@173
PS4, Line 173:     // Allocate StringVal sizeof(StringVal) = 8 + 4
Optional: we could mention that the 8 bytes are the pointer and 4 bytes are for the length.


http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java@25
PS4, Line 25: public class CachedWritablesUdf extends UDF {
Could include a short description of what we test here and why, including the Jira number.
Should also mention what we do in the 2-argument overload of evaluate().


http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@29
PS4, Line 29: public class GenericCachedWritablesUdf extends GenericUDF {
Could include a short description of what we test here and why, including the Jira number.


http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@30
PS4, Line 30: objectInspector
Nit: should end with _.


http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@37
PS4, Line 37: This function takes 1 argument.
Could include in the error message how many arguments were provided. Also see L52.


http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@56
PS4, Line 56:     byte[] bytes;
'bytes' could be declared within the switch cases as it is not used outside.


http://gerrit.cloudera.org:8080/#/c/19507/4/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@73
PS4, Line 73: value
Shouldn't it be "unexpected type"?


http://gerrit.cloudera.org:8080/#/c/19507/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
File testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test:

http://gerrit.cloudera.org:8080/#/c/19507/4/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@350
PS4, Line 350: ---- QUERY
Do you think it would make sense to add a test with a CHAR() type just for consistency, even though CHAR is not affected by this change?



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 4
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Feb 2023 16:16:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12471/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Feb 2023 20:52:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 16:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12534/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 16
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Sat, 04 Mar 2023 01:58:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed as these operations are now
handled in the byte array. ImpalaStringWritable class is also removed,
writables that used it before now store the data directly.

Tests:
 - Test UDFs added as CachedWritablesUdf and GenericCachedWritableUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
D fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
A fe/src/main/java/org/apache/impala/hive/executor/Reloadable.java
A fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
16 files changed, 418 insertions(+), 253 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/19507/16
-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 16
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed from ImpalaStringWritable
as these operations are now handled in the byte array.

Tests:
 - Test UDFs added as CachedWritablesUdf and Generic CachedWritableUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
A fe/src/main/java/org/apache/impala/hive/executor/Resettable.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
14 files changed, 407 insertions(+), 146 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/19507/12
-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/12/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/12/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@25
PS12, Line 25: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Feb 2023 20:33:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 12:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@26
PS8, Line 26:  * values that map to StringValue in the BE. The byte array pointed by ptr is
> Can you mention the caching behavior?
Done


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@59
PS8, Line 59:  
> optional:
bufferCapacity_ is removed, now the capacity and length properties are reflected by the length of array_


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@84
PS8, Line 84:   // No-op if the new capacity is smaller.
            :   public void setCapacity(int newCap) {
            :    
> What happens if this is called before initialize()?
Proxied to getLength


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@88
PS8, Line 88:     }
> What is the purpose of having a separate 'capacity' and 'length'? As far as
Done


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@94
PS8, Line 94: the length of the underlying a
> What if we call setCapacity() first then getBytes()? Then we won't have cal
Initialize part added


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@104
PS8, Line 104:   // Updates the u
> But what if we have called setSize() before?
I added the initialization part to setCapacity, in this case, the optimization could work well.


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@116
PS8, Line 116: 
             : 
             : 
> This could be skipped in this case, as the original array won't be used.
Done


http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
File testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test:

http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@358
PS8, Line 358: select increment(NULL);
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/java-udf.test
File testdata/workloads/functional-query/queries/QueryTest/java-udf.test:

http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/java-udf.test@371
PS8, Line 371: select increment("a");
> Can you add a test with empty string? The UDF seems to handle this case and
Done


http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/java-udf.test@378
PS8, Line 378: select increment(NULL);
> nit: extra space
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Feb 2023 12:15:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@26
PS8, Line 26:  * values that map to StringValue in the BE.
Can you mention the caching behavior?


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@59
PS8, Line 59: ;
optional:
bufferCapacity_ could be set here with getLengthFromNativeHeap()


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@84
PS8, Line 84:   public int getCapacity() {
            :     return bufferCapacity_;
            :   }
What happens if this is called before initialize()?


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@94
PS8, Line 94: Arrays.copyOf(array_, newCap);
Can't this be called while array_ is still EMPTY_ARRAY?


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@104
PS8, Line 104:       initalize();
optional: returning getLengthFromNativeHeap() would allow optimizing the case when the length is accessed, but the data is not


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@116
PS8, Line 116:     if (EMPTY_ARRAY == array_) {
             :       initalize();
             :     }
This could be skipped in this case, as the original array won't be used.


http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
File testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test:

http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test@358
PS8, Line 358: select increment( cast("a" as binary));
nit: extra space


http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/java-udf.test
File testdata/workloads/functional-query/queries/QueryTest/java-udf.test:

http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/java-udf.test@371
PS8, Line 371: select increment("a");
Can you add a test with empty string? The UDF seems to handle this case and it would be nice to test if the executor works in this case. (same for the generic case)


http://gerrit.cloudera.org:8080/#/c/19507/8/testdata/workloads/functional-query/queries/QueryTest/java-udf.test@378
PS8, Line 378: select increment( cast("a" as binary));
nit: extra space



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Feb 2023 16:57:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12415/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 12:34:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed as these operations are now
handled in the byte array. ImpalaStringWritable class is also removed,
writables that used it before now store the data directly.

Tests:
 - Test UDFs added as CachedWritablesUdf and GenericCachedWritableUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
D fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
A fe/src/main/java/org/apache/impala/hive/executor/Reloadable.java
A fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
16 files changed, 418 insertions(+), 253 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/19507/15
-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 15
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 16:

(1 comment)

> Patch Set 14:
> 
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/14/fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/14/fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java@35
PS14, Line 35:     super.set(bytes, 0, bytes.length);
> But if the UDF respects the contract and only uses the buffer in the valid 
Yes, if the contract is respected, it would be OK without setCapacity. It's more like an assumption that users would expect a ByteWritable, where the underlying array matches the input, without any extra bytes, like how it goes with Hive.



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 16
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Mar 2023 09:17:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/3/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/3/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@8
PS3, Line 8: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Feb 2023 08:14:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed from ImpalaStringWritable
as these operations are now handled in the byte array.

Tests:
 - Test UDFs added as CachedWritablesUdf and Generic CachedWritableUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
---
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
A fe/src/main/java/org/apache/impala/hive/executor/Resettable.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
13 files changed, 338 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/19507/4
-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 4
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12404/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Feb 2023 08:33:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@88
PS8, Line 88:   // Updates the new capacity. No-op if the new capacity is smaller.
What is the purpose of having a separate 'capacity' and 'length'? As far as I can see they always have the same value and are set together.


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@94
PS8, Line 94: Arrays.copyOf(array_, newCap);
> Can't this be called while array_ is still EMPTY_ARRAY?
What if we call setCapacity() first then getBytes()? Then we won't have called initialize() and will get an empty array.


http://gerrit.cloudera.org:8080/#/c/19507/8/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@104
PS8, Line 104:       initalize();
> optional: returning getLengthFromNativeHeap() would allow optimizing the ca
But what if we have called setSize() before?



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Feb 2023 09:23:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@25
PS7, Line 25: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 12:15:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9122/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 18
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 14:45:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................

IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

This change fixes the behavior of BytesWritable and TextWritable's
getBytes() method. Now the returned byte array could be handled as
the underlying buffer as it gets loaded before the UDF's evaluation,
and tracks the changes as a regular Java byte array; the resizing
operation still resets the reference. The operations that wrote back
to the native heap were also removed as these operations are now
handled in the byte array. ImpalaStringWritable class is also removed,
writables that used it before now store the data directly.

Tests:
 - Test UDFs added as BufferAlteringUdf and GenericBufferAlteringUdf
 - E2E test ran for UDFs

Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Reviewed-on: http://gerrit.cloudera.org:8080/19507
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/compat-hive-3/java/org/apache/impala/compat/HiveEsriGeospatialBuiltins.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java
M fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
D fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
M fe/src/main/java/org/apache/impala/hive/executor/JavaUdfDataType.java
A fe/src/main/java/org/apache/impala/hive/executor/Reloadable.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
A java/test-hive-udfs/src/main/java/org/apache/impala/BufferAlteringUdf.java
A java/test-hive-udfs/src/main/java/org/apache/impala/GenericBufferAlteringUdf.java
M testdata/workloads/functional-query/queries/QueryTest/generic-java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M testdata/workloads/functional-query/queries/QueryTest/load-generic-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
16 files changed, 546 insertions(+), 425 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 19
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19507/4/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/4/fe/src/main/java/org/apache/impala/hive/executor/ImpalaStringWritable.java@104
PS4, Line 104: 
> By default, Object's equals method will do exactly the same as the == opera
Ok, for some reason I thought that arrays overrode 'equals()' to do what Lists do.


http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java@29
PS7, Line 29: now every data manipulation
            :  * operation on array-backed writables loads the corresponding data
It is a bit unclear to me, could you make it clearer that we load it once and subsequent manipulations use the already loaded (and possibly changed) value?


http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/7/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@75
PS7, Line 75: bytes
'bytes' could be declared and assigned here, no need to forward declare it. See alson L80.



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 13:48:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12405/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 4
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Feb 2023 09:35:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/16/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@25
PS16, Line 25: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 16
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Mar 2023 15:59:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/15/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/15/java/test-hive-udfs/src/main/java/org/apache/impala/GenericCachedWritablesUdf.java@25
PS15, Line 25: import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 15
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 16:27:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java@27
PS16, Line 27:   private final long ptr_;
I think that the best would be to drop the Impala writables and store an array of longs (pointers) to the native values in the executor + reload it on every invocation.

As this change is mainly a fix I am ok with not doing it here to reduce the scope of the change.


http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/ImpalaTextWritable.java@34
PS16, Line 34:     super.set(new String(bytes));
This is slower than passing the bytes directly as Text stores the string as a utf8 byte array instead of a java String.

https://github.com/apache/hadoop/blob/927401886ae5be5f3c8dd6d82f13363bba594396/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Text.java#L230


http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java
File fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/fe/src/main/java/org/apache/impala/hive/executor/StringValueReader.java@29
PS16, Line 29: StringValueReader
optional: could be merged to https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/hive/executor/JavaUdfDataType.java ?


http://gerrit.cloudera.org:8080/#/c/19507/16/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java
File java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java:

http://gerrit.cloudera.org:8080/#/c/19507/16/java/test-hive-udfs/src/main/java/org/apache/impala/CachedWritablesUdf.java@34
PS16, Line 34: CachedWritablesUdf
optional: maybe another name would be clearer now as we no longer cache the strings but simply copy them, e.g. BufferAlteringUdf



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 16
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Mar 2023 07:45:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 16: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19507/14/fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java
File fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java:

http://gerrit.cloudera.org:8080/#/c/19507/14/fe/src/main/java/org/apache/impala/hive/executor/ImpalaBytesWritable.java@35
PS14, Line 35:     super.set(bytes, 0, bytes.length);
> Yes, if the contract is respected, it would be OK without setCapacity. It's
Ok, we can be cautious here.

It would be interesting to see how much performance we could gain by not reallocating the array for every row, but that is outside the scope of this change.



-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 16
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Mar 2023 09:32:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12582/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 17
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 12:29:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19507 )

Change subject: IMPALA-11854: ImpalaStringWritable's underlying array can't be changed in UDFs
......................................................................


Patch Set 18: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb28bd0dce7b0482c7abe1f61f245691fcbfe212
Gerrit-Change-Number: 19507
Gerrit-PatchSet: 18
Gerrit-Owner: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 14:45:48 +0000
Gerrit-HasComments: No