You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/21 05:51:39 UTC

[GitHub] [doris] zhangstar333 opened a new pull request, #10296: [vectorized][udf] improvement java-udaf with group by clause

zhangstar333 opened a new pull request, #10296:
URL: https://github.com/apache/doris/pull/10296

   Now call add_batch will calculate the whole batch at once
   
   
   # Proposed changes
   
   Issue Number: close #10295 
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] HappenLee commented on a diff in pull request #10296: [vectorized][udf] improvement java-udaf with group by clause

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #10296:
URL: https://github.com/apache/doris/pull/10296#discussion_r918855523


##########
be/src/vec/aggregate_functions/aggregate_function_java_udaf.h:
##########
@@ -292,54 +316,87 @@ class AggregateJavaUdaf final
     }
 
     void create(AggregateDataPtr __restrict place) const override {
-        new (place) Data(argument_types.size());
-        Status status = Status::OK();
-        RETURN_IF_STATUS_ERROR(status, data(place).init_udaf(_fn));
+        if (_first_created) {
+            new (place) Data(argument_types.size());
+            Status status = Status::OK();
+            RETURN_IF_STATUS_ERROR(status, this->data(place).init_udaf(_fn));
+            RETURN_IF_STATUS_ERROR(status,
+                                   this->data(place).set_create_flag(
+                                           place, const_cast<AggregateDataPtr&>(_exec_place),
+                                           const_cast<bool*>(&_first_created)));
+        }
+    }
+
+    void destroy(AggregateDataPtr __restrict place) const noexcept override {
+        if (place == _exec_place) {
+            this->data(_exec_place).destroy(reinterpret_cast<int64_t>(place));

Review Comment:
   should not call here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10296: [vectorized][udf] improvement java-udaf with group by clause

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10296:
URL: https://github.com/apache/doris/pull/10296#issuecomment-1182877378

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10296: [vectorized][udf] improvement java-udaf with group by clause

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10296:
URL: https://github.com/apache/doris/pull/10296#issuecomment-1183940296

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BiteTheDDDDt merged pull request #10296: [vectorized][udf] improvement java-udaf with group by clause

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt merged PR #10296:
URL: https://github.com/apache/doris/pull/10296


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Gabriel39 commented on pull request #10296: [vectorized][udf] improvement java-udaf with group by clause

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on PR #10296:
URL: https://github.com/apache/doris/pull/10296#issuecomment-1163875032

   Btw, we should try to put all for-loop inside java side. Besides batch_add, we also should make `merge` a batch call because  calling a JNI for each row lead to too much JNI overhead now 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] HappenLee commented on a diff in pull request #10296: [vectorized][udf] improvement java-udaf with group by clause

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #10296:
URL: https://github.com/apache/doris/pull/10296#discussion_r918867005


##########
be/src/vec/aggregate_functions/aggregate_function_java_udaf.h:
##########
@@ -96,6 +98,7 @@ struct AggregateJavaUdafData {
             ctor_params.__set_input_buffer_ptrs((int64_t)input_values_buffer_ptr.get());
             ctor_params.__set_input_nulls_ptrs((int64_t)input_nulls_buffer_ptr.get());
             ctor_params.__set_output_buffer_ptr((int64_t)output_value_buffer.get());
+            ctor_params.__set_input_places_ptr((int64_t)input_place_ptrs.get());

Review Comment:
   the strunct is too big, rethink the logic and try to opt the struct. because of after the pr, the struct should be simple



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Gabriel39 commented on a diff in pull request #10296: [vectorized][udf] improvement java-udaf with group by clause

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #10296:
URL: https://github.com/apache/doris/pull/10296#discussion_r904462502


##########
fe/java-udf/src/main/java/org/apache/doris/udf/UdafExecutor.java:
##########
@@ -132,24 +133,33 @@ protected void finalize() throws Throwable {
     /**
      * invoke add function, add row in loop [rowStart, rowEnd).
      */
-    public void add(long rowStart, long rowEnd) throws UdfRuntimeException {
+    public void add(boolean isSinglePlace, long rowStart, long rowEnd) throws UdfRuntimeException {
         try {
-            Object[] inputArgs = new Object[argTypes.length + 1];
-            inputArgs[0] = stateObj;
-            for (long row = rowStart; row < rowEnd; ++row) {
-                Object[] inputObjects = allocateInputObjects(row);
-                for (int i = 0; i < argTypes.length; ++i) {
-                    if (UdfUtils.UNSAFE.getLong(null, UdfUtils.getAddressAtOffset(inputNullsPtrs, i)) == -1
-                            || UdfUtils.UNSAFE.getByte(null,
-                                    UdfUtils.UNSAFE.getLong(null, UdfUtils.getAddressAtOffset(inputNullsPtrs, i)) + row)
-                                    == 0) {
-                        inputArgs[i + 1] = inputObjects[i];
-                    } else {
-                        inputArgs[i + 1] = null;
-                    }
+            long idx = rowStart;
+            do {
+                Long curPlace = UdfUtils.UNSAFE.getLong(null, UdfUtils.UNSAFE.getLong(null, inputPlacesPtr) + 8L * idx);
+                Object[] inputArgs = new Object[argTypes.length + 1];
+                if (!stateObjMap.containsKey(curPlace)) {
+                    Object stateObj = create();
+                    stateObjMap.put(curPlace, stateObj);
                 }
-                allMethods.get(UDAF_ADD_FUNCTION).invoke(udaf, inputArgs);
-            }
+                inputArgs[0] = stateObjMap.get(curPlace);
+                do {
+                    Object[] inputObjects = allocateInputObjects(idx);

Review Comment:
   If this row is NULL, we should avoid this unnecessary allocation.
   I think this logics is same as `evaluate` in `UdfExecutor`. Maybe you could make both `inputObjects` and `inputArgs` from these two class allocated in a common method?



##########
fe/java-udf/src/main/java/org/apache/doris/udf/UdafExecutor.java:
##########
@@ -197,15 +209,20 @@ public byte[] serialize() throws UdfRuntimeException {
      * invoke merge function and it's have done deserialze.
      * here call deserialize first, and call merge.
      */
-    public void merge(byte[] data) throws UdfRuntimeException {
+    public void merge(long place, byte[] data) throws UdfRuntimeException {
         try {
             Object[] args = new Object[2];
             ByteArrayInputStream bins = new ByteArrayInputStream(data);
             args[0] = create();
             args[1] = new DataInputStream(bins);
             allMethods.get(UDAF_DESERIALIZE_FUNCTION).invoke(udaf, args);
             args[1] = args[0];
-            args[0] = stateObj;
+            Long curPlace = place;
+            if (!stateObjMap.containsKey(curPlace)) {
+                Object stateObj = create();

Review Comment:
   change `create()` to another name. maybe `createAggState`



##########
fe/java-udf/src/main/java/org/apache/doris/udf/UdafExecutor.java:
##########
@@ -132,24 +133,33 @@ protected void finalize() throws Throwable {
     /**
      * invoke add function, add row in loop [rowStart, rowEnd).
      */
-    public void add(long rowStart, long rowEnd) throws UdfRuntimeException {
+    public void add(boolean isSinglePlace, long rowStart, long rowEnd) throws UdfRuntimeException {
         try {
-            Object[] inputArgs = new Object[argTypes.length + 1];
-            inputArgs[0] = stateObj;
-            for (long row = rowStart; row < rowEnd; ++row) {
-                Object[] inputObjects = allocateInputObjects(row);
-                for (int i = 0; i < argTypes.length; ++i) {
-                    if (UdfUtils.UNSAFE.getLong(null, UdfUtils.getAddressAtOffset(inputNullsPtrs, i)) == -1
-                            || UdfUtils.UNSAFE.getByte(null,
-                                    UdfUtils.UNSAFE.getLong(null, UdfUtils.getAddressAtOffset(inputNullsPtrs, i)) + row)
-                                    == 0) {
-                        inputArgs[i + 1] = inputObjects[i];
-                    } else {
-                        inputArgs[i + 1] = null;
-                    }
+            long idx = rowStart;
+            do {
+                Long curPlace = UdfUtils.UNSAFE.getLong(null, UdfUtils.UNSAFE.getLong(null, inputPlacesPtr) + 8L * idx);
+                Object[] inputArgs = new Object[argTypes.length + 1];
+                if (!stateObjMap.containsKey(curPlace)) {

Review Comment:
   stateObjMap.putIfAbsent



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zhangstar333 commented on a diff in pull request #10296: [vectorized][udf] improvement java-udaf with group by clause

Posted by GitBox <gi...@apache.org>.
zhangstar333 commented on code in PR #10296:
URL: https://github.com/apache/doris/pull/10296#discussion_r918871188


##########
be/src/vec/aggregate_functions/aggregate_function_java_udaf.h:
##########
@@ -96,6 +98,7 @@ struct AggregateJavaUdafData {
             ctor_params.__set_input_buffer_ptrs((int64_t)input_values_buffer_ptr.get());
             ctor_params.__set_input_nulls_ptrs((int64_t)input_nulls_buffer_ptr.get());
             ctor_params.__set_output_buffer_ptr((int64_t)output_value_buffer.get());
+            ctor_params.__set_input_places_ptr((int64_t)input_place_ptrs.get());

Review Comment:
   OK, will be refactor this in next PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org