Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/04/26 19:57:29 UTC

[GitHub] [flink-table-store] SteNicholas opened a new pull request, #104: [FLINK-27336] Avoid merging when there is only one record

SteNicholas opened a new pull request, #104:
URL: https://github.com/apache/flink-table-store/pull/104

   When there is only one record for a key, `MergeFunction` is still used to merge it. This is unnecessary; the record can be output directly.
   
   **The brief change log**
   
   - Updates the merge logic in `SortBufferMemTable` and `SortMergeReader` to output the record directly when there is only one, and to use `MergeFunction` only when there are multiple records.
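
The change described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `SimpleMergeFunction`, `JoinValues`, and `mergeSorted` are hypothetical names, and plain `Integer` keys and `String` values stand in for the real key/value rows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the project's MergeFunction (not the real interface). */
interface SimpleMergeFunction<T> {
    void reset();

    void add(T value);

    T getValue();
}

/** Sketch: merge runs of equal keys, bypassing the merge function for runs of length one. */
final class SingleRecordShortCircuit {

    /** Illustrative merge function that joins all values of a key with '+'. */
    static final class JoinValues implements SimpleMergeFunction<String> {
        private final StringBuilder joined = new StringBuilder();

        public void reset() {
            joined.setLength(0);
        }

        public void add(String value) {
            if (joined.length() > 0) {
                joined.append('+');
            }
            joined.append(value);
        }

        public String getValue() {
            return joined.toString();
        }
    }

    /** Assumes keys are sorted and keys.get(i) belongs to values.get(i). */
    static List<String> mergeSorted(
            List<Integer> keys, List<String> values, SimpleMergeFunction<String> fn) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < keys.size()) {
            String first = values.get(i);
            boolean initialized = false;
            int j = i + 1;
            while (j < keys.size() && keys.get(j).equals(keys.get(i))) {
                // Touch the merge function only once a second record with the same key shows up.
                if (!initialized) {
                    fn.reset();
                    fn.add(first);
                    initialized = true;
                }
                fn.add(values.get(j));
                j++;
            }
            // A run of length one is emitted directly, without any merge call.
            out.add(initialized ? fn.getValue() : first);
            i = j;
        }
        return out;
    }
}
```

With keys `1, 2, 2, 3` and values `a, b, c, d`, only the two records under key `2` go through the merge function; the records under keys `1` and `3` are passed through unchanged.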


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] LadyForest commented on pull request #104: [FLINK-27336] Avoid merging when there is only one record

Posted by GitBox <gi...@apache.org>.
LadyForest commented on PR #104:
URL: https://github.com/apache/flink-table-store/pull/104#issuecomment-1155391000

   Hi @SteNicholas, are you still following up?




[GitHub] [flink-table-store] SteNicholas commented on pull request #104: [FLINK-27336] Avoid merging when there is only one record

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #104:
URL: https://github.com/apache/flink-table-store/pull/104#issuecomment-1170787052

   @JingsongLi, I have addressed the comments above, and the UTCase/ITCase failure isn't related to this change. PTAL.




[GitHub] [flink-table-store] SteNicholas commented on pull request #104: [FLINK-27336] Avoid merging when there is only one record

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on PR #104:
URL: https://github.com/apache/flink-table-store/pull/104#issuecomment-1111663553

   @JingsongLi @tsreaper, IMO, the existing tests already cover this branch of the execution logic, so I have not added any new tests. WDYT?




[GitHub] [flink-table-store] JingsongLi merged pull request #104: [FLINK-27336] Avoid merging when there is only one record

Posted by GitBox <gi...@apache.org>.
JingsongLi merged PR #104:
URL: https://github.com/apache/flink-table-store/pull/104




[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #104: [FLINK-27336] Avoid merging when there is only one record

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #104:
URL: https://github.com/apache/flink-table-store/pull/104#discussion_r912611645


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/mergetree/compact/MergeFunctionHelper.java:
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.store.file.mergetree.compact;
+
+import org.apache.flink.table.data.GenericRowData;
+import org.apache.flink.table.data.RowData;
+
+import static org.apache.flink.table.store.file.mergetree.compact.ValueCountMergeFunction.count;
+
+/** Helper functions for the interaction with {@link MergeFunction}. */
+public class MergeFunctionHelper {
+
+    private final MergeFunction mergeFunction;
+
+    private RowData rowData;
+    private boolean isInitialized;
+
+    public MergeFunctionHelper(MergeFunction mergeFunction) {
+        this.mergeFunction = mergeFunction;
+    }
+
+    /**
+     * Resets the {@link MergeFunction} helper to its default state: 1. Clears the one record which
+     * the helper maintains. 2. Resets the {@link MergeFunction} to its default state. 3. Clears the
+     * initialized state of the {@link MergeFunction}.
+     */
+    public void reset() {
+        rowData = null;
+        mergeFunction.reset();
+        isInitialized = false;
+    }
+
+    /** Adds the given {@link RowData} to the {@link MergeFunction} helper. */
+    public void add(RowData value) {
+        if (rowData == null) {
+            rowData = value;
+        } else {
+            if (!isInitialized) {
+                mergeFunction.add(rowData);
+                isInitialized = true;
+            }
+            mergeFunction.add(value);
+        }
+    }
+
+    /**
+     * Get current value of the {@link MergeFunction} helper. Return null if the value should be
+     * skipped.
+     */
+    public RowData getValue() {
+        return isInitialized
+                ? mergeFunction.getValue()
+                : mergeFunction instanceof ValueCountMergeFunction

Review Comment:
   Why not just return `rowData`?
   The input to `ValueCountMergeFunction` should never have a count of zero.
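
The suggestion can be illustrated with a simplified sketch in which `getValue()` returns the single buffered record as-is. Everything here is hypothetical: `CountMergeFunction`, `SumCounts`, and `CountMergeHelper` use plain `long` counts instead of `RowData`, and the summing behavior (skip the key when the total reaches zero) is an assumption about how a value-count merge works, not the project's actual `ValueCountMergeFunction`.

```java
/** Hypothetical stand-in for MergeFunction, using long counts instead of RowData. */
interface CountMergeFunction {
    void reset();

    void add(long count);

    /** Returns null when the key should be skipped. */
    Long getValue();
}

/** Assumed value-count behavior: sum the counts and skip the key when the total is zero. */
final class SumCounts implements CountMergeFunction {
    private long total;

    public void reset() {
        total = 0;
    }

    public void add(long count) {
        total += count;
    }

    public Long getValue() {
        return total == 0 ? null : total;
    }
}

/** Simplified helper: the merge function is only fed once a second record arrives. */
final class CountMergeHelper {
    private final CountMergeFunction mergeFunction;
    private Long single;
    private boolean isInitialized;

    CountMergeHelper(CountMergeFunction mergeFunction) {
        this.mergeFunction = mergeFunction;
    }

    void reset() {
        single = null;
        mergeFunction.reset();
        isInitialized = false;
    }

    void add(long value) {
        if (single == null) {
            single = value;
        } else {
            if (!isInitialized) {
                mergeFunction.add(single);
                isInitialized = true;
            }
            mergeFunction.add(value);
        }
    }

    /** No special case for the value-count function: a single record is returned directly. */
    Long getValue() {
        return isInitialized ? mergeFunction.getValue() : single;
    }
}
```

Under that assumption, a zero total can only arise from merging several records (for example `+1` and `-1`), which still goes through `mergeFunction.getValue()` and is skipped there. A lone record is never expected to carry a zero count, so no `instanceof` check is needed on the single-record path.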





[GitHub] [flink-table-store] JingsongLi commented on pull request #104: [FLINK-27336] Avoid merging when there is only one record

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on PR #104:
URL: https://github.com/apache/flink-table-store/pull/104#issuecomment-1111684316

   > @JingsongLi @tsreaper, IMO, the current tests could cover the execution logic branch, hence I have not added some extra tests to verify. WDYT?
   
   At the very least, you could add some unit tests for the new `Helper`.
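
A unit test along these lines might check that the wrapped merge function is never fed a lone record. The sketch below is framework-free and purely illustrative: `RecordingMergeFunction`, `StringMergeHelper`, and `StringMergeHelperTest` are hypothetical names, and the helper is a simplified `String`-based copy of the logic in the diff rather than the real class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical fake merge function that records every call it receives. */
final class RecordingMergeFunction {
    final List<String> calls = new ArrayList<>();
    private String latest;

    void reset() {
        calls.add("reset");
        latest = null;
    }

    void add(String value) {
        calls.add("add:" + value);
        latest = value;
    }

    String getValue() {
        calls.add("getValue");
        return latest;
    }
}

/** Simplified copy of the helper logic, over String instead of RowData. */
final class StringMergeHelper {
    private final RecordingMergeFunction mergeFunction;
    private String single;
    private boolean isInitialized;

    StringMergeHelper(RecordingMergeFunction mergeFunction) {
        this.mergeFunction = mergeFunction;
    }

    void reset() {
        single = null;
        mergeFunction.reset();
        isInitialized = false;
    }

    void add(String value) {
        if (single == null) {
            single = value;
        } else {
            if (!isInitialized) {
                mergeFunction.add(single);
                isInitialized = true;
            }
            mergeFunction.add(value);
        }
    }

    String getValue() {
        return isInitialized ? mergeFunction.getValue() : single;
    }
}

final class StringMergeHelperTest {

    static void singleRecordBypassesMergeFunction() {
        RecordingMergeFunction fn = new RecordingMergeFunction();
        StringMergeHelper helper = new StringMergeHelper(fn);
        helper.reset();
        helper.add("a");
        check("a".equals(helper.getValue()), "single record should be returned as-is");
        // Only the reset() issued by the helper should have reached the merge function.
        check(fn.calls.equals(Arrays.asList("reset")), "merge function should not be fed");
    }

    static void multipleRecordsAreMerged() {
        RecordingMergeFunction fn = new RecordingMergeFunction();
        StringMergeHelper helper = new StringMergeHelper(fn);
        helper.reset();
        helper.add("a");
        helper.add("b");
        check("b".equals(helper.getValue()), "value should come from the merge function");
        check(
                fn.calls.equals(Arrays.asList("reset", "add:a", "add:b", "getValue")),
                "both records should be added in order");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```

In the real code base the same two cases would presumably be written against the project's test framework and the actual `MergeFunction` implementations.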




[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #104: [FLINK-27336] Avoid merging when there is only one record

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #104:
URL: https://github.com/apache/flink-table-store/pull/104#discussion_r859320053


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/mergetree/SortBufferMemTable.java:
##########
@@ -166,19 +169,31 @@ private void advanceIfNeeded() {
                 if (previousRow == null) {
                     return;
                 }
-                mergeFunction.reset();
-                mergeFunction.add(previous.getReusedKv().value());
 
+                boolean hasOne = true;
                 while (readOnce()) {
                     if (keyComparator.compare(
                                     previous.getReusedKv().key(), current.getReusedKv().key())
                             != 0) {
                         break;
                     }
+                    // avoid merging when there is only one record
+                    if (hasOne) {
+                        mergeFunction.reset();
+                        mergeFunction.add(previous.getReusedKv().value());
+                        hasOne = false;
+                    }
                     mergeFunction.add(current.getReusedKv().value());
                     swapSerializers();
                 }
-                result = mergeFunction.getValue();
+                RowData previousValue = previous.getReusedKv().value();
+                result =
+                        hasOne
+                                ? mergeFunction instanceof ValueCountMergeFunction

Review Comment:
   I think we can ignore the zero-count case for `ValueCountMergeFunction`,
   because if there is no merge, the count should never be zero.



##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/mergetree/SortBufferMemTable.java:
##########
@@ -166,19 +169,31 @@ private void advanceIfNeeded() {
                 if (previousRow == null) {
                     return;
                 }
-                mergeFunction.reset();
-                mergeFunction.add(previous.getReusedKv().value());
 
+                boolean hasOne = true;

Review Comment:
   `hasOne = true` -> `mergeFuncInitialized = false`?





[GitHub] [flink-table-store] SteNicholas commented on a diff in pull request #104: [FLINK-27336] Avoid merging when there is only one record

Posted by GitBox <gi...@apache.org>.
SteNicholas commented on code in PR #104:
URL: https://github.com/apache/flink-table-store/pull/104#discussion_r860401394


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/mergetree/SortBufferMemTable.java:
##########
@@ -166,19 +169,31 @@ private void advanceIfNeeded() {
                 if (previousRow == null) {
                     return;
                 }
-                mergeFunction.reset();
-                mergeFunction.add(previous.getReusedKv().value());
 
+                boolean hasOne = true;
                 while (readOnce()) {
                     if (keyComparator.compare(
                                     previous.getReusedKv().key(), current.getReusedKv().key())
                             != 0) {
                         break;
                     }
+                    // avoid merging when there is only one record
+                    if (hasOne) {
+                        mergeFunction.reset();
+                        mergeFunction.add(previous.getReusedKv().value());
+                        hasOne = false;
+                    }
                     mergeFunction.add(current.getReusedKv().value());
                     swapSerializers();
                 }
-                result = mergeFunction.getValue();
+                RowData previousValue = previous.getReusedKv().value();
+                result =
+                        hasOne
+                                ? mergeFunction instanceof ValueCountMergeFunction

Review Comment:
   @JingsongLi, in theory the count should never be zero, but the current tests include a case with a zero count. Hence I added this logic to handle the zero case.


