You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu> on 2021/09/07 20:26:47 UTC

Change in asterixdb[mad-hatter]: [ASTERIXDB-2949][RUN][FUN] SUBSTR function produces malformed string

From Ali Alsuliman <al...@gmail.com>:

Ali Alsuliman has submitted this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/12864 )

Change subject: [ASTERIXDB-2949][RUN][FUN] SUBSTR function produces malformed string
......................................................................

[ASTERIXDB-2949][RUN][FUN] SUBSTR function produces malformed string

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
- Fix UTF8StringBuilder grow logic

UTF8StringBuilder initially takes an estimated length of the
string to be written and reserves space at the beginning
of the buffer to later store the length of the data written.
When the actual data written happens to be greater than the
estimated length requiring more space to store the length,
the string content needs to be shifted.

This patch is to fix the starting offset of the data to be shifted.
Also, the estimated length calculation of the substring method of
the UTF8StringPointable is modified to account for
SUBSTR(input_string, 0, num_chars_to_substring) with start offset = 0.

Change-Id: If36253ff884a9c19eaa130c4e5e926f2dd9eea1d
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/12864
Integration-Tests: Jenkins <je...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>
Reviewed-by: Ali Alsuliman <al...@gmail.com>
Reviewed-by: Ian Maxon <im...@uci.edu>
---
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/substr-ASTERIXDB-2949/substr-ASTERIXDB-2949.0.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/results/string/substr-ASTERIXDB-2949/substr-ASTERIXDB-2949.0.adm
M asterixdb/asterix-app/src/test/resources/runtimets/testsuite_sqlpp.xml
M hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java
M hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/AbstractVarLenObjectBuilder.java
M hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/test/java/org/apache/hyracks/data/std/primitive/UTF8StringPointableTest.java
6 files changed, 70 insertions(+), 3 deletions(-)

Approvals:
  Ian Maxon: Looks good to me, approved
  Ali Alsuliman: Looks good to me, but someone else must approve
  Jenkins: Verified; Verified

Objections:
  Anon. E. Moose #1000171: Violations found



diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/substr-ASTERIXDB-2949/substr-ASTERIXDB-2949.0.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/substr-ASTERIXDB-2949/substr-ASTERIXDB-2949.0.query.sqlpp
new file mode 100644
index 0000000..22105a4
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/substr-ASTERIXDB-2949/substr-ASTERIXDB-2949.0.query.sqlpp
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+/*
+ * Description    : Test the issue described in ASTERIXDB-2949
+ * Success        : Yes
+ */
+
+
+SELECT SUBSTR("•\tABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789\n•\tabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\tABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789", 0, 1000) AS s;
\ No newline at end of file
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/results/string/substr-ASTERIXDB-2949/substr-ASTERIXDB-2949.0.adm b/asterixdb/asterix-app/src/test/resources/runtimets/results/string/substr-ASTERIXDB-2949/substr-ASTERIXDB-2949.0.adm
new file mode 100644
index 0000000..a36b551
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/results/string/substr-ASTERIXDB-2949/substr-ASTERIXDB-2949.0.adm
@@ -0,0 +1 @@
+{ "s": "•\tABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789\n•\tabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\tABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" }
\ No newline at end of file
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/testsuite_sqlpp.xml b/asterixdb/asterix-app/src/test/resources/runtimets/testsuite_sqlpp.xml
index 8d06f71..600dde8 100644
--- a/asterixdb/asterix-app/src/test/resources/runtimets/testsuite_sqlpp.xml
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/testsuite_sqlpp.xml
@@ -9296,6 +9296,11 @@
       </compilation-unit>
     </test-case>
     <test-case FilePath="string">
+      <compilation-unit name="substr-ASTERIXDB-2949">
+        <output-dir compare="Text">substr-ASTERIXDB-2949</output-dir>
+      </compilation-unit>
+    </test-case>
+    <test-case FilePath="string">
       <compilation-unit name="substring-after-1">
         <output-dir compare="Text">substring-after-1</output-dir>
       </compilation-unit>
diff --git a/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java b/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java
index 9ff2f41..eff71de 100644
--- a/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java
+++ b/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/primitive/UTF8StringPointable.java
@@ -368,7 +368,9 @@
             return false;
         }
 
-        builder.reset(out, Math.min(utfLen - byteIdx, (int) (charLength * 1.0 * byteIdx / chIdx)));
+        // for byteIdx = 0, this estimate assumes that every char size = 1 byte
+        int estimateOutBytes = byteIdx == 0 ? charLength : (int) (charLength * 1.0 * byteIdx / chIdx);
+        builder.reset(out, Math.min(utfLen - byteIdx, estimateOutBytes));
         chIdx = 0;
         while (byteIdx < utfLen && chIdx < charLength) {
             builder.appendChar(src.charAt(src.getMetaDataLength() + byteIdx));
diff --git a/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/AbstractVarLenObjectBuilder.java b/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/AbstractVarLenObjectBuilder.java
index 452710e..d188f16 100644
--- a/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/AbstractVarLenObjectBuilder.java
+++ b/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/AbstractVarLenObjectBuilder.java
@@ -85,8 +85,9 @@
                 for (int i = 0; i < diff; i++) {
                     out.writeByte(0);
                 }
-                for (int i = ary.getLength() - 1; i >= actualDataStart + diff; i--) {
-                    ary.getByteArray()[i] = ary.getByteArray()[i - diff];
+                int firstCharIdx = startOffset + estimateMetaLen;
+                for (int dest = ary.getLength() - 1, src = dest - diff; src >= firstCharIdx; dest--, src--) {
+                    ary.getByteArray()[dest] = ary.getByteArray()[src];
                 }
             }
         }
diff --git a/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/test/java/org/apache/hyracks/data/std/primitive/UTF8StringPointableTest.java b/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/test/java/org/apache/hyracks/data/std/primitive/UTF8StringPointableTest.java
index 302e7a0..ed439cb 100644
--- a/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/test/java/org/apache/hyracks/data/std/primitive/UTF8StringPointableTest.java
+++ b/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/test/java/org/apache/hyracks/data/std/primitive/UTF8StringPointableTest.java
@@ -257,4 +257,37 @@
         assertEquals(0, expected.compareTo(result));
     }
 
+    @Test
+    public void testStringBuilder() throws Exception {
+        UTF8StringBuilder builder = new UTF8StringBuilder();
+        GrowableArray array = new GrowableArray();
+        UTF8StringPointable stringPointable = new UTF8StringPointable();
+        String writtenString;
+        int startIdx;
+
+        array.append(STRING_UTF8_MIX.getByteArray(), STRING_UTF8_MIX.getStartOffset(), STRING_UTF8_MIX.getLength());
+        String chunk = "ABC";
+        String originalString = chunk.repeat(699051);
+
+        // test grow path
+        startIdx = array.getLength();
+        builder.reset(array, 2);
+        builder.appendString(originalString);
+        builder.finish();
+        stringPointable.set(array.getByteArray(), startIdx, array.getLength());
+        writtenString = stringPointable.toString();
+        assertEquals(originalString, writtenString);
+
+        // test shrink path
+        array.reset();
+        array.append(STRING_UTF8_MIX.getByteArray(), STRING_UTF8_MIX.getStartOffset(), STRING_UTF8_MIX.getLength());
+        startIdx = array.getLength();
+        builder.reset(array, 699051);
+        builder.appendString(chunk);
+        builder.finish();
+        stringPointable.set(array.getByteArray(), startIdx, array.getLength());
+        writtenString = stringPointable.toString();
+        assertEquals(chunk, writtenString);
+    }
+
 }

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/12864
To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: mad-hatter
Gerrit-Change-Id: If36253ff884a9c19eaa130c4e5e926f2dd9eea1d
Gerrit-Change-Number: 12864
Gerrit-PatchSet: 4
Gerrit-Owner: Ali Alsuliman <al...@gmail.com>
Gerrit-Reviewer: Ali Alsuliman <al...@gmail.com>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-MessageType: merged