Posted to notifications@asterixdb.apache.org by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu> on 2021/04/02 00:55:26 UTC

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

From Ian Maxon <im...@uci.edu>:

Ian Maxon has uploaded this change for review. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )


Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................

[ASTERIXDB-2849][RT] Fix Python UDF string serde

- user model changes: no
- storage format changes: no
- interface changes: no

Details:

- Avoid creating intermediate String objects when serializing to msgpack
- Properly handle UTF-8 to Modified UTF-8 conversion
  on msgpack to ADM deserialization
- Add tests with Unicode combining characters and ZWJ (zero-width joiner)
  sequences
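
To illustrate why the UTF-8 to Modified UTF-8 conversion matters, here is a
minimal sketch that compares the byte lengths of the two encodings. It is not
code from this change: the class and method names are hypothetical, and it uses
the JDK's DataOutputStream.writeUTF as a stand-in for a Modified UTF-8 writer
rather than AsterixDB's UTF8StringUtil (whose length prefix differs from the
JDK's fixed 2-byte prefix).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8Lengths {

    // Number of bytes in the standard UTF-8 encoding of s.
    public static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of payload bytes in the JDK's modified UTF-8 encoding of s,
    // excluding the 2-byte length prefix that writeUTF emits.
    public static int modifiedUtf8Length(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.size() - 2;
    }

    public static void main(String[] args) throws IOException {
        String woman = "\uD83D\uDC69"; // U+1F469, outside the BMP (surrogate pair)
        String zwj = "\u200D";         // ZERO WIDTH JOINER, inside the BMP

        System.out.println(standardUtf8Length(woman)); // 4
        System.out.println(modifiedUtf8Length(woman)); // 6
        System.out.println(standardUtf8Length(zwj));   // 3
        System.out.println(modifiedUtf8Length(zwj));   // 3
    }
}
```

Characters outside the Basic Multilingual Plane take 4 bytes in standard UTF-8
but 6 bytes in Modified UTF-8 (two 3-byte surrogates), so copying the bytes
verbatim between the two forms yields a malformed string for such input.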

Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
---
M asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.5.query.sqlpp
R asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.6.ddl.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.2.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.3.adm
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
12 files changed, 164 insertions(+), 61 deletions(-)



  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/26/10826/1

diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
index a8ba8a1..76a90a7 100644
--- a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
@@ -21,3 +21,6 @@
 create function typeValidation(a, b, c, d, e, f, g)
   as "roundtrip", "Tests.roundtrip" at testlib;
 
+create function stringTest(s) as "roundtrip",
+  "Tests.roundtrip" at testlib;
+
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.query.sqlpp
new file mode 100644
index 0000000..cede236
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.query.sqlpp
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+use externallibtest;
+stringTest("Ḣ̵̠̝̖͔̫̫̬̠͇̲̈́̍̅͆̋̉͛̓̽͊̉͘͜Ę̵̻͎̀̄̋̈́͐̅̈́͜ ̷͍͇̹͊̈̂̀̀̅̔Ć̸̡̨̡̧̭̙̪̩͔̤̭͙͐͋̄͌̈̓͐͘͠͝͝Ǫ̴̨̡̯̟̰̪̲͎̮̪͚͒̀̌͋̽́̒͜Ṃ̵̢̛͓̑̐̑̌̍̋͝Ę̶̛̜̦̖̘̲̠͔͓͕̞̠͛̋̐̊̓̌͗̓̏͘͜S̵̢̧̬̣͈̖̀̃̉͊̑̀͐͂̀̂");
+
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.5.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.5.query.sqlpp
new file mode 100644
index 0000000..0d82a8e
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.5.query.sqlpp
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+use externallibtest;
+stringTest("👩‍👨‍👦‍👧");
+
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.ddl.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.6.ddl.sqlpp
similarity index 100%
rename from asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.ddl.sqlpp
rename to asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.6.ddl.sqlpp
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
index 65a7c81..c960344 100644
--- a/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
@@ -1,9 +1,9 @@
 { "id": 670301227662491648, "len": 20, "sent": 1 }
 { "id": 670301227553566720, "len": 139, "sent": 0 }
 { "id": 670301227041857536, "len": 112, "sent": 0 }
-{ "id": 670301227037519876, "len": 33, "sent": 0 }
+{ "id": 670301227037519876, "len": 34, "sent": 0 }
 { "id": 670301226987159552, "len": 57, "sent": 0 }
-{ "id": 670301226513391616, "len": 28, "sent": 1 }
+{ "id": 670301226513391616, "len": 29, "sent": 1 }
 { "id": 670301226337202180, "len": 77, "sent": 1 }
 { "id": 670301226190278656, "len": 25, "sent": 0 }
 { "id": 670301225959579648, "len": 112, "sent": 1 }
@@ -15,25 +15,25 @@
 { "id": 670301225162661889, "len": 28, "sent": 1 }
 { "id": 670301224885837824, "len": 63, "sent": 0 }
 { "id": 670301224814698496, "len": 59, "sent": 0 }
-{ "id": 670301224709849090, "len": 33, "sent": 1 }
+{ "id": 670301224709849090, "len": 37, "sent": 1 }
 { "id": 670301224684556288, "len": 21, "sent": 0 }
 { "id": 670301224680480768, "len": 39, "sent": 0 }
 { "id": 670301224348946433, "len": 64, "sent": 1 }
 { "id": 670301224261058560, "len": 61, "sent": 1 }
-{ "id": 670301224231690240, "len": 33, "sent": 0 }
-{ "id": 670301224214794240, "len": 33, "sent": 0 }
+{ "id": 670301224231690240, "len": 34, "sent": 0 }
+{ "id": 670301224214794240, "len": 41, "sent": 0 }
 { "id": 670301223753351168, "len": 105, "sent": 1 }
 { "id": 670301223426367488, "len": 23, "sent": 0 }
-{ "id": 670301223216545792, "len": 31, "sent": 0 }
+{ "id": 670301223216545792, "len": 34, "sent": 0 }
 { "id": 670301223182974976, "len": 34, "sent": 1 }
 { "id": 670301223128535041, "len": 21, "sent": 0 }
 { "id": 670301222759301121, "len": 132, "sent": 0 }
 { "id": 670301222734307329, "len": 110, "sent": 1 }
 { "id": 670301222717419520, "len": 81, "sent": 0 }
 { "id": 670301222318936064, "len": 110, "sent": 1 }
-{ "id": 670301222302150657, "len": 131, "sent": 0 }
+{ "id": 670301222302150657, "len": 135, "sent": 0 }
 { "id": 670301222222602240, "len": 43, "sent": 1 }
-{ "id": 670301222113517568, "len": 27, "sent": 0 }
+{ "id": 670301222113517568, "len": 29, "sent": 0 }
 { "id": 670301221836615680, "len": 44, "sent": 1 }
 { "id": 670301221719310336, "len": 28, "sent": 0 }
 { "id": 670301221442486272, "len": 34, "sent": 0 }
@@ -44,13 +44,13 @@
 { "id": 670301220305821696, "len": 140, "sent": 0 }
 { "id": 670301220247072770, "len": 83, "sent": 1 }
 { "id": 670301220196626432, "len": 36, "sent": 0 }
-{ "id": 670301220079312901, "len": 31, "sent": 1 }
-{ "id": 670301219949305857, "len": 70, "sent": 1 }
+{ "id": 670301220079312901, "len": 32, "sent": 1 }
+{ "id": 670301219949305857, "len": 94, "sent": 1 }
 { "id": 670301219739574273, "len": 131, "sent": 1 }
 { "id": 670301219206877184, "len": 27, "sent": 0 }
 { "id": 670301219139620864, "len": 124, "sent": 0 }
-{ "id": 670301218737123328, "len": 124, "sent": 0 }
-{ "id": 670301218640531458, "len": 31, "sent": 1 }
+{ "id": 670301218737123328, "len": 126, "sent": 0 }
+{ "id": 670301218640531458, "len": 33, "sent": 1 }
 { "id": 670301218598756352, "len": 47, "sent": 0 }
 { "id": 670301218565156865, "len": 44, "sent": 0 }
 { "id": 670301218414206976, "len": 71, "sent": 1 }
@@ -58,14 +58,14 @@
 { "id": 670301218078629888, "len": 9, "sent": 0 }
 { "id": 670301217851990017, "len": 111, "sent": 0 }
 { "id": 670301217793269760, "len": 113, "sent": 0 }
-{ "id": 670301217508036608, "len": 47, "sent": 0 }
+{ "id": 670301217508036608, "len": 55, "sent": 0 }
 { "id": 670301217369657344, "len": 137, "sent": 0 }
-{ "id": 670301217311088641, "len": 28, "sent": 0 }
+{ "id": 670301217311088641, "len": 29, "sent": 0 }
 { "id": 670301217231347712, "len": 123, "sent": 0 }
 { "id": 670301216891473920, "len": 44, "sent": 0 }
 { "id": 670301216874721280, "len": 68, "sent": 0 }
 { "id": 670301216799232000, "len": 50, "sent": 1 }
-{ "id": 670301216669171713, "len": 54, "sent": 0 }
+{ "id": 670301216669171713, "len": 55, "sent": 0 }
 { "id": 670301216493060097, "len": 113, "sent": 1 }
 { "id": 670301216400924676, "len": 35, "sent": 1 }
 { "id": 670301216371552258, "len": 58, "sent": 0 }
@@ -78,21 +78,21 @@
 { "id": 670301214958055424, "len": 58, "sent": 1 }
 { "id": 670301214605733888, "len": 139, "sent": 1 }
 { "id": 670301214509129728, "len": 114, "sent": 1 }
-{ "id": 670301214442041344, "len": 18, "sent": 1 }
+{ "id": 670301214442041344, "len": 19, "sent": 1 }
 { "id": 670301214295392256, "len": 47, "sent": 0 }
-{ "id": 670301213737529344, "len": 9, "sent": 0 }
+{ "id": 670301213737529344, "len": 10, "sent": 0 }
 { "id": 670301213544595457, "len": 63, "sent": 1 }
 { "id": 670301213515235333, "len": 107, "sent": 0 }
 { "id": 670301213464899584, "len": 105, "sent": 1 }
 { "id": 670301213120942080, "len": 39, "sent": 0 }
 { "id": 670301212961603585, "len": 63, "sent": 0 }
-{ "id": 670301212961603584, "len": 20, "sent": 0 }
-{ "id": 670301212856737792, "len": 51, "sent": 0 }
+{ "id": 670301212961603584, "len": 25, "sent": 0 }
+{ "id": 670301212856737792, "len": 55, "sent": 0 }
 { "id": 670301212760117248, "len": 133, "sent": 1 }
 { "id": 670301211808010240, "len": 103, "sent": 0 }
-{ "id": 670301211774468096, "len": 40, "sent": 0 }
+{ "id": 670301211774468096, "len": 41, "sent": 0 }
 { "id": 670301211703144450, "len": 138, "sent": 1 }
-{ "id": 670301211581685761, "len": 25, "sent": 1 }
+{ "id": 670301211581685761, "len": 26, "sent": 1 }
 { "id": 670301211560685568, "len": 12, "sent": 1 }
 { "id": 670301211090751490, "len": 140, "sent": 0 }
 { "id": 670301210654699520, "len": 13, "sent": 0 }
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.2.adm b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.2.adm
new file mode 100644
index 0000000..9dad8cc
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.2.adm
@@ -0,0 +1 @@
+[ "Ḣ̵̠̝̖͔̫̫̬̠͇̲̈́̍̅͆̋̉͛̓̽͊̉͘͜Ę̵̻͎̀̄̋̈́͐̅̈́͜ ̷͍͇̹͊̈̂̀̀̅̔Ć̸̡̨̡̧̭̙̪̩͔̤̭͙͐͋̄͌̈̓͐͘͠͝͝Ǫ̴̨̡̯̟̰̪̲͎̮̪͚͒̀̌͋̽́̒͜Ṃ̵̢̛͓̑̐̑̌̍̋͝Ę̶̛̜̦̖̘̲̠͔͓͕̞̠͛̋̐̊̓̌͗̓̏͘͜S̵̢̧̬̣͈̖̀̃̉͊̑̀͐͂̀̂" ]
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.3.adm b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.3.adm
new file mode 100644
index 0000000..c00f342
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.3.adm
@@ -0,0 +1 @@
+[ "👩‍👨‍👦‍👧" ]
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
index e664f47..4629220 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
@@ -59,6 +59,7 @@
 
     private MessageUnpacker unpacker;
     private ArrayBufferInput unpackerInput;
+    private MessageUnpackerToADM unpackerToADM;
 
     private long fnId;
 
@@ -83,6 +84,7 @@
         this.sourceLocation = sourceLoc;
         this.unpackerInput = new ArrayBufferInput(new byte[0]);
         this.unpacker = MessagePack.newDefaultUnpacker(unpackerInput);
+        this.unpackerToADM = new MessageUnpackerToADM();
     }
 
     @Override
@@ -94,7 +96,7 @@
                 return;
             }
             try {
-                PythonLibraryEvaluator.setArgument(argTypes[i], argValues[i], argHolder, finfo.getNullCall());
+                libraryEvaluator.setArgument(argTypes[i], argValues[i], argHolder, finfo.getNullCall());
             } catch (IOException e) {
                 throw new HyracksDataException("Error evaluating Python UDF", e);
             }
@@ -125,7 +127,7 @@
             }
             int numresults = resultWrapper.get() ^ FIXARRAY_PREFIX;
             if (numresults > 0) {
-                MessageUnpackerToADM.unpack(resultWrapper, outputWrapper, true);
+                unpackerToADM.unpack(resultWrapper, outputWrapper, true);
             }
             unpackerInput.reset(resultWrapper.array(), resultWrapper.position() + resultWrapper.arrayOffset(),
                     resultWrapper.remaining());
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
index e2229ee..6255d2a 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
@@ -68,6 +68,7 @@
     private TaskAttemptId task;
     private IWarningCollector warningCollector;
     private SourceLocation sourceLoc;
+    private MessagePackerFromADM packerFromADM;
 
     public PythonLibraryEvaluator(JobId jobId, PythonLibraryEvaluatorId evaluatorId, ILibraryManager libMgr,
             File pythonHome, String sitePkgs, List<String> pythonArgs, ExternalFunctionResultRouter router,
@@ -82,6 +83,7 @@
         this.ipcSys = ipcSys;
         this.warningCollector = warningCollector;
         this.sourceLoc = sourceLoc;
+        this.packerFromADM = new MessagePackerFromADM();
 
     }
 
@@ -165,17 +167,17 @@
         router.removeRoute(proto.getRouteId());
     }
 
-    public static ATypeTag setArgument(IAType type, IValueReference valueReference, ByteBuffer argHolder,
-            boolean nullCall) throws IOException {
+    public ATypeTag setArgument(IAType type, IValueReference valueReference, ByteBuffer argHolder, boolean nullCall)
+            throws IOException {
         ATypeTag tag = type.getTypeTag();
         if (tag == ATypeTag.ANY) {
             TaggedValuePointable pointy = TaggedValuePointable.FACTORY.createPointable();
             pointy.set(valueReference);
             ATypeTag rtTypeTag = EnumDeserializer.ATYPETAGDESERIALIZER.deserialize(pointy.getTag());
             IAType rtType = TypeTagUtil.getBuiltinTypeByTag(rtTypeTag);
-            return MessagePackerFromADM.pack(valueReference, rtType, argHolder, nullCall);
+            return packerFromADM.pack(valueReference, rtType, argHolder, nullCall);
         } else {
-            return MessagePackerFromADM.pack(valueReference, type, argHolder, nullCall);
+            return packerFromADM.pack(valueReference, type, argHolder, nullCall);
         }
     }
 
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
index f0ac56e..f923d0c 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
@@ -16,6 +16,10 @@
  */
 package org.apache.asterix.external.library.msgpack;
 
+import static org.apache.hyracks.util.string.UTF8StringUtil.charAt;
+import static org.apache.hyracks.util.string.UTF8StringUtil.getModifiedUTF8Len;
+import static org.apache.hyracks.util.string.UTF8StringUtil.getNumBytesToStoreLength;
+import static org.apache.hyracks.util.string.UTF8StringUtil.getUTFLength;
 import static org.msgpack.core.MessagePack.Code.ARRAY32;
 import static org.msgpack.core.MessagePack.Code.FALSE;
 import static org.msgpack.core.MessagePack.Code.FIXARRAY_PREFIX;
@@ -32,10 +36,13 @@
 import static org.msgpack.core.MessagePack.Code.TRUE;
 
 import java.nio.ByteBuffer;
+import java.nio.CharBuffer;
+import java.nio.charset.CharsetEncoder;
 import java.nio.charset.StandardCharsets;
 
 import org.apache.asterix.common.exceptions.AsterixException;
 import org.apache.asterix.common.exceptions.ErrorCode;
+import org.apache.asterix.external.util.ExternalDataConstants;
 import org.apache.asterix.om.types.ARecordType;
 import org.apache.asterix.om.types.ATypeTag;
 import org.apache.asterix.om.types.AUnionType;
@@ -62,13 +69,20 @@
     private static final int LENGTH_SIZE = 4;
     private static final int ITEM_COUNT_SIZE = 4;
     private static final int ITEM_OFFSET_SIZE = 4;
+    private final CharsetEncoder encoder;
+    private final CharBuffer cbuf;
 
-    public static ATypeTag pack(IValueReference ptr, IAType type, ByteBuffer out, boolean packUnknown)
+    public MessagePackerFromADM() {
+        encoder = StandardCharsets.UTF_8.newEncoder();
+        cbuf = CharBuffer.allocate(ExternalDataConstants.DEFAULT_BUFFER_SIZE);
+    }
+
+    public ATypeTag pack(IValueReference ptr, IAType type, ByteBuffer out, boolean packUnknown)
             throws HyracksDataException {
         return pack(ptr.getByteArray(), ptr.getStartOffset(), type, true, packUnknown, out);
     }
 
-    public static ATypeTag pack(byte[] ptr, int offs, IAType type, boolean tagged, boolean packUnknown, ByteBuffer out)
+    public ATypeTag pack(byte[] ptr, int offs, IAType type, boolean tagged, boolean packUnknown, ByteBuffer out)
             throws HyracksDataException {
         int relOffs = tagged ? offs + 1 : offs;
         ATypeTag tag = type.getTypeTag();
@@ -194,23 +208,33 @@
         out.put(strBytes);
     }
 
-    private static void packStr(byte[] in, int offs, ByteBuffer out) {
-        out.put(STR32);
+    private void packStr(byte[] in, int offs, ByteBuffer out) {
         //TODO: tagged/untagged. closed support is borked so always tagged rn
-        String str = UTF8StringUtil.toString(in, offs);
-        byte[] strBytes = str.getBytes(StandardCharsets.UTF_8);
-        out.putInt(strBytes.length);
-        out.put(strBytes);
-    }
-
-    public static void packStr(String str, ByteBuffer out) {
+        cbuf.clear();
+        cbuf.position(0);
+        encoder.reset();
         out.put(STR32);
-        byte[] strBytes = str.getBytes(StandardCharsets.UTF_8);
-        out.putInt(strBytes.length);
-        out.put(strBytes);
+        final int calculatedLength = getUTFLength(in, offs);
+        int remainingLen = calculatedLength;
+        final int varSzOffset = getNumBytesToStoreLength(calculatedLength);
+        int pos = varSzOffset;
+        while (remainingLen > 0) {
+            char c = charAt(in, pos + offs);
+            cbuf.put(c);
+            int charLen = getModifiedUTF8Len(c);
+            pos += charLen;
+            remainingLen -= charLen;
+        }
+        int sizeStart = out.position();
+        out.putInt(-1);
+        cbuf.flip();
+        int stringStart = out.position();
+        encoder.encode(cbuf, out, true);
+        encoder.flush(out);
+        out.putInt(sizeStart, out.position() - stringStart);
     }
 
-    private static void packArray(byte[] in, int offs, IAType type, ByteBuffer out) throws HyracksDataException {
+    private void packArray(byte[] in, int offs, IAType type, ByteBuffer out) throws HyracksDataException {
         //TODO: - could optimize to pack fixarray/array16 for small arrays
         //      - this code is basically a static version of AListPointable, could be deduped
         AbstractCollectionType collType = (AbstractCollectionType) type;
@@ -234,7 +258,7 @@
         }
     }
 
-    private static void packObject(byte[] in, int offs, IAType type, ByteBuffer out) throws HyracksDataException {
+    private void packObject(byte[] in, int offs, IAType type, ByteBuffer out) throws HyracksDataException {
         ARecordType recType = (ARecordType) type;
         out.put(MAP32);
         int fieldCt = recType.getFieldNames().length + RecordUtils.getOpenFieldCount(in, offs, recType);
@@ -242,7 +266,7 @@
         for (int i = 0; i < recType.getFieldNames().length; i++) {
             String field = recType.getFieldNames()[i];
             IAType fieldType = RecordUtils.getClosedFieldType(recType, i);
-            packStr(field, out);
+            packStr(out, field);
             pack(in, RecordUtils.getClosedFieldOffset(in, offs, recType, i), fieldType, false, true, out);
         }
         if (RecordUtils.isExpanded(in, offs, recType)) {
@@ -332,7 +356,7 @@
         }
 
         public static int getOpenFieldNameSize(byte[] bytes, int start, ARecordType recordType, int fieldId) {
-            int utfleng = UTF8StringUtil.getUTFLength(bytes, getOpenFieldNameOffset(bytes, start, recordType, fieldId));
+            int utfleng = getUTFLength(bytes, getOpenFieldNameOffset(bytes, start, recordType, fieldId));
             return utfleng + UTF8StringUtil.getNumBytesToStoreLength(utfleng);
         }
 
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
index 4af1121..c28b87d 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
@@ -18,18 +18,36 @@
 
 import static org.msgpack.core.MessagePack.Code.*;
 
+import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.nio.CharBuffer;
+import java.nio.charset.CharsetDecoder;
+import java.nio.charset.StandardCharsets;
 
 import org.apache.asterix.common.exceptions.AsterixException;
 import org.apache.asterix.common.exceptions.ErrorCode;
+import org.apache.asterix.external.util.ExternalDataConstants;
 import org.apache.asterix.om.types.ATypeTag;
 import org.apache.hyracks.api.exceptions.HyracksDataException;
-import org.apache.hyracks.util.encoding.VarLenIntEncoderDecoder;
+import org.apache.hyracks.data.std.util.ArrayBackedValueStorage;
 import org.apache.hyracks.util.string.UTF8StringUtil;
+import org.apache.hyracks.util.string.UTF8StringWriter;
 
 public class MessageUnpackerToADM {
 
-    public static void unpack(ByteBuffer in, ByteBuffer out, boolean tagged) throws HyracksDataException {
+    private final CharBuffer stringCharBuffer;
+    private final CharsetDecoder decoder;
+    private final UTF8StringWriter stringWriter;
+    private final ArrayBackedValueStorage stringOut;
+
+    public MessageUnpackerToADM() {
+        this.stringCharBuffer = CharBuffer.allocate(ExternalDataConstants.DEFAULT_BUFFER_SIZE);
+        this.decoder = StandardCharsets.UTF_8.newDecoder();
+        this.stringWriter = new UTF8StringWriter();
+        this.stringOut = new ArrayBackedValueStorage(ExternalDataConstants.DEFAULT_BUFFER_SIZE);
+    }
+
+    public void unpack(ByteBuffer in, ByteBuffer out, boolean tagged) throws IOException, HyracksDataException {
         byte tag = NIL;
         if (in != null) {
             tag = in.get();
@@ -196,7 +214,7 @@
         out.putDouble(in.getDouble());
     }
 
-    public static void unpackArray(ByteBuffer in, ByteBuffer out, long uLen) throws HyracksDataException {
+    public void unpackArray(ByteBuffer in, ByteBuffer out, long uLen) throws HyracksDataException, IOException {
         if (uLen > Integer.MAX_VALUE) {
             throw new UnsupportedOperationException("Array is too long");
         }
@@ -220,7 +238,7 @@
         out.putInt(asxLenPos, totalLen);
     }
 
-    public static void unpackMap(ByteBuffer in, ByteBuffer out, int count) throws HyracksDataException {
+    public void unpackMap(ByteBuffer in, ByteBuffer out, int count) throws IOException, HyracksDataException {
         //TODO: need to handle typed records. this only produces a completely open record.
         //hdr size = 6?
         int startOffs = out.position();
@@ -243,6 +261,7 @@
         for (int i = 0; i < count; i++) {
             int offs = out.position() + out.arrayOffset();
             int relOffs = offs - startOffs;
+            int stroffs = in.position();
             unpack(in, out, false);
             int hash = UTF8StringUtil.hash(out.array(), offs);
             out.putInt(offsetAryPos, hash);
@@ -254,8 +273,7 @@
         out.putInt(totalSizeOffs, out.position() - startOffs);
     }
 
-    public static void unpackStr(ByteBuffer in, ByteBuffer out, long uLen, boolean tag) {
-        //TODO: this probably breaks for 3 and 4 byte UTF-8
+    public void unpackStr(ByteBuffer in, ByteBuffer out, long uLen, boolean tag) throws IOException {
         if (tag) {
             out.put(ATypeTag.SERIALIZED_STRING_TYPE_TAG);
         }
@@ -263,13 +281,20 @@
             throw new UnsupportedOperationException("String is too long");
         }
         int len = (int) uLen;
-        int strLen = UTF8StringUtil.getStringLength(in.array(), in.position() + in.arrayOffset(), len);
-        int adv = VarLenIntEncoderDecoder.encode(strLen, out.array(), out.position() + out.arrayOffset());
-        out.position(out.position() + adv);
-        System.arraycopy(in.array(), in.arrayOffset() + in.position(), out.array(), out.arrayOffset() + out.position(),
-                len);
-        out.position(out.position() + len);
-        in.position(in.position() + len);
+        stringCharBuffer.clear();
+        stringCharBuffer.position(0);
+        stringOut.reset();
+        decoder.reset();
+        int limit = in.limit();
+        in.limit(in.position() + len);
+        decoder.decode(in, stringCharBuffer, true);
+        decoder.flush(stringCharBuffer);
+        UTF8StringUtil.writeUTF8(stringCharBuffer.array(), 0, stringCharBuffer.position(), stringOut.getDataOutput(),
+                stringWriter);
+        System.arraycopy(stringOut.getByteArray(), stringOut.getStartOffset(), out.array(),
+                out.arrayOffset() + out.position(), stringOut.getLength());
+        out.position(out.position() + stringOut.getLength());
+        in.limit(limit);
     }
 
 }
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
index 39e480a..bf8823f 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
@@ -89,6 +89,7 @@
             private MessageUnpacker unpacker;
             private ArrayBufferInput unpackerInput;
             private List<Pair<ByteBuffer, Counter>> batchResults;
+            private MessageUnpackerToADM unpackerToADM;
 
             @Override
             public void open() throws HyracksDataException {
@@ -121,6 +122,7 @@
                 }
                 unpackerInput = new ArrayBufferInput(new byte[0]);
                 unpacker = MessagePack.newDefaultUnpacker(unpackerInput);
+                unpackerToADM = new MessageUnpackerToADM();
             }
 
             private void resetBuffers(int numTuples, int[] numCalls) {
@@ -211,8 +213,9 @@
                                 for (int colIdx = 0; colIdx < cols.length; colIdx++) {
                                     ref.set(buffer.array(), tRef.getFieldStart(cols[colIdx]),
                                             tRef.getFieldLength(cols[colIdx]));
-                                    PythonLibraryEvaluator.setArgument(fnDescs[func].getArgumentTypes()[colIdx], ref,
-                                            argHolders.get(func), fnDescs[func].getFunctionInfo().getNullCall());
+                                    libraryEvaluators.get(func).getSecond().setArgument(
+                                            fnDescs[func].getArgumentTypes()[colIdx], ref, argHolders.get(func),
+                                            fnDescs[func].getFunctionInfo().getNullCall());
                                 }
                             } else {
                                 numCalls[func]--;
@@ -266,7 +269,7 @@
                                 ATypeTag functionCalled = nullCalls[k][i];
                                 if (functionCalled == ATypeTag.TYPE) {
                                     if (result.getSecond().get() > 0) {
-                                        MessageUnpackerToADM.unpack(result.getFirst(), outputWrapper, true);
+                                        unpackerToADM.unpack(result.getFirst(), outputWrapper, true);
                                         result.getSecond().set(result.getSecond().get() - 1);
                                     } else {
                                         //emit NULL for functions which failed with a warning

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826
To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 1
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-MessageType: newchange

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Ian Maxon <im...@uci.edu>:

Hello Glenn Galvizo, Anon. E. Moose #1000171, Hussain Towaileb, Till Westmann, Jenkins, Michael Blow, Murtadha Hubail, Dmitry Lychagin, Wael Alkowaileet, 

I'd like you to reexamine a change. Please visit

    https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826

to look at the new patch set (#5).

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................

[ASTERIXDB-2849][RT] Fix Python UDF string serde

- user model changes: no
- storage format changes: no
- interface changes: no

Details:

- Avoid string creation on serialization to msgpack
- Properly handle UTF-8 to Modified UTF-8 conversion
  on msgpack to ADM deserialization
- Add test with some unicode combiners and ZWJ
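
For readers unfamiliar with the distinction in the second bullet, here is a
minimal sketch of why standard UTF-8 bytes cannot simply be copied into a
Modified UTF-8 string. It uses the JDK's DataOutputStream.writeUTF as a
stand-in for a Modified UTF-8 writer; that is an assumption made for
illustration, not the Hyracks UTF8StringUtil code touched by this change,
and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the AsterixDB code base.
final class ModifiedUtf8Demo {

    // Byte length of the standard UTF-8 encoding of s.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte length of the JDK's modified UTF-8 encoding of s,
    // excluding the 2-byte length prefix that writeUTF emits.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size() - 2;
    }

    public static void main(String[] args) {
        String woman = "\uD83D\uDC69"; // U+1F469, outside the BMP
        String zwj = "\u200D";         // zero width joiner, inside the BMP
        // Standard UTF-8 encodes the code point directly in 4 bytes;
        // modified UTF-8 encodes each surrogate separately, 3 bytes each.
        System.out.println(utf8Length(woman) + " vs " + modifiedUtf8Length(woman));
        // BMP characters other than U+0000 have the same length in both.
        System.out.println(utf8Length(zwj) + " vs " + modifiedUtf8Length(zwj));
    }
}
```

Because the two encodings disagree on every supplementary character, a
byte-for-byte copy of a msgpack string payload produces a value whose
contents and length do not match what a Modified UTF-8 reader expects,
which is consistent with the emoji/ZWJ test added here.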

Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
---
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.11.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.12.query.sqlpp
R asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.13.ddl.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.10.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.9.adm
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
12 files changed, 164 insertions(+), 61 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/26/10826/5

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 5
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Glenn Galvizo <gg...@uci.edu>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-MessageType: newpatchset

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Ian Maxon <im...@uci.edu>:

Ian Maxon has uploaded this change for review. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )


Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................

[ASTERIXDB-2849][RT] Fix Python UDF string serde

- user model changes: no
- storage format changes: no
- interface changes: no

Details:

- Avoid string creation on serialization to msgpack
- Properly handle UTF-8 to Modified UTF-8 conversion
  on msgpack to ADM deserialization
- Add test with some unicode combiners and ZWJ
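
The first bullet (no intermediate String on the way to msgpack) can be
sketched roughly as below: characters are encoded straight into the output
buffer with a reusable CharsetEncoder, and the 4-byte str32 length is
back-patched once the encoded size is known. This is a simplified
illustration under stated assumptions, not the patch itself: the class name
is hypothetical, the input is already a CharBuffer rather than an ADM
string, and the 0xdb header byte is taken from the msgpack format rather
than from the msgpack-java constants.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; not part of the AsterixDB code base.
final class Str32PackSketch {
    // msgpack "str 32" type byte, followed by a big-endian 4-byte length.
    static final byte STR32 = (byte) 0xdb;

    // One encoder per instance, reused across calls (not thread-safe).
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();

    // Writes STR32, the payload length, and the UTF-8 payload to out.
    // Returns the payload length in bytes.
    int packStr(CharBuffer chars, ByteBuffer out) {
        encoder.reset();
        out.put(STR32);
        int sizePos = out.position();
        out.putInt(0); // placeholder, patched below
        int start = out.position();
        CoderResult result = encoder.encode(chars, out, true);
        if (!result.isUnderflow()) {
            // Overflow (out too small) or malformed/unmappable input.
            throw new IllegalStateException("encode failed: " + result);
        }
        result = encoder.flush(out);
        if (!result.isUnderflow()) {
            throw new IllegalStateException("flush failed: " + result);
        }
        int len = out.position() - start;
        out.putInt(sizePos, len); // absolute put, position is unchanged
        return len;
    }
}
```

Keeping the encoder in an instance field is one plausible reason the pack
and unpack methods in this change stop being static: a CharsetEncoder
carries state between calls and should not be shared across threads.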

Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
---
M asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.5.query.sqlpp
R asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.6.ddl.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.2.adm
A asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.3.adm
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
12 files changed, 164 insertions(+), 61 deletions(-)



  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/26/10826/1

diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
index a8ba8a1..76a90a7 100644
--- a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
@@ -21,3 +21,6 @@
 create function typeValidation(a, b, c, d, e, f, g)
   as "roundtrip", "Tests.roundtrip" at testlib;
 
+create function stringTest(s) as "roundtrip",
+  "Tests.roundtrip" at testlib;
+
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.query.sqlpp
new file mode 100644
index 0000000..cede236
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.query.sqlpp
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+use externallibtest;
+stringTest("Ḣ̵̠̝̖͔̫̫̬̠͇̲̈́̍̅͆̋̉͛̓̽͊̉͘͜Ę̵̻͎̀̄̋̈́͐̅̈́͜ ̷͍͇̹͊̈̂̀̀̅̔Ć̸̡̨̡̧̭̙̪̩͔̤̭͙͐͋̄͌̈̓͐͘͠͝͝Ǫ̴̨̡̯̟̰̪̲͎̮̪͚͒̀̌͋̽́̒͜Ṃ̵̢̛͓̑̐̑̌̍̋͝Ę̶̛̜̦̖̘̲̠͔͓͕̞̠͛̋̐̊̓̌͗̓̏͘͜S̵̢̧̬̣͈̖̀̃̉͊̑̀͐͂̀̂");
+
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.5.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.5.query.sqlpp
new file mode 100644
index 0000000..0d82a8e
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.5.query.sqlpp
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+use externallibtest;
+stringTest("👩‍👨‍👦‍👧");
+
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.ddl.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.6.ddl.sqlpp
similarity index 100%
rename from asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.4.ddl.sqlpp
rename to asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.6.ddl.sqlpp
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
index 65a7c81..c960344 100644
--- a/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
@@ -1,9 +1,9 @@
 { "id": 670301227662491648, "len": 20, "sent": 1 }
 { "id": 670301227553566720, "len": 139, "sent": 0 }
 { "id": 670301227041857536, "len": 112, "sent": 0 }
-{ "id": 670301227037519876, "len": 33, "sent": 0 }
+{ "id": 670301227037519876, "len": 34, "sent": 0 }
 { "id": 670301226987159552, "len": 57, "sent": 0 }
-{ "id": 670301226513391616, "len": 28, "sent": 1 }
+{ "id": 670301226513391616, "len": 29, "sent": 1 }
 { "id": 670301226337202180, "len": 77, "sent": 1 }
 { "id": 670301226190278656, "len": 25, "sent": 0 }
 { "id": 670301225959579648, "len": 112, "sent": 1 }
@@ -15,25 +15,25 @@
 { "id": 670301225162661889, "len": 28, "sent": 1 }
 { "id": 670301224885837824, "len": 63, "sent": 0 }
 { "id": 670301224814698496, "len": 59, "sent": 0 }
-{ "id": 670301224709849090, "len": 33, "sent": 1 }
+{ "id": 670301224709849090, "len": 37, "sent": 1 }
 { "id": 670301224684556288, "len": 21, "sent": 0 }
 { "id": 670301224680480768, "len": 39, "sent": 0 }
 { "id": 670301224348946433, "len": 64, "sent": 1 }
 { "id": 670301224261058560, "len": 61, "sent": 1 }
-{ "id": 670301224231690240, "len": 33, "sent": 0 }
-{ "id": 670301224214794240, "len": 33, "sent": 0 }
+{ "id": 670301224231690240, "len": 34, "sent": 0 }
+{ "id": 670301224214794240, "len": 41, "sent": 0 }
 { "id": 670301223753351168, "len": 105, "sent": 1 }
 { "id": 670301223426367488, "len": 23, "sent": 0 }
-{ "id": 670301223216545792, "len": 31, "sent": 0 }
+{ "id": 670301223216545792, "len": 34, "sent": 0 }
 { "id": 670301223182974976, "len": 34, "sent": 1 }
 { "id": 670301223128535041, "len": 21, "sent": 0 }
 { "id": 670301222759301121, "len": 132, "sent": 0 }
 { "id": 670301222734307329, "len": 110, "sent": 1 }
 { "id": 670301222717419520, "len": 81, "sent": 0 }
 { "id": 670301222318936064, "len": 110, "sent": 1 }
-{ "id": 670301222302150657, "len": 131, "sent": 0 }
+{ "id": 670301222302150657, "len": 135, "sent": 0 }
 { "id": 670301222222602240, "len": 43, "sent": 1 }
-{ "id": 670301222113517568, "len": 27, "sent": 0 }
+{ "id": 670301222113517568, "len": 29, "sent": 0 }
 { "id": 670301221836615680, "len": 44, "sent": 1 }
 { "id": 670301221719310336, "len": 28, "sent": 0 }
 { "id": 670301221442486272, "len": 34, "sent": 0 }
@@ -44,13 +44,13 @@
 { "id": 670301220305821696, "len": 140, "sent": 0 }
 { "id": 670301220247072770, "len": 83, "sent": 1 }
 { "id": 670301220196626432, "len": 36, "sent": 0 }
-{ "id": 670301220079312901, "len": 31, "sent": 1 }
-{ "id": 670301219949305857, "len": 70, "sent": 1 }
+{ "id": 670301220079312901, "len": 32, "sent": 1 }
+{ "id": 670301219949305857, "len": 94, "sent": 1 }
 { "id": 670301219739574273, "len": 131, "sent": 1 }
 { "id": 670301219206877184, "len": 27, "sent": 0 }
 { "id": 670301219139620864, "len": 124, "sent": 0 }
-{ "id": 670301218737123328, "len": 124, "sent": 0 }
-{ "id": 670301218640531458, "len": 31, "sent": 1 }
+{ "id": 670301218737123328, "len": 126, "sent": 0 }
+{ "id": 670301218640531458, "len": 33, "sent": 1 }
 { "id": 670301218598756352, "len": 47, "sent": 0 }
 { "id": 670301218565156865, "len": 44, "sent": 0 }
 { "id": 670301218414206976, "len": 71, "sent": 1 }
@@ -58,14 +58,14 @@
 { "id": 670301218078629888, "len": 9, "sent": 0 }
 { "id": 670301217851990017, "len": 111, "sent": 0 }
 { "id": 670301217793269760, "len": 113, "sent": 0 }
-{ "id": 670301217508036608, "len": 47, "sent": 0 }
+{ "id": 670301217508036608, "len": 55, "sent": 0 }
 { "id": 670301217369657344, "len": 137, "sent": 0 }
-{ "id": 670301217311088641, "len": 28, "sent": 0 }
+{ "id": 670301217311088641, "len": 29, "sent": 0 }
 { "id": 670301217231347712, "len": 123, "sent": 0 }
 { "id": 670301216891473920, "len": 44, "sent": 0 }
 { "id": 670301216874721280, "len": 68, "sent": 0 }
 { "id": 670301216799232000, "len": 50, "sent": 1 }
-{ "id": 670301216669171713, "len": 54, "sent": 0 }
+{ "id": 670301216669171713, "len": 55, "sent": 0 }
 { "id": 670301216493060097, "len": 113, "sent": 1 }
 { "id": 670301216400924676, "len": 35, "sent": 1 }
 { "id": 670301216371552258, "len": 58, "sent": 0 }
@@ -78,21 +78,21 @@
 { "id": 670301214958055424, "len": 58, "sent": 1 }
 { "id": 670301214605733888, "len": 139, "sent": 1 }
 { "id": 670301214509129728, "len": 114, "sent": 1 }
-{ "id": 670301214442041344, "len": 18, "sent": 1 }
+{ "id": 670301214442041344, "len": 19, "sent": 1 }
 { "id": 670301214295392256, "len": 47, "sent": 0 }
-{ "id": 670301213737529344, "len": 9, "sent": 0 }
+{ "id": 670301213737529344, "len": 10, "sent": 0 }
 { "id": 670301213544595457, "len": 63, "sent": 1 }
 { "id": 670301213515235333, "len": 107, "sent": 0 }
 { "id": 670301213464899584, "len": 105, "sent": 1 }
 { "id": 670301213120942080, "len": 39, "sent": 0 }
 { "id": 670301212961603585, "len": 63, "sent": 0 }
-{ "id": 670301212961603584, "len": 20, "sent": 0 }
-{ "id": 670301212856737792, "len": 51, "sent": 0 }
+{ "id": 670301212961603584, "len": 25, "sent": 0 }
+{ "id": 670301212856737792, "len": 55, "sent": 0 }
 { "id": 670301212760117248, "len": 133, "sent": 1 }
 { "id": 670301211808010240, "len": 103, "sent": 0 }
-{ "id": 670301211774468096, "len": 40, "sent": 0 }
+{ "id": 670301211774468096, "len": 41, "sent": 0 }
 { "id": 670301211703144450, "len": 138, "sent": 1 }
-{ "id": 670301211581685761, "len": 25, "sent": 1 }
+{ "id": 670301211581685761, "len": 26, "sent": 1 }
 { "id": 670301211560685568, "len": 12, "sent": 1 }
 { "id": 670301211090751490, "len": 140, "sent": 0 }
 { "id": 670301210654699520, "len": 13, "sent": 0 }
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.2.adm b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.2.adm
new file mode 100644
index 0000000..9dad8cc
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.2.adm
@@ -0,0 +1 @@
+[ "Ḣ̵̠̝̖͔̫̫̬̠͇̲̈́̍̅͆̋̉͛̓̽͊̉͘͜Ę̵̻͎̀̄̋̈́͐̅̈́͜ ̷͍͇̹͊̈̂̀̀̅̔Ć̸̡̨̡̧̭̙̪̩͔̤̭͙͐͋̄͌̈̓͐͘͠͝͝Ǫ̴̨̡̯̟̰̪̲͎̮̪͚͒̀̌͋̽́̒͜Ṃ̵̢̛͓̑̐̑̌̍̋͝Ę̶̛̜̦̖̘̲̠͔͓͕̞̠͛̋̐̊̓̌͗̓̏͘͜S̵̢̧̬̣͈̖̀̃̉͊̑̀͐͂̀̂" ]
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.3.adm b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.3.adm
new file mode 100644
index 0000000..c00f342
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/python_open_type_validation/type_validation.3.adm
@@ -0,0 +1 @@
+[ "👩‍👨‍👦‍👧" ]
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
index e664f47..4629220 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
@@ -59,6 +59,7 @@
 
     private MessageUnpacker unpacker;
     private ArrayBufferInput unpackerInput;
+    private MessageUnpackerToADM unpackerToADM;
 
     private long fnId;
 
@@ -83,6 +84,7 @@
         this.sourceLocation = sourceLoc;
         this.unpackerInput = new ArrayBufferInput(new byte[0]);
         this.unpacker = MessagePack.newDefaultUnpacker(unpackerInput);
+        this.unpackerToADM = new MessageUnpackerToADM();
     }
 
     @Override
@@ -94,7 +96,7 @@
                 return;
             }
             try {
-                PythonLibraryEvaluator.setArgument(argTypes[i], argValues[i], argHolder, finfo.getNullCall());
+                libraryEvaluator.setArgument(argTypes[i], argValues[i], argHolder, finfo.getNullCall());
             } catch (IOException e) {
                 throw new HyracksDataException("Error evaluating Python UDF", e);
             }
@@ -125,7 +127,7 @@
             }
             int numresults = resultWrapper.get() ^ FIXARRAY_PREFIX;
             if (numresults > 0) {
-                MessageUnpackerToADM.unpack(resultWrapper, outputWrapper, true);
+                unpackerToADM.unpack(resultWrapper, outputWrapper, true);
             }
             unpackerInput.reset(resultWrapper.array(), resultWrapper.position() + resultWrapper.arrayOffset(),
                     resultWrapper.remaining());
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
index e2229ee..6255d2a 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
@@ -68,6 +68,7 @@
     private TaskAttemptId task;
     private IWarningCollector warningCollector;
     private SourceLocation sourceLoc;
+    private MessagePackerFromADM packerFromADM;
 
     public PythonLibraryEvaluator(JobId jobId, PythonLibraryEvaluatorId evaluatorId, ILibraryManager libMgr,
             File pythonHome, String sitePkgs, List<String> pythonArgs, ExternalFunctionResultRouter router,
@@ -82,6 +83,7 @@
         this.ipcSys = ipcSys;
         this.warningCollector = warningCollector;
         this.sourceLoc = sourceLoc;
+        this.packerFromADM = new MessagePackerFromADM();
 
     }
 
@@ -165,17 +167,17 @@
         router.removeRoute(proto.getRouteId());
     }
 
-    public static ATypeTag setArgument(IAType type, IValueReference valueReference, ByteBuffer argHolder,
-            boolean nullCall) throws IOException {
+    public ATypeTag setArgument(IAType type, IValueReference valueReference, ByteBuffer argHolder, boolean nullCall)
+            throws IOException {
         ATypeTag tag = type.getTypeTag();
         if (tag == ATypeTag.ANY) {
             TaggedValuePointable pointy = TaggedValuePointable.FACTORY.createPointable();
             pointy.set(valueReference);
             ATypeTag rtTypeTag = EnumDeserializer.ATYPETAGDESERIALIZER.deserialize(pointy.getTag());
             IAType rtType = TypeTagUtil.getBuiltinTypeByTag(rtTypeTag);
-            return MessagePackerFromADM.pack(valueReference, rtType, argHolder, nullCall);
+            return packerFromADM.pack(valueReference, rtType, argHolder, nullCall);
         } else {
-            return MessagePackerFromADM.pack(valueReference, type, argHolder, nullCall);
+            return packerFromADM.pack(valueReference, type, argHolder, nullCall);
         }
     }
 
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
index f0ac56e..f923d0c 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
@@ -16,6 +16,10 @@
  */
 package org.apache.asterix.external.library.msgpack;
 
+import static org.apache.hyracks.util.string.UTF8StringUtil.charAt;
+import static org.apache.hyracks.util.string.UTF8StringUtil.getModifiedUTF8Len;
+import static org.apache.hyracks.util.string.UTF8StringUtil.getNumBytesToStoreLength;
+import static org.apache.hyracks.util.string.UTF8StringUtil.getUTFLength;
 import static org.msgpack.core.MessagePack.Code.ARRAY32;
 import static org.msgpack.core.MessagePack.Code.FALSE;
 import static org.msgpack.core.MessagePack.Code.FIXARRAY_PREFIX;
@@ -32,10 +36,13 @@
 import static org.msgpack.core.MessagePack.Code.TRUE;
 
 import java.nio.ByteBuffer;
+import java.nio.CharBuffer;
+import java.nio.charset.CharsetEncoder;
 import java.nio.charset.StandardCharsets;
 
 import org.apache.asterix.common.exceptions.AsterixException;
 import org.apache.asterix.common.exceptions.ErrorCode;
+import org.apache.asterix.external.util.ExternalDataConstants;
 import org.apache.asterix.om.types.ARecordType;
 import org.apache.asterix.om.types.ATypeTag;
 import org.apache.asterix.om.types.AUnionType;
@@ -62,13 +69,20 @@
     private static final int LENGTH_SIZE = 4;
     private static final int ITEM_COUNT_SIZE = 4;
     private static final int ITEM_OFFSET_SIZE = 4;
+    private final CharsetEncoder encoder;
+    private final CharBuffer cbuf;
 
-    public static ATypeTag pack(IValueReference ptr, IAType type, ByteBuffer out, boolean packUnknown)
+    public MessagePackerFromADM() {
+        encoder = StandardCharsets.UTF_8.newEncoder();
+        cbuf = CharBuffer.allocate(ExternalDataConstants.DEFAULT_BUFFER_SIZE);
+    }
+
+    public ATypeTag pack(IValueReference ptr, IAType type, ByteBuffer out, boolean packUnknown)
             throws HyracksDataException {
         return pack(ptr.getByteArray(), ptr.getStartOffset(), type, true, packUnknown, out);
     }
 
-    public static ATypeTag pack(byte[] ptr, int offs, IAType type, boolean tagged, boolean packUnknown, ByteBuffer out)
+    public ATypeTag pack(byte[] ptr, int offs, IAType type, boolean tagged, boolean packUnknown, ByteBuffer out)
             throws HyracksDataException {
         int relOffs = tagged ? offs + 1 : offs;
         ATypeTag tag = type.getTypeTag();
@@ -194,23 +208,33 @@
         out.put(strBytes);
     }
 
-    private static void packStr(byte[] in, int offs, ByteBuffer out) {
-        out.put(STR32);
+    private void packStr(byte[] in, int offs, ByteBuffer out) {
         //TODO: tagged/untagged. closed support is borked so always tagged rn
-        String str = UTF8StringUtil.toString(in, offs);
-        byte[] strBytes = str.getBytes(StandardCharsets.UTF_8);
-        out.putInt(strBytes.length);
-        out.put(strBytes);
-    }
-
-    public static void packStr(String str, ByteBuffer out) {
+        cbuf.clear();
+        cbuf.position(0);
+        encoder.reset();
         out.put(STR32);
-        byte[] strBytes = str.getBytes(StandardCharsets.UTF_8);
-        out.putInt(strBytes.length);
-        out.put(strBytes);
+        final int calculatedLength = getUTFLength(in, offs);
+        int remainingLen = calculatedLength;
+        final int varSzOffset = getNumBytesToStoreLength(calculatedLength);
+        int pos = varSzOffset;
+        while (remainingLen > 0) {
+            char c = charAt(in, pos + offs);
+            cbuf.put(c);
+            int charLen = getModifiedUTF8Len(c);
+            pos += charLen;
+            remainingLen -= charLen;
+        }
+        int sizeStart = out.position();
+        out.putInt(-1);
+        cbuf.flip();
+        int stringStart = out.position();
+        encoder.encode(cbuf, out, true);
+        encoder.flush(out);
+        out.putInt(sizeStart, out.position() - stringStart);
     }
 
-    private static void packArray(byte[] in, int offs, IAType type, ByteBuffer out) throws HyracksDataException {
+    private void packArray(byte[] in, int offs, IAType type, ByteBuffer out) throws HyracksDataException {
         //TODO: - could optimize to pack fixarray/array16 for small arrays
         //      - this code is basically a static version of AListPointable, could be deduped
         AbstractCollectionType collType = (AbstractCollectionType) type;
@@ -234,7 +258,7 @@
         }
     }
 
-    private static void packObject(byte[] in, int offs, IAType type, ByteBuffer out) throws HyracksDataException {
+    private void packObject(byte[] in, int offs, IAType type, ByteBuffer out) throws HyracksDataException {
         ARecordType recType = (ARecordType) type;
         out.put(MAP32);
         int fieldCt = recType.getFieldNames().length + RecordUtils.getOpenFieldCount(in, offs, recType);
@@ -242,7 +266,7 @@
         for (int i = 0; i < recType.getFieldNames().length; i++) {
             String field = recType.getFieldNames()[i];
             IAType fieldType = RecordUtils.getClosedFieldType(recType, i);
-            packStr(field, out);
+            packStr(out, field);
             pack(in, RecordUtils.getClosedFieldOffset(in, offs, recType, i), fieldType, false, true, out);
         }
         if (RecordUtils.isExpanded(in, offs, recType)) {
@@ -332,7 +356,7 @@
         }
 
         public static int getOpenFieldNameSize(byte[] bytes, int start, ARecordType recordType, int fieldId) {
-            int utfleng = UTF8StringUtil.getUTFLength(bytes, getOpenFieldNameOffset(bytes, start, recordType, fieldId));
+            int utfleng = getUTFLength(bytes, getOpenFieldNameOffset(bytes, start, recordType, fieldId));
             return utfleng + UTF8StringUtil.getNumBytesToStoreLength(utfleng);
         }
 
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
index 4af1121..c28b87d 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
@@ -18,18 +18,36 @@
 
 import static org.msgpack.core.MessagePack.Code.*;
 
+import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.nio.CharBuffer;
+import java.nio.charset.CharsetDecoder;
+import java.nio.charset.StandardCharsets;
 
 import org.apache.asterix.common.exceptions.AsterixException;
 import org.apache.asterix.common.exceptions.ErrorCode;
+import org.apache.asterix.external.util.ExternalDataConstants;
 import org.apache.asterix.om.types.ATypeTag;
 import org.apache.hyracks.api.exceptions.HyracksDataException;
-import org.apache.hyracks.util.encoding.VarLenIntEncoderDecoder;
+import org.apache.hyracks.data.std.util.ArrayBackedValueStorage;
 import org.apache.hyracks.util.string.UTF8StringUtil;
+import org.apache.hyracks.util.string.UTF8StringWriter;
 
 public class MessageUnpackerToADM {
 
-    public static void unpack(ByteBuffer in, ByteBuffer out, boolean tagged) throws HyracksDataException {
+    private final CharBuffer stringCharBuffer;
+    private final CharsetDecoder decoder;
+    private final UTF8StringWriter stringWriter;
+    private final ArrayBackedValueStorage stringOut;
+
+    public MessageUnpackerToADM() {
+        this.stringCharBuffer = CharBuffer.allocate(ExternalDataConstants.DEFAULT_BUFFER_SIZE);
+        this.decoder = StandardCharsets.UTF_8.newDecoder();
+        this.stringWriter = new UTF8StringWriter();
+        this.stringOut = new ArrayBackedValueStorage(ExternalDataConstants.DEFAULT_BUFFER_SIZE);
+    }
+
+    public void unpack(ByteBuffer in, ByteBuffer out, boolean tagged) throws IOException, HyracksDataException {
         byte tag = NIL;
         if (in != null) {
             tag = in.get();
@@ -196,7 +214,7 @@
         out.putDouble(in.getDouble());
     }
 
-    public static void unpackArray(ByteBuffer in, ByteBuffer out, long uLen) throws HyracksDataException {
+    public void unpackArray(ByteBuffer in, ByteBuffer out, long uLen) throws HyracksDataException, IOException {
         if (uLen > Integer.MAX_VALUE) {
             throw new UnsupportedOperationException("Array is too long");
         }
@@ -220,7 +238,7 @@
         out.putInt(asxLenPos, totalLen);
     }
 
-    public static void unpackMap(ByteBuffer in, ByteBuffer out, int count) throws HyracksDataException {
+    public void unpackMap(ByteBuffer in, ByteBuffer out, int count) throws IOException, HyracksDataException {
         //TODO: need to handle typed records. this only produces a completely open record.
         //hdr size = 6?
         int startOffs = out.position();
@@ -243,6 +261,7 @@
         for (int i = 0; i < count; i++) {
             int offs = out.position() + out.arrayOffset();
             int relOffs = offs - startOffs;
+            int stroffs = in.position();
             unpack(in, out, false);
             int hash = UTF8StringUtil.hash(out.array(), offs);
             out.putInt(offsetAryPos, hash);
@@ -254,8 +273,7 @@
         out.putInt(totalSizeOffs, out.position() - startOffs);
     }
 
-    public static void unpackStr(ByteBuffer in, ByteBuffer out, long uLen, boolean tag) {
-        //TODO: this probably breaks for 3 and 4 byte UTF-8
+    public void unpackStr(ByteBuffer in, ByteBuffer out, long uLen, boolean tag) throws IOException {
         if (tag) {
             out.put(ATypeTag.SERIALIZED_STRING_TYPE_TAG);
         }
@@ -263,13 +281,20 @@
             throw new UnsupportedOperationException("String is too long");
         }
         int len = (int) uLen;
-        int strLen = UTF8StringUtil.getStringLength(in.array(), in.position() + in.arrayOffset(), len);
-        int adv = VarLenIntEncoderDecoder.encode(strLen, out.array(), out.position() + out.arrayOffset());
-        out.position(out.position() + adv);
-        System.arraycopy(in.array(), in.arrayOffset() + in.position(), out.array(), out.arrayOffset() + out.position(),
-                len);
-        out.position(out.position() + len);
-        in.position(in.position() + len);
+        stringCharBuffer.clear();
+        stringCharBuffer.position(0);
+        stringOut.reset();
+        decoder.reset();
+        int limit = in.limit();
+        in.limit(in.position() + len);
+        decoder.decode(in, stringCharBuffer, true);
+        decoder.flush(stringCharBuffer);
+        UTF8StringUtil.writeUTF8(stringCharBuffer.array(), 0, stringCharBuffer.position(), stringOut.getDataOutput(),
+                stringWriter);
+        System.arraycopy(stringOut.getByteArray(), stringOut.getStartOffset(), out.array(),
+                out.arrayOffset() + out.position(), stringOut.getLength());
+        out.position(out.position() + stringOut.getLength());
+        in.limit(limit);
     }
 
 }
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
index 39e480a..bf8823f 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
@@ -89,6 +89,7 @@
             private MessageUnpacker unpacker;
             private ArrayBufferInput unpackerInput;
             private List<Pair<ByteBuffer, Counter>> batchResults;
+            private MessageUnpackerToADM unpackerToADM;
 
             @Override
             public void open() throws HyracksDataException {
@@ -121,6 +122,7 @@
                 }
                 unpackerInput = new ArrayBufferInput(new byte[0]);
                 unpacker = MessagePack.newDefaultUnpacker(unpackerInput);
+                unpackerToADM = new MessageUnpackerToADM();
             }
 
             private void resetBuffers(int numTuples, int[] numCalls) {
@@ -211,8 +213,9 @@
                                 for (int colIdx = 0; colIdx < cols.length; colIdx++) {
                                     ref.set(buffer.array(), tRef.getFieldStart(cols[colIdx]),
                                             tRef.getFieldLength(cols[colIdx]));
-                                    PythonLibraryEvaluator.setArgument(fnDescs[func].getArgumentTypes()[colIdx], ref,
-                                            argHolders.get(func), fnDescs[func].getFunctionInfo().getNullCall());
+                                    libraryEvaluators.get(func).getSecond().setArgument(
+                                            fnDescs[func].getArgumentTypes()[colIdx], ref, argHolders.get(func),
+                                            fnDescs[func].getFunctionInfo().getNullCall());
                                 }
                             } else {
                                 numCalls[func]--;
@@ -266,7 +269,7 @@
                                 ATypeTag functionCalled = nullCalls[k][i];
                                 if (functionCalled == ATypeTag.TYPE) {
                                     if (result.getSecond().get() > 0) {
-                                        MessageUnpackerToADM.unpack(result.getFirst(), outputWrapper, true);
+                                        unpackerToADM.unpack(result.getFirst(), outputWrapper, true);
                                         result.getSecond().set(result.getSecond().get() - 1);
                                     } else {
                                         //emit NULL for functions which failed with a warning
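
For readers following the unpackStr change above, here is a minimal standalone sketch of why the old byte-for-byte copy could not work for every string. It uses only JDK classes, not the AsterixDB code: the class name ModifiedUtf8Sketch and its two helper methods are hypothetical, and DataOutputStream.writeUTF stands in for the project's UTF8StringUtil.writeUTF8, whose length prefix may differ. The idea is the same as in the patch: decode standard UTF-8 to chars first, then re-encode as Modified UTF-8, where a supplementary code point takes 6 bytes instead of 4 and U+0000 takes 2 bytes instead of 1.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of the AsterixDB patch.
final class ModifiedUtf8Sketch {

    // Decode a slice of standard UTF-8 bytes into UTF-16 chars.
    static CharBuffer decodeUtf8(byte[] utf8, int offset, int len) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(utf8, offset, len);
        // UTF-8 never yields more chars than input bytes, so len chars suffice.
        CharBuffer chars = CharBuffer.allocate(len);
        try {
            CoderResult result = decoder.decode(in, chars, true);
            if (!result.isUnderflow()) {
                result.throwException(); // malformed input or overflow
            }
            result = decoder.flush(chars);
            if (!result.isUnderflow()) {
                result.throwException();
            }
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not valid UTF-8", e);
        }
        chars.flip();
        return chars;
    }

    // Re-encode chars as Modified UTF-8. writeUTF prepends a 2-byte length,
    // which is dropped here so only the encoded payload is returned.
    static byte[] toModifiedUtf8(CharSequence s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] framed = bytes.toByteArray();
        return Arrays.copyOfRange(framed, 2, framed.length);
    }

    public static void main(String[] args) {
        // U+1F600, a supplementary code point written as a surrogate pair.
        byte[] utf8 = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        byte[] modified = toModifiedUtf8(decodeUtf8(utf8, 0, utf8.length));
        System.out.println("UTF-8 bytes: " + utf8.length);
        System.out.println("Modified UTF-8 bytes: " + modified.length);
    }
}
```

Unlike the patch, this sketch allocates a fresh decoder and buffer per call and treats anything other than underflow from decode as an error; the patch reuses its decoder and buffers across calls.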

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826
To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 1
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-MessageType: newchange

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
Anon. E. Moose #1000171 has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 7:

Analytics Compatibility Compilation Successful
https://cbjenkins.page.link/kU6tJtozZEozEVVj7 : SUCCESS


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 7
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Glenn Galvizo <gg...@uci.edu>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-Comment-Date: Fri, 06 Aug 2021 04:35:17 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
Anon. E. Moose #1000171 has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 7: Contrib-2

Analytics Compatibility Tests Failed
https://cbjenkins.page.link/qCdcKdFiFjgbVcWB6 : UNSTABLE


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 7
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Glenn Galvizo <gg...@uci.edu>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-Comment-Date: Fri, 06 Aug 2021 07:05:32 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
Anon. E. Moose #1000171 has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 4: Contrib-2

Analytics Compatibility Compilation Failed
https://cbjenkins.page.link/HQ2Ynpm8GKxgnQzL8 : UNSTABLE


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 4
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Comment-Date: Wed, 04 Aug 2021 17:10:27 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
Anon. E. Moose #1000171 has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 1: Contrib-2

Analytics Compatibility Tests Failed
https://cbjenkins.page.link/cdhtgYHNv87TTMhM8 : UNSTABLE


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 1
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Comment-Date: Fri, 02 Apr 2021 03:41:20 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Ian Maxon <im...@uci.edu>:

Ian Maxon has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 6: Contrib+1 Code-Review+1

doesn't make tests any worse


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 6
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Glenn Galvizo <gg...@uci.edu>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 02:52:00 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Ian Maxon <im...@uci.edu>:

Ian Maxon has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 7: Code-Review+1


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 7
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Glenn Galvizo <gg...@uci.edu>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-Comment-Date: Fri, 06 Aug 2021 04:29:22 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
Anon. E. Moose #1000171 has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 3:

Analytics Compatibility Compilation Successful
https://cbjenkins.page.link/dnZqqR9xLLrjpnXT6 : SUCCESS


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 3
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Comment-Date: Tue, 03 Aug 2021 19:57:06 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Jenkins <je...@fulliautomatix.ics.uci.edu>:

Jenkins has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 2: Integration-Tests-1

Integration Tests Failed

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/12309/ : FAILURE


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 2
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Comment-Date: Tue, 03 Aug 2021 17:52:45 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Ian Maxon <im...@uci.edu>:

Ian Maxon has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 4: Code-Review+1


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 4
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Glenn Galvizo <gg...@uci.edu>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 21:53:24 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Ian Maxon <im...@uci.edu>:

Hello Anon. E. Moose #1000171, Jenkins, 

I'd like you to reexamine a change. Please visit

    https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826

to look at the new patch set (#3).

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................

[ASTERIXDB-2849][RT] Fix Python UDF string serde

- user model changes: no
- storage format changes: no
- interface changes: no

Details:

- Avoid string creation on serialization to msgpack
- Properly handle UTF-8 to Modified UTF-8 conversion
  on msgpack to ADM deserialization
- Add test with some unicode combiners and ZWJ

Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
---
D asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.11.ddl.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
8 files changed, 120 insertions(+), 81 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/26/10826/3
-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 3
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-MessageType: newpatchset

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Ian Maxon <im...@uci.edu>:

Hello Anon. E. Moose #1000171, Jenkins, 

I'd like you to reexamine a change. Please visit

    https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826

to look at the new patch set (#2).

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................

[ASTERIXDB-2849][RT] Fix Python UDF string serde

- user model changes: no
- storage format changes: no
- interface changes: no

Details:

- Avoid string creation on serialization to msgpack
- Properly handle UTF-8 to Modified UTF-8 conversion
  on msgpack to ADM deserialization
- Add test with some unicode combiners and ZWJ

Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
---
D asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.11.ddl.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-library/python_open_type_validation/type_validation.2.ddl.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/results/external-library/mysentiment_twitter/mysentiment_twitter.13.adm
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/ExternalScalarPythonFunctionEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/PythonLibraryEvaluator.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessagePackerFromADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/library/msgpack/MessageUnpackerToADM.java
M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/operators/ExternalAssignBatchRuntimeFactory.java
8 files changed, 119 insertions(+), 80 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/26/10826/2
-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 2
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-MessageType: newpatchset

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
Anon. E. Moose #1000171 has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 4:

Analytics Compatibility Compilation Failed
https://cbjenkins.page.link/44m3ZzXwgTLuSUBR9 : UNSTABLE


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 4
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Glenn Galvizo <gg...@uci.edu>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 21:57:00 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
From Jenkins <je...@fulliautomatix.ics.uci.edu>:

Jenkins has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 6: Integration-Tests+1

Integration Tests Successful

https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-integration-tests/12326/ : SUCCESS


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 6
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Glenn Galvizo <gg...@uci.edu>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 00:32:11 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

Change in asterixdb[master]: [ASTERIXDB-2849][RT] Fix Python UDF string serde

Posted by AsterixDB Code Review <do...@asterix-gerrit.ics.uci.edu>.
Anon. E. Moose #1000171 has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/10826 )

Change subject: [ASTERIXDB-2849][RT] Fix Python UDF string serde
......................................................................


Patch Set 6: Contrib-2

Analytics Compatibility Tests Failed
https://cbjenkins.page.link/6C3xrYHAdH5yNVhV9 : UNSTABLE


-- 

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I5af4da999985afcc33cdfacea79576f1d6109174
Gerrit-Change-Number: 10826
Gerrit-PatchSet: 6
Gerrit-Owner: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Dmitry Lychagin <dm...@couchbase.com>
Gerrit-Reviewer: Glenn Galvizo <gg...@uci.edu>
Gerrit-Reviewer: Hussain Towaileb <hu...@gmail.com>
Gerrit-Reviewer: Ian Maxon <im...@uci.edu>
Gerrit-Reviewer: Jenkins <je...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Murtadha Hubail <mh...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Wael Alkowaileet <wa...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 01:59:05 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment