You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/21 17:53:00 UTC
[jira] [Work logged] (HIVE-22944) Upgrade to Kryo5

     [ https://issues.apache.org/jira/browse/HIVE-22944?focusedWorklogId=539229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-539229 ]

ASF GitHub Bot logged work on HIVE-22944:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jan/21 17:52
            Start Date: 21/Jan/21 17:52
    Worklog Time Spent: 10m 
      Work Description: pgaref commented on a change in pull request #1798:
URL: https://github.com/apache/hive/pull/1798#discussion_r562072942



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java
##########
@@ -2855,16 +2836,26 @@ public void TestBigintSarg() throws Exception {
   }
 
   @Test
-  public void TestBooleanSarg() throws Exception {
-    String serialAst =
-        "AQEAamF2YS51dGlsLkFycmF5TGlz9AECAQFvcmcuYXBhY2hlLmhhZG9vcC5oaXZlLnFsLnBsYW4uRXh" +
-            "wck5vZGVHZW5lcmljRnVuY0Rlc+MBAQABAgECb3JnLmFwYWNoZS5oYWRvb3AuaGl2ZS5xbC5wbGFuLk" +
-            "V4cHJOb2RlQ29sdW1uRGVz4wEBYrEAAAFib29sb3LjAQNvcmcuYXBhY2hlLmhhZG9vcC5oaXZlLnNlc" +
-            "mRlMi50eXBlaW5mby5QcmltaXRpdmVUeXBlSW5m7wEBYm9vbGVh7gEEb3JnLmFwYWNoZS5oYWRvb3Au" +
-            "aGl2ZS5xbC5wbGFuLkV4cHJOb2RlQ29uc3RhbnREZXPjAQEDCQUBAQVvcmcuYXBhY2hlLmhhZG9vcC5" +
-            "oaXZlLnFsLnVkZi5nZW5lcmljLkdlbmVyaWNVREZPUEVxdWHsAQAAAYI9AUVRVUHMAQZvcmcuYXBhY2" +
-            "hlLmhhZG9vcC5pby5Cb29sZWFuV3JpdGFibOUBAAABAwkBAgEBYrIAAAgBAwkBB29yZy5hcGFjaGUua" +
-            "GFkb29wLmhpdmUucWwudWRmLmdlbmVyaWMuR2VuZXJpY1VERk9QQW7kAQEGAQAAAQMJ";
+  public void testBooleanSarg() throws Exception {
+    ExprNodeDesc column1 =

Review comment:
       nit: maybe move Sarg  Expr creation to separate method for consistency?

##########
File path: pom.xml
##########
@@ -170,7 +170,9 @@
     <junit.jupiter.version>5.6.2</junit.jupiter.version>
     <junit.vintage.version>5.6.2</junit.vintage.version>
     <kafka.version>2.5.0</kafka.version>
-    <kryo.version>4.0.2</kryo.version>
+    <kryo.version>5.0.3</kryo.version>
+    <kryo4.version>4.0.2</kryo4.version> <!-- old kryo, for spark-client -->

Review comment:
       Add a comment linking to Spark removal JIRA as a TODO?

##########
File path: itests/hive-jmh/src/main/java/org/apache/hive/benchmark/ql/exec/KryoBench.java
##########
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hive.benchmark.ql.exec;
+
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.exec.SerializationUtilities;
+import org.apache.hadoop.hive.ql.exec.Utilities;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
+import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
+import org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.BigRowInspector;
+import org.apache.hadoop.hive.ql.plan.MapWork;
+import org.apache.hadoop.hive.ql.plan.PartitionDesc;
+import org.apache.hadoop.hive.ql.plan.TableDesc;
+import org.apache.hadoop.hive.ql.plan.VectorPartitionDesc;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructField;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.mapred.JobConf;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.RunnerException;
+import org.openjdk.jmh.runner.options.Options;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+
+import com.google.common.io.ByteStreams;
+
+@State(Scope.Benchmark)
+public class KryoBench {
+
+  @BenchmarkMode(Mode.AverageTime)
+  @Fork(1)
+  @State(Scope.Thread)
+  @OutputTimeUnit(TimeUnit.MILLISECONDS)
+  public static class BaseBench {
+
+    private MapWork mapWork;
+
+    @Setup
+    public void setup() throws Exception {
+      mapWork = KryoBench.mockMapWork("my_table", 1000, new BigRowInspector());
+    }
+
+    @Benchmark
+    @Warmup(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
+    @Measurement(iterations = 20, time = 2, timeUnit = TimeUnit.SECONDS)
+    public void testSerializeMapWork() {
+      SerializationUtilities.serializePlan(mapWork, ByteStreams.nullOutputStream());
+    }
+  }
+
+  public static void main(String[] args) throws RunnerException {

Review comment:
       This is cool! Shall we add a more complex Map for bench here and a perf number for reference?
   Maybe previous kryo4 vs kryo5?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
##########
@@ -224,51 +224,57 @@ public Registration readClass(Input input) {
 
   private static final Object FAKE_REFERENCE = new Object();
 
-  private static KryoFactory factory = new KryoFactory() {
-    @Override
-    public Kryo create() {
-      KryoWithHooks kryo = new KryoWithHooks();
-      kryo.register(java.sql.Date.class, new SqlDateSerializer());
-      kryo.register(java.sql.Timestamp.class, new TimestampSerializer());
-      kryo.register(TimestampTZ.class, new TimestampTZSerializer());
-      kryo.register(Path.class, new PathSerializer());
-      kryo.register(Arrays.asList("").getClass(), new ArraysAsListSerializer());
-      kryo.register(new java.util.ArrayList().subList(0,0).getClass(), new ArrayListSubListSerializer());
-      kryo.register(CopyOnFirstWriteProperties.class, new CopyOnFirstWritePropertiesSerializer());
-      kryo.register(PartitionDesc.class, new PartitionDescSerializer(kryo, PartitionDesc.class));
-
-      ((Kryo.DefaultInstantiatorStrategy) kryo.getInstantiatorStrategy())
-          .setFallbackInstantiatorStrategy(
-              new StdInstantiatorStrategy());
-      removeField(kryo, AbstractOperatorDesc.class, "colExprMap");
-      removeField(kryo, AbstractOperatorDesc.class, "statistics");
-      kryo.register(ReduceWork.class);
-      kryo.register(TableDesc.class);
-      kryo.register(UnionOperator.class);
-      kryo.register(FileSinkOperator.class);
-      kryo.register(VectorFileSinkOperator.class);
-      kryo.register(HiveIgnoreKeyTextOutputFormat.class);
-      kryo.register(StandardConstantListObjectInspector.class);
-      kryo.register(StandardConstantMapObjectInspector.class);
-      kryo.register(StandardConstantStructObjectInspector.class);
-      kryo.register(SequenceFileInputFormat.class);
-      kryo.register(RCFileInputFormat.class);
-      kryo.register(HiveSequenceFileOutputFormat.class);
-      kryo.register(LlapOutputFormat.class);
-      kryo.register(SparkEdgeProperty.class);
-      kryo.register(SparkWork.class);
-      kryo.register(Pair.class);
-      kryo.register(MemoryMonitorInfo.class);
-
-      // This must be called after all the explicit register calls.
-      return kryo.processHooks(kryoTypeHooks, globalHook);
-    }
-  };
-
   // Bounded queue could be specified here but that will lead to blocking.
   // ConcurrentLinkedQueue is unbounded and will release soft referenced kryo instances under
   // memory pressure.
-  private static KryoPool kryoPool = new KryoPool.Builder(factory).softReferences().build();
+  private static Pool<Kryo> kryoPool = new Pool<Kryo>(true, false, 8) {
+    protected Kryo create() {
+      return createNewKryo();
+    }
+  };
+
+  public static Kryo createNewKryo() {
+    KryoWithHooks kryo = new KryoWithHooks();
+
+    kryo.setReferences(true);
+    kryo.setCopyReferences(true);
+    kryo.setRegistrationRequired(false);

Review comment:
       Maybe add a comment why the above 3 lines are needed? 

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
##########
@@ -278,7 +284,7 @@ public Kryo create() {
    * @return kryo instance
    */
   public static Kryo borrowKryo() {
-    Kryo kryo = kryoPool.borrow();
+    Kryo kryo = kryoPool.obtain();

Review comment:
       Maybe rename the method itself here as well?  something like obtainKryo() ?

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSparkInvalidFileFormat.java
##########
@@ -30,11 +30,13 @@
 import org.apache.hadoop.hive.ql.session.SessionState;
 
 import org.junit.Assert;
+import org.junit.Ignore;
 import org.junit.Test;
 
 import java.io.File;
 import java.io.IOException;
 
+@Ignore("HIVE-22944: Kryo 5 upgrade conflicts with Spark, which is not supported anymore")

Review comment:
       Does this mean this is a breaking change for Spark?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 539229)
    Time Spent: 20m  (was: 10m)

> Upgrade to Kryo5
> ----------------
>
>                 Key: HIVE-22944
>                 URL: https://issues.apache.org/jira/browse/HIVE-22944
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22944.01.patch, kryo4_vs_5_benchmark.log
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Maybe we should consider upgrading to kryo5 (plan ser/deser). Not sure about performance benefits, but looking at the code, e.g. FieldSerializer in Kryo5 seems to let us extend it easier (less private fields), which could be a benefit if we want to change its behavior, e.g. defining different logic for different fields of an object.
> Kryo 4 FieldSerializer: https://github.com/EsotericSoftware/kryo/blob/kryo-4/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> Kryo 5 FieldSerialier: https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/serializers/FieldSerializer.java
> UPDATE: currently we are at kryo 5.0.3
> TODO: why kryo-shaded artifact has been used so far?
> https://javalibs.com/artifact/com.esotericsoftware/kryo-shaded
> "This contains the shaded reflectasm jar to prevent conflicts with other versions of asm."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)