Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/04/06 09:54:48 UTC

[GitHub] [incubator-iceberg] massdosage commented on a change in pull request #843: InputFormat support for Iceberg

URL: https://github.com/apache/incubator-iceberg/pull/843#discussion_r403967401
 
 

 ##########
 File path: mr/src/test/java/org/apache/iceberg/mr/TestIcebergInputFormat.java
 ##########
 @@ -0,0 +1,242 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.mr;
+
+import com.google.common.collect.FluentIterable;
+import com.google.common.collect.ImmutableMap;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Locale;
+import java.util.function.Function;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.TaskAttemptID;
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Files;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.TestHelpers.Row;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.data.RandomGenericData;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.data.avro.DataWriter;
+import org.apache.iceberg.data.parquet.GenericParquetWriter;
+import org.apache.iceberg.hadoop.HadoopCatalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.io.FileAppender;
+import org.apache.iceberg.parquet.Parquet;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+
+@RunWith(Parameterized.class)
+public class TestIcebergInputFormat {
 
 Review comment:
   We've had similar issues in our branch, where we are trying to get the Hive InputFormat to work. Hive 2.3.6 requires Guava 11.0.2; if a newer Guava version is on the classpath, Hive is unable to use the InputFormat and fails with exceptions similar to the one above. So we have to remove Guava as an exposed dependency from all Iceberg artifacts that appear on the Hive classpath. The only way we've managed to get it to work is by doing the following:
   
   * Alter every Iceberg module that uses Guava to shade and relocate it (which IMHO is a good thing to do anyway, so that external users of Iceberg can use their own versions of Guava).
   * Depend on the shaded version of these modules from iceberg-mr.
   * Remove Guava from `versions.props` so that different subprojects can depend on different versions of it.
   * The Guava version that iceberg-mr then uses is the transitive one from Hive (in this case Hive 2.3.6, which brings in Guava 11.0.2).
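
To check which Guava a given classpath actually resolves after steps like these, a small diagnostic along the following lines may help. This is only a sketch and is not part of the PR or the linked branch: the class name `GuavaOrigin`, the helper `originOf`, and the relocated package prefix `org.apache.iceberg.shaded` are all assumptions made for illustration, and the real prefix depends on how the shading is configured.

```java
import java.security.CodeSource;

public class GuavaOrigin {
  // Returns the location (typically a jar URL) a class was loaded from,
  // or a short note when the class is missing or has no code source.
  public static String originOf(String className) {
    try {
      // 'false' avoids running static initializers; we only need the Class object.
      Class<?> cls = Class.forName(className, false, GuavaOrigin.class.getClassLoader());
      CodeSource source = cls.getProtectionDomain().getCodeSource();
      // Classes from the bootstrap loader (e.g. java.lang.String) have no CodeSource.
      return source == null ? "bootstrap/unknown" : String.valueOf(source.getLocation());
    } catch (ClassNotFoundException e) {
      return "not on classpath";
    }
  }

  public static void main(String[] args) {
    // Unshaded Guava, i.e. the copy Hive itself would see.
    System.out.println(originOf("com.google.common.collect.ImmutableMap"));
    // Hypothetical relocated name; substitute the prefix your shading step uses.
    System.out.println(
        originOf("org.apache.iceberg.shaded.com.google.common.collect.ImmutableMap"));
  }
}
```

Run with the same classpath Hive uses. If the first line points at a Guava jar other than the one Hive ships, an unshaded Guava is still leaking in from some artifact.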
   
   You can see these changes here: https://github.com/ExpediaGroup/incubator-iceberg/blob/078a06ddd78d08648127d8b2e8dc41e0febf7f49/build.gradle. We're not Gradle experts, so hopefully there is an easier way to do all of this, but I think the general steps outlined above will still be required.
   
   Ultimately, I think this issue will have to be solved for both InputFormats, so any changes that allow different versions of Guava to be used would be great.
