Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/24 01:38:17 UTC

[GitHub] [hudi] cdmikechen edited a comment on issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

cdmikechen edited a comment on issue #2005:
URL: https://github.com/apache/hudi/issues/2005#issuecomment-678860501


   > @cdmikechen : Also, if you look at integration tests ITTestHoodieDemo, we cover the tests with hive syncing and this test has been passing for us. Can you take a look at the tests to see what the difference is ?
   
   @bvaradar I checked the `hudi-integ-test` package and found the reason:
   The `hudi-integ-test` module, which contains `ITTestHoodieDemo`, declares `hive-exec-2.3.1` in its pom.xml dependencies. So when we instantiate a `MapredParquetInputFormat`, Hudi resolves the class from `hive-exec-2.3.1`:
   ```java 
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.io.NullWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.InputSplit;
   import org.apache.hadoop.mapred.JobConf;
   import org.apache.hadoop.mapred.RecordReader;
   import org.apache.hadoop.mapred.Reporter;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   import org.apache.parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat<NullWritable, ArrayWritable> implements VectorizedInputFormatInterface {
   ```
   But if we use a standalone Spark environment without the hive-2.3.1 dependencies (e.g. starting a new project that depends only on the Spark libraries), Hudi resolves the class from `hive-exec-1.2.1-spark` instead:
   ```java
   package org.apache.hadoop.hive.ql.io.parquet;
   
   import java.io.IOException;
   import org.apache.commons.logging.Log;
   import org.apache.commons.logging.LogFactory;
   import org.apache.hadoop.hive.ql.exec.Utilities;
   import org.apache.hadoop.hive.ql.exec.vector.VectorizedInputFormatInterface;
   import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;
   import org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper;
   import org.apache.hadoop.io.ArrayWritable;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.RecordReader;
   
   import parquet.hadoop.ParquetInputFormat;
   
   public class MapredParquetInputFormat extends FileInputFormat<Void, ArrayWritable> {
   ```
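   To see which of the two versions is actually on the classpath at runtime, one option is to ask the JVM where it loaded the class from. The sketch below is a hypothetical diagnostic (the `ClasspathCheck` name is illustrative, not part of Hudi); in a real Hudi/Spark setup you would pass the fully qualified name `org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat` to `Class.forName` and inspect the reported jar path:
   ```java
   import java.security.CodeSource;
   import java.security.ProtectionDomain;

   // Hypothetical diagnostic: report which jar (or directory) a class was loaded from.
   public class ClasspathCheck {
       static String locate(Class<?> cls) {
           ProtectionDomain pd = cls.getProtectionDomain();
           CodeSource src = pd.getCodeSource();
           // Classes loaded by the bootstrap classloader have no CodeSource.
           return src == null ? "bootstrap classloader" : src.getLocation().toString();
       }

       public static void main(String[] args) throws Exception {
           // In a Hudi/Spark environment, swap in:
           // locate(Class.forName("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"))
           // The printed path reveals whether hive-exec-2.3.1 or hive-exec-1.2.1-spark won.
           System.out.println(locate(ClasspathCheck.class));
       }
   }
   ```
   The jar name in the printed location tells you immediately which `hive-exec` artifact Maven's nearest-wins dependency mediation selected.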
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org