Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/28 01:33:07 UTC

[GitHub] [incubator-hudi] tieke1121 commented on issue #1568: [SUPPORT] java.lang.reflect.InvocationTargetException when upsert

tieke1121 commented on issue #1568:
URL: https://github.com/apache/incubator-hudi/issues/1568#issuecomment-620324107


   I've set it up like this (imports are shown here for completeness; the package
   names follow recent Hudi releases and may need adjusting for your version):

       import org.apache.hudi.DataSourceWriteOptions
       import org.apache.hudi.config.HoodieWriteConfig
       import org.apache.hudi.keygen.NonpartitionedKeyGenerator
       import org.apache.hudi.hive.NonPartitionedExtractor
       import org.apache.spark.sql.streaming.OutputMode

       dataFrame.writeStream
         .format("org.apache.hudi")
         .option("path", conf.getString("hudi.basePath"))
         .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, conf.getString("hudi.recordkey"))
         .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, conf.getString("hudi.precombineKey"))
         .option(HoodieWriteConfig.TABLE_NAME, conf.getString("hudi.tableName"))
         .option("checkpointLocation", conf.getString("hudi.checkpoinPath"))
         .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, conf.getString("hive.database"))
         .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, conf.getString("hive.table"))
         .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, conf.getString("hive.url"))
         .option(DataSourceWriteOptions.HIVE_USER_OPT_KEY, conf.getString("hive.username"))
         .option(DataSourceWriteOptions.HIVE_PASS_OPT_KEY, conf.getString("hive.password"))
         .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
         .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, classOf[NonpartitionedKeyGenerator].getCanonicalName)
         .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, classOf[NonPartitionedExtractor].getCanonicalName)
         .outputMode(OutputMode.Append())
         .start()
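
   For context (a hedged note, not part of the original report): the listing below
   shows a commit landing roughly every ten seconds, and Hudi's cleaner retains only
   the last 10 commits by default, so older parquet versions can be deleted while a
   long-running Hive MR job still holds splits that reference them. A minimal sketch
   of raising retention on the same writer, assuming the standard
   hoodie.cleaner.commits.retained key:

       // Sketch only: keep more file versions around so that splits planned by
       // an in-flight Hive query do not point at already-cleaned files.
       // The default retention is 10 commits; 60 is an illustrative value.
       dataFrame.writeStream
         .format("org.apache.hudi")
         .option("hoodie.cleaner.commits.retained", "60")
         // ... all other options exactly as in the snippet above ...
         .outputMode(OutputMode.Append())
         .start()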
   
   And the HDFS path contains:
   -rw-r--r--   3 root supergroup     737196 2020-04-28 01:16 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-1068-281898_20200428011554.parquet
   -rw-r--r--   3 root supergroup     745158 2020-04-28 01:16 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-1120-295461_20200428011603.parquet
   -rw-r--r--   3 root supergroup     750006 2020-04-28 01:16 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-1168-309014_20200428011613.parquet
   -rw-r--r--   3 root supergroup     755947 2020-04-28 01:16 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-1217-322579_20200428011624.parquet
   -rw-r--r--   3 root supergroup     765879 2020-04-28 01:16 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-1267-336149_20200428011634.parquet
   -rw-r--r--   3 root supergroup     690225 2020-04-28 01:14 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-770-200500_20200428011449.parquet
   -rw-r--r--   3 root supergroup     698213 2020-04-28 01:15 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-819-214064_20200428011500.parquet
   -rw-r--r--   3 root supergroup     705870 2020-04-28 01:15 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-870-227637_20200428011511.parquet
   -rw-r--r--   3 root supergroup     713830 2020-04-28 01:15 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-918-241200_20200428011521.parquet
   -rw-r--r--   3 root supergroup     720687 2020-04-28 01:15 /wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-967-254767_20200428011532.parquet
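
   Note that every file above shares the file ID b33868cc-6609-47a3-8e93-bdd248deb21e-0
   and differs only in the write token and the trailing commit time, so these look like
   ten versions of the same file slice rather than ten distinct files; such older
   versions are exactly what the cleaner prunes. A small sketch for grouping versions
   by file ID with the plain Hadoop API (only the path from the listing above is
   assumed):

       import org.apache.hadoop.conf.Configuration
       import org.apache.hadoop.fs.{FileSystem, Path}

       // Hudi data file names follow <fileId>_<writeToken>_<commitTime>.parquet,
       // so grouping on the first underscore-separated token groups by file ID.
       val fs = FileSystem.get(new Configuration())
       fs.listStatus(new Path("/wap-olap/data/device/status/data_1"))
         .map(_.getPath.getName)
         .filter(_.endsWith(".parquet"))
         .groupBy(_.split("_").head)
         .foreach { case (fileId, versions) =>
           println(s"$fileId -> ${versions.length} version(s)")
         }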
   
   When I run a simple Hive query: select deviceid from device_status_hudi_1;
   the query succeeds.
   
   But when I run a more complex Hive query that launches a MapReduce job: select deviceid from device_status_hudi_1 group by deviceid having count(deviceid)>1;

   it fails with:

   Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

   The failed map task's log shows:
   2020-04-28 01:29:16,698 INFO [main] org.apache.hadoop.hive.conf.HiveConf: Found configuration file null
   2020-04-28 01:29:16,880 INFO [main] org.apache.hadoop.hive.ql.exec.SerializationUtilities: Deserializing MapWork using kryo
   2020-04-28 01:29:17,045 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
   	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
   	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
   	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
   	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
   	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
   	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:702)
   	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
   	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
   	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
   	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
   Caused by: java.lang.reflect.InvocationTargetException
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
   	... 11 more
   Caused by: java.io.FileNotFoundException: File does not exist: hdfs://sap-namenode1:8020/wap-olap/data/device/status/data_1/b33868cc-6609-47a3-8e93-bdd248deb21e-0_0-4288-1150066_20200428012704.parquet
   	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1500)
   	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493)
   	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508)
   	at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39)
   	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:413)
   	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:400)
   	at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:79)
   	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:78)
   	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:63)
   	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:297)
   	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
   	... 16 more
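
   A hedged reading of the timestamps: the listing above was taken around 01:16, the
   missing file's commit time is 01:27:04, and the task failed at 01:29, so the file
   was written after that listing and was apparently already gone two minutes later.
   With a commit every ~10 seconds, a default retention of 10 commits covers only
   about 100 seconds of history, which would fit. One way to confirm, as a sketch, is
   to list the commit and clean instants on the Hudi timeline (the .hoodie folder
   location is assumed from the base path above):

       import org.apache.hadoop.conf.Configuration
       import org.apache.hadoop.fs.{FileSystem, Path}

       // Sketch: print the timeline. A *.clean instant near 20200428012704
       // would indicate the cleaner deleted the file the map task needed.
       val fs = FileSystem.get(new Configuration())
       fs.listStatus(new Path("/wap-olap/data/device/status/data_1/.hoodie"))
         .map(_.getPath.getName)
         .filter(n => n.endsWith(".commit") || n.endsWith(".clean"))
         .sorted
         .foreach(println)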

