Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/12 18:55:03 UTC

[GitHub] [hudi] tooptoop4 opened a new issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

tooptoop4 opened a new issue #1730:
URL: https://github.com/apache/hudi/issues/1730


   ```
   using hoodie 0.4.6 and spark 2.3.4
   
    run the following in hiveserver2 (v2.3.4):
   
   CREATE EXTERNAL TABLE `someschema.mytbl`(
   col1 string,
   col2 string,
   col3 string)
   PARTITIONED BY ( 
     `mydate` string)
   ROW FORMAT SERDE 
     'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
   STORED AS INPUTFORMAT 
     'com.uber.hoodie.hadoop.HoodieInputFormat' 
   OUTPUTFORMAT 
     'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION
     's3a://redact/M5/table/mytbl'
     
      # use Spark to create COW Hudi parquet files under s3://redact/M5/table/mytbl/2016/11/07/ and s3://redact/M/table/mytbl/2019/12/01/ (note the second path sits outside the table's base path)
     
      run the following in hiveserver2:
     ALTER TABLE someschema.mytbl ADD IF NOT EXISTS PARTITION(mydate='2016-11-07')
   LOCATION 's3a://redact/M5/table/mytbl/2016/11/07/'
   ALTER TABLE someschema.mytbl ADD IF NOT EXISTS PARTITION(mydate='2019-12-01')
   LOCATION 's3a://redact/M/table/mytbl/2019/12/01/'
     
     
      the Hive metastore shows the 2 rows below:
     
     select TBLS.TBL_NAME,PARTITIONS.PART_NAME,SDS.LOCATION
   from SDS,TBLS,PARTITIONS
   where PARTITIONS.SD_ID = SDS.SD_ID
   and TBLS.TBL_ID=PARTITIONS.TBL_ID
   and TBLS.TBL_NAME = 'mytbl'
   order by 1,2;
   
   
   mytbl	mydate=2016-11-07	s3a://redact/M5/table/mytbl/2016/11/07
   mytbl	mydate=2019-12-01	s3a://redact/M/table/mytbl/2019/12/01
   
   
   
   
   
   query1:
   select count(1) from someschema.mytbl where datestr = '2016-11-07'
   
    runs fine from both hiveserver2 and Presto
   
   query2:
   select count(1) from someschema.mytbl where datestr = '2019-12-01'
   
    Presto gives an unhelpful error:
   
   io.prestosql.spi.PrestoException: HIVE_UNKNOWN_ERROR
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:223)
   	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
   	at io.prestosql.$gen.Presto_ff748c3_dirty____20200610_171635_2.run(Unknown Source)
   	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: java.lang.ArrayIndexOutOfBoundsException: undefined
   
   
    hiveserver2 gives a more verbose, yet still not very helpful, error:
   2020-06-12T18:22:23,375  WARN [HiveServer2-Handler-Pool: Thread-12109] thrift.ThriftCLIService: Error fetching results:
   org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 2
           at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:499) ~[hive-service-2.3.4.jar:2.3.4]
           at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:307) ~[hive-service-2.3.4.jar:2.3.4]
           at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:878) ~[hive-service-2.3.4.jar:2.3.4]
           at sun.reflect.GeneratedMethodAccessor135.invoke(Unknown Source) ~[?:?]
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
           at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
           at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-2.3.4.jar:2.3.4]
           at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.4.jar:2.3.4]
           at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.4.jar:2.3.4]
           at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_252]
           at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_252]
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) ~[hadoop-common-2.8.5.jar:?]
           at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.4.jar:2.3.4]
           at com.sun.proxy.$Proxy42.fetchResults(Unknown Source) ~[?:?]
           at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:559) ~[hive-service-2.3.4.jar:2.3.4]
           at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:751) ~[hive-service-2.3.4.jar:2.3.4]
           at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-2.3.4.jar:2.3.4]
           at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.4.jar:2.3.4]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
           at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
   Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 2
           at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494) ~[hive-service-2.3.4.jar:2.3.4]
           ... 24 more
   Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
           at com.uber.hoodie.common.util.FSUtils.getCommitTime(FSUtils.java:120) ~[hoodiebundle.jar:?]
           at com.uber.hoodie.common.model.HoodieDataFile.getCommitTime(HoodieDataFile.java:37) ~[hoodiebundle.jar:?]
           at com.uber.hoodie.common.model.HoodieFileGroup.addDataFile(HoodieFileGroup.java:89) ~[hoodiebundle.jar:?]
           at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.lambda$null$3(HoodieTableFileSystemView.java:155) ~[hoodiebundle.jar:?]
           at java.util.ArrayList.forEach(ArrayList.java:1257) ~[?:1.8.0_252]
           at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.lambda$addFilesToView$5(HoodieTableFileSystemView.java:155) ~[hoodiebundle.jar:?]
           at java.lang.Iterable.forEach(Iterable.java:75) ~[?:1.8.0_252]
           at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.addFilesToView(HoodieTableFileSystemView.java:151) ~[hoodiebundle.jar:?]
           at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:107) ~[hoodiebundle.jar:?]
           at com.uber.hoodie.hadoop.HoodieInputFormat.listStatus(HoodieInputFormat.java:88) ~[hoodiebundle.jar:?]
           at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322) ~[hadoop-mapreduce-client-core-2.8.5.jar:?]
           at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208) ~[hive-exec-2.3.4.jar:2.3.4]
           at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494) ~[hive-service-2.3.4.jar:2.3.4]
           ... 24 more
   
   ```
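   For the "use Spark to create COW Hudi parquet" step in the report above, here is a minimal sketch of what a write through the Hudi Spark datasource looks like (shown against the org.apache.hudi releases discussed later in the thread; the table name, columns, and paths are illustrative, not taken from the report). Files written this way stay under a single base path and get Hudi-style names that carry the commit time:

   ```
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;
   import org.apache.spark.sql.SparkSession;

   public class HudiCowWriteSketch {
       public static void main(String[] args) {
           // Run via spark-submit with the hudi-spark bundle on the classpath.
           SparkSession spark = SparkSession.builder().appName("hudi-cow-write-sketch").getOrCreate();

           // Any DataFrame with a record key, a precombine field and a partition column will do.
           Dataset<Row> df = spark.sql(
               "SELECT 'k1' AS uuid, current_timestamp() AS ts, 'v1' AS col1, '2019/12/01' AS mydate");

           df.write()
             .format("org.apache.hudi")                                  // earlier releases used com.uber.hoodie
             .option("hoodie.table.name", "mytbl")
             .option("hoodie.datasource.write.recordkey.field", "uuid")
             .option("hoodie.datasource.write.precombine.field", "ts")
             .option("hoodie.datasource.write.partitionpath.field", "mydate")
             // COPY_ON_WRITE is the default table type, so it is not set explicitly here.
             .mode(SaveMode.Append)
             .save("s3a://redact/M5/table/mytbl");                       // everything lands under one base path
           spark.stop();
       }
   }
   ```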


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] tooptoop4 edited a comment on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100


   PrestoSQL 336 with Hudi 0.5.3 gives a better error:
   
   ```
   io.prestosql.spi.PrestoException: Index 2 out of bounds for length 1
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:234)
   	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
   	at io.prestosql.$gen.Presto_1c5b75e_dirty____20200705_204556_2.run(Unknown Source)
   	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
   	at org.apache.hudi.common.util.FSUtils.getCommitTime(FSUtils.java:137)
   	at org.apache.hudi.common.model.HoodieBaseFile.getCommitTime(HoodieBaseFile.java:55)
   	at org.apache.hudi.common.model.HoodieFileGroup.addBaseFile(HoodieFileGroup.java:86)
   	at java.base/java.util.ArrayList.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$buildFileGroups$4(AbstractTableFileSystemView.java:161)
   	at java.base/java.lang.Iterable.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:157)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:135)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:115)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:120)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:239)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
   	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:428)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:298)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:227)
   	... 6 more
   ```
   
   After putting a log statement for fullFileName, I see the value is part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet, while for a table that can be queried, fullFileName is 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet.
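   The stack trace bottoms out in FSUtils.getCommitTime, which derives the commit time from the base file name. A simplified reconstruction of that parse (an illustration, not the exact Hudi code) shows why the plain Spark name above fails at index 2 while the Hudi-written name works:

   ```
   public class CommitTimeParseSketch {
       // Hudi base files are named <fileId>_<writeToken>_<commitTime>.parquet, so
       // splitting on "_" and reading token [2] yields the commit time.
       static String getCommitTime(String fullFileName) {
           String token = fullFileName.split("_")[2];   // index 2 is out of bounds for a plain Spark part file
           return token.split("\\.")[0];
       }

       public static void main(String[] args) {
           // Works: prints 20200324151845
           System.out.println(getCommitTime("4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet"));
           // Throws ArrayIndexOutOfBoundsException: split("_") yields a single token for this name
           System.out.println(getCommitTime("part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet"));
       }
   }
   ```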





[GitHub] [hudi] tooptoop4 commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-644468948


   What I posted is from the server log. I will upgrade in some time.





[GitHub] [hudi] bhasudha commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-643477146


   @tooptoop4 what version of Presto are you using? You might want to set the logging level to INFO/WARN for Hoodie in Presto.





[GitHub] [hudi] tooptoop4 edited a comment on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100


   PrestoSQL 336 with Hudi 0.5.3 gives a better error:
   
   ```
   io.prestosql.spi.PrestoException: Index 2 out of bounds for length 1
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:234)
   	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
   	at io.prestosql.$gen.Presto_1c5b75e_dirty____20200705_204556_2.run(Unknown Source)
   	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
   	at org.apache.hudi.common.util.FSUtils.getCommitTime(FSUtils.java:137)
   	at org.apache.hudi.common.model.HoodieBaseFile.getCommitTime(HoodieBaseFile.java:55)
   	at org.apache.hudi.common.model.HoodieFileGroup.addBaseFile(HoodieFileGroup.java:86)
   	at java.base/java.util.ArrayList.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$buildFileGroups$4(AbstractTableFileSystemView.java:161)
   	at java.base/java.lang.Iterable.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:157)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:135)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:115)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:120)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:239)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
   	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:428)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:298)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:227)
   	... 6 more
   ```
   
   After putting a log statement for fullFileName, I see the value is part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet, while for a table that can be queried, fullFileName is 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet.



   S3 listing under the partition folder of the table that works (there is a .hoodie/ folder under the base table path):
   2020-03-24 15:18:55         93 .hoodie_partition_metadata
   2020-03-24 15:18:57    2194374 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet


   S3 listing under the partition folder of the table that gets the error (there is a .hoodie/ folder under the base table path):
   2020-03-24 15:18:44          0 _SUCCESS
   2020-03-24 15:18:37   10649992 part-00000-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:38    8787785 part-00001-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:39    9562198 part-00002-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:40    9359329 part-00003-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:41   10519118 part-00004-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:42   10452807 part-00005-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:42    9104366 part-00006-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:43    9016423 part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   
   
   **UPDATE**
   This is a really old table, and it got corrupted along the way. After removing the .hoodie/ folder, select works OK.
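   As for the "unhelpful error message" this issue asks about, one purely illustrative way the parse could fail more descriptively when it meets a non-Hudi parquet file (a sketch, not the actual fix in any Hudi release):

   ```
   public class FriendlyCommitTimeParse {
       static String getCommitTime(String fullFileName) {
           String[] tokens = fullFileName.split("_");
           if (tokens.length < 3) {
               // Name the offending file instead of surfacing a bare ArrayIndexOutOfBoundsException.
               throw new IllegalArgumentException("File " + fullFileName
                   + " does not follow the Hudi base file naming convention"
                   + " <fileId>_<writeToken>_<commitTime>.parquet; was it written outside Hudi?");
           }
           return tokens[2].split("\\.")[0];
       }

       public static void main(String[] args) {
           try {
               getCommitTime("part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet");
           } catch (IllegalArgumentException e) {
               System.out.println(e.getMessage());
           }
       }
   }
   ```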





[GitHub] [hudi] vinothchandar commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-653263476


   All this code is very different now. Are you facing similar issues with org.apache.hudi?





[GitHub] [hudi] tooptoop4 commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-643592629


   Presto 334; the log level is already INFO.





[GitHub] [hudi] tooptoop4 edited a comment on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100


   PrestoSQL 336 with Hudi 0.5.3 gives a better error:
   
   ```
   io.prestosql.spi.PrestoException: Index 2 out of bounds for length 1
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:234)
   	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
   	at io.prestosql.$gen.Presto_1c5b75e_dirty____20200705_204556_2.run(Unknown Source)
   	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
   	at org.apache.hudi.common.util.FSUtils.getCommitTime(FSUtils.java:137)
   	at org.apache.hudi.common.model.HoodieBaseFile.getCommitTime(HoodieBaseFile.java:55)
   	at org.apache.hudi.common.model.HoodieFileGroup.addBaseFile(HoodieFileGroup.java:86)
   	at java.base/java.util.ArrayList.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$buildFileGroups$4(AbstractTableFileSystemView.java:161)
   	at java.base/java.lang.Iterable.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:157)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:135)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:115)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:120)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:239)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
   	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:428)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:298)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:227)
   	... 6 more
   ```
   
   After putting a log statement for fullFileName, I see the value is part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet.





[GitHub] [hudi] tooptoop4 commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-644968272


   Does INFO logging cause the issue?





[GitHub] [hudi] tooptoop4 edited a comment on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100


   PrestoSQL 336 with Hudi 0.5.3 gives a better error:
   
   ```
   io.prestosql.spi.PrestoException: Index 2 out of bounds for length 1
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:234)
   	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
   	at io.prestosql.$gen.Presto_1c5b75e_dirty____20200705_204556_2.run(Unknown Source)
   	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
   	at org.apache.hudi.common.util.FSUtils.getCommitTime(FSUtils.java:137)
   	at org.apache.hudi.common.model.HoodieBaseFile.getCommitTime(HoodieBaseFile.java:55)
   	at org.apache.hudi.common.model.HoodieFileGroup.addBaseFile(HoodieFileGroup.java:86)
   	at java.base/java.util.ArrayList.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$buildFileGroups$4(AbstractTableFileSystemView.java:161)
   	at java.base/java.lang.Iterable.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:157)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:135)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:115)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:120)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:239)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
   	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:428)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:298)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:227)
   	... 6 more
   ```
   
   After putting a log statement for fullFileName, I see the value is part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet, while for a table that can be queried, fullFileName is 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet.



   S3 listing under the partition folder of the table that works:
   2020-03-24 15:18:55         93 .hoodie_partition_metadata
   2020-03-24 15:18:57    2194374 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet


   S3 listing under the partition folder of the table that gets the error:
   2020-03-24 15:18:44          0 _SUCCESS
   2020-03-24 15:18:37   10649992 part-00000-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:38    8787785 part-00001-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:39    9562198 part-00002-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:40    9359329 part-00003-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:41   10519118 part-00004-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:42   10452807 part-00005-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:42    9104366 part-00006-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:43    9016423 part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   





[GitHub] [hudi] vinothchandar commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-655176203


   Yes, makes sense. Closing this issue.





[GitHub] [hudi] tooptoop4 commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100


   PrestoSQL 336 with Hudi 0.5.3 gives a better error:
   
   ```
   io.prestosql.spi.PrestoException: Index 2 out of bounds for length 1
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:234)
   	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
   	at io.prestosql.$gen.Presto_1c5b75e_dirty____20200705_204556_2.run(Unknown Source)
   	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
   	at org.apache.hudi.common.util.FSUtils.getCommitTime(FSUtils.java:137)
   	at org.apache.hudi.common.model.HoodieBaseFile.getCommitTime(HoodieBaseFile.java:55)
   	at org.apache.hudi.common.model.HoodieFileGroup.addBaseFile(HoodieFileGroup.java:86)
   	at java.base/java.util.ArrayList.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$buildFileGroups$4(AbstractTableFileSystemView.java:161)
   	at java.base/java.lang.Iterable.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:157)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:135)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:115)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:120)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:239)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
   	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:428)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:298)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:227)
   	... 6 more
   ```





[GitHub] [hudi] tooptoop4 edited a comment on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100


   PrestoSQL 336 with Hudi 0.5.3 gives a better error:
   
   ```
   io.prestosql.spi.PrestoException: Index 2 out of bounds for length 1
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:234)
   	at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
   	at io.prestosql.$gen.Presto_1c5b75e_dirty____20200705_204556_2.run(Unknown Source)
   	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
   	at org.apache.hudi.common.util.FSUtils.getCommitTime(FSUtils.java:137)
   	at org.apache.hudi.common.model.HoodieBaseFile.getCommitTime(HoodieBaseFile.java:55)
   	at org.apache.hudi.common.model.HoodieFileGroup.addBaseFile(HoodieFileGroup.java:86)
   	at java.base/java.util.ArrayList.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$buildFileGroups$4(AbstractTableFileSystemView.java:161)
   	at java.base/java.lang.Iterable.forEach(Unknown Source)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:157)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:135)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:115)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:120)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:239)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
   	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:428)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:298)
   	at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:227)
   	... 6 more
   ```
   
   After putting a log statement for fullFileName, I see the value is part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet, while for a table that can be queried, fullFileName is 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet.



   S3 listing under the partition folder of the table that works (there is a .hoodie/ folder under the base table path):
   2020-03-24 15:18:55         93 .hoodie_partition_metadata
   2020-03-24 15:18:57    2194374 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet


   S3 listing under the partition folder of the table that gets the error (but there is no .hoodie/ folder under the base table path):
   2020-03-24 15:18:44          0 _SUCCESS
   2020-03-24 15:18:37   10649992 part-00000-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:38    8787785 part-00001-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:39    9562198 part-00002-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:40    9359329 part-00003-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:41   10519118 part-00004-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:42   10452807 part-00005-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:42    9104366 part-00006-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   2020-03-24 15:18:43    9016423 part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
   





[GitHub] [hudi] vinothchandar closed issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
vinothchandar closed issue #1730:
URL: https://github.com/apache/hudi/issues/1730


   





[GitHub] [hudi] bhasudha commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-644434568


   PrestoSQL? It seems like the log level is not being picked up properly. Have you checked the server logs to see if there are log messages related to com.uber.hoodie? Just checking whether it is only the Presto client, or both server and client, that are not picking up the log messages.

   Side question: your Hudi version is very old. Have you tried a recent version? Or is there a specific reason for using this version of Hudi?





[GitHub] [hudi] bhasudha commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-644961734


   Can you try changing the level to `com.uber.hoodie=WARN` in etc/log.properties and check again?
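   For reference, that is a one-line entry in the Presto server's etc/log.properties, which takes one <logger-name>=<level> pair per line (the io.prestosql line below is just a typical default shown for context, not taken from this thread):

   ```
   io.prestosql=INFO
   com.uber.hoodie=WARN
   ```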

