Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/12 18:55:03 UTC
[GitHub] [hudi] tooptoop4 opened a new issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
tooptoop4 opened a new issue #1730:
URL: https://github.com/apache/hudi/issues/1730
```
using hoodie 0.4.6 and spark 2.3.4
run below in hiveserver2 (v2.3.4):
CREATE EXTERNAL TABLE `someschema.mytbl`(
col1 string,
col2 string,
col3 string)
PARTITIONED BY (
`mydate` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'com.uber.hoodie.hadoop.HoodieInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3a://redact/M5/table/mytbl'
#use spark to create COW hudi parquet under s3://redact/M5/table/mytbl/2016/11/07/ and s3://redact/M/table/mytbl/2019/12/01/
run below in hiveserver2:
ALTER TABLE someschema.mytbl ADD IF NOT EXISTS PARTITION(mydate='2016-11-07')
LOCATION 's3a://redact/M5/table/mytbl/2016/11/07/'
ALTER TABLE someschema.mytbl ADD IF NOT EXISTS PARTITION(mydate='2019-12-01')
LOCATION 's3a://redact/M/table/mytbl/2019/12/01/'
hive metastore shows below 2 rows:
select TBLS.TBL_NAME,PARTITIONS.PART_NAME,SDS.LOCATION
from SDS,TBLS,PARTITIONS
where PARTITIONS.SD_ID = SDS.SD_ID
and TBLS.TBL_ID=PARTITIONS.TBL_ID
and TBLS.TBL_NAME = 'mytbl'
order by 1,2;
mytbl mydate=2016-11-07 s3a://redact/M5/table/mytbl/2016/11/07
mytbl mydate=2019-12-01 s3a://redact/M/table/mytbl/2019/12/01
query1:
select count(1) from someschema.mytbl where mydate = '2016-11-07'
works fine from both hiveserver2 and presto
query2:
select count(1) from someschema.mytbl where mydate = '2019-12-01'
presto gives unhelpful error:
io.prestosql.spi.PrestoException: HIVE_UNKNOWN_ERROR
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:223)
at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
at io.prestosql.$gen.Presto_ff748c3_dirty____20200610_171635_2.run(Unknown Source)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ArrayIndexOutOfBoundsException: undefined
hiveserver2 gives more verbose yet still not too helpful error:
2020-06-12T18:22:23,375 WARN [HiveServer2-Handler-Pool: Thread-12109] thrift.ThriftCLIService: Error fetching results:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:499) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:307) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:878) ~[hive-service-2.3.4.jar:2.3.4]
at sun.reflect.GeneratedMethodAccessor135.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.4.jar:2.3.4]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_252]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_252]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) ~[hadoop-common-2.8.5.jar:?]
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.4.jar:2.3.4]
at com.sun.proxy.$Proxy42.fetchResults(Unknown Source) ~[?:?]
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:559) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:751) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-2.3.4.jar:2.3.4]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.4.jar:2.3.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494) ~[hive-service-2.3.4.jar:2.3.4]
... 24 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at com.uber.hoodie.common.util.FSUtils.getCommitTime(FSUtils.java:120) ~[hoodiebundle.jar:?]
at com.uber.hoodie.common.model.HoodieDataFile.getCommitTime(HoodieDataFile.java:37) ~[hoodiebundle.jar:?]
at com.uber.hoodie.common.model.HoodieFileGroup.addDataFile(HoodieFileGroup.java:89) ~[hoodiebundle.jar:?]
at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.lambda$null$3(HoodieTableFileSystemView.java:155) ~[hoodiebundle.jar:?]
at java.util.ArrayList.forEach(ArrayList.java:1257) ~[?:1.8.0_252]
at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.lambda$addFilesToView$5(HoodieTableFileSystemView.java:155) ~[hoodiebundle.jar:?]
at java.lang.Iterable.forEach(Iterable.java:75) ~[?:1.8.0_252]
at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.addFilesToView(HoodieTableFileSystemView.java:151) ~[hoodiebundle.jar:?]
at com.uber.hoodie.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:107) ~[hoodiebundle.jar:?]
at com.uber.hoodie.hadoop.HoodieInputFormat.listStatus(HoodieInputFormat.java:88) ~[hoodiebundle.jar:?]
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322) ~[hadoop-mapreduce-client-core-2.8.5.jar:?]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208) ~[hive-exec-2.3.4.jar:2.3.4]
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494) ~[hive-service-2.3.4.jar:2.3.4]
... 24 more
```
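The two partition locations in the metastore output differ in their prefix (`M5` versus `M`), so the second partition sits outside the table's base path. A minimal sketch of how that could be checked up front is below. It is not part of Hudi, Hive, or Presto; the helper name `is_under_base_path` is made up for illustration, and the paths are copied from the report above.

```python
from urllib.parse import urlparse


def is_under_base_path(base_path: str, location: str) -> bool:
    """Return True if location is the base path itself or a path below it.

    Hypothetical helper for illustration only. Compares whole path
    segments, so 'M' is not treated as a prefix of 'M5'.
    """
    base = urlparse(base_path)
    loc = urlparse(location)
    # Scheme and bucket must match before the paths are compared.
    if (base.scheme, base.netloc) != (loc.scheme, loc.netloc):
        return False
    base_parts = [p for p in base.path.split("/") if p]
    loc_parts = [p for p in loc.path.split("/") if p]
    return loc_parts[:len(base_parts)] == base_parts


# Values taken from the CREATE TABLE statement and the metastore rows above.
TABLE_BASE = "s3a://redact/M5/table/mytbl"
PARTITIONS = {
    "mydate=2016-11-07": "s3a://redact/M5/table/mytbl/2016/11/07",
    "mydate=2019-12-01": "s3a://redact/M/table/mytbl/2019/12/01",
}

outside = [
    name
    for name, location in PARTITIONS.items()
    if not is_under_base_path(TABLE_BASE, location)
]
print(outside)
```

Running a check like this against the metastore rows before querying would point at the offending partition directly, instead of leaving it to surface as an ArrayIndexOutOfBoundsException.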
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] tooptoop4 edited a comment on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100
prestosql 336 with hudi 0.5.3 gives better error:
```
io.prestosql.spi.PrestoException: Index 2 out of bounds for length 1
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:234)
at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
at io.prestosql.$gen.Presto_1c5b75e_dirty____20200705_204556_2.run(Unknown Source)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
at org.apache.hudi.common.util.FSUtils.getCommitTime(FSUtils.java:137)
at org.apache.hudi.common.model.HoodieBaseFile.getCommitTime(HoodieBaseFile.java:55)
at org.apache.hudi.common.model.HoodieFileGroup.addBaseFile(HoodieFileGroup.java:86)
at java.base/java.util.ArrayList.forEach(Unknown Source)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$buildFileGroups$4(AbstractTableFileSystemView.java:161)
at java.base/java.lang.Iterable.forEach(Unknown Source)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:157)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:135)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:115)
at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:120)
at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:239)
at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:428)
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:298)
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:227)
... 6 more
```
After adding a log statement for fullFileName, I see the value is part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet, while for a table that can be queried fullFileName is 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet
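The message "Index 2 out of bounds for length 1" is what you would expect if the file name is split on `_` and the third piece is read: the plain Spark file name contains no underscore, so the split yields a single element. The sketch below is only an illustration of that reasoning, not Hudi's code. It assumes the layout `<fileId>_<writeToken>_<commitTime>.parquet`, inferred from the working file name above, and the function name `commit_time_from_file_name` is hypothetical.

```python
def commit_time_from_file_name(full_file_name: str) -> str:
    """Extract the commit time from a Hudi-style base file name.

    Assumed layout (inferred from the working example in this thread):
        <fileId>_<writeToken>_<commitTime>.parquet
    Raises a descriptive error instead of an index error when the
    name does not follow that layout.
    """
    parts = full_file_name.split("_")
    if len(parts) < 3:
        raise ValueError(
            f"'{full_file_name}' does not look like a Hudi base file "
            f"(expected <fileId>_<writeToken>_<commitTime>.parquet); "
            f"was it written outside of Hudi?"
        )
    # Third piece is '<commitTime>.parquet'; drop the extension.
    return parts[2].split(".")[0]


good = "4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet"
bad = "part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet"

print(commit_time_from_file_name(good))
try:
    commit_time_from_file_name(bad)
except ValueError as err:
    print(err)
```

An error that names the offending file, as in the `ValueError` here, is the kind of message this issue is asking for.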
[GitHub] [hudi] tooptoop4 commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-644468948
What I posted is from the server log. I will upgrade in some time.
[GitHub] [hudi] bhasudha commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-643477146
@tooptoop4 What version of Presto are you using? You might want to set the logging level to INFO/WARN for Hoodie in Presto.
[GitHub] [hudi] tooptoop4 edited a comment on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100
s3 listing under partition folder of table that works (there is .hoodie/ folder under base table path):
2020-03-24 15:18:55 93 .hoodie_partition_metadata
2020-03-24 15:18:57 2194374 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet
s3 listing under partition folder of table that gets the error (there is .hoodie/ folder under base table path):
2020-03-24 15:18:44 0 _SUCCESS
2020-03-24 15:18:37 10649992 part-00000-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:38 8787785 part-00001-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:39 9562198 part-00002-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:40 9359329 part-00003-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:41 10519118 part-00004-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:42 10452807 part-00005-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:42 9104366 part-00006-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:43 9016423 part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
**UPDATE**
This is a really old table that got corrupted along the way. After removing the .hoodie/ folder, the select works OK.
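To spot this situation from a directory listing, one could flag parquet files whose names do not follow the Hudi pattern. The sketch below is a rough illustration under assumptions: the regular expression is inferred only from the one working file name in this thread (`<fileId>_<writeToken>_<commitTime>.parquet`), and the helper `non_hudi_parquet_files` is made up, not a Hudi API.

```python
import re

# Assumed naming pattern, inferred from the working example above:
#   <fileId>_<writeToken>_<commitTime>.parquet
HUDI_BASE_FILE = re.compile(r"^[^_]+_[^_]+_\d+\.parquet$")


def non_hudi_parquet_files(names):
    """Return parquet file names that do not match the assumed Hudi pattern."""
    return [
        name
        for name in names
        if name.endswith(".parquet") and not HUDI_BASE_FILE.match(name)
    ]


# File names taken from the two S3 listings above (one part file shown).
working = [
    ".hoodie_partition_metadata",
    "4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet",
]
failing = [
    "_SUCCESS",
    "part-00000-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet",
]

print(non_hudi_parquet_files(working))
print(non_hudi_parquet_files(failing))
```

A non-empty result for a partition that sits under a base path containing a .hoodie/ folder suggests the files were written by plain Spark rather than through Hudi.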
[GitHub] [hudi] vinothchandar commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-653263476
All this code is very different now. Are you facing similar issues with org.apache.hudi?
[GitHub] [hudi] tooptoop4 commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-643592629
Presto 334. Logging is already at INFO.
[GitHub] [hudi] tooptoop4 commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-644968272
Does INFO logging cause an issue?
[GitHub] [hudi] vinothchandar commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-655176203
yes.. makes sense.. closing this issue
[GitHub] [hudi] tooptoop4 edited a comment on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
tooptoop4 edited a comment on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-654797100
prestosql 336 with hudi 0.5.3 gives better error:
```
io.prestosql.spi.PrestoException: Index 2 out of bounds for length 1
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:234)
at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
at io.prestosql.$gen.Presto_1c5b75e_dirty____20200705_204556_2.run(Unknown Source)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 1
at org.apache.hudi.common.util.FSUtils.getCommitTime(FSUtils.java:137)
at org.apache.hudi.common.model.HoodieBaseFile.getCommitTime(HoodieBaseFile.java:55)
at org.apache.hudi.common.model.HoodieFileGroup.addBaseFile(HoodieFileGroup.java:86)
at java.base/java.util.ArrayList.forEach(Unknown Source)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$buildFileGroups$4(AbstractTableFileSystemView.java:161)
at java.base/java.lang.Iterable.forEach(Unknown Source)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:157)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.buildFileGroups(AbstractTableFileSystemView.java:135)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.addFilesToView(AbstractTableFileSystemView.java:115)
at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:120)
at org.apache.hudi.hadoop.HoodieParquetInputFormat.filterFileStatusForSnapshotMode(HoodieParquetInputFormat.java:239)
at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:110)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:428)
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:298)
at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:227)
... 6 more
```
After adding a log statement for fullFileName, I see the value is part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet, while for a table that can be queried fullFileName is 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet
S3 listing under the partition folder of the table that works (there is a .hoodie/ folder under the base table path):
2020-03-24 15:18:55 93 .hoodie_partition_metadata
2020-03-24 15:18:57 2194374 4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet
S3 listing under the partition folder of the table that gets the error (there is no .hoodie/ folder under the base table path):
2020-03-24 15:18:44 0 _SUCCESS
2020-03-24 15:18:37 10649992 part-00000-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:38 8787785 part-00001-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:39 9562198 part-00002-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:40 9359329 part-00003-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:41 10519118 part-00004-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:42 10452807 part-00005-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:42 9104366 part-00006-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
2020-03-24 15:18:43 9016423 part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet
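The two file names above suggest why the stack trace ends in an ArrayIndexOutOfBoundsException. The following is a minimal sketch, not the actual Hudi source: it assumes the commit time is read as the third underscore-separated token of the file name, which is consistent with "Index 2 out of bounds for length 1" for a name that contains no underscores. The class `CommitTimeSketch` and both method names are hypothetical and exist only for illustration.

```java
import java.util.Optional;

public class CommitTimeSketch {

    // Hypothetical stand-in for the parsing step seen in the stack trace.
    // Assumption: names look like <fileId>_<writeToken>_<commitTime>.parquet.
    public static String naiveCommitTime(String fullFileName) {
        return fullFileName.split("_")[2].split("\\.")[0];
    }

    // Hypothetical defensive variant: returns empty instead of throwing
    // when the name does not have three underscore-separated tokens.
    public static Optional<String> safeCommitTime(String fullFileName) {
        String[] parts = fullFileName.split("_");
        if (parts.length < 3) {
            return Optional.empty();
        }
        return Optional.of(parts[2].split("\\.")[0]);
    }

    public static void main(String[] args) {
        String hudiFile = "4b37466c-8b75-458e-ba28-1e0f4c350dbe_0_20200324151845.parquet";
        String plainFile = "part-00007-75dea991-eba7-4fb1-801c-af264bb5bfc3-c000.snappy.parquet";

        // The Hudi-written name has three tokens, so the third one parses cleanly.
        System.out.println("hudi file commit time: " + naiveCommitTime(hudiFile));

        // The plain Spark-written name has no underscores, so split("_") yields one token.
        System.out.println("plain file parses: " + safeCommitTime(plainFile).isPresent());
        try {
            naiveCommitTime(plainFile);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("plain file fails with: " + e.getMessage());
        }
    }
}
```

Under this assumption, any parquet file that was not written by Hudi and sits in a partition registered with the Hudi input format would hit the same failure, which matches the listing of part-0000N files above.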
----------------------------------------------------------------
[GitHub] [hudi] vinothchandar closed issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
vinothchandar closed issue #1730:
URL: https://github.com/apache/hudi/issues/1730
----------------------------------------------------------------
[GitHub] [hudi] bhasudha commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-644434568
prestosql? It seems like the log level is not being picked up properly. Have you checked the server logs to see if there are log messages related to com.uber.hoodie? Just checking whether it is only the presto client, or both the server and the client, that are not picking up the log messages.
Side question: your Hudi version is very old. Have you tried a recent version? Or is there a specific reason for using this version of Hudi?
----------------------------------------------------------------
[GitHub] [hudi] bhasudha commented on issue #1730: [SUPPORT] unhelpful error message when there are parquets outside table base path
Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1730:
URL: https://github.com/apache/hudi/issues/1730#issuecomment-644961734
Can you try setting the level to `com.uber.hoodie=WARN` in etc/log.properties and check again?
----------------------------------------------------------------