Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/14 12:00:50 UTC
[GitHub] [hudi] hellochueng opened a new issue, #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
hellochueng opened a new issue, #5867:
URL: https://github.com/apache/hudi/issues/5867
2022-06-14 19:58:19,560 ERROR org.apache.hudi.io.HoodieMergeHandle [] - Error writing record HoodieRecord{key=HoodieKey { recordKey=fdbid:79505959536,fbillid:79505959731,fentryid:16,dim:hz partitionPath=fdatemonth=202203}, currentLocation='null', newLocation='null'}
java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:192) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:184) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:348) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:171) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:148) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:130) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:301) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvroWithMetadata(HoodieParquetWriter.java:81) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.io.HoodieMergeHandle.writeRecord(HoodieMergeHandle.java:294) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.io.HoodieMergeHandle.writeInsertRecord(HoodieMergeHandle.java:273) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.io.HoodieMergeHandle.writeIncomingRecords(HoodieMergeHandle.java:369) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:377) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.table.action.commit.FlinkMergeHelper.runMerge(FlinkMergeHelper.java:108) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdateInternal(HoodieFlinkCopyOnWriteTable.java:368) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdate(HoodieFlinkCopyOnWriteTable.java:359) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:197) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.sink.compact.CompactFunction.doCompaction(CompactFunction.java:104) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.sink.compact.CompactFunction.lambda$processElement$0(CompactFunction.java:92) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$execute$0(NonThrownExecutor.java:93) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_281]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_281]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_281]
![image](https://user-images.githubusercontent.com/29030883/173572096-d4aa8ab0-7188-401c-a990-2b576863ccf3.png)
MOR upsert
![image](https://user-images.githubusercontent.com/29030883/173572145-0fb23763-207e-444c-994b-0c7454724045.png)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
nsivabalan closed issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
URL: https://github.com/apache/hudi/issues/5867
[GitHub] [hudi] danny0405 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
danny0405 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1166152197
Do you mean that in `HoodieCompactionHandler#handleInsert` we do not close the file handle correctly when an exception occurs? That's a valid point; maybe we can wrap the handles in a `try/finally` resource block.
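A minimal sketch of the `try/finally` idea suggested above. The names below (`WriteHandle`, `runMerge`, `CompactionStep`) are illustrative placeholders, not actual Hudi APIs; the point is only that the handle is closed even when a write throws, so a half-written file is not left dangling:

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical handle interface standing in for Hudi's write handles.
interface WriteHandle extends Closeable {
    void write(String record) throws IOException;
}

class CompactionStep {
    // Write all records, but always close the handle, even on failure.
    static void runMerge(WriteHandle handle, Iterable<String> records) throws IOException {
        try {
            for (String r : records) {
                handle.write(r);
            }
        } finally {
            // Without this finally block, an exception mid-write would leave
            // the underlying file open and in an invalid state.
            handle.close();
        }
    }
}
```

The same effect can be had with a try-with-resources statement when the handle is created inside the method.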
[GitHub] [hudi] codope commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
codope commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1155288510
cc @danny0405
Have you come across such an issue with the Flink compactor?
[GitHub] [hudi] wwli05 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
wwli05 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1298412561
@nsivabalan @danny0405 could you take a look at this?
[GitHub] [hudi] codope commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
codope commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1155287154
@hellochueng Can you give us more details about your setup? Is it multi-writer or single writer? Is it consistently reproducible?
The stacktrace suggests that the `HoodieCompactor` was attempting a ParquetFileWriter `write/close` that was simultaneously being attempted by another writer. If it's a multi-writer setup, have you configured the concurrency mode and a lock provider?
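For reference, a multi-writer setup along the lines asked about above would carry writer configuration roughly like the following sketch (the ZooKeeper host, lock key, and base path are placeholders to adapt to your environment):

```properties
# Enable optimistic concurrency control for multi-writer setups
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY
# ZooKeeper-based lock provider (host/port/key/path are placeholders)
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk-host
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=my_table
hoodie.write.lock.zookeeper.base_path=/hudi/locks
```

If these are absent in a setup where two writers touch the same table, concurrent modification of the same file group is possible.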
[GitHub] [hudi] danny0405 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
danny0405 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1184240708
The issue is expected to be resolved by this PR: https://github.com/apache/hudi/pull/6106. Feel free to re-open it if the problem still exists.
[GitHub] [hudi] codope commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
codope commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1156301574
@hellochueng Can you please share the steps to reproduce the issue?
[GitHub] [hudi] JerryYue-M commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
JerryYue-M commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1165582325
@danny0405 @codope
With the Hudi release-0.11.0 version, this error appears frequently in the compaction task, so it can make compaction fail.
I found that a `RemoteException: File does not exist` error may appear first, which causes the merge handle to close; before closing, it flushes some records, and finally the following error occurs:
2022-06-24 21:22:55,019 ERROR org.apache.hudi.io.HoodieMergeHandle [] - Error writing record HoodieRecord{key=HoodieKey { recordKey=xxx ea125773f partitionPath=2022-06-21/18}, currentLocation='null', newLocation='null'}
java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:217) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:209) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:407) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:184) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:158) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:140) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:104) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.io.HoodieMergeHandle.writeToFile(HoodieMergeHandle.java:367) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.io.HoodieMergeHandle.writeRecord(HoodieMergeHandle.java:296) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.io.HoodieMergeHandle.writeInsertRecord(HoodieMergeHandle.java:277) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.io.HoodieMergeHandle.writeIncomingRecords(HoodieMergeHandle.java:380) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:388) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.table.action.commit.FlinkMergeHelper.runMerge(FlinkMergeHelper.java:108) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdateInternal(HoodieFlinkCopyOnWriteTable.java:379) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdate(HoodieFlinkCopyOnWriteTable.java:370) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:227) ~
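The failure chain described above matches Parquet's writer state machine: once a write or flush fails, the writer records the error state, and every later write/close attempt on the same handle fails fast with "invalid state" rather than corrupting the file. A minimal, self-contained sketch of that guard pattern (not Hudi or Parquet code, just an illustration of the mechanism):

```java
import java.io.IOException;

class GuardedWriter {
    private enum State { READY, ERROR, CLOSED }
    private State state = State.READY;

    void write(String record) throws IOException {
        if (state != State.READY) {
            // Any call after a failure reports the poisoned state, which is
            // the shape of the "Current state: COLUMN" error above.
            throw new IOException(
                "The file being written is in an invalid state. Current state: " + state);
        }
        try {
            doWrite(record);
        } catch (IOException e) {
            state = State.ERROR; // the first failure poisons the writer
            throw e;
        }
    }

    // Stand-in for the real flush-to-storage step; fails on a null record.
    private void doWrite(String record) throws IOException {
        if (record == null) {
            throw new IOException("simulated flush failure");
        }
    }

    void close() { state = State.CLOSED; }
}
```

So the second error in the log is a symptom: the merge handle kept writing through a writer that an earlier `RemoteException` had already poisoned.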
[GitHub] [hudi] danny0405 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
danny0405 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1156131910
You mean the error is thrown because multiple components were trying to modify the same parquet file? In the Flink write pipeline, the only component that may modify the parquet files is the `CompactFunction`; in theory, it should not be subject to concurrent modification.
[GitHub] [hudi] wwli05 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
wwli05 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1298411127
This issue seems not to be resolved in the following case:
1. the job manager reboots
2. all task managers register again
3. task manager 1 may then be assigned a compaction task for a file group id that had been distributed to task manager 2 before the re-registration
I noticed that when a task manager re-registers, a compaction task that started before the re-registration continues to run.