Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/14 12:00:50 UTC

[GitHub] [hudi] hellochueng opened a new issue, #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

hellochueng opened a new issue, #5867:
URL: https://github.com/apache/hudi/issues/5867

   2022-06-14 19:58:19,560 ERROR org.apache.hudi.io.HoodieMergeHandle                         [] - Error writing record  HoodieRecord{key=HoodieKey { recordKey=fdbid:79505959536,fbillid:79505959731,fentryid:16,dim:hz partitionPath=fdatemonth=202203}, currentLocation='null', newLocation='null'}
   java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
   	at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:192) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:184) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:348) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:171) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:148) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:130) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:301) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvroWithMetadata(HoodieParquetWriter.java:81) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.io.HoodieMergeHandle.writeRecord(HoodieMergeHandle.java:294) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.io.HoodieMergeHandle.writeInsertRecord(HoodieMergeHandle.java:273) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.io.HoodieMergeHandle.writeIncomingRecords(HoodieMergeHandle.java:369) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:377) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.table.action.commit.FlinkMergeHelper.runMerge(FlinkMergeHelper.java:108) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdateInternal(HoodieFlinkCopyOnWriteTable.java:368) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdate(HoodieFlinkCopyOnWriteTable.java:359) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:197) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.sink.compact.CompactFunction.doCompaction(CompactFunction.java:104) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.sink.compact.CompactFunction.lambda$processElement$0(CompactFunction.java:92) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$execute$0(NonThrownExecutor.java:93) ~[hudi-flink-bundle_2.11-0.10.1.jar:0.10.1]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_281]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_281]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_281]
   ![image](https://user-images.githubusercontent.com/29030883/173572096-d4aa8ab0-7188-401c-a990-2b576863ccf3.png)
   MOR upsert
   ![image](https://user-images.githubusercontent.com/29030883/173572145-0fb23763-207e-444c-994b-0c7454724045.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan closed issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
URL: https://github.com/apache/hudi/issues/5867




[GitHub] [hudi] danny0405 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1166152197

   Do you mean that in `HoodieCompactionHandler#handleInsert` we do not close the file handle correctly when an exception occurs? That's a valid point; maybe we can wrap the handles in a `try/finally` block.
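   The suggested fix can be sketched roughly as below. This is a minimal illustration, not Hudi's actual API: `WriteHandle` and `handleInsert` are hypothetical stand-ins for the compaction write handle, showing only the `try/finally` guard that guarantees the handle is closed even when a write throws mid-stream.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class CompactionSketch {

    // Hypothetical stand-in for a Hudi write handle.
    public interface WriteHandle extends Closeable {
        void write(String record) throws IOException;
    }

    public static void handleInsert(WriteHandle handle, List<String> records) throws IOException {
        try {
            for (String r : records) {
                handle.write(r); // may throw partway through the batch
            }
        } finally {
            handle.close();      // runs on both the success and failure paths
        }
    }
}
```

   With this shape, a failed write still releases the underlying parquet file, so a later retry does not find the file in a half-written state.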




[GitHub] [hudi] codope commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1155288510

   cc @danny0405 
   Have you come across such an issue with the flink compactor?




[GitHub] [hudi] wwli05 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
wwli05 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1298412561

   @nsivabalan @danny0405 could you take a look at this?




[GitHub] [hudi] codope commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1155287154

   @hellochueng Can you give us more details about your setup? Is it multi-writer or single-writer? Is it consistently reproducible?
   The stack trace suggests that the `HoodieCompactor` was attempting a ParquetFileWriter `write/close` that was simultaneously being attempted by another writer. If it's a multi-writer setup, have you configured the concurrency mode and lock provider?
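   For reference, a multi-writer setup of the kind asked about here is typically configured along these lines. This is a sketch under the assumption of a multi-writer deployment with ZooKeeper available; the host, port, lock key, and base path values are placeholders for the actual environment.

```properties
# Sketch: optimistic concurrency control with a ZooKeeper-based lock provider.
# The ZooKeeper host, port, lock key, and base path below are placeholders.
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk-host
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=my_table
hoodie.write.lock.zookeeper.base_path=/hudi/locks
```

   Without a lock provider, two writers touching the same file group can race on write/close exactly as described above.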




[GitHub] [hudi] danny0405 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1184240708

   The issue is expected to be resolved by this PR: https://github.com/apache/hudi/pull/6106. Feel free to re-open it if the problem still exists.




[GitHub] [hudi] codope commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1156301574

   @hellochueng Can you please share the steps to reproduce the issue?




[GitHub] [hudi] JerryYue-M commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
JerryYue-M commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1165582325

   @danny0405 @codope 
   With Hudi release 0.11.0, this error appears frequently in the compaction task, and it can make compaction fail.
   I found that a `RemoteException: File does not exist` error appears first, which causes the merge handle to close; before closing, the handle flushes some remaining records, and that final flush then fails with the following error:
   
   2022-06-24 21:22:55,019 ERROR org.apache.hudi.io.HoodieMergeHandle                         [] - Error writing record  HoodieRecord{key=HoodieKey { recordKey=xxx ea125773f partitionPath=2022-06-21/18}, currentLocation='null', newLocation='null'}
   java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
   	at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:217) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:209) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:407) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:184) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:158) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:140) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:104) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.io.HoodieMergeHandle.writeToFile(HoodieMergeHandle.java:367) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.io.HoodieMergeHandle.writeRecord(HoodieMergeHandle.java:296) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.io.HoodieMergeHandle.writeInsertRecord(HoodieMergeHandle.java:277) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.io.HoodieMergeHandle.writeIncomingRecords(HoodieMergeHandle.java:380) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:388) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.table.action.commit.FlinkMergeHelper.runMerge(FlinkMergeHelper.java:108) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdateInternal(HoodieFlinkCopyOnWriteTable.java:379) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdate(HoodieFlinkCopyOnWriteTable.java:370) ~[blob_p-295f7415f20d1fe87ffb9658937af184c87dc096-45deddc0573ab868da621d786b6f266a:0.11.0]
   	at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:227) ~
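   The cascade described in this comment can be sketched with a toy writer state machine. This is a hypothetical illustration, not Parquet's or Hudi's actual classes: a write fails partway through a column, the writer's internal state is left at `COLUMN`, and the flush performed during `close()` then reports the misleading "invalid state" error instead of the original I/O failure.

```java
import java.io.IOException;

public class StateSketch {
    // Toy version of ParquetFileWriter's internal state machine.
    private enum State { STARTED, COLUMN }
    private State state = State.STARTED;

    public void write(String record) throws IOException {
        state = State.COLUMN;     // writing column data
        if (record == null) {     // stand-in for the RemoteException mid-write
            throw new IOException("File does not exist");
        }
        state = State.STARTED;    // column completed cleanly
    }

    // close() flushes buffered records; starting a new block is only
    // legal from STARTED, so a prior failure surfaces as "invalid state".
    public void close() throws IOException {
        if (state != State.STARTED) {
            throw new IOException("The file being written is in an invalid state. "
                + "Probably caused by an error thrown previously. Current state: " + state);
        }
    }
}
```

   The practical consequence is that the `Current state: COLUMN` message in the logs is a symptom; the root cause is whatever exception interrupted the earlier write.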




[GitHub] [hudi] danny0405 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1156131910

   You mean the error is thrown because multiple components were trying to modify the same parquet file? In the Flink write pipeline, the only component that may modify the parquet files is the `CompactFunction`; in theory, it is not subject to concurrent modification.




[GitHub] [hudi] wwli05 commented on issue #5867: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN

Posted by GitBox <gi...@apache.org>.
wwli05 commented on issue #5867:
URL: https://github.com/apache/hudi/issues/5867#issuecomment-1298411127

   This issue does not seem to be resolved in the following case:
   1. the job manager is rebooted
   2. all task managers register again
   3. task manager 1 may then be assigned a compaction task for a different file group id, one that may have been distributed to task manager 2 before the re-registration
   
   I noticed that when a task manager re-registers, a compaction task that started before the re-registration continues to run.

