You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/24 07:11:58 UTC

[GitHub] [iceberg] xwmr-max opened a new issue #2627: Using Kafka to insert multiple pieces of data with the same primary key value in Iceberg at one time, the data cannot be queried

xwmr-max opened a new issue #2627:
URL: https://github.com/apache/iceberg/issues/2627


   When multiple pieces of data with the same primary key value are inserted in the same batch in iceberg, query through flink sql, the data cannot be queried, and the following error will be reported:
   java.lang.IllegalArgumentException: Row arity: 3, but serializer arity: 2
   Caused by: java.lang.IllegalArgumentException: Row arity: 3, but serializer
   arity: 2
           at
   org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:117)
           at
   org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:50)
           at
   org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:715)
           at
   org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:692)
           at
   org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:672)
           at
   org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
           at
   org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
           at
   org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:305)
           at
   org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:394)
           at
   org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:94)
           at
   org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
           at
   org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
           at
   org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:201)
   
   This error means that the original data is two columns, and the schema of the table is also two columns, but now the schema has become three columns.
   But when the primary keys are different, inserting multiple pieces of data in the same batch can be queried normally, and can also be upsert


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] sfiend edited a comment on issue #2627: Using Kafka to insert multiple pieces of data with the same primary key value in Iceberg at one time, the data cannot be queried

Posted by GitBox <gi...@apache.org>.

sfiend edited a comment on issue #2627:
URL: https://github.com/apache/iceberg/issues/2627#issuecomment-849458968


   I had met the same problem before. When multiple pieces of data with the same primary key value are inserted in the same batch, besides equality delete files, iceberg will also write position delete files. During your query after that, when FlinkInputFormat initialize the RowDataIterator and read next, the iterator will initialize the FlinkDeleteFilter, in this initialization, the FlinkDeleteFilter's parent will add a column named '_pos' if the current split has position delete file. I think the purpose of adding column is to apply the position delete file depending on the position of each row, but iceberg did not delete it before sending the result rows to flink.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] openinx commented on issue #2627: Using Kafka to insert multiple pieces of data with the same primary key value in Iceberg at one time, the data cannot be queried

Posted by GitBox <gi...@apache.org>.

openinx commented on issue #2627:
URL: https://github.com/apache/iceberg/issues/2627#issuecomment-956049734


   Closed via https://github.com/apache/iceberg/pull/3240.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] openinx closed issue #2627: Using Kafka to insert multiple pieces of data with the same primary key value in Iceberg at one time, the data cannot be queried

Posted by GitBox <gi...@apache.org>.

openinx closed issue #2627:
URL: https://github.com/apache/iceberg/issues/2627


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] fatjoshua commented on issue #2627: Using Kafka to insert multiple pieces of data with the same primary key value in Iceberg at one time, the data cannot be queried

Posted by GitBox <gi...@apache.org>.

fatjoshua commented on issue #2627:
URL: https://github.com/apache/iceberg/issues/2627#issuecomment-928718143


   I also met this exception.
   My case is that I want to write the flink cdc data into iceberg, after I updated the table to V2, the update and delete operation in mysql can be wrote into the icegerg table, but when I want to read from this iceberg table using flink sql, the same exception 'java.lang.IllegalArgumentException: Row arity: 5, but serializer arity: 4' would be thrown.
   Hope this read upsert ability can be merged into iceberg.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] sfiend commented on issue #2627: Using Kafka to insert multiple pieces of data with the same primary key value in Iceberg at one time, the data cannot be queried

Posted by GitBox <gi...@apache.org>.

sfiend commented on issue #2627:
URL: https://github.com/apache/iceberg/issues/2627#issuecomment-849458968


   I had met the same problem before. When multiple pieces of data with the same primary key value are inserted in the same batch, besides equality delete files, iceberg will also write position delete files. A fter that during your query, when FlinkInputFormat initialize the RowDataIterator and read next, the iterator will initialize the FlinkDeleteFilter, in this initialization, the FlinkDeleteFilter's parent will add a column named '_pos' if the current split has position delete file. I think the purpose of adding column is to apply the position delete file depending on the position of each row, but iceberg did not delete it before sending the result rows to flink.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] fatjoshua commented on issue #2627: Using Kafka to insert multiple pieces of data with the same primary key value in Iceberg at one time, the data cannot be queried

Posted by GitBox <gi...@apache.org>.

fatjoshua commented on issue #2627:
URL: https://github.com/apache/iceberg/issues/2627#issuecomment-928718143


   I also met this exception.
   My case is that I want to write the flink cdc data into iceberg, after I updated the table to V2, the update and delete operation in mysql can be wrote into the icegerg table, but when I want to read from this iceberg table using flink sql, the same exception 'java.lang.IllegalArgumentException: Row arity: 5, but serializer arity: 4' would be thrown.
   Hope this read upsert ability can be merged into iceberg.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] openinx commented on issue #2627: Using Kafka to insert multiple pieces of data with the same primary key value in Iceberg at one time, the data cannot be queried

Posted by GitBox <gi...@apache.org>.

openinx commented on issue #2627:
URL: https://github.com/apache/iceberg/issues/2627#issuecomment-928932912


   Thanks all for the feedback @xwmr-max , @sfiend , @fatjoshua ,  we will treat this issue as a top priority to get this merge in ! 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] openinx commented on issue #2627: Using Kafka to insert multiple pieces of data with the same primary key value in Iceberg at one time, the data cannot be queried

Posted by GitBox <gi...@apache.org>.

openinx commented on issue #2627:
URL: https://github.com/apache/iceberg/issues/2627#issuecomment-928932912


   Thanks all for the feedback @xwmr-max , @sfiend , @fatjoshua ,  we will treat this issue as a top priority to get this merge in ! 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org