Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/14 09:20:27 UTC
[GitHub] [iceberg] mazhiyu123 opened a new issue #3114: after set 'write.upsert.enable'='true' in flink sql, using flink sql read iceberg table will get exception: java.lang.IllegalArgumentException: Row arity: 3, but serializer arity: 2
mazhiyu123 opened a new issue #3114:
URL: https://github.com/apache/iceberg/issues/3114
flink version: 1.12
iceberg version: master branch (2021-09-13)
hadoop version: hadoop-2.6.0-cdh5.15.0
create catalog:
CREATE CATALOG hadoop_catalog WITH (
  'type'='iceberg',
  'catalog-type'='hadoop',
  'warehouse'='hdfs://xxx-hdfs/flink/tmp/iceberg_test',
  'property-version'='1'
);
create table:
CREATE TABLE hadoop_catalog.iceberg_db.sample_test (
  id BIGINT COMMENT 'unique id',
  data STRING,
  PRIMARY KEY(id) NOT ENFORCED
) WITH (
  'format-version'='2',
  'write.upsert.enable'='true'
);
insert sql:
INSERT INTO hadoop_catalog.iceberg_db.sample_test VALUES (10, 'test10_U'), (11, 'test11'), (12, 'test12');
INSERT INTO hadoop_catalog.iceberg_db.sample_test VALUES (12, 'test12_update');
select sql:
select * from hadoop_catalog.iceberg_db.sample_test;
After running the select SQL, the Flink job reports an error:
2021-09-14 17:13:14
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:224)
at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:217)
at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:208)
at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:610)
at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:89)
at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:419)
at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:286)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:201)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Row arity: 3, but serializer arity: 2
at org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:121)
at org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:50)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:69)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:305)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:394)
at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:93)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:215)
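The `Caused by` line suggests that the source emitted a row with three fields while the serializer was built for the table's two declared columns (`id`, `data`). The following is a minimal Python sketch of that kind of arity check, not Flink's actual code: `Row` and `RowSerializer` are hypothetical names, and where the third field comes from is not established in this report.

```python
class Row:
    """Hypothetical stand-in for a table row; only its field count matters here."""

    def __init__(self, *fields):
        self.fields = fields

    @property
    def arity(self):
        return len(self.fields)


class RowSerializer:
    """Hypothetical serializer created for a fixed list of column types."""

    def __init__(self, column_types):
        self.column_types = list(column_types)

    def copy(self, row):
        # Reject rows whose field count differs from the schema the
        # serializer was created with.
        if row.arity != len(self.column_types):
            raise ValueError(
                f"Row arity: {row.arity}, but serializer arity: {len(self.column_types)}"
            )
        return Row(*row.fields)


serializer = RowSerializer(["BIGINT", "STRING"])   # table schema: id, data
ok = serializer.copy(Row(10, "test10_U"))          # two fields: accepted

try:
    serializer.copy(Row(10, "test10_U", "extra"))  # three fields: rejected
except ValueError as err:
    message = str(err)
```

In this sketch the mismatch is raised at copy time, which matches where the trace fails (`RowDataSerializer.copy`), so the row was already malformed when it left the source.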
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] openinx commented on issue #3114: after set 'write.upsert.enable'='true' in flink sql, using flink sql read iceberg table will get exception: java.lang.IllegalArgumentException: Row arity: 3, but serializer arity: 2
Posted by GitBox <gi...@apache.org>.
openinx commented on issue #3114:
URL: https://github.com/apache/iceberg/issues/3114#issuecomment-918978509
Hi @mazhiyu123, I think you will need to apply this PR to your own branch: https://github.com/apache/iceberg/pull/2731
[GitHub] [iceberg] openinx commented on issue #3114: after set 'write.upsert.enable'='true' in flink sql, using flink sql read iceberg table will get exception: java.lang.IllegalArgumentException: Row arity: 3, but serializer arity: 2
Posted by GitBox <gi...@apache.org>.
openinx commented on issue #3114:
URL: https://github.com/apache/iceberg/issues/3114#issuecomment-919064052
I think you will need to rebase onto the latest master branch and set `'write.upsert.enable'='true'`, the option introduced in this [PR](https://github.com/apache/iceberg/pull/2863), which enabled upsert semantics in the Flink Iceberg integration.
[GitHub] [iceberg] openinx edited a comment on issue #3114: after set 'write.upsert.enable'='true' in flink sql, using flink sql read iceberg table will get exception: java.lang.IllegalArgumentException: Row arity: 3, but serializer arity: 2
Posted by GitBox <gi...@apache.org>.
openinx edited a comment on issue #3114:
URL: https://github.com/apache/iceberg/issues/3114#issuecomment-919073754
I've checked the behavior using the Iceberg master branch (git commit-id: 838cc652273c1444155bec2e1d6029cfbdbf3ea3) and Flink 1.12; everything works as expected:
```sql
CREATE TABLE iceberg_table_v2 (
  id BIGINT,
  data STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector'='iceberg',
  'format-version'='2',
  'catalog-name'='hadoop_prod',
  'catalog-type'='hadoop',
  'write.upsert.enable'='true',
  'warehouse'='file:///Users/openinx/test/iceberg-warehouse'
);
select * from iceberg_table_v2;
+----+------+
| id | data |
+----+------+
0 row in set
INSERT INTO iceberg_table_v2 VALUES (14, 'test14'), (14, 'test14_update');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: 083d3008e12bc533aacec0c2bdae9d0a
select * from iceberg_table_v2;
+----+---------------+
| id | data          |
+----+---------------+
| 14 | test14_update |
+----+---------------+
1 row in set
INSERT INTO iceberg_table_v2 VALUES (14, 'test14_3');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: 9528163e8d06db33c090c16d9a858b45
select * from iceberg_table_v2;
+----+----------+
| id | data     |
+----+----------+
| 14 | test14_3 |
+----+----------+
1 row in set
INSERT INTO iceberg_table_v2 VALUES (14, 'test14_4');
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: 9543c0c29bc730ece8420066119f9abf
select * from iceberg_table_v2;
+----+----------+
| id | data     |
+----+----------+
| 14 | test14_4 |
+----+----------+
1 row in set
```
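The session above behaves like a last-write-wins update keyed on `id`: each insert with `id = 14` replaces the previous row. A minimal Python sketch of that semantics follows, modeling the table as a dict. It is an illustration only and says nothing about how Iceberg implements upserts internally; `apply_upserts` is a hypothetical helper.

```python
def apply_upserts(table, rows):
    """Apply rows in order; a later row replaces an earlier one with the same key."""
    for key, data in rows:
        table[key] = data
    return table


table = {}

# Two rows with the same key in one statement: the later one survives.
apply_upserts(table, [(14, "test14"), (14, "test14_update")])
first = dict(table)

# Later statements with the same key replace the stored row.
apply_upserts(table, [(14, "test14_3")])
second = dict(table)

apply_upserts(table, [(14, "test14_4")])
third = dict(table)
```

The three snapshots correspond to the three `select` results shown in the session: one row per key after every insert.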
[GitHub] [iceberg] openinx closed issue #3114: after set 'write.upsert.enable'='true' in flink sql, using flink sql read iceberg table will get exception: java.lang.IllegalArgumentException: Row arity: 3, but serializer arity: 2
Posted by GitBox <gi...@apache.org>.
openinx closed issue #3114:
URL: https://github.com/apache/iceberg/issues/3114
[GitHub] [iceberg] openinx commented on issue #3114: after set 'write.upsert.enable'='true' in flink sql, using flink sql read iceberg table will get exception: java.lang.IllegalArgumentException: Row arity: 3, but serializer arity: 2
Posted by GitBox <gi...@apache.org>.
openinx commented on issue #3114:
URL: https://github.com/apache/iceberg/issues/3114#issuecomment-919074210
Closing this issue now; feel free to reopen it if you have more questions!
[GitHub] [iceberg] mazhiyu123 edited a comment on issue #3114: after set 'write.upsert.enable'='true' in flink sql, using flink sql read iceberg table will get exception: java.lang.IllegalArgumentException: Row arity: 3, but serializer arity: 2
Posted by GitBox <gi...@apache.org>.
mazhiyu123 edited a comment on issue #3114:
URL: https://github.com/apache/iceberg/issues/3114#issuecomment-919006807
> Hi @mazhiyu123 , I think you will need to apply this PR to your own branch: #2731
Thank you for the quick reply!
I already tried [#2731](https://github.com/apache/iceberg/pull/2731), but I ran into a new problem:
when I insert rows with the same primary key in a single SQL statement, the Iceberg table ends up with duplicate data. For example:
insert sql:
INSERT INTO hadoop_catalog.iceberg_db.sample_test VALUES (14, 'test14'), (14, 'test14_update');
select sql:
select * from hadoop_catalog.iceberg_db.sample_test;
expected:
id data
14 test14_update
but it returned:
id data
14 test14_update
14 test14
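One way to check a query result for the symptom described above is to count rows per primary key and report keys that appear more than once. The Python sketch below does this for the two result sets quoted in this comment; `duplicate_keys` is a hypothetical helper, and the rows are copied by hand from the comment rather than read from a table.

```python
from collections import Counter


def duplicate_keys(rows, key_index=0):
    """Return the sorted key values that occur in more than one row."""
    counts = Counter(row[key_index] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Result sets as reported in the comment.
expected = [(14, "test14_update")]
returned = [(14, "test14_update"), (14, "test14")]

expected_dups = duplicate_keys(expected)
returned_dups = duplicate_keys(returned)
```

An empty list means the primary key is unique in the result; a non-empty list names the keys for which the upsert did not collapse the rows.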