Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/17 03:11:57 UTC

[GitHub] [hudi] eric9204 opened a new issue, #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

eric9204 opened a new issue, #6966:
URL: https://github.com/apache/hudi/issues/6966

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   When writing to Hudi with Spark structured streaming, the error below occurred with the following parameters configured.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   ```
   hoodie.datasource.write.operation=insert
   hoodie.datasource.write.table.type=MERGE_ON_READ
   hoodie.datasource.write.precombine.field=ts
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=part
   hoodie.table.name=ss_bucket_dsj_parquet_12
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
   hoodie.insert.shuffle.parallelism=8
   hoodie.datasource.compaction.async.enable=true
   hoodie.compact.inline.max.delta.commits=4
   hoodie.index.type=BUCKET
   hoodie.bucket.index.num.buckets=8
   hoodie.bucket.index.hash.field=id
   hoodie.storage.layout.partitioner.class=org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner
   hoodie.storage.layout.type=BUCKET
   hoodie.metadata.enable=true
   hoodie.embed.timeline.server=false
   path=/tmp/hudi/ss_bucket_dsj_parquet_12
   checkpointLocation=/tmp/hudi/ckp
   hoodie.datasource.hive_sync.enable=true
   hoodie.datasource.hive_sync.username=ocdp
   hoodie.datasource.hive_sync.database=default
   hoodie.datasource.hive_sync.table=ss_bucket_dsj_parquet_12
   hoodie.datasource.hive_sync.password=ocdp
   hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://10.1.9.44:10000
   hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
   hoodie.datasource.hive_sync.partition_fields=part
   hoodie.datasource.write.hive_style_partitioning=true
   hoodie.datasource.hive_sync.bucket_sync=true
   
   hoodie.datasource.write.drop.partition.columns=true
   ```
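   The table was written with Spark structured streaming; a minimal sketch of the write (assuming `df` is the input DataFrame and `hoodieConf: Map[String, String]` holds the properties above — the full reproducer is posted later in this thread):
   
   ```
   import org.apache.spark.sql.streaming.{OutputMode, Trigger}
   
   df.writeStream
     .format("hudi")
     .options(hoodieConf)
     .outputMode(OutputMode.Append())
     .trigger(Trigger.ProcessingTime("10 seconds"))
     .start()
     .awaitTermination()
   ```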
   
   **Expected behavior**
   
   The streaming write should succeed; records should be written without errors.
   
   **Environment Description**
   
   * Hudi version :
      Hudi-0.12.0
   * Spark version :
      Spark-3.1.1
   * Hive version :
      None
   * Hadoop version :
      Hadoop-3.3.0
   * Storage (HDFS/S3/GCS..) :
   HDFS
   * Running on Docker? (yes/no) :
      No
   
   **Additional context**
   
   
   
   **Stacktrace**
   
   ```
   22/10/17 11:04:08 ERROR HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id438436 partitionPath=part=202210171102}, currentLocation='null', newLocation='null'}
   java.io.EOFException
           at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
           at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
           at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
           at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
           at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
           at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
           at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
           at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
           at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
           at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
           at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
           at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
           at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
           at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:158)
           at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:148)
           at org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.getInsertValue(OverwriteWithLatestAvroPayload.java:75)
           at org.apache.hudi.common.model.HoodieRecordPayload.getInsertValue(HoodieRecordPayload.java:105)
           at org.apache.hudi.execution.HoodieLazyInsertIterable$HoodieInsertValueGenResult.<init>(HoodieLazyInsertIterable.java:90)
           at org.apache.hudi.execution.HoodieLazyInsertIterable.lambda$getTransformFunction$0(HoodieLazyInsertIterable.java:103)
           at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.insertRecord(BoundedInMemoryQueue.java:190)
           at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:46)
           at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:106)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   ```
   
   




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1283660685

   hope this helps




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1301576908

   If `hoodie.datasource.write.drop.partition.columns=true` is set and the metadata is synced to Hive, the data can be queried through Hive; otherwise, issue #6965 occurs.
   So we need to fix this issue.




[GitHub] [hudi] nsivabalan commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1301564285

   @eric9204 : may I know if you can work with hoodie.datasource.write.drop.partition.columns=false for your use-case, or are you really looking to drop the partition columns? As called out earlier, we had some issues around dropping partition columns; we might need to spend time taking a detailed look at all the flows and fixing them.
   
   




[GitHub] [hudi] eric9204 closed issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by "eric9204 (via GitHub)" <gi...@apache.org>.
eric9204 closed issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}
URL: https://github.com/apache/hudi/issues/6966




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1291411679

   https://github.com/apache/hudi/pull/7063




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1301570933

   > If yes, I can file a jira and take a stab at triaging it based on priorities. Let us know.
   
   @nsivabalan Yes, you are right. There are no issues when I set `hoodie.datasource.write.drop.partition.columns=false`.
   This issue occurs when I set `hoodie.datasource.write.drop.partition.columns=true`.
   




[GitHub] [hudi] nsivabalan commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1286481012

   I will let @fengjian428 follow up, but I'm curious why you are setting the max delta commits config value to 0. You might as well switch to using COW; with Spark structured streaming on a MOR table, very aggressive compaction is not recommended.




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1287999251

   > Sorry, not sure I understand you. Do you mean to say the issue is resolved with the resolution given in #6128, or is there anything more pending?
   
   @nsivabalan I mean, I solved issue #6128 in my env. Besides the solution given in #6128, some additional work was needed: add `spark.sql.hive.metastore.jars /$path/hive/lib/*` to `spark-defaults.conf`. This makes sure Spark can find the class `org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat` in `hudi-hadoop-mr-bundle.jar` when Spark SQL connects to a standalone Hive metastore.
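   For reference, the same settings expressed programmatically (a sketch; the path is the same placeholder as above, and `spark.sql.hive.metastore.version` is an assumption — Spark expects it to be set to your Hive version when `spark.sql.hive.metastore.jars` points at a classpath):
   
   ```
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder()
     // Placeholder path: point it at the Hive lib dir containing hudi-hadoop-mr-bundle.jar
     .config("spark.sql.hive.metastore.jars", "/$path/hive/lib/*")
     // Assumption: set this to the actual version of your Hive metastore
     .config("spark.sql.hive.metastore.version", "2.3.9")
     .enableHiveSupport()
     .getOrCreate()
   ```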




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1286651498

   @fengjian428 issue #6128 recurred and had not been solved; I cannot reproduce your case in my env.




[GitHub] [hudi] nsivabalan commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1301564538

   If I am not wrong, you claim that you are not facing any issues when you set hoodie.datasource.write.drop.partition.columns=false, right? So the issue arises only when we set hoodie.datasource.write.drop.partition.columns=true?




[GitHub] [hudi] nsivabalan commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1283092207

   I don't think this config is supported across all flows. Is this happening for an existing table (i.e. you had a few commits with the default value for this config and later explicitly set it to true)? Or are you seeing this even for a brand-new table?




[GitHub] [hudi] nsivabalan commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1287878477

   Sorry, not sure I understand you. Do you mean to say the issue is resolved with the resolution given in #6128, or is there anything more pending?




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1311163178

   @fengjian428 When spark-sql is used to write data to Hudi, the deltacommit and compaction actions are performed one after another, so they do not influence each other. Structured streaming is different: the compaction service and the write process share the same `HoodieWriteConfig`.
   `hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/RunCompactionActionExecutor.java` line 87: `HoodieWriteConfig configCopy = config;`
   
   The `configCopy` used by the compaction service just points to the original configuration.
   
   If `hoodie.datasource.write.drop.partition.columns=true` is set, the write process removes the partition fields from the value of `hoodie.avro.schema` in the `config`, but the compaction service then resets `hoodie.avro.schema` (`RunCompactionActionExecutor.java` line 94: `configCopy.setSchema(schemaPair.getRight().get());`).
   
   So the value of `hoodie.avro.schema` used by the write process is changed too.
   
   After the first successful compaction, the reset Avro schema is inconsistent with the records from which the write process removed the partition fields, and this problem arises.
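   To make the failure concrete, here is a minimal standalone sketch (illustrative schemas, not Hudi's classes) of what `HoodieAvroUtils.bytesToAvro` effectively hits once the schemas diverge: the bytes were serialized without the partition field, so decoding them with a schema that still contains it runs off the end of the buffer.
   
   ```
   import java.io.ByteArrayOutputStream
   
   import org.apache.avro.Schema
   import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter}
   import org.apache.avro.io.{DecoderFactory, EncoderFactory}
   
   object SchemaMismatchDemo extends App {
     val writerSchema = new Schema.Parser().parse(
       """{"type":"record","name":"r","fields":[{"name":"id","type":"string"}]}""")
     val readerSchema = new Schema.Parser().parse(
       """{"type":"record","name":"r","fields":[{"name":"id","type":"string"},{"name":"part","type":"string"}]}""")
   
     // Serialize a record without the partition field, as the write process does.
     val record = new GenericData.Record(writerSchema)
     record.put("id", "id308723")
     val out = new ByteArrayOutputStream()
     val encoder = EncoderFactory.get().binaryEncoder(out, null)
     new GenericDatumWriter[GenericData.Record](writerSchema).write(record, encoder)
     encoder.flush()
   
     // Decode with the reset schema that still contains the partition field.
     val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
     new GenericDatumReader[GenericData.Record](readerSchema).read(null, decoder)
     // -> java.io.EOFException from BinaryDecoder.ensureBounds, as in the stack trace above
   }
   ```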
   
   Therefore, one way to solve the problem is to deep-copy the config at `RunCompactionActionExecutor.java` line 87 (`HoodieWriteConfig configCopy = config;`) instead of just pointing to the original configuration.
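   For illustration, a minimal sketch (with a hypothetical `WriteConfig`, not Hudi's actual class) of why the plain reference assignment leaks the compactor's schema reset back into the writer, while a deep copy would isolate the two:
   
   ```
   import java.util.Properties
   
   class WriteConfig(val props: Properties) {
     def getSchema: String = props.getProperty("hoodie.avro.schema")
     def setSchema(s: String): Unit = props.setProperty("hoodie.avro.schema", s)
     // Deep copy: a fresh Properties object, so later mutations do not leak.
     def deepCopy: WriteConfig = {
       val copy = new Properties()
       copy.putAll(props)
       new WriteConfig(copy)
     }
   }
   
   object ConfigCopyDemo extends App {
     val props = new Properties()
     props.setProperty("hoodie.avro.schema", "schema-without-partition-cols")
     val writerConfig = new WriteConfig(props)
   
     // What the reference assignment effectively does today:
     val configCopy = writerConfig                       // same object, just another reference
     configCopy.setSchema("schema-with-partition-cols")  // compactor resets the schema
     println(writerConfig.getSchema)                     // leaked: schema-with-partition-cols
   
     // The proposed fix: give the compactor an isolated copy.
     writerConfig.setSchema("schema-without-partition-cols")
     val isolated = writerConfig.deepCopy
     isolated.setSchema("schema-with-partition-cols")
     println(writerConfig.getSchema)                     // unchanged: schema-without-partition-cols
   }
   ```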
   
   Or adopt this solution: #7167. That may be a little simpler.
   
   Any suggestions?




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1293344544

   > hi @eric9204 I still cannot reproduce your case by using StructureStreamingTest, could you provide a UT for this case?
   
   @fengjian428 OK, I will provide a UT.
   I also found a new situation: if the Hudi table is non-partitioned, this problem does not occur with structured streaming.




[GitHub] [hudi] nsivabalan commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1301564746

   If yes, I can file a jira and take a stab at triaging it based on priorities. Let us know.




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1283265991

   @nsivabalan This is a new Hudi table, with no existing data.
   
   Every `deltacommit` after the first successful `compaction` fails because of the above-mentioned error.




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1283658133

   > ```
   > create table hudi_mor_nonsavepoint (
   >   id int,
   >   name string,
   >   price double,
   >   ts long,
   >   par string
   > ) using hudi
   > tblproperties (
   >   type = 'mor',
   >   primaryKey = 'id',
   >   preCombineField = 'ts',
   >   hoodie.datasource.write.operation='insert',
   >     hoodie.datasource.write.drop.partition.columns=true,
   >     hoodie.index.type='BUCKET',
   > hoodie.bucket.index.num.buckets=8,
   > hoodie.bucket.index.hash.field='id',
   > hoodie.storage.layout.partitioner.class='org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner',
   > hoodie.storage.layout.type='BUCKET'
   > )
   > partitioned by (par)
   > location 'hdfs://xxx/hudi_mor_nonsavepoint';
   >  set hoodie.datasource.write.operation=insert;
   > set hoodie.merge.allow.duplicate.on.inserts=true;
   >  set hoodie.index.type=BUCKET;
   >  set hoodie.bucket.index.num.buckets=8;
   >  set hoodie.storage.layout.partitioner.class=org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner;
   > set hoodie.storage.layout.type=BUCKET;
   > set hoodie.datasource.write.recordkey.field=id;
   >  set hoodie.bucket.index.hash.field=id;
   > insert into hudi_mor_nonsavepoint select 1, 'a1', 20, 1000,'p1';
   > insert into hudi_mor_nonsavepoint select 2, 'b1', 10, 344,'p1';
   > 
   > insert into hudi_mor_nonsavepoint select 1, 'a2', 20, 1001,'p1';
   >  set hoodie.compact.inline.max.delta.commits=0;
   >  call run_compaction(op => 'schedule', table => 'hudi_mor_nonsavepoint');
   >  call run_compaction(op => 'run', table => 'hudi_mor_nonsavepoint');
   > 
   > insert into hudi_mor_nonsavepoint select 2, 'b2', 10, 344,'p1';
   > ```
   > 
   > I tried to reproduce but the commands above work fine @eric9204 is there anything I missed?
   
   @fengjian428 I wrote to the Hudi table with Spark structured streaming; every deltacommit after the first compaction is rolled back because of the above-mentioned error.




[GitHub] [hudi] fengjian428 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1311131446

   Can you help figure out why Spark SQL works fine but structured streaming does not? @eric9204 I saw you made an API-level change in #7167.




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1286891162

   @fengjian428 I have solved issue #6128.
   I tested your reproduction case in my env. Same as you: the Spark SQL commands work fine, and they did not reproduce the problem I found with Spark structured streaming.




[GitHub] [hudi] fengjian428 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1283611915

   ```
   create table hudi_mor_nonsavepoint (
     id int,
     name string,
     price double,
     ts long,
     par string
   ) using hudi
   tblproperties (
     type = 'mor',
     primaryKey = 'id',
     preCombineField = 'ts',
     hoodie.datasource.write.operation='insert',
       hoodie.datasource.write.drop.partition.columns=true,
       hoodie.index.type='BUCKET',
   hoodie.bucket.index.num.buckets=8,
   hoodie.bucket.index.hash.field='id',
   hoodie.storage.layout.partitioner.class='org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner',
   hoodie.storage.layout.type='BUCKET'
   )
   partitioned by (par)
   location 'hdfs://xxx/hudi_mor_nonsavepoint';
    set hoodie.datasource.write.operation=insert;
   set hoodie.merge.allow.duplicate.on.inserts=true;
    set hoodie.index.type=BUCKET;
    set hoodie.bucket.index.num.buckets=8;
    set hoodie.storage.layout.partitioner.class=org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner;
   set hoodie.storage.layout.type=BUCKET;
   set hoodie.datasource.write.recordkey.field=id;
    set hoodie.bucket.index.hash.field=id;
   insert into hudi_mor_nonsavepoint select 1, 'a1', 20, 1000,'p1';
   insert into hudi_mor_nonsavepoint select 2, 'b1', 10, 344,'p1';
   
   insert into hudi_mor_nonsavepoint select 1, 'a2', 20, 1001,'p1';
    set hoodie.compact.inline.max.delta.commits=0;
    call run_compaction(op => 'schedule', table => 'hudi_mor_nonsavepoint');
    call run_compaction(op => 'run', table => 'hudi_mor_nonsavepoint');
   
   insert into hudi_mor_nonsavepoint select 2, 'b2', 10, 344,'p1';
   ```
   I tried to reproduce, but the commands above work fine.
   @eric9204, is there anything I missed?
   




[GitHub] [hudi] fengjian428 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1285886893

   > > ```
   > > create table hudi_mor_nonsavepoint (
   > >   id int,
   > >   name string,
   > >   price double,
   > >   ts long,
   > >   par string
   > > ) using hudi
   > > tblproperties (
   > >   type = 'mor',
   > >   primaryKey = 'id',
   > >   preCombineField = 'ts',
   > >   hoodie.datasource.write.operation='insert',
   > >     hoodie.datasource.write.drop.partition.columns=true,
   > >     hoodie.index.type='BUCKET',
   > > hoodie.bucket.index.num.buckets=8,
   > > hoodie.bucket.index.hash.field='id',
   > > hoodie.storage.layout.partitioner.class='org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner',
   > > hoodie.storage.layout.type='BUCKET'
   > > )
   > > partitioned by (par)
   > > location 'hdfs://xxx/hudi_mor_nonsavepoint';
   > >  set hoodie.datasource.write.operation=insert;
   > > set hoodie.merge.allow.duplicate.on.inserts=true;
   > >  set hoodie.index.type=BUCKET;
   > >  set hoodie.bucket.index.num.buckets=8;
   > >  set hoodie.storage.layout.partitioner.class=org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner;
   > > set hoodie.storage.layout.type=BUCKET;
   > > set hoodie.datasource.write.recordkey.field=id;
   > >  set hoodie.bucket.index.hash.field=id;
   > > insert into hudi_mor_nonsavepoint select 1, 'a1', 20, 1000,'p1';
   > > insert into hudi_mor_nonsavepoint select 2, 'b1', 10, 344,'p1';
   > > 
   > > insert into hudi_mor_nonsavepoint select 1, 'a2', 20, 1001,'p1';
   > >  set hoodie.compact.inline.max.delta.commits=0;
   > >  call run_compaction(op => 'schedule', table => 'hudi_mor_nonsavepoint');
   > >  call run_compaction(op => 'run', table => 'hudi_mor_nonsavepoint');
   > > 
   > > insert into hudi_mor_nonsavepoint select 2, 'b2', 10, 344,'p1';
   > > ```
   > >
   > > I tried to reproduce, but the commands above work fine. @eric9204, is there anything I missed?
   > 
   > @fengjian428 I wrote to the Hudi table with Spark structured streaming; every deltacommit after the first compaction is rolled back because of the above-mentioned error.
   
   OK, can you also try my reproduction case to check whether it works in your env?




[GitHub] [hudi] fengjian428 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1292804009

   Hi @eric9204, I still cannot reproduce your case using StructureStreamingTest. Could you provide a UT for this case?




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1289946021

   ```
   // dataFrame and hoodieConf are defined elsewhere; the full reproducer is in a later comment.
   import org.apache.spark.sql.streaming.{OutputMode, Trigger}
   
   dataFrame.writeStream
     .format("org.apache.hudi").options(hoodieConf)
     .outputMode(OutputMode.Append())
     .trigger(Trigger.ProcessingTime("10 seconds"))
     .start()
     .awaitTermination()
   ```




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1294832847

   @fengjian428 This case reproduces the above issue:
   ```
   import java.io.EOFException
   
   import org.apache.spark.SparkConf
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}
   import org.apache.spark.sql.streaming.{OutputMode, StreamingQueryListener, Trigger}
   
   object RatePerMicroBatchSourceTest {
     def main(args: Array[String]): Unit = {
       val sparkConf = new SparkConf().set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       val spark = SparkSession.builder().config(sparkConf).master("local[6]").getOrCreate()
       val dataFrame = spark.readStream
         .format("rate")
         .option("rowsPerBatch", 1)
         .option("numPartitions", 1)
         .option("startTimestamp", 0)
         .option("advanceMillisPerBatch", 1000)
         .load()
       dataFrame.createTempView("t")
       val resultDf = spark.sql(
         """
           |select timestamp,value,date_format(now(),'yyyyMMddHH') as part from t
           |""".stripMargin)
       try {
         runListener(spark)
         resultDf.writeStream
           .queryName("RateStreamSource")
           .format("hudi")
           .options(getHudiConfig)
           .outputMode(OutputMode.Append())
           .trigger(Trigger.ProcessingTime("10 seconds"))
           .start()
           .awaitTermination()
       } catch {
         case e: EOFException => println("eof: " + e)
         case _: Exception => println("query stopped")
       }
     }
   
     def runListener(spark: SparkSession) = {
       spark.streams.addListener(new StreamingQueryListener() {
         override def onQueryStarted(queryStarted: QueryStartedEvent): Unit = {
           println("查询开始: " + queryStarted.id)
         }
   
         override def onQueryTerminated(queryTerminated: QueryTerminatedEvent): Unit = {
           println(queryTerminated.id)
         }
   
         override def onQueryProgress(queryProgress: QueryProgressEvent): Unit = {
           if (queryProgress.progress.batchId >= 4) spark.streams.active.foreach(_.stop())
           println("查询进度 batchId : " + queryProgress.progress.batchId)
         }
       })
       spark.streams.active.foreach(_.stop())
     }
   
     def getHudiConfig = {
       Map(
         "hoodie.datasource.write.operation" -> "insert",
         "hoodie.merge.allow.duplicate.on.inserts" -> "true",
         "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
         "hoodie.datasource.write.precombine.field" -> "timestamp",
         "hoodie.datasource.write.recordkey.field" -> "value",
         "hoodie.datasource.write.partitionpath.field" -> "part",
         "hoodie.table.name" -> "hudiTest",
         "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.SimpleKeyGenerator",
         "hoodie.datasource.write.drop.partition.columns" -> "true",
         "hoodie.upsert.shuffle.parallelism" -> "1",
         "hoodie.insert.shuffle.parallelism" -> "1",
         "hoodie.datasource.compaction.async.enable" -> "false",
         "hoodie.compact.inline"->"true",
         "hoodie.compact.inline.max.delta.commits" -> "3",
         "hoodie.index.type" -> "BUCKET",
         "hoodie.bucket.index.num.buckets" -> "1",
         "hoodie.bucket.index.hash.field" -> "value",
         "hoodie.storage.layout.partitioner.class" -> "org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner",
         "hoodie.storage.layout.type" -> "BUCKET",
         "hoodie.metadata.enable" -> "false",
         "hoodie.embed.timeline.server" -> "false",
         "path" -> "/tmp/hudi/hudiTest",
         "checkpointLocation" -> "/tmp/hudi/ckp"
       )
     }
   }
   ```




[GitHub] [hudi] fengjian428 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1281983165

   How do you insert your data? Is there any schema change?




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1282067543

   > @eric9204 How do you insert your data? Is there any schema change?
   
   @fengjian428 The schema of the table has not changed; only this parameter was added: `hoodie.datasource.write.drop.partition.columns=true`.
   Without this parameter, the program runs normally.
   




[GitHub] [hudi] eric9204 commented on issue #6966: [SUPPORT]HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=id308723 partitionPath=202210141643}, currentLocation='null', newLocation='null'}

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #6966:
URL: https://github.com/apache/hudi/issues/6966#issuecomment-1309826887

   #7167 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org