Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/08 22:49:28 UTC

[GitHub] [hudi] prashanthpdesai opened a new issue #1811: Deltastreamer Offset exception -Prod

prashanthpdesai opened a new issue #1811:
URL: https://github.com/apache/hudi/issues/1811


   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
   Yes
   
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   The DeltaStreamer is picking up an offset that is not available in the topic, even though we pass a new group id.
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :
   0.5.2
   
   * Spark version :
   2.2.1
   
   * Hive version :
   
   * Hadoop version :
   2.7
   
   * Storage (HDFS/S3/GCS..) :
   hdfs
   
   * Running on Docker? (yes/no) :
   no
   
   
   **Additional context**
   
   Our topic has 600 partitions, each with its own offsets.
   
   **Stacktrace**
   
   20/07/08 09:17:43 INFO kafka010.KafkaRDD: Computing topic topic.v1, partition 0 offsets 0 -> 16667
   20/07/08 09:17:43 INFO kafka010.KafkaDataConsumer: Initializing cache 16 64 0.75
   20/07/08 09:17:43 INFO consumer.ConsumerConfig: ConsumerConfig values:
   	auto.commit.interval.ms = 5000
   	auto.offset.reset = none
   	check.crcs = true
   	client.id =
   	connections.max.idle.ms = 540000
   	enable.auto.commit = false
   	exclude.internal.topics = true
   	fetch.max.bytes = 52428800
   	fetch.max.wait.ms = 500
   	fetch.min.bytes = 1
   	fs.mapr.hardmount = true
   	fs.mapr.rpc.timeout = 300
   	group.id = spark-executor-hudi-prod-elg-latest-topic-test
   	heartbeat.interval.ms = 3000
   	interceptor.classes = null
   	internal.leave.group.on.close = true
   	isolation.level = read_uncommitted
   	key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
   	max.partition.fetch.bytes = 1048576
   	max.poll.interval.ms = 300000
   	max.poll.records = 500
   	metadata.max.age.ms = 300000
   	metric.reporters = []
   	metrics.num.samples = 2
   	metrics.recording.level = INFO
   	metrics.sample.window.ms = 30000
   	partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
   	receive.buffer.bytes = 65536
   	reconnect.backoff.max.ms = 1000
   	reconnect.backoff.ms = 50
   	request.timeout.ms = 305000
   	retry.backoff.ms = 100
   	ssl.endpoint.identification.algorithm = null
   	streams.zerooffset.record.on.eof = false
   	value.deserializer = class io.confluent.kafka.serializers.KafkaAvroDeserializer
   
   20/07/08 09:17:43 INFO serializers.KafkaAvroDeserializerConfig: KafkaAvroDeserializerConfig values:
   	schema.registry.url = [xxx..com]
   	max.schemas.per.subject = 1000
   	specific.avro.reader = false
   
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.parquet.max.file.size' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.schemaprovider.registry.url' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.parquet.small.file.limit' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.bulkinsert.shuffle.parallelism' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.kafka.source.maxEvents' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.upsert.shuffle.parallelism' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.insert.shuffle.parallelism' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'metadata.broker.list' was supplied but isn't a known config.
   20/07/08 09:17:44 WARN consumer.ConsumerConfig: The configuration 'hoodie.compact.inline.max.delta.commits' was supplied but isn't a known config.
   20/07/08 09:17:44 INFO utils.AppInfoParser: Kafka version : 1.0.1-mapr-1803
   20/07/08 09:17:44 INFO utils.AppInfoParser: Kafka commitId : 236acd265c09ea55
   20/07/08 09:17:44 INFO kafka010.InternalKafkaConsumer: Initial fetch for spark-executor-hudi-prod-elg-latest-topic-test topic.v1-0 0
   20/07/08 09:17:44 INFO kafka010.InternalKafkaConsumer: Buffer miss for spark-executor-hudi-prod-elg-latest-topic-test topic.v1-0 0
   20/07/08 09:17:44 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
   **java.lang.IllegalArgumentException: requirement failed: Got wrong record for spark-executor-hudi-prod-elg-latest-topic-test topic.v1-0 even after seeking to offset 0 got offset 17424315 instead. If this is a compacted topic, consider enabling spark.streaming.kafka.allowNonConsecutiveOffsets**
   	at scala.Predef$.require(Predef.scala:224)
   	at org.apache.spark.streaming.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:146)
   	at org.apache.spark.streaming.kafka010.KafkaDataConsumer$class.get(KafkaDataConsumer.scala:36)
   	at org.apache.spark.streaming.kafka010.KafkaDataConsumer$CachedKafkaDataConsumer.get(KafkaDataConsumer.scala:212)
   	at org.apache.spark.streaming.kafka010.KafkaRDDIterator.next(KafkaRDD.scala:261)
   	at org.apache.spark.streaming.kafka010.KafkaRDDIterator.next(KafkaRDD.scala:229)
   	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
   	at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
   	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
   	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
   	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
   	at scala.collection.AbstractIterator.to(Iterator.scala:1336)
   	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
   	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
   	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
   	at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
   	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
   	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
   	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
   	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
   	at org.apache.spark.scheduler.Task.run(Task.scala:108)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   20/07/08 09:17:44 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1
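
One way to read the log above: the task was asked to read offsets 0 -> 16667 of partition 0, but the first record the consumer returned after seeking to 0 was at offset 17424315, so the requested range appears to lie outside what the partition currently holds. The sketch below is only an illustration of that check, not Hudi or Spark code. `OffsetRange` and `check_range` are hypothetical names, treating 17424315 as the earliest available offset is an assumption drawn from the error message, and the `latest` value is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Half-open offset range [start, end) for one topic partition."""
    partition: int
    start: int
    end: int


def check_range(requested: OffsetRange, earliest: int, latest: int) -> str:
    """Classify a requested range against the offsets the partition still holds.

    `earliest` and `latest` are hypothetical inputs; in practice they would be
    looked up from the Kafka consumer for the partition in question.
    """
    if requested.start < earliest:
        return (f"partition {requested.partition}: start {requested.start} "
                f"is below earliest available offset {earliest}")
    if requested.end > latest:
        return (f"partition {requested.partition}: end {requested.end} "
                f"is beyond latest offset {latest}")
    return "ok"


# Values for `requested` and `earliest` are taken from the log lines above;
# `latest` is an invented placeholder.
requested = OffsetRange(partition=0, start=0, end=16667)
print(check_range(requested, earliest=17424315, latest=17500000))
```

If a check like this fails for a partition, the starting offsets handed to the job (for example from an earlier checkpoint) are the first thing to look at, since a new group id alone would not change them.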
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] vinothchandar commented on issue #1811: Deltastreamer Offset exception -Prod

Posted by GitBox <gi...@apache.org>.

vinothchandar commented on issue #1811:
URL: https://github.com/apache/hudi/issues/1811#issuecomment-655885342


   This seems related to Spark bug https://issues.apache.org/jira/browse/SPARK-17147
   
   It should be fixed in Spark 2.4. Can you upgrade Spark and try?



[GitHub] [hudi] bvaradar commented on issue #1811: Deltastreamer Offset exception -Prod

Posted by GitBox <gi...@apache.org>.

bvaradar commented on issue #1811:
URL: https://github.com/apache/hudi/issues/1811#issuecomment-668662589


   Please reopen if the issue persists.



[GitHub] [hudi] bvaradar closed issue #1811: Deltastreamer Offset exception -Prod

Posted by GitBox <gi...@apache.org>.

bvaradar closed issue #1811:
URL: https://github.com/apache/hudi/issues/1811


   



[GitHub] [hudi] vinothchandar commented on issue #1811: Deltastreamer Offset exception -Prod

Posted by GitBox <gi...@apache.org>.

vinothchandar commented on issue #1811:
URL: https://github.com/apache/hudi/issues/1811#issuecomment-655883408


   This seems related to HUDI-1007. Great, we have a stacktrace now.



[GitHub] [hudi] prashanthpdesai commented on issue #1811: Deltastreamer Offset exception -Prod

Posted by GitBox <gi...@apache.org>.

prashanthpdesai commented on issue #1811:
URL: https://github.com/apache/hudi/issues/1811#issuecomment-656285157


   @vc: Sure, I will upgrade Spark to version 2.4 and give it a try.

