Posted to issues@spark.apache.org by "Hélder Hugo Ferreira (Jira)" <ji...@apache.org> on 2020/10/16 11:35:00 UTC

[jira] [Commented] (SPARK-32032) Eliminate deprecated poll(long) API calls to avoid infinite wait in driver

    [ https://issues.apache.org/jira/browse/SPARK-32032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215352#comment-17215352 ] 

Hélder Hugo Ferreira commented on SPARK-32032:
----------------------------------------------

Hi, we are getting the following error when using the Kafka stream reader with _option("startingOffsetsByTimestamp")_:

{code:java}
WARN KafkaOffsetReader: Error in attempt 1 getting Kafka offsets:
java.lang.AssertionError: assertion failed: No offset matched from request of topic-partition MyTopic-6 and timestamp 1602673380000.
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$fetchSpecificTimestampBasedOffsets$6(KafkaOffsetReader.scala:238)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$fetchSpecificTimestampBasedOffsets$4(KafkaOffsetReader.scala:236)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$fetchSpecificOffsets0$1(KafkaOffsetReader.scala:265)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$partitionsAssignedToConsumer$2(KafkaOffsetReader.scala:550)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$withRetriesWithoutInterrupt$1(KafkaOffsetReader.scala:600)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.withRetriesWithoutInterrupt(KafkaOffsetReader.scala:599)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$partitionsAssignedToConsumer$1(KafkaOffsetReader.scala:536)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.runUninterruptibly(KafkaOffsetReader.scala:567)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.partitionsAssignedToConsumer(KafkaOffsetReader.scala:536)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.fetchSpecificOffsets0(KafkaOffsetReader.scala:261)
    at org.apache.spark.sql.kafka010.KafkaOffsetReader.fetchSpecificTimestampBasedOffsets(KafkaOffsetReader.scala:254)
    at org.apache.spark.sql.kafka010.KafkaMicroBatchStream.$anonfun$getOrCreateInitialPartitionOffsets$1(KafkaMicroBatchStream.scala:157)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.kafka010.KafkaMicroBatchStream.getOrCreateInitialPartitionOffsets(KafkaMicroBatchStream.scala:148)
    at org.apache.spark.sql.kafka010.KafkaMicroBatchStream.initialOffset(KafkaMicroBatchStream.scala:76)
    at ...
{code}
We found that this topic partition contained only 1 message older than the timestamps specified in _startingOffsetsByTimestamp_. We are using Kafka 2.5.1.
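
A minimal sketch of the lookup semantics that appear to be involved, assuming the offset is resolved the way Kafka's _offsetsForTimes_ is documented to behave: the earliest offset whose timestamp is greater than or equal to the requested one, or no result when every message in the partition is older. The class and method names below are hypothetical and model a partition as a plain array; this is not Spark or Kafka code.

```java
import java.util.OptionalLong;

// Hypothetical stand-in for a timestamp-to-offset lookup on one partition.
final class TimestampLookup {

    // timestampsMs[i] is the timestamp (epoch millis) of the message at offset i.
    // Returns the earliest offset whose timestamp is >= targetMs,
    // or empty when every message is older than targetMs.
    static OptionalLong earliestOffsetAtOrAfter(long[] timestampsMs, long targetMs) {
        for (int offset = 0; offset < timestampsMs.length; offset++) {
            if (timestampsMs[offset] >= targetMs) {
                return OptionalLong.of(offset);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        long requested = 1602673380000L; // timestamp taken from the log above
        // Assumed partition content: one message, one minute older than requested.
        long[] partition = { requested - 60_000L };

        OptionalLong match = earliestOffsetAtOrAfter(partition, requested);
        System.out.println(match.isPresent()
                ? "matched offset " + match.getAsLong()
                : "no offset matched");
    }
}
```

With a partition that holds only older messages, the sketch prints "no offset matched", which would correspond to the case the assertion in the log rejects.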

We noticed that the PR for this issue changes the _fetchSpecificTimestampBasedOffsets_ function. Is there any chance that it will also resolve our issue?

Thanks in advance.

Best Regards,

Hélder Hugo Ferreira
 

> Eliminate deprecated poll(long) API calls to avoid infinite wait in driver
> --------------------------------------------------------------------------
>
>                 Key: SPARK-32032
>                 URL: https://issues.apache.org/jira/browse/SPARK-32032
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>    Affects Versions: 3.1.0
>            Reporter: Gabor Somogyi
>            Priority: Major
>
