Posted to dev@kafka.apache.org by Jungtaek Lim <ka...@gmail.com> on 2019/08/07 07:24:01 UTC

Alternative of poll(0) without pulling records

Hi devs,

I'm trying to replace the deprecated poll(long) with poll(Duration), and
realized there's no alternative that behaves exactly the same as poll(0):
poll(0) has been used as a hack to update metadata only, without pulling
records. poll(Duration.ZERO) doesn't behave the same way, since the metadata
update itself is subject to the timeout. So end users now need to pass a
longer timeout, and may pull some records even though they're only
interested in metadata.

I looked back at the KIPs that brought the change, and a "discarded" KIP
(KIP-288 [1]) actually proposed a new API that only pulls metadata.
KIP-266 [2] was picked up instead, but it didn't cover everything that
KIP-288 proposed. I've seen some docs explaining that poll(0) was never
officially supported, but the hack has been widely used, so it can't be
ignored.

Kafka's test code itself relies on either the deprecated poll(0)
or updateAssignmentMetadataIfNeeded, which seems to be a private API meant
only for testing.
(Btw, I'll try replacing poll(0) with updateAssignmentMetadataIfNeeded to
avoid the deprecated method - if it works I'll submit a PR.)

I feel it would be ideal to expose
`updateAssignmentMetadataIfNeeded` in the public API, perhaps renamed to
`waitForAssignment`, as proposed in KIP-288, if the current name feels too
long.

What do you think? If it sounds feasible, I'd like to try contributing
this. I'm new to contributing to the Kafka community, so I'm not sure
whether it would require a new KIP.

Thanks,
Jungtaek Lim (HeartSaVioR)

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method
[2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-266%3A+Fix+consumer+indefinite+blocking+behavior
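
For readers following along: with only the public API, the "wait until the
consumer has an assignment" behaviour discussed in this message can be
approximated by polling in short slices until an assignment shows up or an
overall deadline passes. The sketch below is an illustration only, not a
Kafka API. `AssignmentWaiter` and its parameter names are hypothetical;
`pollStep` stands in for a call such as `consumer.poll(slice)` and
`currentAssignment` stands in for `consumer.assignment()`. It uses only the
JDK so it can be run without a broker.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class AssignmentWaiter {

    /**
     * Calls pollStep with short timeouts until currentAssignment returns a
     * non-empty set or the overall deadline passes. Returns the last observed
     * assignment, which is empty if the deadline passed first.
     *
     * Assumption: in real code pollStep would wrap KafkaConsumer.poll(Duration)
     * and currentAssignment would wrap KafkaConsumer.assignment().
     */
    public static <T> Set<T> waitForAssignment(Consumer<Duration> pollStep,
                                               Supplier<Set<T>> currentAssignment,
                                               Duration overall,
                                               Duration step) {
        long deadline = System.nanoTime() + overall.toNanos();
        Set<T> assigned = currentAssignment.get();
        while (assigned.isEmpty()) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break; // overall timeout reached without an assignment
            }
            // Never block longer than the time left before the deadline.
            Duration slice = Duration.ofNanos(Math.min(remaining, step.toNanos()));
            pollStep.accept(slice);
            assigned = currentAssignment.get();
        }
        return assigned;
    }
}
```

A real `poll(Duration)` may still return records during this loop, which is
exactly the drawback raised in this thread; the caller would have to handle
or rewind them.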

Re: Alternative of poll(0) without pulling records

Posted by Jungtaek Lim <ka...@gmail.com>.
Thanks Viktor for guiding me through this!

I'll start a new thread to ask for edit permission on the wiki. Once I have
permission, I'll put together a simple KIP page and start a discussion thread.

Thanks again,
Jungtaek Lim

On Thu, Aug 8, 2019 at 9:42 PM Viktor Somogyi-Vass <vi...@gmail.com>
wrote:

> Hey Jungtaek,
>
> Thanks for your interest; I also sometimes think such an API would be a
> good thing.
> I don't see any strong reason in either KIP-288 or KIP-266 why such
> an API shouldn't be created, so go ahead with it, although you'll need to
> create a short KIP for this, as the KafkaConsumer class is considered to be
> a public API.
>
> Best,
> Viktor


-- 
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior

Re: Alternative of poll(0) without pulling records

Posted by Viktor Somogyi-Vass <vi...@gmail.com>.
Hey Jungtaek,

Thanks for your interest; I also sometimes think such an API would be a
good thing.
I don't see any strong reason in either KIP-288 or KIP-266 why such
an API shouldn't be created, so go ahead with it, although you'll need to
create a short KIP for this, as the KafkaConsumer class is considered to be
a public API.

Best,
Viktor

On Wed, Aug 7, 2019 at 9:26 AM Jungtaek Lim <ka...@gmail.com> wrote:

> If we just wanted to remove the deprecation and let both coexist, that
> would also be viable, though `poll(0)` is still a hack and it would be
> ideal to provide an official way to do this.

Re: Alternative of poll(0) without pulling records

Posted by Jungtaek Lim <ka...@gmail.com>.
If we just wanted to remove the deprecation and let both coexist, that
would also be viable, though `poll(0)` is still a hack and it would be
ideal to provide an official way to do this.


Re: Alternative of poll(0) without pulling records

Posted by Gabor Somogyi <ga...@gmail.com>.
@Jungtaek, thanks for the explanation!
@Colin, please see the attached code in my previous mail for all the
details.

On Sun, Aug 11, 2019 at 1:20 PM Jungtaek Lim <ka...@gmail.com> wrote:

> Btw, I'd like to ask you to move to the KIP discussion thread, as it will
> help us reach a conclusion faster and keep the discussion in a single
> channel.

Re: Alternative of poll(0) without pulling records

Posted by Jungtaek Lim <ka...@gmail.com>.
Btw, I'd like to ask you to move to the KIP discussion thread, as it will
help us reach a conclusion faster and keep the discussion in a single
channel.

On Sun, Aug 11, 2019 at 8:16 PM Jungtaek Lim <ka...@gmail.com> wrote:

> So we have a use case in which we don't rely entirely on what the Kafka
> consumer provides. We want to know the current assignment of this consumer,
> and to get the latest assignment, we called `poll(0)` as a hack.
>
> That said, we don't want to pull any records here, and there's no way to
> accomplish this. Please correct me if I'm missing something.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)

Re: Alternative of poll(0) without pulling records

Posted by Jungtaek Lim <ka...@gmail.com>.
So we have a use case in which we don't rely entirely on what the Kafka
consumer provides. We want to know the current assignment of this consumer,
and to get the latest assignment, we called `poll(0)` as a hack.

That said, we don't want to pull any records here, and there's no way to
accomplish this. Please correct me if I'm missing something.

Thanks,
Jungtaek Lim (HeartSaVioR)
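
One possible workaround, offered only as a sketch and not as what Spark or
Kafka actually do: call poll with a real timeout, throw away whatever
records come back, and seek each affected partition back to the offset of
its first returned record so that nothing is skipped later. The helper
below computes those rewind targets. `RewindTargets` and `firstOffsets` are
hypothetical names, partitions are modelled as plain strings, and the input
map stands in for what one would read from `ConsumerRecords` (per-partition
record offsets, in fetch order). It uses only the JDK so it can be run
without a broker.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RewindTargets {

    /**
     * For each partition that returned at least one record, yields the offset
     * of the first returned record. Seeking the consumer back to that offset
     * would make the discarded records available to the next real poll.
     *
     * Assumption: offsets within a partition are listed in ascending order,
     * as they are when records are returned by a single poll.
     */
    public static Map<String, Long> firstOffsets(Map<String, List<Long>> fetchedOffsets) {
        Map<String, Long> targets = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> entry : fetchedOffsets.entrySet()) {
            List<Long> offsets = entry.getValue();
            if (!offsets.isEmpty()) {
                targets.put(entry.getKey(), offsets.get(0));
            }
        }
        return targets;
    }
}
```

With the real client, each target would be applied with
`consumer.seek(partition, offset)`. This adds a round of fetching that is
then discarded, which is why a metadata-only API would still be preferable.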

On Sat, Aug 10, 2019 at 7:32 AM Colin McCabe <cm...@apache.org> wrote:

> Hi Gabor,
>
> What is it that you want to do here?  If you just want to check that the
> partitions exist, but not fetch any data, you could use
> AdminClient#describeTopics for that.  If you want to create the topics, you
> could use AdminClient#createTopics.
>
> best,
> Colin

Re: Alternative of poll(0) without pulling records

Posted by Colin McCabe <cm...@apache.org>.
Hi Gabor,

What is it that you want to do here?  If you just want to check that the partitions exist, but not fetch any data, you could use AdminClient#describeTopics for that.  If you want to create the topics, you could use AdminClient#createTopics.

best,
Colin


On Fri, Aug 9, 2019, at 11:23, Gabor Somogyi wrote:
> > Each KafkaConsumer method that returns metadata will already block until
> > such metadata is available
> The old API waited indefinitely, but the new one has a timeout that
> applies to the metadata fetch as well.
> 
> Spark is interested only in the assigned partitions and/or the
> latest/earliest/... offsets.
> Please see
> https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
> At the moment the old poll(0) is used, which can block indefinitely, and
> the API is marked deprecated.
> We would like to switch to the new API; see the background here:
> https://github.com/apache/spark/pull/25135
> However, in this case there is no need to fetch any data, since the
> driver is not doing any data processing.
> 
> G

Re: Alternative of poll(0) without pulling records

Posted by Gabor Somogyi <ga...@gmail.com>.
> Each KafkaConsumer method that returns metadata will already block until
> such metadata is available
The old API waited indefinitely, but the new one has a timeout that
applies to the metadata fetch as well.

Spark is interested only in the assigned partitions and/or the
latest/earliest/... offsets.
Please see
https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
At the moment the old poll(0) is used, which can block indefinitely, and
the API is marked deprecated.
We would like to switch to the new API; see the background here:
https://github.com/apache/spark/pull/25135
However, in this case there is no need to fetch any data, since the
driver is not doing any data processing.

G


On Fri, Aug 9, 2019 at 7:39 PM Ryanne Dolan <ry...@gmail.com> wrote:

> > pull some records even they're only interested in metadata.
>
> Jungtaek, what is the use-case here? Each KafkaConsumer method that returns
> metadata will already block until such metadata is available... so why
> would you need to apply this "hack" in the first place?
>
> Ryanne

Re: Alternative of poll(0) without pulling records

Posted by Ryanne Dolan <ry...@gmail.com>.
> pull some records even they're only interested in metadata.

Jungtaek, what is the use-case here? Each KafkaConsumer method that returns
metadata will already block until such metadata is available... so why
would you need to apply this "hack" in the first place?

Ryanne
