Posted to user@spark.apache.org by Erwan ALLAIN <ea...@gmail.com> on 2015/10/06 11:46:07 UTC

Can KafkaCluster be made public?

Hello,

I'm currently testing Spark Streaming with Kafka.
I'm creating a direct stream with KafkaUtils and everything works fine. However,
I would like to use the signature that lets me specify my own message handler
(to work with partitions and offsets). In that case, I need to manage
offsets and partitions myself to fill the fromOffsets argument.
I found a JIRA issue for this use case,
https://issues.apache.org/jira/browse/SPARK-6714, but it was closed
as being too specific.
I'm aware that this can be done with the Kafka API (TopicMetadataRequest and
OffsetRequest), but what I would have to write is almost the same as KafkaCluster,
which is private.
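
For readers following along, here is a minimal sketch of the two arguments that signature expects. It assumes the Spark 1.x direct-stream overload KafkaUtils.createDirectStream(ssc, kafkaParams, fromOffsets, messageHandler). The case classes are simplified stand-ins for kafka.common.TopicAndPartition and kafka.message.MessageAndMetadata so the snippet runs without Spark or Kafka on the classpath, and the topic name and offsets are invented.

```scala
// Simplified stand-ins (assumptions) for kafka.common.TopicAndPartition and
// kafka.message.MessageAndMetadata, so this runs without Kafka on the classpath.
case class TopicAndPartition(topic: String, partition: Int)
case class MessageAndMetadata[K, V](topic: String, partition: Int, offset: Long, key: K, message: V)

object DirectStreamArgs {
  // Starting offset per partition. In practice these come from your own
  // offset store or from a Kafka offset lookup; the values here are invented.
  val fromOffsets: Map[TopicAndPartition, Long] = Map(
    TopicAndPartition("events", 0) -> 42L,
    TopicAndPartition("events", 1) -> 17L
  )

  // Keeps partition and offset next to the payload instead of dropping them.
  val messageHandler: MessageAndMetadata[String, String] => (Int, Long, String) =
    mmd => (mmd.partition, mmd.offset, mmd.message)

  // With the real types, both values would be passed as the last two arguments:
  //   KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder,
  //     (Int, Long, String)](ssc, kafkaParams, fromOffsets, messageHandler)

  def main(args: Array[String]): Unit = {
    val record = MessageAndMetadata("events", 0, 42L, "k", "hello")
    println(messageHandler(record))
  }
}
```

The hard part the message describes is producing fromOffsets in the first place, which is the lookup that KafkaCluster already implements.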

Is it possible to:
 - add another signature to KafkaUtils?
 - make KafkaCluster public?

Or do you have another smart solution that doesn't require me to copy/paste
KafkaCluster?

Thanks.

Regards,
Erwan ALLAIN

Re: Can KafkaCluster be made public?

Posted by Ted Yu <yu...@gmail.com>.
Or maybe annotate with @DeveloperApi

Cheers

On Tue, Oct 6, 2015 at 7:24 AM, Cody Koeninger <co...@koeninger.org> wrote:

> I personally think KafkaCluster (or the equivalent) should be made
> public.  When I'm deploying spark I just sed out the private[spark] and
> rebuild.
>
> There's a general reluctance to make things public due to backwards
> compatibility, but if enough people ask for it... ?

Re: Can KafkaCluster be made public?

Posted by Cody Koeninger <co...@koeninger.org>.
If anyone is interested in keeping tabs on it, the jira for this is

https://issues.apache.org/jira/browse/SPARK-10963

On Wed, Oct 7, 2015 at 3:16 AM, Erwan ALLAIN <ea...@gmail.com>
wrote:

> Thanks, guys!

Re: Can KafkaCluster be made public?

Posted by Erwan ALLAIN <ea...@gmail.com>.
Thanks, guys!

On Wed, Oct 7, 2015 at 1:41 AM, Cody Koeninger <co...@koeninger.org> wrote:

> Sure no prob.

Re: Can KafkaCluster be made public?

Posted by Cody Koeninger <co...@koeninger.org>.
Sure no prob.

On Tue, Oct 6, 2015 at 6:35 PM, Tathagata Das <td...@databricks.com> wrote:

> Given the interest, I am also inclined towards making it a public
> developer API. Maybe even experimental. Cody, mind submitting a patch?

Re: Can KafkaCluster be made public?

Posted by Tathagata Das <td...@databricks.com>.
Given the interest, I am also inclined towards making it a public
developer API. Maybe even experimental. Cody, mind submitting a patch?


On Tue, Oct 6, 2015 at 7:45 AM, Sean Owen <so...@cloudera.com> wrote:

> For what it's worth, I also use this class in an app, but from Java code,
> where it behaves as if it were public. So it's no problem
> for my use case, but I suppose it's another small vote for the usefulness
> of this class to callers. I end up using getLatestLeaderOffsets to
> work out the initial offsets.

Re: Can KafkaCluster be made public?

Posted by Sean Owen <so...@cloudera.com>.
For what it's worth, I also use this class in an app, but from Java code,
where it behaves as if it were public. So it's no problem
for my use case, but I suppose it's another small vote for the usefulness
of this class to callers. I end up using getLatestLeaderOffsets to
work out the initial offsets.
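
As a rough illustration of that last step, the sketch below turns a leader-offset lookup result into the Map[TopicAndPartition, Long] shape that fromOffsets expects. It assumes the lookup returns either a list of errors or a map of per-partition leader offsets, which is how KafkaCluster in Spark 1.x is understood to behave. The case classes are simplified stand-ins rather than the real Kafka and Spark types, and the broker name and offset are invented.

```scala
// Simplified stand-ins (assumptions) modelled on kafka.common.TopicAndPartition
// and the LeaderOffset value that KafkaCluster returns in Spark 1.x.
case class TopicAndPartition(topic: String, partition: Int)
case class LeaderOffset(host: String, port: Int, offset: Long)

object InitialOffsets {
  // Collapses a lookup result into the Map[TopicAndPartition, Long] shape
  // that fromOffsets expects; lookup errors are passed through unchanged.
  def toFromOffsets(
      lookup: Either[Seq[Throwable], Map[TopicAndPartition, LeaderOffset]]
  ): Either[Seq[Throwable], Map[TopicAndPartition, Long]] =
    lookup match {
      case Right(leaders) => Right(leaders.map { case (tp, lo) => tp -> lo.offset })
      case Left(errors)   => Left(errors)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical lookup result; a real one would come from getLatestLeaderOffsets.
    val lookup = Right(Map(TopicAndPartition("events", 0) -> LeaderOffset("broker1", 9092, 120L)))
    println(toFromOffsets(lookup))
  }
}
```

Keeping the error side of the result explicit means the caller decides whether a failed lookup should stop the job or fall back to stored offsets.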

On Tue, Oct 6, 2015 at 3:24 PM, Cody Koeninger <co...@koeninger.org> wrote:
> I personally think KafkaCluster (or the equivalent) should be made public.
> When I'm deploying spark I just sed out the private[spark] and rebuild.
>
> There's a general reluctance to make things public due to backwards
> compatibility, but if enough people ask for it... ?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Can KafkaCluster be made public?

Posted by Cody Koeninger <co...@koeninger.org>.
I personally think KafkaCluster (or the equivalent) should be made public.
When I'm deploying spark I just sed out the private[spark] and rebuild.

There's a general reluctance to make things public due to backwards
compatibility, but if enough people ask for it... ?

On Tue, Oct 6, 2015 at 6:51 AM, Jonathan Coveney <jc...@gmail.com> wrote:

> You can put a class in the org.apache.spark namespace to access anything
> that is private[spark]. You can then make enrichments there to access
> whatever you need. Just beware upgrade pain :)

Re: Can KafkaCluster be made public?

Posted by Jonathan Coveney <jc...@gmail.com>.
You can put a class in the org.apache.spark namespace to access anything
that is private[spark]. You can then make enrichments there to access
whatever you need. Just beware upgrade pain :)
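
A minimal, self-contained sketch of that trick follows. HiddenCluster is a made-up stand-in for a private[spark] class such as KafkaCluster, and ClusterBridge and the myapp package are hypothetical names. With the real class, only the bridge would live in your own source tree, and it would break whenever the internal API changes.

```scala
// Stand-in for a Spark internal: visible only inside the org.apache.spark package tree.
package org.apache.spark.streaming.kafka {
  private[spark] class HiddenCluster(val brokers: String) {
    def describe: String = s"cluster at $brokers"
  }
}

// Your own code, deliberately placed under org.apache.spark so that the
// private[spark] qualifier lets it through; it re-exports only what you need.
package org.apache.spark.myapp {
  import org.apache.spark.streaming.kafka.HiddenCluster

  object ClusterBridge {
    // The public signature exposes String only, never the private type itself.
    def describe(brokers: String): String = new HiddenCluster(brokers).describe
  }
}

// Code outside org.apache.spark goes through the public bridge only;
// `new HiddenCluster(...)` would not compile here.
package com.example {
  object Main {
    def main(args: Array[String]): Unit =
      println(org.apache.spark.myapp.ClusterBridge.describe("localhost:9092"))
  }
}
```

The bridge's public methods must not mention the private type in their signatures, otherwise the compiler rejects them.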

On Tuesday, October 6, 2015, Erwan ALLAIN <ea...@gmail.com>
wrote:

> Hello,
>
> I'm currently testing Spark Streaming with Kafka.
> I'm creating a direct stream with KafkaUtils and everything works fine. However,
> I would like to use the signature that lets me specify my own message handler
> (to work with partitions and offsets). In that case, I need to manage
> offsets and partitions myself to fill the fromOffsets argument.
> I found a JIRA issue for this use case,
> https://issues.apache.org/jira/browse/SPARK-6714, but it was closed
> as being too specific.
> I'm aware that this can be done with the Kafka API (TopicMetadataRequest and
> OffsetRequest), but what I would have to write is almost the same as KafkaCluster,
> which is private.
>
> Is it possible to:
>  - add another signature to KafkaUtils?
>  - make KafkaCluster public?
>
> Or do you have another smart solution that doesn't require me to copy/paste
> KafkaCluster?
>
> Thanks.
>
> Regards,
> Erwan ALLAIN
>