Posted to user@spark.apache.org by trung kien <ki...@gmail.com> on 2016/05/25 13:15:50 UTC

Spark Streaming - Kafka Direct Approach: re-compute from specific time

Hi all,

Is there any way to re-compute from a specific time using the Spark
Streaming - Kafka Direct Approach?

In some cases, I want to re-compute from a specific time (e.g. the
beginning of the day). Is that possible?



-- 
Thanks
Kien

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

Posted by trung kien <ki...@gmail.com>.
Ah, right, I see.

Thank you very much.

On May 25, 2016 11:11 AM, "Cody Koeninger" <co...@koeninger.org> wrote:

> There's an overloaded createDirectStream method that takes a map from
> topic-partition to offset for the starting point of the stream.

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

Posted by Cody Koeninger <co...@koeninger.org>.
There's an overloaded createDirectStream method that takes a map from
topic-partition to offset for the starting point of the stream.
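
A rough sketch of what that could look like from PySpark, assuming the
Kafka 0.8 integration (pyspark.streaming.kafka, available in Spark 1.x/2.x
and removed in later releases). In the Scala API the starting offsets go
through an overload; in PySpark they are passed via the fromOffsets keyword
argument. The helper names plan_stream_args and start_direct_stream, and
the shape of start_offsets, are hypothetical and only for illustration.

```python
def plan_stream_args(brokers, start_offsets):
    """Derive the topic list and Kafka params for a direct stream.

    start_offsets: dict of (topic, partition) -> first offset to read
    (hypothetical shape, e.g. looked up from your own time index).
    """
    for (topic, partition), offset in start_offsets.items():
        if offset < 0:
            raise ValueError(
                "negative offset for %s[%d]: %d" % (topic, partition, offset))
    # PySpark expects the topics as a list.
    topics = sorted({topic for topic, _ in start_offsets})
    kafka_params = {"metadata.broker.list": brokers}
    return topics, kafka_params


def start_direct_stream(ssc, brokers, start_offsets):
    """Create a direct stream that begins at the given offsets."""
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition

    topics, kafka_params = plan_stream_args(brokers, start_offsets)
    # The starting point of the stream: TopicAndPartition -> offset.
    from_offsets = {
        TopicAndPartition(topic, partition): offset
        for (topic, partition), offset in start_offsets.items()
    }
    return KafkaUtils.createDirectStream(
        ssc, topics, kafka_params, fromOffsets=from_offsets)
```

With an existing StreamingContext ssc, something like
start_direct_stream(ssc, "broker1:9092", {("events", 0): 50, ("events", 1): 0})
would start partition 0 at offset 50 and partition 1 at offset 0 (broker
address and topic name are placeholders).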

On Wed, May 25, 2016 at 9:59 AM, trung kien <ki...@gmail.com> wrote:
> Thanks, Cody.
>
> I can build the mapping from time to offset. However, how can I pass
> that offset to the Spark Streaming job (using the Direct Approach)?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

Posted by trung kien <ki...@gmail.com>.
Thanks, Cody.

I can build the mapping from time to offset. However, how can I pass
that offset to the Spark Streaming job (using the Direct Approach)?

On May 25, 2016 9:42 AM, "Cody Koeninger" <co...@koeninger.org> wrote:

> Kafka does not yet have meaningful time indexing. There's a Kafka
> Improvement Proposal for it, but it has been pushed back to at least
> 0.10.1.
>
> If you want to do this kind of thing, you will need to maintain your
> own index from time to offset.

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

Posted by Cody Koeninger <co...@koeninger.org>.
Kafka does not yet have meaningful time indexing. There's a Kafka
Improvement Proposal for it, but it has been pushed back to at least
0.10.1.

If you want to do this kind of thing, you will need to maintain your
own index from time to offset.
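
One possible shape for such an index, as a minimal sketch rather than
anything Spark or Kafka provides: periodically record (timestamp, offset)
checkpoints per partition while consuming, then look up the checkpoint at
or just before the time you want to replay from. The class name
TimeOffsetIndex and its methods are hypothetical, it keeps everything in
memory (a real job would persist it somewhere durable), and it assumes
timestamps do not decrease as offsets grow within a partition.

```python
import bisect
from collections import defaultdict


class TimeOffsetIndex:
    """In-memory index from timestamp (ms) to Kafka offset, per partition."""

    def __init__(self):
        # (topic, partition) -> parallel lists, both in ascending order.
        self._times = defaultdict(list)
        self._offsets = defaultdict(list)

    def record(self, topic, partition, timestamp_ms, offset):
        """Add a checkpoint; call in increasing offset order per partition."""
        key = (topic, partition)
        self._times[key].append(timestamp_ms)
        self._offsets[key].append(offset)

    def offsets_at(self, topic, timestamp_ms):
        """Return {(topic, partition): offset} to replay from timestamp_ms.

        Uses the first checkpoint exactly at timestamp_ms if there is one,
        otherwise the last checkpoint before it, so no message at or after
        that time is skipped (a few earlier ones may be replayed). If every
        checkpoint is later, the earliest known offset is returned.
        """
        result = {}
        for key, times in self._times.items():
            if key[0] != topic:
                continue
            i = bisect.bisect_left(times, timestamp_ms)
            if i == len(times) or times[i] > timestamp_ms:
                i = max(i - 1, 0)
            result[key] = self._offsets[key][i]
        return result
```

For example, after recording checkpoints (1000, 0), (2000, 50), (3000, 120)
for partition 0 of a topic, offsets_at(topic, 2500) picks offset 50 for
that partition. Sparser checkpoints keep the index small at the cost of
replaying more messages before the requested time.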

On Wed, May 25, 2016 at 8:15 AM, trung kien <ki...@gmail.com> wrote:
> Hi all,
>
> Is there any way to re-compute from a specific time using the Spark
> Streaming - Kafka Direct Approach?
>
> In some cases, I want to re-compute from a specific time (e.g. the
> beginning of the day). Is that possible?
>
>
>
> --
> Thanks
> Kien
