Posted to user@kylin.apache.org by Andras Nagy <an...@gmail.com> on 2019/05/22 14:11:52 UTC

Question about handling late arrival in streaming ingestion

Dear All,

I have a question about the handling of events that arrive significantly
later than the logical event timestamp, in streaming ingestion.

In the blog post from 2016 at
http://kylin.apache.org/blog/2016/10/18/new-nrt-streaming/ , I read this:
"To let the late/early message can be queried, Cube segments allow overlap
for the partition time dimension: each segment has a “min” date/time and a
“max” date/time; Kylin will scan all segments which matched with the
queried time scope. Figure 2 illurates this. ..."

On the other hand, I found a ticket:
https://issues.apache.org/jira/browse/KYLIN-1210 titled "Allowing segment
overlap to solve streaming data completeness problem" which seems to be
about the same issue, but its status is Open/unresolved.

There is also another ticket:
https://issues.apache.org/jira/browse/KYLIN-1744 titled "Separate concepts
of source offset and date range on cube segments", which seems to be
related again. This one is Closed/Fixed in 1.5.3.

Could you please clarify the status of this capability?
What is the current best practice for handling late-arriving events in
Kylin?

Many thanks,
Andras

Re: Question about handling late arrival in streaming ingestion

Posted by Andras Nagy <an...@gmail.com>.
Hi Shaofeng,
Sounds good, thanks a lot for the quick answer!
Best regards,
Andras


Re: Question about handling late arrival in streaming ingestion

Posted by ShaoFeng Shi <sh...@apache.org>.
Hello Andras,

The description in the "new-nrt-streaming" blog post is correct: late
messages are built into the next segments, the segments' time ranges can
overlap, and Kylin scans all segments that match the queried time range.
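
As a rough illustration of the idea only (this is not Kylin's actual code),
the sketch below models each segment with a "min" and "max" event time and
selects every segment whose range intersects the queried range. The Segment
class, the segments_to_scan function, the integer timestamps, and the
inclusive range bounds are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a cube segment; not Kylin's actual class."""
    name: str
    min_time: int  # earliest event timestamp in the segment (assumed epoch seconds)
    max_time: int  # latest event timestamp in the segment (assumed epoch seconds)


def segments_to_scan(segments: List[Segment], query_start: int, query_end: int) -> List[Segment]:
    """Return every segment whose [min_time, max_time] intersects the query range.

    Both ranges are treated as inclusive, which is an assumption of this sketch.
    """
    return [
        s for s in segments
        if s.min_time <= query_end and s.max_time >= query_start
    ]


segments = [
    Segment("seg-1", min_time=0, max_time=100),
    # seg-2 was built later but holds a late event with timestamp 90,
    # so its time range overlaps seg-1.
    Segment("seg-2", min_time=90, max_time=200),
    Segment("seg-3", min_time=201, max_time=300),
]

hits = segments_to_scan(segments, query_start=80, query_end=100)
print([s.name for s in hits])  # ['seg-1', 'seg-2']
```

Because the late event lands in seg-2, a query over the range 80 to 100 has
to read both seg-1 and seg-2 to return complete results.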

I just closed KYLIN-1210, which had been overlooked.

KYLIN-1744 is a prerequisite refactoring and a sub-task of KYLIN-1726;
KYLIN-1726 was released in v1.6.0.

Thanks for your feedback!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org



