
Posted to issues@drill.apache.org by "Abhishek Ravi (JIRA)" <ji...@apache.org> on 2018/04/02 05:43:00 UTC

[jira] [Commented] (DRILL-5977) predicate pushdown support kafkaMsgOffset

    [ https://issues.apache.org/jira/browse/DRILL-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421951#comment-16421951 ] 

Abhishek Ravi commented on DRILL-5977:
--------------------------------------

Thank you for the review, [~akumarb2010]. Yes, you are absolutely right. As an initial approach to this problem, I plan to do the following after obtaining the *top-level predicates* of an expression.
 # Check whether a condition on {{kafkaMsgTimestamp}} / {{kafkaMsgOffset}} exists.
 # Check that no {{OR}} joins the top-level predicates.

Do filter pushdown only when both checks succeed. Does this sound good?
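
To make the two checks concrete, here is a rough sketch in Python (not Drill code). The {{Comparison}} and {{BoolExpr}} classes and both function names are hypothetical stand-ins for the real expression tree. It also assumes that "top-level predicates" means the operands of the root boolean operator, so an {{OR}} nested below a top-level {{AND}} does not block pushdown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Pseudo-columns named in this issue; everything else here is hypothetical.
PUSHDOWN_COLUMNS = {"kafkaMsgOffset", "kafkaMsgTimestamp"}


@dataclass(frozen=True)
class Comparison:
    """A leaf predicate such as kafkaMsgOffset > 12345."""
    column: str
    op: str
    value: int


@dataclass(frozen=True)
class BoolExpr:
    """A boolean node; op is "AND" or "OR"."""
    op: str
    args: Tuple["Expr", ...]


Expr = Union[Comparison, BoolExpr]


def top_level_predicates(expr: Expr) -> Tuple[Optional[str], List[Expr]]:
    """Return the root joining operator (None for a single predicate)
    and its operands, flattening nested nodes that use the same operator."""
    if isinstance(expr, BoolExpr):
        flat: List[Expr] = []
        for arg in expr.args:
            if isinstance(arg, BoolExpr) and arg.op == expr.op:
                flat.extend(top_level_predicates(arg)[1])
            else:
                flat.append(arg)
        return expr.op, flat
    return None, [expr]


def can_push_down(expr: Expr) -> bool:
    """Apply the two checks proposed in the comment above."""
    joining_op, predicates = top_level_predicates(expr)
    # Check 1: some top-level predicate is a condition on a Kafka pseudo-column.
    has_kafka_condition = any(
        isinstance(p, Comparison) and p.column in PUSHDOWN_COLUMNS
        for p in predicates
    )
    # Check 2: the top-level predicates are not joined by OR.
    joined_by_or = joining_op == "OR"
    return has_kafka_condition and not joined_by_or
```

Under these assumptions, {{kafkaMsgOffset > 12345 AND x = 1}} qualifies for pushdown, while {{kafkaMsgOffset > 12345 OR x = 1}} does not, because rows that fail the offset condition could still satisfy the other branch.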

> predicate pushdown support kafkaMsgOffset
> -----------------------------------------
>
>                 Key: DRILL-5977
>                 URL: https://issues.apache.org/jira/browse/DRILL-5977
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: B Anil Kumar
>            Assignee: Bhallamudi Venkata Siva Kamesh
>            Priority: Major
>             Fix For: 1.14.0
>
>
> As part of Kafka storage plugin review, below is the suggestion from Paul.
> {noformat}
> Does it make sense to provide a way to select a range of messages: a starting point or a count? Perhaps I want to run my query every five minutes, scanning only those messages since the previous scan. Or, I want to limit my take to, say, the next 1000 messages. Could we use a pseudo-column such as "kafkaMsgOffset" for that purpose? Maybe
> SELECT * FROM <some topic> WHERE kafkaMsgOffset > 12345
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)