Posted to issues@calcite.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2016/09/28 23:23:20 UTC

[jira] [Comment Edited] (CALCITE-1391) CRUD operations using Calcite for DRUID

    [ https://issues.apache.org/jira/browse/CALCITE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531009#comment-15531009 ] 

slim bouguerra edited comment on CALCITE-1391 at 9/28/16 11:22 PM:
-------------------------------------------------------------------

I think the scope of this request is not very clear. I will try to explain what Druid is capable of; I hope that helps narrow down the question.
First, keep in mind that Druid is built around a very strong design assumption: segments are immutable. Because segments are read-only, Druid can achieve huge speedups and cache aggressively.
But this does not mean that you cannot insert/append or delete data.
Assuming you are ingesting data via batch ingestion, let's start with the simple case: deleting data.
Druid has the concept of load and drop rules for segments, so to drop an entire segment all we need to do is send a drop request to the coordinator. For instance, in a reporting use case where you usually want only the last 3 months of data loaded, you can tell Druid to keep only the segments within the last 3 months.
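For the retention case, here is a minimal Python sketch of what such rules could look like: keep the last 3 months loaded and drop everything older. The rule types (`loadByPeriod`, `dropForever`), the `tieredReplicants` map and the coordinator endpoint `/druid/coordinator/v1/rules/{dataSource}` come from my reading of the Druid rule docs and may differ between versions; the host, datasource name and replica count are hypothetical. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_retention_rules(keep_period="P3M", replicants=2, tier="_default_tier"):
    """Keep segments from the last `keep_period` (ISO 8601), drop all older ones.

    The coordinator applies the first rule that matches a segment,
    so the catch-all drop rule has to come last.
    """
    return [
        {"type": "loadByPeriod", "period": keep_period,
         "tieredReplicants": {tier: replicants}},
        {"type": "dropForever"},
    ]


def make_rules_request(coordinator_url, datasource, rules):
    """Build (but do not send) a POST of `rules` to the coordinator's rules endpoint."""
    url = f"{coordinator_url.rstrip('/')}/druid/coordinator/v1/rules/{datasource}"
    body = json.dumps(rules).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")
```

To actually apply the rules, pass the request to `urllib.request.urlopen(...)`. Segments that fall outside the 3-month window then match `dropForever` and are unloaded by the coordinator.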
However, if your use case is to delete one or N rows from a segment, that is not trivial. You need to submit a new index task with the new data for that specific interval; Druid will then replace the old segment with the new one.
This means re-indexing won't really affect query speed: Druid will happily keep answering queries from the old segment until the re-index task is done.
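To make the replacement semantics concrete, here is a toy Python model of versioned, immutable segments. It is not Druid code and all names are hypothetical: a re-index writes a new segment with a newer version for the same interval, and queries only switch to it once it is published.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    interval: str   # e.g. "2016-09-01/2016-09-02"
    version: str    # newer versions compare greater (plain string comparison here)
    rows: tuple     # read-only payload: a segment is never modified in place


@dataclass
class Timeline:
    # interval -> every published segment for it, older versions included
    published: dict = field(default_factory=dict)

    def publish(self, segment):
        self.published.setdefault(segment.interval, []).append(segment)

    def query(self, interval):
        # Only the highest version is visible; older versions are overshadowed
        # but remain stored until they are cleaned up.
        candidates = self.published.get(interval, [])
        if not candidates:
            return ()
        return max(candidates, key=lambda s: s.version).rows


def reindex_without(segment, should_delete, new_version):
    """Simulate a re-index task: build a NEW segment for the same interval
    without the rows matching `should_delete`; the old segment is untouched."""
    kept = tuple(row for row in segment.rows if not should_delete(row))
    return Segment(segment.interval, new_version, kept)
```

Until the new segment is passed to `publish()`, `query()` keeps returning the old rows, which mirrors Druid serving the old segment while the re-index task runs.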

For append/insert, Druid supports delta batch ingestion. This means you can trigger an index job and tell Druid that you want to append/insert a set of rows into an existing segment.
What is the impact of this on queries? The answer is the same as above: Druid will answer queries with the old data until the indexing map-reduce job is done.
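As an illustration of delta ingestion, here is a hedged Python sketch that builds a Hadoop `ioConfig` using the `multi` inputSpec. One `dataSource` child re-reads the existing segments for the interval, and one `static` child points at the new rows. The field names are from my reading of the Hadoop batch ingestion docs, so check them against the Druid version in use. The datasource name and HDFS path are hypothetical, and the rest of the index task spec (`dataSchema`, `tuningConfig`) is omitted.

```python
def build_delta_io_config(datasource, intervals, new_data_path):
    """Hadoop ioConfig that merges existing segment rows with newly arrived rows."""
    return {
        "type": "hadoop",
        "inputSpec": {
            "type": "multi",
            "children": [
                # Existing rows, read back from the datasource's current segments.
                {"type": "dataSource",
                 "ingestionSpec": {"dataSource": datasource, "intervals": intervals}},
                # New rows to append, e.g. a directory of delta files.
                {"type": "static", "paths": new_data_path},
            ],
        },
    }
```

Because the job writes new segments for the interval rather than editing the old ones in place, the old segments keep serving queries until the job finishes.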

If your use case is real-time ingestion, that is a totally different discussion.
[~arupadhy] I hope this makes sense to you. Please feel free to explain the use case you are targeting in more detail.



> CRUD operations using Calcite for DRUID
> ---------------------------------------
>
>                 Key: CALCITE-1391
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1391
>             Project: Calcite
>          Issue Type: Wish
>          Components: druid
>         Environment: Druid Production Environment
>            Reporter: Arvind
>            Assignee: Julian Hyde
>            Priority: Critical
>
> Hi Team,
> I know that we are able to read data from Druid using the Calcite sqlline command, but I would like to know whether we can insert, update and delete segments in Druid using Calcite. We are using Druid as a reporting platform, and we need these operations to incrementally update, delete and append data. As we are dealing with huge volumes of data, we depend on Druid and badly need insert, update and delete.
> Any help is much appreciated.
> Thanks,
> Arvind



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)