Posted to dev@flume.apache.org by Amit Jain <aj...@gmail.com> on 2016/02/19 14:30:25 UTC

Support for Hive Mutation Streaming

Hi All,

We have a use case where we want to make use of the Hive mutation streaming
feature. Is support for this planned for upcoming releases?

https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API

--
Thanks,
Amit

Re: Support for Hive Mutation Streaming

Posted by Roshan Naik <ro...@hortonworks.com>.
I won't say it's of no use; I am sure someone will figure out a reasonable
use case for it with Flume. But there appears to be some impedance
mismatch.


Flume, being a streaming event movement product, is basically continuously
inserting data into Hive, HDFS, etc.
The Hive Streaming APIs serve that purpose and were even designed with
Flume-like products in mind.

As per the HCatalog Streaming Mutation API docs, they seem to be designed
around use cases involving:
  - modifying existing data
  - keeping a replica in sync with updates on a master copy
  - infrequently applying large sets of mutations to a data set in an atomic
fashion

As opposed to the Streaming APIs, which:
  - focus on surfacing a continuous stream of new data into a Hive table,
and do so by batching small sets of writes into multiple short-lived
transactions


The notion of a stream of relatively small batches (short-lived
transactions) fits nicely with Flume's transactions. In contrast, the
'infrequently apply large sets' use case does not seem to fit Flume very
well. That model seems to fit Sqoop better.
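
To make the contrast concrete, here is a rough sketch of the two write
patterns. It is only an illustration: InMemoryTable, stream_ingest and
bulk_apply are hypothetical names invented for this example, and none of
them are Hive or Flume APIs. The table just counts commits so the
difference in transaction shape is visible.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class InMemoryTable:
    """Hypothetical stand-in for a transactional table (not a Hive API)."""

    def __init__(self) -> None:
        self.rows: List[str] = []
        self.commits = 0

    def commit(self, batch: List[str]) -> None:
        # Every row in the batch becomes visible in a single step.
        self.rows.extend(batch)
        self.commits += 1


def batches(events: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` events."""
    it = iter(events)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def stream_ingest(events: Iterable[str], table: InMemoryTable,
                  batch_size: int) -> None:
    """Streaming style: many small, short-lived transactions."""
    for batch in batches(events, batch_size):
        table.commit(batch)


def bulk_apply(events: Iterable[str], table: InMemoryTable) -> None:
    """Mutation style: one large set applied in a single atomic step."""
    table.commit(list(events))
```

With ten events and a batch size of three, stream_ingest commits four
times while bulk_apply commits once. The first shape is the one that
lines up with a Flume sink draining its channel one transaction at a time.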


Do you have thoughts on some good Flume use cases that would require the
Mutation APIs over the Streaming APIs?

-roshan



On 2/20/16, 5:20 AM, "Amit Jain" <aj...@gmail.com> wrote:

>Hi Roshan,
>
>Could you please help me learn why the Hive Mutation Streaming APIs would
>not be a good value addition to Flume?
>
>
>--
>Thanks,
>Amit


Re: Support for Hive Mutation Streaming

Posted by Amit Jain <aj...@gmail.com>.
Hi Roshan,

Could you please help me learn why the Hive Mutation Streaming APIs would
not be a good value addition to Flume?


--
Thanks,
Amit


Re: Support for Hive Mutation Streaming

Posted by Roshan Naik <ro...@hortonworks.com>.
For the Flume kind of streaming ingest, the Hive Streaming APIs should be
more appropriate, and those are already supported.
-roshan

