Posted to dev@apex.apache.org by Bhupesh Chawda <bh...@datatorrent.com> on 2015/12/18 12:09:38 UTC

Adding features to HBase Input Operators in Malhar-contrib

Hi All,

The current HBasePOJOInputOperator has the following limitations:

   1. It does not let us specify a set of "columnFamily:column" pairs and
   fetch data only for those columns.
   2. The output format is always a POJO. We need other output formats that
   support the "columnFamily:column" representation. Map and CSV are some
   of the options.
   3. It does not let us specify an "end row-key" at which to stop scanning
   a table.
   4. It exposes no metrics.

I am planning to add the above functionality to the HBase Input operators.
These features may go into the HBaseScanOperator / HBasePOJOInputOperator.
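
To make the first three items concrete, here is a minimal sketch in plain Java. It does not use the HBase client or any Malhar class: the class ColumnSelectionSketch, its method names, and the representation of a row as a Map keyed by "columnFamily:column" are all assumptions made for illustration. The end row key is treated as exclusive, which is also an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Malhar: column selection on one row.
class ColumnSelectionSketch {

    // Splits a "family:qualifier" spec into its two trimmed parts.
    static String[] parseColumn(String spec) {
        int idx = spec.indexOf(':');
        if (idx <= 0 || idx == spec.length() - 1) {
            throw new IllegalArgumentException("Expected family:qualifier, got " + spec);
        }
        return new String[] {spec.substring(0, idx).trim(), spec.substring(idx + 1).trim()};
    }

    // Keeps only the requested columns, keyed by "family:qualifier".
    static Map<String, String> toMap(Map<String, String> row, List<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String spec : wanted) {
            String[] parts = parseColumn(spec);
            String key = parts[0] + ":" + parts[1];
            if (row.containsKey(key)) {
                out.put(key, row.get(key));
            }
        }
        return out;
    }

    // Joins the selected values into one CSV line (no quoting or escaping).
    static String toCsv(Map<String, String> selected) {
        return String.join(",", selected.values());
    }

    // True while the scan should continue; endRowKey is exclusive in this sketch.
    static boolean beforeEndRow(String rowKey, String endRowKey) {
        return endRowKey == null || rowKey.compareTo(endRowKey) < 0;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("info:name", "alice");
        row.put("info:age", "30");
        row.put("audit:ts", "1450440578");

        Map<String, String> selected = toMap(row, Arrays.asList("info:name", "audit:ts"));
        System.out.println(selected);
        System.out.println(toCsv(selected));
        System.out.println(beforeEndRow("row4", "row5"));
    }
}
```

In a real operator the selection would more likely be pushed down into the scan itself, so that unwanted columns are never fetched; the sketch only shows the intended input and output shapes.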

Please let me know your comments.

Thanks.

Bhupesh

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.

I shall do that in a day or two.

Regards,
Sandeep

On Thu, Mar 24, 2016 at 6:10 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Dear Community,
>
> Can anyone help review the pull request:
> https://github.com/apache/incubator-apex-malhar/pull/212
>
> Thanks.
>
> ~Bhupesh

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Bhupesh Chawda <bh...@datatorrent.com>.

Dear Community,

Can anyone help review the pull request:
https://github.com/apache/incubator-apex-malhar/pull/212

Thanks.

~Bhupesh

On Thu, Mar 17, 2016 at 4:16 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi,
>
> I have opened a pull request for the changes as described in the previous
> emails. Here is the pull request:
> https://github.com/apache/incubator-apex-malhar/pull/212
>
> Here is a short description of the changes:
>
> HBaseInputOperator - Takes care of HBaseStore and its connection. Got rid
> of HBaseOperatorBase.
> HBaseScanOperator - Takes care of scanning the table in a non-blocking
> manner. Exposes operationScan() and getTuple() as before.
> HBasePOJOInputOperator - Implements operationScan() and getTuple() and
> outputs a POJO on the output port.
>
> Please help review these changes.
>
> Thanks
> ~Bhupesh

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Bhupesh Chawda <bh...@datatorrent.com>.

Hi,

I have opened a pull request for the changes as described in the previous
emails. Here is the pull request:
https://github.com/apache/incubator-apex-malhar/pull/212

Here is a short description of the changes:

HBaseInputOperator - Takes care of HBaseStore and its connection. Got rid
of HBaseOperatorBase.
HBaseScanOperator - Takes care of scanning the table in a non-blocking
manner. Exposes operationScan() and getTuple() as before.
HBasePOJOInputOperator - Implements operationScan() and getTuple() and
outputs a POJO on the output port.
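
For readers who have not opened the pull request, the three levels described above could be sketched roughly as follows. This is only an illustration of the layering: every class name ending in "Sketch" is hypothetical, the store is a stand-in rather than the real HBaseStore, and rows are plain String arrays instead of HBase results. The actual signatures in the pull request may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Level 3: implements both hooks and produces POJOs.
class PojoInputOperatorSketch extends ScanOperatorSketch<UserSketch> {
    @Override
    protected List<String[]> operationScan() {
        // Hard-coded rows stand in for a table scan.
        return Arrays.asList(new String[] {"alice", "30"}, new String[] {"bob", "25"});
    }

    @Override
    protected UserSketch getTuple(String[] row) {
        return new UserSketch(row[0], Integer.parseInt(row[1]));
    }

    public static void main(String[] args) {
        PojoInputOperatorSketch op = new PojoInputOperatorSketch();
        op.setup();
        for (UserSketch user : op.emitAll()) {
            System.out.println(user.name + " " + user.age);
        }
        op.teardown();
    }
}

// Level 2: drives the scan and leaves row conversion to subclasses.
abstract class ScanOperatorSketch<T> extends InputOperatorSketch {
    // Supplies the rows to read; a real operator would build an HBase scan here.
    protected abstract List<String[]> operationScan();

    // Converts one scanned row into the tuple emitted downstream.
    protected abstract T getTuple(String[] row);

    List<T> emitAll() {
        if (!store.isConnected()) {
            throw new IllegalStateException("setup() must be called first");
        }
        List<T> out = new ArrayList<>();
        for (String[] row : operationScan()) {
            out.add(getTuple(row));
        }
        return out;
    }
}

// Level 1: owns the store and its connection lifecycle.
abstract class InputOperatorSketch {
    protected final StoreSketch store = new StoreSketch();

    void setup() { store.connect(); }

    void teardown() { store.disconnect(); }
}

// Stand-in for the store that owns the connection; NOT the real HBaseStore.
class StoreSketch {
    private boolean connected;

    void connect() { connected = true; }

    void disconnect() { connected = false; }

    boolean isConnected() { return connected; }
}

// Simple POJO used as the output tuple type.
class UserSketch {
    final String name;
    final int age;

    UserSketch(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
```

The point of the split is that connection handling lives in one place, and a new output format only needs a new level-3 class.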

Please help review these changes.

Thanks
~Bhupesh

On Fri, Mar 11, 2016 at 4:42 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi All,
>
> In the current design of HBase input and output operators, the row key is
> hard-coded to be of String type.
> I foresee the following issue:
>
>    - In the case of numeric keys which are cast to String, *incremental
>    read* is problematic. For example, after reading key = 9, we may not
>    be able to read any record with, say, key = 8888, because even though
>    numerically 8888 > 9, lexicographically "9" > "8888".
>    - This is the case only when data is being written to HBase and read
>    from it simultaneously.
>
> My suggestion is to parametrize the type of row key in the HBase input and
> output operators, and let the user instantiate the required type for row
> key. We can have default implementations for String and/or Long. By
> parametrizing the row key type, the user can even use complex row keys
> which are a combination of multiple fields.
>
> Thoughts?
>
> PS: I understand that there is a performance concern in making a
> monotonically increasing key as the row key. Given that, how do we address
> the incremental read scenario?
>
> Thanks
>
> -Bhupesh

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Bhupesh Chawda <bh...@datatorrent.com>.

Hi All,

In the current design of HBase input and output operators, the row key is
hard-coded to be of String type.
I foresee the following issue:

   - In the case of numeric keys which are cast to String, *incremental
   read* is problematic. For example, after reading key = 9, we may not be
   able to read any record with, say, key = 8888, because even though
   numerically 8888 > 9, lexicographically "9" > "8888".
   - This is the case only when data is being written to HBase and read
   from it simultaneously.
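
The problem can be reproduced without HBase. The sketch below uses a sorted set of Strings as a stand-in for a table ordered by row key; the class and method names are made up for the example. For ASCII digits, Java's String ordering matches byte-wise ordering, which is the assumption that makes the stand-in valid.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical demo: incremental read over String-ordered numeric keys.
class StringRowKeyOrderDemo {

    // Keys an incremental read would still see when resuming after lastKey.
    static NavigableSet<String> keysAfter(TreeSet<String> stored, String lastKey) {
        return stored.tailSet(lastKey, false);
    }

    public static void main(String[] args) {
        // Keys 1 and 9 have already been written and read.
        TreeSet<String> stored = new TreeSet<>(Arrays.asList("1", "9"));
        String lastRead = stored.last();

        // A writer now adds key 8888 while the reader is still running.
        stored.add("8888");

        // The new key sorts before "9", so resuming after "9" never sees it.
        System.out.println("stored order: " + stored);
        System.out.println("unread keys after " + lastRead + ": " + keysAfter(stored, lastRead));
    }
}
```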

My suggestion is to parametrize the type of row key in the HBase input and
output operators, and let the user instantiate the required type for row
key. We can have default implementations for String and/or Long. By
parametrizing the row key type, the user can even use complex row keys
which are a combination of multiple fields.
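
One possible shape for this suggestion is sketched below. The RowKeyCodec interface and both implementations are hypothetical names, not existing Malhar classes. The Long codec writes a fixed-width big-endian value, which keeps byte-wise order consistent with numeric order for non-negative keys only; negative keys would need extra handling. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo comparing encoded row keys byte by byte.
class RowKeyCodecDemo {

    // Compares two keys the way a byte-ordered store would, after encoding.
    static <K> int compareEncoded(RowKeyCodec<K> codec, K a, K b) {
        return Arrays.compareUnsigned(codec.encode(a), codec.encode(b));
    }

    public static void main(String[] args) {
        // String keys: "9" sorts after "8888".
        System.out.println(compareEncoded(new StringRowKeyCodec(), "9", "8888") > 0);
        // Long keys: 9 sorts before 8888, as expected numerically.
        System.out.println(compareEncoded(new LongRowKeyCodec(), 9L, 8888L) < 0);
    }
}

// Hypothetical extension point: the operator would be parametrized on K.
interface RowKeyCodec<K> {
    byte[] encode(K key);

    K decode(byte[] bytes);
}

// Default codec for String keys (UTF-8 bytes).
class StringRowKeyCodec implements RowKeyCodec<String> {
    @Override
    public byte[] encode(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Default codec for Long keys (8 bytes, big-endian; non-negative keys assumed).
class LongRowKeyCodec implements RowKeyCodec<Long> {
    @Override
    public byte[] encode(Long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
    }

    @Override
    public Long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }
}
```

A composite key made of multiple fields would be one more implementation of the same interface, concatenating the encoded fields in a fixed order.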

Thoughts?

PS: I understand that there is a performance concern in making a
monotonically increasing key as the row key. Given that, how do we address
the incremental read scenario?

Thanks

-Bhupesh

On Wed, Dec 30, 2015 at 7:49 PM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:

> Looks fine to me.
>
> Regards,
> Sandeep

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.

Looks fine to me.

Regards,
Sandeep

On Wed, Dec 30, 2015 at 7:34 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Here is the final hierarchy I am considering:
>
> HBaseInputOperator - Takes care of HBaseStore and its connection. Got rid
> of HBaseOperatorBase.
>     HBaseScanOperator - Takes care of scanning the table in a non-blocking
> manner. Exposes operationScan() and getTuple() as before.
>         HBasePOJOInputOperator - Implements operationScan() and getTuple()
> and outputs a POJO on the output port.
>
> Comments?
>
> -Bhupesh
>
>
> On Wed, Dec 30, 2015 at 2:52 PM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > The class HBaseInputOperator seems to be quite old. HBaseStore seems to
> > have all the functionality provided by HBaseInputOperator and even more
> > (including Kerberos authentication).
> >
> > It would be a good idea to avoid the usage of HBaseInputOperator going
> > forward and use HBaseStore instead.
> >
> > I will also work on abstracting out the HBase input functionality in the
> > HBaseInputOperator, which can be extended by concrete implementations.
> >
> > -Bhupesh
> >
> > On Wed, Dec 23, 2015 at 7:47 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
> > wrote:
> >
> >> Thanks for the inputs.
> >> As an input operator, I am targeting just the Scan operation. Get
> >> operation may be supported better as a generic operator (like a query
> >> operator) which I can take up later.
> >>
> >> -Bhupesh
> >>
> >> On Tue, Dec 22, 2015 at 3:48 PM, Mohit Jotwani <mo...@datatorrent.com>
> >> wrote:
> >>
> >>> +1
> >>>
> >>> Regards,
> >>> Mohit
> >>>
> >>> On Tue, Dec 22, 2015 at 11:21 AM, Chinmay Kolhatkar
> >>> <chinmay@datatorrent.com> wrote:
> >>>
> >>> > +1 for above.
> >>> > I see that there is an HbaseGetOperator, but it is abstract and I
> >>> > cannot find a concrete implementation of it.
> >>> > Are you going to implement that too?
> >>> >
> >>> > Maybe the concrete implementation of HbaseGetOperator should have
> >>> > this.
> >>> >
> >>> > Also, I want to mention one thing about scans from my previous
> >>> > experience with HBase. The HBase client is synchronous.
> >>> > This means that when you fire a scan call, the function blocks until
> >>> > a certain number of records are received at the client end.
> >>> > This causes a lot of problems in the current thread, as it might get
> >>> > blocked for a long period of time.
> >>> > Plus, there is always network-related latency to add to the problem.
> >>> >
> >>> > The usual way to deal with this is to fire scan-like queries on a
> >>> > separate thread and then consume the results in the main thread.
> >>> >
> >>> > Please take care of this scenario while implementing the scan
> >>> > operator.
> >>> >
> >>> > ~ Chinmay.
> >>> >
> >>> >
> >>> > On Tue, Dec 22, 2015 at 11:08 AM, Sandeep Deshmukh
> >>> > <sandeep@datatorrent.com> wrote:
> >>> >
> >>> > > +1 for this, Bhupesh.
> >>> > >
> >>> > > Additionally, I would suggest adding support for:
> >>> > > 1. Point query
> >>> > > 2. Returning any row version
> >>> > >
> >>> > > The above two are key features of HBase and should be supported.
> >>> > >
> >>> > > Regards,
> >>> > > Sandeep
> >>> > >
> >>> > > On Fri, Dec 18, 2015 at 4:39 PM, Bhupesh Chawda <
> >>> bhupesh@datatorrent.com
> >>> > >
> >>> > > wrote:
> >>> > >
> >>> > > > Hi All,
> >>> > > >
> >>> > > > The current HBasePOJOInputOperator does not allow us to do the
> >>> > following:
> >>> > > >
> >>> > > >    1. Allow us to specify a set of "column family: column" and
> >>> fetch
> >>> > data
> >>> > > >    only for these columns.
> >>> > > >    2. Output format is currently a POJO. We need to have other
> >>> output
> >>> > > >    formats such that "columnFamily:column" representation is
> >>> supported.
> >>> > > > Map /
> >>> > > >    CSV are some of the options.
> >>> > > >    3. Allow specifying "end row-key" to stop scanning a table.
> >>> > > >    4. No metrics.
> >>> > > >
> >>> > > > I am planning to add the above functionality to the HBase Input
> >>> > > operators.
> >>> > > > These features may go into the HBaseScanOperator /
> >>> > > HBasePOJOInputOperator.
> >>> > > >
> >>> > > > Please let me know your comments.
> >>> > > >
> >>> > > > Thanks.
> >>> > > >
> >>> > > > Bhupesh
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Bhupesh Chawda <bh...@datatorrent.com>.

Here is the final hierarchy I am considering:

HBaseInputOperator - Takes care of HBaseStore and its connection. Got rid
of HBaseOperatorBase.
    HBaseScanOperator - Takes care of scanning the table in a non-blocking
    manner. Exposes operationScan() and getTuple() as before.
        HBasePOJOInputOperator - Implements operationScan() and getTuple()
        and outputs a POJO on the output port.
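
A rough sketch of how the three levels could fit together is given below.
It is not the code from the pull request: StoreSketch, UserPojo, the
"rowKey" and "cf:name" keys, and the in-memory table are hypothetical
stand-ins so that the example compiles without HBase or Apex on the
classpath, and the non-blocking reader thread is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for HBaseStore: owns the table name and connection state.
class StoreSketch {
    final String tableName;
    boolean connected;

    StoreSketch(String tableName) { this.tableName = tableName; }
    void connect() { connected = true; }
    void disconnect() { connected = false; }
}

// Level 1: only manages the store and its connection lifecycle.
abstract class InputOperatorSketch<T> {
    protected final StoreSketch store;
    final List<T> emitted = new ArrayList<>(); // stands in for the output port

    InputOperatorSketch(StoreSketch store) { this.store = store; }
    void setup() { store.connect(); }
    void teardown() { store.disconnect(); }
    abstract void emitTuples();
}

// Level 2: drives the scan; subclasses supply operationScan() and getTuple().
abstract class ScanOperatorSketch<T> extends InputOperatorSketch<T> {
    private Iterator<Map<String, String>> rows;

    ScanOperatorSketch(StoreSketch store) { super(store); }

    protected abstract Iterator<Map<String, String>> operationScan();
    protected abstract T getTuple(Map<String, String> row);

    @Override
    void emitTuples() {
        if (rows == null) {
            rows = operationScan(); // start the scan lazily, once
        }
        if (rows.hasNext()) {
            emitted.add(getTuple(rows.next())); // emit at most one tuple per call
        }
    }
}

// Hypothetical POJO used as the tuple type.
class UserPojo {
    String id;
    String name;
}

// Level 3: maps each scanned row to a POJO.
class PojoInputOperatorSketch extends ScanOperatorSketch<UserPojo> {
    private final List<Map<String, String>> table; // in-memory stand-in for the HBase table

    PojoInputOperatorSketch(StoreSketch store, List<Map<String, String>> table) {
        super(store);
        this.table = table;
    }

    @Override
    protected Iterator<Map<String, String>> operationScan() {
        return table.iterator();
    }

    @Override
    protected UserPojo getTuple(Map<String, String> row) {
        UserPojo pojo = new UserPojo();
        pojo.id = row.get("rowKey");
        pojo.name = row.get("cf:name");
        return pojo;
    }
}
```

With this split, a Map or CSV variant would only need another subclass of
the scan level with a different getTuple().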

Comments?

-Bhupesh


On Wed, Dec 30, 2015 at 2:52 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> The class HBaseInputOperator seems to be quite old. HBaseStore seems to
> provide all the functionality of HBaseInputOperator and more (including
> Kerberos authentication).
>
> It would be a good idea to avoid the usage of HBaseInputOperator going
> forward and use HBaseStore instead.
>
> I will also work on abstracting out the HBase input functionality in the
> HBaseInputOperator, which can be extended by concrete implementations.
>
> -Bhupesh

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Bhupesh Chawda <bh...@datatorrent.com>.

The class HBaseInputOperator seems to be quite old. HBaseStore seems to
provide all the functionality of HBaseInputOperator and more (including
Kerberos authentication).

It would be a good idea to avoid the usage of HBaseInputOperator going
forward and use HBaseStore instead.

I will also work on abstracting out the HBase input functionality in the
HBaseInputOperator, which can be extended by concrete implementations.

-Bhupesh

On Wed, Dec 23, 2015 at 7:47 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Thanks for the inputs.
> As an input operator, I am targeting just the Scan operation. The Get
> operation may be better supported by a generic operator (like a query
> operator), which I can take up later.
>
> -Bhupesh

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Bhupesh Chawda <bh...@datatorrent.com>.

Thanks for the inputs.
As an input operator, I am targeting just the Scan operation. The Get
operation may be better supported by a generic operator (like a query
operator), which I can take up later.

-Bhupesh

On Tue, Dec 22, 2015 at 3:48 PM, Mohit Jotwani <mo...@datatorrent.com>
wrote:

> +1
>
> Regards,
> Mohit

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Mohit Jotwani <mo...@datatorrent.com>.

+1

Regards,
Mohit

On Tue, Dec 22, 2015 at 11:21 AM, Chinmay Kolhatkar
<chinmay@datatorrent.com> wrote:

> +1 for the above.
> I see that there is an HBaseGetOperator, but it is abstract and I cannot
> find a concrete implementation of it.
> Are you going to implement that too?
>
> Maybe the concrete implementation of HBaseGetOperator should have this.
>
> Also, I want to mention one thing about scans from my previous experience
> with HBase. The HBase client is synchronous.
> This means that when you fire a scan call, the function blocks until a
> certain number of records have been received at the client end.
> This causes a lot of problems in the current thread, as it might be
> blocked for a long period of time.
> Plus, there is always network-related latency to add to the problem.
>
> Usually the way to deal with this is to fire scan-like queries on a
> separate thread and then consume the results in the main thread.
>
> Please take care of this scenario while implementing the scan operator.
>
> ~ Chinmay.

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Chinmay Kolhatkar <ch...@datatorrent.com>.

+1 for the above.
I see that there is an HBaseGetOperator, but it is abstract and I cannot
find a concrete implementation of it.
Are you going to implement that too?

Maybe the concrete implementation of HBaseGetOperator should have this.

Also, I want to mention one thing about scans from my previous experience
with HBase. The HBase client is synchronous.
This means that when you fire a scan call, the function blocks until a
certain number of records have been received at the client end.
This causes a lot of problems in the current thread, as it might be
blocked for a long period of time.
Plus, there is always network-related latency to add to the problem.

Usually the way to deal with this is to fire scan-like queries on a
separate thread and then consume the results in the main thread.
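
A minimal sketch of that pattern, assuming a bounded queue between the two
threads: a reader thread pulls rows from a blocking source and the
operator thread drains whatever is ready without ever waiting. The class
and method names are hypothetical, and a plain Iterator<String> stands in
for the HBase result scanner so that the example runs without HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BackgroundScanSketch {
    private final BlockingQueue<String> buffer;
    private final Thread reader;
    private volatile boolean done;

    // blockingScanner stands in for the synchronous scanner; next() may block.
    BackgroundScanSketch(Iterator<String> blockingScanner, int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.reader = new Thread(() -> {
            try {
                while (blockingScanner.hasNext()) {
                    // put() blocks only the reader thread when the buffer is full.
                    buffer.put(blockingScanner.next());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                done = true; // set only after the last row has been queued
            }
        }, "scan-reader");
        this.reader.setDaemon(true);
    }

    void start() { reader.start(); }

    // Called from the operator thread; returns up to max rows and never blocks.
    List<String> poll(int max) {
        List<String> out = new ArrayList<>();
        buffer.drainTo(out, max);
        return out;
    }

    // True once the reader has stopped and every queued row has been consumed.
    boolean isFinished() { return done && buffer.isEmpty(); }

    void stop() { reader.interrupt(); }

    public static void main(String[] args) throws InterruptedException {
        BackgroundScanSketch scan =
            new BackgroundScanSketch(Arrays.asList("row1", "row2", "row3").iterator(), 2);
        scan.start();
        List<String> seen = new ArrayList<>();
        while (!scan.isFinished()) {
            seen.addAll(scan.poll(10)); // what emitTuples() would do on each call
            Thread.sleep(1);
        }
        System.out.println(seen);
    }
}
```

The bounded queue also gives back-pressure: if the operator falls behind,
the reader thread waits instead of buffering the whole table in memory.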

Please take care of this scenario while implementing the scan operator.

~ Chinmay.

On Tue, Dec 22, 2015 at 11:08 AM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:

> +1 for this, Bhupesh.
>
> Additionally, I would suggest adding support for:
> 1. Point query
> 2. Returning any row version
>
> The above two are key features of HBase and should be supported.
>
> Regards,
> Sandeep

Re: Adding features to HBase Input Operators in Malhar-contrib

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.

+1 for this, Bhupesh.

Additionally, I would suggest adding support for:
1. Point query
2. Returning any row version

The above two are key features of HBase and should be supported.

Regards,
Sandeep

On Fri, Dec 18, 2015 at 4:39 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi All,
>
> The current HBasePOJOInputOperator does not allow us to do the following:
>
>    1. Allow us to specify a set of "column family: column" and fetch data
>    only for these columns.
>    2. Output format is currently a POJO. We need to have other output
>    formats such that "columnFamily:column" representation is supported.
>    Map / CSV are some of the options.
>    3. Allow specifying "end row-key" to stop scanning a table.
>    4. No metrics.
>
> I am planning to add the above functionality to the HBase Input operators.
> These features may go into the HBaseScanOperator / HBasePOJOInputOperator.
>
> Please let me know your comments.
>
> Thanks.
>
> Bhupesh
>
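
Items 1 to 3 of the quoted proposal can be illustrated with a small,
HBase-free sketch. Everything here is an assumption made for illustration:
the class and method names are hypothetical, a row is modelled as a Map
keyed by "columnFamily:column", and the end row-key is treated as exclusive
and compared as a plain string (HBase itself compares row keys as bytes).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ColumnSelectionSketch {

    // Splits "columnFamily:column" (spaces around ':' tolerated) into {family, column}.
    static String[] parseColumn(String spec) {
        int i = spec.indexOf(':');
        if (i <= 0 || i == spec.length() - 1) {
            throw new IllegalArgumentException("expected columnFamily:column, got " + spec);
        }
        return new String[] { spec.substring(0, i).trim(), spec.substring(i + 1).trim() };
    }

    // Item 1 and item 2 (Map output): keep only the requested columns,
    // keyed by "columnFamily:column" in the order they were requested.
    static Map<String, String> project(Map<String, String> row, List<String> specs) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String spec : specs) {
            String[] fc = parseColumn(spec);
            String key = fc[0] + ":" + fc[1];
            if (row.containsKey(key)) {
                out.put(key, row.get(key));
            }
        }
        return out;
    }

    // Item 2 (CSV output): values joined in column order, without escaping.
    static String toCsv(Map<String, String> projected) {
        return String.join(",", projected.values());
    }

    // Item 3: a row is scanned only while its key sorts before the end row-key.
    static boolean withinRange(String rowKey, String endRowKey) {
        return endRowKey == null || rowKey.compareTo(endRowKey) < 0;
    }
}
```

For example, projecting a row onto "cf:name" and "meta:ts" yields a two-entry
map, and toCsv() turns that map into one comma-separated line.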