Posted to dev@apex.apache.org by Bhupesh Chawda <bh...@datatorrent.com> on 2015/12/18 12:09:38 UTC
Adding features to HBase Input Operators in Malhar-contrib
Hi All,
The current HBasePOJOInputOperator has the following limitations:
1. It does not let us specify a set of "columnFamily:column" pairs and fetch
data only for those columns.
2. The output format is always a POJO. We need other output formats that
support the "columnFamily:column" representation. Map and CSV are some of
the options.
3. It does not let us specify an "end row-key" at which to stop scanning a
table.
4. It exposes no metrics.
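To make items 1 and 2 concrete, here is a minimal sketch of how a list of "columnFamily:column" specifications could be parsed and used to produce Map or CSV output. It is only an illustration under stated assumptions: ColumnSpecSketch and its methods are hypothetical names, the row is modelled as a plain nested map instead of an HBase Result, and none of this is taken from the existing operators.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Malhar or the HBase client.
final class ColumnSpecSketch {

  // Splits "family:qualifier" at the first colon and trims both parts.
  static String[] parseSpec(String spec) {
    int idx = spec.indexOf(':');
    String family = idx < 0 ? "" : spec.substring(0, idx).trim();
    String qualifier = idx < 0 ? "" : spec.substring(idx + 1).trim();
    if (family.isEmpty() || qualifier.isEmpty()) {
      throw new IllegalArgumentException("Expected family:qualifier, got: " + spec);
    }
    return new String[] {family, qualifier};
  }

  // Keeps only the requested columns; result keys are "family:qualifier".
  // The row is assumed to be a map of family -> (qualifier -> value).
  static Map<String, String> project(Map<String, Map<String, String>> row, List<String> specs) {
    Map<String, String> out = new LinkedHashMap<>();
    for (String spec : specs) {
      String[] fq = parseSpec(spec);
      Map<String, String> family = row.get(fq[0]);
      if (family != null && family.containsKey(fq[1])) {
        out.put(fq[0] + ":" + fq[1], family.get(fq[1]));
      }
    }
    return out;
  }

  // Renders the projected values as one CSV line (no quoting or escaping).
  static String toCsv(Map<String, String> projected) {
    return String.join(",", projected.values());
  }

  public static void main(String[] args) {
    Map<String, String> info = new LinkedHashMap<>();
    info.put("name", "alice");
    info.put("age", "30");
    info.put("city", "pune");
    Map<String, Map<String, String>> row = new LinkedHashMap<>();
    row.put("info", info);

    Map<String, String> projected = project(row, Arrays.asList("info: name", "info:age"));
    System.out.println(projected);        // Map output keyed by family:qualifier
    System.out.println(toCsv(projected)); // CSV output in the order of the specs
  }
}
```

In a real operator the same parsed pairs would presumably be handed to the HBase scan so that only those columns are fetched from the server, instead of filtering on the client as this sketch does.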
I am planning to add the above functionality to the HBase Input operators.
These features may go into the HBaseScanOperator / HBasePOJOInputOperator.
Please let me know your comments.
Thanks.
Bhupesh
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Sandeep Deshmukh <sa...@datatorrent.com>.
I shall do that in a day or two.
Regards,
Sandeep
On Thu, Mar 24, 2016 at 6:10 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:
> Dear Community,
>
> Can anyone help review the pull request:
> https://github.com/apache/incubator-apex-malhar/pull/212
>
> Thanks.
>
> ~Bhupesh
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Dear Community,
Can anyone help review the pull request:
https://github.com/apache/incubator-apex-malhar/pull/212
Thanks.
~Bhupesh
On Thu, Mar 17, 2016 at 4:16 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:
> Hi,
>
> I have opened a pull request for the changes as described in the previous
> emails. Here is the pull request:
> https://github.com/apache/incubator-apex-malhar/pull/212
>
> Here is a short description of the changes:
>
> HBaseInputOperator - Takes care of HBaseStore and its connection. Got rid
> of HBaseOperatorBase.
> HBaseScanOperator - Takes care of scanning the table in a non-blocking
> manner. Exposes operationScan() and getTuple() as before.
> HBasePOJOInputOperator - Implements operationScan() and getTuple() and
> outputs a POJO on the output port.
>
> Please help review these changes.
>
> Thanks
> ~Bhupesh
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi,
I have opened a pull request for the changes as described in the previous
emails. Here is the pull request:
https://github.com/apache/incubator-apex-malhar/pull/212
Here is a short description of the changes:
HBaseInputOperator - Takes care of HBaseStore and its connection. Got rid
of HBaseOperatorBase.
HBaseScanOperator - Takes care of scanning the table in a non-blocking
manner. Exposes operationScan() and getTuple() as before.
HBasePOJOInputOperator - Implements operationScan() and getTuple() and
outputs a POJO on the output port.
Please help review these changes.
Thanks
~Bhupesh
On Fri, Mar 11, 2016 at 4:42 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:
> Hi All,
>
> In the current design of HBase input and output operators, the row key is
> hard-coded to be of String type.
> I foresee the following issue:
>
> - In case of numeric keys which are cast to String, *incremental
> read* is problematic. For example, after reading key = 9, we may not
> be able to read any record with, say, key = 8888, because even though
> numerically 8888 > 9, lexicographically "9" > "8888".
> - This is the case only when data is being written to HBase and being
> read from simultaneously.
>
> My suggestion is to parametrize the type of row key in the HBase input and
> output operators, and let the user instantiate the required type for row
> key. We can have default implementations for String and/or Long. By
> parametrizing the row key type, the user can even use complex row keys
> which are a combination of multiple fields.
>
> Thoughts?
>
> PS: I understand that there is a performance concern in making a
> monotonically increasing key as the row key. Given that, how do we address
> the incremental read scenario?
>
> Thanks
>
> -Bhupesh
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi All,
In the current design of HBase input and output operators, the row key is
hard-coded to be of String type.
I foresee the following issue:
- In case of numeric keys which are cast to String, *incremental
read* is problematic. For example, after reading key = 9, we may not be
able to read any record with, say, key = 8888, because even though
numerically 8888 > 9, lexicographically "9" > "8888".
- This is the case only when data is being written to HBase and being
read from simultaneously.
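The ordering problem can be reproduced with plain Java strings. The sketch below is only an illustration: KeyOrderSketch and padded() are hypothetical names, and zero-padding is shown as one possible workaround for String keys, not as something the operators do today.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; shows string order versus numeric order of keys.
final class KeyOrderSketch {

  // Left-pads a non-negative number with zeros so that string order
  // matches numeric order for keys of up to `width` digits.
  static String padded(long key, int width) {
    return String.format(Locale.ROOT, "%0" + width + "d", key);
  }

  public static void main(String[] args) {
    // As strings, "9" sorts after "8888", so a scan resuming after "9" skips "8888".
    System.out.println("9".compareTo("8888") > 0);
    // As numbers, 9 sorts before 8888.
    System.out.println(Long.compare(9L, 8888L) < 0);

    // With fixed-width, zero-padded keys both orders agree.
    List<String> keys = new ArrayList<>();
    keys.add(padded(8888, 10));
    keys.add(padded(9, 10));
    Collections.sort(keys);
    System.out.println(keys);
  }
}
```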
My suggestion is to parametrize the type of row key in the HBase input and
output operators, and let the user instantiate the required type for row
key. We can have default implementations for String and/or Long. By
parametrizing the row key type, the user can even use complex row keys
which are a combination of multiple fields.
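As a rough sketch of what a parametrized row key could look like, consider the code below. Everything in it is an assumption made for illustration: the RowKeyCodec interface, the two implementations and compareBytes() are hypothetical names and do not exist in Malhar. The comparison mimics the unsigned byte-by-byte ordering that HBase applies to row keys, and the Long encoding is 8-byte big-endian, which preserves numeric order only for non-negative values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo; compares how String and Long keys order as raw bytes.
final class RowKeyCodecSketch {

  // Unsigned, byte-by-byte lexicographic comparison of two encoded keys.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    RowKeyCodec<String> strings = new StringRowKeyCodec();
    RowKeyCodec<Long> longs = new LongRowKeyCodec();
    // String keys: "9" sorts after "8888", so the result is positive.
    System.out.println(compareBytes(strings.encode("9"), strings.encode("8888")));
    // Long keys: 9 sorts before 8888, so the result is negative.
    System.out.println(compareBytes(longs.encode(9L), longs.encode(8888L)));
  }
}

// Hypothetical interface: converts a typed row key to and from bytes.
interface RowKeyCodec<K> {
  byte[] encode(K key);

  K decode(byte[] bytes);
}

final class StringRowKeyCodec implements RowKeyCodec<String> {
  public byte[] encode(String key) {
    return key.getBytes(StandardCharsets.UTF_8);
  }

  public String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}

final class LongRowKeyCodec implements RowKeyCodec<Long> {
  // 8-byte big-endian; byte order matches numeric order for non-negative values only.
  public byte[] encode(Long key) {
    return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
  }

  public Long decode(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getLong();
  }
}
```

A composite key made of multiple fields would be one more implementation of the same interface, which is the main reason for parametrizing the type instead of adding a second hard-coded one.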
Thoughts?
PS: I understand that there is a performance concern in making a
monotonically increasing key as the row key. Given that, how do we address
the incremental read scenario?
Thanks
-Bhupesh
On Wed, Dec 30, 2015 at 7:49 PM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:
> Looks fine to me.
>
> Regards,
> Sandeep
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Sandeep Deshmukh <sa...@datatorrent.com>.
Looks fine to me.
Regards,
Sandeep
On Wed, Dec 30, 2015 at 7:34 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:
> Here is the final hierarchy I am considering:
>
> HBaseInputOperator - Takes care of HBaseStore and its connection. Got rid
> of HBaseOperatorBase.
> HBaseScanOperator - Takes care of scanning the table in a non-blocking
> manner. Exposes operationScan() and getTuple() as before.
> HBasePOJOInputOperator - Implements operationScan() and getTuple()
> and outputs a POJO on the output port.
>
> Comments?
>
> -Bhupesh
>
>
> On Wed, Dec 30, 2015 at 2:52 PM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > The class HBaseInputOperator seems to be quite old. HBaseStore seems to
> > have all the functionality provided by HBaseInputOperator and even more
> > (including Kerberos authentication).
> >
> > It would be a good idea to avoid the usage of HBaseInputOperator going
> > forward and use HBaseStore instead.
> >
> > I will also work on abstracting out the HBase input functionality in the
> > HBaseInputOperator, which can be extended by concrete implementations.
> >
> > -Bhupesh
> >
> > On Wed, Dec 23, 2015 at 7:47 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
> > wrote:
> >
> >> Thanks for the inputs.
> >> As an input operator, I am targeting just the Scan operation. Get
> >> operation may be supported better as a generic operator (like a query
> >> operator) which I can take up later.
> >>
> >> -Bhupesh
> >>
> >> On Tue, Dec 22, 2015 at 3:48 PM, Mohit Jotwani <mo...@datatorrent.com>
> >> wrote:
> >>
> >>> +1
> >>>
> >>> Regards,
> >>> Mohit
> >>>
> >>> On Tue, Dec 22, 2015 at 11:21 AM, Chinmay Kolhatkar <chinmay@datatorrent.com>
> >>> wrote:
> >>>
> >>> > +1 for above.
> >>> > I see that there is an HbaseGetOperator, but it is abstract and I cannot
> >>> > find a concrete implementation of it.
> >>> > Are you going to implement that too?
> >>> >
> >>> > Maybe the concrete implementation of HbaseGetOperator should have this.
> >>> >
> >>> > Also, I want to mention one thing about scan from my previous experience
> >>> > with HBase. The HBase client is synchronous.
> >>> > This means that when you fire a scan call, the function blocks until a
> >>> > certain number of records are received at the client end.
> >>> > This causes a lot of problems in the current thread, as it might get
> >>> > blocked for a long period of time.
> >>> > Plus, there is always network-related latency to add to the problem.
> >>> >
> >>> > Usually the way to deal with this is to fire scan-like queries on a
> >>> > separate thread and then consume the results in the main thread.
> >>> >
> >>> > Please take care of this scenario while implementing the scan operator.
> >>> >
> >>> > ~ Chinmay.
> >>> >
> >>> > On Tue, Dec 22, 2015 at 11:08 AM, Sandeep Deshmukh <
> >>> > sandeep@datatorrent.com>
> >>> > wrote:
> >>> >
> >>> > > +1 for this Bhupesh.
> >>> > >
> >>> > > Additionally, I would suggest adding support for:
> >>> > > 1. Point query
> >>> > > 2. Returning any row version
> >>> > >
> >>> > > The above two are key features of HBase and should be supported.
> >>> > >
> >>> > > Regards,
> >>> > > Sandeep
> >>> > >
> >>> > > On Fri, Dec 18, 2015 at 4:39 PM, Bhupesh Chawda <
> >>> bhupesh@datatorrent.com
> >>> > >
> >>> > > wrote:
> >>> > >
> >>> > > > Hi All,
> >>> > > >
> >>> > > > The current HBasePOJOInputOperator does not allow us to do the
> >>> > following:
> >>> > > >
> >>> > > > 1. Allow us to specify a set of "column family: column" and
> >>> fetch
> >>> > data
> >>> > > > only for these columns.
> >>> > > > 2. Output format is currently a POJO. We need to have other
> >>> output
> >>> > > > formats such that "columnFamily:column" representation is
> >>> supported.
> >>> > > > Map /
> >>> > > > CSV are some of the options.
> >>> > > > 3. Allow specifying "end row-key" to stop scanning a table.
> >>> > > > 4. No metrics.
> >>> > > >
> >>> > > > I am planning to add the above functionality to the HBase Input
> >>> > > operators.
> >>> > > > These features may go into the HBaseScanOperator /
> >>> > > HBasePOJOInputOperator.
> >>> > > >
> >>> > > > Please let me know your comments.
> >>> > > >
> >>> > > > Thanks.
> >>> > > >
> >>> > > > Bhupesh
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Here is the final hierarchy I am considering:
HBaseInputOperator - Manages the HBaseStore and its connection. This
replaces HBaseOperatorBase, which is removed.
HBaseScanOperator - Scans the table in a non-blocking manner. Exposes
operationScan() and getTuple() as before.
HBasePOJOInputOperator - Implements operationScan() and getTuple()
and emits a POJO on the output port.
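The three levels above can be sketched roughly as follows. This is only an illustration of the layering, not the code in the pull request: StoreStub, the *Sketch class names, the User POJO and the list used as an output port are all hypothetical stand-ins, and only operationScan() and getTuple() are names taken from the description above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class HBaseHierarchySketch {

  /** Hypothetical stand-in for HBaseStore: owns the connection. */
  static class StoreStub {
    private boolean connected;
    void connect() { connected = true; }
    void disconnect() { connected = false; }
    boolean isConnected() { return connected; }
  }

  /** Level 1: holds the store and manages its connection lifecycle. */
  abstract static class InputOperatorSketch<T> {
    protected final StoreStub store = new StoreStub();
    // Stand-in for an output port: emitted tuples are simply collected here.
    protected final List<T> outputPort = new ArrayList<>();

    void setup() { store.connect(); }
    void teardown() { store.disconnect(); }
    abstract void emitTuples();
  }

  /** Level 2: drives the scan; subclasses supply the scan and the row conversion. */
  abstract static class ScanOperatorSketch<R, T> extends InputOperatorSketch<T> {
    private final Queue<R> pending = new ArrayDeque<>();

    /** Returns the scanned rows (a real operator would configure an HBase scan here). */
    protected abstract List<R> operationScan();

    /** Converts one scanned row into an output tuple. */
    protected abstract T getTuple(R row);

    @Override
    void setup() {
      super.setup();
      pending.addAll(operationScan());
    }

    @Override
    void emitTuples() {
      R row = pending.poll(); // returns null instead of waiting when nothing is buffered
      if (row != null) {
        outputPort.add(getTuple(row));
      }
    }
  }

  /** Hypothetical POJO emitted by the concrete operator. */
  static class User {
    final String id;
    final String name;
    User(String id, String name) { this.id = id; this.name = name; }
  }

  /** Level 3: implements operationScan() and getTuple() and emits POJOs. */
  static class PojoInputOperatorSketch extends ScanOperatorSketch<String[], User> {
    @Override
    protected List<String[]> operationScan() {
      // Hard-coded rows stand in for the result of a table scan.
      return Arrays.asList(new String[] {"row1", "alice"}, new String[] {"row2", "bob"});
    }

    @Override
    protected User getTuple(String[] row) {
      return new User(row[0], row[1]);
    }
  }

  public static void main(String[] args) {
    PojoInputOperatorSketch op = new PojoInputOperatorSketch();
    op.setup();
    for (int i = 0; i < 3; i++) {
      op.emitTuples(); // the third call finds nothing buffered and emits nothing
    }
    op.teardown();
    for (User u : op.outputPort) {
      System.out.println(u.id + " -> " + u.name);
    }
  }
}
```

In this layout, adding another output format (for example a Map) would mean adding another level-3 class without touching the connection or scan logic.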
Comments?
-Bhupesh
On Wed, Dec 30, 2015 at 2:52 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:
> The class HBaseInputOperator seems to be quite old. HBaseStore seems to
> provide all the functionality of HBaseInputOperator and more (including
> Kerberos authentication).
>
> It would be a good idea to avoid using HBaseInputOperator going forward
> and to use HBaseStore instead.
>
> I will also work on abstracting out the HBase input functionality in the
> HBaseInputOperator, which can be extended by concrete implementations.
>
> -Bhupesh
>
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Bhupesh Chawda <bh...@datatorrent.com>.
The class HBaseInputOperator seems to be quite old. HBaseStore seems to
provide all the functionality of HBaseInputOperator and more (including
Kerberos authentication).
It would be a good idea to avoid using HBaseInputOperator going forward
and to use HBaseStore instead.
I will also work on abstracting out the HBase input functionality in the
HBaseInputOperator, which can be extended by concrete implementations.
-Bhupesh
On Wed, Dec 23, 2015 at 7:47 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:
> Thanks for the inputs.
> As an input operator, I am targeting just the Scan operation. A Get
> operation may be better supported as a generic operator (like a query
> operator), which I can take up later.
>
> -Bhupesh
>
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Thanks for the inputs.
As an input operator, I am targeting just the Scan operation. A Get operation
may be better supported as a generic operator (like a query operator), which
I can take up later.
-Bhupesh
On Tue, Dec 22, 2015 at 3:48 PM, Mohit Jotwani <mo...@datatorrent.com>
wrote:
> +1
>
> Regards,
> Mohit
>
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Mohit Jotwani <mo...@datatorrent.com>.
+1
Regards,
Mohit
On Tue, Dec 22, 2015 at 11:21 AM, Chinmay Kolhatkar <chinmay@datatorrent.com
> wrote:
> +1 for the above.
> I see that there is an HbaseGetOperator, but it is abstract and I cannot
> find a concrete implementation of it.
> Are you going to implement that too?
>
> Maybe the concrete implementation of HbaseGetOperator should have this.
>
> Also, I want to mention one thing about scans from my previous experience
> with HBase. The HBase client is synchronous.
> This means that when you fire a scan call, the function blocks until a
> certain number of records have been received at the client end.
> This causes a lot of problems in the calling thread, as it might be
> blocked for a long period of time.
> Plus, there is always network-related latency to add to the problem.
>
> The usual way to deal with this is to fire scan-like queries on a
> separate thread and then consume the results in the main thread.
>
> Please take care of this scenario while implementing the scan operator.
>
> -Chinmay.
>
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Chinmay Kolhatkar <ch...@datatorrent.com>.
+1 for the above.
I see that there is an HbaseGetOperator, but it is abstract and I cannot
find a concrete implementation of it.
Are you going to implement that too?
Maybe the concrete implementation of HbaseGetOperator should have this.
Also, I want to mention one thing about scans from my previous experience
with HBase. The HBase client is synchronous.
This means that when you fire a scan call, the function blocks until a
certain number of records have been received at the client end.
This causes a lot of problems in the calling thread, as it might be
blocked for a long period of time.
Plus, there is always network-related latency to add to the problem.
The usual way to deal with this is to fire scan-like queries on a
separate thread and then consume the results in the main thread.
Please take care of this scenario while implementing the scan operator.
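A minimal sketch of that pattern, assuming a bounded queue between a worker thread and the operator thread. The HBase client is deliberately left out: slowSource() is a hypothetical iterator that sleeps to imitate a blocking scan, and BackgroundScanner and drain() are illustrative names, not part of Malhar or HBase.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class AsyncScanSketch {

  /** Runs a (possibly blocking) row source on its own thread and buffers the rows. */
  static class BackgroundScanner<R> {
    private final BlockingQueue<R> buffer;
    private final Thread worker;
    private volatile boolean done;

    BackgroundScanner(Iterator<R> blockingSource, int capacity) {
      this.buffer = new ArrayBlockingQueue<>(capacity);
      this.worker = new Thread(() -> {
        try {
          while (blockingSource.hasNext()) {
            // put() waits when the buffer is full, so a slow consumer throttles the scan.
            buffer.put(blockingSource.next());
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          done = true;
        }
      }, "scan-worker");
      this.worker.setDaemon(true);
    }

    void start() { worker.start(); }

    /** Non-blocking: returns the next buffered row, or null if none is ready yet. */
    R poll() { return buffer.poll(); }

    /** True once the worker has stopped and every buffered row has been consumed. */
    boolean isFinished() { return done && buffer.isEmpty(); }

    void stop() { worker.interrupt(); }
  }

  /** Hypothetical source that sleeps per row to imitate a blocking, high-latency scan. */
  static Iterator<String> slowSource(int rows) {
    return new Iterator<String>() {
      private int next = 0;

      @Override
      public boolean hasNext() { return next < rows; }

      @Override
      public String next() {
        try {
          TimeUnit.MILLISECONDS.sleep(5);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        return "row-" + next++;
      }
    };
  }

  /** Consumes rows on the calling thread without ever waiting on the source itself. */
  static List<String> drain(BackgroundScanner<String> scanner) {
    List<String> out = new ArrayList<>();
    while (!scanner.isFinished()) {
      String row = scanner.poll();
      if (row != null) {
        out.add(row);
        continue;
      }
      try {
        // Stand-in for returning from emitTuples() and being called again later.
        TimeUnit.MILLISECONDS.sleep(1);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    BackgroundScanner<String> scanner = new BackgroundScanner<>(slowSource(10), 4);
    scanner.start();
    System.out.println(drain(scanner).size() + " rows consumed");
  }
}
```

The bounded queue is a design choice in this sketch: it caps memory use if the operator falls behind, at the cost of pausing the worker until rows are consumed.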
-Chinmay.
On Tue, Dec 22, 2015 at 11:08 AM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:
> +1 for this Bhupesh.
>
> Additionally, I would suggest adding support for:
> 1. Point query
> 2. Returning any row version
>
> The above two are key features of HBase and should be supported.
>
> Regards,
> Sandeep
>
Re: Adding features to HBase Input Operators in Malhar-contrib
Posted by Sandeep Deshmukh <sa...@datatorrent.com>.
+1 for this Bhupesh.
Additionally, I would suggest adding support for:
1. Point query
2. Returning any row version
The above two are key features of HBase and should be supported.
Regards,
Sandeep
On Fri, Dec 18, 2015 at 4:39 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:
> Hi All,
>
> The current HBasePOJOInputOperator has the following limitations:
>
> 1. It does not let us specify a set of "column family: column" pairs and
> fetch data only for those columns.
> 2. The output format is always a POJO. We need other output formats that
> support the "columnFamily:column" representation. Map and CSV are some
> of the options.
> 3. It does not allow specifying an "end row-key" to stop scanning a table.
> 4. It exposes no metrics.
>
> I am planning to add the above functionality to the HBase Input operators.
> These features may go into the HBaseScanOperator / HBasePOJOInputOperator.
>
> Please let me know your comments.
>
> Thanks.
>
> Bhupesh
>
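To make items 1 to 3 of the quoted proposal concrete, here is a small sketch that uses an in-memory sorted map in place of an HBase table. Everything in it (ScanOptionsSketch, scan(), toCsv(), the sample rows) is hypothetical and only illustrates the intended behaviour: select a set of "columnFamily:column" names, stop before an end row-key (treated as exclusive here, as the stop row of an HBase Scan is), and emit each row as a Map or as CSV.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class ScanOptionsSketch {

  /** Scans rows in [startRow, endRow) and keeps only the requested "family:qualifier" columns. */
  static List<Map<String, String>> scan(TreeMap<String, Map<String, String>> table,
      String startRow, String endRow, List<String> columns) {
    List<Map<String, String>> result = new ArrayList<>();
    // subMap gives a sorted view: start row inclusive, end row exclusive.
    for (Map<String, String> cells : table.subMap(startRow, true, endRow, false).values()) {
      Map<String, String> selected = new LinkedHashMap<>();
      for (String column : columns) {
        String value = cells.get(column);
        if (value != null) {
          selected.put(column, value); // Map output keyed by "family:qualifier"
        }
      }
      result.add(selected);
    }
    return result;
  }

  /** Renders one row as CSV in the requested column order; missing cells become empty fields. */
  static String toCsv(Map<String, String> row, List<String> columns) {
    StringJoiner csv = new StringJoiner(",");
    for (String column : columns) {
      csv.add(row.getOrDefault(column, ""));
    }
    return csv.toString();
  }

  /** Builds a row from alternating column-name / value arguments. */
  static Map<String, String> row(String... namesAndValues) {
    Map<String, String> cells = new LinkedHashMap<>();
    for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
      cells.put(namesAndValues[i], namesAndValues[i + 1]);
    }
    return cells;
  }

  /** Hypothetical sample data standing in for an HBase table sorted by row key. */
  static TreeMap<String, Map<String, String>> sampleTable() {
    TreeMap<String, Map<String, String>> table = new TreeMap<>();
    table.put("r1", row("info:name", "alice", "info:age", "30", "stats:visits", "3"));
    table.put("r2", row("info:name", "bob", "info:age", "41"));
    table.put("r3", row("info:name", "carol", "stats:visits", "7"));
    return table;
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("info:name", "stats:visits");
    for (Map<String, String> r : scan(sampleTable(), "r1", "r3", columns)) {
      System.out.println(r + " | " + toCsv(r, columns));
    }
  }
}
```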