Posted to user@ignite.apache.org by Anil <an...@gmail.com> on 2016/10/11 13:11:27 UTC

Loading Hbase data into Ignite

Hi,

We have around 18 M records in HBase that need to be loaded into an Ignite
cluster.

I was looking at

http://apacheignite.gridgain.org/v1.7/docs/data-loading

https://github.com/apache/ignite/tree/master/examples

Is there any approach where each Ignite node loads the data of one HBase
region?

Do you have any recommendations?

Thanks.

Re: Loading Hbase data into Ignite

Posted by Vladislav Pyatkov <vl...@gmail.com>.

Hi,

The easiest way to do this is to use a DataStreamer[1] from all server
nodes, with each node loading a specific part of the data.
You can do it using Ignite compute[2] (matching by node ID, for example, or
by a node parameter, or anything else), with the part number passed as a
parameter to the SQL query for HDB.

[1]: http://apacheignite.gridgain.org/docs/data-streamers
[2]:
http://apacheignite.gridgain.org/docs/distributed-closures#broadcast-methods
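
A minimal sketch of the "each node loads a specific part of the data" idea,
in plain Java with no Ignite or HBase classes. It assumes the data can be
divided into numbered parts and that every node knows its own position among
the server nodes. The names PartAssignment and partsFor and the round-robin
rule are illustrative choices, not part of the Ignite API. In a real job the
returned part numbers would be the parameter passed to the query, and the rows
read would be fed to a data streamer.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration; no Ignite or HBase classes are involved. */
final class PartAssignment {
    private PartAssignment() {}

    /**
     * Returns the part numbers that the node at position nodeIndex (0-based)
     * loads when totalParts parts are shared among nodeCount nodes.
     */
    static List<Integer> partsFor(int nodeIndex, int nodeCount, int totalParts) {
        if (nodeCount <= 0 || nodeIndex < 0 || nodeIndex >= nodeCount) {
            throw new IllegalArgumentException("invalid node index or node count");
        }
        List<Integer> parts = new ArrayList<>();
        // Round-robin: part p belongs to the node whose index is p mod nodeCount.
        for (int p = nodeIndex; p < totalParts; p += nodeCount) {
            parts.add(p);
        }
        return parts;
    }

    public static void main(String[] args) {
        int nodeCount = 4;   // hypothetical cluster size
        int totalParts = 10; // hypothetical number of data parts
        for (int node = 0; node < nodeCount; node++) {
            System.out.println("node " + node + " loads parts "
                + partsFor(node, nodeCount, totalParts));
        }
    }
}
```

Every part is assigned to exactly one node, so no row is loaded twice as long
as the parts themselves do not overlap.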

-- 
Vladislav Pyatkov

Re: Loading Hbase data into Ignite

Posted by Anil <an...@gmail.com>.

Hi Alexey,

We are planning to have a 4-node cluster. We will increase the number of
nodes based on performance.

The key is a unique string (a part of the HBase record's primary key that is
unique). Each record has around 25-30 fields, but the fields are small, so a
record won't have much content.

All 18 M records relate to a single use case, so we are planning to keep them
in a single cache so that pagination, filtering and sorting are supported at
the cache level itself.

The initial load will just write to the cache, and changes to existing
entries (or new objects) will be added/updated using a Kafka stream.

Thanks.

On 11 October 2016 at 19:03, Alexey Kuznetsov <ak...@apache.org> wrote:

> Hi, Anil.
>
> It depends on your use case.
> How many nodes will be in your cluster?
> All 18M records will be in one cache or many caches?
> How big single record? What will be the key?
> You need only load or you also need write changed / new objects in cache
> to HBase?
>
> --
> Alexey Kuznetsov
>

Re: Loading Hbase data into Ignite

Posted by Alexey Kuznetsov <ak...@apache.org>.

Hi, Anil.

It depends on your use case.
How many nodes will be in your cluster?
Will all 18M records be in one cache or in many caches?
How big is a single record? What will be the key?
Do you only need to load the data, or do you also need to write changed / new
objects from the cache back to HBase?

-- 
Alexey Kuznetsov

Re: Loading Hbase data into Ignite

Posted by Anil <an...@gmail.com>.

Thank you.



On 12 October 2016 at 15:56, Taras Ledkov <tl...@gridgain.com> wrote:

> Hi,
>
> FailoverSpi is used to process jobs failures.
>
> The AlwaysFailoverSpi implementation is used by default. One tries to
> submit a job the 'maximumFailoverAttempts' (default 5) times .
>
> --
> Taras Ledkov
> Mail-To: tledkov@gridgain.com
>
>

Re: Loading Hbase data into Ignite

Posted by Taras Ledkov <tl...@gridgain.com>.

Hi,

FailoverSpi is used to handle job failures.

The AlwaysFailoverSpi implementation is used by default. It tries to
submit a job up to 'maximumFailoverAttempts' (default 5) times.
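
The retry behaviour described above can be pictured with a plain-Java sketch.
This is not Ignite's implementation and uses no Ignite classes: the names
FailoverSketch and runWithFailover, the node list, and the choice to try nodes
in order are all assumptions for illustration. It also assumes the limit
counts every submission, including the first one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Plain-Java illustration of "submit a job up to N times"; not Ignite code. */
final class FailoverSketch {
    /** The value a job returned and how many submissions it took. */
    static final class Result<T> {
        final T value;
        final int attempts;

        Result(T value, int attempts) {
            this.value = value;
            this.attempts = attempts;
        }
    }

    /** Runs job on successive nodes until it succeeds or maxAttempts is reached. */
    static <T> Result<T> runWithFailover(List<String> nodes, Function<String, T> job, int maxAttempts) {
        if (nodes.isEmpty() || maxAttempts < 1) {
            throw new IllegalArgumentException("need at least one node and one attempt");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Pick the next node in order, wrapping around the list.
            String node = nodes.get((attempt - 1) % nodes.size());
            try {
                return new Result<>(job.apply(node), attempt);
            } catch (RuntimeException e) {
                last = e; // remember the failure and fail over to the next node
            }
        }
        throw new IllegalStateException("job failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-a", "node-b", "node-c");
        // Hypothetical job that fails everywhere except on node-c.
        Result<String> r = runWithFailover(nodes, node -> {
            if (!node.equals("node-c")) {
                throw new IllegalStateException(node + " is down");
            }
            return "loaded on " + node;
        }, 5);
        System.out.println(r.value + " after " + r.attempts + " attempt(s)");
    }
}
```

A job that keeps failing is eventually reported as failed instead of being
retried forever, so the caller still has to decide what to do with a region
that could not be loaded.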

On 12.10.2016 13:09, Anil wrote:
> HI,
>
> Following is the approach to load hbase data into Ingnite
>
> 1. Create Cluster wide singleton distributed custom service
> 2. Get all region(s) information in the init() method of your custom 
> service
> 3. Broadcast region(s) using ignite.compute().call() in execute() 
> method of your custom service
> 4. Scan a particular region and load the cache
>
> Note : Need to handle node failure during cache load as distributed 
> service is deployed on some other node.
>
>
> How a broadcast job process intermediate failure handled in ignite 
> compute() ? rescheduled ? or ignored ? Please clarify.
>
> Please let me know if you see any anti-pattern in terms of ignite ?
>
> Thanks.

-- 
Taras Ledkov
Mail-To: tledkov@gridgain.com

Re: Loading Hbase data into Ignite

Posted by Anil <an...@gmail.com>.

Hi,

The following is my approach to loading HBase data into Ignite:

1. Create a cluster-wide singleton distributed custom service
2. Get all region information in the init() method of the custom service
3. Broadcast the regions using ignite.compute().call() in the execute()
method of the custom service
4. Scan a particular region and load the cache

Note: node failure during the cache load needs to be handled, as the
distributed service is deployed on some other node.


How is an intermediate failure of a broadcast job handled in Ignite
compute()? Is the job rescheduled or ignored? Please clarify.

Please let me know if you see any anti-patterns in terms of Ignite.

Thanks.
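
Steps 3 and 4 above amount to "one task per region, all running
concurrently, all writing into the same cache". The sketch below shows only
that shape in plain Java. An ExecutorService stands in for Ignite compute, a
Map stands in for the cache, and a Map of rows stands in for an HBase region.
ParallelRegionLoad and loadAll are hypothetical names, and none of this is
Ignite or HBase API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Plain-Java stand-in for "scan each region in parallel and load the cache". */
final class ParallelRegionLoad {
    /** Loads every region concurrently into cache; returns the number of rows loaded. */
    static int loadAll(Map<String, Map<String, String>> regions, Map<String, String> cache, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Map<String, String> rows : regions.values()) {
                // One task per region: "scan" it and put its rows into the shared cache.
                results.add(pool.submit(() -> {
                    cache.putAll(rows);
                    return rows.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // waits for the task and rethrows its failure, if any
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> region1 = new HashMap<>();
        region1.put("a1", "v1");
        region1.put("a2", "v2");
        Map<String, String> region2 = new HashMap<>();
        region2.put("b1", "v3");

        Map<String, Map<String, String>> regions = new LinkedHashMap<>();
        regions.put("region-1", region1);
        regions.put("region-2", region2);

        Map<String, String> cache = new ConcurrentHashMap<>();
        System.out.println("rows loaded: " + loadAll(regions, cache, 2));
    }
}
```

The shared map must be safe for concurrent writes, which is why the usage
example passes a ConcurrentHashMap.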






On 11 October 2016 at 20:49, Anil <an...@gmail.com> wrote:

> Thank you Vladislav and Andrey. I will look at the document and give a
> try.
>
> Thanks again.
>

Re: Loading Hbase data into Ignite

Posted by Anil <an...@gmail.com>.

This has been resolved, Val. Thanks.

On 26 October 2016 at 14:58, vdpyatkov <vl...@gmail.com> wrote:

> Hi Anil,
>
> I doubt, about this fields can serialize correctly:
>
> private Scan scan;
> private QueryPlan queryPlan;
>
> You need will get rid of this fields from serialized object.
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Loading-Hbase-data-into-Ignite-tp8209p8502.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>

Re: Loading Hbase data into Ignite

Posted by vdpyatkov <vl...@gmail.com>.

Hi Anil,

I doubt that these fields can be serialized correctly:

private Scan scan;
private QueryPlan queryPlan;

You need to get rid of these fields in the serialized object.
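
One common way to keep such a field out of the serialized form is to mark it
transient and rebuild it on the receiving side. The sketch below shows this
with plain java.io serialization. Ignite uses its own marshallers, so treat
this as an illustration of the principle, not a statement about Ignite
internals. ScanHandle is a hypothetical stand-in for a heavy object such as
Scan or QueryPlan, and the other class and method names are made up as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientFieldDemo {
    /** Hypothetical stand-in for a heavy, non-serializable object. */
    static final class ScanHandle {
        final String startRow;

        ScanHandle(String startRow) {
            this.startRow = startRow;
        }
    }

    /** Fails to serialize: the non-serializable field is part of its state. */
    static final class BrokenTask implements Serializable {
        private static final long serialVersionUID = 1L;
        private final ScanHandle scan = new ScanHandle("row-0000");
    }

    /** Serializes fine: only the String travels, the handle is rebuilt on demand. */
    static final class FixedTask implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String startRow;
        private transient ScanHandle scan; // skipped when writing, null after reading

        FixedTask(String startRow) {
            this.startRow = startRow;
        }

        ScanHandle scan() {
            if (scan == null) {
                scan = new ScanHandle(startRow); // rebuilt lazily where the task runs
            }
            return scan;
        }
    }

    /** Serializes and deserializes obj; returns null if a field is not serializable. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (NotSerializableException e) {
            return null;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("broken task survives: " + (roundTrip(new BrokenTask()) != null));
        System.out.println("fixed task survives:  " + (roundTrip(new FixedTask("row-0100")) != null));
    }
}
```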



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Loading-Hbase-data-into-Ignite-tp8209p8502.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Loading Hbase data into Ignite

Posted by Anil <an...@gmail.com>.

Thanks Val.

Can you elaborate on "You need to implement Callable as easy as possible (in
additional you can try implement Externalizable) and to create connection
into IgniteCallable directly."?

ignite.compute().call() accepts IgniteCallable instances only.

I tried the attached classes with no luck. It looks like the attached
classes also fail because of serialization.

Is there any way to overcome this?

Thanks




On 18 October 2016 at 14:31, Vladislav Pyatkov <vl...@gmail.com> wrote:

> Hi Anil,
>
> The implementation of IgniteCallable looks like very doubtful.
> When you invoke "ignite.compute().call(calls)" all IgniteCallable will be
> serialized and
> sended to particular nodes on executing.
>
> I have doubt about, QueryPlan serialized correctly.
>
> You need to implement Callable as easy as possible (in additional you can
> try implement Externalizable) and to create connection into IgniteCallable
> directly.
>
>
> --
> Vladislav Pyatkov
>

Re: Loading Hbase data into Ignite

Posted by Vladislav Pyatkov <vl...@gmail.com>.

Hi Anil,

The implementation of IgniteCallable looks very doubtful.
When you invoke "ignite.compute().call(calls)", all the IgniteCallable
instances will be serialized and sent to particular nodes for execution.

I doubt that QueryPlan is serialized correctly.

You need to implement the Callable as simply as possible (in addition, you
can try implementing Externalizable) and create the connection inside the
IgniteCallable directly.
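
A sketch of what "as simple as possible" could look like, in plain Java: the
task carries only two strings, and everything heavy is created inside call()
on whichever node runs it. IgniteCallable is a Callable that is also
Serializable, so the sketch implements those two standard interfaces directly.
RegionLoadTask and FakeConnection are hypothetical names, and FakeConnection
is a stand-in for a real HBase connection, not an HBase class.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

/** A task whose serialized state is just two row keys. */
final class RegionLoadTask implements Callable<Integer>, Serializable {
    private static final long serialVersionUID = 1L;

    private final String startRow; // inclusive
    private final String stopRow;  // exclusive

    RegionLoadTask(String startRow, String stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }

    /** Hypothetical stand-in for a connection to the data source. */
    static final class FakeConnection implements AutoCloseable {
        private final List<String> rows = Arrays.asList("a1", "a2", "b1", "b2", "c1");

        /** Counts rows whose key lies in the range [start, stop). */
        int countRows(String start, String stop) {
            int count = 0;
            for (String row : rows) {
                if (row.compareTo(start) >= 0 && row.compareTo(stop) < 0) {
                    count++;
                }
            }
            return count;
        }

        @Override
        public void close() {
            // A real connection would release sockets and buffers here.
        }
    }

    @Override
    public Integer call() {
        // The connection is opened where the task executes and is never serialized.
        try (FakeConnection conn = new FakeConnection()) {
            return conn.countRows(startRow, stopRow);
        }
    }

    public static void main(String[] args) {
        System.out.println("rows in [a, c): " + new RegionLoadTask("a", "c").call());
    }
}
```

Because the connection is a local variable and not a field, there is nothing
connection-related for the marshaller to serialize.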

On Tue, Oct 18, 2016 at 7:34 AM, Anil <an...@gmail.com> wrote:

> Hi Val,
>
> I have attached the sample program. please take a look and let me know if
> you have any questions.
>
> after spending some time, i noticed that the exception is happening only
> when processing of number of parallel callable's with broadcast.
>
> Thanks,
> Anil
>
>

-- 
Vladislav Pyatkov

Re: Loading Hbase data into Ignite

Posted by Anil <an...@gmail.com>.

Hi Val,

I have attached the sample program. Please take a look and let me know if
you have any questions.

After spending some time, I noticed that the exception happens only
when a number of callables are processed in parallel with broadcast.

Thanks,
Anil

On 15 October 2016 at 04:33, vkulichenko <va...@gmail.com>
wrote:

> Hi Anil,
>
> Yes, the exception doesn't tell much. It would be great if you provide a
> test that reproduces the issue.
>
> -Val
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Loading-Hbase-data-into-Ignite-tp8209p8308.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>

Re: Loading Hbase data into Ignite

Posted by vkulichenko <va...@gmail.com>.

Hi Anil,

Yes, the exception doesn't tell much. It would be great if you provide a
test that reproduces the issue.

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Loading-Hbase-data-into-Ignite-tp8209p8308.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Loading Hbase data into Ignite

Posted by Anil <an...@gmail.com>.

Hi,

When I read HBase information using broadcast, I see the following
exception:

Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readSerializable(OptimizedObjectInputStream.java:572)
    ... 149 more
Caused by: java.lang.NullPointerException
    at java.security.CodeSource.readObject(CodeSource.java:587)
    ... 154 more

Could you please let me know the root cause of the exception?

I know the stack trace above is not very clear; I can share the code
if required.

Thanks,
Anil

On 11 October 2016 at 20:49, Anil <an...@gmail.com> wrote:

> Thank you Vladislav and Andrey. I will look at the document and give a
> try.
>
> Thanks again.
>

Re: Loading Hbase data into Ignite

Posted by Anil <an...@gmail.com>.

Thank you, Vladislav and Andrey. I will look at the document and give it a
try.

Thanks again.

On 11 October 2016 at 20:47, Andrey Gura <ag...@apache.org> wrote:

> Hi,
>
> HBase regions doesn't map to Ignite nodes due to architectural
> differences. Each HBase region contains rows in some range of keys that
> sorted lexicographically while distribution of keys in Ignite depends on
> affinity function and key hash code. Also how do you remap region to nodes
> in case of region was splitted?
>
> Of course you can get node ID in cluster for given key but because HBase
> keeps rows sorted by keys lexicographically you should perform full scan in
> HBase table. So the simplest way for parallelization data loading from
> HBase to Ignite it concurrently scan regions and stream all rows to one or
> more DataStreamer.
>
>

Re: Loading Hbase data into Ignite

Posted by Andrey Gura <ag...@apache.org>.

Hi,

HBase regions don't map to Ignite nodes due to architectural differences.
Each HBase region contains rows in some range of keys that are sorted
lexicographically, while the distribution of keys in Ignite depends on the
affinity function and the key hash code. Also, how would you remap regions
to nodes when a region is split?

Of course, you can get the node ID in the cluster for a given key, but
because HBase keeps rows sorted by key lexicographically, you would still
have to perform a full scan of the HBase table. So the simplest way to
parallelize data loading from HBase to Ignite is to scan the regions
concurrently and stream all rows to one or more DataStreamers.
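
The mismatch described above can be seen in a small plain-Java sketch. It
places the same keys once by sorted key range, the way regions work, and once
by hash code. The modulo rule in partitionOf is a deliberate simplification
and is not Ignite's actual affinity function. KeyPlacement, the split point
"row-5000" and the partition count are assumptions made up for illustration.

```java
import java.util.Arrays;

/** Range placement (region-like) versus hash placement (partition-like). */
final class KeyPlacement {
    private KeyPlacement() {}

    /** Index of the range whose start key is the greatest one not above key. */
    static int regionOf(String key, String[] sortedStartKeys) {
        int pos = Arrays.binarySearch(sortedStartKeys, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos : Math.max(0, -pos - 2);
    }

    /** Simplified hash placement; NOT Ignite's real affinity function. */
    static int partitionOf(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    public static void main(String[] args) {
        String[] regionStarts = {"", "row-5000"}; // two hypothetical regions
        String[] keys = {"row-0001", "row-0002", "row-0003", "row-0004"};
        for (String key : keys) {
            System.out.println(key
                + " -> region " + regionOf(key, regionStarts)
                + ", partition " + partitionOf(key, 4));
        }
    }
}
```

Neighbouring keys stay together in one region but are spread over different
hash partitions, which is why a task that scans one region ends up producing
rows for many nodes and a data streamer is needed to route them.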
