Posted to user@ignite.apache.org by babu prasad <ba...@gmail.com> on 2016/01/15 07:25:42 UTC

Loadcache behavior

I was loading data from AWS Redshift into Ignite through the loadCache API.

cache.loadCache(null, "java.lang.Integer",
    "select * from lineorder where lo_orderdate > 19970101");

I ran this from a single client. The query basically ran on all the nodes.
I had 1 client and 2 servers and the same query ran thrice on the database
server.

Is this expected behaviour?

Thanks!

Re: Loadcache behavior

Posted by Babu Prasad <ba...@gmail.com>.
Thanks! This may still not help, because my underlying datastore is a warehouse where only batch queries are efficient.
Also, it looks like the key-to-partition mapping could change if the number of partitions in the affinity function changes, which would result in unnecessary backfills.
I think I might need to go one step forward in the ingestion process, use the data streamer, and consume from my event bus, which is Kinesis.
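
To illustrate the backfill concern: a minimal sketch, assuming a simplified hash-modulo mapping rather than Ignite's actual affinity function. The names `partitionFor` and `staleKeys` and the partition counts are made up for illustration. It counts how many stored partition IDs would no longer match after the partition count changes.

```java
final class PartitionRemapSketch {
    /** Simplified key-to-partition mapping (assumption: NOT Ignite's real affinity function). */
    static int partitionFor(int key, int partitions) {
        return Math.floorMod(Integer.hashCode(key), partitions);
    }

    /** Counts keys in [0, keyCount) whose partition differs between the two partition counts. */
    static int staleKeys(int keyCount, int oldPartitions, int newPartitions) {
        int stale = 0;
        for (int key = 0; key < keyCount; key++) {
            if (partitionFor(key, oldPartitions) != partitionFor(key, newPartitions)) {
                stale++;
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        int stale = staleKeys(10_000, 1024, 512);
        System.out.println(stale + " of 10000 stored partition IDs would need a backfill");
    }
}
```

Under this simplified mapping, any row whose stored partition ID no longer matches would have to be rewritten in the warehouse before a partition-aware load could be trusted again.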

> On Jan 15, 2016, at 5:11 AM, Alexey Goncharuk <al...@gmail.com> wrote:
> 
> Btw,
> 
> Even though the same query is executed on all nodes, Ignite will automatically filter out the keys that do not belong to the local node upon cache loading. When the number of nodes is high, this is not very efficient, since only a small part of the data (roughly K/N, where K is the number of backups + 1 and N is the number of nodes in the cluster) will be loaded into the cache on each node.
> 
> This may be optimized if you store the partition ID alongside the data record in your database. Take a look at this topic in the documentation [1]. In this case each node will select only those records that belong to it, which substantially decreases the database load. Applying this will also allow you to work around the bug related to running a cache query on client nodes.
> 
> Hope this helps,
> AG
> 
> -------
> [1] https://apacheignite.readme.io/docs/data-loading#section-partition-aware-data-loading

Re: Loadcache behavior

Posted by Alexey Goncharuk <al...@gmail.com>.
Btw,

Even though the same query is executed on all nodes, Ignite will
automatically filter out the keys that do not belong to the local node
upon cache loading. When the number of nodes is high, this is not very
efficient, since only a small part of the data (roughly K/N, where K is
the number of backups + 1 and N is the number of nodes in the cluster)
will be loaded into the cache on each node.
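
As a rough worked example of the K/N estimate (a sketch only; the node and backup counts are illustrative and not taken from this thread, and `keptFraction` is a made-up helper):

```java
final class LoadFractionSketch {
    /** Approximate share of fetched rows a node keeps: (backups + 1) / nodes, capped at 1. */
    static double keptFraction(int backups, int nodes) {
        return Math.min(1.0, (backups + 1) / (double) nodes);
    }

    public static void main(String[] args) {
        // Every node runs the full query, but keeps only its own share of the rows.
        for (int nodes : new int[] {2, 10, 50}) {
            System.out.printf("nodes=%d backups=1 kept=%.2f%n", nodes, keptFraction(1, nodes));
        }
    }
}
```

The remaining rows are fetched from the database and then discarded, which is why the cost grows with the cluster size.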

This may be optimized if you store the partition ID alongside the data
record in your database. Take a look at this topic in the documentation
[1]. In this case each node will select only those records that belong to
it, which substantially decreases the database load. Applying this will
also allow you to work around the bug related to running a cache query on
client nodes.

Hope this helps,
AG

-------
[1] https://apacheignite.readme.io/docs/data-loading#section-partition-aware-data-loading
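
A minimal sketch of the partition-aware idea, assuming the table has an extra column holding each row's partition ID. The column name `partition_id` and the helper `queryForPartitions` are hypothetical. In a real cache store the list of local partitions would come from Ignite's affinity API, which is not shown here. Each node would build a query restricted to its own partitions instead of running the full scan:

```java
import java.util.List;
import java.util.stream.Collectors;

final class PartitionAwareQuerySketch {
    /** Builds a query limited to the given partitions (hypothetical partition_id column). */
    static String queryForPartitions(String baseQuery, List<Integer> localPartitions) {
        if (localPartitions.isEmpty()) {
            throw new IllegalArgumentException("node owns no partitions");
        }
        String ids = localPartitions.stream()
            .map(String::valueOf)
            .collect(Collectors.joining(", "));
        // Assumes baseQuery already contains a WHERE clause, as in the query from this thread.
        return baseQuery + " and partition_id in (" + ids + ")";
    }

    public static void main(String[] args) {
        String base = "select * from lineorder where lo_orderdate > 19970101";
        System.out.println(queryForPartitions(base, List.of(0, 3, 7)));
    }
}
```

With such a filter the database returns only the rows a node will actually keep, rather than the whole result set.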

Re: Loadcache behavior

Posted by Valentin Kulichenko <va...@gmail.com>.
In this case the best way is to use the data streamer. Refer to [1] for details.

[1] https://apacheignite.readme.io/docs/data-loading

-Val
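
A rough sketch of that pattern: run the query once on the client and push each row through a streamer, which routes entries to the nodes that own them. To keep the example self-contained, `KeyValueSink` is a hypothetical stand-in for Ignite's `IgniteDataStreamer`, whose `addData(key, value)` method it mirrors, and the row source is a plain in-memory map rather than a JDBC result set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SingleReaderLoadSketch {
    /** Hypothetical stand-in for IgniteDataStreamer; mirrors its addData(key, value) call. */
    interface KeyValueSink<K, V> extends AutoCloseable {
        void addData(K key, V val);

        @Override
        void close(); // narrowed so try-with-resources needs no checked-exception handling
    }

    /** Reads every row exactly once and hands it to the sink; returns the row count. */
    static <K, V> int loadOnce(Map<K, V> rows, KeyValueSink<K, V> sink) {
        int count = 0;
        for (Map.Entry<K, V> row : rows.entrySet()) {
            sink.addData(row.getKey(), row.getValue());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "order-1");
        rows.put(2, "order-2");

        Map<Integer, String> target = new LinkedHashMap<>();
        // In a real client, the sink would be a data streamer obtained from the Ignite instance.
        try (KeyValueSink<Integer, String> sink = new KeyValueSink<Integer, String>() {
            @Override public void addData(Integer key, String val) { target.put(key, val); }
            @Override public void close() { /* a real streamer flushes buffered data here */ }
        }) {
            int loaded = loadOnce(rows, sink);
            System.out.println("loaded " + loaded + " rows with a single read");
        }
    }
}
```

The point of the design is that the database sees one query, while distribution across the cluster is left to the streamer.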

On Fri, Jan 15, 2016 at 1:57 PM, babu prasad <ba...@gmail.com> wrote:

> localLoadCache() would still load only a single local cache node, right?
> Is there a way I can run a single instance of the query from the client
> and load it in the distributed cache that partitions the result across all
> nodes?

Re: Loadcache behavior

Posted by babu prasad <ba...@gmail.com>.
localLoadCache() would still load only a single local cache node, right?
Is there a way I can run a single instance of the query from the client and
load it in the distributed cache that partitions the result across all
nodes?

On Fri, Jan 15, 2016 at 1:23 PM, vkulichenko <va...@gmail.com>
wrote:

> Vinay,
>
> Local cache has the same issue as others - load will be triggered on all
> nodes. I added a comment in the ticket.
>
> As a workaround you can simply call the localLoadCache() method instead of
> loadCache() - this should do the trick.
>
> -Val

Re: Loadcache behavior

Posted by vkulichenko <va...@gmail.com>.
Vinay,

Local cache has the same issue as others - load will be triggered on all
nodes. I added a comment in the ticket.

As a workaround you can simply call the localLoadCache() method instead of
loadCache() - this should do the trick.

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Loadcache-behavior-tp2571p2590.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Loadcache behavior

Posted by vinshar <vi...@gmail.com>.
Hi Denis,

How does this work for caches created on client nodes in LOCAL mode?

These caches will have all the data on the client node and nothing on the
servers, is this right?

Regards,
Vinay



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Loadcache-behavior-tp2571p2586.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Loadcache behavior

Posted by Denis Magda <dm...@gridgain.com>.
Hi,

The query should have been executed only on server nodes.

I've taken a look at the Ignite sources and found a silly bug that adds
client nodes to the list of nodes where the query is executed.
I've created a ticket:
https://issues.apache.org/jira/browse/IGNITE-2394

If you're interested you can pick it up and contribute to Ignite ;)

--
Denis

On 1/15/2016 9:25 AM, babu prasad wrote:
> I was loading data from AWS Redshift into Ignite through the loadcache 
> API.
>
> cache.loadCache(null, "java.lang.Integer", "select * from lineorder 
> where lo_orderdate > 19970101");
>
> I ran this from a single client. The query basically ran on all the nodes.
> I had 1 client and 2 servers and the same query ran thrice on the 
> database server.
>
> Is this expected behaviour?
>
> Thanks!