Posted to user@ignite.apache.org by ght230 <gh...@163.com> on 2017/01/11 07:02:51 UTC

How to remove large amount of data from cache

I want to remove old data from the cache once a day; the amount of data is
about 5 million entries.
My main removal process is as below:

        SqlFieldsQuery sql = new SqlFieldsQuery(
                "select id, name from Table2 where data = xxxx");

        QueryCursor<List<?>> cursor = cache.query(sql);

        Set<Table2Key> keys = new HashSet<>();

        for (List<?> row : cursor) {
            // A new key instance per row: reusing one mutable key would
            // leave the set holding a single, repeatedly overwritten object.
            Table2Key key = new Table2Key();
            key.setid(Long.valueOf((String) row.get(0)));
            key.setname((String) row.get(1));
            keys.add(key);
        }

        for (Table2Key key : keys) {
            cache.remove(key);
        }
		
But about 10 minutes after the data removal starts, the Ignite cluster
becomes jammed.
I can see

"[SEVERE][tcp-client-disco-sock-writer-#2%null%][TcpDiscoverySpi] Failed to
send message: null
java.io.IOException: Failed to get acknowledge for message:
TcpDiscoveryClientHeartbeatMessage [super=TcpDiscoveryAbstractMessage"

in the log.
And I cannot connect to the cluster via "ignitevisorcmd".
If the amount of data is relatively small, the above method works fine.




Re: How to remove large amount of data from cache

Posted by Alexey Kuznetsov <ak...@apache.org>.
Hi,

As far as I know, Ignite 1.8 supports "DELETE FROM Table WHERE ...".

On Thu, Jan 12, 2017 at 1:33 PM, ght230 <gh...@163.com> wrote:

> Because while I am removing the old data (I remove data according to its
> date), the Ignite cluster is still receiving and processing new data. I
> worry that the cursor will take too long and affect the processing of new
> data.
>
> This code does not need to run in a transaction.
>
> I use Ignite 1.6.8; does it not support a "DELETE FROM table ..." query?
>
>
>



-- 
Alexey Kuznetsov

Re: How to remove large amount of data from cache

Posted by ght230 <gh...@163.com>.
Because while I am removing the old data (I remove data according to its
date), the Ignite cluster is still receiving and processing new data. I
worry that the cursor will take too long and affect the processing of new
data.

This code does not need to run in a transaction.

I use Ignite 1.6.8; does it not support a "DELETE FROM table ..." query?




Re: How to remove large amount of data from cache

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

1. Why do you need a HashSet here? It looks like you can remove each entry
in place while iterating the cursor, without collecting the keys into a
HashSet first.
2. Is this code running in a transaction?

If you use Ignite 1.8, you can run a "DELETE FROM table ..." query. See
http://apacheignite.gridgain.org/docs/dml#section-delete
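
A minimal sketch of that approach, reusing the Table2Key class and the
placeholder "xxxx" predicate from your post; the cache name "Table2Cache"
is illustrative, adjust it to your configuration:

    import java.util.List;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.cache.query.SqlFieldsQuery;

    public class DeleteOldData {
        static void deleteOld(Ignite ignite) {
            IgniteCache<Table2Key, Object> cache = ignite.cache("Table2Cache");

            // The delete runs on the server nodes; no keys are collected
            // on the client side.
            SqlFieldsQuery delete = new SqlFieldsQuery(
                    "delete from Table2 where data = ?").setArgs("xxxx");

            // A DML statement returns a single row with the update count.
            List<List<?>> res = cache.query(delete).getAll();
            System.out.println("Removed " + res.get(0).get(0) + " entries");
        }
    }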


On Wed, Jan 11, 2017 at 10:02 AM, ght230 <gh...@163.com> wrote:

> I want to remove old data from the cache once a day; the amount of data is
> about 5 million entries.
> My main removal process is as below:
>
>         SqlFieldsQuery sql = new SqlFieldsQuery(
>                 "select id, name from Table2 where data = xxxx");
>
>         QueryCursor<List<?>> cursor = cache.query(sql);
>
>         Set<Table2Key> keys = new HashSet<>();
>
>         for (List<?> row : cursor) {
>             Table2Key key = new Table2Key();
>             key.setid(Long.valueOf((String) row.get(0)));
>             key.setname((String) row.get(1));
>             keys.add(key);
>         }
>
>         for (Table2Key key : keys) {
>             cache.remove(key);
>         }
>
> But about 10 minutes after the data removal starts, the Ignite cluster
> becomes jammed.
> I can see
>
> "[SEVERE][tcp-client-disco-sock-writer-#2%null%][TcpDiscoverySpi] Failed to
> send message: null
> java.io.IOException: Failed to get acknowledge for message:
> TcpDiscoveryClientHeartbeatMessage [super=TcpDiscoveryAbstractMessage"
>
> in the log.
> And I cannot connect to the cluster via "ignitevisorcmd".
> If the amount of data is relatively small, the above method works fine.
>
>
>



-- 
Best regards,
Andrey V. Mashenkov
Cell: +7-921-932-61-82

Re: How to remove large amount of data from cache

Posted by vkulichenko <va...@gmail.com>.
Can you try upgrading to 1.8 and checking how the 'delete from ...' query
performs in your case? From what I hear, this is the most appropriate
solution.

-Val




Re: How to remove large amount of data from cache

Posted by ght230 <gh...@163.com>.
I have considered expiration, but my removal logic also includes other
conditions besides the date.

BTW, when using off-heap memory with an ExpiryPolicy, there seem to be bugs
that cause excessive memory usage:
https://issues.apache.org/jira/browse/IGNITE-3840 and
https://issues.apache.org/jira/browse/IGNITE-3948.
I do not know whether these bugs have been fixed; this matters to me because
I use only off-heap memory.

I have tried to use removeAll, but it seems to copy the data to be removed
from off-heap to heap when removeAll starts, which uses too much memory.




Re: How to remove large amount of data from cache

Posted by vkulichenko <va...@gmail.com>.
Yes, by batching I mean removeAll or IgniteDataStreamer. However, I would
still consider using expiration; I think it can really help you.

-Val




Re: How to remove large amount of data from cache

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

It looks like you are trying to remove all entries from a single node: first
you collect the SQL query results from all nodes, and then you perform the
delete operations one by one.

Also, you can try to run a job on the grid that removes the local entries on
each node. SqlFieldsQuery.setLocal(true) can be used to make the query run
locally. This should reduce network traffic.
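
A rough sketch of that, reusing the query and the Table2Key class from the
original post and assuming a cache named "Table2Cache" (a placeholder):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.cache.query.SqlFieldsQuery;
    import org.apache.ignite.lang.IgniteRunnable;
    import org.apache.ignite.resources.IgniteInstanceResource;

    class LocalRemoveJob implements IgniteRunnable {
        @IgniteInstanceResource
        private transient Ignite ignite;

        @Override public void run() {
            IgniteCache<Table2Key, Object> cache = ignite.cache("Table2Cache");

            // Query only the entries stored on this node.
            SqlFieldsQuery qry = new SqlFieldsQuery(
                    "select id, name from Table2 where data = ?").setArgs("xxxx");
            qry.setLocal(true);

            Set<Table2Key> keys = new HashSet<>();

            for (List<?> row : cache.query(qry)) {
                Table2Key key = new Table2Key();
                key.setid(Long.valueOf((String) row.get(0)));
                key.setname((String) row.get(1));
                keys.add(key);
            }

            // One batched removal per node instead of millions of
            // single synchronous removes.
            cache.removeAll(keys);
        }
    }

Then broadcast the job to the nodes that hold data for the cache:

    ignite.compute(ignite.cluster().forDataNodes("Table2Cache"))
            .broadcast(new LocalRemoveJob());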

The best way is to use an expiry policy, as Valentin already mentioned
above. Why does that approach not work for you?

On Thu, Jan 12, 2017 at 9:54 AM, ght230 <gh...@163.com> wrote:

> Yes, it is very slow.
>
> Does "use batching" refer to using removeAll instead of remove?
>
> The memoryMode of my cache is OFFHEAP_TIERED; I found that if I use
> removeAll, it occupies twice the memory.
>
>
>



-- 
Best regards,
Andrey V. Mashenkov
Cell: +7-921-932-61-82

Re: How to remove large amount of data from cache

Posted by ght230 <gh...@163.com>.
Yes, it is very slow.

Does "use batching" refer to using removeAll instead of remove?

The memoryMode of my cache is OFFHEAP_TIERED; I found that if I use
removeAll, it occupies twice the memory.




Re: How to remove large amount of data from cache

Posted by vkulichenko <va...@gmail.com>.
You're removing entries one by one, and each remove is a synchronous
distributed operation, so this is going to be very slow. You should use
batching or even IgniteDataStreamer for this. BTW, maybe you can utilize
automatic expiration [1] for this?
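
A rough sketch of both batched variants, with the same assumptions about the
cache name ("Table2Cache") and the Table2Key class as in the original post:

    import java.util.HashSet;
    import java.util.Set;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.IgniteDataStreamer;

    public class BatchedRemoval {
        // Variant 1: remove in fixed-size batches rather than one by one.
        static void removeBatched(IgniteCache<Table2Key, Object> cache,
            Iterable<Table2Key> keys) {
            Set<Table2Key> batch = new HashSet<>();

            for (Table2Key key : keys) {
                batch.add(key);

                if (batch.size() == 1_000) {
                    cache.removeAll(batch); // One distributed call per 1000 keys.
                    batch.clear();
                }
            }

            if (!batch.isEmpty())
                cache.removeAll(batch);
        }

        // Variant 2: stream the removals; allowOverwrite(true) is required
        // for removes.
        static void removeStreamed(Ignite ignite, Iterable<Table2Key> keys) {
            try (IgniteDataStreamer<Table2Key, Object> streamer =
                ignite.dataStreamer("Table2Cache")) {
                streamer.allowOverwrite(true);

                for (Table2Key key : keys)
                    streamer.removeData(key);
            }
        }
    }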

As for the jammed cluster, this is weird. The first thing to check is memory
consumption: are you running out of heap?

[1] https://apacheignite.readme.io/docs/expiry-policies
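
And a minimal expiry configuration in the spirit of [1], assuming entries
may simply age out one day after creation (the cache name is a placeholder):

    import java.util.concurrent.TimeUnit;
    import javax.cache.expiry.CreatedExpiryPolicy;
    import javax.cache.expiry.Duration;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class ExpiryConfig {
        static CacheConfiguration<Table2Key, Object> table2Cache() {
            CacheConfiguration<Table2Key, Object> cfg =
                new CacheConfiguration<>("Table2Cache");

            // Entries expire 24 hours after creation and are evicted in
            // the background (eager TTL is on by default).
            cfg.setExpiryPolicyFactory(
                CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.HOURS, 24)));

            return cfg;
        }
    }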

-Val


