Posted to commits@cassandra.apache.org by "Muhammad Adel (JIRA)" <ji...@apache.org> on 2014/05/11 00:05:00 UTC

[jira] [Comment Edited] (CASSANDRA-6998) HintedHandoff - expired hints may block future hints deliveries

    [ https://issues.apache.org/jira/browse/CASSANDRA-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994200#comment-13994200 ] 

Muhammad Adel edited comment on CASSANDRA-6998 at 5/10/14 10:20 AM:
--------------------------------------------------------------------

Sorry if I didn't make myself clear enough. Let me explain what this patch does:

1. Normally, the page size passed to SliceQueryFilter represents the number of live columns returned. The SliceQueryFilter returns a mix of live columns and tombstones that are not yet eligible for garbage collection. If the number of tombstones seen while constructing the query result exceeds the threshold, an exception is thrown.

2. When retrieving hinted handoff data, the patch uses the SliceQueryFilter differently. The page size represents the total number of columns returned: live columns plus tombstones. Even tombstones that are eligible for garbage collection are returned, and the tombstone threshold check is removed. No memory overload can occur, since the amount of data processed and returned per page is bounded and predictable (exactly the page size).

3. This sends all data in the hinted handoff table to the recovering node, allowing for higher data consistency even if the node was down for longer than the gc_grace period. On the other hand, it can cause higher network load.

This is a suggested way to deal with the consistency of deleted hinted handoff data. It is not merely a fix for the exception, because I think the exception is a symptom of the more delicate issue of deleted hinted handoff data.
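The two counting modes in points 1 and 2 can be sketched roughly as follows. This is an illustrative toy model, not Cassandra's actual internals; the class and method names (PagedScan, countLivePage, countAllPage) are invented for the example:

```java
import java.util.List;

// Toy sketch of the two paging modes discussed above. Names are
// illustrative only and do not match Cassandra's real SliceQueryFilter.
public class PagedScan {

    // A cell is either live or a tombstone.
    record Cell(String name, boolean live) {}

    static class TombstoneOverwhelmingException extends RuntimeException {}

    // Mode 1 (normal read path): pageSize counts live cells only, and the
    // scan aborts if too many tombstones are seen while filling the page.
    // Returns how many cells were examined to fill the page.
    static int countLivePage(List<Cell> cells, int pageSize, int tombstoneThreshold) {
        int live = 0, tombstones = 0, scanned = 0;
        for (Cell c : cells) {
            if (live == pageSize)
                break;
            scanned++;
            if (c.live())
                live++;
            else if (++tombstones > tombstoneThreshold)
                throw new TombstoneOverwhelmingException();
        }
        return scanned;
    }

    // Mode 2 (the patch's hint-delivery path): pageSize counts every cell,
    // live or tombstone, and no tombstone threshold is enforced. The work
    // per page is bounded by pageSize regardless of tombstone density.
    static int countAllPage(List<Cell> cells, int pageSize) {
        return Math.min(cells.size(), pageSize);
    }
}
```

The point of mode 2 is visible in countAllPage: since tombstones count toward the page size, a page over a tombstone-heavy hints table still does a fixed, predictable amount of work instead of scanning arbitrarily far looking for live cells.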



> HintedHandoff - expired hints may block future hints deliveries
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-6998
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6998
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: - cluster of two DCs: DC1, DC2
> - keyspace using NetworkTopologyStrategy (replication factors for both DCs)
> - heavy load (write:read, 100:1) with LOCAL_QUORUM using Java driver setup with DC awareness, writing to DC1
>            Reporter: Scooletz
>              Labels: HintedHandoff, TTL
>             Fix For: 2.0.3
>
>         Attachments: 6998
>
>
> For test purposes, DC2 was shut down for 1 day. The _hints_ table was filled with millions of rows. Now, when _HintedHandOffManager_ tries to _doDeliverHintsToEndpoint_, it queries the store with QueryFilter.getSliceFilter, which counts deleted (TTLed) cells and throws org.apache.cassandra.db.filter.TombstoneOverwhelmingException. 
> Throwing this exception stops the manager from running compaction, as compaction is run only after a successful handoff. This leaves HH practically disabled until an administrator runs truncateAllHints. 
> Wouldn't it be nicer to run compaction on org.apache.cassandra.db.filter.TombstoneOverwhelmingException? That would remove TTLed hints, leaving the whole HH mechanism in a healthy state.
> The stacktrace is:
> {quote}
> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
> 	at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201)
> 	at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
> 	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
> 	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
> 	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
> 	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
> 	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1487)
> 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1306)
> 	at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351)
> 	at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309)
> 	at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92)
> 	at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)