Posted to issues@hbase.apache.org by "Feng Honghua (JIRA)" <ji...@apache.org> on 2012/12/05 07:37:00 UTC

[jira] [Created] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

Feng Honghua created HBASE-7280:
-----------------------------------

             Summary: TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication
                 Key: HBASE-7280
                 URL: https://issues.apache.org/jira/browse/HBASE-7280
             Project: HBase
          Issue Type: Bug
          Components: Replication
    Affects Versions: 0.94.2
            Reporter: Feng Honghua
             Fix For: 0.94.4


In cluster replication, suppose the master cluster has 2 tables whose column families are declared with replication scope = 1, and we add a peer cluster that has only 1 table with the same name as in the master cluster. The ReplicationSource (a thread in the master cluster) for this peer ships edits (logs) for both tables to the peer. The peer fails to apply the edits because of TableNotFoundException, and this exception is also returned to the original shipper (the ReplicationSource in the master cluster). The shipper then falls into an endless retry of the failed edits, without proceeding to read the remaining (newer) log files or to ship the following edits (which may be the normal, expected edits for the registered table). The symptom is that the TableNotFoundException causes an endless retry and blocks normal table replication.
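
The blocking behavior described above can be reproduced in a small standalone model. This is only a sketch: `BlockingShipperDemo` and `ship` are hypothetical names, each batch is reduced to the name of the table it belongs to, and the retry count is capped so the example terminates (the real shipper retries without a cap).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the reported behavior; not HBase code. */
final class BlockingShipperDemo {
    /**
     * Ships batches strictly in order. A batch whose table is missing on the
     * peer is retried without advancing, so later batches are never reached.
     * Returns the indexes of the batches shipped within maxAttempts.
     */
    static List<Integer> ship(List<String> batchTables, Set<String> peerTables, int maxAttempts) {
        List<Integer> shipped = new ArrayList<>();
        int next = 0;
        for (int attempt = 0; attempt < maxAttempts && next < batchTables.size(); attempt++) {
            if (peerTables.contains(batchTables.get(next))) {
                shipped.add(next);
                next++;
            }
            // Otherwise the peer rejected the batch (the TableNotFoundException case)
            // and the same batch is tried again on the next iteration.
        }
        return shipped;
    }
}
```

With batches for table1, table2, table1 and a peer that only has table1, only the first batch is ever shipped, no matter how large the attempt cap is.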

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

Posted by "Jieshan Bean (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510993#comment-13510993 ] 

Jieshan Bean commented on HBASE-7280:
-------------------------------------

Yes, this is the expected behavior. In the current implementation, the backup cluster must create the tables itself.
                
> TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7280
>                 URL: https://issues.apache.org/jira/browse/HBASE-7280
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.94.2
>            Reporter: Feng Honghua
>             Fix For: 0.94.4
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> In cluster replication, suppose the master cluster has 2 tables whose column families are declared with replication scope = 1, and we add a peer cluster that has only 1 table with the same name as in the master cluster. The ReplicationSource (a thread in the master cluster) for this peer ships edits (logs) for both tables to the peer. The peer fails to apply the edits because of TableNotFoundException, and this exception is also returned to the original shipper (the ReplicationSource in the master cluster). The shipper then falls into an endless retry of the failed edits, without proceeding to read the remaining (newer) log files or to ship the following edits (which may be the normal, expected edits for the registered table). The symptom is that the TableNotFoundException causes an endless retry and blocks normal table replication.


[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

Posted by "Jieshan Bean (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511053#comment-13511053 ] 

Jieshan Bean commented on HBASE-7280:
-------------------------------------

I agree with your suggestion of adding a configuration list for each peer. So we need to maintain this list in ZooKeeper for each peer, e.g.
  peer-1 -> table1[fam1, fam2], table2[fam1]
  peer-2 -> table1[fam1]
So the related properties in the table become useless, right? I hope I understand you correctly.
But this will make things more difficult.
 
The change in ReplicationSink seems simple, but the master cluster will still send some unnecessary edits to peers.
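
As a rough illustration of the list format above, a per-peer value such as "table1[fam1, fam2], table2[fam1]" could be parsed as follows. This is a sketch only: `PeerTableCfsParser` is a hypothetical name and not an HBase API, how the string would be stored in ZooKeeper is not covered, and treating a table without brackets as "no family restriction" is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser for "table1[fam1, fam2], table2[fam1]"; not part of HBase. */
final class PeerTableCfsParser {
    static Map<String, List<String>> parse(String spec) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        int depth = 0;   // bracket nesting, so commas inside [...] do not split entries
        int start = 0;
        for (int i = 0; i <= spec.length(); i++) {
            // Treat end of input as a final top-level comma.
            char c = i < spec.length() ? spec.charAt(i) : ',';
            if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
            } else if (c == ',' && depth == 0) {
                addEntry(spec.substring(start, i).trim(), result);
                start = i + 1;
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unbalanced brackets: " + spec);
        }
        return result;
    }

    private static void addEntry(String entry, Map<String, List<String>> out) {
        if (entry.isEmpty()) {
            return;
        }
        List<String> families = new ArrayList<>();
        int open = entry.indexOf('[');
        if (open < 0) {
            // Assumption: no bracket list means no family restriction (empty list).
            out.put(entry, families);
            return;
        }
        int close = entry.lastIndexOf(']');
        if (close < open) {
            throw new IllegalArgumentException("Unbalanced brackets: " + entry);
        }
        for (String family : entry.substring(open + 1, close).split(",")) {
            if (!family.trim().isEmpty()) {
                families.add(family.trim());
            }
        }
        out.put(entry.substring(0, open).trim(), families);
    }
}
```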

                

[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

Posted by "Feng Honghua (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511076#comment-13511076 ] 

Feng Honghua commented on HBASE-7280:
-------------------------------------

Yes, that's what I hope for with finer-grained cluster replication. With such a design, by default (without any table/CF configuration) a peer receives all the edits from the master cluster. In a real-world scenario we may have a master cluster and a backup cluster that needs to replicate a whole copy of the master cluster and so receives all edits, while at the same time there may be some experimental/downstream clusters that need only a certain table, or even only some CFs of a table, from the master cluster. By making the table/CF list configurable per peer we can enable such scenarios.

ReplicationSource needs to parse the peer's table/CF configuration on creation, and filter the edits while reading the HLog files to determine which edits need to be shipped to the corresponding peer. It looks like no further change is needed on the peer side (ReplicationSink), right?

Yes, my current change in ReplicationSink doesn't avoid sending the unnecessary edits to peers, but it's enough to unblock us. A wiser treatment would be in ReplicationSource, where we can filter out unnecessary edits before shipping them to the peer cluster by checking, for each edit, whether the table exists at the peer cluster.
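
A minimal sketch of the source-side filtering idea, under stated assumptions: `LogEdit` and `PeerEditFilter` are hypothetical stand-ins rather than HBase classes, an edit is reduced to a table name and a family name, a null configuration models a peer without any table/CF configuration (it receives everything, as described above), and an empty family set is assumed to mean "all families of that table".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for one log entry; not HBase's real HLog entry type. */
final class LogEdit {
    final String table;
    final String family;

    LogEdit(String table, String family) {
        this.table = table;
        this.family = family;
    }
}

/** Hypothetical per-peer filter applied on the source side before shipping. */
final class PeerEditFilter {
    // null means the peer has no table/CF configuration and receives all edits.
    private final Map<String, Set<String>> tableCfs;

    PeerEditFilter(Map<String, Set<String>> tableCfs) {
        this.tableCfs = tableCfs;
    }

    boolean shouldShip(LogEdit edit) {
        if (tableCfs == null) {
            return true;
        }
        Set<String> families = tableCfs.get(edit.table);
        if (families == null) {
            return false; // table is not configured for this peer
        }
        // Assumption: an empty set means every family of the table is replicated.
        return families.isEmpty() || families.contains(edit.family);
    }

    /** Returns only the edits that this peer is configured to receive. */
    List<LogEdit> filter(List<LogEdit> edits) {
        List<LogEdit> kept = new ArrayList<>();
        for (LogEdit edit : edits) {
            if (shouldShip(edit)) {
                kept.add(edit);
            }
        }
        return kept;
    }
}
```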
                

[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

Posted by "Feng Honghua (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511021#comment-13511021 ] 

Feng Honghua commented on HBASE-7280:
-------------------------------------

I can understand the intent of the current design. A master cluster may have multiple tables with REPLICATION_SCOPE=1, but not all peer clusters want to replicate all of these tables, and the current design prevents replicating only selected table(s). In our scenario, I expect the peer cluster (sink) to omit the edits whose table doesn't exist in the peer cluster and to apply only the edits whose table(s) exist in the peer cluster (the ones we really want to replicate). I made a minor change in ReplicationSink.java that just omits edits for non-existing table(s) in the peer cluster, and the behavior is what we want. Though this change doesn't reduce the needless network bandwidth, at least it doesn't block normal replication.
The current replication's per-cluster granularity seems a bit coarse for many real-world scenarios. In my opinion, adding a table or column-family list configuration for a peer when adding the peer is more reasonable.
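
The sink-side workaround can be modeled roughly as below. This is not the actual change to ReplicationSink.java, which is not shown in this thread; `SinkEdit` and `TolerantSink` are hypothetical names, and the set of existing tables is passed in directly instead of being looked up on a real cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a replicated edit; not an HBase class. */
final class SinkEdit {
    final String table;
    final String payload;

    SinkEdit(String table, String payload) {
        this.table = table;
        this.payload = payload;
    }
}

/** Hypothetical sink that drops edits for unknown tables instead of failing the batch. */
final class TolerantSink {
    private final Set<String> existingTables;
    private final List<SinkEdit> applied = new ArrayList<>();
    private int skipped = 0;

    TolerantSink(Set<String> existingTables) {
        this.existingTables = existingTables;
    }

    /** Applies edits whose table exists locally and counts the rest as skipped. */
    void replicate(List<SinkEdit> batch) {
        for (SinkEdit edit : batch) {
            if (existingTables.contains(edit.table)) {
                applied.add(edit);
            } else {
                skipped++; // omitted rather than raising an error back to the source
            }
        }
    }

    List<SinkEdit> applied() {
        return applied;
    }

    int skipped() {
        return skipped;
    }
}
```

Because the batch never fails, the source can move on to newer log files, but every skipped edit has still crossed the network.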
                

[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510662#comment-13510662 ] 

Jean-Daniel Cryans commented on HBASE-7280:
-------------------------------------------

This is "by design": if a source cannot replicate one edit, then replication is blocked. Apart from better alerting, what do you think HBase should do?
                