Posted to issues@hbase.apache.org by "Lars George (JIRA)" <ji...@apache.org> on 2011/01/18 21:36:43 UTC

[jira] Created: (HBASE-3452) Report errors better when async flushCommits() fail

Report errors better when async flushCommits() fail
---------------------------------------------------

                 Key: HBASE-3452
                 URL: https://issues.apache.org/jira/browse/HBASE-3452
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.89.20100924
            Reporter: Lars George
            Priority: Minor
             Fix For: 0.90.1


We had an issue where a MapReduce job would fail with the following error:

{code}
org.apache.hadoop.hbase.client.RetriesExhaustedException: Still had 3913 puts left after retrying 10 times.
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfPuts(HConnectionManager.java:1526)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:664)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:549)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:535)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:104)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:65)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:512)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at com.lp.sessionized.mapred.HbaseIndexerReducer.reduce(HbaseIndexerReducer.java:91)
at com.lp.sessionized.mapred.HbaseIndexerReducer.reduce(HbaseIndexerReducer.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:570)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
{code}

which sent us on a wild goose chase to figure out why we could read from a table but not write back into it. We finally checked the server logs and found this:

{code}
2011-01-18 13:47:56,479 WARN org.apache.hadoop.hbase.regionserver.HRegion: No such column family in batch put
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family dim does not exist in region sessions6,,1295358051638.272a4ba588438119f9f866f491a4428c. in table {NAME => 'foo', FAMILIES => [{NAME => 'dims', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, ...
at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:2931)
at org.apache.hadoop.hbase.regionserver.HRegion.checkFamilies(HRegion.java:1683)
at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1356)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1321)
at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1814)
at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2479)
{code}

So we had a typo in the column family name, and that was reported server side. That error never made it into the task logs, but it should have, to make this much easier to track down.
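The improvement being asked for is roughly this: instead of throwing a bare "retries exhausted" error, the client could remember the exceptions it saw on each failed attempt and attach them to the final exception it surfaces. A minimal, HBase-independent sketch of that idea (the class and method names here are hypothetical illustrations, not the actual HBase client API):

```java
import java.util.ArrayList;
import java.util.List;

public class RetryingBatchWriter {

    // Thrown when all retries are used up; carries the per-attempt causes
    // so the client-side log shows *why* the puts kept failing.
    public static class RetriesExhaustedWithCausesException extends Exception {
        public RetriesExhaustedWithCausesException(int remaining, int attempts,
                                                  List<Throwable> causes) {
            super("Still had " + remaining + " puts left after retrying "
                    + attempts + " times; server-side causes: " + causes);
        }
    }

    // Stand-in for a buffered batch of puts that can be flushed.
    public interface Batch {
        void flush() throws Exception;
    }

    // Retries flush() up to maxAttempts, collecting each failure's cause
    // instead of discarding it.
    public static void flushWithRetries(Batch batch, int maxAttempts, int remainingPuts)
            throws RetriesExhaustedWithCausesException {
        List<Throwable> causes = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                batch.flush();
                return; // success, nothing left to report
            } catch (Exception e) {
                causes.add(e); // keep the server-side reason for the final report
            }
        }
        throw new RetriesExhaustedWithCausesException(remainingPuts, maxAttempts, causes);
    }

    public static void main(String[] args) {
        // Simulate a server that rejects every attempt with the real cause.
        Batch alwaysFails = () -> {
            throw new IllegalArgumentException("Column family dim does not exist");
        };
        try {
            flushWithRetries(alwaysFails, 3, 3913);
        } catch (RetriesExhaustedWithCausesException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, the task log above would have shown the NoSuchColumnFamilyException text directly in the RetriesExhaustedException message, rather than requiring a trawl through region server logs.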

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3452) Report errors better when async flushCommits() fail

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983383#action_12983383 ] 

Jean-Daniel Cryans commented on HBASE-3452:
-------------------------------------------

I believe this is HBASE-2330?



[jira] Updated: (HBASE-3452) Report errors better when async flushCommits() fail

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3452:
-------------------------

    Fix Version/s:     (was: 0.90.1)



[jira] Resolved: (HBASE-3452) Report errors better when async flushCommits() fail

Posted by "Lars George (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George resolved HBASE-3452.
--------------------------------

    Resolution: Duplicate

Dupe of HBASE-2330
