Posted to dev@hbase.apache.org by "Lars George (JIRA)" <ji...@apache.org> on 2009/09/22 10:32:16 UTC

[jira] Created: (HBASE-1856) HBASE-1765 broke MapReduce when using Result.list()

HBASE-1765 broke MapReduce when using Result.list()
---------------------------------------------------

                 Key: HBASE-1856
                 URL: https://issues.apache.org/jira/browse/HBASE-1856
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Lars George
            Priority: Critical
             Fix For: 0.20.1


Not sure if it is just me, but using MR over HBase employing a TableReducer is not working. After the first row is read, all subsequent rows get the same Results as that very first row. After tracing this from the Map phase, I found the culprit in Result and the HBASE-1765 delayed field parsing change.

This is the code I use in the reduce():

{code}
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Result> values,
        Context context) throws IOException, InterruptedException {
      String skey = Bytes.toString(key.get());
      context.getCounter(CountersTotals.ROWS).increment(1);
      for (Result result : values) {
        for (KeyValue kv: result.list()) {
          try {
            if (LOG.isDebugEnabled()) LOG.debug("reduce: key -> " + skey + ", kv -> " + kv);
            ...
{code}

Here is the current list() implementation:

{code}
  public List<KeyValue> list() {
    if(this.kvs == null) {
      readFields();
    }
    return isEmpty()? null: Arrays.asList(sorted());
  }
{code}

The problem is that readFields(DataInput) does not clear kvs!

{code}
  public void readFields(final DataInput in)
  throws IOException {
    familyMap = null;
    row = null;
    int totalBuffer = in.readInt();
    if(totalBuffer == 0) {
      bytes = null;
      return;
    }
    byte [] raw = new byte[totalBuffer];
    in.readFully(raw, 0, totalBuffer);
    bytes = new ImmutableBytesWritable(raw, 0, totalBuffer);
  }
{code}

The above is called by the MR framework's WritableSerialization for each map output, and the same Result instance is reused between records. Since "kvs" is still set from the previous record, "list()" returns the old data!

I assume the only change needed is clearing kvs as well:

{code}
  public void readFields(final DataInput in)
  throws IOException {
    familyMap = null;
    row = null;
    kvs = null;
    ....
{code}

I'll test that now and report.
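
For anyone who wants to see the effect without a cluster, here is a minimal sketch that reproduces the stale cache outside HBase. It is not the HBase code: StaleCacheDemo, LazyRecord and the clearCacheOnRead flag are hypothetical names made up for illustration, the "key values" are plain strings separated by ';', and list() returns an empty list rather than null. It only assumes what is described above, namely that one object is deserialized into repeatedly and that the parsed view is cached.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StaleCacheDemo {

  /** Hypothetical stand-in for Result: raw bytes plus a lazily parsed cache. */
  static class LazyRecord {
    private final boolean clearCacheOnRead; // true = behaviour with the proposed fix
    private byte[] bytes;                   // raw serialized payload
    private String[] kvs;                   // lazily parsed view of bytes

    LazyRecord(boolean clearCacheOnRead) {
      this.clearCacheOnRead = clearCacheOnRead;
    }

    /** Same shape as readFields(DataInput) above: swap in new bytes, defer parsing. */
    void readFields(DataInput in) throws IOException {
      if (clearCacheOnRead) {
        kvs = null; // the proposed one-line fix: drop the previous record's cache
      }
      int totalBuffer = in.readInt();
      if (totalBuffer == 0) {
        bytes = null;
        return;
      }
      byte[] raw = new byte[totalBuffer];
      in.readFully(raw, 0, totalBuffer);
      bytes = raw;
    }

    /** Same shape as list() above: parse only when nothing is cached yet. */
    List<String> list() {
      if (kvs == null) {
        kvs = (bytes == null)
            ? new String[0]
            : new String(bytes, StandardCharsets.UTF_8).split(";");
      }
      return Arrays.asList(kvs);
    }
  }

  /** Writes a length-prefixed payload, the layout readFields expects. */
  static byte[] serialize(String payload) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    byte[] raw = payload.getBytes(StandardCharsets.UTF_8);
    out.writeInt(raw.length);
    out.write(raw);
    out.flush();
    return buffer.toByteArray();
  }

  /** Deserializes every payload into the SAME instance and records what list() reports. */
  static List<String> readAll(LazyRecord reused, String... payloads) {
    List<String> seen = new ArrayList<>();
    try {
      for (String payload : payloads) {
        DataInput in = new DataInputStream(new ByteArrayInputStream(serialize(payload)));
        reused.readFields(in);
        seen.add(String.join(";", reused.list()));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return seen;
  }

  public static void main(String[] args) {
    // Without the reset, the second record still shows the first record's data.
    System.out.println(readAll(new LazyRecord(false), "a1;a2", "b1;b2")); // [a1;a2, a1;a2]
    // With the reset, each record shows its own data.
    System.out.println(readAll(new LazyRecord(true), "a1;a2", "b1;b2"));  // [a1;a2, b1;b2]
  }
}
```

The flag exists only so that both behaviours can be compared side by side; in Result itself the reset would simply be unconditional, as in the snippet above.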

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-1856) HBASE-1765 broke MapReduce when using Result.list()

Posted by "Lars George (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George updated HBASE-1856:
-------------------------------

    Affects Version/s:     (was: 0.20.0)
                       0.21.0
                       0.20.1


[jira] Updated: (HBASE-1856) HBASE-1765 broke MapReduce when using Result.list()

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1856:
-------------------------

    Attachment: 1856.patch


[jira] Commented: (HBASE-1856) HBASE-1765 broke MapReduce when using Result.list()

Posted by "Lars George (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758293#action_12758293 ] 

Lars George commented on HBASE-1856:
------------------------------------

+1 with a minor comment on using "this.": familyMap and row are used without it, so I would do the same for kvs?


[jira] Assigned: (HBASE-1856) HBASE-1765 broke MapReduce when using Result.list()

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-1856:
----------------------------

    Assignee: Lars George

Assigning Lars (he found it -- I can't find my 'fix' in hbase-1815 and it's not in any local repo here...).


[jira] Commented: (HBASE-1856) HBASE-1765 broke MapReduce when using Result.list()

Posted by "Lars George (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758268#action_12758268 ] 

Lars George commented on HBASE-1856:
------------------------------------

Hrmm, can't see it in HBASE-1815 though. Did you mean a different issue?


[jira] Resolved: (HBASE-1856) HBASE-1765 broke MapReduce when using Result.list()

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1856.
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to branch and trunk (with Lars's suggestion).


[jira] Commented: (HBASE-1856) HBASE-1765 broke MapReduce when using Result.list()

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758254#action_12758254 ] 

stack commented on HBASE-1856:
------------------------------

Good man, Lars. I found this too and have the fix you describe above in as part of hbase-1815, IIRC.
