Posted to dev@hbase.apache.org by "Lars George (JIRA)" <ji...@apache.org> on 2009/11/10 12:20:27 UTC

[jira] Created: (HBASE-1969) HBASE-1626 does not work as advertised due to lack of "instanceof" check in MR framework

HBASE-1626 does not work as advertised due to lack of "instanceof" check in MR framework
----------------------------------------------------------------------------------------

                 Key: HBASE-1969
                 URL: https://issues.apache.org/jira/browse/HBASE-1969
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.1
            Reporter: Lars George


The issue that HBASE-1626 tried to fix is that we should be able to hand either Put or Delete instances to the TableOutputFormat, so the explicit Put reference was changed to Writable in the process. But that does not work as advertised:

{code}09/11/04 13:35:56 INFO mapred.JobClient: Task Id : attempt_200911031030_0004_m_000013_2, Status : FAILED
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Writable, recieved org.apache.hadoop.hbase.client.Put
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:812)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:140)
        at com.worldlingo.hadoop.mapred.RestoreTable$RestoreMapper.map(RestoreTable.java:69)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305){code}

The issue is that the MapReduce framework does not check the type polymorphically using "instanceof" but with a direct class comparison. In MapTask.java you find this code:

{code}
    public synchronized void collect(K key, V value, int partition
                                     ) throws IOException {
      reporter.progress();
      if (key.getClass() != keyClass) {
        throw new IOException("Type mismatch in key from map: expected "
                              + keyClass.getName() + ", recieved "
                              + key.getClass().getName());
      }
      if (value.getClass() != valClass) {
        throw new IOException("Type mismatch in value from map: expected "
                              + valClass.getName() + ", recieved "
                              + value.getClass().getName());
      }
      ... {code}
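For illustration, the difference between the two checks boils down to plain reflection. The following is a self-contained demo, with a made-up `Writable` interface and `Put` class standing in for the real Hadoop/HBase types:

```java
// Demo: a strict class comparison (as in MapTask.collect()) rejects
// subtypes, while an "instanceof"-style check accepts them.
// Writable/Put here are stand-ins, not the real Hadoop/HBase classes.
public class TypeCheckDemo {
    interface Writable {}
    static class Put implements Writable {}

    // Strict check, as MapTask.collect() does today
    static boolean strictMatch(Object value, Class<?> valClass) {
        return value.getClass() == valClass;
    }

    // Polymorphic check, equivalent to "value instanceof valClass"
    static boolean polymorphicMatch(Object value, Class<?> valClass) {
        return valClass.isInstance(value);
    }

    public static void main(String[] args) {
        Put put = new Put();
        // Job configured with Writable as the value class, mapper emits a Put:
        System.out.println(strictMatch(put, Writable.class));      // false -> IOException
        System.out.println(polymorphicMatch(put, Writable.class)); // true  -> would pass
    }
}
```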

So it does not work to set Writable as the MapOutputValueClass for the job and then hand in a Put or Delete! The test case TestMapReduce did not catch this because it contains this call

{code}
      TableMapReduceUtil.initTableMapperJob(
        Bytes.toString(table.getTableName()), scan,
        ProcessContentsMapper.class, ImmutableBytesWritable.class, 
        Put.class, job);{code}

which, inside initTableMapperJob, sets the value class to Put:

{code}if (outputValueClass != null) job.setMapOutputValueClass(outputValueClass);{code}

To fix this (for now) one can set the class to Put in the same way, or explicitly in their own code:

{code}job.setMapOutputValueClass(Put.class);{code}
 
But the whole idea only seems feasible if a) the Hadoop class is amended to use "instanceof" instead (lodge a Hadoop MapReduce JIRA issue?), or b) we have a combined class that represents a Put *and* a Delete - which seems somewhat wrong, but doable. It would only really find use in this context and would require the user to make use of it when calling context.write(). That does not make things easier to learn.

Suggestions?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1969) HBASE-1626 does not work as advertised due to lack of "instanceof" check in MR framework

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780199#action_12780199 ] 

Lars George commented on HBASE-1969:
------------------------------------

I thought about this. Could we create a class Change that is the base for Put and Delete and also implements Writable, which only carries the changes - in other words, KeyValues with only as much additional metadata as is required to batch execute them on a table?

Then we would have to either 

a.1) add a Put.getChange() / Delete.getChange() 

a.2) implement Change as the base class for Put and Delete

a.3) have a Change change = new Change(put) so you do context.write(key, new Change(put));

Then on the table side we

b.1) either recreate the concrete classes from the Change instance and apply them as is done right now

b.2) or add support for Change to the HTable class so it can batch them and, given the KV types, knows what to do.


Alternatively we can add a Change class only to the mapreduce package that takes a Put or a Delete in its constructor, so you again do

context.write(key, new Change(put));

and it transports the payload (a Put or Delete) to the TableOutputFormat, where it is converted back and applied. That seems the least intrusive alternative.
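As a rough sketch of that least intrusive alternative - with minimal stand-ins for Put and Delete and an invented byte format, not the real HBase classes or Writable serialization - the wrapper could look like:

```java
import java.io.*;

// Hedged sketch of a mapreduce-only Change wrapper that carries either a
// Put or a Delete across the shuffle and restores the concrete type on
// the output side. Put/Delete are minimal stand-ins and the wire format
// is invented for this demo.
public class ChangeSketch {
    static class Put { final String row; Put(String row) { this.row = row; } }
    static class Delete { final String row; Delete(String row) { this.row = row; } }

    static class Change {
        private static final byte PUT = 0, DELETE = 1;
        private final byte type;
        private final String row;

        Change(Put put) { this.type = PUT; this.row = put.row; }
        Change(Delete delete) { this.type = DELETE; this.row = delete.row; }

        // In the real class these would be Writable.write()/readFields().
        void write(DataOutput out) throws IOException {
            out.writeByte(type);
            out.writeUTF(row);
        }

        static Object readOp(DataInput in) throws IOException {
            byte type = in.readByte();
            String row = in.readUTF();
            return type == PUT ? (Object) new Put(row) : new Delete(row);
        }
    }

    // Round-trip helper standing in for serialize -> shuffle -> deserialize.
    static Object roundTrip(Change change) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            change.write(new DataOutputStream(bytes));
            return Change.readOp(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Object restored = roundTrip(new Change(new Put("row-1")));
        System.out.println(restored instanceof Put); // true
    }
}
```

The TableOutputFormat would then unwrap the restored Put or Delete and apply it to the table as today.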

What say you?






[jira] Commented: (HBASE-1969) HBASE-1626 does not work as advertised due to lack of "instanceof" check in MR framework

Posted by "Lars George (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782333#action_12782333 ] 

Lars George commented on HBASE-1969:
------------------------------------

I agree, and I would probably call it "Mutation" (following the terminology of the BigTable paper). Get is different and can be kept separate. With that we could kill many birds with one stone: fix the MR issue here, implement batch mutations, and do atomic row mutations comprising Put and Delete operations. BTW, BigTable has that:

{code}
// Open the table
Table *T = OpenOrDie("/bigtable/web/webtable");

// Write a new anchor and delete an old anchor
RowMutation r1(T, "com.cnn.www");
r1.Set("anchor:www.c-span.org", "CNN");
r1.Delete("anchor:www.abc.com");
Operation op;
Apply(&op, &r1);
{code}
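Translated to a hedged Java sketch - all class and method names below are hypothetical stand-ins, not the actual HBase API - a shared Mutation base class could look like:

```java
import java.util.*;

// Sketch of a Mutation base class shared by Put and Delete, so both can
// travel through the same MR value class and be mixed in one batch,
// much like BigTable's RowMutation. Names are invented for the demo.
public class MutationSketch {
    static abstract class Mutation {
        final String row;
        Mutation(String row) { this.row = row; }
        abstract String kind();
    }

    static class Put extends Mutation {
        final Map<String, String> values = new LinkedHashMap<String, String>();
        Put(String row) { super(row); }
        Put add(String column, String value) { values.put(column, value); return this; }
        String kind() { return "PUT"; }
    }

    static class Delete extends Mutation {
        final List<String> columns = new ArrayList<String>();
        Delete(String row) { super(row); }
        Delete deleteColumn(String column) { columns.add(column); return this; }
        String kind() { return "DELETE"; }
    }

    // Stand-in for applying a mixed batch against a table.
    static List<String> apply(List<Mutation> batch) {
        List<String> applied = new ArrayList<String>();
        for (Mutation m : batch) applied.add(m.kind() + ":" + m.row);
        return applied;
    }

    public static void main(String[] args) {
        // Mirrors the BigTable example: write a new anchor, delete an old one.
        List<Mutation> batch = Arrays.<Mutation>asList(
            new Put("com.cnn.www").add("anchor:www.c-span.org", "CNN"),
            new Delete("com.cnn.www").deleteColumn("anchor:www.abc.com"));
        System.out.println(apply(batch)); // [PUT:com.cnn.www, DELETE:com.cnn.www]
    }
}
```

Since Get would not extend Mutation, it stays out of such batches by construction.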






[jira] Commented: (HBASE-1969) HBASE-1626 does not work as advertised due to lack of "instanceof" check in MR framework

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782203#action_12782203 ] 

stack commented on HBASE-1969:
------------------------------

Lars, on the Change class (or Edit or Update, whatever we call it): yes, I think we need something like this, not only here but also for a combined multi-get/put/delete. Ryan has also talked about being able (again) to do Deletes and Puts on the same row as part of the same operation. We'd need something like this for that too.

In old designs for hbase-880 (? is this the new API issue?), Delete and Put had the same base class. Get had different ancestry.

We don't need to do Gets in this context... and maybe in the batch multi case the multi-Get could remain a special case; i.e. you would be able to mix Deletes and Puts in a batch, but not Gets?



