Posted to dev@hbase.apache.org by "Lars George (JIRA)" <ji...@apache.org> on 2009/09/22 10:32:16 UTC
[jira] Created: (HBASE-1856) HBASE-1765 broke MapReduce when using
Result.list()
HBASE-1765 broke MapReduce when using Result.list()
---------------------------------------------------
Key: HBASE-1856
URL: https://issues.apache.org/jira/browse/HBASE-1856
Project: Hadoop HBase
Issue Type: Bug
Affects Versions: 0.20.0
Reporter: Lars George
Priority: Critical
Fix For: 0.20.1
Not sure if it is just me, but running MapReduce over HBase with a TableReducer is not working. After the first row is read, all subsequent rows get the same Results as that very first row. Tracing this from the Map phase, I found the culprit in Result and the delayed field parsing change from HBASE-1765.
This is the code I use in the reduce():
{code}
@Override
protected void reduce(ImmutableBytesWritable key, Iterable<Result> values,
    Context context) throws IOException, InterruptedException {
  String skey = Bytes.toString(key.get());
  context.getCounter(CountersTotals.ROWS).increment(1);
  for (Result result : values) {
    for (KeyValue kv : result.list()) {
      try {
        if (LOG.isDebugEnabled()) LOG.debug("reduce: key -> " + skey + ", kv -> " + kv);
        ...
{code}
Here is the current list() implementation:
{code}
public List<KeyValue> list() {
  if(this.kvs == null) {
    readFields();
  }
  return isEmpty()? null: Arrays.asList(sorted());
}
{code}
The problem is that readFields(DataInput) does not clear kvs!
{code}
public void readFields(final DataInput in)
throws IOException {
  familyMap = null;
  row = null;
  int totalBuffer = in.readInt();
  if(totalBuffer == 0) {
    bytes = null;
    return;
  }
  byte [] raw = new byte[totalBuffer];
  in.readFully(raw, 0, totalBuffer);
  bytes = new ImmutableBytesWritable(raw, 0, totalBuffer);
}
{code}
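The above is called by the MR framework's WritableSerialization for each map output, and the framework reuses the same Result instance from one record to the next. But since "kvs" is already set from the first record, "list()" returns the old data!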
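The failure mode can be reproduced outside HBase with a minimal sketch. LazyRecord below is a hypothetical stand-in for Result, not the HBase class: it stores the raw bytes handed to readFields(DataInput), parses them lazily into a cached list, and, like the code above, never resets that cache. The comma-separated payload and all helper names are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Result: raw bytes plus a lazily parsed cache.
class LazyRecord {
    private byte[] bytes;      // raw serialized form, replaced on every read
    private List<String> kvs;  // parsed cache, filled on the first list() call

    // Mirrors the buggy readFields(DataInput): the parsed cache is NOT cleared.
    void readFields(DataInput in) throws IOException {
        int len = in.readInt();
        if (len == 0) {
            bytes = null;
            return;
        }
        byte[] raw = new byte[len];
        in.readFully(raw, 0, len);
        bytes = raw;
    }

    List<String> list() {
        if (bytes == null) {
            return Collections.emptyList();
        }
        if (kvs == null) {
            // Lazy parse: only runs while the cache is empty.
            kvs = Arrays.asList(new String(bytes, StandardCharsets.UTF_8).split(","));
        }
        return kvs;
    }
}

class StaleCacheDemo {
    // Frames a payload as a length-prefixed record, as readFields expects.
    static DataInput frame(String payload) {
        try {
            byte[] body = payload.getBytes(StandardCharsets.UTF_8);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(body.length);
            out.write(body);
            return new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads two different records into ONE reused instance.
    static List<List<String>> readTwoWithReuse() {
        try {
            LazyRecord reused = new LazyRecord();
            reused.readFields(frame("a,b"));
            List<String> first = reused.list();
            reused.readFields(frame("c,d"));
            List<String> second = reused.list();
            return Arrays.asList(first, second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTwoWithReuse()); // prints [[a, b], [a, b]]
    }
}
```

The second record's bytes are stored correctly, but list() never looks at them because the cache from the first record is still in place.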
I assume the only change needed is clearing kvs as well:
{code}
public void readFields(final DataInput in)
throws IOException {
  familyMap = null;
  row = null;
  kvs = null;
  ....
{code}
I'll test that now and report.
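One way to check the proposed change is a small round-trip test that reads two different records into a single reused object. The sketch below is not the HBase patch: ReusableRecord is a hypothetical, simplified stand-in for Result with the cache reset applied, and the comma-separated payload and helper names are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Result with the proposed reset applied.
class ReusableRecord {
    private byte[] bytes;      // raw serialized form
    private List<String> kvs;  // lazily parsed cache

    void readFields(DataInput in) throws IOException {
        kvs = null; // drop the parsed cache so list() re-parses the new bytes
        int len = in.readInt();
        if (len == 0) {
            bytes = null;
            return;
        }
        byte[] raw = new byte[len];
        in.readFully(raw, 0, len);
        bytes = raw;
    }

    List<String> list() {
        if (bytes == null) {
            return Collections.emptyList();
        }
        if (kvs == null) {
            kvs = Arrays.asList(new String(bytes, StandardCharsets.UTF_8).split(","));
        }
        return kvs;
    }
}

class ResetCacheCheck {
    // Frames a payload as a length-prefixed record, as readFields expects.
    static DataInput frame(String payload) {
        try {
            byte[] body = payload.getBytes(StandardCharsets.UTF_8);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(body.length);
            out.write(body);
            return new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads two different records into ONE reused instance.
    static List<List<String>> readTwoWithReuse() {
        try {
            ReusableRecord reused = new ReusableRecord();
            reused.readFields(frame("a,b"));
            List<String> first = reused.list();
            reused.readFields(frame("c,d"));
            List<String> second = reused.list();
            return Arrays.asList(first, second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTwoWithReuse()); // prints [[a, b], [c, d]]
    }
}
```

With the reset in place, every call to readFields invalidates whatever was parsed from the previous record, so reusing the instance is safe.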
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-1856) HBASE-1765 broke MapReduce when using
Result.list()
Posted by "Lars George (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars George updated HBASE-1856:
-------------------------------
Affects Version/s: 0.20.1, 0.21.0 (was: 0.20.0)
[jira] Updated: (HBASE-1856) HBASE-1765 broke MapReduce when using
Result.list()
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1856:
-------------------------
Attachment: 1856.patch
[jira] Commented: (HBASE-1856) HBASE-1765 broke MapReduce when
using Result.list()
Posted by "Lars George (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758293#action_12758293 ]
Lars George commented on HBASE-1856:
------------------------------------
+1, with a minor comment on using "this.": familyMap and row are set without it, so I would do the same for kvs.
[jira] Assigned: (HBASE-1856) HBASE-1765 broke MapReduce when using
Result.list()
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack reassigned HBASE-1856:
----------------------------
Assignee: Lars George
Assigning to Lars (he found it -- I can't find my 'fix' in HBASE-1815 and it's not in any local repo here...).
[jira] Commented: (HBASE-1856) HBASE-1765 broke MapReduce when
using Result.list()
Posted by "Lars George (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758268#action_12758268 ]
Lars George commented on HBASE-1856:
------------------------------------
Hrmm, can't see it in HBASE-1815 though. Did you mean a different issue?
[jira] Resolved: (HBASE-1856) HBASE-1765 broke MapReduce when using
Result.list()
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-1856.
--------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Committed branch and trunk (with Lars's suggestion).
[jira] Commented: (HBASE-1856) HBASE-1765 broke MapReduce when
using Result.list()
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758254#action_12758254 ]
stack commented on HBASE-1856:
------------------------------
Good man, Lars. I found this too and have the fix you describe above as part of HBASE-1815, IIRC.