Posted to issues@hbase.apache.org by "Jai Kumar Singh (Created) (JIRA)" <ji...@apache.org> on 2012/01/10 11:24:38 UTC
[jira] [Created] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
----------------------------------------------------------------------
Key: HBASE-5166
URL: https://issues.apache.org/jira/browse/HBASE-5166
Project: HBase
Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
HBase currently has no MultithreadedTableMapper comparable to the MultithreadedMapper that Hadoop provides for I/O-bound jobs.
Use case, a web crawler: take input (URLs) from an HBase table and put the fetched content (URL, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow, because the job is network-I/O bound and does not fully utilize the CPU.
I would also like to know whether it is a good or bad idea to use HBase for this kind of use case.
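The motivation can be illustrated without HBase. The following is a minimal, stdlib-only Java sketch, not HBase or Hadoop code: `fetch` is a hypothetical stand-in for a network call that simulates latency with `Thread.sleep`. When each record spends most of its time blocked on the network, handing several records to a thread pool lets those waits overlap instead of running back to back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IoBoundFetchSketch {

    // Hypothetical stand-in for a network fetch: sleeping simulates latency.
    // While a thread sleeps it is blocked, not using the CPU.
    static String fetch(String url) throws InterruptedException {
        Thread.sleep(100);
        return "content-of-" + url;
    }

    // Fetches every URL on a fixed pool of `threads` workers and returns the
    // results in the same order as the input list.
    static List<String> fetchAll(List<String> urls, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetch(url));
            }
            // invokeAll blocks until all tasks complete; the futures it returns
            // are in the same order as the task list.
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = List.of("a", "b", "c", "d", "e", "f", "g", "h");
        long start = System.nanoTime();
        List<String> pages = fetchAll(urls, 8);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(pages.size() + " pages in " + elapsedMs + " ms");
    }
}
```

With 8 threads, the 8 simulated 100 ms fetches run concurrently, so the elapsed time is typically close to 100 ms rather than 800 ms; actual numbers depend on the machine and the scheduler.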
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jai Kumar Singh updated HBASE-5166:
-----------------------------------
Attachment: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch
This is the MultithreadedTableMapper implementation I am currently using; it is a modification of Hadoop's MultithreadedMapper (org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.java).
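For readers unfamiliar with Hadoop's MultithreadedMapper: it runs several instances of the user's map logic on threads inside a single map task, serializing reads from the record reader and writes to the output. The sketch below shows that shape under simplifying assumptions. `SyncReader`, `SyncWriter` and `run` are hypothetical names, a plain `Iterator` and `List` stand in for the record reader and the task output, and this is not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class SharedReaderWriterSketch {

    // Shared input: workers pull the next record under one lock, mirroring how a
    // multithreaded mapper serializes access to the record reader.
    // Assumes input records are non-null (null signals end of input).
    static final class SyncReader<T> {
        private final Iterator<T> it;
        SyncReader(Iterator<T> it) { this.it = it; }
        synchronized T next() { return it.hasNext() ? it.next() : null; }
    }

    // Shared output: writes are serialized so the collector never sees concurrent calls.
    static final class SyncWriter<T> {
        private final List<T> out = new ArrayList<>();
        synchronized void write(T value) { out.add(value); }
        synchronized List<T> snapshot() { return new ArrayList<>(out); }
    }

    // Starts `threads` workers; each loops "read one record, map it, write the result"
    // until the shared reader is empty. The map function runs outside any lock,
    // so slow (e.g. network-bound) map calls overlap across threads.
    static <I, O> List<O> run(List<I> input, Function<I, O> map, int threads)
            throws InterruptedException {
        SyncReader<I> reader = new SyncReader<>(input.iterator());
        SyncWriter<O> writer = new SyncWriter<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                I record;
                while ((record = reader.next()) != null) {
                    writer.write(map.apply(record));
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join(); // wait until every worker has drained the input
        }
        return writer.snapshot();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> rows = List.of("row1", "row2", "row3", "row4");
        // Output order depends on thread scheduling.
        System.out.println(run(rows, r -> r.toUpperCase(), 3));
    }
}
```

Because only the read and write calls hold a lock, a slow map function (such as a page fetch) runs in parallel across threads while the shared input and output stay consistent. Output order follows thread scheduling, not input order.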
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201439#comment-13201439 ]
Zhihong Yu commented on HBASE-5166:
-----------------------------------
MultithreadedTableMapper is missing the Apache license header.
{code}
+ while(!executor.isTerminated()){
+ // wait till all the threads are done
+ }
{code}
We should put a sleep() in the above loop, and possibly limit the total duration of the wait.
A new unit test should be added for MultithreadedTableMapper.
Please look at tests that use TableMapper.
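One way to address the busy-wait above, sketched here as an assumption rather than what the patch ended up doing: instead of adding sleep() to the loop, java.util.concurrent already provides ExecutorService.shutdown() followed by awaitTermination(timeout, unit), which blocks without spinning and bounds the total wait. The helper name `shutdownAndWait` and the 30-second limit are illustrative choices.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BoundedShutdownSketch {

    // Replaces `while (!executor.isTerminated()) {}`: stop accepting new work, then
    // block in awaitTermination until the pool finishes or the timeout expires.
    // Returns true if every task finished within the limit.
    static boolean shutdownAndWait(ExecutorService executor, long timeout, TimeUnit unit)
            throws InterruptedException {
        executor.shutdown(); // already-submitted tasks still run to completion
        if (executor.awaitTermination(timeout, unit)) {
            return true;
        }
        executor.shutdownNow(); // interrupt stragglers once the wait budget is spent
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            executor.submit(() -> { /* map work for one record would go here */ });
        }
        boolean finished = shutdownAndWait(executor, 30, TimeUnit.SECONDS);
        System.out.println(finished ? "all workers done" : "timed out");
    }
}
```

awaitTermination parks the calling thread instead of spinning, so the map task's main thread does not consume a core while the workers are busy, and the timeout gives an upper bound on how long a stuck worker can hold up the task.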
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-5166:
-------------------------
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jai Kumar Singh updated HBASE-5166:
-----------------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212501#comment-13212501 ]
Hadoop QA commented on HBASE-5166:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515340/0005-HBASE-5166-Added-MultithreadedTableMapper.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 8 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/997//console
This message is automatically generated.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213243#comment-13213243 ]
Zhihong Yu commented on HBASE-5166:
-----------------------------------
When using Review Board, I recommend leaving the Bugs field empty; otherwise a large number of post-backs from Review Board will appear in the JIRA.
You can specify hbase in the Groups field.
My user name is tedyu.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214351#comment-13214351 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
bq. On 2012-02-23 04:32:03, Michael Stack wrote:
bq. > This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.
bq.
bq. Jai Singh wrote:
bq. This works fine. I've tested it in the use case I mentioned on JIRA HBASE-5166.
bq.
bq. Michael Stack wrote:
bq. So works nicely for your crawling then? Mind writing a sweet release note for this? I'll go commit it.
Also, would you mind uploading the final version of the patch to the issue itself? Then we can run Hadoop QA on the patch and make sure it plays well with the rest of HBase (it should be fine given it's standalone). Thanks, Jai.
- Michael
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------
On 2012-02-23 04:22:51, Jai Singh wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3995/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-02-23 04:22:51)
bq.
bq.
bq. Review request for hbase, Ted Yu and Michael Stack.
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/3995/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Jai
bq.
bq.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213328#comment-13213328 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5266
-----------------------------------------------------------
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11506>
The "hbase.mapreduce." prefix should be kept.
Would "hbase.mapreduce.multithreadedmapper.class" be a good name?
- Ted
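A sketch of what configuration keys following this naming convention could look like. It is illustrative only: `java.util.Properties` stands in for Hadoop's `Configuration`, the mapper-class key uses the name proposed in the review comment, and the threads key, its default of 10, the `CrawlMapper` class and the helper methods are hypothetical, not necessarily what was committed.

```java
import java.util.Properties;

class MultithreadedMapperKeysSketch {

    // Illustrative keys that keep the "hbase.mapreduce." prefix.
    // MAPPER_CLASS is the reviewer's proposed name; NUMBER_OF_THREADS is a hypothetical companion.
    static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedmapper.class";
    static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedmapper.threads";
    static final int DEFAULT_THREADS = 10; // assumed default for this sketch

    // Hypothetical user mapper, used only to show storing a class name.
    static class CrawlMapper { }

    // Properties stands in for Hadoop's Configuration in this sketch.
    static void setNumberOfThreads(Properties conf, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        conf.setProperty(NUMBER_OF_THREADS, Integer.toString(threads));
    }

    static int getNumberOfThreads(Properties conf) {
        return Integer.parseInt(
            conf.getProperty(NUMBER_OF_THREADS, Integer.toString(DEFAULT_THREADS)));
    }

    static void setMapperClass(Properties conf, Class<?> cls) {
        conf.setProperty(MAPPER_CLASS, cls.getName());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getNumberOfThreads(conf)); // prints 10, the assumed default
        setNumberOfThreads(conf, 20);
        setMapperClass(conf, CrawlMapper.class);
        System.out.println(getNumberOfThreads(conf)); // prints 20
    }
}
```

Keeping every key under the hbase.mapreduce. prefix groups them with HBase's other MapReduce settings and keeps them distinct from Hadoop's own MultithreadedMapper keys.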
On 2012-02-22 03:22:25, Jai Singh wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3995/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-02-22 03:22:25)
bq.
bq.
bq. Review request for Michael Stack.
bq.
bq.
bq. This addresses bug HBASE-5166.
bq. https://issues.apache.org/jira/browse/HBASE-5166
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/3995/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Jai
bq.
bq.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216333#comment-13216333 ]
Hudson commented on HBASE-5166:
-------------------------------
Integrated in HBase-TRUNK #2669 (See [https://builds.apache.org/job/HBase-TRUNK/2669/])
HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop (Revision 1293098)
Result = SUCCESS
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Fix For: 0.94.0
>
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213238#comment-13213238 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------
Review request for Michael Stack.
This addresses bug HBASE-5166.
https://issues.apache.org/jira/browse/HBASE-5166
Diffs
-----
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
Diff: https://reviews.apache.org/r/3995/diff
Testing
-------
Thanks,
Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214912#comment-13214912 ]
Hadoop QA commented on HBASE-5166:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515764/5166-v9.txt
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -134 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 153 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestImportTsv
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//console
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214342#comment-13214342 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
bq. On 2012-02-23 04:32:03, Michael Stack wrote:
bq. > This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.
bq.
bq. Jai Singh wrote:
bq. This works fine. I've tested it in the use case I mentioned on JIRA HBASE-5166.
So works nicely for your crawling then? Mind writing a sweet release note for this? I'll go commit it.
- Michael
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------
On 2012-02-23 04:22:51, Jai Singh wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3995/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-02-23 04:22:51)
bq.
bq.
bq. Review request for hbase, Ted Yu and Michael Stack.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs.
bq. UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase.
bq. Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
bq.
bq. Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/3995/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Jai
bq.
bq.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> There is currently no MultiThreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
> Use case, a web crawler: take input (URLs) from an HBase table and put the content (URL, content) back into HBase.
> Running this kind of HBase MapReduce job with the normal TableMapper is quite slow, because the job is network-IO bound and does not fully utilize the CPU.
> Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
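As a rough illustration of why a multithreaded mapper helps an IO-bound job such as this crawler, here is a minimal, JDK-only sketch of the pattern. It is not the HBase or Hadoop code, and the names MultithreadedMapSketch and runMap are hypothetical. Several worker threads share one input iterator and one output list, each guarded by a lock, while the slow map function (for example, fetching a URL) runs concurrently outside the locks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration of the multithreaded-mapper pattern (not HBase code).
final class MultithreadedMapSketch {
  private MultithreadedMapSketch() {}

  /**
   * Applies mapFn to every element of input using numThreads worker threads.
   * The returned list holds one result per input element, in no particular order.
   */
  static <I, O> List<O> runMap(Iterator<I> input, Function<I, O> mapFn, int numThreads) {
    final Object readLock = new Object();
    final List<O> output = new ArrayList<>();
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<?>> workers = new ArrayList<>();
      for (int t = 0; t < numThreads; t++) {
        workers.add(pool.submit(() -> {
          while (true) {
            I record;
            // Only one thread at a time may advance the shared input.
            synchronized (readLock) {
              if (!input.hasNext()) {
                return;
              }
              record = input.next();
            }
            // The expensive, IO-bound work runs concurrently, outside any lock.
            O result = mapFn.apply(record);
            // Only one thread at a time may append to the shared output.
            synchronized (output) {
              output.add(result);
            }
          }
        }));
      }
      for (Future<?> worker : workers) {
        try {
          worker.get(); // waits for the worker and rethrows its failure, if any
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting for map workers", e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("a map worker failed", e.getCause());
        }
      }
    } finally {
      pool.shutdown();
    }
    return output;
  }
}
```

With a caller-supplied IO-bound function, runMap(urls.iterator(), fetchFn, 16) would keep up to 16 fetches in flight at once; output order is not preserved. Hadoop's MultithreadedMapper is, as far as I know, configured through static helpers such as setNumberOfThreads(job, n) and setMapperClass(job, cls), and the class proposed here appears to follow the same model; check the javadoc for your HBase version before relying on exact signatures.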
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215432#comment-13215432 ]
stack commented on HBASE-5166:
------------------------------
@Jai It's not you. Those are known failing tests. Let me commit.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> There is currently no MultiThreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
> Use case, a web crawler: take input (URLs) from an HBase table and put the content (URL, content) back into HBase.
> Running this kind of HBase MapReduce job with the normal TableMapper is quite slow, because the job is network-IO bound and does not fully utilize the CPU.
> Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212555#comment-13212555 ]
Jai Kumar Singh commented on HBASE-5166:
----------------------------------------
Submitted a new patch against the current SVN trunk.
Thanks
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous
to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jai Kumar Singh updated HBASE-5166:
-----------------------------------
Attachment: 0005-HBASE-5166-Added-MultithreadedTableMapper.patch
Added Thread.sleep(), the license header, and a test case.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> There is currently no MultiThreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
> Use case, a web crawler: take input (URLs) from an HBase table and put the content (URL, content) back into HBase.
> Running this kind of HBase MapReduce job with the normal TableMapper is quite slow, because the job is network-IO bound and does not fully utilize the CPU.
> Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous
to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jai Kumar Singh updated HBASE-5166:
-----------------------------------
Attachment: 0003-Added-MultithreadedTableMapper-HBASE-5166.patch
Modified patch
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> There is currently no MultiThreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
> Use case, a web crawler: take input (URLs) from an HBase table and put the content (URL, content) back into HBase.
> Running this kind of HBase MapReduce job with the normal TableMapper is quite slow, because the job is network-IO bound and does not fully utilize the CPU.
> Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201433#comment-13201433 ]
Jai Kumar Singh commented on HBASE-5166:
----------------------------------------
Any comments?
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213418#comment-13213418 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------
(Updated 2012-02-22 07:20:13.121177)
Review request for hbase, Ted Yu and Michael Stack.
Summary
-------
There is currently no MultiThreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
Use case, a web crawler: take input (URLs) from an HBase table and put the content (URL, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow, because the job is network-IO bound and does not fully utilize the CPU.
Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
Diffs
-----
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
Diff: https://reviews.apache.org/r/3995/diff
Testing
-------
Thanks,
Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213367#comment-13213367 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------
(Updated 2012-02-22 06:00:23.473596)
Review request for hbase and Michael Stack.
Summary
-------
There is currently no MultiThreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
Use case, a web crawler: take input (URLs) from an HBase table and put the content (URL, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow, because the job is network-IO bound and does not fully utilize the CPU.
Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
This addresses bug HBASE-5166.
https://issues.apache.org/jira/browse/HBASE-5166
Diffs
-----
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
Diff: https://reviews.apache.org/r/3995/diff
Testing
-------
Thanks,
Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213793#comment-13213793 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5268
-----------------------------------------------------------
Quite a lot of extraneous whitespace needs to be removed.
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11536>
Should read 'MultithreadedTableMapper instances'
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11508>
Leave a space between while and (
Another space between ) and {
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11537>
Can we give better progress information here?
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11535>
Long line; please wrap at 80 characters.
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11534>
This if block can be an else to the if block above.
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11533>
Please remove the whitespace.
- Ted
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213240#comment-13213240 ]
Jai Kumar Singh commented on HBASE-5166:
----------------------------------------
@Zhihong Yu: I have submitted the patch for review with the suggested changes.
For the "Sub" prefix: I took it from Hadoop and kept the same naming. We call them SubMapRecordReader/Writer because they are intermediate RecordReader/Writer classes for the mapper threads; they eventually use the RecordReader/Writer passed to the MapReduce job to do the actual reads and writes.
Thanks,
PS: I tried adding "Zhihong" to the reviewer list on the review page, but RB kept failing, so I added stack as a reviewer. Please do review.
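The SubMap idea described in the comment above can be illustrated with a small sketch. It assumes two hypothetical interfaces, OuterReader and OuterWriter, standing in for the job's real (non-thread-safe) RecordReader and RecordWriter. Each mapper thread would get its own SubMapRecordReader, which advances the shared reader under a lock and keeps a private copy of the current key and value, and its own SubMapRecordWriter, which serializes writes into the shared writer.

```java
// Hypothetical stand-in for the job's real, non-thread-safe record reader.
interface OuterReader<K, V> {
  boolean nextKeyValue();
  K getCurrentKey();
  V getCurrentValue();
}

// Hypothetical stand-in for the job's real, non-thread-safe record writer.
interface OuterWriter<K, V> {
  void write(K key, V value);
}

/**
 * Per-thread reader: advances the shared outer reader under a lock and keeps
 * its own current key/value so that other threads cannot overwrite them.
 */
final class SubMapRecordReader<K, V> {
  private final OuterReader<K, V> outer;
  private K key;
  private V value;

  SubMapRecordReader(OuterReader<K, V> outer) {
    this.outer = outer;
  }

  boolean nextKeyValue() {
    synchronized (outer) {
      if (!outer.nextKeyValue()) {
        return false;
      }
      // Stores references; see the note below about object reuse.
      key = outer.getCurrentKey();
      value = outer.getCurrentValue();
      return true;
    }
  }

  K getCurrentKey() {
    return key;
  }

  V getCurrentValue() {
    return value;
  }
}

/** Per-thread writer: serializes all writes into the shared outer writer. */
final class SubMapRecordWriter<K, V> {
  private final OuterWriter<K, V> outer;

  SubMapRecordWriter(OuterWriter<K, V> outer) {
    this.outer = outer;
  }

  void write(K key, V value) {
    synchronized (outer) {
      outer.write(key, value);
    }
  }
}
```

This sketch keeps references, so it assumes the outer reader hands out a fresh object per record. Hadoop record readers commonly reuse the same Writable instances between records, which, as far as I can tell, is why Hadoop's own SubMapRecordReader copies the key and value instead; a real implementation would need to do the same.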
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214283#comment-13214283 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
bq. On 2012-02-22 17:53:12, Ted Yu wrote:
bq. > /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 114
bq. > <https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line114>
bq. >
bq. > Should read 'MultithreadedTableMapper instances'
done!
bq. On 2012-02-22 17:53:12, Ted Yu wrote:
bq. > /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 155
bq. > <https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line155>
bq. >
bq. > Can we give better progress information here?
I am not sure how to do it. It would be possible if I could access the underlying RecordReader/Writer passed to the jobContext and simply call its getProgress(). Could anybody help me here?
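One possible answer to the progress question, sketched under assumptions: because every mapper thread reads from the same input split, a per-thread reader could simply report the progress of the shared outer reader, read under the same lock used for nextKeyValue(). SharedSplitReader and ThreadReader are hypothetical names. In Hadoop's new API the corresponding method is RecordReader.getProgress(), but whether the underlying table record reader reports useful values is not established in this thread.

```java
// Hypothetical view of the job's real record reader: only what this sketch needs.
interface SharedSplitReader {
  boolean nextKeyValue();

  /** Fraction of the input split consumed so far, expected in [0, 1]. */
  float getProgress();
}

/**
 * Per-thread reader whose progress is the progress of the shared split,
 * since every mapper thread consumes records from the same outer reader.
 */
final class ThreadReader {
  private final SharedSplitReader outer;

  ThreadReader(SharedSplitReader outer) {
    this.outer = outer;
  }

  boolean nextKeyValue() {
    synchronized (outer) {
      return outer.nextKeyValue();
    }
  }

  float getProgress() {
    synchronized (outer) {
      // Clamp defensively in case the underlying reader overshoots.
      return Math.max(0f, Math.min(1f, outer.getProgress()));
    }
  }
}
```

All threads then report the same value, which reflects how much of the split has been handed out to mappers rather than how much mapping has finished.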
bq. On 2012-02-22 17:53:12, Ted Yu wrote:
bq. > /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java, line 223
bq. > <https://reviews.apache.org/r/3995/diff/2/?file=78620#file78620line223>
bq. >
bq. > This if block can be an else to the if block above.
done
- Jai
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5268
-----------------------------------------------------------
On 2012-02-23 04:17:08, Jai Singh wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3995/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-02-23 04:17:08)
bq.
bq.
bq. Review request for hbase, Ted Yu and Michael Stack.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. There is currently no MultiThreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
bq. Use case, a web crawler: take input (URLs) from an HBase table and put the content (URL, content) back into HBase.
bq. Running this kind of HBase MapReduce job with the normal TableMapper is quite slow, because the job is network-IO bound and does not fully utilize the CPU.
bq.
bq. Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/3995/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Jai
bq.
bq.
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous
to MultiThreaded Mapper in hadoop
Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-5166:
-------------------------
Attachment: 5166-v9.txt
Same as 0008. Uploading again to rerun Hadoop QA; this patch shouldn't be causing that many test failures.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214288#comment-13214288 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------
(Updated 2012-02-23 04:22:51.078969)
Review request for hbase, Ted Yu and Michael Stack.
Changes
-------
Whitespace removed
Summary
-------
There is currently no MultiThreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
Use case, a web crawler: take input (URLs) from an HBase table and put the content (URL, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow, because the job is network-IO bound and does not fully utilize the CPU.
Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
Diffs (updated)
-----
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
Diff: https://reviews.apache.org/r/3995/diff
Testing
-------
Thanks,
Jai
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216319#comment-13216319 ]
Hudson commented on HBASE-5166:
-------------------------------
Integrated in HBase-TRUNK-security #122 (See [https://builds.apache.org/job/HBase-TRUNK-security/122/])
HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop (Revision 1293098)
Result = FAILURE
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Fix For: 0.94.0
>
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213417#comment-13213417 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------
(Updated 2012-02-22 07:18:48.273758)
Review request for hbase and Michael Stack.
Changes
-------
Removing bugid HBASE-5166
Summary
-------
There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs.
UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase.
Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.
Diffs
-----
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
Diff: https://reviews.apache.org/r/3995/diff
Testing
-------
Thanks,
Jai
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous
to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jai Kumar Singh updated HBASE-5166:
-----------------------------------
Attachment: 0008-HBASE-5166-Added-MultithreadedTableMapper.patch
Uploading the latest patch.
Thanks
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187094#comment-13187094 ]
Jai Kumar Singh commented on HBASE-5166:
----------------------------------------
Hi stack,
Thanks for the comment. I've modified the patch accordingly.
The executor part now uses Executors.newFixedThreadPool(numberOfThreads).
-- JK
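Hadoop's MultithreadedMapper, which this patch adapts, works roughly like this. A fixed pool of worker threads shares one input reader and one output writer. Reads and writes are serialized, while the slow per-record work (such as fetching a URL in the crawler use case) overlaps across threads. The sketch below illustrates that idea in plain Java. It assumes an in-memory list in place of the table's RecordReader and a list in place of the mapper context's writer. The class and method names are hypothetical and are not the HBase API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the multithreaded-mapper pattern; not the HBase class.
class MultithreadedMapSketch {

  private final Iterator<String> input;                  // stands in for the RecordReader
  private final List<String> output = new ArrayList<>(); // stands in for the output writer

  MultithreadedMapSketch(List<String> rows) {
    this.input = rows.iterator();
  }

  // Only one thread at a time may advance the shared reader.
  private synchronized String nextRow() {
    return input.hasNext() ? input.next() : null;
  }

  // Only one thread at a time may write to the shared output.
  private void emit(String value) {
    synchronized (output) {
      output.add(value);
    }
  }

  // Placeholder for the IO-bound work, e.g. downloading the page behind a URL.
  private String process(String url) {
    return url + " -> fetched";
  }

  List<String> run(int numberOfThreads) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
    for (int i = 0; i < numberOfThreads; i++) {
      // Each worker drains the shared reader until it is exhausted.
      pool.execute(() -> {
        String row;
        while ((row = nextRow()) != null) {
          emit(process(row));
        }
      });
    }
    pool.shutdown();
    if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
      throw new IllegalStateException("workers did not finish in time");
    }
    synchronized (output) {
      return new ArrayList<>(output);
    }
  }
}
```

Output order depends on thread scheduling. Only the set of emitted records is deterministic, which matches how the map phase treats records within a split.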
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213421#comment-13213421 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
bq. On 2012-02-22 05:26:10, Ted Yu wrote:
bq. > /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 64
bq. > <https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line64>
bq. >
bq. > "hbase.mapreduce." prefix should be kept.
bq. > Would "hbase.mapreduce.multithreadedmapper.class" be a good name ?
Okay!
I guess then the prefix should be "hbase.mapreduce.multithreadedtablemapper":
public static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedtablemapper.threads";
public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedtablemapper.mapclass";
- Jai
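The following is for illustration only. The renamed thread-count key could be read with a default, in the same way as the getInt(..., 10) call in the earlier patch. This is a minimal sketch that uses a plain Map<String, String> in place of Hadoop's Configuration so that it runs standalone. The helper names, and the rejection of values below 1, are assumptions and are not part of the patch.

```java
import java.util.Map;

// Hypothetical helper; a Map stands in for org.apache.hadoop.conf.Configuration.
class ThreadConfigSketch {
  static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedtablemapper.threads";
  static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedtablemapper.mapclass";
  static final int DEFAULT_THREADS = 10; // same default as the original getInt(..., 10)

  // Returns the configured thread count, or the default when the key is unset.
  static int getNumberOfThreads(Map<String, String> conf) {
    String value = conf.get(NUMBER_OF_THREADS);
    int threads = (value == null) ? DEFAULT_THREADS : Integer.parseInt(value.trim());
    if (threads < 1) {
      throw new IllegalArgumentException(NUMBER_OF_THREADS + " must be >= 1, got " + threads);
    }
    return threads;
  }

  // Stores the thread count under the prefixed key.
  static void setNumberOfThreads(Map<String, String> conf, int threads) {
    conf.put(NUMBER_OF_THREADS, Integer.toString(threads));
  }
}
```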
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5266
-----------------------------------------------------------
On 2012-02-22 07:20:13, Jai Singh wrote:
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3995/
bq. -----------------------------------------------------------
bq. (Updated 2012-02-22 07:20:13)
bq. Review request for hbase, Ted Yu and Michael Stack.
bq. Summary
bq. -------
bq. There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs.
bq. UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase.
bq. Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
bq. Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.
bq. Diffs
bq. -----
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq. Diff: https://reviews.apache.org/r/3995/diff
bq. Testing
bq. -------
bq. Thanks,
bq. Jai
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215443#comment-13215443 ]
Jai Kumar Singh commented on HBASE-5166:
----------------------------------------
thanks stack, ted ;-)
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Fix For: 0.94.0
>
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215428#comment-13215428 ]
Jai Kumar Singh commented on HBASE-5166:
----------------------------------------
@stack, ted: any idea why it's failing these tests?
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214397#comment-13214397 ]
Hadoop QA commented on HBASE-5166:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515712/0008-HBASE-5166-Added-MultithreadedTableMapper.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -134 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 153 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.replication.TestReplicationPeer
org.apache.hadoop.hbase.replication.TestReplication
org.apache.hadoop.hbase.TestDrainingServer
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//console
This message is automatically generated.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214361#comment-13214361 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
bq. On 2012-02-23 04:32:03, Michael Stack wrote:
bq. > This looks great. Does it work? Have you tried it? +1 on commit if it works. Would be nice in things like PE putting up more load.
bq.
bq. Jai Singh wrote:
bq. This works fine. I've tested it in the usecase I mentioned on jira HBASE-5166.
bq.
bq. Michael Stack wrote:
bq. So works nicely for your crawling then? Mind writing a sweet release note for this? I'll go commit it.
bq.
bq. Michael Stack wrote:
bq. Oh, mind uploading the final version of the patch to the issue itself then we can run hadoopqa on the patch and make sure it plays well w/ rest of hbase (should be fine given its standalone). Thanks Jai.
Yes, it works well in the web-crawling scenario.
Release note: "MultiThreadedTableMapper for [N/W] IO bound jobs"
Updated the patch on JIRA.
Thanks
- Jai
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------
On 2012-02-23 04:22:51, Jai Singh wrote:
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3995/
bq. -----------------------------------------------------------
bq. (Updated 2012-02-23 04:22:51)
bq. Review request for hbase, Ted Yu and Michael Stack.
bq. Summary
bq. -------
bq. There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs.
bq. UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase.
bq. Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
bq. Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.
bq. Diffs
bq. -----
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq. Diff: https://reviews.apache.org/r/3995/diff
bq. Testing
bq. -------
bq. Thanks,
bq. Jai
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Assigned] (HBASE-5166) MultiThreaded Table Mapper analogous
to MultiThreaded Mapper in hadoop
Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans reassigned HBASE-5166:
-----------------------------------------
Assignee: Jai Kumar Singh
Added Jai as a contributor and assigned the Jira.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Assignee: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Fix For: 0.94.0
>
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous
to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jai Kumar Singh updated HBASE-5166:
-----------------------------------
Attachment: 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
Patch created against current trunk.
Also moved the test case into a separate file.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212547#comment-13212547 ]
Zhihong Yu commented on HBASE-5166:
-----------------------------------
Can you use --no-prefix to generate a new patch for Hadoop QA?
Thanks
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185963#comment-13185963 ]
stack commented on HBASE-5166:
------------------------------
bq. Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.
Looks grand to me (as does the network/IO-bound justification in your use case). Would be a nice contrib. I'd like it for putting up load on hbase. Currently I have to run a ridiculous number of concurrent mappers to generate load with a tool like PerformanceEvaluation, which runs a single client doing serial load per map task.
A few comments on the patch.
No need for this line:
{code}
+ * Copyright 2007 The Apache Software Foundation
{code}
In our code base, we indent with two spaces (no hard tabs, which your file uses).
Fix the name of this config:
{code}
+ getInt("mapred.map.multithreadedrunner.threads", 10);
{code}
Ditto for the setter.
Would you consider running the threads with an executor, created with something like Guava's utilities? (See the hbase code base for examples.)
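The executor suggestion can be followed without extra dependencies by using a ThreadFactory that names the worker threads and marks them as daemons. This is roughly what Guava's ThreadFactoryBuilder (setNameFormat, setDaemon) provides. The JDK-only sketch below shows one way the pool might be built. It is not the code that was committed, and NamedDaemonThreadFactory and newPool are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical JDK-only stand-in for a Guava-built thread factory.
class NamedDaemonThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger();

  NamedDaemonThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    // Readable names ("mapper-0", "mapper-1", ...) make thread dumps easier to read.
    Thread t = new Thread(r, prefix + "-" + counter.getAndIncrement());
    // Daemon threads do not prevent the JVM from exiting.
    t.setDaemon(true);
    return t;
  }

  // Fixed-size pool whose workers are created by this factory.
  static ExecutorService newPool(String prefix, int threads) {
    return Executors.newFixedThreadPool(threads, new NamedDaemonThreadFactory(prefix));
  }
}
```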
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214284#comment-13214284 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------
(Updated 2012-02-23 04:17:08.702062)
Review request for hbase, Ted Yu and Michael Stack.
Changes
-------
Made the changes suggested in the review.
Diffs (updated)
-----
/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
Diff: https://reviews.apache.org/r/3995/diff
Testing
-------
Thanks,
Jai
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous
to MultiThreaded Mapper in hadoop
Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-5166:
-------------------------
Resolution: Fixed
Fix Version/s: 0.94.0
Release Note: New MultiThreadedTableMapper facility
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
Committed to trunk. Thanks for the patch Jai. Nice one.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Fix For: 0.94.0
>
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212499#comment-13212499 ]
Jai Kumar Singh commented on HBASE-5166:
----------------------------------------
@Zhihong Yu:
1) The Apache License header was there earlier, but I removed it because stack suggested so. Anyway, I'll put it back.
2) I've added Thread.sleep(1000). I am not sure whether we want to limit the wait duration; wouldn't that depend on the kind of job we are running?
3) I've modified the TableMapper test case in src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java.
At first I was going to make a new test case file for MultithreadedTableMapper, but that does not make sense because it would mean too much code repetition.
So I added a numOfThreads argument to TestTableMapReduce's runTestOnTable function and called the function twice. Check the patch for more details.
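Point 2 leaves open whether the wait should be bounded. One alternative to polling with Thread.sleep(1000) is sketched below, assuming the map runners are submitted to an ExecutorService as suggested in review (that is an assumption, not what this patch does). It waits in one-second slices with `awaitTermination`, so there is no overall cap on job duration, yet the loop still wakes up regularly. The class name and the progress hook are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class UnboundedWaitSketch {

  /**
   * Stops accepting new work and blocks until every submitted task has finished.
   * Waits in one-second slices, so there is no overall time limit on the job.
   * Returns false if the calling thread was interrupted while waiting.
   */
  static boolean awaitAll(ExecutorService pool) {
    pool.shutdown(); // already-submitted tasks keep running
    try {
      while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
        // Still running. A real mapper might report progress here so the
        // framework does not treat the task as hung (assumption, not from the patch).
      }
      return true;
    } catch (InterruptedException e) {
      pool.shutdownNow();                 // cancel the workers
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }
}
```

Under this design the "limit" question is left to the framework's own task timeout rather than hard-coded in the mapper.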
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214338#comment-13214338 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
bq. On 2012-02-23 04:32:03, Michael Stack wrote:
bq. > This looks great. Does it work? Have you tried it? +1 on commit if it works. It would be nice to use this in tools like PE (PerformanceEvaluation) to put up more load.
This works fine. I've tested it on the use case I mentioned in JIRA HBASE-5166.
- Jai
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------
On 2012-02-23 04:22:51, Jai Singh wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3995/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-02-23 04:22:51)
bq.
bq.
bq. Review request for hbase, Ted Yu and Michael Stack.
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/3995/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Jai
bq.
bq.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212758#comment-13212758 ]
Hadoop QA commented on HBASE-5166:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515348/0006-HBASE-5166-Added-MultithreadedTableMapper.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -134 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestAtomicOperation
org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/998//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/998//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/998//console
This message is automatically generated.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213139#comment-13213139 ]
Zhihong Yu commented on HBASE-5166:
-----------------------------------
@Jai:
{code}
+ * Copyright 2007 The Apache Software Foundation
{code}
The year is not needed in the license header. Same here:
{code}
+ * Copyright 2009 The Apache Software Foundation
{code}
{code}
+ public void testAddDependencyJars() throws Exception {
{code}
The above doesn't carry the @Test annotation. If it is not needed for this JIRA, please remove it.
{code}
+ public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedrunner.class";
{code}
I think the config parameter name should be changed to 'multithreadedmapper.class'.
Same for NUMBER_OF_THREADS.
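A minimal sketch of what the renamed keys might look like, with the setter and getter reading the same constant so they cannot drift apart (the "ditto for the setter" point raised earlier in review). The exact key strings only follow the 'multithreadedmapper' naming suggested here and are assumptions; `SketchConf` is a hypothetical stand-in for Hadoop's `Configuration` so the example runs without Hadoop on the classpath. The default of 10 threads mirrors the `getInt(..., 10)` call quoted earlier.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Hadoop's Configuration: only the accessors this sketch needs. */
final class SketchConf {
  private final Map<String, String> props = new HashMap<>();

  void setInt(String key, int value) { props.put(key, Integer.toString(value)); }

  int getInt(String key, int defaultValue) {
    String v = props.get(key);
    return v == null ? defaultValue : Integer.parseInt(v);
  }

  void set(String key, String value) { props.put(key, value); }

  String get(String key) { return props.get(key); }
}

final class MultithreadedMapperConfigSketch {
  // Assumed key names following the suggested 'multithreadedmapper' naming.
  static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedmapper.threads";
  static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedmapper.class";
  static final int DEFAULT_THREADS = 10;

  // Setter and getter share one constant, so a rename touches a single line.
  static void setNumberOfThreads(SketchConf conf, int threads) { conf.setInt(NUMBER_OF_THREADS, threads); }

  static int getNumberOfThreads(SketchConf conf) { return conf.getInt(NUMBER_OF_THREADS, DEFAULT_THREADS); }

  static void setMapperClass(SketchConf conf, Class<?> cls) { conf.set(MAPPER_CLASS, cls.getName()); }

  static String getMapperClassName(SketchConf conf) { return conf.get(MAPPER_CLASS); }
}
```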
{code}
+ private class SubMapRecordReader extends RecordReader<ImmutableBytesWritable, Result> {
{code}
Why do we need the Sub prefix above?
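A likely explanation: the patch is a modification of Hadoop's MultithreadedMapper, where, as we understand it, each worker thread reads through its own "sub" record reader that advances the single shared reader under a lock and keeps a private copy of the current key/value. The simplified sketch below illustrates that idea with a plain `Iterator` in place of the task's RecordReader; `SubReaderSketch` is a hypothetical name, not the patch's class.

```java
import java.util.Iterator;
import java.util.Map;

/** Hypothetical per-thread reader over an input shared by all worker threads. */
final class SubReaderSketch<K, V> {
  private final Iterator<Map.Entry<K, V>> shared; // the one underlying input
  private final Object lock;                      // guards every access to `shared`
  private K key;                                  // this thread's own current pair
  private V value;

  SubReaderSketch(Iterator<Map.Entry<K, V>> shared, Object lock) {
    this.shared = shared;
    this.lock = lock;
  }

  /** Advances the shared input under the lock; returns false once it is exhausted. */
  boolean nextKeyValue() {
    synchronized (lock) {
      if (!shared.hasNext()) {
        return false;
      }
      Map.Entry<K, V> e = shared.next();
      // Keep references here; a reader that reuses mutable objects (such as
      // Hadoop Writables) would need a real copy instead.
      key = e.getKey();
      value = e.getValue();
      return true;
    }
  }

  K getCurrentKey() { return key; }

  V getCurrentValue() { return value; }
}
```

Each thread can then work on its current row while others advance the input, and no row is handed to two threads.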
Putting the patch on https://reviews.apache.org would make the review process smoother.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous
to MultiThreaded Mapper in hadoop
Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-5166:
-------------------------
Status: Open (was: Patch Available)
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper
analogous to MultiThreaded Mapper in hadoop
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214291#comment-13214291 ]
jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------
Ship it!
This looks great. Does it work? Have you tried it? +1 on commit if it works. It would be nice to use this in tools like PE (PerformanceEvaluation) to put up more load.
- Michael
On 2012-02-23 04:22:51, Jai Singh wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3995/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-02-23 04:22:51)
bq.
bq.
bq. Review request for hbase, Ted Yu and Michael Stack.
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/3995/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Jai
bq.
bq.
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
> Key: HBASE-5166
> URL: https://issues.apache.org/jira/browse/HBASE-5166
> Project: HBase
> Issue Type: Improvement
> Reporter: Jai Kumar Singh
> Priority: Minor
> Labels: multithreaded, tablemapper
> Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>