Posted to issues@hbase.apache.org by "Jai Kumar Singh (Created) (JIRA)" <ji...@apache.org> on 2012/01/10 11:24:38 UTC

[jira] [Created] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
----------------------------------------------------------------------

                 Key: HBASE-5166
                 URL: https://issues.apache.org/jira/browse/HBASE-5166
             Project: HBase
          Issue Type: Improvement
            Reporter: Jai Kumar Singh
            Priority: Minor


HBase currently has no MultithreadedTableMapper, whereas Hadoop provides a MultithreadedMapper for IO-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow because the job is network-IO bound and does not fully utilize the CPU.

Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.
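
To illustrate the IO-bound argument in the description, here is a minimal plain-Java sketch. It is not the attached patch and uses no HBase or Hadoop classes. The names IoBoundFanOut, fetch and fetchAll are hypothetical, and fetch only sleeps to stand in for network latency. The point is that records which mostly wait on the network can be handed to a fixed pool of threads inside one task:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration only; not part of HBase or of the patch. */
final class IoBoundFanOut {

  /** Stand-in for a network fetch: sleeps instead of doing real I/O. */
  static String fetch(String url, long latencyMillis) {
    try {
      Thread.sleep(latencyMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to callers
    }
    return "content-of:" + url;
  }

  /** Fetches every URL on a fixed-size pool and returns results in input order. */
  static List<String> fetchAll(List<String> urls, int threads, long latencyMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String url : urls) {
        Callable<String> task = () -> fetch(url, latencyMillis);
        futures.add(pool.submit(task));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> future : futures) {
        results.add(future.get()); // blocks until that record is done
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for fetches", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("fetch failed", e.getCause());
    } finally {
      pool.shutdownNow(); // release the worker threads
    }
  }

  public static void main(String[] args) {
    List<String> urls = Arrays.asList("u0", "u1", "u2", "u3", "u4", "u5", "u6", "u7");
    long start = System.nanoTime();
    List<String> results = fetchAll(urls, 4, 100L);
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    System.out.println(results.size() + " records in about " + elapsedMillis + " ms");
  }
}
```

With a single thread the simulated waits add up one after another; with several threads they overlap, which is the effect the issue is asking for inside a table mapper.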


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jai Kumar Singh updated HBASE-5166:
-----------------------------------

    Attachment: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch

This is the implementation of MultithreadedTableMapper that I am currently using. It is a modification of Hadoop's org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201439#comment-13201439 ] 

Zhihong Yu commented on HBASE-5166:
-----------------------------------

MultithreadedTableMapper is missing the Apache license header.

{code}
+    while(!executor.isTerminated()){
+      // wait till all the threads are done
+    }
{code}
We should put a sleep() in the above loop and possibly limit the total duration of the wait.

A new unit test should be added for MultithreadedTableMapper.
Please look at tests that use TableMapper.
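
As an illustration of the suggestion about the wait loop, here is a minimal sketch of a bounded, non-spinning wait. It is an assumption about how the loop could be restructured, not the code that was committed. The names BoundedExecutorWait, shutdownAndWait and sleepQuietly are hypothetical, and the 100 ms polling slice is an arbitrary choice. It relies only on java.util.concurrent:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; not part of the attached patch. */
final class BoundedExecutorWait {

  /**
   * Shuts the executor down and waits for its tasks in short blocking slices
   * instead of spinning. Gives up after maxWaitMillis.
   * Returns true if the executor terminated within the time budget.
   */
  static boolean shutdownAndWait(ExecutorService executor, long maxWaitMillis) {
    executor.shutdown(); // no new tasks; already submitted tasks still run
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    long slice = TimeUnit.MILLISECONDS.toNanos(100);
    try {
      while (!executor.isTerminated()) {
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0) {
          executor.shutdownNow(); // interrupt workers that are still running
          return false;
        }
        // Blocks for up to one slice, so the waiting thread does not burn CPU.
        executor.awaitTermination(Math.min(remaining, slice), TimeUnit.NANOSECONDS);
      }
      return true;
    } catch (InterruptedException e) {
      executor.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }

  /** Sleeps without throwing; restores the interrupt flag if interrupted. */
  static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 8; i++) {
      pool.execute(() -> sleepQuietly(50));
    }
    System.out.println("terminated in time: " + shutdownAndWait(pool, 5_000));
  }
}
```

awaitTermination already blocks with a timeout, so an explicit sleep() is not needed in this variant; the deadline check is what limits the total wait.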
                

[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5166:
-------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jai Kumar Singh updated HBASE-5166:
-----------------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212501#comment-13212501 ] 

Hadoop QA commented on HBASE-5166:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515340/0005-HBASE-5166-Added-MultithreadedTableMapper.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 8 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/997//console

This message is automatically generated.
                

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213243#comment-13213243 ] 

Zhihong Yu commented on HBASE-5166:
-----------------------------------

My recommendation for using Review Board is to leave the Bugs field empty; otherwise a large number of post-backs from Review Board will appear in the JIRA.
You can specify hbase in the Groups field.

My user name is tedyu.
                

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214351#comment-13214351 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------



bq.  On 2012-02-23 04:32:03, Michael Stack wrote:
bq.  > This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.  Would be nice in things like PE putting up more load.
bq.  
bq.  Jai Singh wrote:
bq.      This works fine. I've tested it in the usecase  I mentioned on jira HBASE-5166.
bq.  
bq.  Michael Stack wrote:
bq.      So works nicely for your crawling then?  Mind writing a sweet release note for this?  I'll go commit it.

Oh, would you mind uploading the final version of the patch to the issue itself? Then we can run hadoopqa on the patch and make sure it plays well with the rest of HBase (should be fine given it's standalone).  Thanks, Jai.


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------


On 2012-02-23 04:22:51, Jai Singh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3995/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-23 04:22:51)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
bq.  UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
bq.  Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
bq.  
bq.  Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3995/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jai
bq.  
bq.


                

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213328#comment-13213328 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5266
-----------------------------------------------------------



/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11506>

    "hbase.mapreduce." prefix should be kept.
    Would "hbase.mapreduce.multithreadedmapper.class" be a good name ?


- Ted
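
As a rough illustration of the naming suggestion above, here is a sketch of how a job might look up the mapper class under the proposed key. java.util.Properties stands in for the Hadoop Configuration object, and MapperClassKey and resolve are hypothetical names. The key string is the one floated in the review comment; the constant that was finally committed may differ.

```java
import java.util.Properties;

/** Hypothetical illustration; Properties stands in for a Hadoop Configuration. */
final class MapperClassKey {

  // Key name proposed in the review comment, kept under the "hbase.mapreduce." prefix.
  static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedmapper.class";

  /** Returns the class configured under MAPPER_CLASS, or defaultClass if the key is unset. */
  static Class<?> resolve(Properties conf, Class<?> defaultClass) {
    String name = conf.getProperty(MAPPER_CLASS);
    if (name == null) {
      return defaultClass;
    }
    try {
      return Class.forName(name);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Unknown mapper class: " + name, e);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(MAPPER_CLASS, "java.lang.String"); // placeholder class name for the demo
    System.out.println(resolve(conf, Object.class).getName());
  }
}
```

Keeping the common prefix means all settings for the feature sort together and are easy to tell apart from Hadoop's own mapreduce keys.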


On 2012-02-22 03:22:25, Jai Singh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3995/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-22 03:22:25)
bq.  
bq.  
bq.  Review request for Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
bq.  UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
bq.  Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
bq.  
bq.  Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.
bq.  
bq.  
bq.  This addresses bug HBASE-5166.
bq.      https://issues.apache.org/jira/browse/HBASE-5166
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3995/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jai
bq.  
bq.


                

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216333#comment-13216333 ] 

Hudson commented on HBASE-5166:
-------------------------------

Integrated in HBase-TRUNK #2669 (See [https://builds.apache.org/job/HBase-TRUNK/2669/])
    HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop (Revision 1293098)

     Result = SUCCESS
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java

                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>             Fix For: 0.94.0
>
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213238#comment-13213238 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

Review request for Michael Stack.


Summary
-------

HBase currently has no MultithreadedTableMapper, whereas Hadoop provides a MultithreadedMapper for IO-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow because the job is network-IO bound and does not fully utilize the CPU.

Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.


This addresses bug HBASE-5166.
    https://issues.apache.org/jira/browse/HBASE-5166


Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3995/diff


Testing
-------


Thanks,

Jai


                

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214912#comment-13214912 ] 

Hadoop QA commented on HBASE-5166:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515764/5166-v9.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -134 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 153 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1024//console

This message is automatically generated.
                

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214342#comment-13214342 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------



bq.  On 2012-02-23 04:32:03, Michael Stack wrote:
bq.  > This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.  Would be nice in things like PE putting up more load.
bq.  
bq.  Jai Singh wrote:
bq.      This works fine. I've tested it in the usecase  I mentioned on jira HBASE-5166.

So works nicely for your crawling then?  Mind writing a sweet release note for this?  I'll go commit it.


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------




                


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215432#comment-13215432 ] 

stack commented on HBASE-5166:
------------------------------

@Jai It's not you.  Those are known failing tests.  Let me commit.
                


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212555#comment-13212555 ] 

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

Submitted a new patch against the current trunk in svn.

Thanks
                


[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jai Kumar Singh updated HBASE-5166:
-----------------------------------

    Attachment: 0005-HBASE-5166-Added-MultithreadedTableMapper.patch

Added Thread.sleep(), the license header, and a test case.
                


[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jai Kumar Singh updated HBASE-5166:
-----------------------------------

    Attachment: 0003-Added-MultithreadedTableMapper-HBASE-5166.patch

Modified patch
                


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201433#comment-13201433 ] 

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

Any comments?
                


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213418#comment-13213418 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

(Updated 2012-02-22 07:20:13.121177)


Review request for hbase, Ted Yu and Michael Stack.


Summary
-------

There is currently no MultithreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow because the job is network-IO bound and does not fully utilize the CPU.

Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.


Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3995/diff


Testing
-------


Thanks,

Jai


                


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213367#comment-13213367 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

(Updated 2012-02-22 06:00:23.473596)


Review request for hbase and Michael Stack.


Summary
-------

There is currently no MultithreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal TableMapper is quite slow because the job is network-IO bound and does not fully utilize the CPU.

Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.


This addresses bug HBASE-5166.
    https://issues.apache.org/jira/browse/HBASE-5166


Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3995/diff


Testing
-------


Thanks,

Jai


                


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213793#comment-13213793 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5268
-----------------------------------------------------------


Quite a few white spaces need to be removed.


/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11536>

    Should read 'MultithreadedTableMapper instances'



/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11508>

    Leave a space between while and (
    Another space between ) and {



/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11537>

    Can we give better progress information here?



/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11535>

    Long line, please wrap to 80 chars.



/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11534>

    This if block can be an else to the if block above.



/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java
<https://reviews.apache.org/r/3995/#comment11533>

    Please remove white space.


- Ted




                


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213240#comment-13213240 ] 

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

@Zhihong Yu: I have submitted the patch for review with the suggested changes.
The "Sub" prefix is taken from Hadoop, and I am following the same naming. They are called SubMapRecordReader/Writer because each is an intermediate RecordReader/Writer for the mapper threads, and each eventually uses the RecordReader/Writer passed to the MapReduce job to do the actual reads and writes.

Thanks,

PS: I tried adding "Zhihong" to the reviewer list on the review page, but Review Board kept failing, so I added stack as the reviewer instead. Please do review.
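
The "Sub" reader/writer arrangement described above can be sketched in plain Java. This is only an illustrative sketch, not the code from the patch: SharedReader, SharedWriter, SubReader, and SubReaderDemo are hypothetical stand-ins rather than Hadoop or HBase classes, and the upper-casing step merely stands in for the real per-record work (such as fetching a URL). Each worker thread owns a SubReader that pulls its next record from the single shared reader under a lock, so no record is handed to two threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for the single reader that the job passes to the mapper. */
class SharedReader {
    private final Iterator<String> source;

    SharedReader(List<String> records) {
        this.source = records.iterator();
    }

    /** Returns the next record, or null when exhausted; synchronized so each record is handed out once. */
    synchronized String next() {
        return source.hasNext() ? source.next() : null;
    }
}

/** Hypothetical stand-in for the single writer that the job passes to the mapper. */
class SharedWriter {
    private final List<String> out = new ArrayList<>();

    synchronized void write(String record) {
        out.add(record);
    }

    synchronized List<String> snapshot() {
        return new ArrayList<>(out);
    }
}

/** Per-thread intermediate reader: keeps its own current record, delegates reads to the shared reader. */
class SubReader {
    private final SharedReader outer;
    private String current;

    SubReader(SharedReader outer) {
        this.outer = outer;
    }

    boolean nextKeyValue() {
        current = outer.next();
        return current != null;
    }

    String getCurrentValue() {
        return current;
    }
}

class SubReaderDemo {
    /** Processes every input record on one of the given number of threads and returns the sorted output. */
    static List<String> run(List<String> input, int threads) {
        SharedReader reader = new SharedReader(input);
        SharedWriter writer = new SharedWriter();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                SubReader sub = new SubReader(reader);
                while (sub.nextKeyValue()) {
                    // Placeholder for the real, IO-bound map work.
                    writer.write(sub.getCurrentValue().toUpperCase(Locale.ROOT));
                }
            });
            workers.add(t);
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> result = writer.snapshot();
        Collections.sort(result); // thread write order is not deterministic
        return result;
    }
}
```

The output is sorted before it is returned only because the order in which threads write is not deterministic; the real writer would simply forward each record.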

                


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214283#comment-13214283 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------



bq.  On 2012-02-22 17:53:12, Ted Yu wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 114
bq.  > <https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line114>
bq.  >
bq.  >     Should read 'MultithreadedTableMapper instances'

done!


bq.  On 2012-02-22 17:53:12, Ted Yu wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 155
bq.  > <https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line155>
bq.  >
bq.  >     Can we give better progress information here ?

I am not sure how to do this. It would be possible if I could access the underlying RecordReader/Writer passed to the job context and simply call their getProgress. Could anybody help me here?
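
One possible direction for the progress question above, sketched under assumptions rather than taken from the patch: if the total number of input records were known up front (an assumption; a table scan may not know it), a shared counter could report the fraction consumed across all mapper threads. ProgressTracker is a hypothetical helper, not a Hadoop or HBase class.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: tracks how many records the mapper threads have consumed so far. */
class ProgressTracker {
    private final long total;
    private final AtomicLong consumed = new AtomicLong();

    ProgressTracker(long total) {
        if (total < 0) {
            throw new IllegalArgumentException("total must be non-negative");
        }
        this.total = total;
    }

    /** Called by any worker thread each time it takes a record from the shared reader. */
    void recordConsumed() {
        consumed.incrementAndGet();
    }

    /** Fraction of records consumed, clamped to the range [0, 1]. */
    float getProgress() {
        if (total == 0) {
            return 1.0f; // nothing to do, so report complete
        }
        return Math.min(1.0f, (float) consumed.get() / total);
    }
}
```

An AtomicLong is used so that worker threads can update the count without taking the same lock that guards the shared reader.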


bq.  On 2012-02-22 17:53:12, Ted Yu wrote:
bq.  > /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java, line 223
bq.  > <https://reviews.apache.org/r/3995/diff/2/?file=78620#file78620line223>
bq.  >
bq.  >     This if block can be an else to the if block above.

done


- Jai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5268
-----------------------------------------------------------




                


[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5166:
-------------------------

    Attachment: 5166-v9.txt

Same as 0008.  Uploading again to rerun hadoopqa.  Shouldn't be failing that many tests with this patch.
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is currently no MultithreadedTableMapper in HBase analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
> Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
> Running this kind of HBase MapReduce job with the normal TableMapper is quite slow because the job is network-IO bound and does not fully utilize the CPU.
> Moreover, I would like to know whether it is a good or bad idea to use HBase for this kind of use case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214288#comment-13214288 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

(Updated 2012-02-23 04:22:51.078969)


Review request for hbase, Ted Yu and Michael Stack.


Changes
-------

Removed stray whitespace.


Summary
-------

HBase currently has no MultithreadedTableMapper analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal table mapper is quite slow because the job is network-IO bound and the CPU is not fully utilized.

Also, I would like to know whether HBase is a good or bad fit for this kind of use case.


Diffs (updated)
-----

  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3995/diff


Testing
-------


Thanks,

Jai


                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216319#comment-13216319 ] 

Hudson commented on HBASE-5166:
-------------------------------

Integrated in HBase-TRUNK-security #122 (See [https://builds.apache.org/job/HBase-TRUNK-security/122/])
    HBASE-5166 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop (Revision 1293098)

     Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java

                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>             Fix For: 0.94.0
>
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213417#comment-13213417 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

(Updated 2012-02-22 07:18:48.273758)


Review request for hbase and Michael Stack.


Changes
-------

Removing bugid HBASE-5166


Summary
-------

HBase currently has no MultithreadedTableMapper analogous to the MultithreadedMapper that Hadoop provides for IO-bound jobs.
Use case, web crawler: take input (URLs) from an HBase table and put the content (URLs, content) back into HBase.
Running this kind of HBase MapReduce job with the normal table mapper is quite slow because the job is network-IO bound and the CPU is not fully utilized.

Also, I would like to know whether HBase is a good or bad fit for this kind of use case.


Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3995/diff


Testing
-------


Thanks,

Jai


                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jai Kumar Singh updated HBASE-5166:
-----------------------------------

    Attachment: 0008-HBASE-5166-Added-MultithreadedTableMapper.patch

Uploading the latest patch.
Thanks.
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187094#comment-13187094 ] 

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

Hi stack,  
   Thanks for the comment. I've modified the patch accordingly.
   The executor is now created with Executors.newFixedThreadPool(numberOfThreads).
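
   For readers following the thread, the fixed-size pool idea can be sketched roughly as below. This is only an illustration under assumptions, not the code from the attached patch: the class name FixedPoolSketch, the method runAll, and the sleeping task body are hypothetical stand-ins. The only piece taken from this comment is the call to Executors.newFixedThreadPool(numberOfThreads).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the class from the HBASE-5166 patch.
public class FixedPoolSketch {

    // Runs `tasks` simulated IO-bound jobs on a pool of `numberOfThreads`
    // workers and returns how many of them completed.
    static int runAll(int numberOfThreads, int tasks) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                Callable<Integer> job = () -> {
                    Thread.sleep(10); // stand-in for a blocking network fetch
                    return 1;
                };
                results.add(executor.submit(job));
            }
            int done = 0;
            for (Future<Integer> f : results) {
                done += f.get(); // blocks until that task has finished
            }
            return done;
        } finally {
            executor.shutdown(); // stop accepting work; submitted tasks have already finished
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(4, 20));
    }
}
```

   A fixed pool caps the number of concurrent fetches at numberOfThreads, so a single map task cannot spawn an unbounded number of threads while it waits on the network.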

-- JK 

                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213421#comment-13213421 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------



bq.  On 2012-02-22 05:26:10, Ted Yu wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java, line 64
bq.  > <https://reviews.apache.org/r/3995/diff/2/?file=78619#file78619line64>
bq.  >
bq.  >     "hbase.mapreduce." prefix should be kept.
bq.  >     Would "hbase.mapreduce.multithreadedmapper.class" be a good name ?

Okay!
I guess then the prefix should be "hbase.mapreduce.multithreadedtablemapper".

  public static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedtablemapper.threads";
  public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedtablemapper.mapclass";
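
As a rough illustration of how a job might read these two keys, here is a minimal sketch. It uses java.util.Properties as a stand-in for the Hadoop Configuration object so that it runs on its own. The class name MapperConfigSketch, the helper methods, the fallback of 10 threads, and the mapper class name com.example.CrawlMapper are all assumptions made for the example; only the two key strings come from the lines above.

```java
import java.util.Properties;

// Hypothetical sketch; Properties stands in for a Hadoop Configuration.
public class MapperConfigSketch {
    // Key names as proposed in this review reply.
    public static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedtablemapper.threads";
    public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedtablemapper.mapclass";

    // Assumed fallback for this sketch only; the thread does not state the patch's default.
    static final int ASSUMED_DEFAULT_THREADS = 10;

    // Returns the configured thread count, or the assumed default when the key is unset.
    static int getNumberOfThreads(Properties conf) {
        String raw = conf.getProperty(NUMBER_OF_THREADS);
        return raw == null ? ASSUMED_DEFAULT_THREADS : Integer.parseInt(raw.trim());
    }

    // Returns the configured mapper class name, or null when the key is unset.
    static String getMapperClassName(Properties conf) {
        return conf.getProperty(MAPPER_CLASS);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(NUMBER_OF_THREADS, "8");
        conf.setProperty(MAPPER_CLASS, "com.example.CrawlMapper"); // hypothetical mapper class
        System.out.println(getNumberOfThreads(conf) + " threads, mapper " + getMapperClassName(conf));
    }
}
```

Keeping both keys under one "hbase.mapreduce.multithreadedtablemapper." prefix makes them easy to find and avoids clashing with the keys Hadoop's own MultithreadedMapper uses.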
  


- Jai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5266
-----------------------------------------------------------




                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215443#comment-13215443 ] 

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

thanks stack, ted ;-) 
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>             Fix For: 0.94.0
>
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215428#comment-13215428 ] 

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

@stack, ted: any idea why it's failing these tests?
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214397#comment-13214397 ] 

Hadoop QA commented on HBASE-5166:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515712/0008-HBASE-5166-Added-MultithreadedTableMapper.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -134 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 153 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.replication.TestReplicationPeer
                  org.apache.hadoop.hbase.replication.TestReplication
                  org.apache.hadoop.hbase.TestDrainingServer
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1020//console

This message is automatically generated.
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214361#comment-13214361 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------



bq.  On 2012-02-23 04:32:03, Michael Stack wrote:
bq.  > This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.  Would be nice in things like PE putting up more load.
bq.  
bq.  Jai Singh wrote:
bq.      This works fine. I've tested it in the usecase  I mentioned on jira HBASE-5166.
bq.  
bq.  Michael Stack wrote:
bq.      So works nicely for your crawling then?  Mind writing a sweet release note for this?  I'll go commit it.
bq.  
bq.  Michael Stack wrote:
bq.      Oh, mind uploading the final version of the patch to the issue itself then we can run hadoopqa on the patch and make sure it plays well w/ rest of hbase (should be fine given its standalone).  Thanks Jai.

Yes, it works great for the web crawling scenario.

"MultiThreadedTableMapper for [N/W] IO bound jobs"

Updated the patch on jira.

Thanks


- Jai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------




                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

        

[jira] [Assigned] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned HBASE-5166:
-----------------------------------------

    Assignee: Jai Kumar Singh

Added Jai as a contributor and assigned the Jira.
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Assignee: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>             Fix For: 0.94.0
>
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>

[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jai Kumar Singh updated HBASE-5166:
-----------------------------------

    Attachment: 0006-HBASE-5166-Added-MultithreadedTableMapper.patch

Patch created against current trunk.
Also moved the test case into a separate file.
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212547#comment-13212547 ] 

Zhihong Yu commented on HBASE-5166:
-----------------------------------

Can you use --no-prefix to generate a new patch for Hadoop QA?

Thanks 
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185963#comment-13185963 ] 

stack commented on HBASE-5166:
------------------------------

bq. Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.

Looks grand to me (as does the network/IO-bound justification in your use case).  Would be a nice contrib.  I'd like it so I can use it to put load on HBase; currently I have to run a ridiculous number of concurrent mappers to generate load with a tool like PerformanceEvaluation, which runs a single client doing serial load per map task.

A few comments on the patch.

No need for this line:

{code}
+ * Copyright 2007 The Apache Software Foundation
{code}

In our code base, we indent with two spaces rather than hard tabs (your file has hard tabs).

Fix the name of this config:

{code}
+				getInt("mapred.map.multithreadedrunner.threads", 10);
{code}

Ditto for the setter.

Don't you want to use an executor, created with something like Guava's utilities, to run the threads?  (See the HBase code base for examples.)
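
To illustrate the executor suggestion, here is a minimal sketch that uses only java.util.concurrent. The class name MapperPoolSketch, its method names, and the "mapper" thread-name prefix are hypothetical and are not taken from the attached patch. The hand-written ThreadFactory stands in for what a Guava helper such as ThreadFactoryBuilder could provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the code from the attached patch. */
final class MapperPoolSketch {

  /** Names threads "prefix-N" and marks them as daemon threads. */
  static ThreadFactory namedDaemonFactory(final String prefix) {
    final AtomicInteger counter = new AtomicInteger();
    return r -> {
      Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
      t.setDaemon(true);
      return t;
    };
  }

  /** Runs the given number of stand-in "map" tasks on a fixed pool; returns how many completed. */
  static int runAll(int threads, int tasks) {
    ExecutorService pool =
        Executors.newFixedThreadPool(threads, namedDaemonFactory("mapper"));
    final AtomicInteger done = new AtomicInteger();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < tasks; i++) {
        futures.add(pool.submit(() -> {
          done.incrementAndGet(); // stand-in for one map() invocation
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // surfaces a worker failure as ExecutionException
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown(); // no new tasks; workers exit once idle
    }
    return done.get();
  }
}
```

Compared with starting raw threads, the pool gives each worker a recognizable name in thread dumps and lets failures propagate through the returned Future objects.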


                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214284#comment-13214284 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/
-----------------------------------------------------------

(Updated 2012-02-23 04:17:08.702062)


Review request for hbase, Ted Yu and Michael Stack.


Changes
-------

Changes as suggested in review.


Summary
-------

There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).

Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.


Diffs (updated)
-----

  /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 

Diff: https://reviews.apache.org/r/3995/diff


Testing
-------


Thanks,

Jai


                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5166:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.94.0
     Release Note: New MultiThreadedTableMapper facility
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks for the patch Jai.  Nice one.
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>             Fix For: 0.94.0
>
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Jai Kumar Singh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212499#comment-13212499 ] 

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------

@Zhihong Yu, 1) The Apache License header was there earlier, but I removed it because stack suggested so. Anyway, I'll put it back.
2) I've added Thread.sleep(1000). I am not sure whether we want to limit the wait duration; wouldn't that depend on the kind of job we are running?
3) I've modified the TableMapper test case in src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java.
At first I was going to create a new test case file for MultithreadedTableMapper, but that would mean too much code repetition.
So I added a numOfThreads argument to TestTableMapReduce's runTestOnTable function and called the function twice. Check the patch for more details.
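
On the question in point 2 of whether to limit the wait: one possible shape, sketched here under assumptions, is to let the caller pass an upper bound and poll until either the workers finish or the bound is reached. BoundedWaitSketch and its method names are hypothetical. The sketch assumes the workers run on an ExecutorService, which may not match the patch under review.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a caller-configurable wait; not the code from the patch. */
final class BoundedWaitSketch {

  /**
   * Polls every pollMillis until the pool drains or maxWaitMillis elapses.
   * Returns true if all tasks finished; otherwise interrupts stragglers and returns false.
   */
  static boolean shutdownAndWait(ExecutorService pool, long pollMillis, long maxWaitMillis) {
    pool.shutdown(); // stop accepting new tasks; queued ones still run
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    try {
      while (!pool.awaitTermination(pollMillis, TimeUnit.MILLISECONDS)) {
        if (System.nanoTime() - deadline >= 0) {
          pool.shutdownNow(); // give up and interrupt whatever is still running
          return false;
        }
      }
      return true;
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Runs one task that sleeps taskMillis, then waits at most maxWaitMillis for it. */
  static boolean demo(long taskMillis, long maxWaitMillis) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.submit(() -> {
      try {
        Thread.sleep(taskMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit promptly when cancelled
      }
    });
    return shutdownAndWait(pool, 20, maxWaitMillis);
  }
}
```

Because the bound is a parameter, a job with long-running map calls can pass a large value while a test can pass a small one, which addresses the concern that the right limit depends on the kind of job.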
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214338#comment-13214338 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------



bq.  On 2012-02-23 04:32:03, Michael Stack wrote:
bq.  > This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.  Would be nice in things like PE putting up more load.

This works fine. I've tested it with the use case I mentioned on JIRA HBASE-5166.


- Jai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------


On 2012-02-23 04:22:51, Jai Singh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3995/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-23 04:22:51)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
bq.  UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
bq.  Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
bq.  
bq.  Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3995/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jai
bq.  
bq.


                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212758#comment-13212758 ] 

Hadoop QA commented on HBASE-5166:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515348/0006-HBASE-5166-Added-MultithreadedTableMapper.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -134 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestAtomicOperation
                  org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/998//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/998//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/998//console

This message is automatically generated.
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213139#comment-13213139 ] 

Zhihong Yu commented on HBASE-5166:
-----------------------------------

@Jai:
{code}
+ * Copyright 2007 The Apache Software Foundation
{code}
Year is not needed in license header. Same here:
{code}
+ * Copyright 2009 The Apache Software Foundation
{code}
{code}
+  public void testAddDependencyJars() throws Exception {
{code}
The above doesn't carry a @Test annotation. If it is not needed for this JIRA, please remove it.
{code}
+  public static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedrunner.class";
{code}
I think the name of the config parameter should be changed to 'multithreadedmapper.class'.
Same for NUMBER_OF_THREADS.
{code}
+  private class SubMapRecordReader extends RecordReader<ImmutableBytesWritable, Result> {
{code}
Why do we need the Sub prefix above?

Putting the patch on https://reviews.apache.org would make the review process smoother.
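
As a rough illustration of the renaming suggested above, the sketch below pairs the two configuration keys with a setter and getter. java.util.Properties stands in for Hadoop's Configuration so the example runs on its own. The class name MapperConfigSketch and the ".threads" key suffix are assumptions, and the key names finally committed may differ. The default of 10 mirrors the getInt(...) call quoted earlier in this thread.

```java
import java.util.Properties;

/** Hypothetical sketch; Properties stands in for Hadoop's Configuration. */
final class MapperConfigSketch {
  // Key names follow the review suggestion ("multithreadedmapper" rather than
  // "multithreadedrunner"); the ".threads" suffix is an assumption.
  static final String MAPPER_CLASS = "hbase.mapreduce.multithreadedmapper.class";
  static final String NUMBER_OF_THREADS = "hbase.mapreduce.multithreadedmapper.threads";
  static final int DEFAULT_THREADS = 10; // default seen in the quoted getInt(...) call

  /** Stores the thread count under NUMBER_OF_THREADS. */
  static void setNumberOfThreads(Properties conf, int threads) {
    conf.setProperty(NUMBER_OF_THREADS, Integer.toString(threads));
  }

  /** Reads the thread count, falling back to DEFAULT_THREADS when unset. */
  static int getNumberOfThreads(Properties conf) {
    String value = conf.getProperty(NUMBER_OF_THREADS);
    return value == null ? DEFAULT_THREADS : Integer.parseInt(value);
  }
}
```

Keeping the key strings in named constants means the getter and the setter cannot drift apart when a key is renamed during review.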
                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5166:
-------------------------

    Status: Open  (was: Patch Available)
    
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch, 0008-HBASE-5166-Added-MultithreadedTableMapper.patch, 5166-v9.txt
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 


[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214291#comment-13214291 ] 

jiraposter@reviews.apache.org commented on HBASE-5166:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3995/#review5302
-----------------------------------------------------------

Ship it!


This looks great.  Does it work?  Have you tried it?  +1 on commit if it works.  Would be nice in things like PE putting up more load.

- Michael


On 2012-02-23 04:22:51, Jai Singh wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3995/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-23 04:22:51)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
bq.  UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
bq.  Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
bq.  
bq.  Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?.
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3995/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jai
bq.  
bq.


                
> MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
> ----------------------------------------------------------------------
>
>                 Key: HBASE-5166
>                 URL: https://issues.apache.org/jira/browse/HBASE-5166
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jai Kumar Singh
>            Priority: Minor
>              Labels: multithreaded, tablemapper
>         Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. 
> UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. 
> Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound).
> Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. 
