You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Clint Heath (Created) (JIRA)" <ji...@apache.org> on 2012/04/06 22:13:22 UTC

[jira] [Created] (HBASE-5741) ImportTsv does not check for table existence

ImportTsv does not check for table existence 
---------------------------------------------

                 Key: HBASE-5741
                 URL: https://issues.apache.org/jira/browse/HBASE-5741
             Project: HBase
          Issue Type: Bug
          Components: mapreduce
    Affects Versions: 0.90.4
            Reporter: Clint Heath


The usage statement for the "importtsv" command to hbase claims this:

"Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)

The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.

This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:

(From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)

305 HTable table = new HTable(conf, tableName);

The HTable method signature in use there assumes the table exists and runs a meta scan on it:

(From org.apache.hadoop.hbase.client.HTable.java)

142 * Creates an object to access a HBase table.
...
151 public HTable(Configuration conf, final String tableName)

What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:

(Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)

690 boolean tableExists = this.doesTableExist(tableName);
691 if (!tableExists) this.createTable(tableName,dirPath);

Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:

12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
...


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256107#comment-13256107 ] 

Zhihong Yu commented on HBASE-5741:
-----------------------------------

Integrated to trunk.

Thanks for the patch Himanshu.

Will wait for 0.94.0 to come out before integrating to 0.94
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257714#comment-13257714 ] 

Hudson commented on HBASE-5741:
-------------------------------

Integrated in HBase-0.94-security #17 (See [https://builds.apache.org/job/HBase-0.94-security/17/])
    HBASE-5741 ImportTsv does not check for table existence (Himanshu) (Revision 1328032)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java

                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251756#comment-13251756 ] 

jiraposter@reviews.apache.org commented on HBASE-5741:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4700/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

There is a bulk output option in the importtsv workload. It outputs the HFiles in a user defined directory. The current code assumes that a table with its name equal to the given output directory exists, and throws an exception otherwise. Here is a patch for creating a table in case it doesn't exist.


This addresses bug HBase-5741.
    https://issues.apache.org/jira/browse/HBase-5741


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java ab22fc4 
  src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java ac30a62 

Diff: https://reviews.apache.org/r/4700/diff


Testing
-------

Added a new test for bulkoutput; All importtsv tests pass.


Thanks,

Himanshu


                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5741:
------------------------------

    Attachment:     (was: 5741-v3.txt)
    
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252038#comment-13252038 ] 

Zhihong Yu commented on HBASE-5741:
-----------------------------------

I would integrate patches to trunk and 0.94 tomorrow morning if there is no objection.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Himanshu Vashishtha (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-5741:
---------------------------------------

    Attachment: HBase-5741.patch

A patch for checking/adding non-existant table in hbase in case we are using bulkoutput option. Added a test in the TestImportTsv. Please comment.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256083#comment-13256083 ] 

Lars Hofhansl commented on HBASE-5741:
--------------------------------------

Thanks Clint. No harm, I just think it will hide complexity that should not be hidden. I also don't think it's too much to ask a user to create the table first (but we should definitely fix the Javadoc in that case).

That all said, I am +-0 on this. It's a simple change, and v3 looks good.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257659#comment-13257659 ] 

Hudson commented on HBASE-5741:
-------------------------------

Integrated in HBase-0.94 #133 (See [https://builds.apache.org/job/HBase-0.94/133/])
    HBASE-5741 ImportTsv does not check for table existence (Himanshu) (Revision 1328032)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java

                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252064#comment-13252064 ] 

Lars Hofhansl commented on HBASE-5741:
--------------------------------------

Sorry for chiming in late here, but do we actually want ImportTsv to create the table for us?
Personally I'd rather have it fail, which forces me to think about how I want create the table (pre split, compression, etc).

                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5741:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

This was committed a while back... Marking it accordingly.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256231#comment-13256231 ] 

Hudson commented on HBASE-5741:
-------------------------------

Integrated in HBase-TRUNK-security #174 (See [https://builds.apache.org/job/HBase-TRUNK-security/174/])
    HBASE-5741 ImportTsv does not check for table existence (Himanshu) (Revision 1327338)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java

                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Himanshu Vashishtha (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249443#comment-13249443 ] 

Himanshu Vashishtha commented on HBASE-5741:
--------------------------------------------

btw, by "making an attempt to read all the regions of this table and then use the start keys of these regions", I mean get the start rows for all the existing regions of the table and then use them for job configuration.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251889#comment-13251889 ] 

Hadoop QA commented on HBASE-5741:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522299/HBase-5741-v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1483//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1483//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1483//console

This message is automatically generated.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Himanshu Vashishtha (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252087#comment-13252087 ] 

Himanshu Vashishtha commented on HBASE-5741:
--------------------------------------------

As Clint mentioned, there are some workloads that use ImportTsv to create HFiles, which later on are used in the bulk import method : my motivation for the patch.

What you suggest Lars, failing the importtsv, or adding a similar thing in the Import tool too? I think later makes it consistent.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5741:
------------------------------

    Fix Version/s: 0.96.0
                   0.94.0
    
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252036#comment-13252036 ] 

Hadoop QA commented on HBASE-5741:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522330/5741-v3.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1485//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1485//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1485//console

This message is automatically generated.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Himanshu Vashishtha (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha reassigned HBASE-5741:
------------------------------------------

    Assignee: Himanshu Vashishtha
    
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252095#comment-13252095 ] 

Lars Hofhansl commented on HBASE-5741:
--------------------------------------

Even the created HFileOutputFormat use the table in HBase to find out about splits. We will rarely bulk import into an empty table that is not pre-split...? (I'm assuming here, you folks at Cloudera see many more use cases that I do)

If you believe strongly that we should add this, let's do it :)
We can do Import in a separate patch, or not do that as we see fit.

                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Himanshu Vashishtha (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250813#comment-13250813 ] 

Himanshu Vashishtha commented on HBASE-5741:
--------------------------------------------

You are right. I am on it now. Thanks.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251778#comment-13251778 ] 

jiraposter@reviews.apache.org commented on HBASE-5741:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4700/#review6857
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
<https://reviews.apache.org/r/4700/#comment15254>

    I think the new method should be called doesTableExist() or tableExists()



src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
<https://reviews.apache.org/r/4700/#comment15256>

    Insert space between for and (.



src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
<https://reviews.apache.org/r/4700/#comment15257>

    Insert space between if and (.



src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
<https://reviews.apache.org/r/4700/#comment15255>

    Insert one leading space before cfSet



src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
<https://reviews.apache.org/r/4700/#comment15258>

    Can this method be package private ?
    Since HBaseAdmin is created inside, a better name would be createHBaseAdmin().



src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java
<https://reviews.apache.org/r/4700/#comment15259>

    Insert space between if and (.



src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java
<https://reviews.apache.org/r/4700/#comment15260>

    Insert space between else and {.


- Ted


On 2012-04-11 17:16:59, Himanshu Vashishtha wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4700/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-11 17:16:59)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  There is a bulk output option in the importtsv workload. It outputs the HFiles in a user defined directory. The current code assumes that a table with its name equal to the given output directory exists, and throws an exception otherwise. Here is a patch for creating a table in case it doesn't exist.
bq.  
bq.  
bq.  This addresses bug HBase-5741.
bq.      https://issues.apache.org/jira/browse/HBase-5741
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java ab22fc4 
bq.    src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java ac30a62 
bq.  
bq.  Diff: https://reviews.apache.org/r/4700/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added a new test for bulkoutput; All importtsv tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Himanshu
bq.  
bq.


                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5741:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522338/5741-94.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1486//console

This message is automatically generated.)
    
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252025#comment-13252025 ] 

Hadoop QA commented on HBASE-5741:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522338/5741-94.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1486//console

This message is automatically generated.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257579#comment-13257579 ] 

Zhihong Yu commented on HBASE-5741:
-----------------------------------

Integrated to 0.94 as well.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252076#comment-13252076 ] 

Lars Hofhansl commented on HBASE-5741:
--------------------------------------

Import does not do that either. At least we should be consistent between the various importing tools.
It seems overkill to burden these tools with that.

That said, I am not opposed, just questioning the usefulness.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252081#comment-13252081 ] 

Hadoop QA commented on HBASE-5741:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522343/5741-v3.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1487//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1487//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1487//console

This message is automatically generated.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252095#comment-13252095 ] 

Lars Hofhansl edited comment on HBASE-5741 at 4/12/12 12:33 AM:
----------------------------------------------------------------

Even HFileOutputFormat uses the table in HBase to find out about splits. We will rarely bulk import into an empty table that is not pre-split...? (I'm assuming here, you folks at Cloudera see many more use cases that I do)

If you believe strongly that we should add this, let's do it :)
We can do Import in a separate patch, or not do that as we see fit.

                
      was (Author: lhofhansl):
    Even the created HFileOutputFormat use the table in HBase to find out about splits. We will rarely bulk import into an empty table that is not pre-split...? (I'm assuming here, you folks at Cloudera see many more use cases that I do)

If you believe strongly that we should add this, let's do it :)
We can do Import in a separate patch, or not do that as we see fit.

                  
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252071#comment-13252071 ] 

Zhihong Yu commented on HBASE-5741:
-----------------------------------

How about adding an option to ImportTsv, default to false, allowing user to auto-create the table ?
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Himanshu Vashishtha (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha updated HBASE-5741:
---------------------------------------

    Attachment: HBase-5741-v2.patch

Patch version 2 after rb.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Clint Heath (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249966#comment-13249966 ] 

Clint Heath commented on HBASE-5741:
------------------------------------

Thanks for the feedback.  We I have customers who are using the workflow you mentioned.  This is why I discovered the confusing javadoc reference.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Himanshu Vashishtha (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251758#comment-13251758 ] 

Himanshu Vashishtha commented on HBASE-5741:
--------------------------------------------

Uploaded the patch on rb; 
https://reviews.apache.org/r/4700/
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5741:
------------------------------

    Hadoop Flags: Reviewed
          Status: Patch Available  (was: Open)
    
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5741:
------------------------------

    Attachment: 5741-v3.txt

Patch v3 addresses latest review comments.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Closed] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl closed HBASE-5741.
--------------------------------

    
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.1, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741.patch, HBase-5741-v2.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251851#comment-13251851 ] 

jiraposter@reviews.apache.org commented on HBASE-5741:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4700/
-----------------------------------------------------------

(Updated 2012-04-11 19:01:39.660787)


Review request for hbase.


Changes
-------

Changes done as per Ted.


Summary
-------

There is a bulk output option in the importtsv workload. It outputs the HFiles in a user defined directory. The current code assumes that a table with its name equal to the given output directory exists, and throws an exception otherwise. Here is a patch for creating a table in case it doesn't exist.


This addresses bug HBase-5741.
    https://issues.apache.org/jira/browse/HBase-5741


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java ab22fc4 
  src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java ac30a62 

Diff: https://reviews.apache.org/r/4700/diff


Testing
-------

Added a new test for bulkoutput; All importtsv tests pass.


Thanks,

Himanshu


                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5741:
------------------------------

    Attachment: 5741-94.txt

Patch for 0.94 where TestImportTsv passed.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251896#comment-13251896 ] 

jiraposter@reviews.apache.org commented on HBASE-5741:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4700/#review6860
-----------------------------------------------------------

Ship it!


Please attach new patch to JIRA after addressing minor comments.


src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
<https://reviews.apache.org/r/4700/#comment15276>

    The parentheses around hbaseAdmin.tableExists() are not needed.



src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
<https://reviews.apache.org/r/4700/#comment15277>

    Move this to line 263.



src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
<https://reviews.apache.org/r/4700/#comment15278>

    'bothered about' -> 'concerned with'


- Ted


On 2012-04-11 19:01:39, Himanshu Vashishtha wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4700/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-11 19:01:39)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  There is a bulk output option in the importtsv workload. It outputs the HFiles in a user defined directory. The current code assumes that a table with its name equal to the given output directory exists, and throws an exception otherwise. Here is a patch for creating a table in case it doesn't exist.
bq.  
bq.  
bq.  This addresses bug HBase-5741.
bq.      https://issues.apache.org/jira/browse/HBase-5741
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java ab22fc4 
bq.    src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java ac30a62 
bq.  
bq.  Diff: https://reviews.apache.org/r/4700/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added a new test for bulkoutput; All importtsv tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Himanshu
bq.  
bq.


                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>         Attachments: HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Himanshu Vashishtha (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253637#comment-13253637 ] 

Himanshu Vashishtha commented on HBASE-5741:
--------------------------------------------

I am waiting for Clint to give me some numbers of such use cases to make the cut; (Apparently, he is on vacation these days and will be back this Monday)
Thanks.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5741:
------------------------------

    Attachment: 5741-v3.txt
    
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Clint Heath (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255995#comment-13255995 ] 

Clint Heath commented on HBASE-5741:
------------------------------------

Lars, et al, thank you for your questions and comments.  I may be missing something, so please correct me if I'm wrong, but here's how I see the situation:

1) importtsv will auto-split the input data in region-server-sized Hfiles, based on hbase.hregion.max.filesize (because we use the HFileOutputFormat and it checks this parameter from the client configs).

2) completebulkload then distributes the resulting HFiles to the region servers, thereby populating your cluster with all the regions needed to host your data.  Alleviating all the memstore flushes, compactions, splitting, and other overhead associated with running billions of individual put()s to incrementally load your data.  This is why we recommend the bulk load process to initially populate HBase tables.

3) pre-splitting your table (based on rowkeys, as is the only mechanism) is useless in this scenario, because it will not take into account the size of data associated with each key.  The customer would literally have to walk through their data and determine how many Gigs of data are associated with each rowkey in order to preemptively split their tables in a bulk load scenario...and even then, HBase only takes those pre-splits as "suggestions" and if the data ends up needing to be split differently, that will happen automatically. So what purpose does pre-splitting serve in a bulk load?  Think "initial load" of massive amounts of data.

4) Lars had a good point about compression, yes, typically a customer *should* set up their table with compression first, but they typically don't know this and they can do it after the fact without consequence.

In conclusion, we provide this handy command-line tool to help our HBase customers (most of whom are complete newbies and don't know what they're doing) get their HBase tables up and running, yet we give them conflicting information about how to use the tool.  If we need to tell them to pre-create the table, then we should tell them so...clearly.  However, I think it is just simpler and more intuitive if the tool did that automatically, as the javadocs indicate it will.  Since region splits are determined internally by the HFileOutputFormat anyway (if they use the bulkload option), what's the harm?  This enables people to get their data loaded into HBase with the least amount of education, ramp-up, java development, data mining, etc., etc.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Himanshu Vashishtha (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249442#comment-13249442 ] 

Himanshu Vashishtha commented on HBASE-5741:
--------------------------------------------

Yes, the javadoc says it half correct when it says that "the table must exist in HBase if you are not using the option importtsv.bulk.output".
The behavior, in absence of the above property is to do inserts into the provided table; so it throws an exception if the table doesn't exist.

When we use this option, importtsv tries to configure job by making an attempt to read all the regions of this table and then use the start keys of these regions as partition-markers for the TotalOrderPartitioner class. This usage eliminates the first start-row (which is always a EMPTY_BYTE_ARRAY), so even if we create a new table, it will be useless as for using it for configuring a job. 

For the case with LoadIncrementalHFile, the destination of the directory containing the hFiles is assumed to be in a specific format:
<tableName>/<cf1>/<listOfHFiles>
           /<cf2>/<listOfHFiles>

 --where we are storing the hfiles for a specific column family in a separate sub-directory.

If we do create a table based on the parameters given to the importtsv command, it will not be useful for the case of bulkload usecase as in the importtsv job, we dump hfiles based on rows; so all coulmn famlies for a specific row lands in one hfile. 

It would be great to know if we actually use this workflow: create HFiles from importtsv job, and then use bulkload to insert those HFiles in HBase table.

I think we should change the javadoc.

Please let me know if you have any questions; hbase-mapreduce use cases are exciting.
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5741) ImportTsv does not check for table existence

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5741:
---------------------------------

    Fix Version/s:     (was: 0.94.0)
                   0.94.1

Moving to 0.94.1, since we're still discussing (again, sorry about the late chime in)
                
> ImportTsv does not check for table existence 
> ---------------------------------------------
>
>                 Key: HBASE-5741
>                 URL: https://issues.apache.org/jira/browse/HBASE-5741
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Clint Heath
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch
>
>
> The usage statement for the "importtsv" command to hbase claims this:
> "Note: if you do not use this option, then the target table must already exist in HBase" (in reference to the "importtsv.bulk.output" command-line option)
> The truth is, the table must exist no matter what, importtsv cannot and will not create it for you.
> This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it:
> (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
> 305 HTable table = new HTable(conf, tableName);
> The HTable method signature in use there assumes the table exists and runs a meta scan on it:
> (From org.apache.hadoop.hbase.client.HTable.java)
> 142 * Creates an object to access a HBase table.
> ...
> 151 public HTable(Configuration conf, final String tableName)
> What we should do inside of createSubmittableJob is something similar to what the "completebulkloads" command would do:
> (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
> 690 boolean tableExists = this.doesTableExist(tableName);
> 691 if (!tableExists) this.createTable(tableName,dirPath);
> Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception:
> 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: 
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99999999999999
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira