Posted to user@phoenix.apache.org by "Cox, Jonathan A" <ja...@sandia.gov> on 2015/12/18 00:17:11 UTC

Help calling CsvBulkLoadTool from Java Method

I'm wondering if somebody can provide some guidance on how to use CsvBulkLoadTool from within a Java class, instead of via the command line as shown in the documentation. I'd like to determine whether CsvBulkLoadTool ran without throwing any exceptions. However, exceptions generated by org.apache.phoenix.* don't seem to propagate up to the calling method (ToolRunner.run), or at least they don't result in a non-zero return code.

Here is what I am doing:
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;
import org.apache.hadoop.util.ToolRunner;

CsvBulkLoadTool csvBulkLoader = new CsvBulkLoadTool();
int tret;
String[] args = {"-d", "\t", "--table", "MyTableName", "--input", "file://myfile.tsv"};

tret = ToolRunner.run(csvBulkLoader, args);

When I run this code, I get a return code (tret) of zero and no errors in the console output, yet the data is not loaded into HBase. When running the same command on the command line, I discovered that Phoenix can report various errors (e.g. wrong column datatype, permission error, and so on). But there doesn't seem to be a way for me to discover these errors either from exceptions thrown by CsvBulkLoadTool or from the return code.
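
In case it is useful, here is the fuller context in which I make the call (just a minimal sketch of my setup; the class name, configuration, and file path are placeholders), including the try/catch where I expected any Phoenix errors to show up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class BulkLoadRunner {
    public static void main(String[] ignored) {
        String[] args = {"-d", "\t", "--table", "MyTableName", "--input", "file://myfile.tsv"};
        try {
            // Pick up HBase/Phoenix settings (hbase-site.xml etc.) from the classpath
            Configuration conf = HBaseConfiguration.create();
            int tret = ToolRunner.run(conf, new CsvBulkLoadTool(), args);
            if (tret != 0) {
                System.err.println("Bulk load exited with code " + tret);
            }
        } catch (Exception e) {
            // This is where I expected Phoenix exceptions to land, but nothing is thrown
            e.printStackTrace();
        }
    }
}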

What's the best way to determine if CsvBulkLoadTool ran without error?

Thanks,
Jonathan

Re: Help calling CsvBulkLoadTool from Java Method

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Jonathan,

It looks like this is a bug that was relatively recently introduced in
the bulk load tool (i.e. that the exit status is not correctly
reported if the bulk load fails). I've logged this as a jira ticket:
https://issues.apache.org/jira/browse/PHOENIX-2538.

This means that for now, there doesn't appear to be an automated way
of determining whether the batch job succeeded or not; the manual way
is to inspect the log output.
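
If you need something automated in the meantime, one stopgap (just a
sketch on my part, assuming you know roughly how many rows the load
should produce and can reach the cluster through the Phoenix JDBC
driver) is to count the rows in the target table after the job
finishes and compare against the expected number:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LoadCheck {
    // Returns the current row count of the given Phoenix table.
    // The connection URL format is "jdbc:phoenix:<zookeeper quorum>".
    public static long countRows(String zkQuorum, String tableName) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:" + zkQuorum);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + tableName)) {
            rs.next();
            return rs.getLong(1);
        }
    }
}

It's not a substitute for a proper exit status, but it at least tells
you whether anything actually made it into the table.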

- Gabriel

