You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Jarek Cecho <ja...@apache.org> on 2013/10/01 21:49:20 UTC

Re: Review Request 14240: SQOOP-1032: Add the --bulk-load-dir option to support the HBase doBulkLoad function

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14240/#review26576
-----------------------------------------------------------


Thank you Alexandre for taking up this ticket!

Would you mind adding automated tests to ensure that the functionality is indeed working?


src/java/org/apache/sqoop/SqoopOptions.java
<https://reviews.apache.org/r/14240/#comment51816>

    Considering we already have output directory and we are using it as a temporary directory in case of a Hive import, what about converting this to boolean property and using the usual output directory for storing the data? This way we will be consistent with the rest of Sqoop.



src/java/org/apache/sqoop/SqoopOptions.java
<https://reviews.apache.org/r/14240/#comment51819>

    I would also recommend to make validations that this parameter is used only with hbase import job.



src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java
<https://reviews.apache.org/r/14240/#comment51820>

    This code seems to be to very high extent duplicated from ImportJobBase.runImport(). I'm wondering if we can just use the usual handlers to add the required functionality? Otherwise, I would much rather add new handlers if needed than copy and past the code.



src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java
<https://reviews.apache.org/r/14240/#comment51818>

    This seems to be very dangerous as at some point everyone can read the files, wouldn't be much secure to simply change the owner to the hbase user?


Jarcec

- Jarek Cecho


On Sept. 28, 2013, 12:35 a.m., Alexandre Normand wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14240/
> -----------------------------------------------------------
> 
> (Updated Sept. 28, 2013, 12:35 a.m.)
> 
> 
> Review request for Sqoop, Jarek Cecho and vasanthkumar.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This is the patch to address some of the comments on Zhancheng Deng's review request (https://reviews.apache.org/r/13052/) which was basically the change to add the --bulk-load-dir option to support the HBase doBulkLoad function.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/SqoopOptions.java 01805f9 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 9ceb5bd 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 5ccf311 
>   src/java/org/apache/sqoop/manager/SqlManager.java 2a4992d 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java ebb1857 
> 
> Diff: https://reviews.apache.org/r/14240/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexandre Normand
> 
>


Re: Review Request 14240: SQOOP-1032: Add the --bulk-load-dir option to support the HBase doBulkLoad function

Posted by Alexandre Normand <al...@gmail.com>.

> On Oct. 1, 2013, 7:49 p.m., Jarek Cecho wrote:
> > Thank you Alexandre for taking up this ticket!
> > 
> > Would you mind adding automated tests to ensure that the functionality is indeed working?
> 
> Alexandre Normand wrote:
>     I'd like to be able to integration test bulk load but this is still not a possibility in non-distributed mode. We currently have the same problems in some of our map/reduce jobs that use bulk loading. The way we've approached it with those is that we have two modes for each job: bulk load enabled/bulk load disabled. The integration tests always run with bulk load disabled. The "bulk load mode" is tested on a real cluster with automated tests that run from jenkins. 
>     
>     For what it's worth, I've done the manual testing on our cluster for this patch.
> 
> Jarek Cecho wrote:
>     Thank you for the feedback Alexandre! Do you happen to know if there is already a JIRA for adding the bulk load mode into the HBase MiniCluster? I tried to search for now, but didn't found it.

Yes, I think this captures the current blocker (although it's possible that other things in local mode would still need work once that's fixed): https://issues.apache.org/jira/browse/HBASE-7744.


- Alexandre


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14240/#review26576
-----------------------------------------------------------


On Oct. 4, 2013, 5:45 p.m., Alexandre Normand wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14240/
> -----------------------------------------------------------
> 
> (Updated Oct. 4, 2013, 5:45 p.m.)
> 
> 
> Review request for Sqoop, Jarek Cecho, Jean-Marc Spaggiari, and Vasanth kumar RJ.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This is the patch to address some of the comments on Zhancheng Deng's review request (https://reviews.apache.org/r/13052/) which was basically the change to add the --bulk-load-dir option to support the HBase doBulkLoad function.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/SqoopOptions.java 01805f9 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 9ceb5bd 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 5ccf311 
>   src/java/org/apache/sqoop/manager/SqlManager.java 2a4992d 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ab7f21e 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 0eca991 
>   src/test/com/cloudera/sqoop/TestSqoopOptions.java 03e2504 
> 
> Diff: https://reviews.apache.org/r/14240/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexandre Normand
> 
>


Re: Review Request 14240: SQOOP-1032: Add the --bulk-load-dir option to support the HBase doBulkLoad function

Posted by Jarek Cecho <ja...@apache.org>.

> On Oct. 1, 2013, 7:49 p.m., Jarek Cecho wrote:
> > Thank you Alexandre for taking up this ticket!
> > 
> > Would you mind adding automated tests to ensure that the functionality is indeed working?
> 
> Alexandre Normand wrote:
>     I'd like to be able to integration test bulk load but this is still not a possibility in non-distributed mode. We currently have the same problems in some of our map/reduce jobs that use bulk loading. The way we've approached it with those is that we have two modes for each job: bulk load enabled/bulk load disabled. The integration tests always run with bulk load disabled. The "bulk load mode" is tested on a real cluster with automated tests that run from jenkins. 
>     
>     For what it's worth, I've done the manual testing on our cluster for this patch.

Thank you for the feedback Alexandre! Do you happen to know if there is already a JIRA for adding the bulk load mode into the HBase MiniCluster? I tried to search for now, but didn't found it.


> On Oct. 1, 2013, 7:49 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java, lines 149-152
> > <https://reviews.apache.org/r/14240/diff/5/?file=359511#file359511line149>
> >
> >     This seems to be very dangerous as at some point everyone can read the files, wouldn't be much secure to simply change the owner to the hbase user?
> 
> Alexandre Normand wrote:
>     It would be more secure but it doesn't work. I just tried and I get:
>     Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Non-super user cannot change owner
>     
>     The actual alternative would be to use the new approach enabled by https://issues.apache.org/jira/browse/HBASE-5498 and described in https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load
>     
>     But sqoop can't assume this patch is available, right?

Yeah, we can't assume that the HBASE-5498 is available at this point. Fair enough then.


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14240/#review26576
-----------------------------------------------------------


On Oct. 4, 2013, 5:45 p.m., Alexandre Normand wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14240/
> -----------------------------------------------------------
> 
> (Updated Oct. 4, 2013, 5:45 p.m.)
> 
> 
> Review request for Sqoop, Jarek Cecho, Jean-Marc Spaggiari, and Vasanth kumar RJ.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This is the patch to address some of the comments on Zhancheng Deng's review request (https://reviews.apache.org/r/13052/) which was basically the change to add the --bulk-load-dir option to support the HBase doBulkLoad function.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/SqoopOptions.java 01805f9 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 9ceb5bd 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 5ccf311 
>   src/java/org/apache/sqoop/manager/SqlManager.java 2a4992d 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ab7f21e 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 0eca991 
>   src/test/com/cloudera/sqoop/TestSqoopOptions.java 03e2504 
> 
> Diff: https://reviews.apache.org/r/14240/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexandre Normand
> 
>


Re: Review Request 14240: SQOOP-1032: Add the --bulk-load-dir option to support the HBase doBulkLoad function

Posted by Alexandre Normand <al...@gmail.com>.

> On Oct. 1, 2013, 7:49 p.m., Jarek Cecho wrote:
> > Thank you Alexandre for taking up this ticket!
> > 
> > Would you mind adding automated tests to ensure that the functionality is indeed working?

I'd like to be able to integration test bulk load but this is still not a possibility in non-distributed mode. We currently have the same problems in some of our map/reduce jobs that use bulk loading. The way we've approached it with those is that we have two modes for each job: bulk load enabled/bulk load disabled. The integration tests always run with bulk load disabled. The "bulk load mode" is tested on a real cluster with automated tests that run from jenkins. 

For what it's worth, I've done the manual testing on our cluster for this patch. 


- Alexandre


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14240/#review26576
-----------------------------------------------------------


On Oct. 2, 2013, 9:21 p.m., Alexandre Normand wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14240/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2013, 9:21 p.m.)
> 
> 
> Review request for Sqoop, Jarek Cecho, Jean-Marc Spaggiari, and vasanthkumar.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This is the patch to address some of the comments on Zhancheng Deng's review request (https://reviews.apache.org/r/13052/) which was basically the change to add the --bulk-load-dir option to support the HBase doBulkLoad function.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/SqoopOptions.java 01805f9 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 9ceb5bd 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 5ccf311 
>   src/java/org/apache/sqoop/manager/SqlManager.java 2a4992d 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ab7f21e 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 0eca991 
> 
> Diff: https://reviews.apache.org/r/14240/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexandre Normand
> 
>


Re: Review Request 14240: SQOOP-1032: Add the --bulk-load-dir option to support the HBase doBulkLoad function

Posted by Alexandre Normand <al...@gmail.com>.

> On Oct. 1, 2013, 7:49 p.m., Jarek Cecho wrote:
> > Thank you Alexandre for taking up this ticket!
> > 
> > Would you mind adding automated tests to ensure that the functionality is indeed working?
> 
> Alexandre Normand wrote:
>     I'd like to be able to integration test bulk load but this is still not a possibility in non-distributed mode. We currently have the same problems in some of our map/reduce jobs that use bulk loading. The way we've approached it with those is that we have two modes for each job: bulk load enabled/bulk load disabled. The integration tests always run with bulk load disabled. The "bulk load mode" is tested on a real cluster with automated tests that run from jenkins. 
>     
>     For what it's worth, I've done the manual testing on our cluster for this patch.
> 
> Jarek Cecho wrote:
>     Thank you for the feedback Alexandre! Do you happen to know if there is already a JIRA for adding the bulk load mode into the HBase MiniCluster? I tried to search for now, but didn't found it.
> 
> Alexandre Normand wrote:
>     Yes, I think this captures the current blocker (although it's possible that other things in local mode would still need work once that's fixed): https://issues.apache.org/jira/browse/HBASE-7744.

You've probably seen the activity on HBASE-7744. I'm trying to get this fixed but do you think this needs to block this sqoop patch? 


- Alexandre


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14240/#review26576
-----------------------------------------------------------


On Oct. 4, 2013, 5:45 p.m., Alexandre Normand wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14240/
> -----------------------------------------------------------
> 
> (Updated Oct. 4, 2013, 5:45 p.m.)
> 
> 
> Review request for Sqoop, Jarek Cecho, Jean-Marc Spaggiari, and Vasanth kumar RJ.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This is the patch to address some of the comments on Zhancheng Deng's review request (https://reviews.apache.org/r/13052/) which was basically the change to add the --bulk-load-dir option to support the HBase doBulkLoad function.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/SqoopOptions.java 01805f9 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 9ceb5bd 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 5ccf311 
>   src/java/org/apache/sqoop/manager/SqlManager.java 2a4992d 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ab7f21e 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 0eca991 
>   src/test/com/cloudera/sqoop/TestSqoopOptions.java 03e2504 
> 
> Diff: https://reviews.apache.org/r/14240/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexandre Normand
> 
>


Re: Review Request 14240: SQOOP-1032: Add the --bulk-load-dir option to support the HBase doBulkLoad function

Posted by Jarek Cecho <ja...@apache.org>.

> On Oct. 1, 2013, 7:49 p.m., Jarek Cecho wrote:
> > Thank you Alexandre for taking up this ticket!
> > 
> > Would you mind adding automated tests to ensure that the functionality is indeed working?
> 
> Alexandre Normand wrote:
>     I'd like to be able to integration test bulk load but this is still not a possibility in non-distributed mode. We currently have the same problems in some of our map/reduce jobs that use bulk loading. The way we've approached it with those is that we have two modes for each job: bulk load enabled/bulk load disabled. The integration tests always run with bulk load disabled. The "bulk load mode" is tested on a real cluster with automated tests that run from jenkins. 
>     
>     For what it's worth, I've done the manual testing on our cluster for this patch.
> 
> Jarek Cecho wrote:
>     Thank you for the feedback Alexandre! Do you happen to know if there is already a JIRA for adding the bulk load mode into the HBase MiniCluster? I tried to search for now, but didn't found it.
> 
> Alexandre Normand wrote:
>     Yes, I think this captures the current blocker (although it's possible that other things in local mode would still need work once that's fixed): https://issues.apache.org/jira/browse/HBASE-7744.
> 
> Alexandre Normand wrote:
>     You've probably seen the activity on HBASE-7744. I'm trying to get this fixed but do you think this needs to block this sqoop patch?

I think that we should be able to move the tests into separate JIRA since we are blocked on the HBASE-7744. Thank you very much for following up there Alexandre, appreciated!


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14240/#review26576
-----------------------------------------------------------


On Oct. 4, 2013, 5:45 p.m., Alexandre Normand wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14240/
> -----------------------------------------------------------
> 
> (Updated Oct. 4, 2013, 5:45 p.m.)
> 
> 
> Review request for Sqoop, Jarek Cecho, Jean-Marc Spaggiari, and Vasanth kumar RJ.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This is the patch to address some of the comments on Zhancheng Deng's review request (https://reviews.apache.org/r/13052/) which was basically the change to add the --bulk-load-dir option to support the HBase doBulkLoad function.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/SqoopOptions.java 01805f9 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 9ceb5bd 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 5ccf311 
>   src/java/org/apache/sqoop/manager/SqlManager.java 2a4992d 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java ab7f21e 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 0eca991 
>   src/test/com/cloudera/sqoop/TestSqoopOptions.java 03e2504 
> 
> Diff: https://reviews.apache.org/r/14240/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexandre Normand
> 
>


Re: Review Request 14240: SQOOP-1032: Add the --bulk-load-dir option to support the HBase doBulkLoad function

Posted by Alexandre Normand <al...@gmail.com>.

> On Oct. 1, 2013, 7:49 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java, lines 149-152
> > <https://reviews.apache.org/r/14240/diff/5/?file=359511#file359511line149>
> >
> >     This seems to be very dangerous as at some point everyone can read the files, wouldn't be much secure to simply change the owner to the hbase user?

It would be more secure but it doesn't work. I just tried and I get:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Non-super user cannot change owner

The actual alternative would be to use the new approach enabled by https://issues.apache.org/jira/browse/HBASE-5498 and described in https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load

But sqoop can't assume this patch is available, right? 


- Alexandre


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14240/#review26576
-----------------------------------------------------------


On Oct. 1, 2013, 8:30 p.m., Alexandre Normand wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14240/
> -----------------------------------------------------------
> 
> (Updated Oct. 1, 2013, 8:30 p.m.)
> 
> 
> Review request for Sqoop, Jarek Cecho, Jean-Marc Spaggiari, and vasanthkumar.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This is the patch to address some of the comments on Zhancheng Deng's review request (https://reviews.apache.org/r/13052/) which was basically the change to add the --bulk-load-dir option to support the HBase doBulkLoad function.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/SqoopOptions.java 01805f9 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 9ceb5bd 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 5ccf311 
>   src/java/org/apache/sqoop/manager/SqlManager.java 2a4992d 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/HBaseBulkImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java ebb1857 
> 
> Diff: https://reviews.apache.org/r/14240/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexandre Normand
> 
>