You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by daniel voros <da...@gmail.com> on 2017/12/05 09:25:32 UTC

Review Request 64333: Incremental import to HBase deletes only last version of column

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64333/
-----------------------------------------------------------

Review request for Sqoop.


Bugs: SQOOP-3267
    https://issues.apache.org/jira/browse/SQOOP-3267


Repository: sqoop-trunk


Description
-------

Deletes are supported since SQOOP-3149, but we're only deleting the last version of a column when the corresponding cell was set to NULL in the source table.

This can lead to unexpected and misleading results if the row has been transferred multiple times, which can easily happen if it's being modified on the source side.

Also SQOOP-3149 is using a new Put command for every column instead of a single Put per row as before. This could probably lead to a performance drop for wide tables (for which HBase is otherwise usually recommended).


Diffs
-----

  src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 20bf1b96c369613672725e171fe9dc0469feb294 


Diff: https://reviews.apache.org/r/64333/diff/1/


Testing
-------


Thanks,

daniel voros


Re: Review Request 64333: Incremental import to HBase deletes only last version of column

Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64333/#review198111
-----------------------------------------------------------


Fix it, then Ship it!




Hi Dani,

Thank you for fixing my findings, I have double checked everything, the tests are running fine, I have found one small typo, please fix it and I can commit it.


src/docs/man/hbase-args.txt
Lines 40 (patched)
<https://reviews.apache.org/r/64333/#comment278220>

    typo: colums


- Szabolcs Vasas


On Dec. 5, 2017, 9:25 a.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64333/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2017, 9:25 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3267
>     https://issues.apache.org/jira/browse/SQOOP-3267
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Deletes are supported since SQOOP-3149, but we're only deleting the last version of a column when the corresponding cell was set to NULL in the source table.
> 
> This can lead to unexpected and misleading results if the row has been transferred multiple times, which can easily happen if it's being modified on the source side.
> 
> Also SQOOP-3149 is using a new Put command for every column instead of a single Put per row as before. This could probably lead to a performance drop for wide tables (for which HBase is otherwise usually recommended).
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hbase-args.txt afd5c5b 
>   src/docs/user/hbase-args.txt 53040f5 
>   src/docs/user/hbase.txt ab4aedc 
>   src/java/org/apache/sqoop/SqoopOptions.java 73d0757 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 27d6006 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 0bd6169 
>   src/java/org/apache/sqoop/mapreduce/HBaseImportJob.java 33da487 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java ce21918 
>   src/test/org/apache/sqoop/TestSqoopOptions.java 16901ca 
>   src/test/org/apache/sqoop/hbase/HBaseImportTest.java 2e73cf3 
>   src/test/org/apache/sqoop/hbase/HBaseTestCase.java 98f8698 
> 
> 
> Diff: https://reviews.apache.org/r/64333/diff/4/
> 
> 
> Testing
> -------
> 
> - all unit tests and thirdparty tests passed
>  - splitted previous test for incremental import into two, to test both modes (ignore, delete)
>  - tested on a cluster with HBase (with saved jobs as well)
> 
> 
> Thanks,
> 
> daniel voros
> 
>


Re: Review Request 64333: Incremental import to HBase deletes only last version of column

Posted by Attila Szabo <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64333/#review197746
-----------------------------------------------------------


Fix it, then Ship it!




Hi,

The changes in the HBaseImportTest are not clean enough yet (please apply the requested cleanup), but otherwise LGTM!


src/test/org/apache/sqoop/hbase/HBaseImportTest.java
Lines 122-155 (patched)
<https://reviews.apache.org/r/64333/#comment278004>

    This testcase is a bit hard to read for me.
    Nontheless it resues several local variables with different content.
    It also does not provide enough information for the first look what is happening, what is expected, and why (why vals are set like this, what does updateTable do, why do we expect that we do after runImport).
    
    Could you please clean this up a little bit?


- Attila Szabo


On Dec. 5, 2017, 9:25 a.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64333/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2017, 9:25 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3267
>     https://issues.apache.org/jira/browse/SQOOP-3267
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Deletes are supported since SQOOP-3149, but we're only deleting the last version of a column when the corresponding cell was set to NULL in the source table.
> 
> This can lead to unexpected and misleading results if the row has been transferred multiple times, which can easily happen if it's being modified on the source side.
> 
> Also SQOOP-3149 is using a new Put command for every column instead of a single Put per row as before. This could probably lead to a performance drop for wide tables (for which HBase is otherwise usually recommended).
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hbase-args.txt afd5c5b 
>   src/docs/user/hbase-args.txt 53040f5 
>   src/docs/user/hbase.txt ab4aedc 
>   src/java/org/apache/sqoop/SqoopOptions.java 73d0757 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 27d6006 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 0bd6169 
>   src/java/org/apache/sqoop/mapreduce/HBaseImportJob.java 33da487 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java ce21918 
>   src/test/org/apache/sqoop/TestSqoopOptions.java 16901ca 
>   src/test/org/apache/sqoop/hbase/HBaseImportTest.java 2e73cf3 
>   src/test/org/apache/sqoop/hbase/HBaseTestCase.java 98f8698 
> 
> 
> Diff: https://reviews.apache.org/r/64333/diff/2/
> 
> 
> Testing
> -------
> 
> - all unit tests and thirdparty tests passed
>  - splitted previous test for incremental import into two, to test both modes (ignore, delete)
>  - tested on a cluster with HBase (with saved jobs as well)
> 
> 
> Thanks,
> 
> daniel voros
> 
>


Re: Review Request 64333: Incremental import to HBase deletes only last version of column

Posted by Szabolcs Vasas <va...@gmail.com>.

> On Feb. 20, 2018, 5:16 p.m., Szabolcs Vasas wrote:
> > Hi Dani,
> > 
> > Thank you for the patch, it is also great that you have documented it. I have left only a minor comment.
> > 
> > Szabolcs

Sorry, there is one more small thing. I got the following warning when I applied the patch:
SQOOP-3267.3.patch:13: new blank line at EOF.
    +
    warning: 1 line adds whitespace errors.
    
It seems you have added extra new lines at the end of hbase-args.txt. It would be great, if you could fix this one too.
Thanks!


- Szabolcs


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64333/#review197784
-----------------------------------------------------------


On Dec. 5, 2017, 9:25 a.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64333/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2017, 9:25 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3267
>     https://issues.apache.org/jira/browse/SQOOP-3267
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Deletes are supported since SQOOP-3149, but we're only deleting the last version of a column when the corresponding cell was set to NULL in the source table.
> 
> This can lead to unexpected and misleading results if the row has been transferred multiple times, which can easily happen if it's being modified on the source side.
> 
> Also SQOOP-3149 is using a new Put command for every column instead of a single Put per row as before. This could probably lead to a performance drop for wide tables (for which HBase is otherwise usually recommended).
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hbase-args.txt afd5c5b 
>   src/docs/user/hbase-args.txt 53040f5 
>   src/docs/user/hbase.txt ab4aedc 
>   src/java/org/apache/sqoop/SqoopOptions.java 73d0757 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 27d6006 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 0bd6169 
>   src/java/org/apache/sqoop/mapreduce/HBaseImportJob.java 33da487 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java ce21918 
>   src/test/org/apache/sqoop/TestSqoopOptions.java 16901ca 
>   src/test/org/apache/sqoop/hbase/HBaseImportTest.java 2e73cf3 
>   src/test/org/apache/sqoop/hbase/HBaseTestCase.java 98f8698 
> 
> 
> Diff: https://reviews.apache.org/r/64333/diff/3/
> 
> 
> Testing
> -------
> 
> - all unit tests and thirdparty tests passed
>  - splitted previous test for incremental import into two, to test both modes (ignore, delete)
>  - tested on a cluster with HBase (with saved jobs as well)
> 
> 
> Thanks,
> 
> daniel voros
> 
>


Re: Review Request 64333: Incremental import to HBase deletes only last version of column

Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64333/#review197784
-----------------------------------------------------------



Hi Dani,

Thank you for the patch, it is also great that you have documented it. I have left only a minor comment.

Szabolcs


src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java
Line 202 (original), 210 (patched)
<https://reviews.apache.org/r/64333/#comment278066>

    Can you please add a break statement here? I know it does not matter in this case but if we add a new value to the enum it might cause an issue.


- Szabolcs Vasas


On Dec. 5, 2017, 9:25 a.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64333/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2017, 9:25 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3267
>     https://issues.apache.org/jira/browse/SQOOP-3267
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Deletes are supported since SQOOP-3149, but we're only deleting the last version of a column when the corresponding cell was set to NULL in the source table.
> 
> This can lead to unexpected and misleading results if the row has been transferred multiple times, which can easily happen if it's being modified on the source side.
> 
> Also SQOOP-3149 is using a new Put command for every column instead of a single Put per row as before. This could probably lead to a performance drop for wide tables (for which HBase is otherwise usually recommended).
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hbase-args.txt afd5c5b 
>   src/docs/user/hbase-args.txt 53040f5 
>   src/docs/user/hbase.txt ab4aedc 
>   src/java/org/apache/sqoop/SqoopOptions.java 73d0757 
>   src/java/org/apache/sqoop/hbase/HBasePutProcessor.java 27d6006 
>   src/java/org/apache/sqoop/hbase/ToStringPutTransformer.java 0bd6169 
>   src/java/org/apache/sqoop/mapreduce/HBaseImportJob.java 33da487 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java ce21918 
>   src/test/org/apache/sqoop/TestSqoopOptions.java 16901ca 
>   src/test/org/apache/sqoop/hbase/HBaseImportTest.java 2e73cf3 
>   src/test/org/apache/sqoop/hbase/HBaseTestCase.java 98f8698 
> 
> 
> Diff: https://reviews.apache.org/r/64333/diff/3/
> 
> 
> Testing
> -------
> 
> - all unit tests and thirdparty tests passed
>  - splitted previous test for incremental import into two, to test both modes (ignore, delete)
>  - tested on a cluster with HBase (with saved jobs as well)
> 
> 
> Thanks,
> 
> daniel voros
> 
>