Posted to dev@sqoop.apache.org by "Guido Serra aka Zeph (JIRA)" <ji...@apache.org> on 2013/01/22 18:38:14 UTC

[jira] [Updated] (SQOOP-834) duplicate of data exporting to hbase

     [ https://issues.apache.org/jira/browse/SQOOP-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guido Serra aka Zeph updated SQOOP-834:
---------------------------------------

    Description: 
Calling the HBase Put.add() method on an unchanged (previously inserted) row/value 
duplicates the data: a new version of the cell is stored, and only its timestamp differs from the previous one.

{code}
hbase(main):030:0> get "dump_HKFAS.sales_order", "1", {COLUMN => "mysql:created_at", VERSIONS => 4}
COLUMN                             CELL                                                                                             
mysql:created_at                  timestamp=1358853505756, value=2011-12-21 18:07:38.0                                             
mysql:created_at                  timestamp=1358790515451, value=2011-12-21 18:07:38.0                                             
2 row(s) in 0.0040 seconds
{code}

Today's Sqoop run:
{code}
hbase(main):031:0> Date.new(1358853505756).toString()
=> "Tue Jan 22 11:18:25 UTC 2013"
{code}
Yesterday's Sqoop run:
{code}
hbase(main):032:0> Date.new(1358790515451).toString()
=> "Mon Jan 21 17:48:35 UTC 2013"
{code}

I verified that this is the intended behavior on the server side, according to HBASE-7645.

Instead, I would expect a rerun of Sqoop not to re-version every row of the tables in HBase, but to update only the fields that have changed.
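
To make the difference concrete, here is a minimal sketch of the two behaviors. It is a toy in-memory model of a single versioned cell, not the HBase client API and not Sqoop's actual code: the class VersionedCellSketch and the method putIfChanged are hypothetical names introduced only for illustration. It assumes the writer can read the newest stored value before deciding whether to write.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for one versioned cell (e.g. mysql:created_at of row "1").
// This is NOT the HBase API; it only mimics "every put adds a version".
class VersionedCellSketch {
    // timestamp -> value, newest timestamp first
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());

    // Unconditional write: a new version is stored even if the value is unchanged.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Hypothetical guard: write only when the newest stored value differs.
    // Returns true if a new version was actually written.
    boolean putIfChanged(long timestamp, String value) {
        Map.Entry<Long, String> latest = versions.firstEntry();
        if (latest != null && latest.getValue().equals(value)) {
            return false; // value unchanged, skip the write
        }
        versions.put(timestamp, value);
        return true;
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        String value = "2011-12-21 18:07:38.0";

        // Two runs writing the same value unconditionally (what was observed).
        VersionedCellSketch current = new VersionedCellSketch();
        current.put(1358790515451L, value);
        current.put(1358853505756L, value);
        System.out.println("unconditional put: " + current.versionCount() + " versions");

        // Two runs with the guard (what I would expect).
        VersionedCellSketch expected = new VersionedCellSketch();
        expected.putIfChanged(1358790515451L, value);
        expected.putIfChanged(1358853505756L, value);
        System.out.println("guarded put: " + expected.versionCount() + " versions");
    }
}
```

In this model the unconditional path ends up with two versions of the same value, while the guarded path keeps one. The guard costs one extra read per cell, so whether it is acceptable for a bulk import is an open question.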


  was:
calling the HBASE Put.add() statement on an unchanged (previously inserted) row/value will cause a data duplication (only the timestamp associated will be incremented)

I verified that this is the intended behavior on the server side, according to HBASE-7645

I'd expect instead that a rerun of SQOOP would not cause a reversioning of all rows in the tables in HBase, but just an update of the changed fields


    
> duplicate of data exporting to hbase
> ------------------------------------
>
>                 Key: SQOOP-834
>                 URL: https://issues.apache.org/jira/browse/SQOOP-834
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Guido Serra aka Zeph
>            Assignee: Guido Serra aka Zeph
