Posted to issues@hbase.apache.org by "Charles PORROT (JIRA)" <ji...@apache.org> on 2018/06/18 11:40:00 UTC

[jira] [Updated] (HBASE-20748) HBaseContext bulkLoad: being able to use custom versions

     [ https://issues.apache.org/jira/browse/HBASE-20748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles PORROT updated HBASE-20748:
-----------------------------------
    Description: 
The _bulkLoad_ methods of _class org.apache.hadoop.hbase.spark.HBaseContext_ use the system's current time for the version of the cells to bulk-load.

This makes this method, and its twin _bulkLoadThinRows_, useless if you need to use your own versioning system.

Thus, I propose a third _bulkLoad_ method, based on the original method. Instead of using an _Iterator(KeyFamilyQualifier, Array[Byte])_ as the basis for the writes, this new method would use an _Iterator(KeyFamilyQualifier, Array[Byte], Long)_, with the _Long_ being the version.

If a version is illogical (for instance, negative), the method would fall back to the current timestamp.
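
The fallback rule described above can be sketched roughly as follows. This is only an illustration, not the content of the attached file: _CellKey_ is a hypothetical stand-in for _KeyFamilyQualifier_ so that the snippet compiles without HBase on the classpath, _VersionFallback_, _resolveVersion_ and _withResolvedVersions_ are made-up names, and "illogical" is assumed to mean strictly negative, since that is the only example given.

```scala
// Hypothetical stand-in for org.apache.hadoop.hbase.spark.KeyFamilyQualifier,
// used so that this sketch has no HBase dependency.
final case class CellKey(rowKey: String, family: String, qualifier: String)

object VersionFallback {
  // Keeps the caller's version when it is usable; otherwise returns nowMs.
  // Assumption: only negative versions count as "illogical".
  def resolveVersion(version: Long, nowMs: Long): Long =
    if (version < 0L) nowMs else version

  // Takes (key, value, version) triples, as in the proposed iterator shape,
  // and returns the same triples with each version resolved.
  def withResolvedVersions(
      cells: Iterator[(CellKey, Array[Byte], Long)],
      nowMs: Long = System.currentTimeMillis()
  ): Iterator[(CellKey, Array[Byte], Long)] =
    cells.map { case (key, value, version) =>
      (key, value, resolveVersion(version, nowMs))
    }
}
```

The current time is passed in as a parameter so that every cell of a batch gets the same fallback timestamp and so that the rule can be checked with a fixed value.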

See the attached file for a proposal of this new _bulkLoad_ method.

  was:
The _bulkLoad_ methods of _class org.apache.hadoop.hbase.spark_ use the system's current time for the version of the cells to bulk-load.

This makes this method, and its twin _bulkLoadThinRows_, useless if you need to use your own versioning system.

Thus, I propose a third _bulkLoad_ method, based on the original method. Instead of using an _Iterator(KeyFamilyQualifier, Array[Byte])_ as the basis for the writes, this new method would use an _Iterator(KeyFamilyQualifier, Array[Byte], Long)_, with the _Long_ being the version.

If a version is illogical (for instance, negative), the method would fall back to the current timestamp.

See the attached file for a proposal of this new _bulkLoad_ method.


> HBaseContext bulkLoad: being able to use custom versions
> --------------------------------------------------------
>
>                 Key: HBASE-20748
>                 URL: https://issues.apache.org/jira/browse/HBASE-20748
>             Project: HBase
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Charles PORROT
>            Priority: Major
>              Labels: HBaseContext, bulkload, spark, versions
>         Attachments: bulkLoadCustomVersions.scala
>
>
> The _bulkLoad_ methods of _class org.apache.hadoop.hbase.spark.HBaseContext_ use the system's current time for the version of the cells to bulk-load.
> This makes this method, and its twin _bulkLoadThinRows_, useless if you need to use your own versioning system.
> Thus, I propose a third _bulkLoad_ method, based on the original method. Instead of using an _Iterator(KeyFamilyQualifier, Array[Byte])_ as the basis for the writes, this new method would use an _Iterator(KeyFamilyQualifier, Array[Byte], Long)_, with the _Long_ being the version.
> If a version is illogical (for instance, negative), the method would fall back to the current timestamp.
> See the attached file for a proposal of this new _bulkLoad_ method.


