Posted to dev@crunch.apache.org by "Micah Whitacre (JIRA)" <ji...@apache.org> on 2015/01/14 15:35:34 UTC

[jira] [Commented] (CRUNCH-475) Compilation problem caused by KeyValue -> Cell conversion

    [ https://issues.apache.org/jira/browse/CRUNCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276957#comment-14276957 ] 

Micah Whitacre commented on CRUNCH-475:
---------------------------------------

* Hadoop dependency goes from 2.2 -> 2.4, but should we just go to 2.5? (Not sure of passivity between the versions.)
* Can remove hbase.midfix because it looks to be used only for the compat dependencies.
* HFileTargetIT/HFileUtils use multi-imports, which, as Gabriel pointed out in another issue, our non-existent code conventions discourage.
* For the use cases where we are trying to pull the value out of the Cell, should we use CellUtil.cloneValue() instead of using the value array, length, and offset? (e.g. HFileTargetIT)
* Should add a comment about why in HBaseTypes.keyValueToBytes we still try to convert if an IOException is thrown.
* HFileTargetIT creates Cells using KeyValue, but I think you should be able to use CellUtil.createCell(...) instead of just depending on the KeyValue class.
* Implementation choice: in HFileInputFormat/HFileUtils you are doing a custom byte comparison of the row key/column family. There is a method on CellComparator for doing that (though its annotations claim it is private).
* HFileUtils.EXTRACT_ROW_FN can make use of CellUtil.cloneRow(...) instead of Arrays.copyOfRange(...).
* Can clean up this code a bit:
{code}
+    PCollection<Cell> kvs = puts.parallelDo("ConvertPutToKeyValue", new DoFn<Put, Cell>() {
       @Override
-      public void process(Put input, Emitter<KeyValue> emitter) {
+      public void process(Put input, Emitter<Cell> emitter) {
         for (List<KeyValue> keyValues : input.getFamilyMap().values()) {
           for (KeyValue keyValue : keyValues) {
             emitter.emit(keyValue);
           }
         }
       }
{code}
** Should rename the parallelDo to say Cell instead of KeyValue.
** Can use input.getFamilyCellMap() to get Cells instead of KeyValues.
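
For reference on the cloneValue()/cloneRow() suggestions above, here is a rough sketch of what those helpers save us from writing by hand. It does not use HBase at all, so it runs without HBase on the classpath: BackedCell and CellCopySketch are hypothetical stand-ins made up for this illustration, and the real CellUtil implementation may differ in its details. The idea is only that a Cell exposes a shared backing array plus offset/length pairs, and the clone helpers return a fresh copy of the relevant slice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for an HBase Cell (NOT the real interface): one
// shared backing array plus (offset, length) pairs for the row and the value.
final class BackedCell {
    final byte[] backing;
    final int rowOffset;
    final int rowLength;
    final int valueOffset;
    final int valueLength;

    BackedCell(byte[] backing, int rowOffset, int rowLength,
               int valueOffset, int valueLength) {
        this.backing = backing;
        this.rowOffset = rowOffset;
        this.rowLength = rowLength;
        this.valueOffset = valueOffset;
        this.valueLength = valueLength;
    }
}

// Hypothetical helpers mirroring the intent of CellUtil.cloneRow(...) and
// CellUtil.cloneValue(...): copy only the relevant slice of the backing array.
final class CellCopySketch {
    static byte[] cloneRow(BackedCell cell) {
        return Arrays.copyOfRange(cell.backing, cell.rowOffset,
                cell.rowOffset + cell.rowLength);
    }

    static byte[] cloneValue(BackedCell cell) {
        return Arrays.copyOfRange(cell.backing, cell.valueOffset,
                cell.valueOffset + cell.valueLength);
    }

    public static void main(String[] args) {
        // Row "row1" occupies bytes 0..3, value "value1" occupies bytes 4..9.
        byte[] backing = "row1value1".getBytes(StandardCharsets.UTF_8);
        BackedCell cell = new BackedCell(backing, 0, 4, 4, 6);
        System.out.println(new String(cloneRow(cell), StandardCharsets.UTF_8));
        System.out.println(new String(cloneValue(cell), StandardCharsets.UTF_8));
    }
}
```

The point of going through a helper is that call sites like HFileTargetIT and EXTRACT_ROW_FN no longer repeat the offset arithmetic, and the returned array is a copy, so later changes to it do not touch the backing buffer.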


> Compilation problem caused by KeyValue -> Cell conversion
> ---------------------------------------------------------
>
>                 Key: CRUNCH-475
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-475
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.12.0
>            Reporter: Lee Dongjin
>            Assignee: Josh Wills
>            Priority: Minor
>         Attachments: CRUNCH-475.patch, CRUNCH-475.patch
>
>
> As of HBase 0.99, using the KeyValue class for HBase I/O is deprecated, and in many APIs it has been replaced with the Cell interface[^1][^2][^3]. This change causes a compilation error with HBase 0.99, which is the first HBase version that supports Hadoop 2 only.
> Since this change will be permanent from HBase 1.0 onward, it would be better to fix it.
> [^1]: https://issues.apache.org/jira/browse/HBASE-11805
> [^2]: https://issues.apache.org/jira/browse/HBASE-9359
> [^3]: https://issues.apache.org/jira/browse/HBASE-10526



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)