Posted to dev@crunch.apache.org by "Stephen Durfey (JIRA)" <ji...@apache.org> on 2017/12/07 16:00:01 UTC

[jira] [Updated] (CRUNCH-659) Upgrade to Hive 2.x

     [ https://issues.apache.org/jira/browse/CRUNCH-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Durfey updated CRUNCH-659:
----------------------------------
    Attachment: CRUNCH-659_v1.patch

Added a patch. It updates Hive to 2.1 and Hadoop to 2.6.0. Everything seems to build just fine after these two bumps. I think the logging differences will require both log4j and log4j2 properties files to be provided. The bridge dependency I mentioned routes log4j statements into the log4j2 logging system, so log4j2 is still necessary.
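
For illustration only, here is a rough sketch of what providing both properties files could look like, assuming plain console logging. The appender names, levels, and pattern are placeholders and are not taken from the patch.

```properties
# log4j.properties (log4j 1.x syntax); values are illustrative assumptions
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{1}: %m%n
```

```properties
# log4j2.properties (log4j 2.x properties syntax); values are illustrative assumptions
status = warn
appenders = console
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %-5p [%t] %c{1}: %m%n
rootLogger.level = info
rootLogger.appenderRefs = stdout
rootLogger.appenderRef.stdout.ref = STDOUT
```

Keeping the two patterns identical means output looks the same whichever logging system ends up handling a given statement.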

> Upgrade to Hive 2.x
> -------------------
>
>                 Key: CRUNCH-659
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-659
>             Project: Crunch
>          Issue Type: Task
>            Reporter: Stephen Durfey
>            Assignee: Stephen Durfey
>         Attachments: CRUNCH-659_v1.patch
>
>
> I've been working on CRUNCH-340 to finish implementing the HCatSource and HCatTarget. It seems to be in a better place now that Crunch only supports Hadoop 2. I was looking to target as high a version of Hive/HCat as possible with minimal impact on the code base and dependencies.
> Hive 2.3.1 is out now. It relies on Hadoop 2.7.2, but HBase doesn't move up to that version until HBase 2.x. Trying to run with Hadoop 2.7.2 causes test failures in crunch-hbase. I'm not sure whether that would cause runtime issues, because the minicluster wouldn't even start: a package name change in hadoop-hdfs (for the class StorageType) causes a class-not-found error.
> Hive 2.1.0 relies on Hadoop 2.6.0, which plays nicely with HBase 1.x. However, the class StructField (inside TupleObjectInspector for ORC files) has a new abstract method, introduced after 2.x of Hive, that would need to be implemented. Other than that, everything runs fine.
> Crunch is currently on Hive 0.13.1, so it's pretty far behind. I'm mainly looking for feedback on which version bumps should be targeted for my changes in CRUNCH-340. I wanted to take care of those first in a separate JIRA before introducing new code against a higher Hive version.
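
The StructField point in the description can be sketched as follows. This is a minimal, self-contained illustration using hypothetical stand-in types (`FieldV2`, `TupleField`, `StructFieldSketch`); it is not the real Hive interface or the Crunch class, and the method names are only modeled on them. The idea is that once an interface gains an abstract method, every existing implementer has to supply it before the code compiles against the newer version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a library interface after an upgrade.
// NOT the Hive API; names are assumptions made for this sketch.
interface FieldV2 {
    String getFieldName();

    // Assumed to be the newly added abstract method: implementers written
    // against the older interface fail to compile until they define it.
    int getFieldID();
}

// Hypothetical implementer, analogous to a field class nested in a
// tuple-style object inspector.
final class TupleField implements FieldV2 {
    private final int index;

    TupleField(int index) {
        this.index = index;
    }

    @Override
    public String getFieldName() {
        // Positional name derived from the field index.
        return "field" + index;
    }

    @Override
    public int getFieldID() {
        // For a positional tuple, the index is a natural choice of ID.
        return index;
    }
}

final class StructFieldSketch {
    // Builds one field per tuple position.
    static List<TupleField> fieldsFor(int arity) {
        List<TupleField> fields = new ArrayList<>();
        for (int i = 0; i < arity; i++) {
            fields.add(new TupleField(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        for (TupleField f : fieldsFor(3)) {
            System.out.println(f.getFieldName() + " -> " + f.getFieldID());
        }
    }
}
```

Returning the positional index is only one possible choice for the new method; whether it is the right one for the real class depends on how the caller uses the ID.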



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)