Posted to issues@hive.apache.org by "Mithun Radhakrishnan (Jira)" <ji...@apache.org> on 2020/01/24 17:45:00 UTC
[jira] [Comment Edited] (HIVE-22771) Partition location incorrectly
formed in FileOutputCommitterContainer
[ https://issues.apache.org/jira/browse/HIVE-22771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023137#comment-17023137 ]
Mithun Radhakrishnan edited comment on HIVE-22771 at 1/24/20 5:44 PM:
----------------------------------------------------------------------
Hmm... This looks like a good catch to me. It looks like the static-partitioning case was missed/mishandled before.
The dynamic-partitioning case seems to be handled better, with a more greedy regex:
[https://github.com/apache/hive/blob/706c1d44f7734ed36900a02c7255c1d0ce0ad45a/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java#L867]
[~shivam-mohan], you might want to post the patch here, for {{master}} as well, and have CI run tests. The fix seems good to me, on the face of it.
> Partition location incorrectly formed in FileOutputCommitterContainer
> ---------------------------------------------------------------------
>
> Key: HIVE-22771
> URL: https://issues.apache.org/jira/browse/HIVE-22771
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 1.2.1
> Reporter: Shivam
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In class _HCatOutputFormat_ (package _org.apache.hive.hcatalog.mapreduce_), the function _setOutput_ generates _idHash_ with the statement below:
> +In file org/apache/hive/hcatalog/mapreduce/HCatOutputFormat.java+
> *line 116: idHash = String.valueOf(Math.random());*
> The value of idHash can come out in scientific notation, for example: 7.145347157239135E-4
>
> Class _FileOutputCommitterContainer_ in package _org.apache.hive.hcatalog.mapreduce_ uses the statements below to compute the final partition path:
> +*In org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java*+
> *line 366: String finalLocn = jobLocation.replaceAll(Path.SEPARATOR + SCRATCH_DIR_NAME + "\\d\\.?\\d+","");*
> *line 367: partPath = new Path(finalLocn);*
>
> The regex used here is incorrect: it matches only a plain decimal number (a digit, an optional '.', then more digits) after *SCRATCH_DIR_NAME*, so an exponent suffix is not removed and 'E-4' (for the above example) is left in the final partition location.
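
The behaviour described in the issue can be reproduced outside Hive with plain java.lang.String calls. The following is a minimal sketch, not the actual patch. The class name, the "/" separator, the "_SCRATCH" value standing in for SCRATCH_DIR_NAME, the example job location layout, and the exponent-aware pattern are all assumptions made for illustration; only the first suffix pattern is taken from line 366 as quoted above.

```java
class ScratchDirRegexDemo {
    // Stand-ins for Path.SEPARATOR and SCRATCH_DIR_NAME (assumed values, not read from Hive).
    static final String SEPARATOR = "/";
    static final String SCRATCH_DIR_NAME = "_SCRATCH";

    // Suffix pattern as quoted in the issue (line 366).
    static final String ORIGINAL_SUFFIX = "\\d\\.?\\d+";

    // Hypothetical variant that also consumes an optional exponent such as "E-4".
    // This is an illustration only, not the fix submitted for HIVE-22771.
    static final String EXPONENT_AWARE_SUFFIX = "\\d\\.?\\d+(E-?\\d+)?";

    // Removes the scratch directory component from a job location string.
    static String stripScratchDir(String jobLocation, String suffixRegex) {
        return jobLocation.replaceAll(SEPARATOR + SCRATCH_DIR_NAME + suffixRegex, "");
    }

    public static void main(String[] args) {
        // idHash value from the issue; the surrounding path layout is assumed.
        String idHash = "7.145347157239135E-4";
        String jobLocation = "/warehouse/tbl/_SCRATCH" + idHash + "/ds=2020-01-24";

        // The original pattern stops at 'E', so "E-4" stays in the path.
        System.out.println(stripScratchDir(jobLocation, ORIGINAL_SUFFIX));
        // prints /warehouse/tblE-4/ds=2020-01-24

        // The exponent-aware variant removes the whole scratch component.
        System.out.println(stripScratchDir(jobLocation, EXPONENT_AWARE_SUFFIX));
        // prints /warehouse/tbl/ds=2020-01-24
    }
}
```

With an idHash that has no exponent (for example 0.7145347157239135), both patterns give the same result, which would explain why the problem only appears for the occasional very small value returned by Math.random().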
--
This message was sent by Atlassian Jira
(v8.3.4#803005)