Posted to issues@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2018/02/01 16:20:00 UTC

[jira] [Commented] (HIVE-18350) load data should rename files consistent with insert statements

    [ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348834#comment-16348834 ] 

Hive QA commented on HIVE-18350:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 30s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 39s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 19s{color} | {color:red} standalone-metastore: The patch generated 1 new + 429 unchanged - 1 fixed = 430 total (was 430) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 41s{color} | {color:red} ql: The patch generated 6 new + 186 unchanged - 2 fixed = 192 total (was 188) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 52s{color} | {color:red} root: The patch generated 9 new + 665 unchanged - 3 fixed = 674 total (was 668) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 10s{color} | {color:red} itests/hcatalog-unit: The patch generated 2 new + 20 unchanged - 0 fixed = 22 total (was 20) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 11s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 27s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 419593e |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8969/yetus/diff-checkstyle-standalone-metastore.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8969/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8969/yetus/diff-checkstyle-root.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8969/yetus/diff-checkstyle-itests_hcatalog-unit.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-8969/yetus/whitespace-eol.txt |
| modules | C: standalone-metastore ql hcatalog/core . itests/hcatalog-unit itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8969/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> load data should rename files consistent with insert statements
> ---------------------------------------------------------------
>
>                 Key: HIVE-18350
>                 URL: https://issues.apache.org/jira/browse/HIVE-18350
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>            Priority: Major
>         Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch
>
>
> Insert statements create files whose names end with 0000_0, 0001_0, etc. However, load data keeps the input file name. This results in an inconsistent naming convention, which makes SMB joins difficult in some scenarios and may cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they were named by the user.
> For a bucketed table, Hive relies on the user to name the files to match the bucket in non-strict mode. Hive assumes that all the data in a file belongs to the same bucket. In strict mode, loading into a bucketed table is disabled.
> This will likely affect most of the tests that load data, which is a significant change, so the work is divided into two subtasks for a smoother merge.
> For existing tables in a customer database, it is recommended to reload bucketed tables. Otherwise, if the customer runs an SMB join and there is a bucket for which there is no split, incorrect results are possible. However, this is not a regression, as it would happen even without the patch.
> With this patch and reloaded data, however, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in behavior and reloading data is not needed.
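
The naming rules in the description above can be illustrated with a minimal sketch. This is not the code from the HIVE-18350 patches. The function names (insert_style_name, plan_renames, bucket_id_from_name), the six-digit zero padding, the sorted assignment order, and the file-name pattern used for bucketed tables are all assumptions made for illustration only.

```python
import re
from typing import Dict, List, Optional


def insert_style_name(index: int, copy: int = 0) -> str:
    """Build an insert-style file name such as 000001_0.

    The six-digit padding is an assumption; the description only says
    that the names end with 0000_0, 0001_0, etc.
    """
    return f"{index:06d}_{copy}"


def plan_renames(input_files: List[str]) -> Dict[str, str]:
    """Non-bucketed case: every file is renamed, whatever the user called it.

    Files are numbered in sorted order (an assumption) so that the
    mapping is deterministic.
    """
    return {name: insert_style_name(i)
            for i, name in enumerate(sorted(input_files))}


# Hypothetical pattern for a user-supplied name that already encodes a bucket.
_BUCKET_NAME = re.compile(r"^(\d+)_\d+")


def bucket_id_from_name(file_name: str) -> Optional[int]:
    """Bucketed case (non-strict mode): the bucket is read from the file name.

    Returns None when the name does not follow the assumed pattern.
    """
    match = _BUCKET_NAME.match(file_name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(plan_renames(["sales_feb.csv", "sales_jan.csv"]))
    print(bucket_id_from_name("000003_0"))
```

In this sketch a bucketed load would need every input file to yield a bucket id, since, per the description, Hive assumes that all the data in one file belongs to a single bucket.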



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)