Posted to dev@hive.apache.org by "Patrick Hunt (JIRA)" <ji...@apache.org> on 2011/04/18 20:28:05 UTC

[jira] [Created] (HIVE-2117) insert overwrite ignoring partition location

insert overwrite ignoring partition location
--------------------------------------------

                 Key: HIVE-2117
                 URL: https://issues.apache.org/jira/browse/HIVE-2117
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Patrick Hunt
            Priority: Critical


The following code behaves differently in 0.5.0 and 0.7.0.

In 0.5.0 the partition location is respected.

However, in 0.7.0, although the partition is initially created with the specified location "<path>/parta", the "insert overwrite ..." results in the partition being written to "<path>/dt=a" (note that <path> is the same in both cases).

{code}
create table foo_stg (bar INT, car INT); 
load data local inpath 'data.txt' into table foo_stg;
 
create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION '/user/hive/warehouse/foo4'; 
alter table foo4 add partition (dt='a') location '/user/hive/warehouse/foo4/parta';
 
from foo_stg fs insert overwrite table foo4 partition (dt='a') select *;
{code}

From what I can tell, HIVE-1707 introduced this via a change to
org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, Map<String, String>, boolean, boolean)
specifically:

{code}
+      Path partPath = new Path(tbl.getDataLocation().getPath(),
+          Warehouse.makePartPath(partSpec));
+
+      Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath
+          .toUri().getAuthority(), partPath.toUri().getPath());
{code}
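To make the path arithmetic in this excerpt concrete, here is a small self-contained Java sketch. It uses java.net.URI instead of Hadoop's Path, and makePartPath below is a hypothetical, simplified stand-in for Warehouse.makePartPath (the real method also escapes special characters). The namenode address "nn:8020" and the load directory are made up for illustration. What it shows: the target directory is rebuilt from the table location plus "key=value" segments, so a location registered with "alter table ... add partition ... location" is never consulted.

{code:java}
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

final class PartPathSketch {
  // Hypothetical, simplified stand-in for Warehouse.makePartPath: joins
  // "key=value" pairs with '/'. No escaping of special characters is done.
  static String makePartPath(Map<String, String> partSpec) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : partSpec.entrySet()) {
      if (sb.length() > 0) {
        sb.append('/');
      }
      sb.append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }

  // Mirrors the excerpt: the path comes from the table location plus the
  // partition spec; only scheme and authority come from the load path.
  // The partition's own stored location is not an input, so it is ignored.
  static URI derivedPartPath(URI tableLocation, URI loadPath, Map<String, String> partSpec) {
    String path = tableLocation.getPath() + "/" + makePartPath(partSpec);
    return URI.create(loadPath.getScheme() + "://" + loadPath.getAuthority() + path);
  }

  public static void main(String[] args) {
    Map<String, String> spec = new LinkedHashMap<>();
    spec.put("dt", "a");
    URI table = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4");
    URI load = URI.create("hdfs://nn:8020/tmp/load"); // hypothetical staging dir
    // Prints hdfs://nn:8020/user/hive/warehouse/foo4/dt=a even though the
    // partition was registered at .../foo4/parta.
    System.out.println(derivedPartPath(table, load, spec));
  }
}
{code}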

Reading the description of HIVE-1707, it seems that this may have been done on purpose; however, given that a location is explicitly specified for the partition in question, it seems that it should be honored (especially given that the table location has not changed).

This difference in behavior is causing a regression in existing production Hive-based code. I'd like to take a stab at addressing this; any suggestions?



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038354#comment-13038354 ] 

Hudson commented on HIVE-2117:
------------------------------

Integrated in Hive-trunk-h0.21 #745 (See [https://builds.apache.org/hudson/job/Hive-trunk-h0.21/745/])
    HIVE-2117. Insert overwrite ignoring partition location (Patrick Hunt via cws)

cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1126726
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/TestLocationQueries.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/TestMTQueries.java
* /hive/trunk/ql/src/test/results/clientpositive/alter5.q.out
* /hive/trunk/ql/src/test/queries/clientpositive/alter5.q
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/BaseTestQueries.java


> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Query Processor
>    Affects Versions: 0.7.1, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt

[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037096#comment-13037096 ] 

Carl Steinbach commented on HIVE-2117:
--------------------------------------

+1. Will commit if tests pass.

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated HIVE-2117:
-------------------------------

             Priority: Blocker  (was: Critical)
    Affects Version/s: 0.8.0

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, data.txt

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated HIVE-2117:
-------------------------------

    Attachment: HIVE-2117_trunk.patch
                HIVE-2117_br07.patch

Updated patch files for branch 0.7 and trunk.

This fixes the problem. I've also added a new test that verifies the location used for the partition; I verified that it fails without my patch and passes with it applied.

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt

[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036914#comment-13036914 ] 

jiraposter@reviews.apache.org commented on HIVE-2117:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/773/
-----------------------------------------------------------

Review request for hive and Carl Steinbach.


Summary
-------

This change resolves a regression introduced by HIVE-1707, specifically that the partition location (set via alter table partition location) is not being respected.

I addressed this by using the user-specified location (as was done originally), except in the case of cross-filesystem moves (which was the concern in HIVE-1707).
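A minimal sketch of the decision described above, assuming the existing partition's location is known and that "same filesystem" can be approximated by comparing URI scheme and authority (the patch itself may compare Hadoop FileSystem instances instead). The class name, method names and addresses are hypothetical; this is not the committed code.

{code:java}
import java.net.URI;

final class PartLocationChooser {
  // Approximates "same filesystem" as same scheme and same authority
  // (host:port), compared case-insensitively.
  static boolean sameFileSystem(URI a, URI b) {
    return eq(a.getScheme(), b.getScheme()) && eq(a.getAuthority(), b.getAuthority());
  }

  private static boolean eq(String x, String y) {
    return x == null ? y == null : x.equalsIgnoreCase(y);
  }

  // existingPartLocation: location stored for the partition, or null if the
  // partition does not exist yet. partDir: e.g. "dt=a".
  static URI choosePartLocation(URI existingPartLocation, URI tableLocation,
                                String partDir, URI loadPath) {
    if (existingPartLocation != null && sameFileSystem(existingPartLocation, loadPath)) {
      // Same filesystem: honour the user-specified partition location.
      return existingPartLocation;
    }
    // New partition, or a cross-filesystem move (the HIVE-1707 case):
    // derive the location from the table path on the load path's filesystem.
    return URI.create(loadPath.getScheme() + "://" + loadPath.getAuthority()
        + tableLocation.getPath() + "/" + partDir);
  }

  public static void main(String[] args) {
    URI table = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4");
    URI parta = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4/parta");
    System.out.println(choosePartLocation(parta, table, "dt=a", URI.create("hdfs://nn:8020/tmp/load")));
    // hdfs://nn:8020/user/hive/warehouse/foo4/parta
    System.out.println(choosePartLocation(parta, table, "dt=a", URI.create("hdfs://nn2:8020/tmp/load")));
    // hdfs://nn2:8020/user/hive/warehouse/foo4/dt=a
    System.out.println(choosePartLocation(null, table, "dt=a", URI.create("hdfs://nn:8020/tmp/load")));
    // hdfs://nn:8020/user/hive/warehouse/foo4/dt=a
  }
}
{code}

Under this rule the repro in the issue writes to .../foo4/parta again, while data loaded from a different namenode still lands under the table directory, which keeps the cross-filesystem behavior HIVE-1707 was after.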


This addresses bug HIVE-2117.
    https://issues.apache.org/jira/browse/HIVE-2117


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java bcacd35 
  ql/src/test/org/apache/hadoop/hive/ql/BaseTestQueries.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 06a0447 
  ql/src/test/org/apache/hadoop/hive/ql/TestLocationQueries.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/TestMTQueries.java 8c7c0b8 
  ql/src/test/queries/clientpositive/alter5.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter5.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/773/diff


Testing
-------

I added a new test that verifies the partition location explicitly, since the existing tests ignore this detail. The test failed without my fix applied and passes with it.
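The kind of check such a test needs can be sketched as follows. This is a hypothetical helper, not the contents of TestLocationQueries; it compares only the path component of the reported location, so the assertion does not depend on the scheme or namenode address of the test cluster.

{code:java}
import java.net.URI;

final class PartitionLocationCheck {
  // True when the partition's reported location has exactly the expected
  // path; scheme and authority are deliberately ignored.
  static boolean honoursExplicitLocation(URI actualLocation, String expectedPath) {
    return actualLocation.getPath().equals(expectedPath);
  }

  public static void main(String[] args) {
    String expected = "/user/hive/warehouse/foo4/parta";
    URI regressed = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4/dt=a");
    URI fixed = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4/parta");
    System.out.println(honoursExplicitLocation(regressed, expected)); // false
    System.out.println(honoursExplicitLocation(fixed, expected));     // true
  }
}
{code}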


Thanks,

Patrick



> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt

[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036908#comment-13036908 ] 

jiraposter@reviews.apache.org commented on HIVE-2117:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/772/
-----------------------------------------------------------

Review request for hive and Carl Steinbach.


Summary
-------

This change resolves a regression introduced by HIVE-1707, specifically that the partition location (set via alter table partition location) is not being respected.

I addressed this by using the user-specified location (as was done originally), except in the case of cross-filesystem moves (which was the concern in HIVE-1707).


This addresses bug HIVE-2117.
    https://issues.apache.org/jira/browse/HIVE-2117


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 916b235 
  ql/src/test/org/apache/hadoop/hive/ql/BaseTestQueries.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 4685471 
  ql/src/test/org/apache/hadoop/hive/ql/TestLocationQueries.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/TestMTQueries.java 8c7c0b8 
  ql/src/test/queries/clientpositive/alter5.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter5.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/772/diff


Testing
-------

I added a new test that verifies the partition location explicitly, since the existing tests ignore this detail. The test failed without my fix applied and passes with it.


Thanks,

Patrick



> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2117:
---------------------------------

          Component/s: Query Processor
                       Metastore
    Affects Version/s:     (was: 0.7.0)
                       0.7.1

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Query Processor
>    Affects Versions: 0.7.1, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2117:
---------------------------------

    Affects Version/s:     (was: 0.7.1)

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Query Processor
>    Affects Versions: 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2117:
---------------------------------

    Fix Version/s: 0.7.1
                   0.8.0

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Query Processor
>    Affects Versions: 0.7.1, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 0.7.1, 0.8.0
>
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt

[jira] [Assigned] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned HIVE-2117:
----------------------------------

    Assignee: Patrick Hunt

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, data.txt
>
>
> The following code works differently in 0.5.0 vs 0.7.0.
> In 0.5.0 the partition location is respected. 
> However in 0.7.0 while the initial partition is create with the specified location "<path>/parta", the "insert overwrite ..." results in the partition written to "<path>/dt=a" (note that <path> is the same in both cases).
> {code}
> create table foo_stg (bar INT, car INT); 
> load data local inpath 'data.txt' into table foo_stg;
>  
> create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION '/user/hive/warehouse/foo4'; 
> alter table foo4 add partition (dt='a') location '/user/hive/warehouse/foo4/parta';
>  
> from foo_stg fs insert overwrite table foo4 partition (dt='a') select *;
> {code}
> From what I can tell HIVE-1707 introduced this via a change to
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, Map<String, String>, boolean, boolean)
> specifically:
> {code}
> +      Path partPath = new Path(tbl.getDataLocation().getPath(),
> +          Warehouse.makePartPath(partSpec));
> +
> +      Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath
> +          .toUri().getAuthority(), partPath.toUri().getPath());
> {code}
> Reading the description of HIVE-1707, it seems that this may have been done purposefully; however, given that the location is explicitly specified for the partition in question, it seems like it should be honored (especially given that the table location has not changed).
> This difference in behavior is causing a regression in existing production Hive-based code. I'd like to take a stab at addressing this; any suggestions?
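
The path logic quoted above can be illustrated with a small, self-contained Java sketch. It is hypothetical and not Hive code: `java.net.URI` stands in for Hadoop's `Path`, `makePartPath` only mimics the `key=value` directory naming of `Warehouse.makePartPath` (without its escaping), and the `hdfs://nn:8020` authority and `/tmp/hive-scratch/10000` load path are made-up example values. It shows why the registered location `/user/hive/warehouse/foo4/parta` is never used: the target directory depends only on the table location, the partition spec, and the load path's scheme and authority.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the quoted loadPartition path logic; not Hive code.
public class PartPathDemo {

    // Mimics the "key1=val1/key2=val2" naming of Warehouse.makePartPath
    // (the real method also escapes special characters in values).
    static String makePartPath(Map<String, String> partSpec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Path component: table location + partition spec.
    // Scheme and authority: taken from the load path.
    // Note that no existing partition location is consulted at all.
    static URI derivedPartPath(URI tableLocation, URI loadPath, Map<String, String> partSpec) {
        String path = tableLocation.getPath() + "/" + makePartPath(partSpec);
        try {
            return new URI(loadPath.getScheme(), loadPath.getAuthority(), path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("dt", "a");
        URI table = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4");
        URI load = URI.create("hdfs://nn:8020/tmp/hive-scratch/10000");
        URI registered = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4/parta");

        URI target = derivedPartPath(table, load, spec);
        System.out.println(target);                    // hdfs://nn:8020/user/hive/warehouse/foo4/dt=a
        System.out.println(target.equals(registered)); // false: "parta" is never consulted
    }
}
```

Because the registered location is not an input to `derivedPartPath`, no `alter table ... add partition ... location` setting can influence where the overwrite lands.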

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036912#comment-13036912 ] 

Patrick Hunt commented on HIVE-2117:
------------------------------------

I posted reviews on Review Board:
trunk: https://reviews.apache.org/r/773/
branch-0.7: https://reviews.apache.org/r/772/


> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt
>
>

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated HIVE-2117:
-------------------------------

    Attachment: data.txt

Single row of data used to reproduce the issue.

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: data.txt
>
>

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated HIVE-2117:
-------------------------------

    Status: Patch Available  (was: Open)

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt
>
>

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2117:
---------------------------------

    Affects Version/s: 0.7.1

Backported to branch-0.7

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Query Processor
>    Affects Versions: 0.7.1, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt
>
>

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2117:
---------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Patrick!

> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Query Processor
>    Affects Versions: 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch, data.txt
>
>

[jira] [Updated] (HIVE-2117) insert overwrite ignoring partition location

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated HIVE-2117:
-------------------------------

    Attachment: HIVE-2117_br07.patch

This patch is a work in progress. It resolves this JIRA in branch-0.7 while maintaining compatibility with the requirements from HIVE-1707. All unit tests pass with this patch applied, and it also fixes the example I provided in the description (for both managed and external tables).

I'm still working on two aspects of this JIRA: 1) creating a patch for trunk, and 2) adding unit tests to verify this behavior.
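
One way to honor an explicit partition location while keeping the cross-file-system behavior of the HIVE-1707 code quoted in the description is sketched below. This is an assumption-laden illustration, not the attached patch and not Hive's API: an existing partition's registered location is reused when it lives on the same file system as the load path (approximated here by equal URI scheme and authority), and otherwise the target falls back to the table-location-plus-spec path. `choosePartPath`, `sameFileSystem`, `derivedPartPath` and the example URIs are all hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

// Hypothetical sketch of a location-preserving choice; not Hive code or the committed patch.
public class PartLocationChooser {

    // Rough proxy for "same file system": identical scheme and authority.
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
            && Objects.equals(a.getAuthority(), b.getAuthority());
    }

    // Fallback target: table path + partition suffix (e.g. "dt=a"),
    // placed on the load path's file system as in the quoted snippet.
    static URI derivedPartPath(URI tableLocation, URI loadPath, String partSuffix) {
        try {
            return new URI(loadPath.getScheme(), loadPath.getAuthority(),
                tableLocation.getPath() + "/" + partSuffix, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // existingLocation is the partition's registered location,
    // or null if the partition does not exist yet.
    static URI choosePartPath(URI tableLocation, URI loadPath, String partSuffix, URI existingLocation) {
        if (existingLocation != null && sameFileSystem(existingLocation, loadPath)) {
            return existingLocation; // honor the explicitly specified location
        }
        return derivedPartPath(tableLocation, loadPath, partSuffix);
    }

    public static void main(String[] args) {
        URI table = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4");
        URI load = URI.create("hdfs://nn:8020/tmp/hive-scratch/10000");
        URI parta = URI.create("hdfs://nn:8020/user/hive/warehouse/foo4/parta");

        System.out.println(choosePartPath(table, load, "dt=a", parta)); // .../foo4/parta
        System.out.println(choosePartPath(table, load, "dt=a", null));  // .../foo4/dt=a
    }
}
```

Comparing scheme and authority is only a rough stand-in for "same file system"; real code would more likely resolve each path to its Hadoop `FileSystem` (for example via `Path.getFileSystem(Configuration)`) and compare those.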


> insert overwrite ignoring partition location
> --------------------------------------------
>
>                 Key: HIVE-2117
>                 URL: https://issues.apache.org/jira/browse/HIVE-2117
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: HIVE-2117_br07.patch, data.txt
>
>
