Posted to dev@hive.apache.org by Sergio Pena <se...@cloudera.com> on 2016/07/22 21:45:43 UTC

Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.


Diffs
-----

  common/src/java/org/apache/hadoop/hive/common/ObjectStoreUtils.java PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/TestObjectStoreUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 698efdc 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH

** NON-PARTITIONED TABLE
- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
- from dummy insert overwrite table s3dummy select *;                                                   25.469s      
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s


Thanks,

Sergio Pena


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.

> On July 22, 2016, 10:05 p.m., Thomas Poepping wrote:
> > common/src/java/org/apache/hadoop/hive/common/ObjectStoreUtils.java, lines 44-46
> > <https://reviews.apache.org/r/50359/diff/1/?file=1451405#file1451405line44>
> >
> >     second @Steve Loughran's comment that we should pull this from a config file. maybe another config value for hive-site.xml, a comma separated value list of objectstore schemes? it need not all be S3 related, right?

Wouldn't it be better if HDFS had a method to list all the blobstore schemes it supports?
I think such a method would also help other non-Hive components see what Hadoop supports, depending on the version.
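
For illustration only, here is a minimal Java sketch of the comma-separated scheme list suggested in the comment above. The class name, method names, and the example value "s3,s3a,s3n" are assumptions made for this sketch; they are not taken from the patch or from any Hadoop API.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper; not the class added by the patch.
final class SchemeListSketch {

    // Parses a non-null comma-separated value such as "s3,s3a,s3n"
    // into a set of lower-case scheme names, ignoring blanks.
    static Set<String> parseSchemes(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    // Returns true when the URI has a scheme and it is in the configured set.
    static boolean isBlobStorePath(URI path, Set<String> schemes) {
        String scheme = path.getScheme();
        return scheme != null && schemes.contains(scheme.toLowerCase(Locale.ROOT));
    }
}
```

With a list like this, supporting another object store would only need a configuration change rather than a code change.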


> On July 22, 2016, 10:05 p.m., Thomas Poepping wrote:
> > We have multiple things to remember:
> >  - this needs to be extensible; not all objectstores are S3
> >  - we need this to be happening in the background, we can't have "if path is S3" in front of each time we find a tmpPath. that's not scalable (from a programmer's point of view, not a functionality point of view)

Agreed. At some point we'd like to support the same blobstores Hadoop currently supports.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review143280
-----------------------------------------------------------


On July 26, 2016, 10:05 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated July 26, 2016, 10:05 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
>     https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/ObjectStorageUtils.java PRE-CREATION 
>   common/src/test/org/apache/hadoop/hive/common/TestObjectStorageUtils.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Thomas Poepping <po...@amazon.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review143280
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/common/ObjectStoreUtils.java (lines 44 - 46)
<https://reviews.apache.org/r/50359/#comment209063>

    second @Steve Loughran's comment that we should pull this from a config file. maybe another config value for hive-site.xml, a comma separated value list of objectstore schemes? it need not all be S3 related, right?



common/src/test/org/apache/hadoop/hive/common/TestObjectStoreUtils.java (lines 26 - 27)
<https://reviews.apache.org/r/50359/#comment209059>

    suggest we use either junit.framework OR org.junit.



common/src/test/org/apache/hadoop/hive/common/TestObjectStoreUtils.java (lines 30 - 47)
<https://reviews.apache.org/r/50359/#comment209061>

    could we have a second test method that tests your isObjectStoreFileSystem() function?
    
    you can mock the Filesystem objects with Mockito



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 6646 - 6654)
<https://reviews.apache.org/r/50359/#comment209062>

    as suggested on the Jira issue, is there a way we could move this logic to a helper function, to avoid having to change it in multiple places, or newcomers to this section of the code potentially forgetting to check this?


We have multiple things to remember:
 - this needs to be extensible; not all objectstores are S3
 - we need this to be happening in the background, we can't have "if path is S3" in front of each time we find a tmpPath. that's not scalable (from a programmer's point of view, not a functionality point of view)

- Thomas Poepping


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.

> On July 29, 2016, 11:33 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, lines 1840-1841
> > <https://reviews.apache.org/r/50359/diff/5/?file=1456814#file1456814line1840>
> >
> >     I don't follow this. Comment doesn't seem to match code. 
> >     FileSystem.rename() should automatically do copy+delete for S3. So, why do we need to do that explicitly?
> >     Per your comment, you want to delete temp dir, but that should already be handled in Context::clear()
> >     Per your code, you are deleting preexisting files on target dir but as I said that should already be handled in fs.rename()

Yes, FileSystem.rename() is handling the copy+delete for S3. However, for the INSERT OVERWRITE case, the temporary directory that contains 000000_0 also contains a .hive-staging directory, which is copied to S3 as well. On HDFS this .hive-staging directory should be deleted automatically by the deleteOnExit() call, but when the directory is copied to S3 the deleteOnExit flag is not copied with it, so the data is kept on S3.

I thought I could point statsTmpLoc to a different location instead. Then I found another place where a temporary directory is created in .hive-staging too. So, instead of fixing these two, I thought this could be handled by doing the copy explicitly. Other developers may use getExtTmpPathRelTo() again in the future and add more temp data to .hive-staging, so I just wanted to prevent copying unwanted files to S3.

What do you think?
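
As a rough picture of the idea described above (copy only the data files and leave anything under .hive-staging behind), here is a small Java sketch. It uses java.nio.file on a local directory as a stand-in for the Hadoop FileSystem API, and the class and method names are hypothetical; it is not the code in Hive.java.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for the explicit copy step; local files only.
final class StagingFilterSketch {

    // Name prefix of the staging directories mentioned in the thread.
    static final String STAGING_PREFIX = ".hive-staging";

    // Copies the regular files directly under srcDir into dstDir and
    // returns how many were copied. Entries whose name starts with
    // STAGING_PREFIX are skipped; for brevity, so are all subdirectories.
    static int copyDataFiles(Path srcDir, Path dstDir) {
        int copied = 0;
        try {
            Files.createDirectories(dstDir);
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(srcDir)) {
                for (Path entry : entries) {
                    String name = entry.getFileName().toString();
                    if (name.startsWith(STAGING_PREFIX) || !Files.isRegularFile(entry)) {
                        continue;
                    }
                    Files.copy(entry, dstDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copied;
    }
}
```

Filtering at copy time means later additions to .hive-staging are excluded without touching each call site that creates them.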


> On July 29, 2016, 11:33 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 1807
> > <https://reviews.apache.org/r/50359/diff/5/?file=1456814#file1456814line1807>
> >
> >     I am not sure if this change is really needed. But if it is, won't we need the equivalent in loadPartition() & loadDynamicPartitions()?

Thanks.
I'll take a look at this.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144220
-----------------------------------------------------------


On July 28, 2016, 8:11 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated July 28, 2016, 8:11 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
>     https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f172c81fce20fe951df58f6561d28dc215 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144220
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 1803)
<https://reviews.apache.org/r/50359/#comment210221>

    I am not sure if this change is really needed. But if it is, won't we need the equivalent in loadPartition() & loadDynamicPartitions()?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (lines 1832 - 1833)
<https://reviews.apache.org/r/50359/#comment210220>

    I don't follow this. Comment doesn't seem to match code. 
    FileSystem.rename() should automatically do copy+delete for S3. So, why do we need to do that explicitly?
    Per your comment, you want to delete temp dir, but that should already be handled in Context::clear()
    Per your code, you are deleting preexisting files on target dir but as I said that should already be handled in fs.rename()



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (line 7019)
<https://reviews.apache.org/r/50359/#comment210222>

    This will go to temp s3 location. you may want to move this to hdfs too.


- Ashutosh Chauhan


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.

> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 1807-1814
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469107#file1469107line1807>
> >
> >     Why not use newly added Context::getTempDirForPath(Path path) here.

Yeah, sorry. This is a little confusing. 

The thing is that 'tmpDir' is based on 'dest' (tmpDir = baseCtx.getExternalTmpPath(dest)) where 'dest' is an HDFS temporary directory (not S3). This is the directory causing the .hive-staging to be created on S3 at the end, when HDFS temp dir was copied to S3 (INSERT OVERWRITE).

I found out that FileSinkDesc has a 'getDestPath' that returns the S3 path. So the condition is: if 'getDestPath' is on S3, use 'getMRTmpPath'; otherwise continue using the temporary path based on 'dest' (the HDFS temp path).

That part of the code was a little confusing regarding the names 'dest', 'getDestPath', and 'getFinalDirName'. I was trying to understand this code, but I could not figure out the idea behind 'getFinalDirName' and 'getDestPath', so I ended up writing that condition. Also, the comments that were already there mention that the temp file should be in the same filesystem as the destination (in the case of non-blobstore directories).
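
The condition described above can be sketched roughly as follows. This is only an illustration in plain Java with java.net.URI: the class, the method names, the parameter names, and the "/.hive-staging" suffix used for the non-blobstore case are assumptions for the sketch, not the actual code in GenMapRedUtils or Context.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the temp-directory decision; not Hive's API.
final class TempDirChoiceSketch {

    // True when the path's scheme is one of the given blobstore schemes.
    static boolean isOnBlobStore(URI path, Set<String> blobSchemes) {
        String scheme = path.getScheme();
        return scheme != null && blobSchemes.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // finalDest: where the data finally lands (what 'getDestPath' returns).
    // dest:      the directory the temp path is normally derived from.
    // mrTmpDir:  a scratch directory on HDFS (what 'getMRTmpPath' returns).
    static URI chooseTempDir(URI finalDest, URI dest, URI mrTmpDir, Set<String> blobSchemes) {
        if (isOnBlobStore(finalDest, blobSchemes)) {
            // Final destination is a blobstore: keep intermediate data on HDFS.
            return mrTmpDir;
        }
        // Otherwise stay on the same filesystem as 'dest' (assumed layout).
        String base = dest.toString();
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return URI.create(base + "/.hive-staging");
    }
}
```

Keeping the decision in one function like this is also what the earlier review comments asked for: callers do not need their own "is this S3" check.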


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, lines 7020-7024
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469108#file1469108line7020>
> >
> >     Why not use newly introduced ctx.getTempDirForPath(dest_path); here?

This part caused 72 tests to fail because of the different scratch directory name. Also, I wasn't sure why the stats temp directory was in the same location as 'queryTmpdir', so I added the condition too in case it has issues with encrypted zones. I like your line better, but I wasn't sure about it, so I ended up writing this condition.

I can switch to 'ctx.getTempDirForPath' instead. What do you think?


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 6763
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469108#file1469108line6763>
> >
> >     surprised that we weren't using getExternalTmpPathRelTo() here, did we miss this when we introduced this method for encrypt support work?

Mmm, I'm surprised too. Maybe we missed it.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
-----------------------------------------------------------


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
>     https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> -------
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);                                                                           3.651s
> - insert into table s3dummy values (1);                                                                 39.231s
> - insert overwrite table s3dummy values (1);                                                            42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s
> 
> ** EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
> - insert into table s3dummy_ext values (1);                                                             45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
> - insert into table s3dummy values (1);                                                                 15.025s
> - insert overwrite table s3dummy values (1);                                                            25.149s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
> - from dummy insert overwrite table s3dummy select *;                                                   25.469s      
> - from dummy insert into table s3dummy select *;                                                        14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
> - insert into table s3dummy_ext values (1);                                                             16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
> - alter table s3dummypart add partition (part=1);                                                        3.229s
> - alter table s3dummypart add partition (part=2);                                                        3.124s
> - insert into table s3dummypart partition (part=1) values (1);                                          14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
> - from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1807 - 1814)
<https://reviews.apache.org/r/50359/#comment211415>

    Why not use the newly added Context::getTempDirForPath(Path path) here?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (line 6763)
<https://reviews.apache.org/r/50359/#comment211416>

    Surprised that we weren't using getExternalTmpPathRelTo() here. Did we miss this when we introduced this method for the encryption support work?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 7020 - 7024)
<https://reviews.apache.org/r/50359/#comment211418>

    Why not use the newly introduced ctx.getTempDirForPath(dest_path); here?


- Ashutosh Chauhan


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated Aug. 10, 2016, 9:08 p.m.)


Review request for hive.


Changes
-------

Changes on this patch:
- Use getTempDirForPath() for the statistics temp file and GenMapRedUtils temp file.


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
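
As a rough illustration of the idea in the description, the sketch below picks a location for intermediate data based on the destination's URI scheme. It is not the code from the patch: the class name ScratchDirChooser, the scheme list (s3, s3a, s3n), the simplified ".hive-staging" directory name, and the example HDFS paths are all assumptions made for the example. It uses only java.net.URI so that it runs without Hadoop on the classpath.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; NOT the BlobStorageUtils class from the patch.
final class ScratchDirChooser {
    // Assumed list of blob-storage schemes; a real implementation would likely read it from configuration.
    private static final Set<String> BLOB_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    private ScratchDirChooser() {}

    // True when the destination URI uses one of the assumed blob-storage schemes.
    static boolean isBlobStorage(URI dest) {
        String scheme = dest.getScheme();
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Returns the HDFS scratch directory for blob-storage destinations,
    // otherwise a (simplified) staging directory under the destination itself.
    static String tempDirFor(URI dest, String hdfsScratchDir) {
        if (isBlobStorage(dest)) {
            return hdfsScratchDir;
        }
        String base = dest.toString();
        return (base.endsWith("/") ? base : base + "/") + ".hive-staging";
    }

    public static void main(String[] args) {
        String scratch = "hdfs://namenode:8020/tmp/hive/scratch"; // placeholder path
        System.out.println(tempDirFor(
            URI.create("s3a://spena-bucket/user/hive/warehouse/s3dummy"), scratch));
        System.out.println(tempDirFor(
            URI.create("hdfs://namenode:8020/user/hive/warehouse/dummy"), scratch));
    }
}
```

The point of the split is that intermediate files for an S3 destination are written to HDFS first, so only the final result has to be written to S3.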


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
- from dummy insert overwrite table s3dummy select *;                                                   25.469s      
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
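
To make the before/after comparison easier to read, here is a small sketch that computes the speedup ratio (time without the patch divided by time with the patch) for the four statements that appear in both the NO PATCH and WITH PATCH lists above. The class name TimingComparison is made up for the example; the timings are copied from the lists, and each was measured once, so the ratios are only indicative.

```java
import java.util.Locale;

// Illustrative helper; the numbers are the single-run timings listed above.
final class TimingComparison {
    private TimingComparison() {}

    // Ratio of the time without the patch to the time with the patch.
    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        String[] statements = {
            "insert into table s3dummy",
            "insert overwrite table s3dummy",
            "insert overwrite directory",
            "insert into table s3dummy_ext"
        };
        double[] noPatch   = {39.231, 42.569, 30.136, 45.855};
        double[] withPatch = {15.025, 25.149, 19.158, 16.070};

        for (int i = 0; i < statements.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-32s %7.3fs -> %7.3fs  (%.2fx)",
                statements[i], noPatch[i], withPatch[i], speedup(noPatch[i], withPatch[i])));
        }
    }
}
```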


Thanks,

Sergio Pena


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.

> On Aug. 10, 2016, 6:41 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 1807-1814
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469107#file1469107line1807>
> >
> >     Not able to follow : )
> >     Are you doing this only to avoid copying the .hive-staging dir? If so, you can use a filter while copying to eliminate that, no?

Thinking more about this, I think you were right from the beginning. I can use 'getTempDirForPath(fileSinkDesc.getDestPath())' as it will use the same .hive-staging directory that is used in 'dest'.
I did some tests and it is working fine.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145381
-----------------------------------------------------------


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145381
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1807 - 1814)
<https://reviews.apache.org/r/50359/#comment211603>

    Not able to follow : )
    Are you doing this only to avoid copying the .hive-staging dir? If so, you can use a filter while copying to eliminate that, no?
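
For readers unfamiliar with the filter suggestion, a minimal sketch of the idea follows. It copies a directory tree while skipping anything under a directory whose name starts with ".hive-staging". This is an assumption-laden illustration, not Hive code: it works on the local filesystem through java.nio.file rather than the Hadoop FileSystem API, and the class name StagingAwareCopy and the prefix match are choices made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical local-filesystem illustration of "filter while copying".
final class StagingAwareCopy {
    static final String STAGING_PREFIX = ".hive-staging";

    private StagingAwareCopy() {}

    // False when any component of the relative path is a staging directory.
    static boolean shouldCopy(Path relative) {
        for (Path part : relative) {
            if (part.toString().startsWith(STAGING_PREFIX)) {
                return false;
            }
        }
        return true;
    }

    // Copies src into dst recursively, skipping staging directories and their contents.
    static void copyTree(Path src, Path dst) {
        try (Stream<Path> paths = Files.walk(src)) {
            paths.filter(p -> shouldCopy(src.relativize(p)))
                 .forEach(p -> copyOne(p, dst.resolve(src.relativize(p))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void copyOne(Path from, Path to) {
        try {
            if (Files.isDirectory(from)) {
                Files.createDirectories(to);
            } else {
                Path parent = to.getParent();
                if (parent != null) {
                    Files.createDirectories(parent);
                }
                Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the predicate separate from the copy loop makes the skip rule easy to test on its own, for example StagingAwareCopy.shouldCopy(Paths.get("part=1", "000000_0")).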



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 7020 - 7024)
<https://reviews.apache.org/r/50359/#comment211602>

    Yeah, let's use ctx.getTempDirForPath() here.


- Ashutosh Chauhan


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Lefty Leverenz <le...@gmail.com>.

> On Aug. 10, 2016, 5:31 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3091-3092
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469104#file1469104line3091>
> >
> >     Tiny nit:  Either make "It" lowercase or move the parenthetical sentence after the first sentence, with a final period like this:
> >     
> >     "Enable the use of scratch directories directly on blob storage systems. (It may cause performance penalties.)"

Looks good now.  +1 for the parameter descriptions.


- Lefty


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
-----------------------------------------------------------


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Lefty Leverenz <le...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (lines 3091 - 3092)
<https://reviews.apache.org/r/50359/#comment211495>

    Tiny nit:  Either make "It" lowercase or move the parenthetical sentence after the first sentence, with a final period like this:
    
    "Enable the use of scratch directories directly on blob storage systems. (It may cause performance penalties.)"


- Lefty Leverenz




Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated Aug. 9, 2016, 7:53 p.m.)


Review request for hive.


Changes
-------

- Added a new flag variable that lets users use the table's blob storage location as the scratch directory.
- Other minor fixes to allow tests to pass.
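
For readers skimming the thread, the decision the new flag controls can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class, method, and parameter names are hypothetical, the HDFS path is made up, and the table location stands in for whatever scratch directory would be derived from it.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; this is not the patch's Context or BlobStorageUtils code.
class ScratchDirChooser {

    // Chooses the base location for intermediate data.
    // blobSchemes: URI schemes treated as blob storage (assumed to be lowercase).
    // useBlobStoreAsScratchDir: stand-in for the new flag described above.
    static URI chooseScratchBase(URI tableLocation, URI hdfsScratchDir,
                                 Set<String> blobSchemes, boolean useBlobStoreAsScratchDir) {
        String scheme = tableLocation.getScheme();
        boolean onBlobStore = scheme != null
                && blobSchemes.contains(scheme.toLowerCase(Locale.ROOT));
        if (onBlobStore && !useBlobStoreAsScratchDir) {
            // Default behaviour described in the review: keep temporary data on HDFS.
            return hdfsScratchDir;
        }
        // Either the table is not on blob storage, or the user opted in.
        return tableLocation;
    }

    public static void main(String[] args) {
        Set<String> schemes = new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));
        URI table = URI.create("s3a://spena-bucket/user/hive/warehouse/s3dummy");
        URI hdfs = URI.create("hdfs://namenode/tmp/hive"); // assumed HDFS scratch path
        System.out.println(chooseScratchBase(table, hdfs, schemes, false));
        System.out.println(chooseScratchBase(table, hdfs, schemes, true));
    }
}
```

With the flag off, the S3 table's intermediate data goes to the HDFS location; with it on, the table's own location is used, which is the case the config description warns may cost performance.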


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
- from dummy insert overwrite table s3dummy select *;                                                   25.469s      
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
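
To put the numbers above side by side, here is a small sketch that divides the NO PATCH timing by the WITH PATCH timing for the four statements that appear in both lists. The timings are copied from this section; the class and method names are made up for illustration, and since these are single runs the ratios are only indicative.

```java
// Hypothetical helper for comparing the timings listed in the Testing section.
class InsertSpeedup {

    // Ratio of the time without the patch to the time with the patch.
    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        String[] statements = {
            "insert into table s3dummy",
            "insert overwrite table s3dummy",
            "insert overwrite directory",
            "insert into table s3dummy_ext"
        };
        double[] noPatch   = {39.231, 42.569, 30.136, 45.855};
        double[] withPatch = {15.025, 25.149, 19.158, 16.070};
        for (int i = 0; i < statements.length; i++) {
            System.out.printf("%-32s %.2fx%n", statements[i],
                    speedup(noPatch[i], withPatch[i]));
        }
    }
}
```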


Thanks,

Sergio Pena


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated Aug. 4, 2016, 4:29 p.m.)


Review request for hive.


Changes
-------

Addressed minor comments.

Removed the code that was duplicating the rename() to S3. Instead, it gets HDFS scratch directories for the required temporary files.


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
- from dummy insert overwrite table s3dummy select *;                                                   25.469s      
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s


Thanks,

Sergio Pena


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Lefty Leverenz <le...@gmail.com>.

> On July 30, 2016, 8:44 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3066-3067
> > <https://reviews.apache.org/r/50359/diff/5/?file=1456811#file1456811line3066>
> >
> >     Typo:  Commad-separated --> Comma-separated
> >     
> >     Redundancy:  "... supported blobstore schemes that Hive officially supports" (omit "supported")
> >     
> >     Nit:  A period could be added at the end.

Looks good now, thanks Sergio.


- Lefty


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144255
-----------------------------------------------------------




Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Lefty Leverenz <le...@gmail.com>.

> On July 30, 2016, 8:44 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3066-3067
> > <https://reviews.apache.org/r/50359/diff/5/?file=1456811#file1456811line3066>
> >
> >     Typo:  Commad-separated --> Comma-separated
> >     
> >     Redundancy:  "... supported blobstore schemes that Hive officially supports" (omit "supported")
> >     
> >     Nit:  A period could be added at the end.
> 
> Lefty Leverenz wrote:
>     Looks good now, thanks Sergio.

Aarrgh, forgot to publish that.

Adding a trivial comment for the new config.


- Lefty


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144255
-----------------------------------------------------------




Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Lefty Leverenz <le...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144255
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (lines 3066 - 3067)
<https://reviews.apache.org/r/50359/#comment210256>

    Typo:  Commad-separated --> Comma-separated
    
    Redundancy:  "... supported blobstore schemes that Hive officially supports" (omit "supported")
    
    Nit:  A period could be added at the end.


- Lefty Leverenz




Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated July 28, 2016, 8:11 p.m.)


Review request for hive.


Changes
-------

This patch adds a new configuration variable that contains supported blobstore schemes.

HIVE_BLOBSTORE_SUPPORTED_SCHEMES("hive.blobstore.supported.schemes", "s3,s3a,s3n",
            "Commad-separated list of supported blobstore schemes that Hive officially supports");

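A comma-separated value like the default above could be consumed along these lines. This is a minimal sketch under stated assumptions, not the BlobStorageUtils code from the patch: the class and method names are hypothetical, and matching on the lowercased URI scheme is an assumption about how such a list would be applied.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; not the patch's BlobStorageUtils.
class BlobSchemes {

    // Splits a value such as "s3,s3a,s3n" into a set of lowercase schemes,
    // ignoring surrounding whitespace and empty entries.
    static Set<String> parse(String value) {
        return Arrays.stream(value.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // True when the path has a scheme and that scheme is in the set.
    static boolean isBlobStoragePath(String path, Set<String> schemes) {
        String scheme = URI.create(path).getScheme();
        return scheme != null && schemes.contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> schemes = parse("s3,s3a,s3n");
        System.out.println(isBlobStoragePath("s3a://spena-bucket/dirs/s3dummy", schemes));
        System.out.println(isBlobStoragePath("hdfs://namenode/user/hive/warehouse/dummy", schemes));
        System.out.println(isBlobStoragePath("/user/hive/warehouse/dummy", schemes));
    }
}
```

Keeping the list in configuration rather than hard-coding it means another store's scheme can be added without a code change.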

Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f172c81fce20fe951df58f6561d28dc215 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java ec5d693d28a40925c44f844a05ebf3f5c10173c9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
- from dummy insert overwrite table s3dummy select *;                                                   25.469s      
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s


Thanks,

Sergio Pena


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.

> On July 28, 2016, 6:45 p.m., Reuben Kuhnert wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 3225
> > <https://reviews.apache.org/r/50359/diff/4/?file=1455838#file1455838line3225>
> >
> >     The code in both branches of the 'if/else' is identical except for the 'destination path'. Maybe factor that out?

It looks the same, but refactoring this part might introduce a subtle issue that would need to be tested.
I recall we have had several issues with this code, so it is better to leave it this way and fix it in another JIRA.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review143975
-----------------------------------------------------------


On July 28, 2016, 8:11 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated July 28, 2016, 8:11 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
>     https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f172c81fce20fe951df58f6561d28dc215 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> -------
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);                                                                           3.651s
> - insert into table s3dummy values (1);                                                                 39.231s
> - insert overwrite table s3dummy values (1);                                                            42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s
> 
> ** EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
> - insert into table s3dummy_ext values (1);                                                             45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
> - insert into table s3dummy values (1);                                                                 15.025s
> - insert overwrite table s3dummy values (1);                                                            25.149s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
> - from dummy insert overwrite table s3dummy select *;                                                   25.469s      
> - from dummy insert into table s3dummy select *;                                                        14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
> - insert into table s3dummy_ext values (1);                                                             16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
> - alter table s3dummypart add partition (part=1);                                                        3.229s
> - alter table s3dummypart add partition (part=2);                                                        3.124s
> - insert into table s3dummypart partition (part=1) values (1);                                          14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
> - from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Reuben Kuhnert <si...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review143975
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 3217)
<https://reviews.apache.org/r/50359/#comment209915>

    The code in both branches of the 'if/else' is identical except for the 'destination path'. Maybe factor that out?
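
    The suggestion can be illustrated with a minimal sketch. It is not the code in Hive.java, which is not shown in this thread; the class MoveSketch, the methods moveFiles and copyOrMove, and the useStaging flag are all hypothetical names chosen for illustration. The idea is to branch only on the destination path and run the shared logic once.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names come from Hive.java. */
final class MoveSketch {

    /** Stand-in for the work both branches perform; here it only records the move. */
    static String moveFiles(String source, String destination, List<String> log) {
        log.add(source + " -> " + destination);
        return destination;
    }

    /**
     * Only the destination differs between the two cases, so only that
     * choice is branched; the shared logic is called once.
     */
    static String copyOrMove(String source, String tableDir, String stagingDir,
                             boolean useStaging, List<String> log) {
        String destination = useStaging ? stagingDir : tableDir;
        return moveFiles(source, destination, log);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        copyOrMove("/tmp/src", "/warehouse/t", "/scratch/t", true, log);
        copyOrMove("/tmp/src", "/warehouse/t", "/scratch/t", false, log);
        for (String entry : log) {
            System.out.println(entry);
        }
    }
}
```

    Whether such a change is safe for the real method depends on behaviour that would need testing, which is the concern raised in the reply.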


- Reuben Kuhnert


On July 27, 2016, 10:56 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated July 27, 2016, 10:56 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
>     https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java ec5d693d28a40925c44f844a05ebf3f5c10173c9 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> -------
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);                                                                           3.651s
> - insert into table s3dummy values (1);                                                                 39.231s
> - insert overwrite table s3dummy values (1);                                                            42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s
> 
> ** EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
> - insert into table s3dummy_ext values (1);                                                             45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
> - insert into table s3dummy values (1);                                                                 15.025s
> - insert overwrite table s3dummy values (1);                                                            25.149s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
> - from dummy insert overwrite table s3dummy select *;                                                   25.469s      
> - from dummy insert into table s3dummy select *;                                                        14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
> - insert into table s3dummy_ext values (1);                                                             16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
> - alter table s3dummypart add partition (part=1);                                                        3.229s
> - alter table s3dummypart add partition (part=2);                                                        3.124s
> - insert into table s3dummypart partition (part=1) values (1);                                          14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
> - from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated July 27, 2016, 10:56 p.m.)


Review request for hive.


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java ec5d693d28a40925c44f844a05ebf3f5c10173c9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
- from dummy insert overwrite table s3dummy select *;                                                   25.469s      
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
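
To compare the two runs, the timings listed above for the statements that appear in both the NO PATCH and WITH PATCH sections can be turned into speedup ratios. The sketch below only does that arithmetic; the class name SpeedupSketch is made up for illustration, and each figure is a single measurement from this message, so the ratios are indicative at best.

```java
import java.util.Locale;

/** Illustrative arithmetic on the timings quoted in this review request. */
final class SpeedupSketch {

    /** Ratio of the unpatched time to the patched time; above 1.0 means faster with the patch. */
    static double speedup(double beforeSeconds, double afterSeconds) {
        if (afterSeconds <= 0.0) {
            throw new IllegalArgumentException("afterSeconds must be positive");
        }
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        String[] statements = {
            "insert into table s3dummy",
            "insert overwrite table s3dummy",
            "insert overwrite directory (s3dummy)",
            "insert into table s3dummy_ext"
        };
        // Seconds, copied from the NO PATCH and WITH PATCH lists above.
        double[] noPatch = {39.231, 42.569, 30.136, 45.855};
        double[] withPatch = {15.025, 25.149, 19.158, 16.070};

        for (int i = 0; i < statements.length; i++) {
            System.out.printf(Locale.ROOT, "%-40s %.2fx%n",
                statements[i], speedup(noPatch[i], withPatch[i]));
        }
    }
}
```

On these numbers, the plain insert comes out at roughly 2.6x and the insert overwrite into the table at roughly 1.7x.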


Thanks,

Sergio Pena


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated July 26, 2016, 10:24 p.m.)


Review request for hive.


Changes
-------

Changes in this patch:
- Added isBlobStorageFileSystem tests
- Fixed JUnit imports
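
As a rough idea of what such a check and its tests might cover, here is a minimal sketch that uses only java.net.URI. It is not the BlobStorageUtils code from the patch, which is not shown in this message; the class BlobStoreCheckSketch, its method names, and the fixed scheme list (s3, s3a, s3n) are assumptions made for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch; the real BlobStorageUtils may work differently. */
final class BlobStoreCheckSketch {

    // Assumed scheme list for illustration; the patch may use a different or configurable set.
    private static final Set<String> BLOB_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    /** True when the scheme is one of the assumed blob-store schemes (case-insensitive). */
    static boolean isBlobStorageScheme(String scheme) {
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /** True when the path's URI scheme is a blob-store scheme; scheme-less paths are not. */
    static boolean isBlobStoragePath(URI path) {
        return path != null && isBlobStorageScheme(path.getScheme());
    }

    public static void main(String[] args) {
        System.out.println(isBlobStoragePath(
            URI.create("s3a://spena-bucket/user/hive/warehouse/s3dummy")));
        System.out.println(isBlobStoragePath(URI.create("hdfs://namenode:8020/tmp/hive")));
    }
}
```

Tests for a check like this would typically cover a blob-store path, an HDFS path, a path with no scheme, and a null input.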


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java ec5d693d28a40925c44f844a05ebf3f5c10173c9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
- from dummy insert overwrite table s3dummy select *;                                                   25.469s      
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s


Thanks,

Sergio Pena


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated July 26, 2016, 10:05 p.m.)


Review request for hive.


Changes
-------

Changes added in this patch:
- Created a helper method on Context to get the temporary directory depending on the filesystem
- Added more tests
- Fixed an issue where staging directories were copied to S3
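
The first item can be pictured with a small sketch of how a temporary directory might be chosen from the destination's filesystem. This is not the helper added to Context, which is not shown in this message; the class ScratchDirSketch, the method tempDirFor, the "s3" prefix test, and the example HDFS scratch URI are assumptions for illustration only.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical sketch of choosing a temporary directory by filesystem. */
final class ScratchDirSketch {

    /** Assumption: any scheme starting with "s3" (s3, s3a, s3n) is treated as a blob store. */
    static boolean isBlobStore(URI location) {
        String scheme = location.getScheme();
        return scheme != null && scheme.toLowerCase(Locale.ROOT).startsWith("s3");
    }

    /**
     * For a blob-store destination, intermediate data goes to the HDFS scratch
     * directory; otherwise a staging directory under the destination is used.
     */
    static URI tempDirFor(URI tableLocation, URI hdfsScratchDir, String stagingDirName) {
        if (isBlobStore(tableLocation)) {
            return hdfsScratchDir;
        }
        String base = tableLocation.toString();
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return URI.create(base + "/" + stagingDirName);
    }

    public static void main(String[] args) {
        URI scratch = URI.create("hdfs://namenode:8020/tmp/hive");
        System.out.println(tempDirFor(
            URI.create("s3a://spena-bucket/user/hive/warehouse/s3dummy"), scratch, ".hive-staging"));
        System.out.println(tempDirFor(
            URI.create("hdfs://namenode:8020/user/hive/warehouse/dummy"), scratch, ".hive-staging"));
    }
}
```

Keeping intermediate data off S3 means only the final results are written to the bucket, which is the intent stated in the description below.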


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/ObjectStorageUtils.java PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/TestObjectStorageUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java ec5d693d28a40925c44f844a05ebf3f5c10173c9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH
** NON-PARTITIONED TABLE

- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE

- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
- from dummy insert overwrite table s3dummy select *;                                                   25.469s      
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s


Thanks,

Sergio Pena