Posted to dev@hive.apache.org by Sergio Pena <se...@cloudera.com> on 2016/08/04 16:29:29 UTC

Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated Aug. 4, 2016, 4:29 p.m.)


Review request for hive.


Changes
-------

Addressed minor comments.

Removed the code that duplicated the rename() to S3. Instead, the patch now obtains HDFS scratch directories for the required temporary files.


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
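
As a rough illustration of the idea in this description (not the code in the patch), the sketch below returns an HDFS scratch directory whenever the destination sits on a blob store such as S3. The class name ScratchDirChooser, the scheme list, the fallback of a .hive-staging directory under the destination, and the hdfs://nn:8020 URIs are all assumptions made for this example.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; it is not part of the patch.
final class ScratchDirChooser {
    // Assumed list of URI schemes that are treated as blob stores.
    private static final List<String> BLOB_SCHEMES = Arrays.asList("s3", "s3a", "s3n");

    // True when the path's URI scheme is one of the assumed blob store schemes.
    static boolean isBlobStorePath(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Blob store destinations get the HDFS scratch root; anything else keeps
    // its intermediate data in a staging directory under the destination
    // (an assumption made for this sketch).
    static String chooseScratchDir(String destination, String hdfsScratchRoot) {
        if (isBlobStorePath(destination)) {
            return hdfsScratchRoot;
        }
        return destination + "/.hive-staging";
    }

    public static void main(String[] args) {
        String scratchRoot = "hdfs://nn:8020/tmp/hive"; // made-up scratch root
        System.out.println(chooseScratchDir(
            "s3a://spena-bucket/user/hive/warehouse/s3dummy", scratchRoot));
        System.out.println(chooseScratchDir(
            "hdfs://nn:8020/user/hive/warehouse/dummy", scratchRoot));
    }
}
```

With a check like this, only the final move touches S3, which is presumably where the time savings in the Testing section come from.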


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH

** NON-PARTITIONED TABLE
- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
- from dummy insert overwrite table s3dummy select *;                                                   25.469s      
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
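
To put the timings above side by side, the small sketch below divides each NO PATCH time by the corresponding WITH PATCH time for the statements that appear in both runs. The class and method names are made up for this example; the numbers are copied from the lists above. Each timing appears to be a single run, so the ratios are only indicative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for comparing the timings listed in this message.
final class PatchSpeedup {
    // Ratio of the time without the patch to the time with the patch.
    static double speedup(double secondsBefore, double secondsAfter) {
        return secondsBefore / secondsAfter;
    }

    public static void main(String[] args) {
        // Each entry maps a statement to {NO PATCH seconds, WITH PATCH seconds}.
        Map<String, double[]> timings = new LinkedHashMap<>();
        timings.put("insert into table s3dummy", new double[] {39.231, 15.025});
        timings.put("insert overwrite table s3dummy", new double[] {42.569, 25.149});
        timings.put("insert overwrite directory", new double[] {30.136, 19.158});
        timings.put("insert into table s3dummy_ext", new double[] {45.855, 16.070});

        for (Map.Entry<String, double[]> entry : timings.entrySet()) {
            double[] t = entry.getValue();
            System.out.printf("%-32s %.2fx%n", entry.getKey(), speedup(t[0], t[1]));
        }
    }
}
```

For these four statements the ratios fall between roughly 1.5x and 2.9x.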


Thanks,

Sergio Pena


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.

> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 1807-1814
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469107#file1469107line1807>
> >
> >     Why not use the newly added Context::getTempDirForPath(Path path) here?

Yeah, sorry. This is a little confusing.

The issue is that 'tmpDir' is based on 'dest' (tmpDir = baseCtx.getExternalTmpPath(dest)), where 'dest' is an HDFS temporary directory (not S3). This is the directory that caused .hive-staging to be created on S3 at the end, when the HDFS temp dir was copied to S3 (INSERT OVERWRITE).

I found out that FileSinkDesc has a 'getDestPath' that returns the S3 path. So the condition is: if 'getDestPath' is on S3, use 'getMRTmpPath'; otherwise, continue using the temporary path based on 'dest' (HDFS temp path).

That part of the code was a little confusing regarding the names 'dest', 'getDestPath', and 'getFinalDirName'. I tried to understand this code, but I could not figure out the idea behind 'getFinalDirName' and 'getDestPath', so I ended up writing that condition. Also, the existing comments mention that the temp file should be in the same filesystem as the destination (in the case of non-blobstore directories).


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, lines 7020-7024
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469108#file1469108line7020>
> >
> >     Why not use the newly introduced ctx.getTempDirForPath(dest_path); here?

This part was causing 72 tests to fail because of the different scratch directory name. Also, I wasn't sure why the stats temp was in the same location as 'queryTmpdir', so I added the condition in case it has issues with encrypted zones. I like your line better, but I wasn't sure about it, so I ended up writing this condition.

I can use 'ctx.getTempDirForPath' instead. What do you think?


> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 6763
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469108#file1469108line6763>
> >
> >     Surprised that we weren't using getExternalTmpPathRelTo() here. Did we miss this when we introduced this method for the encryption support work?

Mmm, I'm surprised too. Maybe we missed it.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
-----------------------------------------------------------


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
>     https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> -------
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);                                                                           3.651s
> - insert into table s3dummy values (1);                                                                 39.231s
> - insert overwrite table s3dummy values (1);                                                            42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
> - insert into table s3dummy_ext values (1);                                                             45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
> - insert into table s3dummy values (1);                                                                 15.025s
> - insert overwrite table s3dummy values (1);                                                            25.149s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
> - from dummy insert overwrite table s3dummy select *;                                                   25.469s      
> - from dummy insert into table s3dummy select *;                                                        14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
> - insert into table s3dummy_ext values (1);                                                             16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
> - alter table s3dummypart add partition (part=1);                                                        3.229s
> - alter table s3dummypart add partition (part=2);                                                        3.124s
> - insert into table s3dummypart partition (part=1) values (1);                                          14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s      
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s      
> - from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1807 - 1814)
<https://reviews.apache.org/r/50359/#comment211415>

    Why not use the newly added Context::getTempDirForPath(Path path) here?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (line 6763)
<https://reviews.apache.org/r/50359/#comment211416>

    Surprised that we weren't using getExternalTmpPathRelTo() here. Did we miss this when we introduced this method for the encryption support work?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 7020 - 7024)
<https://reviews.apache.org/r/50359/#comment211418>

    Why not use the newly introduced ctx.getTempDirForPath(dest_path); here?


- Ashutosh Chauhan




Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated Aug. 10, 2016, 9:08 p.m.)


Review request for hive.


Changes
-------

Changes in this patch:
- Use getTempDirForPath() for the statistics temp file and GenMapRedUtils temp file.



Thanks,

Sergio Pena


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.

> On Aug. 10, 2016, 6:41 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 1807-1814
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469107#file1469107line1807>
> >
> >     Not able to follow : )
> >     Are you doing this only to avoid copying the .hive-staging dir? If so, you can use a filter while copying to eliminate that, no?

Thinking more about this, I think you were right from the beginning. I can use 'getTempDirForPath(fileSinkDesc.getDestPath())', as it will use the same .hive-staging directory that is used in 'dest'.
I did some tests and it is working fine.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145381
-----------------------------------------------------------




Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145381
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1807 - 1814)
<https://reviews.apache.org/r/50359/#comment211603>

    Not able to follow : )
    Are you doing this only to avoid copying the .hive-staging dir? If so, you can use a filter while copying to eliminate that, no?
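
The filter suggestion above can be sketched roughly as follows. In Hive this would presumably go through the Hadoop FileSystem API with a path filter; to stay self-contained, this example uses java.nio.file on a local filesystem as a stand-in. The class name FilteredCopy and the prefix-based check are assumptions for illustration, not code from the patch.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical local-filesystem stand-in for a filtered copy.
final class FilteredCopy {
    // Directories whose names start with this prefix are not copied.
    static final String STAGING_PREFIX = ".hive-staging";

    // Copies the tree under 'source' into 'target', skipping staging directories.
    static void copySkippingStaging(Path source, Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                Path name = dir.getFileName();
                if (!dir.equals(source) && name != null
                        && name.toString().startsWith(STAGING_PREFIX)) {
                    // Skip the staging directory and everything below it.
                    return FileVisitResult.SKIP_SUBTREE;
                }
                Files.createDirectories(target.resolve(source.relativize(dir).toString()));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.copy(file, target.resolve(source.relativize(file).toString()),
                        StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Calling FilteredCopy.copySkippingStaging(src, dst) copies data files and ordinary subdirectories while leaving any .hive-staging* directory behind.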



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 7020 - 7024)
<https://reviews.apache.org/r/50359/#comment211602>

    Yeah, let's use ctx.getTempDirForPath() here.


- Ashutosh Chauhan


On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated Aug. 9, 2016, 7:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
>     https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a 
>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> -------
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);                                                                           3.651s
> - insert into table s3dummy values (1);                                                                 39.231s
> - insert overwrite table s3dummy values (1);                                                            42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
> - insert into table s3dummy_ext values (1);                                                             45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
> - insert into table s3dummy values (1);                                                                 15.025s
> - insert overwrite table s3dummy values (1);                                                            25.149s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s      
> - from dummy insert overwrite table s3dummy select *;                                                   25.469s      
> - from dummy insert into table s3dummy select *;                                                        14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
> - insert into table s3dummy_ext values (1);                                                             16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
> - alter table s3dummypart add partition (part=1);                                                        3.229s
> - alter table s3dummypart add partition (part=2);                                                        3.124s
> - insert into table s3dummypart partition (part=1) values (1);                                          14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s
> - from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Lefty Leverenz <le...@gmail.com>.

> On Aug. 10, 2016, 5:31 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3091-3092
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469104#file1469104line3091>
> >
> >     Tiny nit:  Either make "It" lowercase or move the parenthetical sentence after the first sentence, with a final period like this:
> >     
> >     "Enable the use of scratch directories directly on blob storage systems. (It may cause performance penalties.)"

Looks good now.  +1 for the parameter descriptions.


- Lefty


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
-----------------------------------------------------------


On Aug. 10, 2016, 9:08 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated Aug. 10, 2016, 9:08 p.m.)
> 
> 


Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Lefty Leverenz <le...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (lines 3091 - 3092)
<https://reviews.apache.org/r/50359/#comment211495>

    Tiny nit:  Either make "It" lowercase or move the parenthetical sentence after the first sentence, with a final period like this:
    
    "Enable the use of scratch directories directly on blob storage systems. (It may cause performance penalties.)"


- Lefty Leverenz




Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------

(Updated Aug. 9, 2016, 7:53 p.m.)


Review request for hive.


Changes
-------

- Added a new flag that lets users use the table's blob storage location as the scratch directory.
- Other minor fixes to allow tests to pass.


Bugs: HIVE-14270
    https://issues.apache.org/jira/browse/HIVE-14270


Repository: hive-git


Description
-------

This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
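
As a rough illustration of the behavior described above (not the code in the patch), the decision can be sketched as follows. The class name, method names, scheme list, and the ".hive-staging" suffix are all assumptions made for this example; the real logic lives in the files listed under Diffs.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: pick a scratch directory for intermediate data.
public class ScratchDirSketch {
    // Assumed set of URI schemes treated as blob storage (illustrative only).
    static final Set<String> BLOB_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    // True when the location's URI scheme is one of the assumed blob storage schemes.
    static boolean isBlobStoragePath(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // If the table is on blob storage and the (hypothetical) flag is off,
    // stage intermediate data under the HDFS scratch directory; otherwise
    // stage it next to the table data.
    static String chooseScratchDir(String tableLocation,
                                   String hdfsScratchDir,
                                   boolean useBlobStoreAsScratch) {
        if (isBlobStoragePath(tableLocation) && !useBlobStoreAsScratch) {
            return hdfsScratchDir;
        }
        return tableLocation + "/.hive-staging"; // illustrative suffix
    }

    public static void main(String[] args) {
        String table = "s3a://spena-bucket/user/hive/warehouse/s3dummy";
        String hdfs = "hdfs://namenode:8020/tmp/hive"; // made-up HDFS path
        System.out.println(chooseScratchDir(table, hdfs, false));
        System.out.println(chooseScratchDir(table, hdfs, true));
    }
}
```

With the flag off, the S3 table's temporary files land on HDFS and only the final move touches S3; with the flag on, staging stays on the blob store, which may be slower.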


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a 
  common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50359/diff/


Testing
-------

NO PATCH

** NON-PARTITIONED TABLE
- create table dummy (id int);                                                                           3.651s
- insert into table s3dummy values (1);                                                                 39.231s
- insert overwrite table s3dummy values (1);                                                            42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     30.136s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       9.297s
- insert into table s3dummy_ext values (1);                                                             45.855s

WITH PATCH

** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';               3.945s
- insert into table s3dummy values (1);                                                                 15.025s
- insert overwrite table s3dummy values (1);                                                            25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy;                     19.158s
- from dummy insert overwrite table s3dummy select *;                                                   25.469s
- from dummy insert into table s3dummy select *;                                                        14.501s

** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';       4.827s
- insert into table s3dummy_ext values (1);                                                             16.070s

** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
  location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                                         3.176s
- alter table s3dummypart add partition (part=1);                                                        3.229s
- alter table s3dummypart add partition (part=2);                                                        3.124s
- insert into table s3dummypart partition (part=1) values (1);                                          14.876s
- insert overwrite table s3dummypart partition (part=1) values (1);                                     27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;             22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id;                       29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id;                            14.869s

** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart;                           18.820s
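
For the four statements that appear in both the NO PATCH and WITH PATCH lists, the speedup can be computed directly from the timings above. The snippet below is only a convenience sketch; the class and method names are made up for illustration, and it assumes each figure is a single wall-clock measurement, so the ratios are indicative rather than precise.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: ratio of "no patch" time to "with patch" time.
public class InsertSpeedup {
    static double speedup(double secondsBefore, double secondsAfter) {
        return secondsBefore / secondsAfter;
    }

    // Timings copied from the Testing section (seconds).
    static Map<String, Double> speedups() {
        Map<String, Double> result = new LinkedHashMap<>();
        result.put("insert into table s3dummy", speedup(39.231, 15.025));
        result.put("insert overwrite table s3dummy", speedup(42.569, 25.149));
        result.put("insert overwrite directory", speedup(30.136, 19.158));
        result.put("insert into table s3dummy_ext", speedup(45.855, 16.070));
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : speedups().entrySet()) {
            System.out.printf("%-32s %.2fx%n", e.getKey(), e.getValue());
        }
    }
}
```

The ratios range from roughly 1.6x for the overwrite cases to more than 2.5x for the plain inserts.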


Thanks,

Sergio Pena