Posted to dev@hive.apache.org by Sergio Pena <se...@cloudera.com> on 2016/08/04 16:29:29 UTC
Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS
when doing inserts on tables located on S3
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------
(Updated Aug. 4, 2016, 4:29 p.m.)
Review request for hive.
Changes
-------
Addressed minor comments.
Removed the code that was duplicating the rename() to S3. Instead, the patch now obtains HDFS scratch directories for the required temporary files.
Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270
Repository: hive-git
Description
-------
This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
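A minimal, self-contained sketch of the decision this description implies: when the destination is on a blob store such as S3, intermediate data goes to an HDFS scratch directory; otherwise it stays next to the destination. This is not the code from the diff. The class name ScratchDirChooser, the scheme list, the useBlobStoreAsScratchDir flag, the ".hive-staging" suffix handling, and the example HDFS paths are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the BlobStorageUtils or Context code from the diff.
final class ScratchDirChooser {
    // Assumed set of blob storage schemes; the real list is not given in this thread.
    private static final Set<String> BLOB_SCHEMES =
            new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    private final String hdfsScratchDir;            // assumed HDFS scratch location
    private final boolean useBlobStoreAsScratchDir; // assumed opt-in flag

    ScratchDirChooser(String hdfsScratchDir, boolean useBlobStoreAsScratchDir) {
        this.hdfsScratchDir = hdfsScratchDir;
        this.useBlobStoreAsScratchDir = useBlobStoreAsScratchDir;
    }

    // True when the path's URI scheme is one of the assumed blob storage schemes.
    static boolean isBlobStoragePath(URI path) {
        String scheme = path.getScheme();
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Picks where intermediate data for the given destination should be written.
    String tempDirFor(URI dest) {
        if (isBlobStoragePath(dest) && !useBlobStoreAsScratchDir) {
            // Blob store destination: keep temporary data on HDFS.
            return hdfsScratchDir;
        }
        // Otherwise use a staging directory under the destination itself.
        String base = dest.toString();
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return base + "/.hive-staging";
    }

    public static void main(String[] args) {
        ScratchDirChooser chooser =
                new ScratchDirChooser("hdfs://namenode:8020/tmp/hive/scratch", false);
        System.out.println(chooser.tempDirFor(
                URI.create("s3a://spena-bucket/user/hive/warehouse/s3dummy")));
        System.out.println(chooser.tempDirFor(
                URI.create("hdfs://namenode:8020/user/hive/warehouse/dummy")));
    }
}
```

With the flag set to false, the S3 destination resolves to the HDFS scratch directory, while the HDFS destination keeps a staging directory on its own filesystem.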
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a
common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f
ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION
Diff: https://reviews.apache.org/r/50359/diff/
Testing
-------
NO PATCH
** NON-PARTITIONED TABLE
- create table dummy (id int); 3.651s
- insert into table s3dummy values (1); 39.231s
- insert overwrite table s3dummy values (1); 42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 30.136s
** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 9.297s
- insert into table s3dummy_ext values (1); 45.855s
WITH PATCH
** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 3.945s
- insert into table s3dummy values (1); 15.025s
- insert overwrite table s3dummy values (1); 25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 19.158s
- from dummy insert overwrite table s3dummy select *; 25.469s
- from dummy insert into table s3dummy select *; 14.501s
** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 4.827s
- insert into table s3dummy_ext values (1); 16.070s
** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart'; 3.176s
- alter table s3dummypart add partition (part=1); 3.229s
- alter table s3dummypart add partition (part=2); 3.124s
- insert into table s3dummypart partition (part=1) values (1); 14.876s
- insert overwrite table s3dummypart partition (part=1) values (1); 27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart; 22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id; 29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id; 14.869s
** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart; 15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart; 18.820s
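For the statements that appear in both the NO PATCH and WITH PATCH lists, the ratio of the two timings can be computed as in the rough sketch below. The timings are copied from the lists above; the class and method names are made up for illustration. Each timing is a single run, so the ratios are only indicative.

```java
import java.util.Locale;

// Illustrative helper; the numbers are the single-run timings reported above (seconds).
final class InsertTimings {
    // Ratio of the unpatched time to the patched time.
    static double speedup(double noPatchSeconds, double withPatchSeconds) {
        return noPatchSeconds / withPatchSeconds;
    }

    public static void main(String[] args) {
        String[] statements = {
            "insert into table s3dummy values (1)",
            "insert overwrite table s3dummy values (1)",
            "insert overwrite directory ... select * from dummy",
            "insert into table s3dummy_ext values (1)"
        };
        double[] noPatch = {39.231, 42.569, 30.136, 45.855};
        double[] withPatch = {15.025, 25.149, 19.158, 16.070};

        for (int i = 0; i < statements.length; i++) {
            System.out.printf(Locale.ROOT, "%-52s %.3fs -> %.3fs (%.2fx)%n",
                    statements[i], noPatch[i], withPatch[i],
                    speedup(noPatch[i], withPatch[i]));
        }
    }
}
```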
Thanks,
Sergio Pena
Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS
when doing inserts on tables located on S3
Posted by Sergio Pena <se...@cloudera.com>.
> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 1807-1814
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469107#file1469107line1807>
> >
> > Why not use newly added Context::getTempDirForPath(Path path) here.
Yeah, sorry. This is a little confusing.
The thing is that 'tmpDir' is based on 'dest' (tmpDir = baseCtx.getExternalTmpPath(dest)), where 'dest' is an HDFS temporary directory (not S3). That is the directory that caused .hive-staging to end up on S3 when the HDFS temp dir was copied to S3 (INSERT OVERWRITE).
I found out that FileSinkDesc has a 'getDestPath' that returns the S3 path. So the condition is: if 'getDestPath' is on S3, use 'getMRTmpPath'; otherwise keep using the temporary path based on 'dest' (the HDFS temp path).
That part of the code is confusing because of the names 'dest', 'getDestPath', and 'getFinalDirName'. I tried to understand it, but I could not figure out the idea behind 'getFinalDirName' and 'getDestPath', so I ended up writing that condition. Also, the existing comments mention that the temp file should be on the same filesystem as the destination (in the case of non-blobstore directories).
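The condition described in this reply can be sketched roughly as follows. This is not the GenMapRedUtils code from the diff: the class FileSinkTmpDir, its methods, the scheme prefixes, and the example paths are hypothetical, and the three string parameters stand in for what FileSinkDesc.getDestPath(), baseCtx.getExternalTmpPath(dest), and getMRTmpPath() would return.

```java
import java.util.Locale;

// Hypothetical sketch of the condition described above; not the actual Hive code.
final class FileSinkTmpDir {
    // Assumed check: treat s3, s3a and s3n URIs as "on S3".
    static boolean isOnS3(String path) {
        String p = path.toLowerCase(Locale.ROOT);
        return p.startsWith("s3://") || p.startsWith("s3a://") || p.startsWith("s3n://");
    }

    /**
     * @param finalDestPath     stand-in for FileSinkDesc.getDestPath() (may be on S3)
     * @param externalTmpOfDest stand-in for baseCtx.getExternalTmpPath(dest)
     * @param mrTmpPath         stand-in for getMRTmpPath() (HDFS scratch)
     */
    static String chooseTmpDir(String finalDestPath, String externalTmpOfDest, String mrTmpPath) {
        // If the final destination is on S3, use the MR scratch path;
        // otherwise keep the temp path derived from 'dest'.
        return isOnS3(finalDestPath) ? mrTmpPath : externalTmpOfDest;
    }

    public static void main(String[] args) {
        // Example paths are invented for illustration.
        String mrTmp = "hdfs://namenode:8020/tmp/hive/scratch/-mr-10000";
        String extTmp = "hdfs://namenode:8020/tmp/hive/.hive-staging/-ext-10001";
        System.out.println(chooseTmpDir(
                "s3a://spena-bucket/user/hive/warehouse/s3dummy", extTmp, mrTmp));
        System.out.println(chooseTmpDir(
                "hdfs://namenode:8020/user/hive/warehouse/dummy", extTmp, mrTmp));
    }
}
```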
> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, lines 7020-7024
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469108#file1469108line7020>
> >
> > Why not use newly introduced ctx.getTempDirForPath(dest_path); here?
This part was causing 72 tests to fail because of the different scratch directory name. Also, I wasn't sure why the stats temp was in the same location as 'queryTmpdir', so I added the condition too in case it has issues with encrypted zones. I like your line better, but I wasn't sure about it, so I ended up writing this condition.
I can use 'ctx.getTempDirForPath' instead. What do you think?
> On Aug. 9, 2016, 9:49 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 6763
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469108#file1469108line6763>
> >
> > surprised that we weren't using getExternalTmpPathRelTo() here, did we miss this when we introduced this method for encrypt support work?
Mmm, I'm surprised too. Maybe we missed it.
- Sergio
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
-----------------------------------------------------------
On Aug. 9, 2016, 7:53 p.m., Sergio Pena wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
>
> (Updated Aug. 9, 2016, 7:53 p.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-14270
> https://issues.apache.org/jira/browse/HIVE-14270
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
>
>
> Diffs
> -----
>
> common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a
> common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8
> ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f
> ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/50359/diff/
>
>
> Testing
> -------
>
> NO PATCH
> ** NON-PARTITIONED TABLE
>
> - create table dummy (id int); 3.651s
> - insert into table s3dummy values (1); 39.231s
> - insert overwrite table s3dummy values (1); 42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 30.136s
>
> EXTERNAL TABLE
>
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 9.297s
> - insert into table s3dummy_ext values (1); 45.855s
>
> WITH PATCH
>
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 3.945s
> - insert into table s3dummy values (1); 15.025s
> - insert overwrite table s3dummy values (1); 25.149s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 19.158s
> - from dummy insert overwrite table s3dummy select *; 25.469s
> - from dummy insert into table s3dummy select *; 14.501s
>
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 4.827s
> - insert into table s3dummy_ext values (1); 16.070s
>
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
> location 's3a://spena-bucket/user/hive/warehouse/s3dummypart'; 3.176s
> - alter table s3dummypart add partition (part=1); 3.229s
> - alter table s3dummypart add partition (part=2); 3.124s
> - insert into table s3dummypart partition (part=1) values (1); 14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1); 27.594s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart; 22.298s
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id; 29.001s
> - from dummypart insert into table s3dummypart partition (part=1) select id; 14.869s
>
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart; 15.185s
> - insert into table s3dummypart partition (part) select id, 1 from dummypart; 18.820s
>
>
> Thanks,
>
> Sergio Pena
>
>
Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS
when doing inserts on tables located on S3
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145269
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1807 - 1814)
<https://reviews.apache.org/r/50359/#comment211415>
Why not use newly added Context::getTempDirForPath(Path path) here.
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (line 6763)
<https://reviews.apache.org/r/50359/#comment211416>
surprised that we weren't using getExternalTmpPathRelTo() here, did we miss this when we introduced this method for encrypt support work?
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 7020 - 7024)
<https://reviews.apache.org/r/50359/#comment211418>
Why not use newly introduced ctx.getTempDirForPath(dest_path); here?
- Ashutosh Chauhan
Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS
when doing inserts on tables located on S3
Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------
(Updated Aug. 10, 2016, 9:08 p.m.)
Review request for hive.
Changes
-------
Changes on this patch:
- Use getTempDirForPath() for the statistics temp file and GenMapRedUtils temp file.
Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270
Repository: hive-git
Description
-------
This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a
common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f
ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION
Diff: https://reviews.apache.org/r/50359/diff/
Testing
-------
NO PATCH
** NON-PARTITIONED TABLE
- create table dummy (id int); 3.651s
- insert into table s3dummy values (1); 39.231s
- insert overwrite table s3dummy values (1); 42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 30.136s
** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 9.297s
- insert into table s3dummy_ext values (1); 45.855s
WITH PATCH
** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 3.945s
- insert into table s3dummy values (1); 15.025s
- insert overwrite table s3dummy values (1); 25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 19.158s
- from dummy insert overwrite table s3dummy select *; 25.469s
- from dummy insert into table s3dummy select *; 14.501s
** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 4.827s
- insert into table s3dummy_ext values (1); 16.070s
** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int) location 's3a://spena-bucket/user/hive/warehouse/s3dummypart'; 3.176s
- alter table s3dummypart add partition (part=1); 3.229s
- alter table s3dummypart add partition (part=2); 3.124s
- insert into table s3dummypart partition (part=1) values (1); 14.876s
- insert overwrite table s3dummypart partition (part=1) values (1); 27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart; 22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id; 29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id; 14.869s
** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart; 15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart; 18.820s
Thanks,
Sergio Pena
Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS
when doing inserts on tables located on S3
Posted by Sergio Pena <se...@cloudera.com>.
> On Aug. 10, 2016, 6:41 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 1807-1814
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469107#file1469107line1807>
> >
> > Not able to follow : )
> > Are you doing this only to avoid copying .hive-staging dir? If so, you can use filter while copying to eliminate that, no?
Thinking more about this, I think you were right from the beginning. I can use 'getTempDirForPath(fileSinkDesc.getDestPath())', as it will use the same .hive-staging directory that is used in 'dest'.
I did some tests and it is working fine.
- Sergio
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145381
-----------------------------------------------------------
Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS
when doing inserts on tables located on S3
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145381
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1807 - 1814)
<https://reviews.apache.org/r/50359/#comment211603>
Not able to follow : )
Are you doing this only to avoid copying .hive-staging dir? If so, you can use filter while copying to eliminate that, no?
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (lines 7020 - 7024)
<https://reviews.apache.org/r/50359/#comment211602>
Yeah, let's use ctx.getTempDirForPath() here.
- Ashutosh Chauhan
Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS
when doing inserts on tables located on S3
Posted by Lefty Leverenz <le...@gmail.com>.
> On Aug. 10, 2016, 5:31 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 3091-3092
> > <https://reviews.apache.org/r/50359/diff/7/?file=1469104#file1469104line3091>
> >
> > Tiny nit: Either make "It" lowercase or move the parenthetical sentence after the first sentence, with a final period like this:
> >
> > "Enable the use of scratch directories directly on blob storage systems. (It may cause performance penalties.)"
Looks good now. +1 for the parameter descriptions.
- Lefty
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
-----------------------------------------------------------
Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS
when doing inserts on tables located on S3
Posted by Lefty Leverenz <le...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review145307
-----------------------------------------------------------
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (lines 3091 - 3092)
<https://reviews.apache.org/r/50359/#comment211495>
Tiny nit: Either make "It" lowercase or move the parenthetical sentence after the first sentence, with a final period like this:
"Enable the use of scratch directories directly on blob storage systems. (It may cause performance penalties.)"
- Lefty Leverenz
Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS
when doing inserts on tables located on S3
Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/
-----------------------------------------------------------
(Updated Aug. 9, 2016, 7:53 p.m.)
Review request for hive.
Changes
-------
- Added a new flag that allows users to use the table's blob storage location as the scratch directory.
- Other minor fixes to allow tests to pass.
Bugs: HIVE-14270
https://issues.apache.org/jira/browse/HIVE-14270
Repository: hive-git
Description
-------
This patch will create a temporary directory for Hive intermediate data on HDFS when S3 tables are used.
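As a rough illustration of the behavior described above (this is not the code in the patch), the choice of scratch location could be sketched as follows. The class name, the list of blob storage schemes, the `useBlobStoreAsScratch` parameter standing in for the new flag, and the HDFS URI are all assumptions made for this example; see the diff for the actual logic.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: pick where intermediate data should be written.
final class ScratchDirSketch {
    // Assumed set of URI schemes treated as blob storage.
    private static final Set<String> BLOB_SCHEMES =
            new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    // True when the location's scheme is one of the assumed blob storage schemes.
    static boolean isBlobStorage(URI location) {
        String scheme = location.getScheme();
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Tables on blob storage get an HDFS scratch root unless the (hypothetical)
    // flag asks for the blob store itself; all other tables keep their own
    // file system.
    static URI chooseScratchRoot(URI tableLocation, URI hdfsScratchRoot,
                                 boolean useBlobStoreAsScratch) {
        if (isBlobStorage(tableLocation) && !useBlobStoreAsScratch) {
            return hdfsScratchRoot;
        }
        return tableLocation;
    }

    public static void main(String[] args) {
        URI table = URI.create("s3a://spena-bucket/user/hive/warehouse/s3dummy");
        URI hdfs = URI.create("hdfs://namenode:8020/tmp/hive"); // assumed scratch root
        System.out.println(chooseScratchRoot(table, hdfs, false));
        System.out.println(chooseScratchRoot(table, hdfs, true));
    }
}
```

With the flag off, the S3 table's intermediate files go to the HDFS root, so only the final results need to be moved to S3; with it on, they stay on the blob store, which is the slower path.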
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f5f619359701b948f57d599a5bdc2ecbdff280a
common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/Context.java 89893eba9fd2316b9a393f06edefa837bb815faf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 5bd78862e1064d7f64a5d764571015a8df1101e8
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a01a7bdbfec962b6617e98091cdb1325c5b0e84f
ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION
Diff: https://reviews.apache.org/r/50359/diff/
Testing
-------
NO PATCH
** NON-PARTITIONED TABLE
- create table dummy (id int); 3.651s
- insert into table s3dummy values (1); 39.231s
- insert overwrite table s3dummy values (1); 42.569s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 30.136s
** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 9.297s
- insert into table s3dummy_ext values (1); 45.855s
WITH PATCH
** NON-PARTITIONED TABLE
- create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 3.945s
- insert into table s3dummy values (1); 15.025s
- insert overwrite table s3dummy values (1); 25.149s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 19.158s
- from dummy insert overwrite table s3dummy select *; 25.469s
- from dummy insert into table s3dummy select *; 14.501s
** EXTERNAL TABLE
- create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy'; 4.827s
- insert into table s3dummy_ext values (1); 16.070s
** PARTITIONED TABLE
- create table s3dummypart (id int) partitioned by (part int)
location 's3a://spena-bucket/user/hive/warehouse/s3dummypart'; 3.176s
- alter table s3dummypart add partition (part=1); 3.229s
- alter table s3dummypart add partition (part=2); 3.124s
- insert into table s3dummypart partition (part=1) values (1); 14.876s
- insert overwrite table s3dummypart partition (part=1) values (1); 27.594s
- insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart; 22.298s
- from dummypart insert overwrite table s3dummypart partition (part=1) select id; 29.001s
- from dummypart insert into table s3dummypart partition (part=1) select id; 14.869s
** DYNAMIC PARTITIONS
- insert into table s3dummypart partition (part) select id, 1 from dummypart; 15.185s
- insert into table s3dummypart partition (part) select id, 1 from dummypart; 18.820s
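To make the comparison easier to read, the statements that appear in both the NO PATCH and WITH PATCH runs can be turned into speedup ratios. The snippet below is only a convenience sketch; the class and method names are made up for this example, and the timings are copied from the lists above.

```java
// Hypothetical helper: speedup = time without patch / time with patch.
final class InsertSpeedupSketch {
    static double speedup(double secondsWithoutPatch, double secondsWithPatch) {
        return secondsWithoutPatch / secondsWithPatch;
    }

    public static void main(String[] args) {
        String[] labels = {
            "insert into table s3dummy",
            "insert overwrite table s3dummy",
            "insert overwrite directory (s3dummy)",
            "insert into table s3dummy_ext"
        };
        // Timings in seconds, taken from the Testing section.
        double[] noPatch = {39.231, 42.569, 30.136, 45.855};
        double[] withPatch = {15.025, 25.149, 19.158, 16.070};

        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-40s %.2fx%n", labels[i],
                    speedup(noPatch[i], withPatch[i]));
        }
    }
}
```

Each timing above comes from a single run, so the ratios are indicative only.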
Thanks,
Sergio Pena