Posted to dev@hive.apache.org by Deepak Jaiswal <dj...@hortonworks.com> on 2018/05/08 06:12:18 UTC
Review Request 66999: HIVE-19453
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/
-----------------------------------------------------------
Review request for hive, Jason Dere and Prasanth_J.
Bugs: HIVE-19453
https://issues.apache.org/jira/browse/HIVE-19453
Repository: hive-git
Description
-------
Extend the LOAD DATA statement to take the input format of the source files, and the SerDe used to interpret them, as parameters. For example:
load data local inpath '../../data/files/load_data_job/partitions/load_data_2_partitions.txt' INTO TABLE srcbucket_mapjoin
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe';
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g a837d67b96
ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 2b88ea651b
ql/src/test/queries/clientpositive/load_data_using_job.q 3928f1fa07
ql/src/test/results/clientpositive/llap/load_data_using_job.q.out 116630c237
Diff: https://reviews.apache.org/r/66999/diff/1/
Testing
-------
Added a test to load_data_using_job.q
Thanks,
Deepak Jaiswal
Re: Review Request 66999: HIVE-19453
Posted by Deepak Jaiswal <dj...@hortonworks.com>.
> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
> > Line 838 (original), 839 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017799#file2017799line839>
> >
> > Should the inputFileFormat expression be aliased, like '(inputFileFmt=inputFileFormat)?', and referenced in the line below as '$inputFileFmt?'
I followed what createTableStatement does for the file format; it similarly has tableFileFormat, which expands into the entire formatting clause.
> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
> > Line 839 (original), 840 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017799#file2017799line840>
> >
> > It might be useful to be able to pass in SerDe params, which are used to initialize the SerDe; this could be useful for some SerDes. For example, LazySimpleSerDe allows you to pass in the field separator or set the timestamp format.
I will take a look, thanks.
> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
> > Lines 475 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017800#file2017800line475>
> >
> > Is this supposed to be set using the class name (String), or the actual class object (Class<?>)?
> > Do the inputFormat/serde classes need to be validated here?
It takes strings. The call validates the names and throws an exception if a class is not found, hence the try-catch around it.
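As a rough illustration of validating a class name by loading it, here is a minimal sketch. It is not the code from the patch: the class FormatClassCheck and the method requireClass are hypothetical names, and plain Class.forName stands in for whatever call LoadSemanticAnalyzer actually makes.

```java
// Hypothetical helper, not part of Hive: resolves a class name given as a String
// and turns a failed lookup into an unchecked exception.
class FormatClassCheck {

    /** Returns the loaded class, or throws IllegalArgumentException if the name cannot be resolved. */
    static Class<?> requireClass(String className) {
        try {
            // The lookup itself is the validation: an unknown name fails here.
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not found: " + className, e);
        }
    }

    public static void main(String[] args) {
        // A name that is always on the classpath resolves normally.
        System.out.println(requireClass("java.lang.String").getName());
        try {
            // Made-up name, used only to show the failure path.
            requireClass("org.example.NoSuchInputFormat");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The statement itself only ever carries the names as strings; the lookup is what rejects a misspelled or missing class before any job is planned.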
> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/test/queries/clientpositive/load_data_using_job.q
> > Lines 90 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017801#file2017801line90>
> >
> > Previously what would indicate to Hive that an INSERT plan was required, as opposed to just saving the data as-is like is done for a traditional LOAD DATA?
Previously, if the table was partitioned and no partition was provided, the load would error out; now it launches a job. More generally, in any case where it would otherwise throw an exception because there is too little information for a file operation but enough to launch an insert job, it launches one.
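The partitioned-table case can be sketched as a small decision function. This is only an illustration: LoadPlanChooser, Plan, and the two boolean inputs are hypothetical and cover just the one case named above, not the full set of conditions the analyzer checks.

```java
// Hypothetical model of the decision, not code from LoadSemanticAnalyzer.
class LoadPlanChooser {

    enum Plan { MOVE_FILES, INSERT_JOB }

    /**
     * Picks a plan for a LOAD DATA statement.
     * tablePartitioned: the target table has partition columns.
     * partitionSpecGiven: the statement names the target partition.
     */
    static Plan choose(boolean tablePartitioned, boolean partitionSpecGiven) {
        if (tablePartitioned && !partitionSpecGiven) {
            // Not enough information for a plain file operation,
            // so fall back to an insert job instead of failing.
            return Plan.INSERT_JOB;
        }
        // Enough information to place the files directly.
        return Plan.MOVE_FILES;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false));
        System.out.println(choose(true, true));
        System.out.println(choose(false, false));
    }
}
```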
- Deepak
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/#review202699
-----------------------------------------------------------
Re: Review Request 66999: HIVE-19453
Posted by Deepak Jaiswal <dj...@hortonworks.com>.
> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
> > Line 839 (original), 840 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017799#file2017799line840>
> >
> > It might be useful to be able to pass in SerDe params, which are used to initialize the SerDe; this could be useful for some SerDes. For example, LazySimpleSerDe allows you to pass in the field separator or set the timestamp format.
>
> Deepak Jaiswal wrote:
> I will take a look, thanks.
https://issues.apache.org/jira/browse/HIVE-19478 will follow this up. Thanks for bringing this up.
- Deepak
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/#review202699
-----------------------------------------------------------
Re: Review Request 66999: HIVE-19453
Posted by Jason Dere <jd...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/#review202699
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
Line 838 (original), 839 (patched)
<https://reviews.apache.org/r/66999/#comment284680>
Should the inputFileFormat expression be aliased, like '(inputFileFmt=inputFileFormat)?', and referenced in the line below as '$inputFileFmt?'
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
Line 839 (original), 840 (patched)
<https://reviews.apache.org/r/66999/#comment284686>
It might be useful to be able to pass in SerDe params, which are used to initialize the SerDe; this could be useful for some SerDes. For example, LazySimpleSerDe allows you to pass in the field separator or set the timestamp format.
ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
Lines 475 (patched)
<https://reviews.apache.org/r/66999/#comment284684>
Is this supposed to be set using the class name (String), or the actual class object (Class<?>)?
Do the inputFormat/serde classes need to be validated here?
ql/src/test/queries/clientpositive/load_data_using_job.q
Lines 90 (patched)
<https://reviews.apache.org/r/66999/#comment284685>
Previously what would indicate to Hive that an INSERT plan was required, as opposed to just saving the data as-is like is done for a traditional LOAD DATA?
- Jason Dere
Re: Review Request 66999: HIVE-19453
Posted by Deepak Jaiswal <dj...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/#review202615
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
Lines 481 (patched)
<https://reviews.apache.org/r/66999/#comment284505>
Please ignore this commented-out code; I removed it in my local patch.
- Deepak Jaiswal