Posted to dev@hive.apache.org by Deepak Jaiswal <dj...@hortonworks.com> on 2018/05/08 06:12:18 UTC

Review Request 66999: HIVE-19453

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/
-----------------------------------------------------------

Review request for hive, Jason Dere and Prasanth_J.


Bugs: HIVE-19453
    https://issues.apache.org/jira/browse/HIVE-19453


Repository: hive-git


Description
-------

Extend the LOAD DATA statement to accept, as parameters, the input format of the source files and the SerDe used to interpret them. For example:
 
load data local inpath '../../data/files/load_data_job/partitions/load_data_2_partitions.txt' INTO TABLE srcbucket_mapjoin
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe';


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g a837d67b96 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 2b88ea651b 
  ql/src/test/queries/clientpositive/load_data_using_job.q 3928f1fa07 
  ql/src/test/results/clientpositive/llap/load_data_using_job.q.out 116630c237 


Diff: https://reviews.apache.org/r/66999/diff/1/


Testing
-------

Added a test to load_data_using_job.q


Thanks,

Deepak Jaiswal


Re: Review Request 66999: HIVE-19453

Posted by Deepak Jaiswal <dj...@hortonworks.com>.

> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
> > Line 838 (original), 839 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017799#file2017799line839>
> >
> >     Should the inputFileFormat expression be aliased, like '(inputFileFmt=inputFileFormat)?', and referenced in the line below as '$inputFileFmt?'

I followed what createTableStatement does to handle the file format: it similarly uses tableFileFormat, which expands into the entire format clause.


> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
> > Line 839 (original), 840 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017799#file2017799line840>
> >
> >     Might be useful to be able to pass in SerDe params which are used to initialize the SerDe - this could be useful for some SerDes. For example LazySimpleSerDe allows you to pass in the field separator, or set the timestamp format etc.

I will take a look, thanks.


> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
> > Lines 475 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017800#file2017800line475>
> >
> >     Is this supposed to be set using the class name (String), or the actual class object (Class<?>)?
> >     Do the inputFormat/serde classes need to be validated here?

It takes strings. The call validates the class names and throws an exception if a class is not found, hence the try-catch around it.
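
A minimal sketch of the validate-by-name pattern described in this reply, not the code from the patch. The class ClassNameCheck and the method resolve are hypothetical names, and JDK classes stand in for the Hive InputFormat and SerDe classes so that the example runs without Hive on the classpath.

```java
/** Hypothetical helper for illustration; not taken from LoadSemanticAnalyzer. */
final class ClassNameCheck {

    /**
     * Resolves a fully qualified class name passed as a String.
     * Throws IllegalArgumentException when the class is not on the classpath.
     */
    static Class<?> resolve(String className) {
        try {
            // initialize=false: only check that the class can be found,
            // without running its static initializers.
            return Class.forName(className, false, ClassNameCheck.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not found: " + className, e);
        }
    }

    public static void main(String[] args) {
        // JDK class names used as stand-ins for an input format and a SerDe.
        System.out.println(resolve("java.util.ArrayList").getName());
        try {
            resolve("no.such.InputFormat");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at analysis time in this way means a misspelled class name is reported before any job is launched.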


> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/test/queries/clientpositive/load_data_using_job.q
> > Lines 90 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017801#file2017801line90>
> >
> >     Previously what would indicate to Hive that an INSERT plan was required, as opposed to just saving the data as-is like is done for a traditional LOAD DATA?

Previously, if the table was partitioned and no partition was provided, the statement would error out; now it launches a job. More generally, whenever the statement would otherwise throw an exception because there is insufficient information for a file operation but sufficient information to launch an insert job, it launches one.
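
The partitioned-table case in this reply can be sketched as a small decision function. This is a simplified, hypothetical model: the names LoadPlanSketch, Plan and choose are assumptions made for illustration, and it covers only the one case spelled out above, not the actual checks in LoadSemanticAnalyzer.

```java
/** Hypothetical, simplified model of the described behavior; not code from the patch. */
final class LoadPlanSketch {

    /** The two ways a LOAD DATA statement can be carried out in this model. */
    enum Plan { MOVE_FILES, INSERT_JOB }

    /**
     * A partitioned table loaded without a partition spec cannot be handled
     * as a plain file operation, so an insert job is chosen instead of failing.
     */
    static Plan choose(boolean tablePartitioned, boolean partitionSpecGiven) {
        if (tablePartitioned && !partitionSpecGiven) {
            return Plan.INSERT_JOB;
        }
        return Plan.MOVE_FILES;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false));
        System.out.println(choose(true, true));
        System.out.println(choose(false, false));
    }
}
```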


- Deepak


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/#review202699
-----------------------------------------------------------




Re: Review Request 66999: HIVE-19453

Posted by Deepak Jaiswal <dj...@hortonworks.com>.

> On May 8, 2018, 11:10 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
> > Line 839 (original), 840 (patched)
> > <https://reviews.apache.org/r/66999/diff/1/?file=2017799#file2017799line840>
> >
> >     Might be useful to be able to pass in SerDe params which are used to initialize the SerDe - this could be useful for some SerDes. For example LazySimpleSerDe allows you to pass in the field separator, or set the timestamp format etc.
> 
> Deepak Jaiswal wrote:
>     I will take a look, thanks.

https://issues.apache.org/jira/browse/HIVE-19478 will follow this up. Thanks for bringing this up.


- Deepak


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/#review202699
-----------------------------------------------------------




Re: Review Request 66999: HIVE-19453

Posted by Jason Dere <jd...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/#review202699
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
Line 838 (original), 839 (patched)
<https://reviews.apache.org/r/66999/#comment284680>

    Should the inputFileFormat expression be aliased, like '(inputFileFmt=inputFileFormat)?', and referenced in the line below as '$inputFileFmt?'



ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
Line 839 (original), 840 (patched)
<https://reviews.apache.org/r/66999/#comment284686>

    Might be useful to be able to pass in SerDe params which are used to initialize the SerDe - this could be useful for some SerDes. For example LazySimpleSerDe allows you to pass in the field separator, or set the timestamp format etc.



ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
Lines 475 (patched)
<https://reviews.apache.org/r/66999/#comment284684>

    Is this supposed to be set using the class name (String), or the actual class object (Class<?>)?
    Do the inputFormat/serde classes need to be validated here?



ql/src/test/queries/clientpositive/load_data_using_job.q
Lines 90 (patched)
<https://reviews.apache.org/r/66999/#comment284685>

    Previously what would indicate to Hive that an INSERT plan was required, as opposed to just saving the data as-is like is done for a traditional LOAD DATA?


- Jason Dere




Re: Review Request 66999: HIVE-19453

Posted by Deepak Jaiswal <dj...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66999/#review202615
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
Lines 481 (patched)
<https://reviews.apache.org/r/66999/#comment284505>

    Please ignore this commented-out code; I have removed it in my local patch.


- Deepak Jaiswal

