You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Deepak Jaiswal <dj...@hortonworks.com> on 2018/05/01 04:50:27 UTC

Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/
-----------------------------------------------------------

(Updated May 1, 2018, 4:50 a.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, Prasanth_J, and Vineet Garg.


Changes
-------

Handle nested subdirs.


Bugs: HIVE-19311
    https://issues.apache.org/jira/browse/HIVE-19311


Repository: hive-git


Description
-------

Currently, "load data" statement is very limited. It errors out if any of the information is missing such as partitioning info if table is partitioned or appropriate names when table is bucketed.
It should be able to launch an insert job to load the data instead.


Diffs (updated)
-----

  data/files/load_data_job/bucketing.txt PRE-CREATION 
  data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
  data/files/load_data_job/partitions/subdir/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/subdir/load_data_2_partitions.txt PRE-CREATION 
  itests/src/test/resources/testconfiguration.properties 2ca7b5f63b 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 94dd63641d 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java abd678bb54 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java fad0e5c24a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec 
  ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2 
  ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c 
  ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
  ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311 
  ql/src/test/results/clientnegative/nopart_load.q.out 881514640c 
  ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/66805/diff/6/

Changes: https://reviews.apache.org/r/66805/diff/5-6/


Testing
-------

Added a unit test.


Thanks,

Deepak Jaiswal


Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

Posted by j....@gmail.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/#review202219
-----------------------------------------------------------


Ship it!




Ship It!

- Prasanth_J


On May 1, 2018, 9:06 p.m., Deepak Jaiswal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66805/
> -----------------------------------------------------------
> 
> (Updated May 1, 2018, 9:06 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, Prasanth_J, and Vineet Garg.
> 
> 
> Bugs: HIVE-19311
>     https://issues.apache.org/jira/browse/HIVE-19311
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently, "load data" statement is very limited. It errors out if any of the information is missing such as partitioning info if table is partitioned or appropriate names when table is bucketed.
> It should be able to launch an insert job to load the data instead.
> 
> 
> Diffs
> -----
> 
>   data/files/load_data_job/bucketing.txt PRE-CREATION 
>   data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
>   data/files/load_data_job/partitions/subdir/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/subdir/load_data_2_partitions.txt PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 2ca7b5f63b 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 94dd63641d 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java abd678bb54 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java fad0e5c24a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec 
>   ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2 
>   ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c 
>   ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
>   ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311 
>   ql/src/test/results/clientnegative/nopart_load.q.out 881514640c 
>   ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66805/diff/8/
> 
> 
> Testing
> -------
> 
> Added a unit test.
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>


Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

Posted by Deepak Jaiswal <dj...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/
-----------------------------------------------------------

(Updated May 1, 2018, 9:10 p.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, Prasanth_J, and Vineet Garg.


Changes
-------

accidentally, uploaded wrong patch. Please ignore version 8.


Bugs: HIVE-19311
    https://issues.apache.org/jira/browse/HIVE-19311


Repository: hive-git


Description
-------

Currently, "load data" statement is very limited. It errors out if any of the information is missing such as partitioning info if table is partitioned or appropriate names when table is bucketed.
It should be able to launch an insert job to load the data instead.


Diffs (updated)
-----

  data/files/load_data_job/bucketing.txt PRE-CREATION 
  data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
  data/files/load_data_job/partitions/subdir/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/subdir/load_data_2_partitions.txt PRE-CREATION 
  itests/src/test/resources/testconfiguration.properties 2ca7b5f63b 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 94dd63641d 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java abd678bb54 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java fad0e5c24a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec 
  ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2 
  ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c 
  ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
  ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311 
  ql/src/test/results/clientnegative/nopart_load.q.out 881514640c 
  ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/66805/diff/9/

Changes: https://reviews.apache.org/r/66805/diff/8-9/


Testing
-------

Added a unit test.


Thanks,

Deepak Jaiswal


Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

Posted by Deepak Jaiswal <dj...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/
-----------------------------------------------------------

(Updated May 1, 2018, 9:06 p.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, Prasanth_J, and Vineet Garg.


Changes
-------

Implemented the new review comment.


Bugs: HIVE-19311
    https://issues.apache.org/jira/browse/HIVE-19311


Repository: hive-git


Description
-------

Currently, "load data" statement is very limited. It errors out if any of the information is missing such as partitioning info if table is partitioned or appropriate names when table is bucketed.
It should be able to launch an insert job to load the data instead.


Diffs (updated)
-----

  data/files/load_data_job/bucketing.txt PRE-CREATION 
  data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
  data/files/load_data_job/partitions/subdir/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/subdir/load_data_2_partitions.txt PRE-CREATION 
  itests/src/test/resources/testconfiguration.properties 2ca7b5f63b 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 94dd63641d 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java abd678bb54 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java fad0e5c24a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec 
  ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2 
  ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c 
  ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
  ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311 
  ql/src/test/results/clientnegative/nopart_load.q.out 881514640c 
  ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/66805/diff/8/

Changes: https://reviews.apache.org/r/66805/diff/7-8/


Testing
-------

Added a unit test.


Thanks,

Deepak Jaiswal


Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

Posted by j....@gmail.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/#review202211
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Lines 1255 (patched)
<https://reviews.apache.org/r/66805/#comment283963>

    Instead of doing instanceof check here, you would get the same behaviour if you Override getAllOutputs() methods in LoadSemanticAnalyzer right?


- Prasanth_J


On May 1, 2018, 7:56 p.m., Deepak Jaiswal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66805/
> -----------------------------------------------------------
> 
> (Updated May 1, 2018, 7:56 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, Prasanth_J, and Vineet Garg.
> 
> 
> Bugs: HIVE-19311
>     https://issues.apache.org/jira/browse/HIVE-19311
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently, "load data" statement is very limited. It errors out if any of the information is missing such as partitioning info if table is partitioned or appropriate names when table is bucketed.
> It should be able to launch an insert job to load the data instead.
> 
> 
> Diffs
> -----
> 
>   data/files/load_data_job/bucketing.txt PRE-CREATION 
>   data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
>   data/files/load_data_job/partitions/subdir/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/subdir/load_data_2_partitions.txt PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 2ca7b5f63b 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 94dd63641d 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java abd678bb54 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java fad0e5c24a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec 
>   ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2 
>   ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c 
>   ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
>   ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311 
>   ql/src/test/results/clientnegative/nopart_load.q.out 881514640c 
>   ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66805/diff/7/
> 
> 
> Testing
> -------
> 
> Added a unit test.
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>


Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

Posted by Deepak Jaiswal <dj...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/
-----------------------------------------------------------

(Updated May 1, 2018, 7:56 p.m.)


Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, Prasanth_J, and Vineet Garg.


Changes
-------

Implemented Prasanth's review comments.


Bugs: HIVE-19311
    https://issues.apache.org/jira/browse/HIVE-19311


Repository: hive-git


Description
-------

Currently, "load data" statement is very limited. It errors out if any of the information is missing such as partitioning info if table is partitioned or appropriate names when table is bucketed.
It should be able to launch an insert job to load the data instead.


Diffs (updated)
-----

  data/files/load_data_job/bucketing.txt PRE-CREATION 
  data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
  data/files/load_data_job/partitions/subdir/load_data_1_partition.txt PRE-CREATION 
  data/files/load_data_job/partitions/subdir/load_data_2_partitions.txt PRE-CREATION 
  itests/src/test/resources/testconfiguration.properties 2ca7b5f63b 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 94dd63641d 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java abd678bb54 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java fad0e5c24a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec 
  ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2 
  ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c 
  ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
  ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311 
  ql/src/test/results/clientnegative/nopart_load.q.out 881514640c 
  ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/66805/diff/7/

Changes: https://reviews.apache.org/r/66805/diff/6-7/


Testing
-------

Added a unit test.


Thanks,

Deepak Jaiswal


Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

Posted by Deepak Jaiswal <dj...@hortonworks.com>.

> On May 1, 2018, 6:23 a.m., Prasanth_J wrote:
> >

Thanks for the review. Working on the review comments.


> On May 1, 2018, 6:23 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
> > Lines 241 (patched)
> > <https://reviews.apache.org/r/66805/diff/6/?file=2014982#file2014982line270>
> >
> >     Is useSuper required here? I can see why you are doing this. But can you make this decision based on where tempTable is set in the ctx or not? If tempTable is set in the ctx then it is rewritten load statement. Alternatively you could use a separate boolean in ctx to say if load query is rewritten or not based on which you can construct the if condition.

Makes sense. Will do the changes, thanks.


> On May 1, 2018, 6:23 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
> > Lines 483 (patched)
> > <https://reviews.apache.org/r/66805/diff/6/?file=2014982#file2014982line515>
> >
> >     Why is this required? Isn't load data "insert" operation type?

Cleanup issue. Will remove it. Thanks for catching it.


> On May 1, 2018, 6:23 a.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> > Lines 1254 (patched)
> > <https://reviews.apache.org/r/66805/diff/6/?file=2014983#file2014983line1255>
> >
> >     could you add a comment to say why does load require a shallow copy?

Sure.


> On May 1, 2018, 6:23 a.m., Prasanth_J wrote:
> > ql/src/test/queries/clientpositive/load_data_using_job.q
> > Lines 72 (patched)
> > <https://reviews.apache.org/r/66805/diff/6/?file=2014988#file2014988line72>
> >
> >     Could you also add a case for multi-column bucketing?
> >     I think Eric also asked the same question, does this handle multi-level subdirs? IIRC load only recurses to 1 or 2 levels.

I can add multi column bucketing.
Eric was talking about multi column partitioning which creates multi-level subdirs. Currently load fails if it sees a subdirs. It is now handled in latest patch.


- Deepak


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/#review202190
-----------------------------------------------------------


On May 1, 2018, 4:50 a.m., Deepak Jaiswal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66805/
> -----------------------------------------------------------
> 
> (Updated May 1, 2018, 4:50 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, Prasanth_J, and Vineet Garg.
> 
> 
> Bugs: HIVE-19311
>     https://issues.apache.org/jira/browse/HIVE-19311
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently, "load data" statement is very limited. It errors out if any of the information is missing such as partitioning info if table is partitioned or appropriate names when table is bucketed.
> It should be able to launch an insert job to load the data instead.
> 
> 
> Diffs
> -----
> 
>   data/files/load_data_job/bucketing.txt PRE-CREATION 
>   data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
>   data/files/load_data_job/partitions/subdir/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/subdir/load_data_2_partitions.txt PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 2ca7b5f63b 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 94dd63641d 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java abd678bb54 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java fad0e5c24a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec 
>   ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2 
>   ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c 
>   ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
>   ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311 
>   ql/src/test/results/clientnegative/nopart_load.q.out 881514640c 
>   ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66805/diff/6/
> 
> 
> Testing
> -------
> 
> Added a unit test.
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>


Re: Review Request 66805: HIVE-19311 : Partition and bucketing support for “load data” statement

Posted by j....@gmail.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66805/#review202190
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
Lines 241 (patched)
<https://reviews.apache.org/r/66805/#comment283925>

    Is useSuper required here? I can see why you are doing this. But can you make this decision based on where tempTable is set in the ctx or not? If tempTable is set in the ctx then it is rewritten load statement. Alternatively you could use a separate boolean in ctx to say if load query is rewritten or not based on which you can construct the if condition.



ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
Line 255 (original), 254 (patched)
<https://reviews.apache.org/r/66805/#comment283926>

    This can be inlined at the reparse method.



ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
Lines 483 (patched)
<https://reviews.apache.org/r/66805/#comment283923>

    Why is this required? Isn't load data "insert" operation type?



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Lines 1254 (patched)
<https://reviews.apache.org/r/66805/#comment283927>

    could you add a comment to say why does load require a shallow copy?



ql/src/test/queries/clientpositive/load_data_using_job.q
Lines 72 (patched)
<https://reviews.apache.org/r/66805/#comment283928>

    Could you also add a case for multi-column bucketing?
    I think Eric also asked the same question, does this handle multi-level subdirs? IIRC load only recurses to 1 or 2 levels.


- Prasanth_J


On May 1, 2018, 4:50 a.m., Deepak Jaiswal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66805/
> -----------------------------------------------------------
> 
> (Updated May 1, 2018, 4:50 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Eugene Koifman, Jesús Camacho Rodríguez, Prasanth_J, and Vineet Garg.
> 
> 
> Bugs: HIVE-19311
>     https://issues.apache.org/jira/browse/HIVE-19311
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Currently, "load data" statement is very limited. It errors out if any of the information is missing such as partitioning info if table is partitioned or appropriate names when table is bucketed.
> It should be able to launch an insert job to load the data instead.
> 
> 
> Diffs
> -----
> 
>   data/files/load_data_job/bucketing.txt PRE-CREATION 
>   data/files/load_data_job/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/load_data_2_partitions.txt PRE-CREATION 
>   data/files/load_data_job/partitions/subdir/load_data_1_partition.txt PRE-CREATION 
>   data/files/load_data_job/partitions/subdir/load_data_2_partitions.txt PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 2ca7b5f63b 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 0fedf0e76e 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 94dd63641d 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java abd678bb54 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java c07991d434 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java fad0e5c24a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 2f3b07f4af 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java ec8c1507ec 
>   ql/src/test/queries/clientnegative/load_part_nospec.q 81517991b2 
>   ql/src/test/queries/clientnegative/nopart_load.q 966982fd5c 
>   ql/src/test/queries/clientpositive/load_data_using_job.q PRE-CREATION 
>   ql/src/test/results/clientnegative/load_part_nospec.q.out bebaf92311 
>   ql/src/test/results/clientnegative/nopart_load.q.out 881514640c 
>   ql/src/test/results/clientpositive/llap/load_data_using_job.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66805/diff/6/
> 
> 
> Testing
> -------
> 
> Added a unit test.
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>