You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Marta Kuczora <ku...@cloudera.com> on 2016/11/24 15:17:22 UTC

Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54065/
-----------------------------------------------------------

Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.


Bugs: HIVE-15282
    https://issues.apache.org/jira/browse/HIVE-15282


Repository: hive-git


Description
-------

Changed the way how the modification time is determined for partitions in the DDLTask.alterIndex method to be the same as when the index staleness is checked. Instead of using the modification date of the partition folder, go through the files in the folder and use the highest modification time and save it as index property. With this we can avoid the issue when the folder and the file is created when the second turns. So the modification time of the folder is in the previous second compared to the modification time of the file.
If the partition folder doesn't contain any files, then use the folder's modification time, just as before.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cfece77 

Diff: https://reviews.apache.org/r/54065/diff/


Testing
-------

Ran the index_auto_mult_tables_compact and index_auto_mult_tables q tests multiple times, with hard-coded delay with which the test failure described in HIVE-15282 could be reproduced. With the patch, the tests were always successful.
Also ran all index related q tests.


Thanks,

Marta Kuczora


Re: Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Nov. 24, 2016, 5:15 p.m., Barna Zsombor Klara wrote:
> > Thanks for the patch.
> > I haven't looked into it closely, but I guess there is no way to write a junit test to cover it?

I tried to figure out how this could be unit tested. The problem was that a lot of things had to be mocked and I ended up with a huge, complicated unit test.


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54065/#review156858
-----------------------------------------------------------


On Dec. 12, 2016, 1:04 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54065/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2016, 1:04 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Chaoyu Tang, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-15282
>     https://issues.apache.org/jira/browse/HIVE-15282
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the way how the modification time is determined for partitions in the DDLTask.alterIndex method to be the same as when the index staleness is checked. Instead of using the modification date of the partition folder, go through the files in the folder and use the highest modification time and save it as index property. With this we can avoid the issue when the folder and the file is created when the second turns. So the modification time of the folder is in the previous second compared to the modification time of the file.
> If the partition folder doesn't contain any files, then use the folder's modification time, just as before.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cfece77 
> 
> 
> Diff: https://reviews.apache.org/r/54065/diff/1/
> 
> 
> Testing
> -------
> 
> Ran the index_auto_mult_tables_compact and index_auto_mult_tables q tests multiple times, with hard-coded delay with which the test failure described in HIVE-15282 could be reproduced. With the patch, the tests were always successful.
> Also ran all index related q tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Nov. 24, 2016, 5:15 p.m., Barna Zsombor Klara wrote:
> > Thanks for the patch.
> > I haven't looked into it closely, but I guess there is no way to write a junit test to cover it?
> 
> Marta Kuczora wrote:
>     I tried to figure out how this could be unit tested. The problem was that a lot of things had to be mocked and I ended up with a huge, complicated unit test.

Thanks a lot for the review by the way! :)


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54065/#review156858
-----------------------------------------------------------


On Dec. 12, 2016, 1:04 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54065/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2016, 1:04 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Chaoyu Tang, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-15282
>     https://issues.apache.org/jira/browse/HIVE-15282
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the way how the modification time is determined for partitions in the DDLTask.alterIndex method to be the same as when the index staleness is checked. Instead of using the modification date of the partition folder, go through the files in the folder and use the highest modification time and save it as index property. With this we can avoid the issue when the folder and the file is created when the second turns. So the modification time of the folder is in the previous second compared to the modification time of the file.
> If the partition folder doesn't contain any files, then use the folder's modification time, just as before.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cfece77 
> 
> 
> Diff: https://reviews.apache.org/r/54065/diff/1/
> 
> 
> Testing
> -------
> 
> Ran the index_auto_mult_tables_compact and index_auto_mult_tables q tests multiple times, with hard-coded delay with which the test failure described in HIVE-15282 could be reproduced. With the patch, the tests were always successful.
> Also ran all index related q tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

Posted by Barna Zsombor Klara <zs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54065/#review156858
-----------------------------------------------------------



Thanks for the patch.
I haven't looked into it closely, but I guess there is no way to write a junit test to cover it?

- Barna Zsombor Klara


On Nov. 24, 2016, 3:17 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54065/
> -----------------------------------------------------------
> 
> (Updated Nov. 24, 2016, 3:17 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-15282
>     https://issues.apache.org/jira/browse/HIVE-15282
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the way how the modification time is determined for partitions in the DDLTask.alterIndex method to be the same as when the index staleness is checked. Instead of using the modification date of the partition folder, go through the files in the folder and use the highest modification time and save it as index property. With this we can avoid the issue when the folder and the file is created when the second turns. So the modification time of the folder is in the previous second compared to the modification time of the file.
> If the partition folder doesn't contain any files, then use the folder's modification time, just as before.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cfece77 
> 
> Diff: https://reviews.apache.org/r/54065/diff/
> 
> 
> Testing
> -------
> 
> Ran the index_auto_mult_tables_compact and index_auto_mult_tables q tests multiple times, with hard-coded delay with which the test failure described in HIVE-15282 could be reproduced. With the patch, the tests were always successful.
> Also ran all index related q tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Nov. 28, 2016, 11:10 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
> > Line 967 (original), 978 (patched)
> > <https://reviews.apache.org/r/54065/diff/1/?file=1570137#file1570137line978>
> >
> >     We might want to use the dataLocation local variable here

Thanks a lot for the review!
You are right, I fixed it.


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54065/#review157014
-----------------------------------------------------------


On Dec. 12, 2016, 1:04 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54065/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2016, 1:04 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Chaoyu Tang, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-15282
>     https://issues.apache.org/jira/browse/HIVE-15282
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the way how the modification time is determined for partitions in the DDLTask.alterIndex method to be the same as when the index staleness is checked. Instead of using the modification date of the partition folder, go through the files in the folder and use the highest modification time and save it as index property. With this we can avoid the issue when the folder and the file is created when the second turns. So the modification time of the folder is in the previous second compared to the modification time of the file.
> If the partition folder doesn't contain any files, then use the folder's modification time, just as before.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cfece77 
> 
> 
> Diff: https://reviews.apache.org/r/54065/diff/1/
> 
> 
> Testing
> -------
> 
> Ran the index_auto_mult_tables_compact and index_auto_mult_tables q tests multiple times, with hard-coded delay with which the test failure described in HIVE-15282 could be reproduced. With the patch, the tests were always successful.
> Also ran all index related q tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Re: Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

Posted by Peter Vary <pv...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54065/#review157014
-----------------------------------------------------------



Thanks for the patch!

Nice detailed description on the Jira.

Thansk for tracking this issue dowm!


ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (line 978)
<https://reviews.apache.org/r/54065/#comment227393>

    We might want to use the dataLocation local variable here


- Peter Vary


On Nov. 24, 2016, 3:17 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54065/
> -----------------------------------------------------------
> 
> (Updated Nov. 24, 2016, 3:17 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-15282
>     https://issues.apache.org/jira/browse/HIVE-15282
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed the way how the modification time is determined for partitions in the DDLTask.alterIndex method to be the same as when the index staleness is checked. Instead of using the modification date of the partition folder, go through the files in the folder and use the highest modification time and save it as index property. With this we can avoid the issue when the folder and the file is created when the second turns. So the modification time of the folder is in the previous second compared to the modification time of the file.
> If the partition folder doesn't contain any files, then use the folder's modification time, just as before.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cfece77 
> 
> Diff: https://reviews.apache.org/r/54065/diff/
> 
> 
> Testing
> -------
> 
> Ran the index_auto_mult_tables_compact and index_auto_mult_tables q tests multiple times, with hard-coded delay with which the test failure described in HIVE-15282 could be reproduced. With the patch, the tests were always successful.
> Also ran all index related q tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>