Posted to dev@hive.apache.org by Sergio Pena <se...@cloudera.com> on 2017/01/27 16:18:36 UTC

Re: Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54065/#review163281
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (line 968)
<https://reviews.apache.org/r/54065/#comment234715>

    Shouldn't it be easier to first set lastModificationTime to the dataLocation modification time, and then compare this value with the rest of the partitions found? This way we could avoid the null value and the Long object, and use a primitive long instead.



ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (line 972)
<https://reviews.apache.org/r/54065/#comment234712>

    If this condition is never met, lastModificationTime will end up null, and basePartTs will contain a null value. Should we fall back to the dataLocation timestamp when the condition is never met?
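The two comments above describe a common Java accumulator pattern. The following is a minimal sketch, not the code from the patch: the class and method names (`ModTimeSketch`, `latestWithBoxedLong`, `latestSeededWithFolder`) are hypothetical, and the folder and file modification times are assumed to be passed in as plain millisecond values. It contrasts a boxed `Long` accumulator that can stay null with a primitive `long` seeded from the folder time.

```java
import java.util.List;

// Hypothetical helper illustrating the review suggestion; not the actual DDLTask code.
public class ModTimeSketch {

    // Null-prone variant: the accumulator starts as null and is only set inside
    // the loop, so it stays null when there are no file times at all.
    static Long latestWithBoxedLong(List<Long> fileTimes) {
        Long lastModificationTime = null;
        for (long t : fileTimes) {
            if (lastModificationTime == null || t > lastModificationTime) {
                lastModificationTime = t;
            }
        }
        return lastModificationTime; // null when fileTimes is empty
    }

    // Suggested variant: seed with the folder's own modification time and keep
    // the maximum. A primitive long cannot be null, and an empty folder
    // automatically falls back to the folder time.
    static long latestSeededWithFolder(long folderTime, List<Long> fileTimes) {
        long lastModificationTime = folderTime;
        for (long t : fileTimes) {
            if (t > lastModificationTime) {
                lastModificationTime = t;
            }
        }
        return lastModificationTime;
    }

    public static void main(String[] args) {
        long folder = 1_000L;
        System.out.println(latestWithBoxedLong(List.of()));                        // null
        System.out.println(latestSeededWithFolder(folder, List.of()));            // 1000
        System.out.println(latestSeededWithFolder(folder, List.of(999L, 1_500L))); // 1500
    }
}
```

One design consequence worth keeping in mind: the seeded variant returns max(folder time, file times), which differs from the maximum over the files alone whenever the folder itself was modified later than every file it contains.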



ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (line 978)
<https://reviews.apache.org/r/54065/#comment234713>

    Agree with Peter. Should we use the dataLocation variable instead of calling the method?


- Sergio Pena


On Dec. 12, 2016, 1:04 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54065/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2016, 1:04 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Chaoyu Tang, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-15282
>     https://issues.apache.org/jira/browse/HIVE-15282
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Changed how the modification time is determined for partitions in the DDLTask.alterIndex method so that it matches the way index staleness is checked. Instead of using the modification time of the partition folder, the method now goes through the files in the folder, takes the highest modification time, and saves it as an index property. This avoids the problem that occurs when the folder and the file are created just as the second turns over, so that the folder's modification time falls in the previous second relative to the file's.
> If the partition folder doesn't contain any files, the folder's modification time is used, as before.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cfece77 
> 
> Diff: https://reviews.apache.org/r/54065/diff/
> 
> 
> Testing
> -------
> 
> Ran the index_auto_mult_tables_compact and index_auto_mult_tables q tests multiple times with a hard-coded delay that reproduced the test failure described in HIVE-15282. With the patch, the tests always passed.
> Also ran all index-related q tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>
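The approach in the review request's description (take the newest file modification time in the partition folder, and fall back to the folder's own time when it has no files) can be sketched on a local directory. This is an illustrative stand-in, not the patch: it assumes `java.nio.file` in place of the Hadoop `FileSystem`/`FileStatus` objects that DDLTask.alterIndex actually works with, only looks at regular files directly inside the folder, and `PartitionTimestampSketch`/`partitionTimestamp` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.stream.Stream;

// Local-filesystem sketch of the described approach; hypothetical, not DDLTask code.
public class PartitionTimestampSketch {

    // Returns the highest modification time (ms) among the regular files directly
    // inside partitionDir, or the directory's own modification time if it has none.
    static long partitionTimestamp(Path partitionDir) throws IOException {
        long folderTime = Files.getLastModifiedTime(partitionDir).toMillis();
        try (Stream<Path> entries = Files.list(partitionDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .mapToLong(PartitionTimestampSketch::modTimeMillis)
                    .max()
                    .orElse(folderTime); // no files: fall back to the folder's time
        }
    }

    // Wraps the checked IOException so the method can be used inside a stream.
    private static long modTimeMillis(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("part");
        Path file = Files.createFile(dir.resolve("000000_0"));
        // Simulate the race from HIVE-15282: the folder is stamped one second before its file.
        Files.setLastModifiedTime(dir, FileTime.fromMillis(1_000_000L));
        Files.setLastModifiedTime(file, FileTime.fromMillis(1_001_000L));
        System.out.println(partitionTimestamp(dir)); // 1001000, the file's time
    }
}
```

Whole-second timestamps are used in the example because some filesystems only store modification times at one-second granularity, which is also why the folder and file times in the original bug could land in different seconds.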


Re: Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Jan. 27, 2017, 4:18 p.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
> > Lines 968 (patched)
> > <https://reviews.apache.org/r/54065/diff/1/?file=1570137#file1570137line968>
> >
> >     Shouldn't it be easier to first set lastModificationTime to the dataLocation modification time, and then compare this value with the rest of the partitions found? This way we could avoid the null value and the Long object, and use a primitive long instead.

Thanks a lot for the review!
Yeah, you are right. I fixed it.


- Marta


Re: Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Jan. 27, 2017, 4:18 p.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
> > Lines 972 (patched)
> > <https://reviews.apache.org/r/54065/diff/1/?file=1570137#file1570137line972>
> >
> >     If this condition is never met, lastModificationTime will end up null, and basePartTs will contain a null value. Should we fall back to the dataLocation timestamp when the condition is never met?

Yes, you are right! This issue is resolved by the fix for the previous one.


- Marta


Re: Review Request 54065: HIVE-15282: Different modification times are used when an index is built and when its staleness is checked

Posted by Marta Kuczora <ku...@cloudera.com>.

> On Jan. 27, 2017, 4:18 p.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
> > Line 967 (original), 978 (patched)
> > <https://reviews.apache.org/r/54065/diff/1/?file=1570137#file1570137line978>
> >
> >     Agree with Peter. Should we use the dataLocation variable instead of calling the method?

I agree too. :) Fixed it.


- Marta

