You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Alexander Kolbasov <ak...@gmail.com> on 2017/10/16 18:37:35 UTC

Review Request 63043: HIVE-17730 Queries can be closed automatically

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63043/
-----------------------------------------------------------

Review request for hive, Aihua Xu, Andrew Sherman, Alan Gates, Janaki Lahorani, Sergio Pena, Sahil Takiar, and Vihang Karajgaonkar.


Bugs: 17730
    https://issues.apache.org/jira/browse/17730


Repository: hive-git


Description
-------

HIVE-17730 Queries can be closed automatically

The goal of the patch is to replaces all curret uses of QueryWrapper with try-with-resource construct that guarantees that the
query is closed.

This change affects the scope of the open query, so in all cases where the code was returning the list of query results, I had to copy the list to a new ArrayList to allow access after query close.


Diffs
-----

  metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 22e246f1c914532ed6c67151c261fa709a283ef2 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ffb2abdf6294dbd470525ad888bd95feac7e7f06 


Diff: https://reviews.apache.org/r/63043/diff/1/


Testing
-------


Thanks,

Alexander Kolbasov


Re: Review Request 63043: HIVE-17730 Queries can be closed automatically

Posted by Aihua Xu via Review Board <no...@reviews.apache.org>.

> On Oct. 17, 2017, 11:25 p.m., Aihua Xu wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 261 (patched)
> > <https://reviews.apache.org/r/63043/diff/1/?file=1857700#file1857700line262>
> >
> >     OK. I started to recall why we did that this way. I think we don't want to call query.close() in the subclass since when you return query.execute() actually DN doesn't get the result yet until you consume those results. 
> >     
> >     That's why we want the caller to be responsible to call close() explictly and the caller knows that it has consumed all the data. 
> >     
> >     The new implmentation will close the query object inside the subclass right?
> 
> Alexander Kolbasov wrote:
>     Originally Query wasn't Closeable or AutoCLoseable, so you couldn't use it anyway. Now it is and it could be possible to use Query directly, but this raises a problem of handling exceptions on close.
>     
>     Answering the second part of your question. The query now is closed immediately after the use (in subclass that creates the query). Query results can be used after query close, but I need to copy them into another collection - that's why there are a bunch of changes like `return new ArrayList<>(result)`.

I see. I believe the original solution is to avoid the memory pressure on HMS. With your approach, we are prefetching all the data and save in a collection, while the original approach delays the data retrieval until it's needed. 

Is that possible to preserve that logic but still let it close automatically?


- Aihua


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63043/#review188419
-----------------------------------------------------------


On Oct. 16, 2017, 6:37 p.m., Alexander Kolbasov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63043/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2017, 6:37 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Andrew Sherman, Alan Gates, Janaki Lahorani, Sergio Pena, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Bugs: 17730
>     https://issues.apache.org/jira/browse/17730
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-17730 Queries can be closed automatically
> 
> The goal of the patch is to replaces all curret uses of QueryWrapper with try-with-resource construct that guarantees that the
> query is closed.
> 
> This change affects the scope of the open query, so in all cases where the code was returning the list of query results, I had to copy the list to a new ArrayList to allow access after query close.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 22e246f1c914532ed6c67151c261fa709a283ef2 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ffb2abdf6294dbd470525ad888bd95feac7e7f06 
> 
> 
> Diff: https://reviews.apache.org/r/63043/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexander Kolbasov
> 
>


Re: Review Request 63043: HIVE-17730 Queries can be closed automatically

Posted by Alexander Kolbasov <ak...@gmail.com>.

> On Oct. 17, 2017, 11:25 p.m., Aihua Xu wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 261 (patched)
> > <https://reviews.apache.org/r/63043/diff/1/?file=1857700#file1857700line262>
> >
> >     OK. I started to recall why we did that this way. I think we don't want to call query.close() in the subclass since when you return query.execute() actually DN doesn't get the result yet until you consume those results. 
> >     
> >     That's why we want the caller to be responsible to call close() explictly and the caller knows that it has consumed all the data. 
> >     
> >     The new implmentation will close the query object inside the subclass right?
> 
> Alexander Kolbasov wrote:
>     Originally Query wasn't Closeable or AutoCLoseable, so you couldn't use it anyway. Now it is and it could be possible to use Query directly, but this raises a problem of handling exceptions on close.
>     
>     Answering the second part of your question. The query now is closed immediately after the use (in subclass that creates the query). Query results can be used after query close, but I need to copy them into another collection - that's why there are a bunch of changes like `return new ArrayList<>(result)`.
> 
> Aihua Xu wrote:
>     I see. I believe the original solution is to avoid the memory pressure on HMS. With your approach, we are prefetching all the data and save in a collection, while the original approach delays the data retrieval until it's needed. 
>     
>     Is that possible to preserve that logic but still let it close automatically?

Note that we are not copying the data itself in the new ArrayList, we are copying pointers. So the only extra overhead is the newly allocated array which is mostly a short-living object and not a big deal. The prefetches are present in the old code - I didn't add any new ones.

Doing what you suggest requires bigger code refactoring since try-with-resource is based on try-blocks, so we need to significantly shuffle the code around.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63043/#review188419
-----------------------------------------------------------


On Oct. 16, 2017, 6:37 p.m., Alexander Kolbasov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63043/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2017, 6:37 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Andrew Sherman, Alan Gates, Janaki Lahorani, Sergio Pena, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Bugs: 17730
>     https://issues.apache.org/jira/browse/17730
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-17730 Queries can be closed automatically
> 
> The goal of the patch is to replaces all curret uses of QueryWrapper with try-with-resource construct that guarantees that the
> query is closed.
> 
> This change affects the scope of the open query, so in all cases where the code was returning the list of query results, I had to copy the list to a new ArrayList to allow access after query close.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 22e246f1c914532ed6c67151c261fa709a283ef2 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ffb2abdf6294dbd470525ad888bd95feac7e7f06 
> 
> 
> Diff: https://reviews.apache.org/r/63043/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexander Kolbasov
> 
>


Re: Review Request 63043: HIVE-17730 Queries can be closed automatically

Posted by Aihua Xu via Review Board <no...@reviews.apache.org>.

> On Oct. 17, 2017, 11:25 p.m., Aihua Xu wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 261 (patched)
> > <https://reviews.apache.org/r/63043/diff/1/?file=1857700#file1857700line262>
> >
> >     OK. I started to recall why we did that this way. I think we don't want to call query.close() in the subclass since when you return query.execute() actually DN doesn't get the result yet until you consume those results. 
> >     
> >     That's why we want the caller to be responsible to call close() explictly and the caller knows that it has consumed all the data. 
> >     
> >     The new implmentation will close the query object inside the subclass right?
> 
> Alexander Kolbasov wrote:
>     Originally Query wasn't Closeable or AutoCLoseable, so you couldn't use it anyway. Now it is and it could be possible to use Query directly, but this raises a problem of handling exceptions on close.
>     
>     Answering the second part of your question. The query now is closed immediately after the use (in subclass that creates the query). Query results can be used after query close, but I need to copy them into another collection - that's why there are a bunch of changes like `return new ArrayList<>(result)`.
> 
> Aihua Xu wrote:
>     I see. I believe the original solution is to avoid the memory pressure on HMS. With your approach, we are prefetching all the data and save in a collection, while the original approach delays the data retrieval until it's needed. 
>     
>     Is that possible to preserve that logic but still let it close automatically?
> 
> Alexander Kolbasov wrote:
>     Note that we are not copying the data itself in the new ArrayList, we are copying pointers. So the only extra overhead is the newly allocated array which is mostly a short-living object and not a big deal. The prefetches are present in the old code - I didn't add any new ones.
>     
>     Doing what you suggest requires bigger code refactoring since try-with-resource is based on try-blocks, so we need to significantly shuffle the code around.

Maybe I didn't state clearly. When you call query.execute(), what I have noticed that DN actually didn't really execute the query. It will only return an object and wait until the result is getting consumed and DN will do the actual query against the database.

So for the following code, I remember query.execute() doesn't return any data for you at that time yet. Now if new ArrayList<> retrieves the data and it triggers the actual execution, then you are fetching all the data at a time (cause memory pressure); if it doesn't and if q.close() is called, then the List you returned to the parent is not accessible since the query is closed.

you will have either issue. That's what I remembered from the past. Can you check test failure TestHiveMetaTool.testExecuteJDOQL if it's related to this? Right now we are using directSQL access, so we may hide these DN issues from the tests.

private List sub() {

  try (queryWrapper q) {
  
     return new ArrayList<>((List<MPartition>) query.execute(dbName, tableName, partNameMatcher));
  }
}


- Aihua


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63043/#review188419
-----------------------------------------------------------


On Oct. 16, 2017, 6:37 p.m., Alexander Kolbasov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63043/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2017, 6:37 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Andrew Sherman, Alan Gates, Janaki Lahorani, Sergio Pena, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Bugs: 17730
>     https://issues.apache.org/jira/browse/17730
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-17730 Queries can be closed automatically
> 
> The goal of the patch is to replaces all curret uses of QueryWrapper with try-with-resource construct that guarantees that the
> query is closed.
> 
> This change affects the scope of the open query, so in all cases where the code was returning the list of query results, I had to copy the list to a new ArrayList to allow access after query close.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 22e246f1c914532ed6c67151c261fa709a283ef2 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ffb2abdf6294dbd470525ad888bd95feac7e7f06 
> 
> 
> Diff: https://reviews.apache.org/r/63043/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexander Kolbasov
> 
>


Re: Review Request 63043: HIVE-17730 Queries can be closed automatically

Posted by Alexander Kolbasov <ak...@gmail.com>.

> On Oct. 17, 2017, 11:25 p.m., Aihua Xu wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 261 (patched)
> > <https://reviews.apache.org/r/63043/diff/1/?file=1857700#file1857700line262>
> >
> >     OK. I started to recall why we did that this way. I think we don't want to call query.close() in the subclass since when you return query.execute() actually DN doesn't get the result yet until you consume those results. 
> >     
> >     That's why we want the caller to be responsible to call close() explictly and the caller knows that it has consumed all the data. 
> >     
> >     The new implmentation will close the query object inside the subclass right?

Originally Query wasn't Closeable or AutoCLoseable, so you couldn't use it anyway. Now it is and it could be possible to use Query directly, but this raises a problem of handling exceptions on close.

Answering the second part of your question. The query now is closed immediately after the use (in subclass that creates the query). Query results can be used after query close, but I need to copy them into another collection - that's why there are a bunch of changes like `return new ArrayList<>(result)`.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63043/#review188419
-----------------------------------------------------------


On Oct. 16, 2017, 6:37 p.m., Alexander Kolbasov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63043/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2017, 6:37 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Andrew Sherman, Alan Gates, Janaki Lahorani, Sergio Pena, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Bugs: 17730
>     https://issues.apache.org/jira/browse/17730
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-17730 Queries can be closed automatically
> 
> The goal of the patch is to replaces all curret uses of QueryWrapper with try-with-resource construct that guarantees that the
> query is closed.
> 
> This change affects the scope of the open query, so in all cases where the code was returning the list of query results, I had to copy the list to a new ArrayList to allow access after query close.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 22e246f1c914532ed6c67151c261fa709a283ef2 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ffb2abdf6294dbd470525ad888bd95feac7e7f06 
> 
> 
> Diff: https://reviews.apache.org/r/63043/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexander Kolbasov
> 
>


Re: Review Request 63043: HIVE-17730 Queries can be closed automatically

Posted by Aihua Xu via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63043/#review188419
-----------------------------------------------------------




standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 261 (patched)
<https://reviews.apache.org/r/63043/#comment265419>

    OK. I started to recall why we did that this way. I think we don't want to call query.close() in the subclass since when you return query.execute() actually DN doesn't get the result yet until you consume those results. 
    
    That's why we want the caller to be responsible to call close() explictly and the caller knows that it has consumed all the data. 
    
    The new implmentation will close the query object inside the subclass right?


- Aihua Xu


On Oct. 16, 2017, 6:37 p.m., Alexander Kolbasov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63043/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2017, 6:37 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Andrew Sherman, Alan Gates, Janaki Lahorani, Sergio Pena, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Bugs: 17730
>     https://issues.apache.org/jira/browse/17730
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-17730 Queries can be closed automatically
> 
> The goal of the patch is to replaces all curret uses of QueryWrapper with try-with-resource construct that guarantees that the
> query is closed.
> 
> This change affects the scope of the open query, so in all cases where the code was returning the list of query results, I had to copy the list to a new ArrayList to allow access after query close.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/tools/HiveMetaTool.java 22e246f1c914532ed6c67151c261fa709a283ef2 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ffb2abdf6294dbd470525ad888bd95feac7e7f06 
> 
> 
> Diff: https://reviews.apache.org/r/63043/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexander Kolbasov
> 
>