You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Rajesh Balamohan <rb...@apache.org> on 2020/02/13 11:40:00 UTC

Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
-----------------------------------------------------------

Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
-------

- Main change is in TxnHandler::checkLock. 
    - When all incoming requests are SHARED_READ, we can add a condition in the query to retrieve only relevant rows. This avoids significant number of rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jumpttable. This condition can be optimised later.
    - Also, removed the "HL_PARTITION IN" clause which could potentially overflow for oracle. Partition details can be filtered out, if the earlier query actually returned any rows.
- Rest of the changes, are related to refactoring "TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs
-----

  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/1/


Testing
-------


Thanks,

Rajesh Balamohan


Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

Posted by Denys Kuzmenko via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219572
-----------------------------------------------------------




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 4546 (patched)
<https://reviews.apache.org/r/72129/#comment307760>

    We shouldn't remove NULL partitions (lock on db/table level).


- Denys Kuzmenko


On Feb. 13, 2020, 11:40 a.m., Rajesh Balamohan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72129/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2020, 11:40 a.m.)
> 
> 
> Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> - Main change is in TxnHandler::checkLock. 
>     - When all incoming requests are SHARED_READ, we can add a condition in the query to retrieve only relevant rows. This avoids significant number of rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jumpttable. This condition can be optimised later.
>     - Also, removed the "HL_PARTITION IN" clause which could potentially overflow for oracle. Partition details can be filtered out, if the earlier query actually returned any rows.
> - Rest of the changes, are related to refactoring "TxnHandler::enqueueLockWithRetry" to reduce lock scope.
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 
> 
> 
> Diff: https://reviews.apache.org/r/72129/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>


Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

Posted by Rajesh Balamohan <rb...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
-----------------------------------------------------------

(Updated Feb. 19, 2020, 2:10 p.m.)


Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
-------

- Main change is in TxnHandler::checkLock. 
    - When all incoming requests are SHARED_READ, we can add a condition in the query to retrieve only relevant rows. This avoids significant number of rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jumpttable. This condition can be optimised later.
    - Also, removed the "HL_PARTITION IN" clause which could potentially overflow for oracle. Partition details can be filtered out, if the earlier query actually returned any rows.
- Rest of the changes, are related to refactoring "TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java a8b9653411 
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/4/

Changes: https://reviews.apache.org/r/72129/diff/3-4/


Testing
-------


File Attachments
----------------

HIVE-22850.5.patch
  https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch


Thanks,

Rajesh Balamohan


Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

Posted by Rajesh Balamohan <rb...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
-----------------------------------------------------------

(Updated Feb. 14, 2020, 1:27 a.m.)


Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Changes
-------

- Removed refactoring changes to simplify changes in this patch.
- Building filter conditions in buildJumpTable(). SQL filters are built off these.


Repository: hive-git


Description
-------

- Main change is in TxnHandler::checkLock. 
    - When all incoming requests are SHARED_READ, we can add a condition in the query to retrieve only relevant rows. This avoids significant number of rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jumpttable. This condition can be optimised later.
    - Also, removed the "HL_PARTITION IN" clause which could potentially overflow for oracle. Partition details can be filtered out, if the earlier query actually returned any rows.
- Rest of the changes, are related to refactoring "TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java a8b9653411 
  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/3/

Changes: https://reviews.apache.org/r/72129/diff/2-3/


Testing
-------


File Attachments
----------------

HIVE-22850.5.patch
  https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch


Thanks,

Rajesh Balamohan


Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

Posted by Rajesh Balamohan <rb...@apache.org>.

> On Feb. 13, 2020, 3:08 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 4546 (patched)
> > <https://reviews.apache.org/r/72129/diff/2/?file=2211214#file2211214line4594>
> >
> >     Hi Rajesh, could you please explain, what is the reason of doing partition filtering on HMS side, not backend db?

By adding all the partition details, the query can become large and has the issue of overflowing in the case of oracle (i,e have to batch with 1000 entries). Also, its incurs parsing in sql server side, as it is executed as Statement. Given that we have additional filter now, it would make it a lot simpler to do this in client side.  This was pointed out in the JIRA by Gopal.


- Rajesh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219575
-----------------------------------------------------------


On Feb. 13, 2020, 1:22 p.m., Rajesh Balamohan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72129/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2020, 1:22 p.m.)
> 
> 
> Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> - Main change is in TxnHandler::checkLock. 
>     - When all incoming requests are SHARED_READ, we can add a condition in the query to retrieve only relevant rows. This avoids significant number of rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jumpttable. This condition can be optimised later.
>     - Also, removed the "HL_PARTITION IN" clause which could potentially overflow for oracle. Partition details can be filtered out, if the earlier query actually returned any rows.
> - Rest of the changes, are related to refactoring "TxnHandler::enqueueLockWithRetry" to reduce lock scope.
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 
> 
> 
> Diff: https://reviews.apache.org/r/72129/diff/2/
> 
> 
> Testing
> -------
> 
> 
> File Attachments
> ----------------
> 
> HIVE-22850.5.patch
>   https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>


Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

Posted by Denys Kuzmenko via Review Board <no...@reviews.apache.org>.

> On Feb. 13, 2020, 3:08 p.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
> > Lines 4546 (patched)
> > <https://reviews.apache.org/r/72129/diff/2/?file=2211214#file2211214line4594>
> >
> >     Hi Rajesh, could you please explain, what is the reason of doing partition filtering on HMS side, not backend db?
> 
> Rajesh Balamohan wrote:
>     By adding all the partition details, the query can become large and has the issue of overflowing in the case of oracle (i,e have to batch with 1000 entries). Also, its incurs parsing in sql server side, as it is executed as Statement. Given that we have additional filter now, it would make it a lot simpler to do this in client side.  This was pointed out in the JIRA by Gopal.

Can we rewrite the query with JOIN operator? Somethinkg like:
https://issues.apache.org/jira/browse/HIVE-22888


- Denys


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219575
-----------------------------------------------------------


On Feb. 14, 2020, 1:27 a.m., Rajesh Balamohan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72129/
> -----------------------------------------------------------
> 
> (Updated Feb. 14, 2020, 1:27 a.m.)
> 
> 
> Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> - Main change is in TxnHandler::checkLock. 
>     - When all incoming requests are SHARED_READ, we can add a condition in the query to retrieve only relevant rows. This avoids significant number of rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jumpttable. This condition can be optimised later.
>     - Also, removed the "HL_PARTITION IN" clause which could potentially overflow for oracle. Partition details can be filtered out, if the earlier query actually returned any rows.
> - Rest of the changes, are related to refactoring "TxnHandler::enqueueLockWithRetry" to reduce lock scope.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java a8b9653411 
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 
> 
> 
> Diff: https://reviews.apache.org/r/72129/diff/3/
> 
> 
> Testing
> -------
> 
> 
> File Attachments
> ----------------
> 
> HIVE-22850.5.patch
>   https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>


Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

Posted by Denys Kuzmenko via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/#review219575
-----------------------------------------------------------




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 4546 (patched)
<https://reviews.apache.org/r/72129/#comment307762>

    Hi Rajesh, could you please explain, what is the reason of doing partition filtering on HMS side, not backend db?


- Denys Kuzmenko


On Feb. 13, 2020, 1:22 p.m., Rajesh Balamohan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72129/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2020, 1:22 p.m.)
> 
> 
> Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> - Main change is in TxnHandler::checkLock. 
>     - When all incoming requests are SHARED_READ, we can add a condition in the query to retrieve only relevant rows. This avoids significant number of rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jumpttable. This condition can be optimised later.
>     - Also, removed the "HL_PARTITION IN" clause which could potentially overflow for oracle. Partition details can be filtered out, if the earlier query actually returned any rows.
> - Rest of the changes, are related to refactoring "TxnHandler::enqueueLockWithRetry" to reduce lock scope.
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 
> 
> 
> Diff: https://reviews.apache.org/r/72129/diff/2/
> 
> 
> Testing
> -------
> 
> 
> File Attachments
> ----------------
> 
> HIVE-22850.5.patch
>   https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>


Re: Review Request 72129: HIVE-22850: Optimise lock acquisition in TxnHandler

Posted by Rajesh Balamohan <rb...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72129/
-----------------------------------------------------------

(Updated Feb. 13, 2020, 1:22 p.m.)


Review request for hive, Gopal V, Peter Vary, and Zoltan Chovan.


Repository: hive-git


Description
-------

- Main change is in TxnHandler::checkLock. 
    - When all incoming requests are SHARED_READ, we can add a condition in the query to retrieve only relevant rows. This avoids significant number of rows fetched in the form of "SHARED_READ + ACQUIRED". There is a corner condition of "SHARED_WRITE --> SHARED_READ::ACQUIRED", which is misleading in the jumpttable. This condition can be optimised later.
    - Also, removed the "HL_PARTITION IN" clause which could potentially overflow for oracle. Partition details can be filtered out, if the earlier query actually returned any rows.
- Rest of the changes, are related to refactoring "TxnHandler::enqueueLockWithRetry" to reduce lock scope.


Diffs (updated)
-----

  standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f53aebe4ad 


Diff: https://reviews.apache.org/r/72129/diff/2/

Changes: https://reviews.apache.org/r/72129/diff/1-2/


Testing
-------


File Attachments (updated)
----------------

HIVE-22850.5.patch
  https://reviews.apache.org/media/uploaded/files/2020/02/13/74ec6cbd-c552-4d46-b5a6-e2fa6da41bdc__HIVE-22850.5.patch


Thanks,

Rajesh Balamohan