Posted to dev@hive.apache.org by Aihua Xu via Review Board <no...@reviews.apache.org> on 2018/03/20 23:54:10 UTC

Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/
-----------------------------------------------------------

Review request for hive, Alexander Kolbasov and Yongzhi Chen.


Repository: hive-git


Description
-------

If the table contains a lot of columns (e.g., 5k), a simple table rename fails with a java.lang.StackOverflowError. The issue is that DataNucleus can't handle a query with lots of colName='c1' && colName='c2' && ... conditions.

I'm breaking the query into multiple smaller queries and then aggregating the results.
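
To illustrate the batching idea, here is a minimal sketch. It is not the code from the patch: the class name BatchSketch, the runBatched signature, the batch size of 1000, and the stand-in lookup in main are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration; not the Batchable class from the patch.
final class BatchSketch {
  private BatchSketch() {}

  /**
   * Runs query once per slice of at most batchSize inputs and concatenates
   * the per-slice results in input order. A non-positive batchSize disables
   * batching (an assumption of this sketch).
   */
  static <I, R> List<R> runBatched(int batchSize, List<I> inputs,
                                   Function<List<I>, List<R>> query) {
    if (batchSize <= 0 || inputs.size() <= batchSize) {
      return query.apply(inputs);
    }
    List<R> results = new ArrayList<>();
    for (int from = 0; from < inputs.size(); from += batchSize) {
      int to = Math.min(from + batchSize, inputs.size());
      results.addAll(query.apply(inputs.subList(from, to)));
    }
    return results;
  }

  public static void main(String[] args) {
    List<String> colNames = new ArrayList<>();
    for (int i = 0; i < 5000; i++) {
      colNames.add("c" + i);
    }
    // Stand-in for the real metastore lookup: echoes each column name back.
    List<String> stats =
        runBatched(1000, colNames, batch -> new ArrayList<String>(batch));
    System.out.println(stats.size());
  }
}
```

With a batch size of 1000, each call to the lookup sees at most 1000 column names, so no single query carries all 5k conditions.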


Diffs
-----

  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java PRE-CREATION 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 6ead20aeaf 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 88d88ed4df 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 9f822564bd 


Diff: https://reviews.apache.org/r/66188/diff/1/


Testing
-------

Manually tested with tables that have a large number of columns.


Thanks,

Aihua Xu


Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

Posted by Aihua Xu via Review Board <no...@reviews.apache.org>.

> On March 22, 2018, 11:03 a.m., Peter Vary wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> > Lines 154-158 (original), 154-158 (patched)
> > <https://reviews.apache.org/r/66188/diff/2/?file=1984629#file1984629line154>
> >
> >     It might be a good idea to use this around our batching as well:
> >     - DatabaseProduct.needsInBatching(dbType)
> >     
> >     What do you think @Aihua?

Thanks, Peter, for reviewing. This is a slightly different problem. In directSQL, some databases need batching and some don't. Here the limitation is in DataNucleus (DN) itself, so it applies to all databases.


- Aihua


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/#review199751
-----------------------------------------------------------


On April 23, 2018, 10:51 p.m., Aihua Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66188/
> -----------------------------------------------------------
> 
> (Updated April 23, 2018, 10:51 p.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Yongzhi Chen.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> If the table contains a lot of columns (e.g., 5k), a simple table rename fails with a java.lang.StackOverflowError. The issue is that DataNucleus can't handle a query with lots of colName='c1' && colName='c2' && ... conditions.
> 
> I'm breaking the query into multiple smaller queries and then aggregating the results.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/alter_rename_table.q 53fb230cf6 
>   ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a28d8 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java PRE-CREATION 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 997f5fdb88 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 125d5a79f2 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 59749e4947 
> 
> 
> Diff: https://reviews.apache.org/r/66188/diff/3/
> 
> 
> Testing
> -------
> 
> Manually tested with tables that have a large number of columns.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>


Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

Posted by Peter Vary via Review Board <no...@reviews.apache.org>.

> On March 22, 2018, 11:03 a.m., Peter Vary wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> > Lines 154-158 (original), 154-158 (patched)
> > <https://reviews.apache.org/r/66188/diff/2/?file=1984629#file1984629line154>
> >
> >     It might be a good idea to use this around our batching as well:
> >     - DatabaseProduct.needsInBatching(dbType)
> >     
> >     What do you think @Aihua?
> 
> Aihua Xu wrote:
>     Thanks, Peter, for reviewing. This is a slightly different problem. In directSQL, some databases need batching and some don't. Here the limitation is in DataNucleus (DN) itself, so it applies to all databases.

Makes sense. Thanks for the clarification!


- Peter


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/#review199751
-----------------------------------------------------------




Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/#review199751
-----------------------------------------------------------



Hi Aihua,

One small nit and one question from me.

Thanks,
Peter


standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
Lines 154-158 (original), 154-158 (patched)
<https://reviews.apache.org/r/66188/#comment280197>

    It might be a good idea to use this around our batching as well:
    - DatabaseProduct.needsInBatching(dbType)
    
    What do you think @Aihua?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
Lines 842 (patched)
<https://reviews.apache.org/r/66188/#comment280196>

    nit: typo: "queris"


- Peter Vary


On March 21, 2018, 6:57 p.m., Aihua Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66188/
> -----------------------------------------------------------
> 
> (Updated March 21, 2018, 6:57 p.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Yongzhi Chen.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> If the table contains a lot of columns (e.g., 5k), a simple table rename fails with a java.lang.StackOverflowError. The issue is that DataNucleus can't handle a query with lots of colName='c1' && colName='c2' && ... conditions.
> 
> I'm breaking the query into multiple smaller queries and then aggregating the results.
> 
> 
> Diffs
> -----
> 
>   ql/src/test/queries/clientpositive/alter_rename_table.q 2061850540 
>   ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a28d8 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java PRE-CREATION 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 6ead20aeaf 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 88d88ed4df 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 9f822564bd 
> 
> 
> Diff: https://reviews.apache.org/r/66188/diff/2/
> 
> 
> Testing
> -------
> 
> Manually tested with tables that have a large number of columns.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>


Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

Posted by Aihua Xu via Review Board <no...@reviews.apache.org>.

> On April 17, 2018, 4:49 p.m., Yongzhi Chen wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 7730 (patched)
> > <https://reviews.apache.org/r/66188/diff/2/?file=1984630#file1984630line7731>
> >
> >     Should you call addQueryAfterUse and closeAllQueries? Isn't that how you release the resources held by the batch queries?

In this change, I reuse a single Query object to run the multiple batch queries and then release it through queryWrapper. I did check how those two methods are used. They seem to be misused in other places: whenever there is a new query, we create a new Query object, which is not necessary. We could improve that logic to reuse the same Query object. So calling them doesn't seem to be necessary here.
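
As a rough illustration of that pattern (one query object reused for every batch and released exactly once), here is a small sketch. FakeQuery, fetchInBatches, and the counters are hypothetical stand-ins invented for this example; they are not the Hive QueryWrapper or the JDO Query API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not Hive or DataNucleus code.
final class ReusedQuerySketch {
  // Counters that record how many query objects were opened and closed.
  static int opened = 0;
  static int closed = 0;

  /** Fake query object that can be executed many times and closed once. */
  static final class FakeQuery implements AutoCloseable {
    FakeQuery() {
      opened++;
    }

    List<String> execute(List<String> colNames) {
      List<String> rows = new ArrayList<>();
      for (String c : colNames) {
        rows.add("stats(" + c + ")");
      }
      return rows;
    }

    @Override
    public void close() {
      closed++;
    }
  }

  /** Executes the same query object once per batch, then releases it. */
  static List<String> fetchInBatches(List<String> colNames, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<String> all = new ArrayList<>();
    // try-with-resources closes the single query even if a batch throws.
    try (FakeQuery query = new FakeQuery()) {
      for (int from = 0; from < colNames.size(); from += batchSize) {
        int to = Math.min(from + batchSize, colNames.size());
        all.addAll(query.execute(colNames.subList(from, to)));
      }
    }
    return all;
  }

  public static void main(String[] args) {
    List<String> colNames = new ArrayList<>();
    for (int i = 0; i < 5000; i++) {
      colNames.add("c" + i);
    }
    List<String> stats = fetchInBatches(colNames, 1000);
    System.out.println(stats.size() + " rows, opened=" + opened + ", closed=" + closed);
  }
}
```

The point of the sketch is only that the number of query objects stays at one regardless of how many batches are run.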


- Aihua


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/#review201323
-----------------------------------------------------------




Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

Posted by Yongzhi Chen via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/#review201323
-----------------------------------------------------------




standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 7730 (patched)
<https://reviews.apache.org/r/66188/#comment282498>

    Should you call addQueryAfterUse and closeAllQueries? Isn't that how you release the resources held by the batch queries?


- Yongzhi Chen




Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

Posted by Aihua Xu via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/
-----------------------------------------------------------

(Updated April 23, 2018, 10:51 p.m.)


Review request for hive, Alexander Kolbasov and Yongzhi Chen.


Changes
-------

Address comments.


Repository: hive-git


Description
-------

If the table contains a lot of columns (e.g., 5k), a simple table rename fails with a java.lang.StackOverflowError. The issue is that DataNucleus can't handle a query with lots of colName='c1' && colName='c2' && ... conditions.

I'm breaking the query into multiple smaller queries and then aggregating the results.


Diffs (updated)
-----

  ql/src/test/queries/clientpositive/alter_rename_table.q 53fb230cf6 
  ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a28d8 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java PRE-CREATION 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 997f5fdb88 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 125d5a79f2 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 59749e4947 


Diff: https://reviews.apache.org/r/66188/diff/3/

Changes: https://reviews.apache.org/r/66188/diff/2-3/


Testing
-------

Manually tested with tables that have a large number of columns.


Thanks,

Aihua Xu


Re: Review Request 66188: HIVE-18986 Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns

Posted by Aihua Xu via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66188/
-----------------------------------------------------------

(Updated March 21, 2018, 6:57 p.m.)


Review request for hive, Alexander Kolbasov and Yongzhi Chen.


Changes
-------

Added unit tests and fixed checkstyle issues.


Repository: hive-git


Description
-------

If the table contains a lot of columns (e.g., 5k), a simple table rename fails with a java.lang.StackOverflowError. The issue is that DataNucleus can't handle a query with lots of colName='c1' && colName='c2' && ... conditions.

I'm breaking the query into multiple smaller queries and then aggregating the results.


Diffs (updated)
-----

  ql/src/test/queries/clientpositive/alter_rename_table.q 2061850540 
  ql/src/test/results/clientpositive/alter_rename_table.q.out 732d8a28d8 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/Batchable.java PRE-CREATION 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 6ead20aeaf 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 88d88ed4df 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java 9f822564bd 


Diff: https://reviews.apache.org/r/66188/diff/2/

Changes: https://reviews.apache.org/r/66188/diff/1-2/


Testing
-------

Manually tested with tables that have a large number of columns.


Thanks,

Aihua Xu