Posted to dev@hive.apache.org by Deepak Jaiswal <dj...@hortonworks.com> on 2018/07/08 16:49:18 UTC

Hive QA batches timing out

I am seeing tests time out in my latest ptest run:

https://builds.apache.org/job/PreCommit-HIVE-Build/12468/testReport
https://builds.apache.org/job/PreCommit-HIVE-Build/12468/console

TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestAutoPurgeTables - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestReplicationScenariosAcidTables - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestSparkStatistics - did not produce a TEST-*.xml file (likely timed out) (batchId=240)


According to the Hive QA homepage, the last stable build was 12444, whereas the current run is 12473. I looked at some of the runs in between, and most of them appear to be failing because of the batch of unit tests listed above.

Regards,
Deepak

Re: Hive QA batches timing out

Posted by Mahesh Kumar Behera <mb...@hortonworks.com>.
The "Unable to shutdown metastore client" error comes from shutting down syncMetaStoreClient, as both syncMetaStoreClient and metaStoreClient share the same underlying client. I think that in the Hive::close method we should not call syncMetaStoreClient.close().
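
To make the suspected failure mode concrete, here is a minimal Java sketch of two handles that wrap one underlying client. All class and method names below (Client, RawClient, SyncClient, SharedClientCloseSketch) are hypothetical stand-ins invented for illustration and are not Hive's actual classes; the sketch only assumes, as suggested above, that the synchronized handle delegates to the same client that is also closed directly.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a metastore client; not Hive's real API.
interface Client {
    void close();
}

// Underlying client that fails if it is shut down a second time.
final class RawClient implements Client {
    final AtomicInteger closeCalls = new AtomicInteger();
    private boolean open = true;

    @Override
    public void close() {
        closeCalls.incrementAndGet();
        if (!open) {
            // Stand-in for the transport error seen in the log.
            throw new IllegalStateException("Cannot write to null outputStream");
        }
        open = false;
    }
}

// Hypothetical synchronized view that delegates to the same underlying client.
final class SyncClient implements Client {
    private final Client delegate;

    SyncClient(Client delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void close() {
        delegate.close();
    }
}

final class SharedClientCloseSketch {
    // Closes the underlying client and then the synchronized view;
    // returns how many close() calls failed.
    static int closeBoth(RawClient raw) {
        Client sync = new SyncClient(raw);
        int failures = 0;
        for (Client c : new Client[] {raw, sync}) {
            try {
                c.close();
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        return failures;
    }

    // Closes only the underlying client, leaving the synchronized view alone.
    static int closeOwnerOnly(RawClient raw) {
        try {
            raw.close();
            return 0;
        } catch (IllegalStateException e) {
            return 1;
        }
    }

    public static void main(String[] args) {
        System.out.println("failures when closing both handles: " + closeBoth(new RawClient()));
        System.out.println("failures when closing owner only: " + closeOwnerOnly(new RawClient()));
    }
}
```

In this sketch the second close() reaches a client that is already shut down, which is the shape of the problem described above; whether the real Hive code behaves this way still needs to be confirmed against the source.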


Re: Hive QA batches timing out

Posted by Zoltan Haindrich <ki...@rxd.hu>.
I have a feeling that the same issue sometimes happens in other tests, but I agree: disabling it will make our lives easier until the real cause is uncovered and fixed.

cheers,
Zoltan


Re: Hive QA batches timing out

Posted by Deepak Jaiswal <dj...@hortonworks.com>.
Thanks Zoltan for the analysis. Perhaps we should disable the test in the meantime as it is blocking several people from committing.

I can go ahead and create a patch for it.

Regards,
Deepak


Re: Hive QA batches timing out

Posted by Zoltan Haindrich <ki...@rxd.hu>.
Hello

Thank you Deepak for taking a closer look! From what you've found, I noticed that the runtime of TestReplicationScenariosAcidTables has jumped to ~2000 sec in the
runs which failed. It seems this problem has been around for a long time now; I found jira tickets in which this test "timed out" and the HiveQA comment was
dated April 03, so it's not entirely new.

The problem that prevents this test from completing successfully seems to be that it has difficulty closing the metastore client, which goes on for a while.
I don't know if this is an acid/replication/metastore/? issue, but it seems intermittent; I have a hunch that it somehow happens more reliably with this test. I've
opened HIVE-20121 to investigate this.

2018-07-08T22:07:33,461 DEBUG [main] metastore.HiveMetaStoreClient: Unable to shutdown metastore client. Will try closing transport directly.
org.apache.thrift.transport.TTransportException: Cannot write to null outputStream

Some links to fairly recent logs:
http://104.198.109.242/logs/PreCommit-HIVE-Build-12481/failed/240_UTBatch_itests__hive-unit_9_tests/maven-test.txt
The hive.log is ~200 MB:
http://104.198.109.242/logs/PreCommit-HIVE-Build-12481/failed/240_UTBatch_itests__hive-unit_9_tests/logs/hive.log
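
Since the hive.log is that large, one way to gauge how often the shutdown message appears is to stream a downloaded copy line by line instead of opening it in an editor. The following is a rough sketch only: the class name ShutdownWarningCounter is made up for this example, the local file path is whatever you pass on the command line, and the marker string is copied from the log excerpt above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ShutdownWarningCounter {
    // Message text taken from the DEBUG line quoted above.
    static final String MARKER = "Unable to shutdown metastore client";

    // Reads line by line so a large log never has to fit in memory.
    static long count(BufferedReader reader) throws IOException {
        long hits = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains(MARKER)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ShutdownWarningCounter <path-to-hive.log>");
            return;
        }
        // Assumes the log is UTF-8; other encodings may raise an IOException.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            System.out.println(MARKER + ": " + count(reader) + " occurrence(s)");
        }
    }
}
```

A high count would be consistent with the client shutdown being retried for a long time, but the number alone does not show which component is responsible.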


cheers,
Zoltan
