You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@metron.apache.org by "Casey Stella (JIRA)" <ji...@apache.org> on 2017/01/25 23:10:27 UTC

[jira] [Assigned] (METRON-672) SolrIndexingIntegrationTest fails intermittently

     [ https://issues.apache.org/jira/browse/METRON-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Casey Stella reassigned METRON-672:
-----------------------------------

    Assignee: Casey Stella

> SolrIndexingIntegrationTest fails intermittently
> ------------------------------------------------
>
>                 Key: METRON-672
>                 URL: https://issues.apache.org/jira/browse/METRON-672
>             Project: Metron
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Justin Leet
>            Assignee: Casey Stella
>
> Adapted from a dev list conversation
> h4. Initial Error in Travis
> Jon noted this in the Travis builds
> {code}
> `test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest):
> Took too long to complete: 150582 > 150000`, more details below:
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 166.167 sec <<< FAILURE!
> test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
> Time elapsed: 166.071 sec  <<< ERROR!
> {code}
> h4. Additional Notes
> Casey was able to reproduce this locally (but not in his IDE). Couple details in the dev list excerpt.
> Fixing this should ideally include adding more detailed logging to hopefully avoid these issues in the future.  As a note, unfortunately in this case, Casey notes that logging seems to make this issue rarer.  Still, logging to be able to understand the flow (and tuning logging level as appropriate) would help resolve issues in the future.
> h4. Dev List Excerpt
> Per Casey:
> {quote}
> Ok, so now I'm concerned that this isn't a fluke.  Here's an excerpt from
> the failing logs on travis for my PR with substantially longer timeouts (
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt)
> {code}
> Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest
> 0 vs 10 vs 0
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 317.056 sec <<< FAILURE!
> test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
> Time elapsed: 316.949 sec  <<< ERROR!
> java.lang.RuntimeException: Took too long to complete: 300783 > 300000
>         at org.apache.metron.integration.ComponentRunner.process(ComponentRunner.java:131)
>         at org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> {code}
> I'm getting the impression that this isn't the timeout and we have a
> mystery on our hands.  Each of those lines "10 vs 10 vs 6" happen 15
> seconds apart.  That line means that it read 10 entries from kafka, 10
> entries from the indexed data and 6 entries from HDFS.  It's that 6
> entries that is the problem.   Also of note, this does not seem to
> happen to me locally AND it's not consistent on Travis.  Given all
> that I'd say that it's a problem with the HDFS Writer not getting
> flushed, but I verified that it is indeed flushed per message.
> One more thing, just for posterity here, it always freezes at 6 records
> written to HDFS.  That's the reason I thought it was a flushing issue.
> One thing that I would caution though is that this is likely a heisenbug.
> The more logging I added earlier made it less likely to occur. It seems
> more likely to occur on Travis than locally and I made it happen by
> repeatedly running mvn install on Metron-solr (after a mvn install of the
> whole project).
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)