Posted to dev@lucene.apache.org by "Mike Drob (JIRA)" <ji...@apache.org> on 2015/12/04 22:06:11 UTC

[jira] [Updated] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

     [ https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated SOLR-8292:
----------------------------
    Attachment: SOLR-8292.patch

Here's the start of a patch to get better logging around what is happening.

I think the intent of "return null for EOF" was to produce a null after the last complete record has been read: an easily checked "we're done" marker.

In the cases where it actually throws an EOF exception, I think the tlog file must be truncated or corrupt, so that the read fails in the middle of a record.
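
To make that distinction concrete, here is a minimal sketch of a reader that returns null only on a record boundary and lets an EOFException escape when the data ends mid-record. It is not Solr's TransactionLog code: the class name TlogReaderSketch, the 4-byte length prefix, and the encode() helper are all assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical length-prefixed log reader; NOT Solr's TransactionLog. */
final class TlogReaderSketch {
  private final DataInputStream in;

  TlogReaderSketch(byte[] log) {
    this.in = new DataInputStream(new ByteArrayInputStream(log));
  }

  /**
   * Returns the next record payload, or null if the stream ends exactly on a
   * record boundary. Throws EOFException if the stream ends mid-record.
   */
  byte[] next() throws IOException {
    int b0 = in.read();
    if (b0 < 0) {
      return null; // clean EOF: the previous record was the last complete one
    }
    // A record has started, so running out of bytes from here on means truncation.
    byte[] rest = new byte[3];
    in.readFully(rest); // throws EOFException if the length prefix is cut short
    int len = (b0 << 24) | ((rest[0] & 0xFF) << 16) | ((rest[1] & 0xFF) << 8) | (rest[2] & 0xFF);
    if (len < 0) {
      throw new IOException("corrupt record length: " + len);
    }
    byte[] payload = new byte[len];
    in.readFully(payload); // throws EOFException if the payload is cut short
    return payload;
  }

  /** Illustrative helper: writes each string as a 4-byte length plus UTF-8 bytes. */
  static byte[] encode(String... records) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    for (String r : records) {
      byte[] payload = r.getBytes(StandardCharsets.UTF_8);
      out.writeInt(payload.length);
      out.write(payload);
    }
    out.flush();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    TlogReaderSketch reader = new TlogReaderSketch(encode("add doc1", "commit"));
    int count = 0;
    while (reader.next() != null) {
      count++;
    }
    System.out.println("complete records read: " + count);

    try {
      // Chop two bytes off the end to simulate a truncated log file.
      byte[] full = encode("add doc1", "commit");
      byte[] truncated = java.util.Arrays.copyOf(full, full.length - 2);
      TlogReaderSketch broken = new TlogReaderSketch(truncated);
      while (broken.next() != null) {
        // keep reading until null or an exception
      }
    } catch (EOFException e) {
      System.out.println("truncated mid-record: EOFException");
    }
  }
}
```

With this shape a caller can treat null as "done" and still see truncation as an error, which is why simply catching the EOF and returning null could hide a corrupt file.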

> TransactionLog.next() does not honor contract and return null for EOF
> ---------------------------------------------------------------------
>
>                 Key: SOLR-8292
>                 URL: https://issues.apache.org/jira/browse/SOLR-8292
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>         Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot. There's a stack trace showing this line (641 in trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, but looking at the code a bit more I'm not so sure; that seems like it'd mask what looks at first glance like a bug in the logic.
> A few lines earlier (633-634) there are these lines:
> // shouldn't currently happen - header and first record are currently written at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the input file position against the size of the output file? Maybe because the 'i' key is right next to the 'o' key? The comment hints that it's checking whether the first record can be read from the input stream along with the header. And perhaps there's a different issue here, because the expectation clearly is that the first record should be there if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)