Posted to dev@crunch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/02/02 16:43:00 UTC

[jira] [Work logged] (CRUNCH-698) Avro DataFileReader creation can hang

     [ https://issues.apache.org/jira/browse/CRUNCH-698?focusedWorklogId=546120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546120 ]

ASF GitHub Bot logged work on CRUNCH-698:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Feb/21 16:42
            Start Date: 02/Feb/21 16:42
    Worklog Time Spent: 10m 
      Work Description: noslowerdna opened a new pull request #34:
URL: https://github.com/apache/crunch/pull/34


   Fixes [AVRO-2944](https://issues.apache.org/jira/browse/AVRO-2944), in which Avro's static method for creating a DataFileReader instance can get stuck in an infinite loop while trying to read the 4-byte "magic" header of the file. More details can be found at [CRUNCH-698](https://issues.apache.org/jira/browse/CRUNCH-698).
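   To illustrate the failure mode, here is a minimal sketch. It is not Avro's actual source; it assumes the hang comes from a header-read loop that does not accumulate short reads or stop on EOF. The hypothetical `readMagic` helper reads the 4-byte header defensively: it adds each `read()` result to a running offset and fails fast on `-1` instead of retrying. `OneByteAtATime` is a made-up stream that mimics the partial/throttled reads described in CRUNCH-698.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MagicHeaderRead {

    // Assumed length of the Avro container-file magic header ("Obj" plus a version byte).
    static final int MAGIC_LENGTH = 4;

    // Hypothetical helper: reads exactly MAGIC_LENGTH bytes, tolerating short reads.
    static byte[] readMagic(InputStream in) throws IOException {
        byte[] magic = new byte[MAGIC_LENGTH];
        int off = 0;
        while (off < magic.length) {
            int n = in.read(magic, off, magic.length - off);
            if (n < 0) {
                // EOF before the full header arrived: fail instead of looping forever.
                throw new EOFException("stream ended after " + off + " header bytes");
            }
            off += n; // accumulate partial reads rather than overwriting the offset
        }
        return magic;
    }

    // Hypothetical stream that returns at most one byte per bulk read,
    // mimicking a throttled or partial-read input stream.
    static final class OneByteAtATime extends InputStream {
        private final InputStream delegate;

        OneByteAtATime(byte[] data) {
            this.delegate = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return delegate.read(b, off, Math.min(len, 1));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] file = {'O', 'b', 'j', 1, 42, 43};
        System.out.println(Arrays.toString(readMagic(new OneByteAtATime(file)))); // [79, 98, 106, 1]

        try {
            readMagic(new OneByteAtATime(new byte[] {'O', 'b'}));
        } catch (EOFException e) {
            System.out.println("truncated header rejected: " + e.getMessage());
        }
    }
}
```

   Even when the stream hands back one byte per call, the loop makes progress on every iteration and terminates either with a full header or with an `EOFException`.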


Issue Time Tracking
-------------------

            Worklog Id:     (was: 546120)
    Remaining Estimate: 0h
            Time Spent: 10m

> Avro DataFileReader creation can hang
> -------------------------------------
>
>                 Key: CRUNCH-698
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-698
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Andrew Olson
>            Assignee: Josh Wills
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A severe Avro bug [AVRO-2944|https://issues.apache.org/jira/browse/AVRO-2944] was recently found in the static method for creating a DataFileReader instance: it can get stuck in an infinite loop while trying to read the 4-byte "magic" header of the file.
> This was fixed in Avro 1.10.1 but has not yet been patched in any other Avro version. The issue has existed since Avro 1.5, although we only encountered it recently. It does not happen under normal circumstances; it takes some very unusual input stream behavior (a partial/throttled read, or an unexpected EOF) to trigger it. We've only seen it with the S3AFileSystem's S3AInputStream, where it suddenly started a few days ago for no apparent reason. Even now it is sporadic, happening a small percentage of the time in job tasks that read many S3 files, but often enough to be problematic. An AWS support case is open to try to find out what could have caused this.
> To avoid depending on a particular Avro version for this fix, we can probably just patch it locally in Crunch: it's only one static method, and apart from one legacy constant, everything we need access to in the Avro code is public.
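A local patch along those lines would read the header safely and then pick between the current and legacy container formats. The sketch below is hypothetical and is not Crunch's actual change: `AvroFormatSniffer`, `detect`, and `Format` are invented names, and the two magic values are assumptions ("Obj" followed by version byte 1 for the current format, matching the public `DataFileConstants.MAGIC`, and version byte 0 standing in for the non-public legacy 1.2-format constant). A real `SeekableInput`-based version would also have to rewind to offset 0 before handing the stream to the chosen reader; a plain `InputStream` cannot do that, so the step is omitted here.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class AvroFormatSniffer {

    enum Format { CURRENT, LEGACY_1_2 }

    // Assumed to mirror Avro's public DataFileConstants.MAGIC: "Obj" + version 1.
    static final byte[] CURRENT_MAGIC = {'O', 'b', 'j', 1};

    // Hypothetical local copy of the non-public legacy (1.2 format) constant: "Obj" + version 0.
    static final byte[] LEGACY_MAGIC = {'O', 'b', 'j', 0};

    // Reads the 4-byte header and reports which reader a locally patched
    // openReader would construct. DataInputStream.readFully loops over short
    // reads and throws EOFException (an IOException) if the stream ends early.
    static Format detect(InputStream in) throws IOException {
        byte[] magic = new byte[CURRENT_MAGIC.length];
        new DataInputStream(in).readFully(magic);
        if (Arrays.equals(magic, CURRENT_MAGIC)) {
            return Format.CURRENT;
        }
        if (Arrays.equals(magic, LEGACY_MAGIC)) {
            return Format.LEGACY_1_2;
        }
        throw new IOException("Not an Avro data file");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(detect(new ByteArrayInputStream(new byte[] {'O', 'b', 'j', 1, 7}))); // CURRENT
        System.out.println(detect(new ByteArrayInputStream(new byte[] {'O', 'b', 'j', 0}))); // LEGACY_1_2
    }
}
```

Delegating to `DataInputStream.readFully` keeps the short-read and EOF handling in JDK code instead of a hand-written loop, which is the part that went wrong in the original bug.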



--
This message was sent by Atlassian Jira
(v8.3.4#803005)