Posted to dev@avro.apache.org by "Lars Francke (JIRA)" <ji...@apache.org> on 2014/10/15 19:26:35 UTC

[jira] [Updated] (AVRO-1302) Files written via Python and Avro 1.7.4 on Windows can't be read using Java program

     [ https://issues.apache.org/jira/browse/AVRO-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Francke updated AVRO-1302:
-------------------------------
    Attachment: AVRO-1302.1.patch

It took a day of debugging but we found the solution for our problem.

It is caused by recklessly copying and pasting the example code from the documentation. That code opens files using the {{w}} and {{r}} modes respectively. These are text modes, in which Python translates newline characters to their platform-specific representation. On Windows every {{\n}} byte written is replaced by {{\r\n}}. Avro data files are binary, so this corrupts the data.

I've attached a patch that fixes the documentation to always use binary mode ({{wb}} and {{rb}}) and adds a note that explains why these modes matter.
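
To illustrate the effect, here is a minimal sketch using only the Python standard library. It is not taken from the patch or from the Avro documentation. The payload is a made-up byte string standing in for Avro data, and {{windows_text_mode_write}} is a hypothetical helper that only simulates the newline translation Python 2 applies in {{w}} mode on Windows, so the example behaves the same on any platform.

```python
import os
import tempfile

# Hypothetical payload standing in for Avro data: arbitrary bytes that
# happen to contain 0x0A (the newline byte), as binary data routinely does.
payload = bytes([0x4F, 0x62, 0x6A, 0x01, 0x0A, 0x00, 0x0A, 0xFF])


def windows_text_mode_write(data):
    """Simulate a text-mode ('w') write on Windows: every LF becomes CRLF."""
    return data.replace(b"\n", b"\r\n")


def roundtrip_binary(data):
    """Write and read back using binary modes ('wb' / 'rb')."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


corrupted = windows_text_mode_write(payload)
intact = roundtrip_binary(payload)

# The simulated text-mode write grows by one byte per 0x0A in the payload,
# while the binary round trip returns the original bytes unchanged.
print(len(payload), len(corrupted), len(intact))
```

Each inserted {{\r}} shifts everything after it, which is consistent with a reader no longer finding the bytes it expects at a given offset.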

> Files written via Python and Avro 1.7.4 on Windows can't be read using Java program
> -----------------------------------------------------------------------------------
>
>                 Key: AVRO-1302
>                 URL: https://issues.apache.org/jira/browse/AVRO-1302
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.7.4
>            Reporter: Christopher Conner
>            Priority: Minor
>         Attachments: AVRO-1302.1.patch
>
>
> I'm not sure if this is a Python issue, an Avro issue, or a Windows issue. However, if I create an Avro file on Windows using Python 2.7.4 and Avro 1.7.4 and then try to read it with a Java program, it fails with:
> Successfully opened the Python avro file now I'm going to attempt to read from it
> Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
> 	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
> 	at JavaPythonAvroExample.main(JavaPythonAvroExample.java:27)
> Caused by: java.io.IOException: Invalid sync!
> 	at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:293)
> 	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
> 	... 1 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)