Posted to user@avro.apache.org by Shruthi Jeganathan <sh...@tapjoy.com> on 2015/03/13 19:05:53 UTC

Concurrent writes to same avro file

Hi,

I have multiple threads writing to the same Avro output file (out.avro). When
deserializing out.avro, I get this exception:

org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
    at com.example.Main.deserialize(Main.java:80)
    at com.example.Main.main(Main.java:50)
Caused by: java.io.IOException: Invalid sync!
    at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:293)
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
    ... 2 more

Is this because I'm concurrently writing to out.avro? If it's an issue, is
there a way for multiple threads to simultaneously write to out.avro?
Please provide code samples, if possible.

Thanks.

Re: Concurrent writes to same avro file

Posted by Sean Busbey <bu...@cloudera.com>.
Correction to my previous message: that should read "synchronize on the
DataFileWriter instance", or on whatever writing object you're using.

-- 
Sean

Re: Concurrent writes to same avro file

Posted by Sean Busbey <bu...@cloudera.com>.
The various Avro writers and readers are not thread-safe. You will need to do
some sort of external synchronization. If the threads are in the same JVM,
the easiest way to write from multiple threads safely will be to
synchronize on the DataFileStream instance.

e.g.

synchronized(myDataFileWriter) {
  myDataFileWriter.append(datum);
}
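
For a fuller picture, the synchronized append above can be sketched as a small
runnable program. This is only an illustration under stated assumptions, not
Avro code: Appender, InMemoryWriter, safeAppend and writeFromThreads are
hypothetical names, and the in-memory writer stands in for a shared writer so
the sketch runs without the Avro jar. With Avro, the shared object would be
your single DataFileWriter, whose append(datum) call also throws IOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SynchronizedAppendSketch {

    /** Hypothetical stand-in for the one call used here: append(datum). */
    interface Appender<D> {
        void append(D datum) throws IOException;
    }

    /** Hypothetical, deliberately NOT thread-safe writer backed by a list. */
    static final class InMemoryWriter implements Appender<String> {
        private final List<String> records = new ArrayList<>();

        @Override
        public void append(String datum) {
            records.add(datum);
        }

        int size() {
            return records.size();
        }
    }

    /** Serializes appends by locking on the shared writer instance. */
    static <D> void safeAppend(Appender<D> writer, D datum) {
        synchronized (writer) {
            try {
                writer.append(datum);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Starts the threads, waits for them, and returns the record count. */
    static int writeFromThreads(int threadCount, int perThread) {
        InMemoryWriter writer = new InMemoryWriter(); // one instance for all threads
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    safeAppend(writer, "thread-" + id + "-record-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until every thread has finished writing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return writer.size();
    }

    public static void main(String[] args) {
        System.out.println(writeFromThreads(4, 1000)); // prints 4000
    }
}
```

The lock only helps if every thread uses the same writer instance and every
append goes through the lock. In the Avro case, close the DataFileWriter once,
after all writing threads have finished.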



-- 
Sean