Posted to user@avro.apache.org by Lan Jiang <lj...@gmail.com> on 2015/08/24 21:54:56 UTC

Converting Protobuf object to Avro

Hi, there

I am trying to convert a protobuf object to Avro. I am using the following code:

// myProto was deserialized using the Google protobuf API
ProtobufDatumWriter<MyProto> pbWriter = new ProtobufDatumWriter<MyProto>(MyProto.class);
FileOutputStream fo = new FileOutputStream(args[0]);
Encoder e = EncoderFactory.get().binaryEncoder(fo, null);
pbWriter.write(myProto, e);
e.flush(); // the binary encoder buffers its output; flush it before closing the stream
fo.close();

The Avro file was created successfully, and if I cat it I can see the data. However, when I tried to use avro-tools to get the schema or metadata of the saved file, it failed with:

Exception in thread "main" java.io.IOException: Not a data file.
	at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
	at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
	at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)

Looking at the Avro source code, this error means the file's first 4 bytes do not match the 4-byte MAGIC header. I am trying to figure out whether I have done something wrong.
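
For context: an Avro object container file must begin with the four magic bytes 'O', 'b', 'j', 1, whereas writing through a datum writer and a binary encoder emits only the encoded datum, with no header at all. The following is a minimal, JDK-only sketch of that header check; the class and method names (AvroMagicCheck, hasAvroMagic) are hypothetical and not part of the Avro API.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper (not an Avro class): mimics the header check behind
// "Not a data file". Avro container files start with 'O', 'b', 'j', 1.
class AvroMagicCheck {
    static final byte[] MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', 1};

    // True if the stream's first four bytes equal the container-file magic.
    static boolean hasAvroMagic(InputStream in) {
        try {
            byte[] head = new byte[MAGIC.length];
            int read = in.readNBytes(head, 0, head.length); // Java 9+
            return read == MAGIC.length && Arrays.equals(head, MAGIC);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Convenience overload for a file on disk.
    static boolean hasAvroMagic(String path) {
        try (InputStream in = new FileInputStream(path)) {
            return hasAvroMagic(in);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Pointed at the file produced by the snippet above, `hasAvroMagic(args[0])` would be expected to return false, because a bare encoded datum carries no container header.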

Appreciate any help you can give me.

Lan

Re: Converting Protobuf object to Avro

Posted by Lan Jiang <lj...@gmail.com>.
Yes, I got it working using the ProtobufData class. I was just about to write to the mailing list. Thanks!


> On Aug 24, 2015, at 6:47 PM, William Briggs <wr...@gmail.com> wrote:
> 
> Have you looked at the ProtobufData utility class? Its getSchema method might do the trick for you: http://avro.apache.org/docs/1.6.1/api/java/org/apache/avro/protobuf/ProtobufData.html#getSchema(java.lang.Class)

Re: Converting Protobuf object to Avro

Posted by William Briggs <wr...@gmail.com>.
Have you looked at the ProtobufData utility class? Its getSchema method
might do the trick for you:
http://avro.apache.org/docs/1.6.1/api/java/org/apache/avro/protobuf/ProtobufData.html#getSchema(java.lang.Class)

On Mon, Aug 24, 2015, 4:56 PM Lan Jiang <lj...@gmail.com> wrote:

> Sean,
>
> Thanks for the reply.
>
> Your suggestion kind of makes sense. The example in the guide wraps a
> GenericDatumWriter in a DataFileWriter, then calls create, append, and
> close on the DataFileWriter in sequence to write out the container file.
>
> My problem with using ProtobufDatumWriter the same way is that I do not
> have an Avro schema object to pass to dataFileWriter.create(schema, file).
> As I understand it, avro-protobuf should be able to convert the protobuf
> schema to an Avro schema automatically, but I have not found a utility
> class that does this conversion. Correct me if I am wrong.
>
> Lan

Re: Converting Protobuf object to Avro

Posted by Lan Jiang <lj...@gmail.com>.
Sean,

Thanks for the reply.

Your suggestion kind of makes sense. The example in the guide wraps a GenericDatumWriter in a DataFileWriter, then calls create, append, and close on the DataFileWriter in sequence to write out the container file.

My problem with using ProtobufDatumWriter the same way is that I do not have an Avro schema object to pass to dataFileWriter.create(schema, file). As I understand it, avro-protobuf should be able to convert the protobuf schema to an Avro schema automatically, but I have not found a utility class that does this conversion. Correct me if I am wrong.
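
Some background on why create() needs a schema: per the Avro specification, a container file starts with a header made of the magic bytes, a metadata map whose avro.schema entry holds the writer's schema as JSON (plus an optional avro.codec entry), and a random 16-byte sync marker. Each data block then carries an object count, a byte size, the encoded records, and the sync marker again. The sketch below hand-builds such a file with only the JDK to make that layout visible. The record schema (Example with id and name fields) and the class name ContainerLayoutSketch are hypothetical, and this is an illustration under those assumptions, not a substitute for DataFileWriter: it writes a single uncompressed block and does no schema validation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Illustration only: a hand-built Avro object container file, showing what
// DataFileWriter.create/append/close produce. Use DataFileWriter in real code.
class ContainerLayoutSketch {
    static final byte[] MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', 1};

    // Avro "long": zig-zag encoding, then a base-128 varint (low bits first).
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro "bytes": length as a long, then the raw bytes.
    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Avro "string": UTF-8 bytes encoded like "bytes".
    static void writeString(ByteArrayOutputStream out, String s) {
        writeBytes(out, s.getBytes(StandardCharsets.UTF_8));
    }

    // Builds a complete container file holding (id, name) records.
    static byte[] buildFile(long[] ids, String[] names) {
        String schema = "{\"type\":\"record\",\"name\":\"Example\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"name\",\"type\":\"string\"}]}";
        byte[] sync = new byte[16];
        new Random().nextBytes(sync);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Header (what create() writes): magic, metadata map, sync marker.
        out.write(MAGIC, 0, MAGIC.length);
        writeLong(out, 2);                 // one map block with 2 entries
        writeString(out, "avro.schema");
        writeBytes(out, schema.getBytes(StandardCharsets.UTF_8));
        writeString(out, "avro.codec");
        writeBytes(out, "null".getBytes(StandardCharsets.UTF_8));
        writeLong(out, 0);                 // map terminator
        out.write(sync, 0, sync.length);

        // Records in the same binary encoding a datum writer emits.
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (int i = 0; i < ids.length; i++) {
            writeLong(block, ids[i]);
            writeString(block, names[i]);
        }
        // Data block (what append() and close() flush): count, size, data, sync.
        writeLong(out, ids.length);
        writeLong(out, block.size());
        out.write(block.toByteArray(), 0, block.size());
        out.write(sync, 0, sync.length);
        return out.toByteArray();
    }
}
```

The records inside the block are encoded exactly as a bare datum writer would encode them; what the original snippet was missing is everything around them, and the schema passed to create() is what lands under avro.schema.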

Lan



> On Aug 24, 2015, at 3:14 PM, Sean Busbey <bu...@cloudera.com> wrote:
> 
> Hiya Lan!
> 
> You need to use a container file instead of just writing via the datum writer yourself.
> 
> Take a look at the "Getting Started (Java)" section on serialization[1]. The example there uses the GenericDatumWriter, but you ought to be able to switch it out for your ProtobufDatumWriter.
> 
> [1]: http://avro.apache.org/docs/1.7.7/gettingstartedjava.html#Serializing-N101DE


Re: Converting Protobuf object to Avro

Posted by Sean Busbey <bu...@cloudera.com>.
Hiya Lan!

You need to use a container file instead of just writing via the datum
writer yourself.

Take a look at the "Getting Started (Java)" section on serialization[1].
The example there uses the GenericDatumWriter, but you ought to be able to
switch it out for your ProtobufDatumWriter.

[1]: http://avro.apache.org/docs/1.7.7/gettingstartedjava.html#Serializing-N101DE

On Mon, Aug 24, 2015 at 12:54 PM, Lan Jiang <lj...@gmail.com> wrote:

> Hi, there
>
> I am trying to convert a protobuf object to Avro. I am using the following code:
>
> // myProto was deserialized using the Google protobuf API
> ProtobufDatumWriter<MyProto> pbWriter = new ProtobufDatumWriter<MyProto>(MyProto.class);
> FileOutputStream fo = new FileOutputStream(args[0]);
> Encoder e = EncoderFactory.get().binaryEncoder(fo, null);
> pbWriter.write(myProto, e);
> e.flush(); // the binary encoder buffers its output; flush it before closing the stream
> fo.close();
>
> The Avro file was created successfully, and if I cat it I can see the data.
> However, when I tried to use avro-tools to get the schema or metadata of the
> saved file, it failed with:
>
> Exception in thread "main" java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
> at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
> at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)
>
> Looking at the Avro source code, this error means the file's first 4 bytes
> do not match the 4-byte MAGIC header. I am trying to figure out whether I
> have done something wrong.
>
> Appreciate any help you can give me.
>
> Lan
>



-- 
Sean