Posted to user@avro.apache.org by Lan Jiang <lj...@gmail.com> on 2015/08/24 21:54:56 UTC
Converting Protobuf object to Avro
Hi, there
I am trying to convert a protobuf object to Avro. I am using
//myProto object is deserialized using google protobuf API
ProtobufDatumWriter<MyProto> pbWriter = new ProtobufDatumWriter<MyProto>(MyProto.class);
FileOutputStream fo = new FileOutputStream(args[0]);
Encoder e = EncoderFactory.get().binaryEncoder(fo, null);
pbWriter.write(myProto, e);
fo.flush();
The Avro file was created successfully, and if I cat the file I can see the data in it. However, when I tried to use avro-tools to get the schema or metadata of the saved file, it says
Exception in thread "main" java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)
Looking at the Avro source code, the error means that the first 4 bytes of the file do not match the expected MAGIC bytes. I am trying to see whether I have done anything wrong.
Appreciate any help you can give me.
Lan
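For readers following along, the "Not a data file" error comes from the magic-byte check mentioned above. As a rough illustration, the sketch below tests whether a byte sequence or file begins with the four magic bytes that the Avro specification defines for object container files ('O', 'b', 'j', followed by the byte 1). The class name AvroMagicCheck and its methods are hypothetical helpers written for this note, not part of the Avro API. A file written directly through a DatumWriter and a binary Encoder contains only the encoded datum, so it would be expected to fail this check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Avro): looks for the container-file magic. */
final class AvroMagicCheck {
    // Per the Avro specification, an object container file starts with 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the given bytes start with the four magic bytes. */
    static boolean startsWithMagic(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads up to four bytes from the file and compares them with the magic. */
    static boolean fileHasMagic(Path file) throws IOException {
        byte[] header = new byte[MAGIC.length];
        int offset = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (offset < header.length) {
                int n = in.read(header, offset, header.length - offset);
                if (n < 0) {
                    break;
                }
                offset += n;
            }
        }
        return offset == header.length && startsWithMagic(header);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println(file + (fileHasMagic(file) ? " has" : " lacks")
                + " the Avro container magic");
    }
}
```

Running it against the file produced by the code in the post above should report that the magic is missing, which is consistent with the avro-tools exception.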
Re: Converting Protobuf object to Avro
Posted by Lan Jiang <lj...@gmail.com>.
Yes. I got it working using the ProtobufData class. I was just about to write to the mailing list. Thanks!
> On Aug 24, 2015, at 6:47 PM, William Briggs <wr...@gmail.com> wrote:
>
> Have you looked at the ProtobufData utility class? The getSchema method might do the trick for you: http://avro.apache.org/docs/1.6.1/api/java/org/apache/avro/protobuf/ProtobufData.html#getSchema(java.lang.Class)
Re: Converting Protobuf object to Avro
Posted by William Briggs <wr...@gmail.com>.
Have you looked at the ProtobufData utility class? The getSchema method
might do the trick for you:
http://avro.apache.org/docs/1.6.1/api/java/org/apache/avro/protobuf/ProtobufData.html#getSchema(java.lang.Class)
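A minimal sketch of how the suggestions in this thread might fit together, assuming avro, avro-protobuf, and protobuf-java are on the classpath: ProtobufData.get().getSchema(...) supplies the schema that DataFileWriter.create(schema, file) needs, and the ProtobufDatumWriter is wrapped in a DataFileWriter rather than driven directly through an Encoder. The class ProtobufToAvroFile and its write method are hypothetical names chosen for illustration; this is not necessarily the exact code the original poster ended up with.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import com.google.protobuf.Message;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.protobuf.ProtobufData;
import org.apache.avro.protobuf.ProtobufDatumWriter;

/** Hypothetical helper: writes protobuf messages into an Avro container file. */
final class ProtobufToAvroFile {

    static <T extends Message> void write(Class<T> type, List<T> messages, File out)
            throws IOException {
        // Derive the Avro schema from the generated protobuf class.
        Schema schema = ProtobufData.get().getSchema(type);

        // The datum writer knows how to encode individual protobuf objects.
        ProtobufDatumWriter<T> datumWriter = new ProtobufDatumWriter<>(type);

        // DataFileWriter adds the container header (magic bytes, schema, sync marker).
        try (DataFileWriter<T> fileWriter = new DataFileWriter<>(datumWriter)) {
            fileWriter.create(schema, out);
            for (T message : messages) {
                fileWriter.append(message);
            }
        } // close() flushes the last block to the file
    }
}
```

With the MyProto class and myProto object from the original post, a call might look like ProtobufToAvroFile.write(MyProto.class, java.util.Collections.singletonList(myProto), new File(args[0])). The resulting file should then be readable by avro-tools, since it carries the container header.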
On Mon, Aug 24, 2015, 4:56 PM Lan Jiang <lj...@gmail.com> wrote:
> Sean,
>
> Thanks for the reply.
>
> Your suggestion kind of makes sense. The default example wraps a
> GenericDatumWriter with a DataFileWriter, then calls the create, append,
> and close methods on the DataFileWriter in sequence to write out the
> container file.
>
> My problem with using ProtobufDatumWriter in a similar fashion is that I
> do not have an Avro schema object to pass to
> dataFileWriter.create(schema, file). As I understand it, the protobuf-avro
> module should have a way to convert the protobuf schema to an Avro schema
> automatically, but I have not found any utility class that does the
> conversion. Correct me if I am wrong.
>
> Lan
Re: Converting Protobuf object to Avro
Posted by Lan Jiang <lj...@gmail.com>.
Sean,
Thanks for the reply.
Your suggestion kind of makes sense. The default example wraps a GenericDatumWriter with a DataFileWriter, then calls the create, append, and close methods on the DataFileWriter in sequence to write out the container file.
My problem with using ProtobufDatumWriter in a similar fashion is that I do not have an Avro schema object to pass to dataFileWriter.create(schema, file). As I understand it, the protobuf-avro module should have a way to convert the protobuf schema to an Avro schema automatically, but I have not found any utility class that does the conversion. Correct me if I am wrong.
Lan
> On Aug 24, 2015, at 3:14 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
> Hiya Lan!
>
> You need to use a container file instead of just writing via the datum writer yourself.
>
> Take a look at the "Getting Started (Java)" section on serialization[1]. The example there uses the GenericDatumWriter, but you ought to be able to switch it out for your ProtobufDatumWriter.
>
> [1]: http://avro.apache.org/docs/1.7.7/gettingstartedjava.html#Serializing-N101DE
Re: Converting Protobuf object to Avro
Posted by Sean Busbey <bu...@cloudera.com>.
Hiya Lan!
You need to use a container file instead of just writing via the datum
writer yourself.
Take a look at the "Getting Started (Java)" section on serialization[1].
The example there uses the GenericDatumWriter, but you ought to be able to
switch it out for your ProtobufDatumWriter.
[1]: http://avro.apache.org/docs/1.7.7/gettingstartedjava.html#Serializing-N101DE
On Mon, Aug 24, 2015 at 12:54 PM, Lan Jiang <lj...@gmail.com> wrote:
> Hi, there
>
> I am trying to convert a protobuf object to Avro. I am using
>
> //myProto object is deserialized using google protobuf API
> ProtobufDatumWriter<MyProto> pbWriter = new ProtobufDatumWriter<MyProto>(MyProto.class);
> FileOutputStream fo = new FileOutputStream(args[0]);
> Encoder e = EncoderFactory.get().binaryEncoder(fo, null);
> pbWriter.write(myProto, e);
> fo.flush();
>
> The avro file was created successfully. If I cat the file, I can see the
> data in the file. However, when I tried to use avro-tools to get schema or
> meta info about the saved avro file, it says
>
> Exception in thread "main" java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
> at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
> at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)
>
> Look at the Avro source code, the error means it does not have the first 4
> bytes matching the MAGIC first 4 bytes. I am trying to see if I have done
> anything wrong.
>
> Appreciate any help you can give me.
>
> Lan
>
--
Sean