Posted to user@avro.apache.org by Sam Poole <po...@bah.com> on 2011/08/08 15:53:19 UTC

Java Example of writing a union

Does anybody have an example of writing a file that uses a union schema?  I
am having problems trying to write a file that uses a union schema because
once I set the schema, I can't add an individual datum because it is not
part of a union.  



--
View this message in context: http://apache-avro.679487.n3.nabble.com/Java-Example-of-writing-a-union-tp3235624p3235624.html
Sent from the Avro - Users mailing list archive at Nabble.com.

RE: Java Example of writing a union

Posted by "Poole, Samuel [USA]" <po...@bah.com>.
Figured out the read problem.  Of course, I had an if statement where I needed a while loop. :)
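For anyone who hits the same symptom: DataFileReader is an Iterator, so an if around hasNext() pulls at most one datum, while a while loop drains the file. Below is a minimal sketch of the difference. It uses a plain java.util.Iterator over a list as a stand-in for the reader; the class name ReadLoopSketch and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    // Mirrors the original read code: "if" consumes at most one element.
    static List<String> readWithIf(Iterator<String> reader) {
        List<String> seen = new ArrayList<String>();
        if (reader.hasNext()) {
            seen.add(reader.next());
        }
        return seen;
    }

    // The fix: "while" keeps calling next() until the iterator is exhausted.
    static List<String> readWithWhile(Iterator<String> reader) {
        List<String> seen = new ArrayList<String>();
        while (reader.hasNext()) {
            seen.add(reader.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        // Stand-in for the three datums appended to the file (hypothetical values).
        List<String> datums = Arrays.asList("FOO", "BAR", "BAR");
        System.out.println("if:    " + readWithIf(datums.iterator()));
        System.out.println("while: " + readWithWhile(datums.iterator()));
    }
}
```

The same loop shape should apply to the real reader, since only the source of the iterator changes.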






Re: Java Example of writing a union

Posted by Scott Carey <sc...@apache.org>.
FYI, deflateCodec(9) rarely improves compression over level 6, but is much
slower to write. 

Also, unless you increase the block size in the file to over 256KB, it
probably won't improve compression at all.  The primary thing that higher
deflate/gzip compression levels do is increase the size of the lookback
window for finding duplicate segments.

In short, with your actual data, try different compression levels and buffer
sizes and see what works best for you.   The best choice is almost never
compression level 9.

I often end up with compression level 3 or 1 when I need the speed, and
level 6 or 7 with larger blocks for 'archival' use.
A useful link comparing speed to compression ratio for gzip (gzip is deflate
with a different header and CRC) is:
http://tukaani.org/lzma/benchmarks.html

As you can see, compression level 9 is typically 2 to 3 times slower than
level 6 and gives only a marginally better compression ratio.
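One way to act on the "try different levels on your actual data" advice is a quick measurement. This is a rough sketch, not an Avro benchmark. It uses java.util.zip.Deflater from the JDK directly, and the class name DeflateLevelSketch and the synthetic sample data are assumptions for illustration. With Avro itself, the level would go to CodecFactory.deflateCodec(level), and the block size can be adjusted with DataFileWriter.setSyncInterval(bytes) before create() is called.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DeflateLevelSketch {

    // Compresses input at the given deflate level (0-9) and returns the compressed size in bytes.
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.size();
    }

    // Builds repetitive synthetic data; replace this with a block of your real records.
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("record-").append(i % 500)
              .append(",value=").append(i * 31 % 1000).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        for (int level : new int[] {1, 3, 6, 9}) {
            long start = System.nanoTime();
            int size = compressedSize(data, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("level " + level + ": " + size + " bytes, " + micros + " us");
        }
    }
}
```

Timings from a single run like this are noisy, so repeat the loop a few times and compare sizes and times on data that resembles what you will actually write.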




RE: Java Example of writing a union

Posted by "Poole, Samuel [USA]" <po...@bah.com>.
Thank you very much. Yes, this works well. I then took it one step further to try to get the schema put in the file and also to apply a compression codec.





FOO fooObj = ....
BAR barObj = ....
BAR barObj2 = ....
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<SpecificRecord> writer = new SpecificDatumWriter<SpecificRecord>(yourSchema);
        DataFileWriter<SpecificRecord> filewriter = new DataFileWriter<SpecificRecord>(writer);

        // deflate at the highest compression level
        CodecFactory codec = CodecFactory.deflateCodec(9);
        filewriter.setCodec(codec);

        // create() writes the file header, which embeds the schema
        filewriter.create(yourSchema, out);

        // no separate binary encoder is needed; DataFileWriter encodes each datum itself
        filewriter.append(fooObj);
        filewriter.append(barObj);
        filewriter.append(barObj2);

        // close() flushes the last block into "out" before the bytes are copied
        filewriter.close();

        OutputStream outstream = new FileOutputStream("/somefolder/somefile.avro");
        out.writeTo(outstream);
        outstream.close();




This code works, but now I have an issue with reading the file.

When I read the file, I can only see the first datum in the union.  I know that all of the datums were written to the file because of the file's size, but I can't read them all back.

Here is my code to read the union file.


Schema yourSchema=Schema.parse(new File("/somefolder/someschema.avro"));
DatumReader<SpecificRecord> datumreader=new SpecificDatumReader<SpecificRecord>(yourSchema);
DataFileReader<SpecificRecord> reader=new DataFileReader<SpecificRecord>(new File("/somefolder/somefile.avro"),datumreader);

if(reader.hasNext()){
    SpecificRecord result=reader.next();
    System.out.println(result.getClass());
}



Not sure if I have a problem with how I created the file or how I am reading the file....

Any ideas?





Re: Java Example of writing a union

Posted by Vyacheslav Zholudev <vy...@gmail.com>.
I'm assuming for now that you are using a specific writer and you have a union schema with two records FOO and BAR (you should get two classes FOO and BAR generated by avro tools):

        FOO fooObj = ....
        BAR barObj = ....
        BAR barObj2 = ....
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // yourSchema is the union schema; the writer picks the matching branch for each datum
        DatumWriter<SpecificRecord> writer = new SpecificDatumWriter<SpecificRecord>(yourSchema);
        // null means there is no existing encoder to reuse
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(fooObj, encoder);
        writer.write(barObj, encoder);
        writer.write(barObj2, encoder);
        encoder.flush();
        out.close();

Does that make sense?

Vyacheslav
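As background on why the writer has to be constructed with the union schema: in Avro's binary encoding, a union value is written as the zero-based index of the matching branch, followed by the value encoded with that branch's schema. The sketch below hand-encodes that layout using only the JDK. It is not the Avro library, and the class name UnionEncodingSketch and the field layouts of FOO (one string field) and BAR (one long field) are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class UnionEncodingSketch {

    // Avro int/long encoding: zigzag, then variable-length 7-bit groups.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string encoding: byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Union ["FOO", "BAR"]: FOO is branch 0 (assumed: a single string field).
    static void writeFoo(ByteArrayOutputStream out, String name) {
        writeLong(out, 0);       // branch index
        writeString(out, name);  // record body: fields in schema order
    }

    // BAR is branch 1 (assumed: a single long field).
    static void writeBar(ByteArrayOutputStream out, long count) {
        writeLong(out, 1);       // branch index
        writeLong(out, count);   // record body
    }

    // Encodes one FOO followed by one BAR into a single byte stream.
    static byte[] example() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeFoo(out, "a");
        writeBar(out, 3);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : example()) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

The leading index byte is what lets a reader tell a FOO from a BAR, which is why each datum is handed to a writer that knows the whole union rather than to a writer for FOO or BAR alone.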
