Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2010/02/09 19:09:28 UTC

[jira] Commented: (AVRO-295) JsonEncoder is not flushed after writing using ReflectDatumWriter

    [ https://issues.apache.org/jira/browse/AVRO-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831568#action_12831568 ] 

Doug Cutting commented on AVRO-295:
-----------------------------------

I am not convinced that we should automatically flush after each object is serialized, as this may adversely affect performance when writing many small objects.  Mightn't it be better to add a flush() method to DatumWriter and leave control of when things are flushed to the application?
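
For illustration only, here is a minimal sketch of that application-controlled pattern: many small records are written and the encoder is flushed once at the end. It reuses the Avro 1.3 JsonEncoder and ReflectDatumWriter constructors from the report quoted below; the wrapper class name, the nested class A, the main method, and the loop bound are illustrative assumptions, not code from the issue.

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.io.JsonEncoder;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

// Sketch only: leave control of flushing to the application instead of
// flushing after every serialized object.
public class ApplicationControlledFlush {
  static class A {
    long timestamp;
  }

  public static void main(String[] args) throws IOException {
    Schema schm = ReflectData.get().getSchema(A.class);
    ReflectDatumWriter writer = new ReflectDatumWriter(schm);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    JsonEncoder json = new JsonEncoder(schm, out);

    // Write many small records with no per-record flush; output may stay
    // buffered inside the encoder until the application flushes.
    for (int i = 0; i < 1000; i++) {
      A a = new A();
      a.timestamp = i;
      writer.write(a, json);
    }

    json.flush(); // the application decides when buffered data reaches out
    System.out.println("output size: " + out.toByteArray().length);
  }
}
{code}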

> JsonEncoder  is not flushed after writing using ReflectDatumWriter
> ------------------------------------------------------------------
>
>                 Key: AVRO-295
>                 URL: https://issues.apache.org/jira/browse/AVRO-295
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.3.0
>            Reporter: Jonathan Hsieh
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-295-test.patch, AVRO-295.patch
>
>
> JsonEncoder needs to be flushed, otherwise data may be left in its buffers. Ideally, behavior should be the same regardless of what kind of Encoder is passed in. Here is some example code:
> {code}
> class A {
>   long timestamp;
> }
>   public void testEventSchemaSerializeBinary() throws IOException {
>     A e = new A();
>     e.timestamp = 1234;
>     ReflectData reflectData = ReflectData.get();
>     Schema schm = reflectData.getSchema(A.class);
>     System.out.println(schm);
>     ReflectDatumWriter writer = new ReflectDatumWriter(schm);
>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>     Encoder enc = new BinaryEncoder(out);
>     writer.write(e, enc); // only one call
>     byte[] bs = out.toByteArray();
>     int len = bs.length; // length is 2, which is reasonable.
>     System.out.println("output size: " + len);
>   }
> public void testSerializeJson() throws IOException {
>     A a = new A();
>     a.timestamp = 1234;
>     ReflectData reflectData = ReflectData.get();
>     Schema schm = reflectData.getSchema(A.class);
>     ReflectDatumWriter writer = new ReflectDatumWriter(schm);
>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>     JsonEncoder json = new JsonEncoder(schm, out);
>     writer.write(a, json); // only one call
>     // did not flush
>     byte[] bs = out.toByteArray();
>     int len = bs.length; // len == 0;  this is unexpected!
>     System.out.println("output size: " + len); 
>  
>     // flushed this time. this is a bit unwieldy
>     json.flush(); 
>     bs = out.toByteArray();
>     len = bs.length; // len == 18; this is better!
>     System.out.println("output size: " + len);
> }
> {code}
> One way to deal with this is to have all Encoders provide a flush() method so the DatumWriter can always flush them, and potentially to add a flush() method to DatumWriter as well.
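
As an illustration of that suggestion (a sketch, not Avro API): since the encoders already expose flush() (the report above calls json.flush()), a small helper could write a datum and then flush whatever Encoder it was given, so callers observe the same behavior for BinaryEncoder and JsonEncoder. The class and method names below are hypothetical.

{code}
import java.io.IOException;

import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Encoder;

public class EncoderUtil {
  // Hypothetical convenience method, not part of Avro: write one datum,
  // then flush the encoder so any buffered output (e.g. JsonEncoder's)
  // reaches the underlying stream right away.
  public static <D> void writeAndFlush(DatumWriter<D> writer, D datum,
                                       Encoder enc) throws IOException {
    writer.write(datum, enc);
    enc.flush();
  }
}
{code}

Whether such a helper stays in application code or becomes a flush() on DatumWriter itself is the question raised in the comment above.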
