Posted to dev@avro.apache.org by "Douglas Kaminsky (Created) (JIRA)" <ji...@apache.org> on 2011/12/20 22:49:31 UTC

[jira] [Created] (AVRO-985) Remove byte-by-byte copying in RecordBuilderBase.defaultValue

Remove byte-by-byte copying in RecordBuilderBase.defaultValue
-------------------------------------------------------------

                 Key: AVRO-985
                 URL: https://issues.apache.org/jira/browse/AVRO-985
             Project: Avro
          Issue Type: Improvement
            Reporter: Douglas Kaminsky


In one section of RecordBuilderBase.defaultValue(Field) (quoted below), the default Java value is produced by binary-encoding the field's default JSON value and then decoding it again, which amounts to a bytewise copy of the default object. This round trip is very inefficient and causes large slowdowns when building large object sets, including latency spikes when the binary encoder flushes.

A simple workaround for a majority of cases would be to have a separate code path for "primitives" (fixed, string, boolean, int, double, enum, float, bytes) that allows direct creation rather than a full bytewise copy (and subsequent deep copy).

*_RecordBuilderBase.java_*:

{code}
    // If not cached, get the default Java value by encoding the default JSON
    // value and then decoding it:
    if (defaultValue == null) {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      encoder = EncoderFactory.get().binaryEncoder(baos, encoder);
      ResolvingGrammarGenerator.encode(
          encoder, field.schema(), defaultJsonValue);
      encoder.flush();
      decoder = DecoderFactory.get().binaryDecoder(
          baos.toByteArray(), decoder);
      defaultValue = new GenericDatumReader(
          field.schema()).read(null, decoder);
      defaultSchemaValues.putIfAbsent(field.pos(), defaultValue);
    }
{code}
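
As a rough illustration of the "primitives" fast path proposed above, the sketch below builds the Java value directly from an already-parsed JSON default and returns null for complex types, so the caller can fall back to the encode/decode round trip. It is not Avro code: Kind, DefaultValueFastPath and directDefault are hypothetical names, and the JSON value is assumed to arrive as a plain String, Number or Boolean. Enum and fixed are left out because they need the field's schema (symbol list, fixed size) to construct a value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the schema types named in the report; not Avro's Schema.Type. */
enum Kind { STRING, BOOLEAN, INT, FLOAT, DOUBLE, BYTES, RECORD, ARRAY, MAP }

final class DefaultValueFastPath {
  private DefaultValueFastPath() {}

  /**
   * Builds the default directly for simple kinds. Returns null for complex
   * kinds, signalling that the caller should use the existing round trip.
   * The json argument is assumed to be a String, Number or Boolean.
   */
  static Object directDefault(Kind kind, Object json) {
    switch (kind) {
      case STRING:
        return (String) json;
      case BOOLEAN:
        return (Boolean) json;
      case INT:
        return ((Number) json).intValue();
      case FLOAT:
        return ((Number) json).floatValue();
      case DOUBLE:
        return ((Number) json).doubleValue();
      case BYTES:
        // Avro writes bytes defaults as a JSON string whose code points
        // 0-255 stand for the byte values 0-255, which ISO-8859-1 preserves.
        return ByteBuffer.wrap(((String) json).getBytes(StandardCharsets.ISO_8859_1));
      default:
        return null; // complex kind: no fast path in this sketch
    }
  }
}
```

A caller would try directDefault first and only run the encoder/decoder path when it returns null, so the cost of the round trip is paid for complex defaults alone.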

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (AVRO-985) Remove byte-by-byte copying in RecordBuilderBase.defaultValue for non-complex types

Posted by "Douglas Kaminsky (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Douglas Kaminsky updated AVRO-985:
----------------------------------

    Summary: Remove byte-by-byte copying in RecordBuilderBase.defaultValue for non-complex types  (was: Remove byte-by-byte copying in RecordBuilderBase.defaultValue)
    


        

[jira] [Commented] (AVRO-985) Remove byte-by-byte copying in RecordBuilderBase.defaultValue for non-complex types

Posted by "Scott Carey (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180913#comment-13180913 ] 

Scott Carey commented on AVRO-985:
----------------------------------

This is a big performance problem. We could use some sort of Avro Value API (perhaps simply GenericData and friends, in immutable form) instead of the JSON objects to represent default values. Then, when serializing, the serialized bytes can be cached and written out directly, and when deserializing, the immutable Generic object can be substituted.
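
The caching half of this suggestion can be sketched as follows: encode each field's default once, keep the bytes keyed by field position, and append the cached bytes on every later write. This is only an illustration under stated assumptions. DefaultBytesCache and writeDefault are hypothetical names, and the encoder function stands in for whatever would actually serialize a default value; none of this is existing Avro API.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.IntFunction;

/** Hypothetical cache of already-serialized default values, keyed by field position. */
final class DefaultBytesCache {
  private final ConcurrentMap<Integer, byte[]> encodedByFieldPos = new ConcurrentHashMap<>();
  private final IntFunction<byte[]> encoder;

  /** The encoder is an assumed callback that serializes the default of the given field position. */
  DefaultBytesCache(IntFunction<byte[]> encoder) {
    this.encoder = encoder;
  }

  /** Appends the field's serialized default to out, encoding it at most once per field. */
  void writeDefault(int fieldPos, ByteArrayOutputStream out) {
    byte[] bytes = encodedByFieldPos.computeIfAbsent(fieldPos, encoder::apply);
    out.write(bytes, 0, bytes.length);
  }
}
```

The cached arrays are never handed to callers, only copied into the output, which keeps them effectively immutable in the spirit of the comment above.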
                


        

[jira] [Updated] (AVRO-985) Remove byte-by-byte copying in RecordBuilderBase.defaultValue

Posted by "Douglas Kaminsky (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Douglas Kaminsky updated AVRO-985:
----------------------------------

    Affects Version/s: 1.6.1
    


        

[jira] [Updated] (AVRO-985) Remove byte-by-byte copying in RecordBuilderBase.defaultValue for non-complex types

Posted by "Douglas Kaminsky (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Douglas Kaminsky updated AVRO-985:
----------------------------------

    Component/s: java
    
