
Posted to dev@avro.apache.org by "Dave Wright (JIRA)" <ji...@apache.org> on 2010/05/31 22:13:39 UTC

[jira] Created: (AVRO-556) Poor performance for Reader::readBytes can be easily improved

Poor performance for Reader::readBytes can be easily improved
-------------------------------------------------------------

                 Key: AVRO-556
                 URL: https://issues.apache.org/jira/browse/AVRO-556
             Project: Avro
          Issue Type: Improvement
          Components: c++
    Affects Versions: 1.3.2
         Environment: Linux
            Reporter: Dave Wright


The default implementation of Reader::readBytes in 1.3.2 reads bytes into the result vector one byte at a time. For large byte arrays (~500k or so), this is horrendously slow.
The code can easily be changed to:
void readBytes(std::vector<uint8_t> &val) {
    int64_t size = readSize();
    val.resize(size);
    in_.readBytes(&val[0], size);
}
...which will copy all the bytes in a single call.
(Note: it appears this function has been changed in trunk, but it still copies byte by byte, so the optimization would still apply.)

In my testing, serializing and deserializing a message with a 500k byte field 1000 times, execution time dropped from 30+ sec to 0.2 sec with this optimization.
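
To make the difference concrete, here is a minimal, self-contained sketch of the two approaches. MemoryInput, readSlow, and readBulk are hypothetical names invented for illustration and are not part of the Avro C++ API; the byte-at-a-time loop is only an assumption about the general shape of the original code. The bulk version also skips the copy when the size is zero, since taking &val[0] on an empty vector is undefined.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory input, standing in for the reader's in_ member.
class MemoryInput {
public:
    explicit MemoryInput(std::vector<uint8_t> data)
        : data_(std::move(data)), pos_(0) {}

    // Reads exactly one byte and advances the position.
    void readByte(uint8_t &b) {
        if (pos_ >= data_.size()) throw std::runtime_error("read past end");
        b = data_[pos_++];
    }

    // Copies size bytes into dst with a single memcpy.
    void readBytes(uint8_t *dst, std::size_t size) {
        if (size == 0) return;
        if (size > data_.size() - pos_) throw std::runtime_error("read past end");
        std::memcpy(dst, data_.data() + pos_, size);
        pos_ += size;
    }

private:
    std::vector<uint8_t> data_;
    std::size_t pos_;
};

// Assumed shape of the slow path: one call and one push_back per byte.
std::vector<uint8_t> readSlow(MemoryInput &in, std::size_t size) {
    std::vector<uint8_t> val;
    for (std::size_t i = 0; i < size; ++i) {
        uint8_t b = 0;
        in.readByte(b);
        val.push_back(b);
    }
    return val;
}

// Bulk path: size the vector once, then copy everything in one call.
std::vector<uint8_t> readBulk(MemoryInput &in, std::size_t size) {
    std::vector<uint8_t> val(size);
    if (size > 0) {
        in.readBytes(&val[0], size);
    }
    return val;
}
```

Both functions return the same bytes; the bulk version simply replaces one call per byte with one resize and one copy.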





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (AVRO-556) Poor performance for Reader::readBytes can be easily improved

Posted by "Dave Wright (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/AVRO-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Wright updated AVRO-556:
-----------------------------

    Description: 
The default implementation of Reader::readBytes on 1.3.2 reads bytes into the result vector one-byte-at-a-time. For large byte arrays (~500k or so), this is horrendously slow. 
The code can easily be changed to simply do:
{{void readBytes(std::vector<uint8_t> &val) {

        int64_t size = readSize();        

       val.resize(size);

       in_.readBytes(&val[0], size);

}}}
..which will copy all the bytes in a single call.
(note: it appears this function has been changed in the trunk, but it still copies byte-by-byte, so the optimization would still apply).

In my testing of serializing/deserializing a message with a 500k byte field in it 1000 times, execution time dropped from 30+sec to 0.2sec with this optimization.

The same optimization can easily be applied to readFixed(uint8_t *val...) as well.



  was:
The default implementation of Reader::readBytes on 1.3.2 reads bytes into the result vector one-byte-at-a-time. For large byte arrays (~500k or so), this is horrendously slow. 
The code can easily be changed to simply do:
void readBytes(std::vector<uint8_t> &val) {
        int64_t size = readSize();        
       val.resize(size);
       in_.readBytes(&val[0], size);
}
..which will copy all the bytes in a single call.
(note: it appears this function has been changed in the trunk, but it still copies byte-by-byte, so the optimization would still apply).

In my testing of serializing/deserializing a message with a 500k byte field in it 1000 times, execution time dropped from 30+sec to 0.2sec with this optimization.








[jira] Updated: (AVRO-556) Poor performance for Reader::readBytes can be easily improved

Posted by "Scott Banachowski (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/AVRO-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Banachowski updated AVRO-556:
-----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (AVRO-556) Poor performance for Reader::readBytes can be easily improved

Posted by "Dave Wright (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/AVRO-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Wright updated AVRO-556:
-----------------------------

    Description: 
The default implementation of Reader::readBytes on 1.3.2 reads bytes into the result vector one-byte-at-a-time. For large byte arrays (~500k or so), this is horrendously slow. 
The code can easily be changed to simply do:
{noformat}
void readBytes(std::vector<uint8_t> &val) {
        int64_t size = readSize(); 
       val.resize(size);
       in_.readBytes(&val[0], size);
}
{noformat}
..which will copy all the bytes in a single call.
(note: it appears this function has been changed in the trunk, but it still copies byte-by-byte, so the optimization would still apply).

In my testing of serializing/deserializing a message with a 500k byte field in it 1000 times, execution time dropped from 30+sec to 0.2sec with this optimization.

The same optimization can easily be applied to readFixed(uint8_t *val...) as well.



  was:
The default implementation of Reader::readBytes on 1.3.2 reads bytes into the result vector one-byte-at-a-time. For large byte arrays (~500k or so), this is horrendously slow. 
The code can easily be changed to simply do:
{{void readBytes(std::vector<uint8_t> &val) {\\
        int64_t size = readSize(); \\
       val.resize(size);\\
       in_.readBytes(&val[0], size);\\
}}}\\
..which will copy all the bytes in a single call.
(note: it appears this function has been changed in the trunk, but it still copies byte-by-byte, so the optimization would still apply).

In my testing of serializing/deserializing a message with a 500k byte field in it 1000 times, execution time dropped from 30+sec to 0.2sec with this optimization.

The same optimization can easily be applied to readFixed(uint8_t *val...) as well.






[jira] Updated: (AVRO-556) Poor performance for Reader::readBytes can be easily improved

Posted by "Scott Banachowski (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/AVRO-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Banachowski updated AVRO-556:
-----------------------------------

    Attachment: AVRO-556.patch

Patch file.



[jira] Updated: (AVRO-556) Poor performance for Reader::readBytes can be easily improved

Posted by "Dave Wright (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/AVRO-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Wright updated AVRO-556:
-----------------------------

    Description: 
The default implementation of Reader::readBytes on 1.3.2 reads bytes into the result vector one-byte-at-a-time. For large byte arrays (~500k or so), this is horrendously slow. 
The code can easily be changed to simply do:
{{void readBytes(std::vector<uint8_t> &val) {\\
        int64_t size = readSize(); \\
       val.resize(size);\\
       in_.readBytes(&val[0], size);\\
}}}\\
..which will copy all the bytes in a single call.
(note: it appears this function has been changed in the trunk, but it still copies byte-by-byte, so the optimization would still apply).

In my testing of serializing/deserializing a message with a 500k byte field in it 1000 times, execution time dropped from 30+sec to 0.2sec with this optimization.

The same optimization can easily be applied to readFixed(uint8_t *val...) as well.



  was:
The default implementation of Reader::readBytes on 1.3.2 reads bytes into the result vector one-byte-at-a-time. For large byte arrays (~500k or so), this is horrendously slow. 
The code can easily be changed to simply do:
{{void readBytes(std::vector<uint8_t> &val) {

        int64_t size = readSize();        

       val.resize(size);

       in_.readBytes(&val[0], size);

}}}
..which will copy all the bytes in a single call.
(note: it appears this function has been changed in the trunk, but it still copies byte-by-byte, so the optimization would still apply).

In my testing of serializing/deserializing a message with a 500k byte field in it 1000 times, execution time dropped from 30+sec to 0.2sec with this optimization.

The same optimization can easily be applied to readFixed(uint8_t *val...) as well.






[jira] Commented: (AVRO-556) Poor performance for Reader::readBytes can be easily improved

Posted by "Scott Banachowski (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/AVRO-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874443#action_12874443 ] 

Scott Banachowski commented on AVRO-556:
----------------------------------------

Thanks for the suggestion and tests. I will commit your changes for readBytes and readFixed soon.




[jira] Updated: (AVRO-556) Poor performance for Reader::readBytes can be easily improved

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/AVRO-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Hammerbacher updated AVRO-556:
-----------------------------------

           Status: Resolved  (was: Patch Available)
    Fix Version/s: 1.3.3
       Resolution: Fixed

Committed with revision 951653 by Scott and merged into branch-1.3 with revision 951751.

