Posted to dev@avro.apache.org by "Vivek Nadkarni (JIRA)" <ji...@apache.org> on 2012/05/14 10:28:49 UTC

[jira] [Created] (AVRO-1089) Avro-C - Penalty 30x to 50x for using resolved writer on arrays

Vivek Nadkarni created AVRO-1089:
------------------------------------

             Summary: Avro-C - Penalty 30x to 50x for using resolved writer on arrays
                 Key: AVRO-1089
                 URL: https://issues.apache.org/jira/browse/AVRO-1089
             Project: Avro
          Issue Type: Bug
          Components: c
    Affects Versions: 1.6.3, 1.7.0
         Environment: Ubuntu Linux
            Reporter: Vivek Nadkarni
             Fix For: 1.7.0


The new performance tests created in AVRO-1088 show that using the
resolved writer takes 30 to 50 times longer than using no schema
resolution or using the resolved reader for simple and nested arrays.

For a simple array, using the resolved writer took ~30x longer than
using the memory reader that assumed a matching schema. For the nested
array, using the resolved writer took ~50x longer.

These results suggest that there is a bug in the resolved writer. I do
not have a proposed fix at this time.


**** Running simple array matched schemas ****
  250000 tests per run
  Run 1
  Run 2
  Run 3
  Average time: 2.123s
  Tests/sec:    117739
**** Running simple array resolved writer ****
  10000 tests per run
  Run 1
  Run 2
  Run 3
  Average time: 2.747s
  Tests/sec:    3641


**** Running nested array matched schemas ****
  250000 tests per run
  Run 1
  Run 2
  Run 3
  Average time: 3.030s
  Tests/sec:    82508
**** Running nested array resolved writer ****
  10000 tests per run
  Run 1
  Run 2
  Run 3
  Average time: 6.650s
  Tests/sec:    1504
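
The ~30x and ~50x figures can be checked against the Tests/sec values above. The following is a minimal sketch of that arithmetic; the function names are made up for illustration and are not part of the Avro-C test suite.

```c
#include <stdio.h>

/* Slowdown factor: baseline throughput divided by resolved-writer
 * throughput. Both arguments are Tests/sec values from the output above. */
static double slowdown(double baseline_tests_per_sec,
                       double resolved_tests_per_sec)
{
    return baseline_tests_per_sec / resolved_tests_per_sec;
}

/* Prints the slowdown for the two array benchmarks reported above. */
static void print_slowdowns(void)
{
    printf("simple array: %.1fx\n", slowdown(117739.0, 3641.0));
    printf("nested array: %.1fx\n", slowdown(82508.0, 1504.0));
}
```

Both ratios land slightly above the rounded figures quoted in the description (a little over 32x for the simple array and a little under 55x for the nested array).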



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (AVRO-1089) Avro-C - Penalty 30x to 50x for using resolved writer on arrays

Posted by "Vivek Nadkarni (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Nadkarni updated AVRO-1089:
---------------------------------

    Attachment: AVRO-1089-performance.png

This screenshot was generated using kcachegrind, after running the
performance test test_simple_array_resolved_writer(). The plot shows
that the majority of the time (97%) is spent in the function
avro_resolved_writer_free_elements(), called by
avro_resolved_array_writer_reset(). This information suggests that the
bug lies in one of these two functions. Unfortunately, I still have
neither an explanation of the mechanism nor a fix for this issue.


                
> Avro-C - Penalty 30x to 50x for using resolved writer on arrays
> ---------------------------------------------------------------
>
>                 Key: AVRO-1089
>                 URL: https://issues.apache.org/jira/browse/AVRO-1089
>             Project: Avro
>          Issue Type: Bug
>          Components: c
>    Affects Versions: 1.6.3, 1.7.0
>         Environment: Ubuntu Linux
>            Reporter: Vivek Nadkarni
>             Fix For: 1.7.0
>
>         Attachments: AVRO-1089-performance.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h

        

[jira] [Resolved] (AVRO-1089) Avro-C - Penalty 30x to 50x for using resolved writer on arrays

Posted by "Douglas Creager (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Douglas Creager resolved AVRO-1089.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.3

Patch committed to SVN
                
> Avro-C - Penalty 30x to 50x for using resolved writer on arrays
> ---------------------------------------------------------------
>
>                 Key: AVRO-1089
>                 URL: https://issues.apache.org/jira/browse/AVRO-1089
>             Project: Avro
>          Issue Type: Bug
>          Components: c
>    Affects Versions: 1.6.3, 1.7.0
>         Environment: Ubuntu Linux
>            Reporter: Vivek Nadkarni
>             Fix For: 1.7.3
>
>         Attachments: 0001-AVRO-1089.-Fix-performance-penalty-for-array-resolve.patch, AVRO-1089-performance.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h

[jira] [Updated] (AVRO-1089) Avro-C - Penalty 30x to 50x for using resolved writer on arrays

Posted by "Douglas Creager (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Douglas Creager updated AVRO-1089:
----------------------------------

    Attachment: 0001-AVRO-1089.-Fix-performance-penalty-for-array-resolve.patch

Here's a one-liner patch that fixes this.  The problem was that an internal array wasn't being cleared, and was growing not just with the size of each test case, but with the number of test cases.  Iterating through that array was causing the slowdown.

All tests still pass; running times for the resolved array tests are now comparable to those of the non-resolved array tests.
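
To illustrate the kind of bug described above, here is a minimal sketch. It is not the Avro-C source: element_list, reset_buggy(), reset_fixed() and simulate() are hypothetical names, and the assumption that the fix amounts to resetting a count is only inferred from the description. It counts how many array slots the reset step walks when the array is never cleared, compared with when it is.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the writer's internal element array. */
typedef struct {
    int **items;      /* heap-allocated elements */
    size_t count;     /* number of slots in use */
    size_t capacity;  /* number of slots allocated */
} element_list;

/* Appends one element, growing the slot array as needed. */
static void element_list_append(element_list *list, int value)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 8;
        int **grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL)
            abort();
        list->items = grown;
        list->capacity = new_cap;
    }
    int *item = malloc(sizeof *item);
    if (item == NULL)
        abort();
    *item = value;
    list->items[list->count++] = item;
}

/* Buggy reset: frees every element but leaves count untouched, so the
 * next reset walks all slots ever appended. Returns slots visited. */
static size_t reset_buggy(element_list *list)
{
    size_t visited = 0;
    for (size_t i = 0; i < list->count; i++) {
        free(list->items[i]);   /* free(NULL) is a no-op */
        list->items[i] = NULL;
        visited++;
    }
    return visited;
}

/* Fixed reset: same walk, then clears the array (the assumed one-liner). */
static size_t reset_fixed(element_list *list)
{
    size_t visited = reset_buggy(list);
    list->count = 0;
    return visited;
}

/* Runs `cases` test cases of `per_case` elements each and returns the
 * total number of slots visited by the per-case resets. */
static size_t simulate(size_t cases, size_t per_case, int use_fix)
{
    element_list list = { NULL, 0, 0 };
    size_t total = 0;
    for (size_t c = 0; c < cases; c++) {
        for (size_t e = 0; e < per_case; e++)
            element_list_append(&list, (int) e);
        total += use_fix ? reset_fixed(&list) : reset_buggy(&list);
    }
    reset_fixed(&list);   /* final cleanup, not counted */
    free(list.items);
    return total;
}
```

In this sketch, simulate(100, 10, 0) visits 50500 slots while simulate(100, 10, 1) visits 1000: without the clear, the work per reset grows with the number of test cases already run, which matches the shape of the slowdown reported in this issue.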
                
> Avro-C - Penalty 30x to 50x for using resolved writer on arrays
> ---------------------------------------------------------------
>
>                 Key: AVRO-1089
>                 URL: https://issues.apache.org/jira/browse/AVRO-1089
>             Project: Avro
>          Issue Type: Bug
>          Components: c
>    Affects Versions: 1.6.3, 1.7.0
>         Environment: Ubuntu Linux
>            Reporter: Vivek Nadkarni
>         Attachments: 0001-AVRO-1089.-Fix-performance-penalty-for-array-resolve.patch, AVRO-1089-performance.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h