Posted to dev@sqoop.apache.org by "Joeri Hermans (JIRA)" <ji...@apache.org> on 2016/04/11 16:12:25 UTC

[jira] [Created] (SQOOP-2906) Optimization of AvroUtil.toAvroIdentifier

Joeri Hermans created SQOOP-2906:
------------------------------------

             Summary: Optimization of AvroUtil.toAvroIdentifier
                 Key: SQOOP-2906
                 URL: https://issues.apache.org/jira/browse/SQOOP-2906
             Project: Sqoop
          Issue Type: Improvement
            Reporter: Joeri Hermans


Hi all,

Our distributed profiler showed that the AvroUtil.toAvroIdentifier method is inefficient, specifically because of its use of regex patterns. This is directly visible in the FlameGraph generated by the profiler (https://jhermans.web.cern.ch/jhermans/sqoop_avro_flamegraph.svg). We implemented an optimization and compared it with the original method. On our test machine, the optimized method by itself is on average about 500% more efficient than the original implementation. We have not yet tested how this optimization affects the performance of user jobs.
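
To illustrate the kind of change described above, here is a minimal sketch. It is not the Sqoop source or the patch in the pull request. It assumes the method replaces each run of non-word characters with a single underscore and prefixes the result with "AVRO_" when the first character is not a letter or underscore. The class name AvroIdentifierSketch and the method names regexVersion and loopVersion are hypothetical, and the input is assumed to be non-empty.

```java
final class AvroIdentifierSketch {

    // Regex-based variant: String.replaceAll and String.matches each compile
    // a Pattern on every call, which is the sort of cost a profiler can surface.
    static String regexVersion(String candidate) {
        String formatted = candidate.replaceAll("\\W+", "_");
        if (formatted.substring(0, 1).matches("[a-zA-Z_]")) {
            return formatted;
        }
        return "AVRO_" + formatted;
    }

    // Same character class as Java's default \w: ASCII letters, digits, underscore.
    private static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '_';
    }

    // Loop-based variant: a single pass over the characters, no regex engine.
    static String loopVersion(String candidate) {
        StringBuilder sb = new StringBuilder(candidate.length() + 5);
        boolean inNonWordRun = false;
        for (int i = 0; i < candidate.length(); i++) {
            char c = candidate.charAt(i);
            if (isWordChar(c)) {
                sb.append(c);
                inNonWordRun = false;
            } else if (!inNonWordRun) {
                // Collapse a whole run of non-word characters into one underscore.
                sb.append('_');
                inNonWordRun = true;
            }
        }
        char first = sb.charAt(0);
        boolean validStart = (first >= 'a' && first <= 'z')
            || (first >= 'A' && first <= 'Z')
            || first == '_';
        if (!validStart) {
            sb.insert(0, "AVRO_");
        }
        return sb.toString();
    }
}
```

Under these assumptions both variants return the same string for the same input, so the two can be compared in a micro-benchmark. Any speedup measured with this sketch is not the figure reported above.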

Any suggestions or remarks are welcome.

Kind regards,


Joeri

https://github.com/apache/sqoop/pull/18


