Posted to issues@commons.apache.org by "Gary D. Gregory (Jira)" <ji...@apache.org> on 2021/07/12 22:32:00 UTC
[jira] [Comment Edited] (CSV-277) Review Lexer simpleToken for Performance
[ https://issues.apache.org/jira/browse/CSV-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379427#comment-17379427 ]
Gary D. Gregory edited comment on CSV-277 at 7/12/21, 10:31 PM:
----------------------------------------------------------------
You can try extending Commons IO's {{UnsynchronizedByteArrayInputStream}} to see if it makes a difference.
was (Author: garydgregory):
You can try extending Commons IO's
unsynchronized version of the reader to see if it makes a difference.
> Review Lexer simpleToken for Performance
> ----------------------------------------
>
> Key: CSV-277
> URL: https://issues.apache.org/jira/browse/CSV-277
> Project: Commons CSV
> Issue Type: Improvement
> Reporter: David Mollitor
> Priority: Major
> Attachments: CSVCapture.PNG
>
>
> I ran the Apache ORC benchmarks, which have {{commons-csv}} as a dependency, and noticed that the bulk of the running time is spent in {{commons-csv}}.
> I attached the VisualVM output and here is my test setup:
> {code:none}
> JVM: OpenJDK 64-Bit Server VM (25.292-b10, mixed mode)
> Java: version 1.8.0_292, vendor Private Build
> Java Home: /usr/lib/jvm/java-8-openjdk-amd64/jre
> JVM Flags: <none>
> {code}
> I suspect this is in part because {{ExtendedBufferedReader}} extends {{BufferedReader}}. {{BufferedReader}} synchronizes its methods, which means that every call to {{read}} must acquire a lock. Usually this is not an issue, but {{commons-csv}} reads its input one character at a time, so the locking adds a lot of overhead: even though the input is buffered, every single read goes through synchronization. Each read also has to "jump" into the parent class.
> Nothing else stands out to me as being "slow."
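
The locking concern described above can be illustrated with a minimal sketch of a buffered reader whose single-character {{read}} takes no lock. The class name {{UnsyncCharReader}}, its buffer-size parameter, and the {{countChars}} helper are all hypothetical and written only for this illustration; they are not part of Commons CSV or Commons IO, and this is not how {{ExtendedBufferedReader}} is implemented. The sketch is not safe for concurrent use, which is the trade-off being explored.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: a buffered Reader whose read() never synchronizes. */
final class UnsyncCharReader extends Reader {
    private final Reader in;   // underlying source
    private final char[] buf;  // internal buffer
    private int pos;           // next index to hand out
    private int limit;         // number of valid chars in buf

    UnsyncCharReader(Reader in, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.in = in;
        this.buf = new char[size];
    }

    /** Returns the next char from the buffer, or -1 at end of input. No lock is taken. */
    @Override
    public int read() throws IOException {
        if (pos >= limit && !fill()) {
            return -1;
        }
        return buf[pos++];
    }

    /** Bulk read served from the internal buffer; may return fewer than len chars. */
    @Override
    public int read(char[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos >= limit && !fill()) {
            return -1;
        }
        int n = Math.min(len, limit - pos);
        System.arraycopy(buf, pos, dst, off, n);
        pos += n;
        return n;
    }

    /** Refills the buffer from the source; returns false at end of input. */
    private boolean fill() throws IOException {
        int n;
        do {
            n = in.read(buf, 0, buf.length);
        } while (n == 0);
        if (n < 0) {
            return false;
        }
        pos = 0;
        limit = n;
        return true;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Drains a reader one char at a time, the access pattern described in the issue. */
    static int countChars(Reader r) {
        try {
            int count = 0;
            while (r.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a rough comparison, one could drain the same large input through {{new BufferedReader(...)}} and through {{new UnsyncCharReader(..., 8192)}} with {{countChars}} and compare timings in a proper benchmark harness. Whether the difference is significant depends on the JVM, since uncontended locks are often cheap.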
--
This message was sent by Atlassian Jira
(v8.3.4#803005)