Posted to dev@poi.apache.org by bu...@apache.org on 2009/12/21 14:18:32 UTC
DO NOT REPLY [Bug 48425] New: DateUtil.isCellDateFormatted() method is slow
https://issues.apache.org/bugzilla/show_bug.cgi?id=48425
Summary: DateUtil.isCellDateFormatted() method is slow
Product: POI
Version: 3.6
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: POI Overall
AssignedTo: dev@poi.apache.org
ReportedBy: jan.stette@gmail.com
I have done some performance testing for code reading data from large
spreadsheets using POI. In this use case, I found that half of the CPU time
was spent in a single method in POI: DateUtil.isCellDateFormatted(cell). We
call this method every time we extract a value from a cell in order to
correctly create Date objects when cells contain dates.
Looking at this method, it spends most of its time in DateUtil.isADateFormat().
This method is very slow, as it performs seven regular expression
substitutions on the formatString parameter and one additional regex match.
None of the regexes are precompiled, so they're all compiled on every call to
this method.
I would suggest replacing the first five regexes with calls to a string
substitution method that doesn't require regexes, as they are simple
replacements. For the remaining three regexes, I would suggest precompiling
them instead of just calling String.replaceAll() and String.matches().
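To illustrate the suggestion, here is a minimal sketch of the two approaches side by side. The class name, method names, and patterns are hypothetical and chosen only for illustration; they are not copied from the POI source, and only a subset of the substitutions is shown. Whether String.replace() with a literal argument avoids regex machinery internally depends on the JDK version, so any gain should be measured.

```java
import java.util.regex.Pattern;

public class DateFormatCheckSketch {
    // Illustrative patterns (assumptions, not POI's actual regexes),
    // compiled once when the class is initialized.
    private static final Pattern LOCALE_PREFIX = Pattern.compile("^\\[\\$-.*?\\]");
    private static final Pattern DATE_CHARS =
            Pattern.compile("^[yYmMdDhHsS\\-/,. :]+[ampAMP/]*$");

    // Slow variant: replaceAll() and matches() compile their regex on every call.
    static boolean looksLikeDateSlow(String fs) {
        fs = fs.replaceAll("\\\\-", "-");           // unescape "\-"
        fs = fs.replaceAll(";@", "");               // drop the ";@" suffix
        fs = fs.replaceAll("^\\[\\$-.*?\\]", "");   // drop a "[$-409]"-style prefix
        return fs.matches("^[yYmMdDhHsS\\-/,. :]+[ampAMP/]*$");
    }

    // Faster variant: literal replacements plus precompiled patterns.
    static boolean looksLikeDateFast(String fs) {
        fs = fs.replace("\\-", "-");                // literal, no regex syntax involved
        fs = fs.replace(";@", "");
        fs = LOCALE_PREFIX.matcher(fs).replaceAll("");
        return DATE_CHARS.matcher(fs).matches();
    }
}
```

Both methods are meant to return the same result for any input; only the amount of work done per call differs.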
--
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
DO NOT REPLY [Bug 48425] DateUtil.isCellDateFormatted() method is slow
Posted by bu...@apache.org.
--- Comment #2 from Jan <ja...@gmail.com> 2009-12-22 02:12:24 UTC ---
Great, thanks for the very quick response!
DO NOT REPLY [Bug 48425] DateUtil.isCellDateFormatted() method is slow
Posted by bu...@apache.org.
Yegor Kozlov <ye...@dinom.ru> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
--- Comment #1 from Yegor Kozlov <ye...@dinom.ru> 2009-12-22 00:03:45 UTC ---
A good catch, thanks.
As you suggested, I replaced the first five regexes with a loop collecting
characters into a buffer. The remaining three regexes are pre-compiled at class
initialization time.
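A rough sketch of what such a character-collecting loop could look like follows. It is not the code committed to POI: the class name, the method name, and the exact set of sequences handled (a backslash before '-', ',', '.', or ' ', plus the ";@" pair) are assumptions made for illustration.

```java
public class EscapeStripSketch {
    // Hypothetical helper: copies the format string into a buffer in a single
    // pass, dropping backslash escapes and ";@" pairs without using any regex.
    static String stripEscapes(String fs) {
        StringBuilder sb = new StringBuilder(fs.length());
        for (int i = 0; i < fs.length(); i++) {
            char c = fs.charAt(i);
            if (i < fs.length() - 1) {
                char next = fs.charAt(i + 1);
                if (c == '\\' && (next == '-' || next == ',' || next == '.' || next == ' ')) {
                    // Skip the backslash; the escaped character itself is
                    // appended on the next iteration.
                    continue;
                }
                if (c == ';' && next == '@') {
                    i++; // skip both characters of the ";@" pair
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

A single pass like this replaces several independent scans of the string, which is why it tends to be cheaper than chained substitutions even before regex compilation cost is considered.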
In my benchmark I measured the number of calls to
DateUtil.isCellDateFormatted() made in ten seconds. The reworked code is
significantly faster: the throughput is at least five times greater.
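For anyone who wants to repeat this kind of measurement, a minimal sketch of a calls-per-time-window counter is shown below. It is not the benchmark used above; the class and method names are hypothetical, and a simple loop like this is only a rough indicator because it ignores JIT warm-up and other JVM effects unless a warm-up run is done first.

```java
public class ThroughputSketch {
    // Hypothetical helper: runs the task repeatedly until the time window
    // elapses and returns how many calls completed.
    static long callsWithin(long windowMillis, Runnable task) {
        long deadline = System.nanoTime() + windowMillis * 1_000_000L;
        long calls = 0;
        while (System.nanoTime() < deadline) {
            task.run();
            calls++;
        }
        return calls;
    }
}
```

A ten-second run would then be callsWithin(10_000, task), where task wraps a single call to DateUtil.isCellDateFormatted(cell) on a prepared cell, ideally preceded by a shorter warm-up run whose result is discarded.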
I committed the fix in r893105.
Yegor