Posted to dev@poi.apache.org by bu...@apache.org on 2009/12/21 14:18:32 UTC

DO NOT REPLY [Bug 48425] New: DateUtil.isCellDateFormatted() method is slow

https://issues.apache.org/bugzilla/show_bug.cgi?id=48425

           Summary: DateUtil.isCellDateFormatted() method is slow
           Product: POI
           Version: 3.6
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: POI Overall
        AssignedTo: dev@poi.apache.org
        ReportedBy: jan.stette@gmail.com


I have done some performance testing for code reading data from large
spreadsheets using POI.  In this use case, I found that half of the CPU time
was spent in a single method in POI: DateUtil.isCellDateFormatted(cell).  We
call this method every time we extract a value from a cell in order to
correctly create Date objects when cells contain dates.

Looking at this method, it spends most of its time in DateUtil.isADateFormat().
This method is very slow, as it performs seven regular expression
substitutions on the formatString parameter and one additional regex match.
None of the regexes are precompiled, so they're all compiled on every call to
this method.

I would suggest replacing the first five regexes with calls to a string
substitution method that doesn't require regexes, as they are simple
replacements.  For the remaining three regexes, I would suggest precompiling
them instead of just calling String.replaceAll() and String.matches().
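
To illustrate the precompilation part of the suggestion, here is a minimal
sketch. The class name, method name and both patterns are assumptions made up
for this example; they are not the expressions actually used in POI's
DateUtil. The only point is that Pattern.compile() runs once, when the class
is initialized, instead of on every call as happens with String.replaceAll()
and String.matches().

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI.
final class PrecompiledFormatCheck {
    // Compiled once at class initialization and reused by every call.
    // Both patterns are illustrative assumptions, not POI's real expressions.
    private static final Pattern LOCALE_PREFIX = Pattern.compile("^\\[\\$-.*?\\]");
    private static final Pattern DATE_CHARS = Pattern.compile("[yYmMdDhHsS\\-/,. :]+");

    static boolean looksLikeDateFormat(String formatString) {
        // Equivalent in effect to formatString.replaceAll(regex, ""),
        // but without recompiling the regex on each invocation.
        String fs = LOCALE_PREFIX.matcher(formatString).replaceFirst("");
        // Matcher.matches() requires the whole string to match, like String.matches().
        return DATE_CHARS.matcher(fs).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDateFormat("[$-409]yyyy/mm/dd"));
        System.out.println(looksLikeDateFormat("0.00"));
    }
}
```

Pattern instances are immutable and safe to share between threads, so holding
them in static final fields should not need any extra synchronization.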

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 48425] DateUtil.isCellDateFormatted() method is slow

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=48425

--- Comment #2 from Jan <ja...@gmail.com> 2009-12-22 02:12:24 UTC ---
Great, thanks for the very quick response!



DO NOT REPLY [Bug 48425] DateUtil.isCellDateFormatted() method is slow

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=48425

Yegor Kozlov <ye...@dinom.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #1 from Yegor Kozlov <ye...@dinom.ru> 2009-12-22 00:03:45 UTC ---
A good catch, thanks. 

As you suggested, I replaced the first five regexes with a loop collecting
characters into a buffer. The remaining three regexes are pre-compiled at class
initialization time. 
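
A rough sketch of what such a character-collecting loop could look like is
given below. It is not the code committed to POI. The class and method names
are hypothetical, and the set of substitutions (dropping a backslash in front
of '-', ',', '.' or a space, and removing the sequence ";@") is an assumption
chosen for illustration.

```java
// Hypothetical helper, not part of POI.
final class FormatStringCleaner {
    static String stripSimpleEscapes(String fs) {
        StringBuilder sb = new StringBuilder(fs.length());
        for (int i = 0; i < fs.length(); i++) {
            char c = fs.charAt(i);
            if (i < fs.length() - 1) {
                char next = fs.charAt(i + 1);
                // Assumed rule: drop a backslash that escapes one of these
                // characters; the escaped character itself is appended on
                // the next iteration.
                if (c == '\\' && (next == '-' || next == ',' || next == '.' || next == ' ')) {
                    continue;
                }
                // Assumed rule: remove the two-character sequence ";@" entirely.
                if (c == ';' && next == '@') {
                    i++;
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Java literal below is the 14-character string yyyy\-mm\-dd;@
        System.out.println(stripSimpleEscapes("yyyy\\-mm\\-dd;@"));
    }
}
```

A single pass like this handles all the simple replacements at once and
allocates only one buffer, whereas a chain of replaceAll() calls compiles a
regex and creates an intermediate string for each substitution.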

In my benchmark I measured the number of calls to
DateUtil.isCellDateFormatted() made in ten seconds. The reworked code is
significantly faster: the throughput is at least five times greater. 

I committed the fix in r893105. 

Yegor
