Posted to dev@tika.apache.org by "Andrzej Bialecki (Created) (JIRA)" <ji...@apache.org> on 2012/02/17 02:13:59 UTC
[jira] [Created] (TIKA-864) Metadata.formatDate should use ThreadLocal
Metadata.formatDate should use ThreadLocal
------------------------------------------
Key: TIKA-864
URL: https://issues.apache.org/jira/browse/TIKA-864
Project: Tika
Issue Type: Improvement
Components: metadata
Reporter: Andrzej Bialecki
Currently this is a synchronized method that uses a single instance of DateFormat. Instead it could use a pool of ThreadLocal DateFormat instances and avoid the sync blocking.
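The proposal above can be sketched roughly as follows. This is only an illustration, not Tika's code: the class name ThreadLocalDateFormatter is hypothetical, and the ISO 8601 pattern in UTC is an assumption about the output the method is expected to produce.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not the actual Metadata class.
final class ThreadLocalDateFormatter {

    // Each thread lazily creates its own formatter, so no instance is ever
    // shared between threads and no synchronization is needed.
    private static final ThreadLocal<DateFormat> FORMAT = new ThreadLocal<DateFormat>() {
        @Override
        protected DateFormat initialValue() {
            // Assumed pattern: ISO 8601 date and time in UTC.
            SimpleDateFormat format =
                    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            return format;
        }
    };

    static String formatDate(Date date) {
        return FORMAT.get().format(date);
    }
}
```

The trade-off is that every thread that ever calls formatDate keeps its own formatter alive until the thread ends or the value is removed explicitly.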
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (TIKA-864) Metadata.formatDate causes blocking in concurrent use
Posted by "Jukka Zitting (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210313#comment-13210313 ]
Jukka Zitting edited comment on TIKA-864 at 2/17/12 3:10 PM:
-------------------------------------------------------------
bq. perhaps the best solution here is just to add a bit of custom code that formats the requested string directly
Done in revision 1245600.
was (Author: jukkaz):
bq. perhaps the best solution here is just to add a bit of custom code that formats the requested string directly
Done in revision 1245600.
> Metadata.formatDate causes blocking in concurrent use
> -----------------------------------------------------
>
> Key: TIKA-864
> URL: https://issues.apache.org/jira/browse/TIKA-864
> Project: Tika
> Issue Type: Improvement
> Components: metadata
> Reporter: Andrzej Bialecki
> Assignee: Jukka Zitting
> Fix For: 1.1
>
>
> Currently this is a synchronized method that uses a single instance of DateFormat. Instead it could use a pool of ThreadLocal DateFormat instances and avoid the sync blocking.
[jira] [Resolved] (TIKA-864) Metadata.formatDate causes blocking in concurrent use
Posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-864.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.1
Assignee: Jukka Zitting
bq. perhaps the best solution here is just to add a bit of custom code that formats the requested string directly
Done in revision 1245600.
[jira] [Commented] (TIKA-864) Metadata.formatDate should use ThreadLocal
Posted by "Jukka Zitting (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210240#comment-13210240 ]
Jukka Zitting commented on TIKA-864:
------------------------------------
Like in TIKA-865, is this a real measurable performance bottleneck? If not, I suggest we keep the code as is.
[jira] [Commented] (TIKA-864) Metadata.formatDate should use ThreadLocal
Posted by "Andrzej Bialecki (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210238#comment-13210238 ]
Andrzej Bialecki commented on TIKA-864:
----------------------------------------
Good point. Maybe Tika should use Joda-Time instead of the built-in DateFormat and Calendar classes. Not only is it much faster, it also provides thread-safe classes for date parsing and formatting (http://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.html).
[jira] [Updated] (TIKA-864) Metadata.formatDate causes blocking in concurrent use
Posted by "Jukka Zitting (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting updated TIKA-864:
-------------------------------
Summary: Metadata.formatDate causes blocking in concurrent use (was: Metadata.formatDate should use ThreadLocal)
bq. threads were very often blocked on a few sync blocks, among others on this one and the one in TIKA-865.
Reason enough for me, thanks for the background.
I updated the issue summary to identify the problem to be solved instead of a proposed solution. Using ThreadLocals is troublesome as already mentioned by Nick.
bq. Joda-Time
I'm not too excited about adding extra dependencies to tika-core. In TIKA-495 (which led to the use of a synchronized static variable) the FastDateFormat class from Commons Lang was considered as an alternative, but there too the overhead of an extra dependency (or of embedding just that class) was a problem.
The formatDate() contract is pretty straightforward, so perhaps the best solution here is just to add a bit of custom code that formats the requested string directly from the given date object without needing extra formatter classes.
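A minimal sketch of what such custom formatting code might look like is given here. It is not the code from the revision mentioned in this thread: the class name DirectDateFormatter is hypothetical, and the ISO 8601 UTC output is an assumption about the formatDate() contract.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not the actual Metadata class.
final class DirectDateFormatter {

    static String formatDate(Date date) {
        // A fresh Calendar per call: no shared mutable state, so no
        // synchronization and no ThreadLocal are required.
        Calendar calendar =
                Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.US);
        calendar.setTime(date);
        // Assumed output: yyyy-MM-ddTHH:mm:ssZ, always in UTC.
        return String.format(Locale.US,
                "%04d-%02d-%02dT%02d:%02d:%02dZ",
                calendar.get(Calendar.YEAR),
                calendar.get(Calendar.MONTH) + 1, // Calendar months are zero-based
                calendar.get(Calendar.DAY_OF_MONTH),
                calendar.get(Calendar.HOUR_OF_DAY),
                calendar.get(Calendar.MINUTE),
                calendar.get(Calendar.SECOND));
    }
}
```

Allocating a Calendar on every call costs a little, but the method stays free of locks, extra dependencies, and per-thread state.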
[jira] [Commented] (TIKA-864) Metadata.formatDate causes blocking in concurrent use
Posted by "Andrzej Bialecki (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210694#comment-13210694 ]
Andrzej Bialecki commented on TIKA-864:
----------------------------------------
Thanks Jukka, this solved the issue nicely - however, I just noticed that the javadoc for that method is now incorrect, because it claims the method is synchronized and points to TIKA-495. This comment should be removed.
[jira] [Commented] (TIKA-864) Metadata.formatDate should use ThreadLocal
Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210207#comment-13210207 ]
Nick Burch commented on TIKA-864:
---------------------------------
If we did store them on a ThreadLocal, then how would we allow them to be cleaned up?
For example, Tomcat will give you warnings if you leave behind ThreadLocals, so we'd need to provide a way to clean them up that someone using Tika inside a webapp could use.
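To illustrate the cleanup concern, here is a rough sketch of what a cleanup hook could look like. The class CleanableDateFormatter and its release() method are hypothetical; nothing like this is proposed as actual Tika API in this thread.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of Tika.
final class CleanableDateFormatter {

    private static final ThreadLocal<DateFormat> FORMAT = new ThreadLocal<DateFormat>() {
        @Override
        protected DateFormat initialValue() {
            SimpleDateFormat format =
                    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            return format;
        }
    };

    static String formatDate(Date date) {
        return FORMAT.get().format(date);
    }

    // Drops the formatter held by the calling thread only. A later
    // formatDate() call on the same thread simply creates a new one.
    static void release() {
        FORMAT.remove();
    }
}
```

Because ThreadLocal.remove() only affects the calling thread, a webapp would have to call release() on every pooled thread that touched the formatter, typically in a finally block at the end of each request. That burden on callers is the difficulty raised above.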
[jira] [Commented] (TIKA-864) Metadata.formatDate should use ThreadLocal
Posted by "Andrzej Bialecki (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210247#comment-13210247 ]
Andrzej Bialecki commented on TIKA-864:
----------------------------------------
I noticed this issue when profiling a larger application that uses a configurable pool of threads (hundreds) to process the Enron data set (the version in plain text RFC822 format, available at http://www.cs.cmu.edu/~enron/enron_mail_20110402.tgz). I didn't quantify the impact of this particular method call on the whole process, but I saw that threads were very often blocked on a few sync blocks, among others this one and the one in TIKA-865.