Posted to dev@nutch.apache.org by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2012/05/07 10:56:10 UTC
[jira] [Created] (NUTCH-1352) Improve regex urlfilters/normalizers
synchronization
Ferdy Galema created NUTCH-1352:
-----------------------------------
Summary: Improve regex urlfilters/normalizers synchronization
Key: NUTCH-1352
URL: https://issues.apache.org/jira/browse/NUTCH-1352
Project: Nutch
Issue Type: Improvement
Reporter: Ferdy Galema
Fix For: nutchgora
I noticed that during fetching, the fetcher threads spend much of their time blocked on a monitor because of outlink normalizing/filtering. The cause: some of the regex plugins synchronize on a single lock.
This patch improves throughput by removing the synchronization locks and replacing them with thread-locals where needed.
It has been extensively tested in production. I will commit this later today if there are no objections.
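To illustrate the general idea, here is a minimal sketch. It is not the code from the attached patch; the class name ThreadLocalRegexRule and its match method are hypothetical. It assumes a rule backed by java.util.regex: the compiled Pattern is immutable and can be shared by all fetcher threads, while the mutable Matcher is kept per thread, so no lock is needed around the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration only (not taken from the Nutch patch):
 * a regex rule that avoids a shared lock by keeping one Matcher per thread.
 */
final class ThreadLocalRegexRule {

    // Matcher is mutable and not thread-safe, so each thread lazily gets its own.
    private final ThreadLocal<Matcher> matcher;

    ThreadLocalRegexRule(String regex) {
        // Pattern is immutable and safe to share between threads.
        final Pattern pattern = Pattern.compile(regex);
        this.matcher = ThreadLocal.withInitial(() -> pattern.matcher(""));
    }

    /** Returns true if the regex is found anywhere in the URL; no synchronized block. */
    boolean match(String url) {
        // reset(...) re-targets this thread's Matcher at the new input.
        return matcher.get().reset(url).find();
    }
}
```

Each thread pays for one Matcher instead of contending on a monitor for every outlink, which is the trade-off described above.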
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (NUTCH-1352) Improve regex urlfilters/normalizers
synchronization
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1352.
----------------------------------
Resolution: Fixed
Committed for 1.6 in rev. 1349227.
Thanks Ferdy.
[jira] [Commented] (NUTCH-1352) Improve regex
urlfilters/normalizers synchronization
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269463#comment-13269463 ]
Markus Jelsma commented on NUTCH-1352:
--------------------------------------
Interesting. This should apply to trunk as well.
[jira] [Updated] (NUTCH-1352) Improve regex urlfilters/normalizers
synchronization
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1352:
--------------------------------
Fix Version/s: 1.5
[jira] [Commented] (NUTCH-1352) Improve regex
urlfilters/normalizers synchronization
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269619#comment-13269619 ]
Ferdy Galema commented on NUTCH-1352:
-------------------------------------
Thanks.
[jira] [Commented] (NUTCH-1352) Improve regex
urlfilters/normalizers synchronization
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271069#comment-13271069 ]
Hudson commented on NUTCH-1352:
-------------------------------
Integrated in Nutch-nutchgora #248 (See [https://builds.apache.org/job/Nutch-nutchgora/248/])
NUTCH-1352 Improve regex urlfilters/normalizers synchronization (Revision 1335066)
Result = SUCCESS
ferdy :
Files :
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api/RegexRule.java
* /nutch/branches/nutchgora/src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api/RegexURLFilterBase.java
* /nutch/branches/nutchgora/src/plugin/urlfilter-regex/src/java/org/apache/nutch/urlfilter/regex/RegexURLFilter.java
* /nutch/branches/nutchgora/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
* /nutch/branches/nutchgora/src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java
[jira] [Updated] (NUTCH-1352) Improve regex urlfilters/normalizers
synchronization
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1352:
--------------------------------
Fix Version/s: 1.6 (was: 1.5)
On second thought, I will hold off on committing to trunk for now. (Feature freeze, I guess?)
[jira] [Commented] (NUTCH-1352) Improve regex
urlfilters/normalizers synchronization
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269508#comment-13269508 ]
Markus Jelsma commented on NUTCH-1352:
--------------------------------------
Yes, thanks :)
[jira] [Updated] (NUTCH-1352) Improve regex urlfilters/normalizers
synchronization
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1352:
---------------------------------
Attachment: NUTCH-1352-1.6-1.patch
Slightly modified patch for trunk.
[jira] [Commented] (NUTCH-1352) Improve regex
urlfilters/normalizers synchronization
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293546#comment-13293546 ]
Hudson commented on NUTCH-1352:
-------------------------------
Integrated in nutch-trunk-maven #310 (See [https://builds.apache.org/job/nutch-trunk-maven/310/])
NUTCH-1352 Improve regex urlfilters/normalizers synchronization (Revision 1349227)
Result = SUCCESS
markus :
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api/RegexRule.java
* /nutch/trunk/src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api/RegexURLFilterBase.java
* /nutch/trunk/src/plugin/urlfilter-regex/src/java/org/apache/nutch/urlfilter/regex/RegexURLFilter.java
* /nutch/trunk/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
* /nutch/trunk/src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java
[jira] [Commented] (NUTCH-1352) Improve regex
urlfilters/normalizers synchronization
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269697#comment-13269697 ]
Ferdy Galema commented on NUTCH-1352:
-------------------------------------
Committed to nutchgora.
[jira] [Commented] (NUTCH-1352) Improve regex
urlfilters/normalizers synchronization
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269497#comment-13269497 ]
Ferdy Galema commented on NUTCH-1352:
-------------------------------------
This does indeed apply to trunk too, except for a minor patch segment concerning a logging statement, which is irrelevant.
I'll commit it to trunk as well.
[jira] [Updated] (NUTCH-1352) Improve regex urlfilters/normalizers
synchronization
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1352:
--------------------------------
Attachment: NUTCH-1352.patch
[jira] [Commented] (NUTCH-1352) Improve regex
urlfilters/normalizers synchronization
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295801#comment-13295801 ]
Hudson commented on NUTCH-1352:
-------------------------------
Integrated in Nutch-trunk #1869 (See [https://builds.apache.org/job/Nutch-trunk/1869/])
NUTCH-1352 Improve regex urlfilters/normalizers synchronization (Revision 1349227)
Result = SUCCESS
markus : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1349227
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api/RegexRule.java
* /nutch/trunk/src/plugin/lib-regex-filter/src/java/org/apache/nutch/urlfilter/api/RegexURLFilterBase.java
* /nutch/trunk/src/plugin/urlfilter-regex/src/java/org/apache/nutch/urlfilter/regex/RegexURLFilter.java
* /nutch/trunk/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
* /nutch/trunk/src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java
[jira] [Commented] (NUTCH-1352) Improve regex
urlfilters/normalizers synchronization
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291789#comment-13291789 ]
Lewis John McGibbney commented on NUTCH-1352:
---------------------------------------------
Markus, feel free to commit this to trunk :)