Posted to dev@nutch.apache.org by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2012/07/30 14:21:34 UTC
[jira] [Created] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet
Ferdy Galema created NUTCH-1441:
-----------------------------------
Summary: AnchorIndexingFilter should use plain HashSet
Key: NUTCH-1441
URL: https://issues.apache.org/jira/browse/NUTCH-1441
Project: Nutch
Issue Type: Bug
Reporter: Ferdy Galema
Priority: Minor
Fix For: 2.1
Attachments: NUTCH-1441.patch
AnchorIndexingFilter should use a plain HashSet instead of a WeakHashMap. A WeakHashMap is unnecessary here and could perhaps even cause bugs. (A WeakHashMap gets its entries removed when the GC notices that their keys are no longer in use elsewhere.)
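A minimal sketch of set-based deduplication in plain Java, assuming the anchors arrive as a list of strings and are compared exactly. The class and method names are hypothetical and are not Nutch's API. A HashSet keeps every key until the set itself is discarded, whereas a WeakHashMap may drop an entry once its key is only weakly reachable; in a dedup loop that could, in principle, let an already-seen anchor be added a second time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not the actual AnchorIndexingFilter code.
public class AnchorDedupSketch {

  // Returns the anchors in their original order with repeats dropped.
  // The HashSet holds strong references to every key it has seen, so the
  // "already seen?" answer cannot change during the loop, unlike a
  // WeakHashMap whose entries may disappear after a garbage collection.
  public static List<String> dedupe(List<String> anchors) {
    Set<String> seen = new HashSet<>();
    List<String> out = new ArrayList<>();
    for (String anchor : anchors) {
      // Set.add returns false when the element was already present.
      if (seen.add(anchor)) {
        out.add(anchor);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> anchors = List.of("home", "about", "home", "contact", "about");
    System.out.println(dedupe(anchors)); // prints [home, about, contact]
  }
}
```

Using the boolean result of `Set.add` avoids a separate `contains` lookup per anchor.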
This patch also makes the filter slightly faster by instantiating the set lazily. (There is no need to create one every time when deduplication is off.)
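The lazy instantiation can be sketched the same way. This is a hypothetical illustration under stated assumptions, not the committed patch: `collectAnchors`, its `deduplicate` parameter and the `setsCreated` counter are invented here so that the saved allocation is observable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of lazily creating the dedup set.
public class LazyDedupSketch {

  // Counts set allocations, only so the effect of laziness can be checked.
  public static int setsCreated = 0;

  public static List<String> collectAnchors(List<String> anchors, boolean deduplicate) {
    List<String> out = new ArrayList<>();
    Set<String> seen = null; // created only if deduplication actually runs
    for (String anchor : anchors) {
      if (!deduplicate) {
        // Deduplication off: keep every anchor and never touch the set.
        out.add(anchor);
        continue;
      }
      if (seen == null) {
        seen = new HashSet<>();
        setsCreated++;
      }
      if (seen.add(anchor)) {
        out.add(anchor);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> anchors = List.of("home", "about", "home");
    System.out.println(collectAnchors(anchors, false)); // prints [home, about, home]
    System.out.println(setsCreated);                    // prints 0
    System.out.println(collectAnchors(anchors, true));  // prints [home, about]
    System.out.println(setsCreated);                    // prints 1
  }
}
```

With deduplication off, no set is ever allocated for a document; with it on, exactly one set is created on the first anchor.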
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1441:
--------------------------------
Attachment: NUTCH-1441.patch
> AnchorIndexingFilter should use plain HashSet
> ---------------------------------------------
>
> Key: NUTCH-1441
> URL: https://issues.apache.org/jira/browse/NUTCH-1441
> Project: Nutch
> Issue Type: Bug
> Reporter: Ferdy Galema
> Priority: Minor
> Fix For: 2.1
>
> Attachments: NUTCH-1441.patch
[jira] [Commented] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424824#comment-13424824 ]
Lewis John McGibbney commented on NUTCH-1441:
---------------------------------------------
Also for trunk? If you wish to create a patch, Ferdy, great; if not, I can pick it up when I write the test for the class.
> AnchorIndexingFilter should use plain HashSet
> ---------------------------------------------
>
> Key: NUTCH-1441
> URL: https://issues.apache.org/jira/browse/NUTCH-1441
> Project: Nutch
> Issue Type: Bug
> Reporter: Ferdy Galema
> Priority: Minor
> Fix For: 2.1
>
> Attachments: NUTCH-1441.patch
[jira] [Reopened] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema reopened NUTCH-1441:
---------------------------------
> AnchorIndexingFilter should use plain HashSet
> ---------------------------------------------
>
> Key: NUTCH-1441
> URL: https://issues.apache.org/jira/browse/NUTCH-1441
> Project: Nutch
> Issue Type: Bug
> Reporter: Ferdy Galema
> Priority: Minor
> Fix For: 1.6, 2.1
>
> Attachments: NUTCH-1441.patch
[jira] [Updated] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1441:
--------------------------------
Attachment: NUTCH-1441-trunk.patch
Patch for trunk. It would be great if you could apply and test this for trunk, Lewis.
> AnchorIndexingFilter should use plain HashSet
> ---------------------------------------------
>
> Key: NUTCH-1441
> URL: https://issues.apache.org/jira/browse/NUTCH-1441
> Project: Nutch
> Issue Type: Bug
> Reporter: Ferdy Galema
> Priority: Minor
> Fix For: 1.6, 2.1
>
> Attachments: NUTCH-1441-trunk.patch, NUTCH-1441.patch
[jira] [Updated] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1441:
--------------------------------
Patch Info: Patch Available
Fix Version/s: 1.6
> AnchorIndexingFilter should use plain HashSet
> ---------------------------------------------
>
> Key: NUTCH-1441
> URL: https://issues.apache.org/jira/browse/NUTCH-1441
> Project: Nutch
> Issue Type: Bug
> Reporter: Ferdy Galema
> Priority: Minor
> Fix For: 1.6, 2.1
>
> Attachments: NUTCH-1441.patch
[jira] [Closed] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema closed NUTCH-1441.
-------------------------------
Resolution: Fixed
Committed.
> AnchorIndexingFilter should use plain HashSet
> ---------------------------------------------
>
> Key: NUTCH-1441
> URL: https://issues.apache.org/jira/browse/NUTCH-1441
> Project: Nutch
> Issue Type: Bug
> Reporter: Ferdy Galema
> Priority: Minor
> Fix For: 2.1
>
> Attachments: NUTCH-1441.patch
[jira] [Resolved] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-1441.
-----------------------------------------
Resolution: Fixed
Committed @revision 1387341 in trunk.
Thank you, Ferdy.
> AnchorIndexingFilter should use plain HashSet
> ---------------------------------------------
>
> Key: NUTCH-1441
> URL: https://issues.apache.org/jira/browse/NUTCH-1441
> Project: Nutch
> Issue Type: Bug
> Reporter: Ferdy Galema
> Priority: Minor
> Fix For: 1.6, 2.1
>
> Attachments: NUTCH-1441.patch, NUTCH-1441-trunk.patch
[jira] [Commented] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458189#comment-13458189 ]
Hudson commented on NUTCH-1441:
-------------------------------
Integrated in nutch-trunk-maven #426 (See [https://builds.apache.org/job/nutch-trunk-maven/426/])
NUTCH-1441 AnchorIndexingFilter should use plain HashSet (Revision 1387341)
Result = SUCCESS
lewismc :
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/index-anchor/src/java/org/apache/nutch/indexer/anchor/AnchorIndexingFilter.java
> AnchorIndexingFilter should use plain HashSet
> ---------------------------------------------
>
> Key: NUTCH-1441
> URL: https://issues.apache.org/jira/browse/NUTCH-1441
> Project: Nutch
> Issue Type: Bug
> Reporter: Ferdy Galema
> Priority: Minor
> Fix For: 1.6, 2.1
>
> Attachments: NUTCH-1441.patch, NUTCH-1441-trunk.patch