Posted to dev@nutch.apache.org by "Ferdy Galema (JIRA)" <ji...@apache.org> on 2012/07/30 14:21:34 UTC

[jira] [Created] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet

Ferdy Galema created NUTCH-1441:
-----------------------------------

             Summary: AnchorIndexingFilter should use plain HashSet
                 Key: NUTCH-1441
                 URL: https://issues.apache.org/jira/browse/NUTCH-1441
             Project: Nutch
          Issue Type: Bug
            Reporter: Ferdy Galema
            Priority: Minor
             Fix For: 2.1
         Attachments: NUTCH-1441.patch

AnchorIndexingFilter should use a plain HashSet instead of a WeakHashMap. The WeakHashMap is unnecessary and can perhaps even cause bugs. (A WeakHashMap gets its entries removed once the GC notices that the keys are no longer in use elsewhere.)
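
The garbage-collection behaviour described above can be illustrated with a small sketch. It is not taken from the Nutch patch; the class and method names are hypothetical. It stores keys that nothing else references in a WeakHashMap and in a HashSet, then requests a collection. System.gc() is only a hint, so the number of entries left in the WeakHashMap is not guaranteed, whereas the HashSet always keeps everything that was added.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical demo class; not part of Nutch.
final class WeakVsStrongSketch {

    // Fills a WeakHashMap with keys that are referenced only by the map,
    // hints a GC, and reports how many entries are still present.
    static int weakEntriesAfterGcHint(int n) {
        WeakHashMap<String, Boolean> seen = new WeakHashMap<>();
        for (int i = 0; i < n; i++) {
            // Runtime concatenation yields a fresh, non-interned String,
            // so the map holds the only (weak) reference to each key.
            seen.put("anchor-" + i, Boolean.TRUE);
        }
        System.gc(); // only a request; the JVM may ignore it
        return seen.size(); // anywhere between 0 and n
    }

    // Same experiment with a HashSet, which holds its elements strongly.
    static int strongEntriesAfterGcHint(int n) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add("anchor-" + i);
        }
        System.gc();
        return seen.size(); // always n
    }

    public static void main(String[] args) {
        System.out.println("WeakHashMap entries left: " + weakEntriesAfterGcHint(1000));
        System.out.println("HashSet entries left:     " + strongEntriesAfterGcHint(1000));
    }
}
```

If a deduplication set loses entries this way, an anchor that was already seen could be treated as new again, which is presumably the kind of bug the description has in mind.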

This patch also makes the filter a bit faster by lazily instantiating the set. (There is no need to create one every time when deduplication is off.)
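
A minimal sketch of the idea, assuming a simplified filter that receives a list of anchor texts and a deduplication flag. The class name, method name and signature are hypothetical and do not reproduce the actual AnchorIndexingFilter code; the case-insensitive comparison is likewise an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; names are illustrative, not the actual Nutch code.
final class AnchorDedupSketch {

    // Returns the anchors to index. When deduplicate is false,
    // no set is ever allocated.
    static List<String> filterAnchors(List<String> anchors, boolean deduplicate) {
        List<String> result = new ArrayList<>();
        Set<String> seen = null; // created lazily, only if deduplication is on
        for (String anchor : anchors) {
            if (deduplicate) {
                if (seen == null) {
                    seen = new HashSet<>(); // plain HashSet: strong references
                }
                // Assumption: anchors are compared case-insensitively.
                if (!seen.add(anchor.toLowerCase(Locale.ROOT))) {
                    continue; // already seen, skip the duplicate
                }
            }
            result.add(anchor);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> anchors = List.of("Home", "home", "About", "Home");
        System.out.println(filterAnchors(anchors, true));
        System.out.println(filterAnchors(anchors, false));
    }
}
```

Because Set.add returns false for an element that is already present, a single call both checks for and records each anchor.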

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated NUTCH-1441:
--------------------------------

    Attachment: NUTCH-1441.patch
    

[jira] [Commented] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424824#comment-13424824 ] 

Lewis John McGibbney commented on NUTCH-1441:
---------------------------------------------

Also for trunk? If you wish to create a patch, Ferdy, great; if not, I can pick it up when I write the test for the class.
                

[jira] [Reopened] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema reopened NUTCH-1441:
---------------------------------

    

[jira] [Updated] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated NUTCH-1441:
--------------------------------

    Attachment: NUTCH-1441-trunk.patch

Patch for trunk. It would be great if you could apply and test this for trunk, Lewis.
                

[jira] [Updated] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated NUTCH-1441:
--------------------------------

       Patch Info: Patch Available
    Fix Version/s: 1.6
    

[jira] [Closed] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema closed NUTCH-1441.
-------------------------------

    Resolution: Fixed

committed
                

[jira] [Resolved] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved NUTCH-1441.
-----------------------------------------

    Resolution: Fixed

Committed @revision 1387341 in trunk
Thank you Ferdy
                

[jira] [Commented] (NUTCH-1441) AnchorIndexingFilter should use plain HashSet

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458189#comment-13458189 ] 

Hudson commented on NUTCH-1441:
-------------------------------

Integrated in nutch-trunk-maven #426 (See [https://builds.apache.org/job/nutch-trunk-maven/426/])
    NUTCH-1441 AnchorIndexingFilter should use plain HashSet (Revision 1387341)

     Result = SUCCESS
lewismc : 
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/index-anchor/src/java/org/apache/nutch/indexer/anchor/AnchorIndexingFilter.java

                