Posted to dev@nutch.apache.org by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2008/08/19 06:03:44 UTC

[jira] Created: (NUTCH-646) New Indexing Framework for Nutch

New Indexing Framework for Nutch
--------------------------------

                 Key: NUTCH-646
                 URL: https://issues.apache.org/jira/browse/NUTCH-646
             Project: Nutch
          Issue Type: New Feature
          Components: indexer
    Affects Versions: 0.9.0
         Environment: All
            Reporter: Dennis Kubes
            Assignee: Dennis Kubes
             Fix For: 1.0.0, 0.9.0


New indexing framework for Nutch that provides a more generic field abstraction consistent with Lucene index semantics.  It allows multiple MapReduce (MR) jobs to be created for different fields, with those fields aggregated and indexed at the end.  This overcomes a limitation of the current indexer, which restricts which databases can be passed into it.  It also creates a new extension point for field filters that manipulate fields during the indexing process.
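
As a rough illustration of the idea described above (this is not code from the patch), the sketch below shows a generic field type with Lucene-style flags, an aggregation step that merges the per-URL field lists produced by separate jobs, and a filter hook applied before indexing.  The names FieldSketch, Field, FieldFilter, aggregate and TITLE_BOOST are all hypothetical, and plain in-memory maps stand in for MapReduce job output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldSketch {

    /** Hypothetical field type with Lucene-style flags; not the patch's class. */
    static final class Field {
        final String name;
        final String value;
        final boolean stored;
        final boolean indexed;
        final float boost;

        Field(String name, String value, boolean stored, boolean indexed, float boost) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
            this.boost = boost;
        }
    }

    /** Hypothetical stand-in for the field-filter extension point. */
    interface FieldFilter {
        List<Field> filter(String url, List<Field> fields);
    }

    /** Merges the per-URL field lists produced by several jobs into one list per URL. */
    static Map<String, List<Field>> aggregate(List<Map<String, List<Field>>> jobOutputs) {
        Map<String, List<Field>> merged = new LinkedHashMap<>();
        for (Map<String, List<Field>> output : jobOutputs) {
            for (Map.Entry<String, List<Field>> e : output.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    /** Example filter (assumed behaviour): doubles the boost of every "title" field. */
    static final FieldFilter TITLE_BOOST = (url, fields) -> {
        List<Field> out = new ArrayList<>();
        for (Field f : fields) {
            out.add(f.name.equals("title")
                    ? new Field(f.name, f.value, f.stored, f.indexed, f.boost * 2.0f)
                    : f);
        }
        return out;
    }; // end of TITLE_BOOST

    public static void main(String[] args) {
        String url = "http://example.com/";

        // Output of one hypothetical job that emits basic fields.
        Map<String, List<Field>> basic = new LinkedHashMap<>();
        basic.put(url, Arrays.asList(new Field("title", "Example", true, true, 1.0f)));

        // Output of a second hypothetical job that emits anchor fields.
        Map<String, List<Field>> anchors = new LinkedHashMap<>();
        anchors.put(url, Arrays.asList(new Field("anchor", "an example", false, true, 1.0f)));

        // Aggregate both outputs, then run the filter before "indexing" (printing).
        Map<String, List<Field>> merged = aggregate(Arrays.asList(basic, anchors));
        for (Field f : TITLE_BOOST.filter(url, merged.get(url))) {
            System.out.println(f.name + "=" + f.value + " boost=" + f.boost);
        }
    }
}
```

In a real deployment each map would presumably be the output of its own MR job keyed by URL, and the filters would be loaded through the plugin system rather than referenced directly.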

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-646) New Indexing Framework for Nutch

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-646:
-------------------------------

    Attachment: NUTCH-646-1-20080818.patch

The new indexing framework, including the new field indexer, field extension point, and field filter plugins.  This patch relies on the Arity Java arithmetic parser and on the new scoring framework defined in NUTCH-635.




[jira] Commented: (NUTCH-646) New Indexing Framework for Nutch

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653489#action_12653489 ] 

Dennis Kubes commented on NUTCH-646:
------------------------------------

For the final version of this, I have removed the Arity dependencies and computation functionality.  I still think that type of functionality is needed, but it didn't feel like the right place for it at this time.




[jira] Commented: (NUTCH-646) New Indexing Framework for Nutch

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654154#action_12654154 ] 

Dennis Kubes commented on NUTCH-646:
------------------------------------

Not yet.  I need to write up some serious documentation about how to use both the new scoring and indexing systems.  I will try to get to that soon.




[jira] Updated: (NUTCH-646) New Indexing Framework for Nutch

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-646:
-------------------------------

    Attachment: arity-1.3.2.jar

The Arity jar, licensed under the Apache License, is needed for the field-boost plugin.  Arity is an arithmetic parser; the field-boost plugin uses it to do simple arithmetic manipulations during indexing.  Arity can be found at http://code.google.com/p/arity/




[jira] Closed: (NUTCH-646) New Indexing Framework for Nutch

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes closed NUTCH-646.
------------------------------





[jira] Updated: (NUTCH-646) New Indexing Framework for Nutch

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-646:
-------------------------------

    Attachment: NUTCH-646-2-20081126.patch

Updated indexing patch.




[jira] Commented: (NUTCH-646) New Indexing Framework for Nutch

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657685#action_12657685 ] 

Hudson commented on NUTCH-646:
------------------------------

Integrated in Nutch-trunk #667 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/667/])
    




[jira] Commented: (NUTCH-646) New Indexing Framework for Nutch

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653927#action_12653927 ] 

Doğacan Güney commented on NUTCH-646:
-------------------------------------

I should probably not be so lazy and just check it myself, but do we have a tutorial on the wiki about how to use the new indexing system (or the scoring system, for that matter)?




[jira] Resolved: (NUTCH-646) New Indexing Framework for Nutch

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes resolved NUTCH-646.
--------------------------------

    Resolution: Fixed

Committed with revision 723447

