Posted to dev@nutch.apache.org by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/02/20 22:03:06 UTC

[jira] Created: (NUTCH-447) Dmoz Structure Parser Tool

Dmoz Structure Parser Tool
--------------------------

                 Key: NUTCH-447
                 URL: https://issues.apache.org/jira/browse/NUTCH-447
             Project: Nutch
          Issue Type: New Feature
    Affects Versions: 0.9.0
         Environment: all platforms
            Reporter: Dennis Kubes
         Assigned To: Dennis Kubes
            Priority: Minor


This is a tool that takes the dmoz structure RDF file and returns a listing of the categories.  The categories returned can be limited by depth or by a regular expression pattern.  This tool borrows heavily from the DmozParser.
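
As a rough illustration of the depth and pattern filtering described above, here is a minimal sketch. It is not the DmozStructureParser class from the patch. The class name CategoryLister, its method names, the convention that a maxDepth of 0 or less means "no limit", and the use of a full-string regex match are all assumptions made for this example. It also assumes that each category appears as a Topic element whose r:id attribute holds a slash-separated path such as Top/Arts.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not the DmozStructureParser class from the patch.
class CategoryLister {

  // Depth of a category path, counted in path segments: "Top/Arts" has depth 2.
  static int depth(String category) {
    return category.split("/").length;
  }

  // Returns the r:id of each Topic element that passes both filters.
  // Assumptions: maxDepth <= 0 means "no depth limit"; a null pattern matches everything;
  // the pattern must match the whole category path.
  static List<String> listCategories(Reader rdf, final int maxDepth, final Pattern pattern) {
    final List<String> categories = new ArrayList<String>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!"Topic".equals(qName)) {
          return; // only Topic elements name a category
        }
        String id = atts.getValue("r:id"); // assumed attribute holding the category path
        if (id == null || id.isEmpty()) {
          return;
        }
        boolean depthOk = maxDepth <= 0 || depth(id) <= maxDepth;
        boolean patternOk = pattern == null || pattern.matcher(id).matches();
        if (depthOk && patternOk) {
          categories.add(id);
        }
      }
    };
    try {
      // The default factory is not namespace-aware, so qName carries the raw "r:id" name.
      SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(rdf), handler);
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse structure RDF", e);
    }
    return categories;
  }

  public static void main(String[] args) {
    // Tiny hand-written sample, not real dmoz data.
    String sample = "<RDF xmlns:r=\"http://www.w3.org/TR/RDF/\">"
        + "<Topic r:id=\"Top\"/>"
        + "<Topic r:id=\"Top/Arts\"/>"
        + "<Topic r:id=\"Top/Arts/Music\"/>"
        + "<Topic r:id=\"Top/Science\"/>"
        + "</RDF>";
    // Limit by depth only.
    System.out.println(listCategories(new StringReader(sample), 2, null));
    // Limit by pattern only.
    System.out.println(
        listCategories(new StringReader(sample), 0, Pattern.compile("Top/Arts(/.*)?")));
  }
}
```

A streaming SAX handler is used here because the dmoz dumps are large. A real run would pass a reader over the structure RDF file instead of the inline sample string.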

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-447) Dmoz Structure Parser Tool

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-447:
-------------------------------

    Attachment: dmoz-structure.patch

Patch that contains the DmozStructureParser class.



[jira] Closed: (NUTCH-447) Dmoz Structure Parser Tool

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes closed NUTCH-447.
------------------------------


Closed



[jira] Resolved: (NUTCH-447) Dmoz Structure Parser Tool

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes resolved NUTCH-447.
--------------------------------

    Resolution: Won't Fix

The tool is available as a patch attached to this JIRA issue, so there is no need to add it to the main trunk.



[jira] Commented: (NUTCH-447) Dmoz Structure Parser Tool

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474663 ] 

Otis Gospodnetic commented on NUTCH-447:
----------------------------------------

Is the idea to limit crawling to links under a certain category, as opposed to crawling all links in Dmoz?




[jira] Commented: (NUTCH-447) Dmoz Structure Parser Tool

Posted by "Dennis Kubes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474713 ] 

Dennis Kubes commented on NUTCH-447:
------------------------------------

This tool is for people who need a defined category structure or who want to grab all or part of the dmoz category structure without URLs.  You could certainly then use this list as the topic list in the DmozParserTool to crawl only under a certain category.
