Posted to dev@nutch.apache.org by "Aaron Nall (JIRA)" <ji...@apache.org> on 2008/07/28 21:33:31 UTC

[jira] Created: (NUTCH-638) Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.

Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.
-----------------------------------------------------------------------------------------------------------------

                 Key: NUTCH-638
                 URL: https://issues.apache.org/jira/browse/NUTCH-638
             Project: Nutch
          Issue Type: Improvement
          Components: searcher
    Affects Versions: 1.0.0
            Reporter: Aaron Nall
            Priority: Minor


I wanted to conduct all index creation operations in HDFS but search from the local file system, using the same cluster of machines. I believe that this is a common use case.

This required either a parallel Nutch install or edits (scripted or manual) to hadoop-site.xml to change the file system from HDFS to local when starting a distributed searcher service. This minor patch makes IndexSearcher and NutchBean honor URIs as supported by Hadoop 0.17, without altering existing behavior when simple paths are entered.
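
To illustrate the idea, here is a minimal sketch of scheme-based selection. It uses only java.net.URI and is not the patch itself: the class name FsSchemeSketch, the method resolveScheme, and the host "namenode" are hypothetical, chosen for the example. The assumption is that a location carrying a scheme (file://, hdfs://) names its file system explicitly, while a simple path falls back to whatever the configuration declares as the default, which is roughly the distinction Hadoop draws when it resolves a file system from a URI.

```java
import java.net.URI;

// Hypothetical illustration only; not code from the patch or from Hadoop.
final class FsSchemeSketch {

    // Returns the scheme named in the location, or the configured default
    // when the location is a simple path without a scheme.
    // Note: URI.create rejects strings that are not valid URIs
    // (for example, paths containing spaces).
    static String resolveScheme(String location, String defaultScheme) {
        String scheme = URI.create(location).getScheme();
        return scheme != null ? scheme : defaultScheme;
    }

    public static void main(String[] args) {
        // Assume the cluster configuration defaults to HDFS.
        String configuredDefault = "hdfs";

        String[] locations = {
            "/data/crawl/indexes",          // simple path: default applies
            "file:///data/crawl/indexes",   // explicit local file system
            "hdfs://namenode:9000/crawl"    // explicit HDFS
        };
        for (String location : locations) {
            System.out.println(location + " -> "
                + resolveScheme(location, configuredDefault));
        }
    }
}
```

With this kind of resolution, a searcher could be pointed at a local index by passing a file:// location even though the shared configuration still defaults to HDFS, so no second install or config edit would be needed.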

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-638) Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639058#action_12639058 ] 

Andrzej Bialecki commented on NUTCH-638:
-----------------------------------------

I think in NutchBean.java we can also use dir.getFileSystem(conf) instead of FileSystem.get(dir.toUri(), this.conf). Could you please test whether this works for you? Other than that, the patch looks fine.

> Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-638
>                 URL: https://issues.apache.org/jira/browse/NUTCH-638
>             Project: Nutch
>          Issue Type: Improvement
>          Components: searcher
>    Affects Versions: 1.0.0
>            Reporter: Aaron Nall
>            Priority: Minor
>         Attachments: distributed-search-uri.patch
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> I wanted to conduct all index creation operations in hdfs but search from the local file system using the same cluster of machines.  I believe that this is a common use case.  
> This required either a parallel nutch install or edits (scripted or manual) to hadoop-site.xml to change the file system from hdfs to local when starting a distributed searcher service.  This minor patch makes IndexSearcher and NutchBean honor URIs as supported by hadoop 0.17 without altering existing functionality if simple paths are entered.



[jira] Updated: (NUTCH-638) Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.

Posted by "Aaron Nall (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Nall updated NUTCH-638:
-----------------------------

    Attachment: distributed-search-uri.patch

This is the patch that I used to address the issue.

