Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/08/18 20:47:15 UTC

[jira] Closed: (NUTCH-341) IndexMerger now deletes entire <workingdir> after completing

     [ http://issues.apache.org/jira/browse/NUTCH-341?page=all ]

Andrzej Bialecki  closed NUTCH-341.
-----------------------------------

    Fix Version/s: 0.8.1
                   0.9.0
       Resolution: Fixed

Fixed. Thanks!

> IndexMerger now deletes entire <workingdir> after completing
> ------------------------------------------------------------
>
>                 Key: NUTCH-341
>                 URL: http://issues.apache.org/jira/browse/NUTCH-341
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 0.8
>            Reporter: Chris Schneider
>            Priority: Critical
>             Fix For: 0.8.1, 0.9.0
>
>         Attachments: doNotDeleteTmpIndexMergeDirV1.patch, patch-v2.txt
>
>
> Change 383304 deleted the following line near Line 117 (see <http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexMerger.java?r1=383304&r2=405204&diff_format=h> for details):
> workDir = new File(workDir, "indexmerger-workingdir");
> Previously, if no -workingdir <workingdir> parameter was specified, IndexMerger.main() would place an "indexmerger-workingdir" directory into the default directory and then delete the former after completing. Now, IndexMerger.main() defaults the value of its workDir to "indexmerger" within the default directory, and deletes this workDir afterward.
> However, if -workingdir <workingdir> _is_ specified, IndexMerger.main() will now set workDir to _this_ path and delete the _entire_ <workingdir> afterward. Previously, IndexMerger.main() would only delete <workingDir>/"indexmerger-workingdir", without deleting <workingdir> itself. This is because the line mentioned above always appended "indexmerger-workingdir" to workDir.
> Our hardware configuration on the jobtracker/namenode box attempts to keep all large datasets on a separate, large hard drive. Accordingly, we were keeping dfs.name.dir, dfs.data.dir, mapred.system.dir, and mapred.local.dir on this drive. Unfortunately, we were passing the folder containing these folders in the <workingdir> parameter to the IndexMerger. As a result, the first time we ran the IndexMerger, we ended up trashing our entire DFS!
> Perhaps the way that the IndexMerger handles its <workingdir> parameter now is an acceptable design. However, given the way it handled this parameter in the past, I feel that the current implementation is unacceptably dangerous.
> More importantly, perhaps there's some way that we could make hadoop more robust in handling its critical data files. I plan to place a directory owned by root with "dr--------" permissions into each of these critical directories in order to prevent any of them from suffering the fate of our DFS. This could become part of a standard hadoop installation.
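
To make the difference described in the report concrete, here is a minimal sketch of which directory ends up being the cleanup target under each behavior. It is not the actual IndexMerger source: the class WorkDirSketch and its method names are hypothetical, and only the "indexmerger-workingdir" name comes from the line quoted in the report. It covers only the case where -workingdir is supplied.

```java
import java.io.File;

// Hypothetical illustration only; this is NOT the IndexMerger code.
final class WorkDirSketch {
    // Subdirectory name taken from the line quoted in the report.
    static final String SUBDIR = "indexmerger-workingdir";

    // Behavior the report describes as "previously": the fixed subdirectory
    // is always appended, so cleanup only ever targets that subdirectory.
    static File cleanupTargetBefore(File suppliedWorkingDir) {
        return new File(suppliedWorkingDir, SUBDIR);
    }

    // Behavior the report describes as "now" when -workingdir is given:
    // the supplied path is used as-is, so cleanup targets the whole directory.
    static File cleanupTargetAfter(File suppliedWorkingDir) {
        return suppliedWorkingDir;
    }

    // True when the cleanup target is a direct child of the supplied
    // directory, i.e. deleting it leaves the supplied directory in place.
    static boolean sparesSuppliedDir(File suppliedWorkingDir, File cleanupTarget) {
        return suppliedWorkingDir.equals(cleanupTarget.getParentFile());
    }
}
```

For a supplied directory such as /data/big (a made-up path), cleanupTargetBefore yields /data/big/indexmerger-workingdir, while cleanupTargetAfter yields /data/big itself, which is how anything else stored under the supplied directory comes to be deleted along with the merge scratch files.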
