Posted to common-dev@hadoop.apache.org by "Monu Ogbe (JIRA)" <ji...@apache.org> on 2006/03/30 16:27:26 UTC

[jira] Created: (HADOOP-112) copyFromLocal should exclude .crc files

copyFromLocal should exclude .crc files
---------------------------------------

         Key: HADOOP-112
         URL: http://issues.apache.org/jira/browse/HADOOP-112
     Project: Hadoop
        Type: Bug
  Components: dfs  
 Environment: DFS cluster of 6 3GHz Xeons with 2GB RAM running CentOS 4.2 and Sun's JDK 1.5 - but probably applies in any environment
    Reporter: Monu Ogbe
    Priority: Minor


Doug Cutting says: "The problem is that when copyFromLocal 
enumerates local files it should exclude .crc files, but it does not. 
This is the listFiles() call at DistributedFileSystem:160.  It should 
filter that listing, excluding files for which FileSystem.isChecksumFile() returns true.

BTW, as a workaround, it is safe to first remove all of the .crc files, 
but your files will no longer be checksummed as they are read.  On 
systems without ECC memory file corruption is not uncommon, but I have 
seen very little on clusters that have ECC."
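
A minimal sketch of the filter Doug describes, assuming checksum files follow
the ".<name>.crc" naming convention seen in the error below; the class and
helper names here are illustrative, not the committed patch:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    public class ChecksumFilterSketch {
      // Mirrors the FileSystem.isChecksumFile() convention: checksum files
      // are hidden files named ".<name>.crc" (assumption based on this report).
      static boolean isChecksumFile(File f) {
        String name = f.getName();
        return name.startsWith(".") && name.endsWith(".crc");
      }

      // What the listFiles() call in copyFromLocal could do instead:
      // return the local directory listing with checksum files excluded.
      static List<File> listNonChecksumFiles(File dir) {
        List<File> result = new ArrayList<File>();
        File[] entries = dir.listFiles();
        if (entries != null) {
          for (File f : entries) {
            if (!isChecksumFile(f)) {
              result.add(f);
            }
          }
        }
        return result;
      }
    }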

Original observations:

Hello Team,

I created a backup of my DFS database:

# bin/hadoop dfs -copyToLocal /user/root/crawl /mylocaldir

I now want to restore from the backup using:

# bin/hadoop dfs -copyFromLocal /mylocaldir/crawl /user/root

However, I'm getting the following error:

copyFromLocal: Target /user/root/crawl/crawldb/current/part-00000/.data.crc
already exists

I get this message with every permutation of the command that I've tried, and
even after totally deleting all content in the DFS directories.

I'd be grateful for any pointers.

Many thanks,







[jira] Commented: (HADOOP-112) copyFromLocal should exclude .crc files

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-112?page=comments#action_12373413 ] 

Doug Cutting commented on HADOOP-112:
-------------------------------------

The changes made are listed at:

http://svn.apache.org/viewcvs?rev=390218&view=rev

The described problem was fixed: a -copyToLocal (a.k.a. -get) followed by a -copyFromLocal (a.k.a. -put) no longer fails with a complaint about a .crc file.  If this is failing again, then this bug should be re-opened.  Otherwise I think it should remain closed.

If there is a problem with 'dfs -cp' then I think that is a separate bug, no?




[jira] Closed: (HADOOP-112) copyFromLocal should exclude .crc files

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-112?page=all ]
     
Doug Cutting closed HADOOP-112:
-------------------------------

    Resolution: Fixed

Given the lack of further comments, I will re-close this.  If there are other, related bugs, please report them as new bugs rather than re-opening this again.



[jira] Resolved: (HADOOP-112) copyFromLocal should exclude .crc files

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-112?page=all ]
     
Doug Cutting resolved HADOOP-112:
---------------------------------

    Fix Version: 0.1
     Resolution: Fixed
      Assign To: Doug Cutting

I just committed a fix for this.  DistributedFileSystem.copyFromLocal() no longer attempts to copy CRC files.
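
For illustration only, a sketch of how that behaviour could look when walking
the local source tree; the recursion and the upload() stand-in are assumptions,
not the actual DistributedFileSystem code:

    import java.io.File;

    public class CopyFromLocalSketch {
      static boolean isChecksumFile(File f) {
        String name = f.getName();
        return name.startsWith(".") && name.endsWith(".crc");
      }

      // Hypothetical walk over the local source tree: .crc files are skipped
      // entirely, and DFS writes fresh checksums as the data arrives.
      static void copyFromLocal(File src, String dfsDest) {
        if (isChecksumFile(src)) {
          return; // the fix described above: never copy local .crc files
        }
        if (src.isDirectory()) {
          File[] entries = src.listFiles();
          if (entries != null) {
            for (File f : entries) {
              copyFromLocal(f, dfsDest + "/" + f.getName());
            }
          }
        } else {
          upload(src, dfsDest);
        }
      }

      // Stand-in for whatever actually streams the bytes into DFS.
      static void upload(File src, String dfsDest) {
        System.out.println("would upload " + src + " to " + dfsDest);
      }
    }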



[jira] Reopened: (HADOOP-112) copyFromLocal should exclude .crc files

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-112?page=all ]
     
Konstantin Shvachko reopened HADOOP-112:
----------------------------------------


I cannot see exactly what was committed here, but now a regular dfs -cp doesn't
copy the .crc files, which causes, for example, dfs -cat to complain about it.
I presume that changes were made to FileUtil.copyContents(), which is called in
several places, some of which need the .crc files and some of which do not.
In the case of a DFS copy, the .crc files are needed.
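
A hedged sketch of the distinction Konstantin describes, assuming a
copyContents-style helper that takes an explicit flag; the flag and the
recursion are assumptions for illustration, and the real FileUtil signature
may differ:

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;

    public class CopyContentsSketch {
      static boolean isChecksumFile(File f) {
        String name = f.getName();
        return name.startsWith(".") && name.endsWith(".crc");
      }

      // copyCrc = true for DFS-to-DFS copies (dfs -cp), where the existing
      // checksum files must travel with the data; false for copyFromLocal,
      // where DFS generates fresh checksums as it writes.
      static void copyContents(File src, File dst, boolean copyCrc) throws IOException {
        if (!copyCrc && isChecksumFile(src)) {
          return; // skip checksum files when the destination makes its own
        }
        if (src.isDirectory()) {
          dst.mkdirs();
          File[] entries = src.listFiles();
          if (entries != null) {
            for (File f : entries) {
              copyContents(f, new File(dst, f.getName()), copyCrc);
            }
          }
        } else {
          Files.copy(src.toPath(), dst.toPath());
        }
      }
    }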
