You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Milind Bhandarkar (JIRA)" <ji...@apache.org> on 2006/11/20 23:03:02 UTC

[jira] Created: (HADOOP-738) dfs get or copyToLocal should not copy crc file

dfs get or copyToLocal should not copy crc file
-----------------------------------------------

                 Key: HADOOP-738
                 URL: http://issues.apache.org/jira/browse/HADOOP-738
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.8.0
         Environment: all
            Reporter: Milind Bhandarkar
         Assigned To: Milind Bhandarkar
             Fix For: 0.9.0


Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12451750 ] 
            
Doug Cutting commented on HADOOP-738:
-------------------------------------

> Let's add a command line option to turn CRC export on.

As I said, I'm fine with that, but that sounds like a separate issue than the one described: It doesn't fix this issue, but rather hides it for some users.  When crc's *are* enabled, it shouldn't cause problems to 'dfs -put' files that have crcs.  That's the issue here.  Disabling crcs for 'dfs -get' for folks who have ECC memory and don't ever have file corruption issues on local disks is not something I object to.

> This will let sophisticated users who have a need for CRCs use them [ ... ]

So 'sophisticated' now means that you don't have ECC memory and reliable hard drives?


> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Work started: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-738?page=all ]

Work on HADOOP-738 started by Milind Bhandarkar.

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12451449 ] 
            
Doug Cutting commented on HADOOP-738:
-------------------------------------

For the particular problem cited, wouldn't the fix be to remove the old .crc file before writing the new file?  CRC files are useful outside of DFS.  Throwing them away means the data is no longer checked for end-to-end corruption.

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12452033 ] 
            
Doug Cutting commented on HADOOP-738:
-------------------------------------

> Do we still want to support crc files [ ...]

MapReduce data spends a lot of time in memory (while sorting) and on local disks.  Most checksum errors folks see are from local disks during sorting, not HDFS.  So, yes, we'll still need it.

And per-block checksums are different.  They're not end-to-end.  Currently we checksum the data as it is written to the output stream's buffer, and validate it as it is read from the input stream's buffer.  A lot can happen between that time and it winding up in a DFS block.  To replace this we'd ideally want to still compute the checksum as it is written, transmit it along with the block to datanodes, then transmit it back to the client when the data is read, and verify as it is read.  We'd also need sub-block checksums, not per-block, so that one can seek without checksumming an entire block.  Yes, TCP does checksums, but memory errors can be introduced on either end outside of the TCP stack, and, if blocks are temporarily stored on local disk, it can also be a source of block corruption.  So getting rid of CRC files for even HDFS will take more than just per-block checksums on datanodes.

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-738?page=all ]

Doug Cutting updated HADOOP-738:
--------------------------------

           Status: Resolved  (was: Patch Available)
    Fix Version/s: 0.10.0
       Resolution: Fixed

I just committed this.  Thanks, Milind!

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12452029 ] 
            
Konstantin Shvachko commented on HADOOP-738:
--------------------------------------------

We plan to remove crc files as separate entities replacing them by
per block crcs on data-nodes. Do we still want to support crc files
just for local fs after that? If not then Milinds solution seems right.


> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-738?page=all ]

Milind Bhandarkar updated HADOOP-738:
-------------------------------------

    Status: Patch Available  (was: In Progress)

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12451472 ] 
            
Yoram Arnon commented on HADOOP-738:
------------------------------------

local files in general do not have a crc file associated with them.
making a local copy of dfs file implies losing the crc file IMO, just like putting a file into dfs should assume the source filed doesn't have a crc file associated with it.
yes, we lose something, but that something is the norm in the local fs world. To me that makes sense, especially since the crc files are hidden, and thus confusing for the uninitiated.

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12454070 ] 
            
Doug Cutting commented on HADOOP-738:
-------------------------------------

> I see two different issues here.

Me too, but I'm not sure they are the same two.

One is a bug.  From the original description, "When we -put, since the crc files already exist on DFS, this put fails."  When local CRC files are enabled, a -get followed by a -put of the same file should not fail, but apparently it does.  This should work, no?

The other is a feature request to be able to disable CRC creation in -get.  I'm okay with this feature.

I'd rather fix the bug before we introduce the feature, especially if CRC-free -get becomes the default, since it masks the bug.


> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12452613 ] 
            
eric baldeschwieler commented on HADOOP-738:
--------------------------------------------

Well the discussion does seem to be on the topic of the subject.  As discussed in the thread there are several reasons to consider the change.  Enhancing the description could be done.

I agree that we probably should fix the interaction of the CRC files in the current copy too.  Although renaming them so they are visible would help a lot no mater what.

I also agree that we will need sub-block CRC info even when we move the CRC data to be a block attribute.

Moving multi-terrabyte objects to local disk is not the prototypical use of Hadoop in our environment.  It is certainly not the prototypical reason we invoke -get or -copyToLocal.

We don't seem to be converging here.  Maybe we should create two commands.  One which is typically used for lightweight copies and does not write CRCs and one which does and has an inverse command that validates the CRCs on import.  Although what happens when a CRC does not match on import?  The CRC exporter would create visible CRCs and would have well defined semantics for overwriting files.  (Failing if the target directory would avoid the problem in the description ...)

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12453783 ] 
            
Konstantin Shvachko commented on HADOOP-738:
--------------------------------------------

I see two different issues here.
The first is that MR needs to copy locally dfs files and it turns out to be using local fs for that.
And the second is that -put and -get commands of DFSShell are not symmetrical, meaning
when a user "puts" he doesn't need crc, while when he "gets" he sees a crc file.

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12455815 ] 
            
Hadoop QA commented on HADOOP-738:
----------------------------------

+1, http://issues.apache.org/jira/secure/attachment/12346489/hadoop-crc.patch applied and successfully tested against trunk revision r481432

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12451500 ] 
            
eric baldeschwieler commented on HADOOP-738:
--------------------------------------------

I'd suggest we turn this around.  Exporting CRCs is not the expected default and causes regular user confusion in our environment.  Let's add a command line option to turn CRC export on.  This will let sophisticated users who have a need for CRCs use them, but removes them from the common case where they are simply causing confusion and pain.

Fair enough?

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-738?page=all ]

Milind Bhandarkar updated HADOOP-738:
-------------------------------------

    Attachment: hadoop-crc.patch

This patch adds one more parameter to FileUtil.copy method, which specifies whether to exclude crc files.

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-738?page=all ]

Milind Bhandarkar updated HADOOP-738:
-------------------------------------

    Attachment:     (was: hadoop-crc.patch)

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-738?page=all ]

Milind Bhandarkar updated HADOOP-738:
-------------------------------------

    Status: Patch Available  (was: Open)

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12451824 ] 
            
eric baldeschwieler commented on HADOOP-738:
--------------------------------------------

The issues you raise are real when you are moving multi-terabyte search indexes and such.  Having end-2-end CRC support is crucial and I'm all for keeping it in.  But yes, our average use is not "sophisticated" in that their average operation to local disk in our environment is just to pull some experiment results locally.  In general modern machines are good enough that megabyte to gigabyte sized files can be read and written reliably enough that CRC errors are not a dominate concern.

Where they are a concern, I suggest we make the CRC files visible.  Rather than using "dot" files, just append .crc to the file name.  Then at least folks can see them and ask the right questions, etc.  Things will work better that way.

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12452016 ] 
            
Doug Cutting commented on HADOOP-738:
-------------------------------------

> I suggest we make the CRC files visible.

Either someone needs to change the description of this issue, or this discussion needs to move to a new issue.  As I keep saying, the fixes proposed don't fix the described problem: when local CRC files are used then 'dfs -put' fails.  I have not verified this, but, if this is the case, it is a bug that should be fixed.  Disabling CRC files for some folks, while a reasonable thing to do, won't fix this.  Neither will changing their names.

> multi-terabyte search indexes [ ... not typical ...]

Aren't these prototypical for Hadoop?  Isn't that why we're building Hadoop?  Should such uses require special configuration?

> [...] CRC errors are not a dominate concern

If you have ECC memory, which you do, but not all folks do.


> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-738?page=all ]

Milind Bhandarkar updated HADOOP-738:
-------------------------------------

    Status: Open  (was: Patch Available)

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-738?page=comments#action_12451479 ] 
            
Doug Cutting commented on HADOOP-738:
-------------------------------------

In Hadoop, local files generally *do* have a .crc file associated with them.  This has caught lots of data corruption problems (mostly for folks w/o ECC memory).

I'm okay having an option to not create crc files when exporting files, but I think it's the wrong fix for the problem in the description of this issue.  Getting and putting files should work with local .crc files enabled.

> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-738) dfs get or copyToLocal should not copy crc file

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-738?page=all ]

Milind Bhandarkar updated HADOOP-738:
-------------------------------------

    Attachment: hadoop-crc.patch

Added an optional -crc argument to -get and -copyToLocal commands in DFSShell. Without this argument, crc files are not created on local file system. If one wants to explicitly request fetching crc files as well, one has to use:

-get -crc <src> <localDst>



> dfs get or copyToLocal should not copy crc file
> -----------------------------------------------
>
>                 Key: HADOOP-738
>                 URL: http://issues.apache.org/jira/browse/HADOOP-738
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.8.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>         Attachments: hadoop-crc.patch
>
>
> Currently, when we -get or -copyToLocal a directory from DFS, all the files including crc files are also copied. When we -put or -copyFromLocal again, since the crc files already exist on DFS, this put fails. The solution is not to copy checksum files when copying to local. Patch is forthcoming.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira