Posted to common-issues@hadoop.apache.org by "steveloughran (via GitHub)" <gi...@apache.org> on 2023/05/03 18:41:31 UTC

[GitHub] [hadoop] steveloughran commented on pull request #5603: HADOOP-18723. Add detail logs if distcp checksum mismatch

steveloughran commented on PR #5603:
URL: https://github.com/apache/hadoop/pull/5603#issuecomment-1533526312

   Well, I'm afraid your specific problem does not match the wider use cases of uploading to stores without checksums. Now, I would have been happier if distcp's -skipcrccheck option was required to copy data from an FS with checksums to one without, but it is not, and to add it now would break so many people's workflows.
   
   So what do we do here? 
   
   Maybe: create counters of why files were copied, specifically:
   * not found at destination
   * file length different
   * modtime different
   * checksum different
   
   Then, after a job, you can see from the host where the job was launched why files were copied. If you want more detail on issues such as checksum and modtime mismatches, you can turn on debug logging. Obviously, this would be something to add to the distcp documentation.
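   To make the counter idea concrete, here is a rough sketch in plain Java. It is not distcp code: `CopyReasonCounters`, `CopyReason`, `FileInfo` and `reasonToCopy` are hypothetical names, and the comparison order (and the "source is newer" modtime rule) is an assumption for illustration only. In a real job the tally would presumably go through MapReduce counters rather than an in-memory map.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: tallies why a file was selected for copying. */
class CopyReasonCounters {

  /** Hypothetical counter names, one per reason in the list above. */
  enum CopyReason {
    NOT_FOUND_AT_DESTINATION,
    LENGTH_DIFFERENT,
    MODTIME_DIFFERENT,
    CHECKSUM_DIFFERENT
  }

  /** Minimal stand-in for a file status; checksum is null if the store has none. */
  static final class FileInfo {
    final long length;
    final long modTime;
    final String checksum;

    FileInfo(long length, long modTime, String checksum) {
      this.length = length;
      this.modTime = modTime;
      this.checksum = checksum;
    }
  }

  private final Map<CopyReason, Long> counts = new EnumMap<>(CopyReason.class);

  /**
   * Returns the first reason to copy, or empty if the file can be skipped.
   * Assumption: checksums are only compared when both sides provide one.
   */
  static Optional<CopyReason> reasonToCopy(FileInfo source, FileInfo dest) {
    if (dest == null) {
      return Optional.of(CopyReason.NOT_FOUND_AT_DESTINATION);
    }
    if (source.length != dest.length) {
      return Optional.of(CopyReason.LENGTH_DIFFERENT);
    }
    if (source.modTime > dest.modTime) {
      return Optional.of(CopyReason.MODTIME_DIFFERENT);
    }
    if (source.checksum != null && dest.checksum != null
        && !source.checksum.equals(dest.checksum)) {
      return Optional.of(CopyReason.CHECKSUM_DIFFERENT);
    }
    return Optional.empty();
  }

  /** Increments the counter for one reason. */
  void record(CopyReason reason) {
    counts.merge(reason, 1L, Long::sum);
  }

  /** Returns the current count for a reason, 0 if never recorded. */
  long get(CopyReason reason) {
    return counts.getOrDefault(reason, 0L);
  }
}
```

   The point of returning a single reason per file is that the counters then add up to the number of files copied, which makes the job summary easy to sanity check.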
   
   Now: big warning. I am personally scared of distcp. It is a critical workflow tool and is even used programmatically, yet it is surprisingly brittle. It is a running joke that the last person to add any code to the module gets to field all support calls until someone else comes along. Thank you for volunteering! This also explains why we will be very reluctant/strict about taking on changes. Don't take it personally; everyone gets that same grilling here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: common-issues-help@hadoop.apache.org