Posted to issues@ozone.apache.org by "Wei-Chiu Chuang (Jira)" <ji...@apache.org> on 2022/09/15 00:19:00 UTC
[jira] [Commented] (HDDS-7062) distcp cause inconsistent block length between keyInfo and block file length
[ https://issues.apache.org/jira/browse/HDDS-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605014#comment-17605014 ]
Wei-Chiu Chuang commented on HDDS-7062:
---------------------------------------
I am using CDP 7.1.8, which is a snapshot of the Ozone master branch from around June, but I cannot reproduce the problem...
> distcp cause inconsistent block length between keyInfo and block file length
> ----------------------------------------------------------------------------
>
> Key: HDDS-7062
> URL: https://issues.apache.org/jira/browse/HDDS-7062
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Jie Yao
> Priority: Major
> Attachments: 截屏2022-07-19 20.36.50.png, 截屏2022-07-28 15.41.46.png
>
>
> Recently, we used distcp to copy files from HDFS to Ozone and found that if we copy a file directly, distcp copies the file as a whole and generates full blocks. But if we put the file into a directory and use distcp to copy that directory, the file is broken into pieces of different sizes.
> For example, for a 1G file copied directly, the key info looks like:
> {
>   "volumeName" : "vol1",
>   "bucketName" : "bkt1",
>   "name" : "1G_39.dat",
>   "dataSize" : 1048576000,
>   "creationTime" : "2022-07-18T09:54:02.944Z",
>   "modificationTime" : "2022-07-18T09:54:23.132Z",
>   "replicationConfig" : { "replicationFactor" : "THREE", "requiredNodes" : 3, "replicationType" : "RATIS" },
>   "ozoneKeyLocations" : [
>     { "containerID" : 305, "localID" : 109611004723206132, "length" : 268435456, "offset" : 0, "keyOffset" : 0 },
>     { "containerID" : 304, "localID" : 109611004723206133, "length" : 268435456, "offset" : 0, "keyOffset" : 268435456 },
>     { "containerID" : 303, "localID" : 109611004723206134, "length" : 268435456, "offset" : 0, "keyOffset" : 536870912 },
>     { "containerID" : 304, "localID" : 109611004723206135, "length" : 243269632, "offset" : 0, "keyOffset" : 805306368 }
>   ],
>   "metadata" : { }
> }
> If we put it in a directory and use `distcp` to copy the directory, it looks like:
> {
>   "volumeName" : "vol1",
>   "bucketName" : "bkt1",
>   "name" : "mockData/1G_39.dat",
>   "dataSize" : 1048576000,
>   "creationTime" : "2022-07-18T08:59:47.472Z",
>   "modificationTime" : "2022-07-18T09:40:22.980Z",
>   "replicationConfig" : { "replicationFactor" : "THREE", "requiredNodes" : 3, "replicationType" : "RATIS" },
>   "ozoneKeyLocations" : [
>     { "containerID" : 234, "localID" : 109611004723204874, "length" : 67108864, "offset" : 0, "keyOffset" : 0 },
>     { "containerID" : 240, "localID" : 109611004723204989, "length" : 67108864, "offset" : 0, "keyOffset" : 67108864 },
>     { "containerID" : 244, "localID" : 109611004723205092, "length" : 83886080, "offset" : 0, "keyOffset" : 134217728 },
>     { "containerID" : 250, "localID" : 109611004723205204, "length" : 50331648, "offset" : 0, "keyOffset" : 218103808 },
>     { "containerID" : 255, "localID" : 109611004723205295, "length" : 16777216, "offset" : 0, "keyOffset" : 268435456 },
>     { "containerID" : 261, "localID" : 109611004723205413, "length" : 83886080, "offset" : 0, "keyOffset" : 285212672 },
>     { "containerID" : 266, "localID" : 109611004723205503, "length" : 50331648, "offset" : 0, "keyOffset" : 369098752 },
>     { "containerID" : 272, "localID" : 109611004723205631, "length" : 100663296, "offset" : 0, "keyOffset" : 419430400 },
>     { "containerID" : 276, "localID" : 109611004723205749, "length" : 67108864, "offset" : 0, "keyOffset" : 520093696 },
>     { "containerID" : 279, "localID" : 109611004723205805, "length" : 33554432, "offset" : 0, "keyOffset" : 587202560 },
>     { "containerID" : 283, "localID" : 109611004723205854, "length" : 50331648, "offset" : 0, "keyOffset" : 620756992 },
>     { "containerID" : 288, "localID" : 109611004723205958, "length" : 16777216, "offset" : 0, "keyOffset" : 671088640 },
>     { "containerID" : 290, "localID" : 109611004723206013, "length" : 67108864, "offset" : 0, "keyOffset" : 687865856 },
>     { "containerID" : 294, "localID" : 109611004723206048, "length" : 100663296, "offset" : 0, "keyOffset" : 754974720 },
>     { "containerID" : 299, "localID" : 109611004723206091, "length" : 192937984, "offset" : 0, "keyOffset" : 855638016 }
>   ],
>   "metadata" : { }
> }
> What is more, we also find that for some blocks the length recorded in the keyInfo metadata differs from the length of the block file in the container. For example:
> the local ID 109611004723205204 above has a length of 50331648, but when we check the block file in the container, it has a length of 61708864. !截屏2022-07-19 20.36.50.png!
> Another problem is that when we compute the block composite-crc checksum, the full block length (67108864), rather than the expected length of 50331648, is read back and used in the computation, which leads to an erroneous result.
> I think we need to figure out why the two problems below happen:
> 1. Why the block file length in the container differs from the metadata recorded in the key info.
> 2. Why the erroneous file length is read back.
> PS: the 1G file was generated by `dd if=/dev/zero `, so all bits in this file are 0.
>
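The two checks discussed in the description can be sketched in a few lines of Python. This is only an illustrative sketch, not Ozone code: `BlockLocation`, `check_key_layout`, and `find_length_mismatches` are hypothetical names, and the on-disk length is assumed to have been collected by hand from the container, as in the report. The first function checks that each `keyOffset` equals the running sum of the preceding `length` values and that the lengths add up to `dataSize`; the second compares the recorded length of each block with the length observed on disk.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BlockLocation:
    """Hypothetical holder for one entry of "ozoneKeyLocations"."""
    local_id: int
    length: int      # block length recorded in the key metadata
    key_offset: int  # offset of this block within the key


def check_key_layout(data_size: int, blocks: List[BlockLocation]) -> List[str]:
    """Return a list of problems; an empty list means the metadata is self-consistent."""
    problems = []
    expected_offset = 0
    for block in blocks:
        if block.key_offset != expected_offset:
            problems.append(
                f"localID {block.local_id}: keyOffset {block.key_offset}, "
                f"expected {expected_offset}"
            )
        expected_offset += block.length
    if expected_offset != data_size:
        problems.append(
            f"block lengths sum to {expected_offset}, dataSize is {data_size}"
        )
    return problems


def find_length_mismatches(
    blocks: List[BlockLocation], on_disk_lengths: Dict[int, int]
) -> List[Tuple[int, int, int]]:
    """Return (localID, recorded length, on-disk length) for every differing block."""
    return [
        (block.local_id, block.length, on_disk_lengths[block.local_id])
        for block in blocks
        if block.local_id in on_disk_lengths
        and on_disk_lengths[block.local_id] != block.length
    ]


# Values copied from the key info of the directly copied file in the description.
direct_copy = [
    BlockLocation(109611004723206132, 268435456, 0),
    BlockLocation(109611004723206133, 268435456, 268435456),
    BlockLocation(109611004723206134, 268435456, 536870912),
    BlockLocation(109611004723206135, 243269632, 805306368),
]
print(check_key_layout(1048576000, direct_copy))

# The block whose recorded and on-disk lengths differ, as reported.
suspect = [BlockLocation(109611004723205204, 50331648, 218103808)]
print(find_length_mismatches(suspect, {109611004723205204: 61708864}))
```

Both key listings in the description pass the first check (offsets are cumulative and lengths sum to 1048576000), so the inconsistency reported here lies between the key metadata and the block file, not within the key metadata itself.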
--
This message was sent by Atlassian Jira
(v8.20.10#820010)