Posted to hdfs-user@hadoop.apache.org by Zesheng Wu <wu...@gmail.com> on 2014/07/17 11:50:01 UTC

Replace a block with a new one

Hi guys,

I recently ran into a scenario that requires replacing an existing block with
a newly written block.
The most straightforward way to do this is probably the following:
suppose the original file is A; we write a new file B composed of the new
data blocks, then merge A and B into C, which is the file we want.
The obvious shortcoming of this method is that it wastes network bandwidth.
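
As a rough illustration of the approach just described (the paths, offsets, and
class name below are hypothetical), the rewrite can be sketched with the stock
FileSystem API. It also makes the cost visible: every byte of A is re-read and
re-written even though only one block changed.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RewriteWithReplacedBlock {

  // Copy 'length' bytes from the input's current position to the output.
  private static void copyRange(FSDataInputStream in, FSDataOutputStream out,
                                long length) throws IOException {
    byte[] buf = new byte[64 * 1024];
    long remaining = length;
    while (remaining > 0) {
      int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
      if (n < 0) break;
      out.write(buf, 0, n);
      remaining -= n;
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path a = new Path("/data/A");               // original file (hypothetical path)
    Path c = new Path("/data/C");               // rewritten file (hypothetical path)
    long blockStart = 128L * 1024 * 1024;       // offset of the block to replace
    long blockLen = 128L * 1024 * 1024;         // length of that block
    byte[] newBlock = new byte[(int) blockLen]; // contents of the newly written block

    long fileLen = fs.getFileStatus(a).getLen();
    try (FSDataInputStream in = fs.open(a);
         FSDataOutputStream out = fs.create(c, true)) {
      copyRange(in, out, blockStart);                          // bytes before the block
      out.write(newBlock);                                     // the replacement block
      in.seek(blockStart + blockLen);
      copyRange(in, out, fileLen - (blockStart + blockLen));   // bytes after the block
    }
  }
}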

I'm wondering whether there is a way to replace the old block with the new
block directly.
Any thoughts?

-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
Mmm, it seems that the facebook branch
(https://github.com/facebook/hadoop-20/, in particular
https://github.com/facebook/hadoop-20/blob/master/src/contrib/raid/src/java/org/apache/hadoop/hdfs/DistributedRaidFileSystem.java)
has implemented Reed-Solomon codes. What I was checking earlier were the
following two issues:
https://issues.apache.org/jira/browse/HDFS-503
https://issues.apache.org/jira/browse/HDFS-600

Thanks again, Bertrand. I will look through the facebook branch for more
information.
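
As a back-of-the-envelope illustration of why Reed-Solomon is attractive here,
assuming the (10, 4) data/parity layout usually cited for facebook's HDFS-RAID
(the numbers below are illustrative, not taken from the branch itself):

public class RsOverheadSketch {
  public static void main(String[] args) {
    int dataBlocks = 10;   // data blocks per stripe (assumed RS(10,4) layout)
    int parityBlocks = 4;  // Reed-Solomon parity blocks per stripe

    // Storage cost relative to the raw data size.
    double rsOverhead = (dataBlocks + parityBlocks) / (double) dataBlocks; // 1.4x
    double replicationOverhead = 3.0; // default HDFS 3-way replication

    System.out.printf("RS(%d,%d): %.1fx storage, survives any %d lost blocks per stripe%n",
        dataBlocks, parityBlocks, rsOverhead, parityBlocks);
    System.out.printf("3-way replication: %.1fx storage, survives %d lost replicas per block%n",
        replicationOverhead, 2);
  }
}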


2014-07-22 9:31 GMT+08:00 Zesheng Wu <wu...@gmail.com>:

> Thanks Bertrand, I had checked this information earlier. There's only an XOR
> implementation, and missing blocks are reconstructed by creating new files.
>
>
>
> 2014-07-22 3:47 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>
> And there is actually quite a lot of information about it.
>>
>>
>> https://github.com/facebook/hadoop-20/blob/master/src/contrib/raid/src/java/org/apache/hadoop/hdfs/DistributedRaidFileSystem.java
>>
>> http://wiki.apache.org/hadoop/HDFS-RAID
>>
>>
>> https://issues.apache.org/jira/browse/MAPREDUCE/component/12313416/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel
>>
>> Bertrand Dechoux
>>
>>
>> On Mon, Jul 21, 2014 at 3:46 PM, Bertrand Dechoux <de...@gmail.com>
>> wrote:
>>
>>> I wrote my answer thinking about the XOR implementation. With
>>> reed-solomon and single replication, the cases that need to be considered
>>> are indeed smaller, simpler.
>>>
>>> It seems I was wrong about my last statement though. If the machine
>>> hosting a single-replicated block is lost, it is not necessarily true that
>>> the block can't be reconstructed from a summary of the data. With a RAID6
>>> strategy / RS, it is indeed possible, though of course only up to a certain
>>> number of blocks.
>>>
>>> There is clearly a case for such a tool on a Hadoop cluster.
>>>
>>> Bertrand Dechoux
>>>
>>>
>>> On Mon, Jul 21, 2014 at 2:35 PM, Zesheng Wu <wu...@gmail.com>
>>> wrote:
>>>
>>>>  If a block is corrupted but hasn't been detected by HDFS, you could
>>>> delete the block from the local filesystem (it's only a file) then HDFS
>>>> will replicate the good remaining replica of this block.
>>>>
>>>> We only have one replica for each block; if a block is corrupted, HDFS
>>>> cannot re-replicate it.
>>>>
>>>>
>>>> 2014-07-21 20:30 GMT+08:00 Zesheng Wu <wu...@gmail.com>:
>>>>
>>>> Thanks Bertrand, my replies are inline below.
>>>>>
>>>>> So you know that a block is corrupted thanks to an external process
>>>>> which in this case is checking the parity blocks. If a block is corrupted
>>>>> but hasn't been detected by HDFS, you could delete the block from the local
>>>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>>>> replica of this block.
>>>>>  *[Zesheng: We will implement a periodic checking mechanism to detect
>>>>> corrupted blocks.]*
>>>>>
>>>>> For performance reasons (and that's what you want to do?), you might be
>>>>> able to fix the corruption without needing to retrieve the good replica. It
>>>>> might be possible by working directly with the local filesystem, replacing
>>>>> the corrupted block with the corrected block (which, again, are files). One
>>>>> issue is that the corrected block might be different from the good replica.
>>>>> If HDFS is able to tell (with CRC), that might be fine; otherwise you will
>>>>> end up with two different good replicas for the same block, and that will
>>>>> not be pretty...
>>>>> *[Zesheng: Indeed, we will use Reed-Solomon erasure codes to
>>>>> implement the HDFS RAID, so the corrupted block will be recovered from the
>>>>> remaining good data and coding blocks.]*
>>>>>
>>>>> If the result is to be open source, you might want to check with
>>>>> Facebook about their implementation and track the progress within Apache
>>>>> JIRA. You could gain additional feedback. One downside of HDFS RAID is
>>>>> that the fewer replicas there are, the less efficient/fast reads of the data
>>>>> for processing will be. Reducing the number of replicas also diminishes
>>>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>>>> *[Zesheng: Yes, I agree with you about the read performance downgrade,
>>>>> but not about the number of supported node failures; the Reed-Solomon
>>>>> algorithm can maintain equal or even higher node-failure tolerance. About
>>>>> open source, we will consider this in the future.]*
>>>>>
>>>>>
>>>>> 2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>>>
>>>>> So you know that a block is corrupted thanks to an external process
>>>>>> which in this case is checking the parity blocks. If a block is corrupted
>>>>>> but hasn't been detected by HDFS, you could delete the block from the local
>>>>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>>>>> replica of this block.
>>>>>>
>>>>>> For performance reasons (and that's what you want to do?), you might
>>>>>> be able to fix the corruption without needing to retrieve the good replica.
>>>>>> It might be possible by working directly with the local filesystem, replacing
>>>>>> the corrupted block with the corrected block (which, again, are files). One
>>>>>> issue is that the corrected block might be different from the good replica.
>>>>>> If HDFS is able to tell (with CRC), that might be fine; otherwise you will
>>>>>> end up with two different good replicas for the same block, and that will
>>>>>> not be pretty...
>>>>>>
>>>>>> If the result is to be open source, you might want to check with
>>>>>> Facebook about their implementation and track the progress within Apache
>>>>>> JIRA. You could gain additional feedback. One downside of HDFS RAID is
>>>>>> that the fewer replicas there are, the less efficient/fast reads of the data
>>>>>> for processing will be. Reducing the number of replicas also diminishes
>>>>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>>>>>
>>>>>> Bertrand Dechoux
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> We want to implement RAID on top of HDFS, something like what facebook
>>>>>>> implemented, as described in:
>>>>>>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>>>>>>
>>>>>>>
>>>>>>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>>>>>
>>>>>>> Do you want to implement RAID on top of HDFS, or use HDFS on top of
>>>>>>>> RAID? I am not sure I understand either of these use cases. HDFS handles
>>>>>>>> replication and error detection for you. Wouldn't fine-tuning the cluster
>>>>>>>> be the easier solution?
>>>>>>>>
>>>>>>>> Bertrand Dechoux
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the reply, Arpit.
>>>>>>>>> Yes, we need to do this regularly. The original requirement behind
>>>>>>>>> this is that we want to do RAID (based on Reed-Solomon erasure codes)
>>>>>>>>> on our HDFS cluster. When a block is corrupted or missing, the degraded
>>>>>>>>> read needs quick recovery of the block. We are considering how to recover
>>>>>>>>> the corrupted/missing block quickly.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>
>>>>>>>>> :
>>>>>>>>>
>>>>>>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why
>>>>>>>>>> not just take the perf hit and recreate the file?
>>>>>>>>>>
>>>>>>>>>> If you need to do this regularly you should consider a mutable
>>>>>>>>>> file store like HBase. If you start modifying blocks from under HDFS you
>>>>>>>>>> open up all sorts of consistency issues.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> That will break the consistency of the file system, but it
>>>>>>>>>>> doesn't hurt to try.
>>>>>>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How about writing a new block with a new checksum file, and replacing
>>>>>>>>>>>> both the old block file and the checksum file?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> there's no way to do that, as HDFS does not provide file
>>>>>>>>>>>>> update features. You'll need to write a new file with the changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Notice that even if you manage to find the physical block
>>>>>>>>>>>>> replica files on the disk corresponding to the part of the file you want
>>>>>>>>>>>>> to change, you can't simply update them manually, as this would give a
>>>>>>>>>>>>> different checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Wellington.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Hi guys,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I recently ran into a scenario that requires replacing an existing
>>>>>>>>>>>>> > block with a newly written block.
>>>>>>>>>>>>> > The most straightforward way to do this is probably the following:
>>>>>>>>>>>>> > suppose the original file is A; we write a new file B composed of the
>>>>>>>>>>>>> > new data blocks, then merge A and B into C, which is the file we want.
>>>>>>>>>>>>> > The obvious shortcoming of this method is that it wastes network
>>>>>>>>>>>>> > bandwidth.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I'm wondering whether there is a way to replace the old block with
>>>>>>>>>>>>> > the new block directly.
>>>>>>>>>>>>> > Any thoughts?
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > --
>>>>>>>>>>>>> > Best Wishes!
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Yours, Zesheng
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best Wishes!
>>>>>>>>>>>>
>>>>>>>>>>>> Yours, Zesheng
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Wishes!
>>>>>>>>>
>>>>>>>>> Yours, Zesheng
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Wishes!
>>>>>>>
>>>>>>> Yours, Zesheng
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes!
>>>>>
>>>>> Yours, Zesheng
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Wishes!
>>>>
>>>> Yours, Zesheng
>>>>
>>>
>>>
>>
>
>
> --
> Best Wishes!
>
> Yours, Zesheng
>
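
The quoted thread above keeps returning to the same sequence: detect a corrupted
or missing block during a read, then recover it from parity rather than from a
replica. A minimal sketch of that degraded-read fallback follows; the exception
types are real HDFS client classes, but the Reconstructor hook is a hypothetical
placeholder for whatever the RAID layer would provide.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.BlockMissingException;

public class DegradedRead {

  // Hypothetical hook: rebuild the damaged byte range from parity, as a RAID layer would.
  interface Reconstructor {
    byte[] rebuild(Path file, long offset, int length) throws IOException;
  }

  static int readWithFallback(FileSystem fs, Path file, long offset,
                              byte[] buf, Reconstructor raid) throws IOException {
    try (FSDataInputStream in = fs.open(file)) {
      in.seek(offset);
      return in.read(buf, 0, buf.length);          // normal path: replica is healthy
    } catch (BlockMissingException | ChecksumException e) {
      // Degraded path: the block is missing or corrupt, so rebuild the bytes
      // from the surviving data and parity blocks instead of failing the read.
      byte[] rebuilt = raid.rebuild(file, offset, buf.length);
      System.arraycopy(rebuilt, 0, buf, 0, rebuilt.length);
      return rebuilt.length;
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    byte[] buf = new byte[4096];
    // The reconstructor here is a stub; a real one would decode the stripe.
    readWithFallback(fs, new Path("/data/A"), 0L, buf,
        (file, offset, length) -> new byte[length]);
  }
}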



-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
Thanks Bertrand, I had checked this information earlier. There's only an XOR
implementation, and missing blocks are reconstructed by creating new files.
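
To make the XOR scheme concrete: one parity block per stripe, computed as the
XOR of the data blocks, lets any single missing block be rebuilt from the parity
plus the surviving blocks. A minimal sketch (plain byte arrays stand in for block
contents; this is an illustration, not the HDFS-RAID code):

import java.util.Arrays;

public class XorParitySketch {

  // XOR equal-length blocks together: applied to the data blocks this yields the
  // parity, and applied to the parity plus the survivors it yields the missing block.
  static byte[] xor(byte[]... blocks) {
    byte[] out = new byte[blocks[0].length];
    for (byte[] b : blocks) {
      for (int i = 0; i < out.length; i++) {
        out[i] ^= b[i];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] d0 = {1, 2, 3, 4};
    byte[] d1 = {5, 6, 7, 8};
    byte[] d2 = {9, 10, 11, 12};

    byte[] parity = xor(d0, d1, d2);        // stored alongside the stripe
    byte[] rebuiltD1 = xor(d0, d2, parity); // reconstruct d1 if its block is lost

    System.out.println(Arrays.equals(d1, rebuiltD1)); // prints true
  }
}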




-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
Thank Bertrand, I've checked these information earlier. There's only XOR
implementation, and missed blocks are reconstructed by creating new files.



2014-07-22 3:47 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:

> And there is actually quite a lot of information about it.
>
>
> https://github.com/facebook/hadoop-20/blob/master/src/contrib/raid/src/java/org/apache/hadoop/hdfs/DistributedRaidFileSystem.java
>
> http://wiki.apache.org/hadoop/HDFS-RAID
>
>
> https://issues.apache.org/jira/browse/MAPREDUCE/component/12313416/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel
>
> Bertrand Dechoux
>
>
> On Mon, Jul 21, 2014 at 3:46 PM, Bertrand Dechoux <de...@gmail.com>
> wrote:
>
>> I wrote my answer thinking about the XOR implementation. With
>> reed-solomon and single replication, the cases that need to be considered
>> are indeed smaller, simpler.
>>
>> It seems I was wrong about my last statement though. If the machine
>> hosting a single-replicated block is lost, it isn't likely that the block
>> can't be reconstructed from a summary of the data. But with a RAID6
>> strategy / RS, it is indeed possible but of course up to a certain number
>> of blocks.
>>
>> There clearly is a case for such a tool on a Hadoop cluster.
>>
>> Bertrand Dechoux
>>
>>
>> On Mon, Jul 21, 2014 at 2:35 PM, Zesheng Wu <wu...@gmail.com>
>> wrote:
>>
>>>  If a block is corrupted but hasn't been detected by HDFS, you could
>>> delete the block from the local filesystem (it's only a file) then HDFS
>>> will replicate the good remaining replica of this block.
>>>
>>> We only have one replica for each block, if a block is corrupted, HDFS
>>> cannot replicate it.
>>>
>>>
>>> 2014-07-21 20:30 GMT+08:00 Zesheng Wu <wu...@gmail.com>:
>>>
>>> Thanks Bertrand, my reply comments are inline below.
>>>>
>>>> So you know that a block is corrupted thanks to an external process
>>>> which in this case is checking the parity blocks. If a block is corrupted
>>>> but hasn't been detected by HDFS, you could delete the block from the local
>>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>>> replica of this block.
>>>>  *[Zesheng: We will implement a periodic checking mechanism to detect
>>>> corrupted blocks.]*
>>>>
>>>> For performance reason (and that's what you want to do?), you might be
>>>> able to fix the corruption without needing to retrieve the good replica. It
>>>> might be possible by working directly with the local system by replacing
>>>> the corrupted block by the corrected block (which again are files). One
>>>> issue is that the corrected block might be different than the good replica.
>>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>>> with two different good replicas for the same block and that will not be
>>>> pretty...
>>>> *[Zesheng: Indeed we will use the reed-solomon erasure codes to
>>>> implement the HDFS raid, so the corrupted block will be recovered from the
>>>> remaining good data and coding blocks]*
>>>>
>>>> If the result is to be open source, you might want to check with
>>>> Facebook about their implementation and track the process within Apache
>>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>>> that the less replicas there is, the less read of the data for processing
>>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>>> *[Zesheng: Yes, I agree with you that read performance degrades, but not
>>>> about the number of supported node failures; the reed-solomon algorithm
>>>> can maintain equal or even higher node-failure tolerance. About open
>>>> source, we will consider this in the future.]*
>>>>
>>>>
>>>> 2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>>
>>>> So you know that a block is corrupted thanks to an external process
>>>>> which in this case is checking the parity blocks. If a block is corrupted
>>>>> but hasn't been detected by HDFS, you could delete the block from the local
>>>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>>>> replica of this block.
>>>>>
>>>>> For performance reason (and that's what you want to do?), you might be
>>>>> able to fix the corruption without needing to retrieve the good replica. It
>>>>> might be possible by working directly with the local system by replacing
>>>>> the corrupted block by the corrected block (which again are files). One
>>>>> issue is that the corrected block might be different than the good replica.
>>>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>>>> with two different good replicas for the same block and that will not be
>>>>> pretty...
>>>>>
>>>>> If the result is to be open source, you might want to check with
>>>>> Facebook about their implementation and track the process within Apache
>>>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>>>> that the less replicas there is, the less read of the data for processing
>>>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>>>>
>>>>> Bertrand Dechoux
>>>>>
>>>>>
>>>>> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> We want to implement a RAID on top of HDFS, something like facebook
>>>>>> implemented as described in:
>>>>>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>>>>>
>>>>>>
>>>>>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>>>>
>>>>>> You want to implement a RAID on top of HDFS or use HDFS on top of
>>>>>>> RAID? I am not sure I understand any of these use cases. HDFS handles for
>>>>>>> you replication and error detection. Fine tuning the cluster wouldn't be
>>>>>>> the easier solution?
>>>>>>>
>>>>>>> Bertrand Dechoux
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for reply, Arpit.
>>>>>>>> Yes, we need to do this regularly. The original requirement of this
>>>>>>>> is that we want to do RAID(which is based reed-solomon erasure codes) on
>>>>>>>> our HDFS cluster. When a block is corrupted or missing, the downgrade read
>>>>>>>> needs quick recovery of the block. We are considering how to recovery the
>>>>>>>> corrupted/missing block quickly.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>>>>>>
>>>>>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why
>>>>>>>>> not just take the perf hit and recreate the file?
>>>>>>>>>
>>>>>>>>> If you need to do this regularly you should consider a mutable
>>>>>>>>> file store like HBase. If you start modifying blocks from under HDFS you
>>>>>>>>> open up all sorts of consistency issues.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> That will break the consistency of the file system, but it
>>>>>>>>>> doesn't hurt to try.
>>>>>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> How about write a new block with new checksum file, and replace
>>>>>>>>>>> the old block file and checksum file both?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> there's no way to do that, as HDFS does not provide file
>>>>>>>>>>>> updates features. You'll need to write a new file with the changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Notice that even if you manage to find the physical block
>>>>>>>>>>>> replica files on the disk, corresponding to the part of the file you want
>>>>>>>>>>>> to change, you can't simply update it manually, as this would give a
>>>>>>>>>>>> different checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Wellington.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> > Hi guys,
>>>>>>>>>>>> >
>>>>>>>>>>>> > I recently encounter a scenario which needs to replace an
>>>>>>>>>>>> exist block with a newly written block
>>>>>>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>>>>>>> > Suppose the original file is A, and we write a new file B
>>>>>>>>>>>> which is composed by the new data blocks, then we merge A and B to C which
>>>>>>>>>>>> is the file we wanted
>>>>>>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>>>>>>> bandwidth
>>>>>>>>>>>> >
>>>>>>>>>>>> > I'm wondering whether there is a way to replace the old block
>>>>>>>>>>>> by the new block directly.
>>>>>>>>>>>> > Any thoughts?
>>>>>>>>>>>> >
>>>>>>>>>>>> > --
>>>>>>>>>>>> > Best Wishes!
>>>>>>>>>>>> >
>>>>>>>>>>>> > Yours, Zesheng
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best Wishes!
>>>>>>>>>>>
>>>>>>>>>>> Yours, Zesheng
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Wishes!
>>>>>>>>
>>>>>>>> Yours, Zesheng
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Wishes!
>>>>>>
>>>>>> Yours, Zesheng
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Wishes!
>>>>
>>>> Yours, Zesheng
>>>>
>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>>
>


-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Bertrand Dechoux <de...@gmail.com>.
And there is actually quite a lot of information about it.

https://github.com/facebook/hadoop-20/blob/master/src/contrib/raid/src/java/org/apache/hadoop/hdfs/DistributedRaidFileSystem.java

http://wiki.apache.org/hadoop/HDFS-RAID

https://issues.apache.org/jira/browse/MAPREDUCE/component/12313416/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel

Bertrand Dechoux
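
The DistributedRaidFileSystem linked above is a FileSystem wrapper, so clients
drive it through the ordinary Hadoop FileSystem API once the configuration
points at it. A minimal read-path sketch; the two configuration keys below are
taken from the HDFS-RAID wiki page and should be treated as assumptions to
verify against the branch actually in use, and the namenode address and file
path are placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class RaidReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed keys (per the HDFS-RAID wiki): route hdfs:// URIs through the
            // RAID-aware wrapper, which falls back to parity-based reconstruction
            // when the underlying DistributedFileSystem reports a corrupt or missing block.
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedRaidFileSystem");
            conf.set("fs.raid.underlyingfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");

            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            try (FSDataInputStream in = fs.open(new Path("/data/example.dat"))) {
                // Reading is unchanged for the caller; recovery happens inside the wrapper.
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }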


On Mon, Jul 21, 2014 at 3:46 PM, Bertrand Dechoux <de...@gmail.com>
wrote:

> I wrote my answer thinking about the XOR implementation. With reed-solomon
> and single replication, the cases that need to be considered are indeed
> smaller, simpler.
>
> It seems I was wrong about my last statement though. If the machine
> hosting a single-replicated block is lost, it isn't likely that the block
> can't be reconstructed from a summary of the data. But with a RAID6
> strategy / RS, it is indeed possible but of course up to a certain number
> of blocks.
>
> There clearly is a case for such a tool on a Hadoop cluster.
>
> Bertrand Dechoux
>
>
> On Mon, Jul 21, 2014 at 2:35 PM, Zesheng Wu <wu...@gmail.com> wrote:
>
>>  If a block is corrupted but hasn't been detected by HDFS, you could
>> delete the block from the local filesystem (it's only a file) then HDFS
>> will replicate the good remaining replica of this block.
>>
>> We only have one replica for each block, if a block is corrupted, HDFS
>> cannot replicate it.
>>
>>
>> 2014-07-21 20:30 GMT+08:00 Zesheng Wu <wu...@gmail.com>:
>>
>> Thanks Bertrand, my reply comments are inline below.
>>>
>>> So you know that a block is corrupted thanks to an external process
>>> which in this case is checking the parity blocks. If a block is corrupted
>>> but hasn't been detected by HDFS, you could delete the block from the local
>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>> replica of this block.
>>>  *[Zesheng: We will implement a periodic checking mechanism to detect
>>> corrupted blocks.]*
>>>
>>> For performance reason (and that's what you want to do?), you might be
>>> able to fix the corruption without needing to retrieve the good replica. It
>>> might be possible by working directly with the local system by replacing
>>> the corrupted block by the corrected block (which again are files). One
>>> issue is that the corrected block might be different than the good replica.
>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>> with two different good replicas for the same block and that will not be
>>> pretty...
>>> *[Zesheng: Indeed we will use the reed-solomon erasure codes to
>>> implement the HDFS raid, so the corrupted block will be recovered from the
>>> remaining good data and coding blocks]*
>>>
>>> If the result is to be open source, you might want to check with
>>> Facebook about their implementation and track the process within Apache
>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>> that the less replicas there is, the less read of the data for processing
>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>> *[Zesheng: Yes, I agree with you that read performance degrades, but not
>>> about the number of supported node failures; the reed-solomon algorithm
>>> can maintain equal or even higher node-failure tolerance. About open
>>> source, we will consider this in the future.]*
>>>
>>>
>>> 2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>
>>> So you know that a block is corrupted thanks to an external process
>>>> which in this case is checking the parity blocks. If a block is corrupted
>>>> but hasn't been detected by HDFS, you could delete the block from the local
>>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>>> replica of this block.
>>>>
>>>> For performance reason (and that's what you want to do?), you might be
>>>> able to fix the corruption without needing to retrieve the good replica. It
>>>> might be possible by working directly with the local system by replacing
>>>> the corrupted block by the corrected block (which again are files). One
>>>> issue is that the corrected block might be different than the good replica.
>>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>>> with two different good replicas for the same block and that will not be
>>>> pretty...
>>>>
>>>> If the result is to be open source, you might want to check with
>>>> Facebook about their implementation and track the process within Apache
>>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>>> that the less replicas there is, the less read of the data for processing
>>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>>>
>>>> Bertrand Dechoux
>>>>
>>>>
>>>> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com>
>>>> wrote:
>>>>
>>>>> We want to implement a RAID on top of HDFS, something like facebook
>>>>> implemented as described in:
>>>>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>>>>
>>>>>
>>>>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>>>
>>>>> You want to implement a RAID on top of HDFS or use HDFS on top of
>>>>>> RAID? I am not sure I understand any of these use cases. HDFS handles for
>>>>>> you replication and error detection. Fine tuning the cluster wouldn't be
>>>>>> the easier solution?
>>>>>>
>>>>>> Bertrand Dechoux
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for reply, Arpit.
>>>>>>> Yes, we need to do this regularly. The original requirement of this
>>>>>>> is that we want to do RAID(which is based reed-solomon erasure codes) on
>>>>>>> our HDFS cluster. When a block is corrupted or missing, the downgrade read
>>>>>>> needs quick recovery of the block. We are considering how to recovery the
>>>>>>> corrupted/missing block quickly.
>>>>>>>
>>>>>>>
>>>>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>>>>>
>>>>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why
>>>>>>>> not just take the perf hit and recreate the file?
>>>>>>>>
>>>>>>>> If you need to do this regularly you should consider a mutable file
>>>>>>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>>>>>>> all sorts of consistency issues.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That will break the consistency of the file system, but it doesn't
>>>>>>>>> hurt to try.
>>>>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> How about write a new block with new checksum file, and replace
>>>>>>>>>> the old block file and checksum file both?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>>>>>>> features. You'll need to write a new file with the changes.
>>>>>>>>>>>
>>>>>>>>>>> Notice that even if you manage to find the physical block
>>>>>>>>>>> replica files on the disk, corresponding to the part of the file you want
>>>>>>>>>>> to change, you can't simply update it manually, as this would give a
>>>>>>>>>>> different checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Wellington.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Hi guys,
>>>>>>>>>>> >
>>>>>>>>>>> > I recently encounter a scenario which needs to replace an
>>>>>>>>>>> exist block with a newly written block
>>>>>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>>>>>> > Suppose the original file is A, and we write a new file B
>>>>>>>>>>> which is composed by the new data blocks, then we merge A and B to C which
>>>>>>>>>>> is the file we wanted
>>>>>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>>>>>> bandwidth
>>>>>>>>>>> >
>>>>>>>>>>> > I'm wondering whether there is a way to replace the old block
>>>>>>>>>>> by the new block directly.
>>>>>>>>>>> > Any thoughts?
>>>>>>>>>>> >
>>>>>>>>>>> > --
>>>>>>>>>>> > Best Wishes!
>>>>>>>>>>> >
>>>>>>>>>>> > Yours, Zesheng
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Wishes!
>>>>>>>>>>
>>>>>>>>>> Yours, Zesheng
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Wishes!
>>>>>>>
>>>>>>> Yours, Zesheng
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes!
>>>>>
>>>>> Yours, Zesheng
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>>
>>
>> --
>> Best Wishes!
>>
>> Yours, Zesheng
>>
>
>
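
On the failure-tolerance point debated above, the arithmetic for one concrete
layout is easy to spell out. The (10, 4) stripe geometry below is the one
described in the facebook post linked earlier in the thread and is only an
assumed example, not a statement about any particular deployment:

    // Assumed example: 3-way replication versus a Reed-Solomon (10, 4) stripe.
    public class ToleranceComparison {
        public static void main(String[] args) {
            int dataBlocks = 10;
            int parityBlocks = 4;

            double replicationOverhead = 3.0;  // three full copies of every block
            int replicationTolerance = 2;      // any 2 of a block's 3 replicas may be lost

            double rsOverhead = (dataBlocks + parityBlocks) / (double) dataBlocks;  // 1.4x raw storage
            int rsTolerance = parityBlocks;    // any 4 blocks of a 14-block stripe may be lost

            System.out.printf("3-way replication: %.1fx storage, survives %d lost replicas per block%n",
                    replicationOverhead, replicationTolerance);
            System.out.printf("RS(%d,%d) striping: %.1fx storage, survives %d lost blocks per stripe%n",
                    dataBlocks, parityBlocks, rsOverhead, rsTolerance);
        }
    }

Under those assumptions the erasure-coded layout stores less and still survives
more simultaneous losses within a stripe, which is the trade-off being weighed
in this thread against slower degraded reads and reconstruction cost.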

Re: Replace a block with a new one

Posted by Bertrand Dechoux <de...@gmail.com>.
And there is actually quite a lot of information about it.

https://github.com/facebook/hadoop-20/blob/master/src/contrib/raid/src/java/org/apache/hadoop/hdfs/DistributedRaidFileSystem.java

http://wiki.apache.org/hadoop/HDFS-RAID

https://issues.apache.org/jira/browse/MAPREDUCE/component/12313416/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel

Bertrand Dechoux


On Mon, Jul 21, 2014 at 3:46 PM, Bertrand Dechoux <de...@gmail.com>
wrote:

> I wrote my answer thinking about the XOR implementation. With reed-solomon
> and single replication, the cases that need to be considered are indeed
> smaller, simpler.
>
> It seems I was wrong about my last statement though. If the machine
> hosting a single-replicated block is lost, it isn't likely that the block
> can't be reconstructed from a summary of the data. But with a RAID6
> strategy / RS, its is indeed possible but of course up to a certain number
> of blocks.
>
> There clearly is a case for such tool on a Hadoop cluster.
>
> Bertrand Dechoux
>
>
> On Mon, Jul 21, 2014 at 2:35 PM, Zesheng Wu <wu...@gmail.com> wrote:
>
>>  If a block is corrupted but hasn't been detected by HDFS, you could
>> delete the block from the local filesystem (it's only a file) then HDFS
>> will replicate the good remaining replica of this block.
>>
>> We only have one replica for each block, if a block is corrupted, HDFS
>> cannot replicate it.
>>
>>
>> 2014-07-21 20:30 GMT+08:00 Zesheng Wu <wu...@gmail.com>:
>>
>> Thanks Bertrand, my reply comments inline following.
>>>
>>> So you know that a block is corrupted thanks to an external process
>>> which in this case is checking the parity blocks. If a block is corrupted
>>> but hasn't been detected by HDFS, you could delete the block from the local
>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>> replica of this block.
>>>  *[Zesheng: We will implement a periodical checking mechanism to check
>>> the corrupted blocks.]*
>>>
>>> For performance reason (and that's what you want to do?), you might be
>>> able to fix the corruption without needing to retrieve the good replica. It
>>> might be possible by working directly with the local system by replacing
>>> the corrupted block by the corrected block (which again are files). On
>>> issue is that the corrected block might be different than the good replica.
>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>> with two different good replicas for the same block and that will not be
>>> pretty...
>>> *[Zesheng: Indeed we will use the reed-solomon erasure codes to
>>> implement the HDFS raid, so the corrupted block will be recovered from the
>>> back good data and coding blocks]*
>>>
>>> If the result is to be open source, you might want to check with
>>> Facebook about their implementation and track the process within Apache
>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>> that the less replicas there is, the less read of the data for processing
>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>> *[Zesheng: Yes, I agree with you that the read performance downgrade,
>>> but not with the number of supported node failures, reed-solomon algorithm
>>> can maintain equal or even higher node failures **tolerance. About open
>>> source, we will consider this in the future.**]*
>>>
>>>
>>> 2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>
>>> So you know that a block is corrupted thanks to an external process
>>>> which in this case is checking the parity blocks. If a block is corrupted
>>>> but hasn't been detected by HDFS, you could delete the block from the local
>>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>>> replica of this block.
>>>>
>>>> For performance reason (and that's what you want to do?), you might be
>>>> able to fix the corruption without needing to retrieve the good replica. It
>>>> might be possible by working directly with the local system by replacing
>>>> the corrupted block by the corrected block (which again are files). On
>>>> issue is that the corrected block might be different than the good replica.
>>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>>> with two different good replicas for the same block and that will not be
>>>> pretty...
>>>>
>>>> If the result is to be open source, you might want to check with
>>>> Facebook about their implementation and track the process within Apache
>>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>>> that the less replicas there is, the less read of the data for processing
>>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>>>
>>>> Bertrand Dechoux
>>>>
>>>>
>>>> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com>
>>>> wrote:
>>>>
>>>>> We want to implement a RAID on top of HDFS, something like facebook
>>>>> implemented as described in:
>>>>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>>>>
>>>>>
>>>>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>>>
>>>>> You want to implement a RAID on top of HDFS or use HDFS on top of
>>>>>> RAID? I am not sure I understand any of these use cases. HDFS handles for
>>>>>> you replication and error detection. Fine tuning the cluster wouldn't be
>>>>>> the easier solution?
>>>>>>
>>>>>> Bertrand Dechoux
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for reply, Arpit.
>>>>>>> Yes, we need to do this regularly. The original requirement of this
>>>>>>> is that we want to do RAID(which is based reed-solomon erasure codes) on
>>>>>>> our HDFS cluster. When a block is corrupted or missing, the downgrade read
>>>>>>> needs quick recovery of the block. We are considering how to recovery the
>>>>>>> corrupted/missing block quickly.
>>>>>>>
>>>>>>>
>>>>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>>>>>
>>>>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why
>>>>>>>> not just take the perf hit and recreate the file?
>>>>>>>>
>>>>>>>> If you need to do this regularly you should consider a mutable file
>>>>>>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>>>>>>> all sorts of consistency issues.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That will break the consistency of the file system, but it doesn't
>>>>>>>>> hurt to try.
>>>>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> How about write a new block with new checksum file, and replace
>>>>>>>>>> the old block file and checksum file both?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>>>>>>> features. You'll need to write a new file with the changes.
>>>>>>>>>>>
>>>>>>>>>>> Notice that even if you manage to find the physical block
>>>>>>>>>>> replica files on the disk, corresponding to the part of the file you want
>>>>>>>>>>> to change, you can't simply update it manually, as this would give a
>>>>>>>>>>> different checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Wellington.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Hi guys,
>>>>>>>>>>> >
>>>>>>>>>>> > I recently encounter a scenario which needs to replace an
>>>>>>>>>>> exist block with a newly written block
>>>>>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>>>>>> > Suppose the original file is A, and we write a new file B
>>>>>>>>>>> which is composed by the new data blocks, then we merge A and B to C which
>>>>>>>>>>> is the file we wanted
>>>>>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>>>>>> bandwidth
>>>>>>>>>>> >
>>>>>>>>>>> > I'm wondering whether there is a way to replace the old block
>>>>>>>>>>> by the new block directly.
>>>>>>>>>>> > Any thoughts?
>>>>>>>>>>> >
>>>>>>>>>>> > --
>>>>>>>>>>> > Best Wishes!
>>>>>>>>>>> >
>>>>>>>>>>> > Yours, Zesheng
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Wishes!
>>>>>>>>>>
>>>>>>>>>> Yours, Zesheng
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Wishes!
>>>>>>>
>>>>>>> Yours, Zesheng
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes!
>>>>>
>>>>> Yours, Zesheng
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>>
>>
>> --
>> Best Wishes!
>>
>> Yours, Zesheng
>>
>
>

Re: Replace a block with a new one

Posted by Bertrand Dechoux <de...@gmail.com>.
And there is actually quite a lot of information about it.

https://github.com/facebook/hadoop-20/blob/master/src/contrib/raid/src/java/org/apache/hadoop/hdfs/DistributedRaidFileSystem.java

http://wiki.apache.org/hadoop/HDFS-RAID

https://issues.apache.org/jira/browse/MAPREDUCE/component/12313416/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel

Bertrand Dechoux


On Mon, Jul 21, 2014 at 3:46 PM, Bertrand Dechoux <de...@gmail.com>
wrote:

> I wrote my answer thinking about the XOR implementation. With reed-solomon
> and single replication, the cases that need to be considered are indeed
> smaller, simpler.
>
> It seems I was wrong about my last statement though. If the machine
> hosting a single-replicated block is lost, it isn't likely that the block
> can't be reconstructed from a summary of the data. But with a RAID6
> strategy / RS, its is indeed possible but of course up to a certain number
> of blocks.
>
> There clearly is a case for such tool on a Hadoop cluster.
>
> Bertrand Dechoux
>
>
> On Mon, Jul 21, 2014 at 2:35 PM, Zesheng Wu <wu...@gmail.com> wrote:
>
>>  If a block is corrupted but hasn't been detected by HDFS, you could
>> delete the block from the local filesystem (it's only a file) then HDFS
>> will replicate the good remaining replica of this block.
>>
>> We only have one replica for each block, if a block is corrupted, HDFS
>> cannot replicate it.
>>
>>
>> 2014-07-21 20:30 GMT+08:00 Zesheng Wu <wu...@gmail.com>:
>>
>> Thanks Bertrand, my reply comments inline following.
>>>
>>> So you know that a block is corrupted thanks to an external process
>>> which in this case is checking the parity blocks. If a block is corrupted
>>> but hasn't been detected by HDFS, you could delete the block from the local
>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>> replica of this block.
>>>  *[Zesheng: We will implement a periodical checking mechanism to check
>>> the corrupted blocks.]*
>>>
>>> For performance reason (and that's what you want to do?), you might be
>>> able to fix the corruption without needing to retrieve the good replica. It
>>> might be possible by working directly with the local system by replacing
>>> the corrupted block by the corrected block (which again are files). On
>>> issue is that the corrected block might be different than the good replica.
>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>> with two different good replicas for the same block and that will not be
>>> pretty...
>>> *[Zesheng: Indeed we will use the reed-solomon erasure codes to
>>> implement the HDFS raid, so the corrupted block will be recovered from the
>>> back good data and coding blocks]*
>>>
>>> If the result is to be open source, you might want to check with
>>> Facebook about their implementation and track the process within Apache
>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>> that the less replicas there is, the less read of the data for processing
>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>> *[Zesheng: Yes, I agree with you that the read performance downgrade,
>>> but not with the number of supported node failures, reed-solomon algorithm
>>> can maintain equal or even higher node failures **tolerance. About open
>>> source, we will consider this in the future.**]*
>>>
>>>
>>> 2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>
>>> So you know that a block is corrupted thanks to an external process
>>>> which in this case is checking the parity blocks. If a block is corrupted
>>>> but hasn't been detected by HDFS, you could delete the block from the local
>>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>>> replica of this block.
>>>>
>>>> For performance reason (and that's what you want to do?), you might be
>>>> able to fix the corruption without needing to retrieve the good replica. It
>>>> might be possible by working directly with the local system by replacing
>>>> the corrupted block by the corrected block (which again are files). On
>>>> issue is that the corrected block might be different than the good replica.
>>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>>> with two different good replicas for the same block and that will not be
>>>> pretty...
>>>>
>>>> If the result is to be open source, you might want to check with
>>>> Facebook about their implementation and track the process within Apache
>>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>>> that the less replicas there is, the less read of the data for processing
>>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>>>
>>>> Bertrand Dechoux
>>>>
>>>>
>>>> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com>
>>>> wrote:
>>>>
>>>>> We want to implement a RAID on top of HDFS, something like facebook
>>>>> implemented as described in:
>>>>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>>>>
>>>>>
>>>>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>>>
>>>>> You want to implement a RAID on top of HDFS or use HDFS on top of
>>>>>> RAID? I am not sure I understand any of these use cases. HDFS handles for
>>>>>> you replication and error detection. Fine tuning the cluster wouldn't be
>>>>>> the easier solution?
>>>>>>
>>>>>> Bertrand Dechoux
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for reply, Arpit.
>>>>>>> Yes, we need to do this regularly. The original requirement of this
>>>>>>> is that we want to do RAID(which is based reed-solomon erasure codes) on
>>>>>>> our HDFS cluster. When a block is corrupted or missing, the downgrade read
>>>>>>> needs quick recovery of the block. We are considering how to recovery the
>>>>>>> corrupted/missing block quickly.
>>>>>>>
>>>>>>>
>>>>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>>>>>
>>>>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why
>>>>>>>> not just take the perf hit and recreate the file?
>>>>>>>>
>>>>>>>> If you need to do this regularly you should consider a mutable file
>>>>>>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>>>>>>> all sorts of consistency issues.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That will break the consistency of the file system, but it doesn't
>>>>>>>>> hurt to try.
>>>>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> How about write a new block with new checksum file, and replace
>>>>>>>>>> the old block file and checksum file both?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>>>>>>> features. You'll need to write a new file with the changes.
>>>>>>>>>>>
>>>>>>>>>>> Notice that even if you manage to find the physical block
>>>>>>>>>>> replica files on the disk, corresponding to the part of the file you want
>>>>>>>>>>> to change, you can't simply update it manually, as this would give a
>>>>>>>>>>> different checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Wellington.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Hi guys,
>>>>>>>>>>> >
>>>>>>>>>>> > I recently encounter a scenario which needs to replace an
>>>>>>>>>>> exist block with a newly written block
>>>>>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>>>>>> > Suppose the original file is A, and we write a new file B
>>>>>>>>>>> which is composed by the new data blocks, then we merge A and B to C which
>>>>>>>>>>> is the file we wanted
>>>>>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>>>>>> bandwidth
>>>>>>>>>>> >
>>>>>>>>>>> > I'm wondering whether there is a way to replace the old block
>>>>>>>>>>> by the new block directly.
>>>>>>>>>>> > Any thoughts?
>>>>>>>>>>> >
>>>>>>>>>>> > --
>>>>>>>>>>> > Best Wishes!
>>>>>>>>>>> >
>>>>>>>>>>> > Yours, Zesheng
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Wishes!
>>>>>>>>>>
>>>>>>>>>> Yours, Zesheng
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Wishes!
>>>>>>>
>>>>>>> Yours, Zesheng
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes!
>>>>>
>>>>> Yours, Zesheng
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>>
>>
>> --
>> Best Wishes!
>>
>> Yours, Zesheng
>>
>
>
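
A minimal sketch of the merge approach quoted above (copy the part of A you
keep, then append the replacement data written as a new file B), using only
the standard FileSystem API. The paths and the cut-over offset are made-up
example values, and B is assumed to hold everything from the patch point
onward; it mainly shows where the wasted network bandwidth comes from, since
every byte of the result is rewritten through the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class MergeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path a = new Path("/data/A");          // original file (hypothetical path)
    Path b = new Path("/data/B");          // newly written replacement data
    Path c = new Path("/data/C");          // merged result
    long keepFromA = 128L * 1024 * 1024;   // bytes of A kept before the patch (example value)

    try (FSDataInputStream inA = fs.open(a);
         FSDataInputStream inB = fs.open(b);
         FSDataOutputStream out = fs.create(c, true)) {
      // copy the unchanged prefix of A
      byte[] buf = new byte[64 * 1024];
      long remaining = keepFromA;
      int n;
      while (remaining > 0 && (n = inA.read(buf, 0, (int) Math.min(buf.length, remaining))) > 0) {
        out.write(buf, 0, n);
        remaining -= n;
      }
      // append the replacement file; for this sketch B holds everything after the patch point
      IOUtils.copyBytes(inB, out, conf, false);
    }
  }
}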

Re: Replace a block with a new one

Posted by Bertrand Dechoux <de...@gmail.com>.
And there is actually quite a lot of information about it.

https://github.com/facebook/hadoop-20/blob/master/src/contrib/raid/src/java/org/apache/hadoop/hdfs/DistributedRaidFileSystem.java

http://wiki.apache.org/hadoop/HDFS-RAID

https://issues.apache.org/jira/browse/MAPREDUCE/component/12313416/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel
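
As a rough sketch of how the client side of HDFS-RAID is wired in (going by
the wiki and source linked above; the configuration key and class name are
taken from that material and may differ between versions, so treat them as
assumptions), the raid-aware filesystem is swapped in through configuration
so that a degraded read can fall back to reconstruction from the parity file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaidClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Point the hdfs:// scheme at the RAID-aware wrapper; the key and class
    // name follow the linked HDFS-RAID documentation and source tree.
    conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedRaidFileSystem");

    try (FileSystem fs = FileSystem.get(conf);
         FSDataInputStream in = fs.open(new Path("/data/A"))) {  // hypothetical path
      byte[] buf = new byte[4096];
      int n = in.read(buf);
      // If a block of /data/A is missing or corrupt, the wrapper is supposed to
      // rebuild the missing bytes from the parity file instead of failing the read.
      System.out.println("read " + n + " bytes");
    }
  }
}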

Bertrand Dechoux


On Mon, Jul 21, 2014 at 3:46 PM, Bertrand Dechoux <de...@gmail.com>
wrote:

> I wrote my answer thinking about the XOR implementation. With reed-solomon
> and single replication, the cases that need to be considered are indeed
> smaller, simpler.
>
> It seems I was wrong about my last statement though. If the machine
> hosting a single-replicated block is lost, it isn't likely that the block
> can't be reconstructed from a summary of the data. But with a RAID6
> strategy / RS, its is indeed possible but of course up to a certain number
> of blocks.
>
> There clearly is a case for such tool on a Hadoop cluster.
>
> Bertrand Dechoux
>
>
> On Mon, Jul 21, 2014 at 2:35 PM, Zesheng Wu <wu...@gmail.com> wrote:
>
>>  If a block is corrupted but hasn't been detected by HDFS, you could
>> delete the block from the local filesystem (it's only a file) then HDFS
>> will replicate the good remaining replica of this block.
>>
>> We only have one replica for each block, if a block is corrupted, HDFS
>> cannot replicate it.
>>
>>
>> 2014-07-21 20:30 GMT+08:00 Zesheng Wu <wu...@gmail.com>:
>>
>> Thanks Bertrand, my reply comments inline following.
>>>
>>> So you know that a block is corrupted thanks to an external process
>>> which in this case is checking the parity blocks. If a block is corrupted
>>> but hasn't been detected by HDFS, you could delete the block from the local
>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>> replica of this block.
>>>  *[Zesheng: We will implement a periodical checking mechanism to check
>>> the corrupted blocks.]*
>>>
>>> For performance reason (and that's what you want to do?), you might be
>>> able to fix the corruption without needing to retrieve the good replica. It
>>> might be possible by working directly with the local system by replacing
>>> the corrupted block by the corrected block (which again are files). On
>>> issue is that the corrected block might be different than the good replica.
>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>> with two different good replicas for the same block and that will not be
>>> pretty...
>>> *[Zesheng: Indeed we will use the reed-solomon erasure codes to
>>> implement the HDFS raid, so the corrupted block will be recovered from the
>>> back good data and coding blocks]*
>>>
>>> If the result is to be open source, you might want to check with
>>> Facebook about their implementation and track the process within Apache
>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>> that the less replicas there is, the less read of the data for processing
>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>> *[Zesheng: Yes, I agree with you that the read performance downgrade,
>>> but not with the number of supported node failures, reed-solomon algorithm
>>> can maintain equal or even higher node failures **tolerance. About open
>>> source, we will consider this in the future.**]*
>>>
>>>
>>> 2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>
>>> So you know that a block is corrupted thanks to an external process
>>>> which in this case is checking the parity blocks. If a block is corrupted
>>>> but hasn't been detected by HDFS, you could delete the block from the local
>>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>>> replica of this block.
>>>>
>>>> For performance reason (and that's what you want to do?), you might be
>>>> able to fix the corruption without needing to retrieve the good replica. It
>>>> might be possible by working directly with the local system by replacing
>>>> the corrupted block by the corrected block (which again are files). On
>>>> issue is that the corrected block might be different than the good replica.
>>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>>> with two different good replicas for the same block and that will not be
>>>> pretty...
>>>>
>>>> If the result is to be open source, you might want to check with
>>>> Facebook about their implementation and track the process within Apache
>>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>>> that the less replicas there is, the less read of the data for processing
>>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>>>
>>>> Bertrand Dechoux
>>>>
>>>>
>>>> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com>
>>>> wrote:
>>>>
>>>>> We want to implement a RAID on top of HDFS, something like facebook
>>>>> implemented as described in:
>>>>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>>>>
>>>>>
>>>>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>>>
>>>>> You want to implement a RAID on top of HDFS or use HDFS on top of
>>>>>> RAID? I am not sure I understand any of these use cases. HDFS handles for
>>>>>> you replication and error detection. Fine tuning the cluster wouldn't be
>>>>>> the easier solution?
>>>>>>
>>>>>> Bertrand Dechoux
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for reply, Arpit.
>>>>>>> Yes, we need to do this regularly. The original requirement of this
>>>>>>> is that we want to do RAID(which is based reed-solomon erasure codes) on
>>>>>>> our HDFS cluster. When a block is corrupted or missing, the downgrade read
>>>>>>> needs quick recovery of the block. We are considering how to recovery the
>>>>>>> corrupted/missing block quickly.
>>>>>>>
>>>>>>>
>>>>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>>>>>
>>>>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why
>>>>>>>> not just take the perf hit and recreate the file?
>>>>>>>>
>>>>>>>> If you need to do this regularly you should consider a mutable file
>>>>>>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>>>>>>> all sorts of consistency issues.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That will break the consistency of the file system, but it doesn't
>>>>>>>>> hurt to try.
>>>>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> How about write a new block with new checksum file, and replace
>>>>>>>>>> the old block file and checksum file both?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>>>>>>> features. You'll need to write a new file with the changes.
>>>>>>>>>>>
>>>>>>>>>>> Notice that even if you manage to find the physical block
>>>>>>>>>>> replica files on the disk, corresponding to the part of the file you want
>>>>>>>>>>> to change, you can't simply update it manually, as this would give a
>>>>>>>>>>> different checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Wellington.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Hi guys,
>>>>>>>>>>> >
>>>>>>>>>>> > I recently encounter a scenario which needs to replace an
>>>>>>>>>>> exist block with a newly written block
>>>>>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>>>>>> > Suppose the original file is A, and we write a new file B
>>>>>>>>>>> which is composed by the new data blocks, then we merge A and B to C which
>>>>>>>>>>> is the file we wanted
>>>>>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>>>>>> bandwidth
>>>>>>>>>>> >
>>>>>>>>>>> > I'm wondering whether there is a way to replace the old block
>>>>>>>>>>> by the new block directly.
>>>>>>>>>>> > Any thoughts?
>>>>>>>>>>> >
>>>>>>>>>>> > --
>>>>>>>>>>> > Best Wishes!
>>>>>>>>>>> >
>>>>>>>>>>> > Yours, Zesheng
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Wishes!
>>>>>>>>>>
>>>>>>>>>> Yours, Zesheng
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Wishes!
>>>>>>>
>>>>>>> Yours, Zesheng
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes!
>>>>>
>>>>> Yours, Zesheng
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>>
>>
>> --
>> Best Wishes!
>>
>> Yours, Zesheng
>>
>
>

Re: Replace a block with a new one

Posted by Bertrand Dechoux <de...@gmail.com>.
I wrote my answer thinking about the XOR implementation. With reed-solomon
and single replication, the cases that need to be considered are indeed
fewer and simpler.

It seems I was wrong about my last statement though. If the machine hosting
a single-replicated block is lost, you would not normally expect the block
to be recoverable from a summary of the data, but with a RAID6-style
strategy / RS it is indeed possible, up to a certain number of lost blocks.

There is clearly a case for such a tool on a Hadoop cluster.
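
A minimal, self-contained sketch of the XOR case, to make the comparison
concrete: one parity block per stripe can rebuild exactly one lost block,
which is why a Reed-Solomon code with several parity blocks per stripe is
needed once replication drops to one. The stripe below uses toy byte arrays
rather than real HDFS blocks.

import java.util.Arrays;

public class XorParitySketch {
  // XOR all source blocks together to produce the single parity block.
  static byte[] parity(byte[][] blocks) {
    byte[] p = new byte[blocks[0].length];
    for (byte[] b : blocks) {
      for (int i = 0; i < p.length; i++) {
        p[i] ^= b[i];
      }
    }
    return p;
  }

  // Rebuild the one missing block: XOR the parity with every surviving block.
  static byte[] recover(byte[][] survivingBlocks, byte[] parity) {
    byte[] missing = parity.clone();
    for (byte[] b : survivingBlocks) {
      for (int i = 0; i < missing.length; i++) {
        missing[i] ^= b[i];
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    byte[][] stripe = {
        "block-0 data".getBytes(),  // toy "blocks" of equal length
        "block-1 data".getBytes(),
        "block-2 data".getBytes()
    };
    byte[] p = parity(stripe);

    // Pretend block 1 was lost and rebuild it from the survivors plus parity.
    byte[] rebuilt = recover(new byte[][] {stripe[0], stripe[2]}, p);
    System.out.println(Arrays.equals(rebuilt, stripe[1]));  // true
  }
}

With RS the same idea holds, except the recovery can solve for up to as many
missing blocks as there are parity blocks in the stripe.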

Bertrand Dechoux


On Mon, Jul 21, 2014 at 2:35 PM, Zesheng Wu <wu...@gmail.com> wrote:

>  If a block is corrupted but hasn't been detected by HDFS, you could
> delete the block from the local filesystem (it's only a file) then HDFS
> will replicate the good remaining replica of this block.
>
> We only have one replica for each block, if a block is corrupted, HDFS
> cannot replicate it.
>
>
> 2014-07-21 20:30 GMT+08:00 Zesheng Wu <wu...@gmail.com>:
>
> Thanks Bertrand, my reply comments inline following.
>>
>> So you know that a block is corrupted thanks to an external process which
>> in this case is checking the parity blocks. If a block is corrupted but
>> hasn't been detected by HDFS, you could delete the block from the local
>> filesystem (it's only a file) then HDFS will replicate the good remaining
>> replica of this block.
>>  *[Zesheng: We will implement a periodical checking mechanism to check
>> the corrupted blocks.]*
>>
>> For performance reason (and that's what you want to do?), you might be
>> able to fix the corruption without needing to retrieve the good replica. It
>> might be possible by working directly with the local system by replacing
>> the corrupted block by the corrected block (which again are files). On
>> issue is that the corrected block might be different than the good replica.
>> If HDFS is able to tell (with CRC) it might be good else you will end up
>> with two different good replicas for the same block and that will not be
>> pretty...
>> *[Zesheng: Indeed we will use the reed-solomon erasure codes to implement
>> the HDFS raid, so the corrupted block will be recovered from the back good
>> data and coding blocks]*
>>
>> If the result is to be open source, you might want to check with Facebook
>> about their implementation and track the process within Apache JIRA. You
>> could gain additional feedbacks. One downside of HDFS RAID is that the less
>> replicas there is, the less read of the data for processing will be
>> 'efficient/fast'. Reducing the number of replicas also diminishes the
>> number of supported node failures. I wouldn't say it's an easy ride.
>> *[Zesheng: Yes, I agree with you that the read performance downgrade, but
>> not with the number of supported node failures, reed-solomon algorithm can
>> maintain equal or even higher node failures **tolerance. About open
>> source, we will consider this in the future.**]*
>>
>>
>> 2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>
>> So you know that a block is corrupted thanks to an external process which
>>> in this case is checking the parity blocks. If a block is corrupted but
>>> hasn't been detected by HDFS, you could delete the block from the local
>>> filesystem (it's only a file) then HDFS will replicate the good remaining
>>> replica of this block.
>>>
>>> For performance reason (and that's what you want to do?), you might be
>>> able to fix the corruption without needing to retrieve the good replica. It
>>> might be possible by working directly with the local system by replacing
>>> the corrupted block by the corrected block (which again are files). On
>>> issue is that the corrected block might be different than the good replica.
>>> If HDFS is able to tell (with CRC) it might be good else you will end up
>>> with two different good replicas for the same block and that will not be
>>> pretty...
>>>
>>> If the result is to be open source, you might want to check with
>>> Facebook about their implementation and track the process within Apache
>>> JIRA. You could gain additional feedbacks. One downside of HDFS RAID is
>>> that the less replicas there is, the less read of the data for processing
>>> will be 'efficient/fast'. Reducing the number of replicas also diminishes
>>> the number of supported node failures. I wouldn't say it's an easy ride.
>>>
>>> Bertrand Dechoux
>>>
>>>
>>> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com>
>>> wrote:
>>>
>>>> We want to implement a RAID on top of HDFS, something like facebook
>>>> implemented as described in:
>>>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>>>
>>>>
>>>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>>
>>>> You want to implement a RAID on top of HDFS or use HDFS on top of RAID?
>>>>> I am not sure I understand any of these use cases. HDFS handles for you
>>>>> replication and error detection. Fine tuning the cluster wouldn't be the
>>>>> easier solution?
>>>>>
>>>>> Bertrand Dechoux
>>>>>
>>>>>
>>>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for reply, Arpit.
>>>>>> Yes, we need to do this regularly. The original requirement of this
>>>>>> is that we want to do RAID(which is based reed-solomon erasure codes) on
>>>>>> our HDFS cluster. When a block is corrupted or missing, the downgrade read
>>>>>> needs quick recovery of the block. We are considering how to recovery the
>>>>>> corrupted/missing block quickly.
>>>>>>
>>>>>>
>>>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>>>>
>>>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why
>>>>>>> not just take the perf hit and recreate the file?
>>>>>>>
>>>>>>> If you need to do this regularly you should consider a mutable file
>>>>>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>>>>>> all sorts of consistency issues.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> That will break the consistency of the file system, but it doesn't
>>>>>>>> hurt to try.
>>>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> How about write a new block with new checksum file, and replace
>>>>>>>>> the old block file and checksum file both?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>>>>>> features. You'll need to write a new file with the changes.
>>>>>>>>>>
>>>>>>>>>> Notice that even if you manage to find the physical block replica
>>>>>>>>>> files on the disk, corresponding to the part of the file you want to
>>>>>>>>>> change, you can't simply update it manually, as this would give a different
>>>>>>>>>> checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Wellington.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> > Hi guys,
>>>>>>>>>> >
>>>>>>>>>> > I recently encounter a scenario which needs to replace an exist
>>>>>>>>>> block with a newly written block
>>>>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>>>>> > Suppose the original file is A, and we write a new file B which
>>>>>>>>>> is composed by the new data blocks, then we merge A and B to C which is the
>>>>>>>>>> file we wanted
>>>>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>>>>> bandwidth
>>>>>>>>>> >
>>>>>>>>>> > I'm wondering whether there is a way to replace the old block
>>>>>>>>>> by the new block directly.
>>>>>>>>>> > Any thoughts?
>>>>>>>>>> >
>>>>>>>>>> > --
>>>>>>>>>> > Best Wishes!
>>>>>>>>>> >
>>>>>>>>>> > Yours, Zesheng
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Wishes!
>>>>>>>>>
>>>>>>>>> Yours, Zesheng
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>> you have received this communication in error, please contact the sender
>>>>>>> immediately and delete it from your system. Thank You.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Wishes!
>>>>>>
>>>>>> Yours, Zesheng
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Wishes!
>>>>
>>>> Yours, Zesheng
>>>>
>>>
>>>
>>
>>
>> --
>> Best Wishes!
>>
>> Yours, Zesheng
>>
>
>
>
> --
> Best Wishes!
>
> Yours, Zesheng
>

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
 If a block is corrupted but hasn't been detected by HDFS, you could delete
the block from the local filesystem (it's only a file) then HDFS will
replicate the good remaining replica of this block.

We only have one replica for each block, so if a block is corrupted, HDFS
cannot re-replicate it from another copy.
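
A toy sketch of what a periodic checker could do for such single-replica
blocks, heavily simplified: real HDFS keeps per-chunk CRCs in each replica's
.meta file rather than one digest per block, and the file name and expected
digest below are made up for the example. The point is only the shape of the
check: recompute a checksum over the local block file and, on mismatch,
schedule reconstruction from the surviving data and parity blocks.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

public class BlockChecksumSketch {
  // Recompute a CRC32 over the whole block file on the datanode's local disk.
  static long crcOfFile(String path) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buf = new byte[64 * 1024];
    try (InputStream in = new FileInputStream(path)) {
      int n;
      while ((n = in.read(buf)) > 0) {
        crc.update(buf, 0, n);
      }
    }
    return crc.getValue();
  }

  public static void main(String[] args) throws IOException {
    String blockFile = "/data/dn/current/blk_1073741825";  // hypothetical local replica file
    long expected = 0x1234ABCDL;                           // digest recorded at write time (made up)

    long actual = crcOfFile(blockFile);
    if (actual != expected) {
      // In the scheme discussed here, this is where the block would be rebuilt
      // from the surviving data and parity blocks and written back.
      System.out.println("block " + blockFile + " looks corrupted, schedule reconstruction");
    }
  }
}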


2014-07-21 20:30 GMT+08:00 Zesheng Wu <wu...@gmail.com>:

> Thanks Bertrand, my reply comments inline following.
>
> So you know that a block is corrupted thanks to an external process which
> in this case is checking the parity blocks. If a block is corrupted but
> hasn't been detected by HDFS, you could delete the block from the local
> filesystem (it's only a file) then HDFS will replicate the good remaining
> replica of this block.
>  *[Zesheng: We will implement a periodical checking mechanism to check
> the corrupted blocks.]*
>
> For performance reason (and that's what you want to do?), you might be
> able to fix the corruption without needing to retrieve the good replica. It
> might be possible by working directly with the local system by replacing
> the corrupted block by the corrected block (which again are files). On
> issue is that the corrected block might be different than the good replica.
> If HDFS is able to tell (with CRC) it might be good else you will end up
> with two different good replicas for the same block and that will not be
> pretty...
> *[Zesheng: Indeed we will use the reed-solomon erasure codes to implement
> the HDFS raid, so the corrupted block will be recovered from the back good
> data and coding blocks]*
>
> If the result is to be open source, you might want to check with Facebook
> about their implementation and track the process within Apache JIRA. You
> could gain additional feedbacks. One downside of HDFS RAID is that the less
> replicas there is, the less read of the data for processing will be
> 'efficient/fast'. Reducing the number of replicas also diminishes the
> number of supported node failures. I wouldn't say it's an easy ride.
> *[Zesheng: Yes, I agree with you that the read performance downgrade, but
> not with the number of supported node failures, reed-solomon algorithm can
> maintain equal or even higher node failures **tolerance. About open
> source, we will consider this in the future.**]*
>
>
> 2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>
> So you know that a block is corrupted thanks to an external process which
>> in this case is checking the parity blocks. If a block is corrupted but
>> hasn't been detected by HDFS, you could delete the block from the local
>> filesystem (it's only a file) then HDFS will replicate the good remaining
>> replica of this block.
>>
>> For performance reason (and that's what you want to do?), you might be
>> able to fix the corruption without needing to retrieve the good replica. It
>> might be possible by working directly with the local system by replacing
>> the corrupted block by the corrected block (which again are files). On
>> issue is that the corrected block might be different than the good replica.
>> If HDFS is able to tell (with CRC) it might be good else you will end up
>> with two different good replicas for the same block and that will not be
>> pretty...
>>
>> If the result is to be open source, you might want to check with Facebook
>> about their implementation and track the process within Apache JIRA. You
>> could gain additional feedbacks. One downside of HDFS RAID is that the less
>> replicas there is, the less read of the data for processing will be
>> 'efficient/fast'. Reducing the number of replicas also diminishes the
>> number of supported node failures. I wouldn't say it's an easy ride.
>>
>> Bertrand Dechoux
>>
>>
>> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com>
>> wrote:
>>
>>> We want to implement a RAID on top of HDFS, something like facebook
>>> implemented as described in:
>>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>>
>>>
>>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>
>>> You want to implement a RAID on top of HDFS or use HDFS on top of RAID?
>>>> I am not sure I understand any of these use cases. HDFS handles for you
>>>> replication and error detection. Fine tuning the cluster wouldn't be the
>>>> easier solution?
>>>>
>>>> Bertrand Dechoux
>>>>
>>>>
>>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for reply, Arpit.
>>>>> Yes, we need to do this regularly. The original requirement of this is
>>>>> that we want to do RAID(which is based reed-solomon erasure codes) on our
>>>>> HDFS cluster. When a block is corrupted or missing, the downgrade read
>>>>> needs quick recovery of the block. We are considering how to recovery the
>>>>> corrupted/missing block quickly.
>>>>>
>>>>>
>>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>>>
>>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why not
>>>>>> just take the perf hit and recreate the file?
>>>>>>
>>>>>> If you need to do this regularly you should consider a mutable file
>>>>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>>>>> all sorts of consistency issues.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> That will break the consistency of the file system, but it doesn't
>>>>>>> hurt to try.
>>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> How about write a new block with new checksum file, and replace the
>>>>>>>> old block file and checksum file both?
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>>>>> features. You'll need to write a new file with the changes.
>>>>>>>>>
>>>>>>>>> Notice that even if you manage to find the physical block replica
>>>>>>>>> files on the disk, corresponding to the part of the file you want to
>>>>>>>>> change, you can't simply update it manually, as this would give a different
>>>>>>>>> checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Wellington.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> > Hi guys,
>>>>>>>>> >
>>>>>>>>> > I recently encounter a scenario which needs to replace an exist
>>>>>>>>> block with a newly written block
>>>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>>>> > Suppose the original file is A, and we write a new file B which
>>>>>>>>> is composed by the new data blocks, then we merge A and B to C which is the
>>>>>>>>> file we wanted
>>>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>>>> bandwidth
>>>>>>>>> >
>>>>>>>>> > I'm wondering whether there is a way to replace the old block by
>>>>>>>>> the new block directly.
>>>>>>>>> > Any thoughts?
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Best Wishes!
>>>>>>>>> >
>>>>>>>>> > Yours, Zesheng
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Wishes!
>>>>>>>>
>>>>>>>> Yours, Zesheng
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> CONFIDENTIALITY NOTICE
>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>> entity to which it is addressed and may contain information that is
>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>> you have received this communication in error, please contact the sender
>>>>>> immediately and delete it from your system. Thank You.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes!
>>>>>
>>>>> Yours, Zesheng
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>>
>
>
> --
> Best Wishes!
>
> Yours, Zesheng
>



-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
If a block is corrupted but hasn't been detected by HDFS, you could delete
the block from the local filesystem (it's only a file), and HDFS will then
re-replicate it from the good remaining replica of this block.

We only have one replica for each block, so if a block is corrupted, HDFS
cannot re-replicate it.
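
For a concrete starting point, a repair job could first ask the NameNode
which files currently have corrupt or missing blocks. Below is a minimal
Java sketch of that lookup; the scanned path and the class name are just
placeholders, not part of our actual design:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class CorruptFileScanner {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // With replication = 1 there is no good replica left, so the files
        // reported here are exactly the ones the RAID repair must rebuild.
        RemoteIterator<Path> corrupt =
            fs.listCorruptFileBlocks(new Path("/raided"));
        while (corrupt.hasNext()) {
          System.out.println("needs repair: " + corrupt.next());
        }
      }
    }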


2014-07-21 20:30 GMT+08:00 Zesheng Wu <wu...@gmail.com>:

> Thanks Bertrand, my reply comments inline following.
>
> So you know that a block is corrupted thanks to an external process which
> in this case is checking the parity blocks. If a block is corrupted but
> hasn't been detected by HDFS, you could delete the block from the local
> filesystem (it's only a file) then HDFS will replicate the good remaining
> replica of this block.
>  *[Zesheng: We will implement a periodical checking mechanism to check
> the corrupted blocks.]*
>
> For performance reason (and that's what you want to do?), you might be
> able to fix the corruption without needing to retrieve the good replica. It
> might be possible by working directly with the local system by replacing
> the corrupted block by the corrected block (which again are files). On
> issue is that the corrected block might be different than the good replica.
> If HDFS is able to tell (with CRC) it might be good else you will end up
> with two different good replicas for the same block and that will not be
> pretty...
> *[Zesheng: Indeed we will use the reed-solomon erasure codes to implement
> the HDFS raid, so the corrupted block will be recovered from the back good
> data and coding blocks]*
>
> If the result is to be open source, you might want to check with Facebook
> about their implementation and track the process within Apache JIRA. You
> could gain additional feedbacks. One downside of HDFS RAID is that the less
> replicas there is, the less read of the data for processing will be
> 'efficient/fast'. Reducing the number of replicas also diminishes the
> number of supported node failures. I wouldn't say it's an easy ride.
> *[Zesheng: Yes, I agree with you that the read performance downgrade, but
> not with the number of supported node failures, reed-solomon algorithm can
> maintain equal or even higher node failures **tolerance. About open
> source, we will consider this in the future.**]*
>
>
> 2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>
> So you know that a block is corrupted thanks to an external process which
>> in this case is checking the parity blocks. If a block is corrupted but
>> hasn't been detected by HDFS, you could delete the block from the local
>> filesystem (it's only a file) then HDFS will replicate the good remaining
>> replica of this block.
>>
>> For performance reason (and that's what you want to do?), you might be
>> able to fix the corruption without needing to retrieve the good replica. It
>> might be possible by working directly with the local system by replacing
>> the corrupted block by the corrected block (which again are files). On
>> issue is that the corrected block might be different than the good replica.
>> If HDFS is able to tell (with CRC) it might be good else you will end up
>> with two different good replicas for the same block and that will not be
>> pretty...
>>
>> If the result is to be open source, you might want to check with Facebook
>> about their implementation and track the process within Apache JIRA. You
>> could gain additional feedbacks. One downside of HDFS RAID is that the less
>> replicas there is, the less read of the data for processing will be
>> 'efficient/fast'. Reducing the number of replicas also diminishes the
>> number of supported node failures. I wouldn't say it's an easy ride.
>>
>> Bertrand Dechoux
>>
>>
>> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com>
>> wrote:
>>
>>> We want to implement a RAID on top of HDFS, something like facebook
>>> implemented as described in:
>>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>>
>>>
>>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>>
>>> You want to implement a RAID on top of HDFS or use HDFS on top of RAID?
>>>> I am not sure I understand any of these use cases. HDFS handles for you
>>>> replication and error detection. Fine tuning the cluster wouldn't be the
>>>> easier solution?
>>>>
>>>> Bertrand Dechoux
>>>>
>>>>
>>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for reply, Arpit.
>>>>> Yes, we need to do this regularly. The original requirement of this is
>>>>> that we want to do RAID(which is based reed-solomon erasure codes) on our
>>>>> HDFS cluster. When a block is corrupted or missing, the downgrade read
>>>>> needs quick recovery of the block. We are considering how to recovery the
>>>>> corrupted/missing block quickly.
>>>>>
>>>>>
>>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>>>
>>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why not
>>>>>> just take the perf hit and recreate the file?
>>>>>>
>>>>>> If you need to do this regularly you should consider a mutable file
>>>>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>>>>> all sorts of consistency issues.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> That will break the consistency of the file system, but it doesn't
>>>>>>> hurt to try.
>>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> How about write a new block with new checksum file, and replace the
>>>>>>>> old block file and checksum file both?
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>>>>> features. You'll need to write a new file with the changes.
>>>>>>>>>
>>>>>>>>> Notice that even if you manage to find the physical block replica
>>>>>>>>> files on the disk, corresponding to the part of the file you want to
>>>>>>>>> change, you can't simply update it manually, as this would give a different
>>>>>>>>> checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Wellington.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> > Hi guys,
>>>>>>>>> >
>>>>>>>>> > I recently encounter a scenario which needs to replace an exist
>>>>>>>>> block with a newly written block
>>>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>>>> > Suppose the original file is A, and we write a new file B which
>>>>>>>>> is composed by the new data blocks, then we merge A and B to C which is the
>>>>>>>>> file we wanted
>>>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>>>> bandwidth
>>>>>>>>> >
>>>>>>>>> > I'm wondering whether there is a way to replace the old block by
>>>>>>>>> the new block directly.
>>>>>>>>> > Any thoughts?
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Best Wishes!
>>>>>>>>> >
>>>>>>>>> > Yours, Zesheng
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Wishes!
>>>>>>>>
>>>>>>>> Yours, Zesheng
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes!
>>>>>
>>>>> Yours, Zesheng
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>>
>
>
> --
> Best Wishes!
>
> Yours, Zesheng
>



-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
Thanks Bertrand, my reply comments are inline below.

So you know that a block is corrupted thanks to an external process which,
in this case, is checking the parity blocks. If a block is corrupted but
hasn't been detected by HDFS, you could delete the block from the local
filesystem (it's only a file), and HDFS will then re-replicate it from the
good remaining replica of this block.
*[Zesheng: We will implement a periodic checking mechanism to detect
corrupted blocks.]*

For performance reasons (and that's what you want to do?), you might be able
to fix the corruption without needing to retrieve the good replica. It
might be possible by working directly with the local filesystem and replacing
the corrupted block with the corrected block (which, again, are just files).
One issue is that the corrected block might differ from the good replica.
If HDFS is able to tell (with the CRC) it might be fine, else you will end up
with two different good replicas for the same block and that will not be
pretty...
*[Zesheng: Indeed, we will use Reed-Solomon erasure codes to implement the
HDFS RAID, so a corrupted block will be recovered from the remaining good
data and coding blocks.]*
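
To make that recovery step concrete, the sketch below rebuilds a missing
block from the surviving blocks of its stripe. For brevity it uses a single
XOR parity block, the degenerate case that Reed-Solomon generalizes to
several parity blocks; the stream handling and stripe layout are assumptions,
not our real raid code:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.Arrays;
    import java.util.List;

    public class StripeRepair {
      // survivors: the remaining data + parity blocks of the stripe
      // out: destination for the reconstructed block
      public static void rebuildMissingBlock(List<InputStream> survivors,
                                             OutputStream out,
                                             long blockSize) throws IOException {
        byte[] acc = new byte[64 * 1024];
        byte[] buf = new byte[64 * 1024];
        long remaining = blockSize;
        while (remaining > 0) {
          int len = (int) Math.min(acc.length, remaining);
          Arrays.fill(acc, 0, len, (byte) 0);
          for (InputStream in : survivors) {
            int off = 0;
            int n;
            // Read up to len bytes from this survivor; a short block simply
            // contributes zeros for its missing tail.
            while (off < len && (n = in.read(buf, off, len - off)) > 0) {
              off += n;
            }
            for (int i = 0; i < off; i++) {
              acc[i] ^= buf[i];
            }
          }
          out.write(acc, 0, len);
          remaining -= len;
        }
      }
    }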

If the result is to be open source, you might want to check with Facebook
about their implementation and track the process within Apache JIRA. You
could gain additional feedback. One downside of HDFS RAID is that the fewer
replicas there are, the less efficient/fast reads of the data for processing
will be. Reducing the number of replicas also diminishes the number of
supported node failures. I wouldn't say it's an easy ride.
*[Zesheng: Yes, I agree with you that read performance will degrade, but not
about the number of supported node failures; the Reed-Solomon algorithm can
maintain equal or even higher tolerance of node failures. About open source,
we will consider this in the future.]*
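
As a rough comparison, assuming the 10+4 Reed-Solomon layout commonly used
for HDFS-RAID: a stripe of 10 data blocks plus 4 parity blocks stores 1.4x
the raw data and survives the loss of any 4 blocks in the stripe, while
3-way replication stores 3x the raw data but tolerates losing only 2 copies
of a given block.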


2014-07-21 20:01 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:

> So you know that a block is corrupted thanks to an external process which
> in this case is checking the parity blocks. If a block is corrupted but
> hasn't been detected by HDFS, you could delete the block from the local
> filesystem (it's only a file) then HDFS will replicate the good remaining
> replica of this block.
>
> For performance reason (and that's what you want to do?), you might be
> able to fix the corruption without needing to retrieve the good replica. It
> might be possible by working directly with the local system by replacing
> the corrupted block by the corrected block (which again are files). On
> issue is that the corrected block might be different than the good replica.
> If HDFS is able to tell (with CRC) it might be good else you will end up
> with two different good replicas for the same block and that will not be
> pretty...
>
> If the result is to be open source, you might want to check with Facebook
> about their implementation and track the process within Apache JIRA. You
> could gain additional feedbacks. One downside of HDFS RAID is that the less
> replicas there is, the less read of the data for processing will be
> 'efficient/fast'. Reducing the number of replicas also diminishes the
> number of supported node failures. I wouldn't say it's an easy ride.
>
> Bertrand Dechoux
>
>
> On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com> wrote:
>
>> We want to implement a RAID on top of HDFS, something like facebook
>> implemented as described in:
>> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>>
>>
>> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>>
>> You want to implement a RAID on top of HDFS or use HDFS on top of RAID? I
>>> am not sure I understand any of these use cases. HDFS handles for you
>>> replication and error detection. Fine tuning the cluster wouldn't be the
>>> easier solution?
>>>
>>> Bertrand Dechoux
>>>
>>>
>>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for reply, Arpit.
>>>> Yes, we need to do this regularly. The original requirement of this is
>>>> that we want to do RAID(which is based reed-solomon erasure codes) on our
>>>> HDFS cluster. When a block is corrupted or missing, the downgrade read
>>>> needs quick recovery of the block. We are considering how to recovery the
>>>> corrupted/missing block quickly.
>>>>
>>>>
>>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>>
>>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why not
>>>>> just take the perf hit and recreate the file?
>>>>>
>>>>> If you need to do this regularly you should consider a mutable file
>>>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>>>> all sorts of consistency issues.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> That will break the consistency of the file system, but it doesn't
>>>>>> hurt to try.
>>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com> wrote:
>>>>>>
>>>>>>> How about write a new block with new checksum file, and replace the
>>>>>>> old block file and checksum file both?
>>>>>>>
>>>>>>>
>>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>>>> features. You'll need to write a new file with the changes.
>>>>>>>>
>>>>>>>> Notice that even if you manage to find the physical block replica
>>>>>>>> files on the disk, corresponding to the part of the file you want to
>>>>>>>> change, you can't simply update it manually, as this would give a different
>>>>>>>> checksum, making HDFS mark such blocks as corrupt.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Wellington.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> > Hi guys,
>>>>>>>> >
>>>>>>>> > I recently encounter a scenario which needs to replace an exist
>>>>>>>> block with a newly written block
>>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>>> > Suppose the original file is A, and we write a new file B which
>>>>>>>> is composed by the new data blocks, then we merge A and B to C which is the
>>>>>>>> file we wanted
>>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>>> bandwidth
>>>>>>>> >
>>>>>>>> > I'm wondering whether there is a way to replace the old block by
>>>>>>>> the new block directly.
>>>>>>>> > Any thoughts?
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Best Wishes!
>>>>>>>> >
>>>>>>>> > Yours, Zesheng
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Wishes!
>>>>>>>
>>>>>>> Yours, Zesheng
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Wishes!
>>>>
>>>> Yours, Zesheng
>>>>
>>>
>>>
>>
>>
>> --
>> Best Wishes!
>>
>> Yours, Zesheng
>>
>
>


-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Bertrand Dechoux <de...@gmail.com>.
So you know that a block is corrupted thanks to an external process which,
in this case, is checking the parity blocks. If a block is corrupted but
hasn't been detected by HDFS, you could delete the block from the local
filesystem (it's only a file), and HDFS will then re-replicate it from the
good remaining replica of this block.

For performance reasons (and that's what you want to do?), you might be able
to fix the corruption without needing to retrieve the good replica. It
might be possible by working directly with the local filesystem and replacing
the corrupted block with the corrected block (which, again, are just files).
One issue is that the corrected block might differ from the good replica.
If HDFS is able to tell (with the CRC) it might be fine, else you will end up
with two different good replicas for the same block and that will not be
pretty...
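
If someone did go down that road, the replacement block would also need a
matching checksum (.meta) file. As a very rough sketch, assuming the usual
defaults (CRC32 over 512-byte chunks) and omitting the version/header bytes
of the real .meta format, which differ across HDFS versions, the per-chunk
checksums could be regenerated like this:

    import java.io.BufferedInputStream;
    import java.io.DataOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.CRC32;

    public class BlockChecksums {
      // Writes one CRC32 per 512-byte chunk of the corrected block file.
      public static void writeChunkChecksums(File block, DataOutputStream out)
          throws IOException {
        byte[] chunk = new byte[512];
        try (InputStream in = new BufferedInputStream(new FileInputStream(block))) {
          while (true) {
            int off = 0;
            int n;
            while (off < chunk.length
                && (n = in.read(chunk, off, chunk.length - off)) > 0) {
              off += n;
            }
            if (off == 0) {
              break;                       // end of the block file
            }
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, off);
            out.writeInt((int) crc.getValue());
            if (off < chunk.length) {
              break;                       // last, partial chunk
            }
          }
        }
      }
    }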

If the result is to be open source, you might want to check with Facebook
about their implementation and track the process within Apache JIRA. You
could gain additional feedback. One downside of HDFS RAID is that the fewer
replicas there are, the less efficient/fast reads of the data for processing
will be. Reducing the number of replicas also diminishes the number of
supported node failures. I wouldn't say it's an easy ride.

Bertrand Dechoux


On Mon, Jul 21, 2014 at 1:29 PM, Zesheng Wu <wu...@gmail.com> wrote:

> We want to implement a RAID on top of HDFS, something like facebook
> implemented as described in:
> https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
>
>
> 2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:
>
> You want to implement a RAID on top of HDFS or use HDFS on top of RAID? I
>> am not sure I understand any of these use cases. HDFS handles for you
>> replication and error detection. Fine tuning the cluster wouldn't be the
>> easier solution?
>>
>> Bertrand Dechoux
>>
>>
>> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com>
>> wrote:
>>
>>> Thanks for reply, Arpit.
>>> Yes, we need to do this regularly. The original requirement of this is
>>> that we want to do RAID(which is based reed-solomon erasure codes) on our
>>> HDFS cluster. When a block is corrupted or missing, the downgrade read
>>> needs quick recovery of the block. We are considering how to recovery the
>>> corrupted/missing block quickly.
>>>
>>>
>>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>>
>>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why not
>>>> just take the perf hit and recreate the file?
>>>>
>>>> If you need to do this regularly you should consider a mutable file
>>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>>> all sorts of consistency issues.
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com> wrote:
>>>>
>>>>> That will break the consistency of the file system, but it doesn't
>>>>> hurt to try.
>>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com> wrote:
>>>>>
>>>>>> How about write a new block with new checksum file, and replace the
>>>>>> old block file and checksum file both?
>>>>>>
>>>>>>
>>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>>> wellington.chevreuil@gmail.com>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>>> features. You'll need to write a new file with the changes.
>>>>>>>
>>>>>>> Notice that even if you manage to find the physical block replica
>>>>>>> files on the disk, corresponding to the part of the file you want to
>>>>>>> change, you can't simply update it manually, as this would give a different
>>>>>>> checksum, making HDFS mark such blocks as corrupt.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Wellington.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>>>>>>>
>>>>>>> > Hi guys,
>>>>>>> >
>>>>>>> > I recently encounter a scenario which needs to replace an exist
>>>>>>> block with a newly written block
>>>>>>> > The most straightforward way to finish may be like this:
>>>>>>> > Suppose the original file is A, and we write a new file B which is
>>>>>>> composed by the new data blocks, then we merge A and B to C which is the
>>>>>>> file we wanted
>>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>>> bandwidth
>>>>>>> >
>>>>>>> > I'm wondering whether there is a way to replace the old block by
>>>>>>> the new block directly.
>>>>>>> > Any thoughts?
>>>>>>> >
>>>>>>> > --
>>>>>>> > Best Wishes!
>>>>>>> >
>>>>>>> > Yours, Zesheng
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Wishes!
>>>>>>
>>>>>> Yours, Zesheng
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>>
>
>
> --
> Best Wishes!
>
> Yours, Zesheng
>

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
We want to implement RAID on top of HDFS, something like what Facebook
implemented, as described in:
https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
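
As a rough illustration of the trade-off described there: with an RS(10, 4)
layout, 10 data blocks are stored alongside 4 parity blocks, so the
raw-to-logical storage ratio is 14/10 = 1.4x instead of the 3x of default
triple replication, while any 4 of the 14 blocks can be lost and
reconstructed from the rest. The exact (data, parity) parameters are of
course a design choice.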


2014-07-21 17:19 GMT+08:00 Bertrand Dechoux <de...@gmail.com>:

> You want to implement a RAID on top of HDFS or use HDFS on top of RAID? I
> am not sure I understand any of these use cases. HDFS handles for you
> replication and error detection. Fine tuning the cluster wouldn't be the
> easier solution?
>
> Bertrand Dechoux
>
>
> On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com> wrote:
>
>> Thanks for reply, Arpit.
>> Yes, we need to do this regularly. The original requirement of this is
>> that we want to do RAID(which is based reed-solomon erasure codes) on our
>> HDFS cluster. When a block is corrupted or missing, the downgrade read
>> needs quick recovery of the block. We are considering how to recovery the
>> corrupted/missing block quickly.
>>
>>
>> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>>
>>> IMHO this is a spectacularly bad idea. Is it a one off event? Why not
>>> just take the perf hit and recreate the file?
>>>
>>> If you need to do this regularly you should consider a mutable file
>>> store like HBase. If you start modifying blocks from under HDFS you open up
>>> all sorts of consistency issues.
>>>
>>>
>>>
>>>
>>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com> wrote:
>>>
>>>> That will break the consistency of the file system, but it doesn't hurt
>>>> to try.
>>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com> wrote:
>>>>
>>>>> How about write a new block with new checksum file, and replace the
>>>>> old block file and checksum file both?
>>>>>
>>>>>
>>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>>> wellington.chevreuil@gmail.com>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>>> features. You'll need to write a new file with the changes.
>>>>>>
>>>>>> Notice that even if you manage to find the physical block replica
>>>>>> files on the disk, corresponding to the part of the file you want to
>>>>>> change, you can't simply update it manually, as this would give a different
>>>>>> checksum, making HDFS mark such blocks as corrupt.
>>>>>>
>>>>>> Regards,
>>>>>> Wellington.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>>>>>>
>>>>>> > Hi guys,
>>>>>> >
>>>>>> > I recently encounter a scenario which needs to replace an exist
>>>>>> block with a newly written block
>>>>>> > The most straightforward way to finish may be like this:
>>>>>> > Suppose the original file is A, and we write a new file B which is
>>>>>> composed by the new data blocks, then we merge A and B to C which is the
>>>>>> file we wanted
>>>>>> > The obvious shortcoming of this method is wasting of network
>>>>>> bandwidth
>>>>>> >
>>>>>> > I'm wondering whether there is a way to replace the old block by
>>>>>> the new block directly.
>>>>>> > Any thoughts?
>>>>>> >
>>>>>> > --
>>>>>> > Best Wishes!
>>>>>> >
>>>>>> > Yours, Zesheng
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes!
>>>>>
>>>>> Yours, Zesheng
>>>>>
>>>>
>>>
>>
>>
>>
>>
>> --
>> Best Wishes!
>>
>> Yours, Zesheng
>>
>
>


-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Bertrand Dechoux <de...@gmail.com>.
Do you want to implement RAID on top of HDFS, or use HDFS on top of RAID? I
am not sure I understand either of these use cases. HDFS already handles
replication and error detection for you. Wouldn't fine-tuning the cluster
be the easier solution?

Bertrand Dechoux


On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu <wu...@gmail.com> wrote:

> Thanks for reply, Arpit.
> Yes, we need to do this regularly. The original requirement of this is
> that we want to do RAID(which is based reed-solomon erasure codes) on our
> HDFS cluster. When a block is corrupted or missing, the downgrade read
> needs quick recovery of the block. We are considering how to recovery the
> corrupted/missing block quickly.
>
>
> 2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:
>
>> IMHO this is a spectacularly bad idea. Is it a one off event? Why not
>> just take the perf hit and recreate the file?
>>
>> If you need to do this regularly you should consider a mutable file store
>> like HBase. If you start modifying blocks from under HDFS you open up all
>> sorts of consistency issues.
>>
>>
>>
>>
>> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com> wrote:
>>
>>> That will break the consistency of the file system, but it doesn't hurt
>>> to try.
>>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com> wrote:
>>>
>>>> How about write a new block with new checksum file, and replace the old
>>>> block file and checksum file both?
>>>>
>>>>
>>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>>> wellington.chevreuil@gmail.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> there's no way to do that, as HDFS does not provide file updates
>>>>> features. You'll need to write a new file with the changes.
>>>>>
>>>>> Notice that even if you manage to find the physical block replica
>>>>> files on the disk, corresponding to the part of the file you want to
>>>>> change, you can't simply update it manually, as this would give a different
>>>>> checksum, making HDFS mark such blocks as corrupt.
>>>>>
>>>>> Regards,
>>>>> Wellington.
>>>>>
>>>>>
>>>>>
>>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>>>>>
>>>>> > Hi guys,
>>>>> >
>>>>> > I recently encounter a scenario which needs to replace an exist
>>>>> block with a newly written block
>>>>> > The most straightforward way to finish may be like this:
>>>>> > Suppose the original file is A, and we write a new file B which is
>>>>> composed by the new data blocks, then we merge A and B to C which is the
>>>>> file we wanted
>>>>> > The obvious shortcoming of this method is wasting of network
>>>>> bandwidth
>>>>> >
>>>>> > I'm wondering whether there is a way to replace the old block by the
>>>>> new block directly.
>>>>> > Any thoughts?
>>>>> >
>>>>> > --
>>>>> > Best Wishes!
>>>>> >
>>>>> > Yours, Zesheng
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Wishes!
>>>>
>>>> Yours, Zesheng
>>>>
>>>
>>
>
>
>
>
> --
> Best Wishes!
>
> Yours, Zesheng
>

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
Thanks for the reply, Arpit.
Yes, we need to do this regularly. The original requirement is that we want
to do RAID (based on Reed-Solomon erasure codes) on our HDFS cluster. When
a block is corrupted or missing, a degraded read needs quick recovery of
that block. We are considering how to recover the corrupted/missing block
quickly.
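
To make the degraded-read step concrete, here is a deliberately simplified
sketch using a single XOR parity block: with parity = d0 ^ d1 ^ ... ^ dk-1,
any one missing data block can be rebuilt from the survivors plus the
parity. Reed-Solomon generalises this to several parity blocks (and several
recoverable losses), but the "decode from whatever is left" step during a
degraded read has the same shape. The class below is illustrative only and
is not part of any HDFS API:

// Simplified single-parity (XOR) recovery, illustrating how a degraded read
// can rebuild a missing block from the surviving blocks plus a parity block.
public class XorRecoverySketch {

  // parity[i] = d0[i] ^ d1[i] ^ ... ^ dk-1[i]; all blocks have equal length.
  static byte[] parity(byte[][] data) {
    byte[] p = new byte[data[0].length];
    for (byte[] d : data) {
      for (int i = 0; i < p.length; i++) {
        p[i] ^= d[i];
      }
    }
    return p;
  }

  // Rebuild the single missing block by XOR-ing the parity with all survivors.
  static byte[] recover(byte[][] survivors, byte[] parity) {
    byte[] missing = parity.clone();
    for (byte[] d : survivors) {
      for (int i = 0; i < missing.length; i++) {
        missing[i] ^= d[i];
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    byte[][] data = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    byte[] p = parity(data);
    // Pretend data[1] was lost; rebuild it from the other blocks plus parity.
    byte[] rebuilt = recover(new byte[][] { data[0], data[2] }, p);
    System.out.println(java.util.Arrays.toString(rebuilt));  // prints [4, 5, 6]
  }
}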


2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:

> IMHO this is a spectacularly bad idea. Is it a one off event? Why not just
> take the perf hit and recreate the file?
>
> If you need to do this regularly you should consider a mutable file store
> like HBase. If you start modifying blocks from under HDFS you open up all
> sorts of consistency issues.
>
>
>
>
> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com> wrote:
>
>> That will break the consistency of the file system, but it doesn't hurt
>> to try.
>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com> wrote:
>>
>>> How about write a new block with new checksum file, and replace the old
>>> block file and checksum file both?
>>>
>>>
>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>> wellington.chevreuil@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> there's no way to do that, as HDFS does not provide file updates
>>>> features. You'll need to write a new file with the changes.
>>>>
>>>> Notice that even if you manage to find the physical block replica files
>>>> on the disk, corresponding to the part of the file you want to change, you
>>>> can't simply update it manually, as this would give a different checksum,
>>>> making HDFS mark such blocks as corrupt.
>>>>
>>>> Regards,
>>>> Wellington.
>>>>
>>>>
>>>>
>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>>>>
>>>> > Hi guys,
>>>> >
>>>> > I recently encounter a scenario which needs to replace an exist block
>>>> with a newly written block
>>>> > The most straightforward way to finish may be like this:
>>>> > Suppose the original file is A, and we write a new file B which is
>>>> composed by the new data blocks, then we merge A and B to C which is the
>>>> file we wanted
>>>> > The obvious shortcoming of this method is wasting of network bandwidth
>>>> >
>>>> > I'm wondering whether there is a way to replace the old block by the
>>>> new block directly.
>>>> > Any thoughts?
>>>> >
>>>> > --
>>>> > Best Wishes!
>>>> >
>>>> > Yours, Zesheng
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>




-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
Thanks for reply, Arpit.
Yes, we need to do this regularly. The original requirement of this is that
we want to do RAID(which is based reed-solomon erasure codes) on our HDFS
cluster. When a block is corrupted or missing, the downgrade read needs
quick recovery of the block. We are considering how to recovery the
corrupted/missing block quickly.


2014-07-19 5:18 GMT+08:00 Arpit Agarwal <aa...@hortonworks.com>:

> IMHO this is a spectacularly bad idea. Is it a one off event? Why not just
> take the perf hit and recreate the file?
>
> If you need to do this regularly you should consider a mutable file store
> like HBase. If you start modifying blocks from under HDFS you open up all
> sorts of consistency issues.
>
>
>
>
> On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com> wrote:
>
>> That will break the consistency of the file system, but it doesn't hurt
>> to try.
>>  On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com> wrote:
>>
>>> How about write a new block with new checksum file, and replace the old
>>> block file and checksum file both?
>>>
>>>
>>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>>> wellington.chevreuil@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> there's no way to do that, as HDFS does not provide file updates
>>>> features. You'll need to write a new file with the changes.
>>>>
>>>> Notice that even if you manage to find the physical block replica files
>>>> on the disk, corresponding to the part of the file you want to change, you
>>>> can't simply update it manually, as this would give a different checksum,
>>>> making HDFS mark such blocks as corrupt.
>>>>
>>>> Regards,
>>>> Wellington.
>>>>
>>>>
>>>>
>>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>>>>
>>>> > Hi guys,
>>>> >
>>>> > I recently encounter a scenario which needs to replace an exist block
>>>> with a newly written block
>>>> > The most straightforward way to finish may be like this:
>>>> > Suppose the original file is A, and we write a new file B which is
>>>> composed by the new data blocks, then we merge A and B to C which is the
>>>> file we wanted
>>>> > The obvious shortcoming of this method is wasting of network bandwidth
>>>> >
>>>> > I'm wondering whether there is a way to replace the old block by the
>>>> new block directly.
>>>> > Any thoughts?
>>>> >
>>>> > --
>>>> > Best Wishes!
>>>> >
>>>> > Yours, Zesheng
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes!
>>>
>>> Yours, Zesheng
>>>
>>
>




-- 
Best Wishes!

Yours, Zesheng

Re: Replace a block with a new one

Posted by Arpit Agarwal <aa...@hortonworks.com>.
IMHO this is a spectacularly bad idea. Is it a one-off event? Why not just
take the perf hit and recreate the file?

If you need to do this regularly, you should consider a mutable file store
like HBase. If you start modifying blocks from under HDFS, you open up all
sorts of consistency issues.
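
For the one-off case, recreating the file with the public FileSystem API is
straightforward; a rough sketch, assuming the replaced region is block-aligned
and the new bytes were already written to a separate HDFS file (the paths and
block index below are made up):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    /**
     * Rewrite file A as C, substituting one block-sized region with the
     * contents of a separately written file B ("merge A and B to C"). This
     * pays the full rewrite cost but never touches replicas behind HDFS's back.
     */
    public class RecreateFileWithNewBlock {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path a = new Path("/data/A");         // original file
            Path b = new Path("/data/B");         // file holding the replacement bytes
            Path c = new Path("/data/C.tmp");     // rewritten output
            long blockSize = fs.getFileStatus(a).getBlockSize();
            long replacedBlockIndex = 3;          // which block of A to replace
            long start = replacedBlockIndex * blockSize;
            long aLen = fs.getFileStatus(a).getLen();

            try (FSDataInputStream inA = fs.open(a);
                 FSDataInputStream inB = fs.open(b);
                 FSDataOutputStream out = fs.create(c, true)) {
                copyRange(inA, out, 0, start);                 // A up to the replaced block
                copyRange(inB, out, 0, blockSize);             // replacement bytes from B
                inA.seek(Math.min(start + blockSize, aLen));
                IOUtils.copyBytes(inA, out, conf, false);      // rest of A
            }
            fs.rename(c, new Path("/data/C"));    // publish the rewritten file
        }

        static void copyRange(FSDataInputStream in, FSDataOutputStream out,
                              long offset, long length) throws IOException {
            in.seek(offset);
            byte[] buf = new byte[64 * 1024];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) break;
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
    }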




On Fri, Jul 18, 2014 at 2:10 PM, Shumin Guo <gs...@gmail.com> wrote:

> That will break the consistency of the file system, but it doesn't hurt to
> try.
> On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com> wrote:
>
>> How about write a new block with new checksum file, and replace the old
>> block file and checksum file both?
>>
>>
>> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
>> wellington.chevreuil@gmail.com>:
>>
>>> Hi,
>>>
>>> there's no way to do that, as HDFS does not provide file updates
>>> features. You'll need to write a new file with the changes.
>>>
>>> Notice that even if you manage to find the physical block replica files
>>> on the disk, corresponding to the part of the file you want to change, you
>>> can't simply update it manually, as this would give a different checksum,
>>> making HDFS mark such blocks as corrupt.
>>>
>>> Regards,
>>> Wellington.
>>>
>>>
>>>
>>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>>>
>>> > Hi guys,
>>> >
>>> > I recently encounter a scenario which needs to replace an exist block
>>> with a newly written block
>>> > The most straightforward way to finish may be like this:
>>> > Suppose the original file is A, and we write a new file B which is
>>> composed by the new data blocks, then we merge A and B to C which is the
>>> file we wanted
>>> > The obvious shortcoming of this method is wasting of network bandwidth
>>> >
>>> > I'm wondering whether there is a way to replace the old block by the
>>> new block directly.
>>> > Any thoughts?
>>> >
>>> > --
>>> > Best Wishes!
>>> >
>>> > Yours, Zesheng
>>>
>>>
>>
>>
>> --
>> Best Wishes!
>>
>> Yours, Zesheng
>>
>


Re: Replace a block with a new one

Posted by Shumin Guo <gs...@gmail.com>.
That will break the consistency of the file system, but it doesn't hurt to
try.
On Jul 17, 2014 8:48 PM, "Zesheng Wu" <wu...@gmail.com> wrote:

> How about write a new block with new checksum file, and replace the old
> block file and checksum file both?
>
>
> 2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
> wellington.chevreuil@gmail.com>:
>
>> Hi,
>>
>> there's no way to do that, as HDFS does not provide file updates
>> features. You'll need to write a new file with the changes.
>>
>> Notice that even if you manage to find the physical block replica files
>> on the disk, corresponding to the part of the file you want to change, you
>> can't simply update it manually, as this would give a different checksum,
>> making HDFS mark such blocks as corrupt.
>>
>> Regards,
>> Wellington.
>>
>>
>>
>> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>>
>> > Hi guys,
>> >
>> > I recently encounter a scenario which needs to replace an exist block
>> with a newly written block
>> > The most straightforward way to finish may be like this:
>> > Suppose the original file is A, and we write a new file B which is
>> composed by the new data blocks, then we merge A and B to C which is the
>> file we wanted
>> > The obvious shortcoming of this method is wasting of network bandwidth
>> >
>> > I'm wondering whether there is a way to replace the old block by the
>> new block directly.
>> > Any thoughts?
>> >
>> > --
>> > Best Wishes!
>> >
>> > Yours, Zesheng
>>
>>
>
>
> --
> Best Wishes!
>
> Yours, Zesheng
>

Re: Replace a block with a new one

Posted by Zesheng Wu <wu...@gmail.com>.
How about writing a new block with a new checksum file, and replacing both
the old block file and the old checksum file?
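
For what it's worth, a rough sketch of what that swap would look like on the
datanode's local disk, assuming the usual blk_<id> / blk_<id>_<genstamp>.meta
layout under dfs.datanode.data.dir (the block id and paths are made up, and
this is exactly the kind of unsupported surgery Wellington warns about below):

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.stream.Stream;

    /**
     * Locate a replica's block file (blk_<id>) and its checksum file
     * (blk_<id>_<genstamp>.meta) under a datanode data directory and overwrite
     * both. This bypasses HDFS entirely: lengths, genstamps and the NameNode's
     * metadata must still agree, and nothing here is a supported operation.
     */
    public class SwapBlockReplica {
        public static void main(String[] args) throws IOException {
            long blockId = 1073741825L;                                  // made-up block id
            Path dataDir = Paths.get("/data/1/dfs/dn");                  // dfs.datanode.data.dir (made up)
            Path newBlock = Paths.get("/tmp/blk_1073741825.new");        // freshly written block bytes
            Path newMeta  = Paths.get("/tmp/blk_1073741825.new.meta");   // checksum file matching the new bytes

            try (Stream<Path> files = Files.walk(dataDir)) {
                files.filter(p -> {
                         String n = p.getFileName().toString();
                         return n.equals("blk_" + blockId)
                             || (n.startsWith("blk_" + blockId + "_") && n.endsWith(".meta"));
                     })
                     .forEach(p -> {
                         try {
                             String n = p.getFileName().toString();
                             Path src = n.endsWith(".meta") ? newMeta : newBlock;
                             // Overwrite the existing replica / meta file in place.
                             Files.copy(src, p, StandardCopyOption.REPLACE_EXISTING);
                         } catch (IOException e) {
                             throw new RuntimeException(e);
                         }
                     });
            }
        }
    }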


2014-07-17 19:34 GMT+08:00 Wellington Chevreuil <
wellington.chevreuil@gmail.com>:

> Hi,
>
> there's no way to do that, as HDFS does not provide file updates features.
> You'll need to write a new file with the changes.
>
> Notice that even if you manage to find the physical block replica files on
> the disk, corresponding to the part of the file you want to change, you
> can't simply update it manually, as this would give a different checksum,
> making HDFS mark such blocks as corrupt.
>
> Regards,
> Wellington.
>
>
>
> On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>
> > Hi guys,
> >
> > I recently encounter a scenario which needs to replace an exist block
> with a newly written block
> > The most straightforward way to finish may be like this:
> > Suppose the original file is A, and we write a new file B which is
> composed by the new data blocks, then we merge A and B to C which is the
> file we wanted
> > The obvious shortcoming of this method is wasting of network bandwidth
> >
> > I'm wondering whether there is a way to replace the old block by the new
> block directly.
> > Any thoughts?
> >
> > --
> > Best Wishes!
> >
> > Yours, Zesheng
>
>


-- 
Best Wishes!

Yours, Zesheng

unsubscribe

Posted by "jason_jice@yahoo.com" <ja...@yahoo.com>.

-------- Original Message --------
From: Wellington Chevreuil <we...@gmail.com>
Sent: Thursday, July 17, 2014 04:34 AM
To: user@hadoop.apache.org
Subject: Re: Replace a block with a new one

>Hi,
>
>there's no way to do that, as HDFS does not provide file updates features. You'll need to write a new file with the changes. 
>
>Notice that even if you manage to find the physical block replica files on the disk, corresponding to the part of the file you want to change, you can't simply update it manually, as this would give a different checksum, making HDFS mark such blocks as corrupt.
>
>Regards,
>Wellington.
>
>
>
>On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:
>
>> Hi guys,
>> 
>> I recently encounter a scenario which needs to replace an exist block with a newly written block
>> The most straightforward way to finish may be like this:
>> Suppose the original file is A, and we write a new file B which is composed by the new data blocks, then we merge A and B to C which is the file we wanted
>> The obvious shortcoming of this method is wasting of network bandwidth 
>> 
>> I'm wondering whether there is a way to replace the old block by the new block directly.
>> Any thoughts?
>> 
>> -- 
>> Best Wishes!
>> 
>> Yours, Zesheng
>

Re: Replace a block with a new one

Posted by Wellington Chevreuil <we...@gmail.com>.
Hi,

there's no way to do that, as HDFS does not provide file update features. You'll need to write a new file with the changes.

Notice that even if you manage to find the physical block replica files on disk that correspond to the part of the file you want to change, you can't simply update them manually: that would produce a different checksum, and HDFS would mark those blocks as corrupt.
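
To make that concrete: HDFS keeps one checksum per chunk of each block (512
bytes by default, dfs.bytes-per-checksum) in the blk_*.meta file next to the
replica. A minimal sketch of that verification, using plain CRC32 here (recent
HDFS versions default to CRC32C, and the real .meta file also carries a small
header); the local block path is made up:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.zip.CRC32;

    /**
     * Per-chunk checksums, roughly as HDFS stores them in the blk_*.meta file:
     * one CRC over every 512-byte chunk. Change any byte of the block without
     * regenerating these and verification fails, so the replica is reported
     * as corrupt.
     */
    public class ChunkChecksums {
        static final int BYTES_PER_CHECKSUM = 512;

        static long[] checksums(byte[] blockData) {
            int chunks = (blockData.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
            long[] crcs = new long[chunks];
            CRC32 crc = new CRC32();
            for (int i = 0; i < chunks; i++) {
                int off = i * BYTES_PER_CHECKSUM;
                int len = Math.min(BYTES_PER_CHECKSUM, blockData.length - off);
                crc.reset();
                crc.update(blockData, off, len);
                crcs[i] = crc.getValue();
            }
            return crcs;
        }

        public static void main(String[] args) throws IOException {
            byte[] block = Files.readAllBytes(Paths.get("/tmp/blk_1073741825")); // made-up local copy
            long[] expected = checksums(block);

            block[42] ^= 0x01;                 // flip one bit, as a manual edit would
            long[] actual = checksums(block);
            System.out.println("chunk 0 still matches: " + (expected[0] == actual[0])); // prints false
        }
    }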

Regards,
Wellington.



On 17 Jul 2014, at 10:50, Zesheng Wu <wu...@gmail.com> wrote:

> Hi guys,
> 
> I recently encounter a scenario which needs to replace an exist block with a newly written block
> The most straightforward way to finish may be like this:
> Suppose the original file is A, and we write a new file B which is composed by the new data blocks, then we merge A and B to C which is the file we wanted
> The obvious shortcoming of this method is wasting of network bandwidth 
> 
> I'm wondering whether there is a way to replace the old block by the new block directly.
> Any thoughts?
> 
> -- 
> Best Wishes!
> 
> Yours, Zesheng

