Posted to hdfs-user@hadoop.apache.org by Pankaj Misra <pa...@impetus.co.in> on 2012/12/13 10:50:50 UTC

Erasure Coding in HDFS

Dear All,

I was looking at options for reducing the overall cost of storage that is incurred due to replication of data across the datanodes for higher availability and data localization for processing.

I stumbled on a few articles suggesting erasure coding (software RAID) as one such mechanism, which can provide five to eight nines of availability while keeping the replication factor low.
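
As a rough illustration of the storage trade-off described above, the following sketch compares plain 3x replication with an erasure-coded layout. The Reed-Solomon (10, 4) layout, the Scheme class, and its field names are assumptions chosen for illustration only; they are not taken from any Hadoop API or release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Hypothetical description of a redundancy scheme (illustration only)."""
    name: str
    data_blocks: int   # blocks that hold the original data
    extra_blocks: int  # additional replicas or parity blocks

    @property
    def storage_factor(self) -> float:
        # Raw bytes stored per byte of user data.
        return (self.data_blocks + self.extra_blocks) / self.data_blocks

    @property
    def tolerated_losses(self) -> int:
        # Number of blocks per group that can be lost without losing data,
        # assuming an MDS code such as Reed-Solomon (or full replicas).
        return self.extra_blocks


# 3x replication: one data block plus two extra copies.
replication_3x = Scheme("3x replication", data_blocks=1, extra_blocks=2)
# Assumed layout: 10 data blocks plus 4 parity blocks per stripe.
rs_10_4 = Scheme("RS(10,4), assumed layout", data_blocks=10, extra_blocks=4)

for scheme in (replication_3x, rs_10_4):
    print(f"{scheme.name}: {scheme.storage_factor:.1f}x raw storage, "
          f"survives {scheme.tolerated_losses} lost blocks")
```

Under these assumptions the erasure-coded layout stores less than half the raw bytes of 3x replication while tolerating more lost blocks per group, at the cost of having to reconstruct data from several nodes when a block is missing.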

I also came across a JIRA for erasure coding in HDFS
https://issues.apache.org/jira/browse/HDFS-503

I will need some help to understand the following
1. How can I use erasure coding with Hadoop 1.1.1 release?
2. How will erasure coding work with replication mechanism and how will it affect the data locality aspect for data processing, since erasure coding fragments the data?
3. How mature is the current implementation of erasure coding in HDFS?

Any help will be greatly appreciated.

Thanks and Regards
Pankaj Misra


________________________________

NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.

RE: Erasure Coding in HDFS

Posted by Pankaj Misra <pa...@impetus.co.in>.

Dear All,

Any ideas on the questions below would be extremely helpful. I would appreciate your help and support. Thanks.

Regards
Pankaj Misra

From: Pankaj Misra
Sent: Friday, December 14, 2012 5:42 PM
To: user@hadoop.apache.org
Subject: RE: Erasure Coding in HDFS

Requesting the community's help with the questions below, as it will help us better understand erasure coding at the HDFS level in the context of replication. Thanks.

Thanks and Regards
Pankaj Misra
________________________________
From: Pankaj Misra
Sent: Thursday, December 13, 2012 3:20 PM
To: user@hadoop.apache.org
Subject: Erasure Coding in HDFS

Dear All,

I was looking at options for reducing the overall cost of storage that is incurred due to replication of data across the datanodes for higher availability and data localization for processing.

I stumbled on a few articles suggesting erasure coding (software RAID) as one such mechanism, which can provide five to eight nines of availability while keeping the replication factor low.

I also came across a JIRA for erasure coding in HDFS
https://issues.apache.org/jira/browse/HDFS-503

I will need some help to understand the following
1. How can I use erasure coding with Hadoop 1.1.1 release?
2. How will erasure coding work with replication mechanism and how will it affect the data locality aspect for data processing, since erasure coding fragments the data?
3. How mature is the current implementation of erasure coding in HDFS?

Any help will be greatly appreciated.

Thanks and Regards
Pankaj Misra

RE: Erasure Coding in HDFS

Posted by Pankaj Misra <pa...@impetus.co.in>.

Requesting the community's help with the questions below, as it will help us better understand erasure coding at the HDFS level in the context of replication. Thanks.

Thanks and Regards
Pankaj Misra
________________________________
From: Pankaj Misra
Sent: Thursday, December 13, 2012 3:20 PM
To: user@hadoop.apache.org
Subject: Erasure Coding in HDFS

Dear All,

I was looking at options for reducing the overall cost of storage that is incurred due to replication of data across the datanodes for higher availability and data localization for processing.

I stumbled on a few articles suggesting erasure coding (software RAID) as one such mechanism, which can provide five to eight nines of availability while keeping the replication factor low.

I also came across a JIRA for erasure coding in HDFS
https://issues.apache.org/jira/browse/HDFS-503

I will need some help to understand the following
1. How can I use erasure coding with Hadoop 1.1.1 release?
2. How will erasure coding work with replication mechanism and how will it affect the data locality aspect for data processing, since erasure coding fragments the data?
3. How mature is the current implementation of erasure coding in HDFS?

Any help will be greatly appreciated.

Thanks and Regards
Pankaj Misra
