Posted to hdfs-dev@hadoop.apache.org by Steven Rand <st...@gmail.com> on 2019/03/06 04:13:15 UTC

erasure coding with 2.x clients

Hi all,

I wanted to suggest the possibility of backporting client-side erasure
coding changes to branch-2, and get feedback on whether this is (1)
desirable, and (2) feasible without also backporting the server-side
changes.

Currently, client-side code to support erasure coding hasn't been
backported to branch-2, and as a result, both reads and writes of
erasure-coded data with 2.x clients fail:

* Running "hdfs dfs -get" on an erasure-coded file with a 2.9 client fails
with "java.io.IOException: Unexpected EOS from the reader" coming
from DFSInputStream.readWithStrategy(DFSInputStream.java:964).
* Writing to an erasure-coded directory via "hdfs dfs -put" with a Hadoop
2.9 client fails with a NotReplicatedYetException. (Writing the same file
to a directory that doesn't use erasure coding succeeds with the 2.9
client, and writing the file to the directory with erasure coding succeeds
using a 3.2 client.)

I think it's desirable to backport the client-side erasure coding support
to branch-2. Currently we have wire compatibility that allows 2.x clients
to run on 3.x clusters; however, these clients can't make use of one of the
most compelling features of Hadoop 3.

However, I don't know the code well enough to say whether it's possible to
backport the client-side changes without also pulling in the server-side
changes, which would increase the scope of the backport dramatically.

I'm hoping people can weigh in on whether this is something we want to do,
and also on whether it's something we can do without backporting the
server-side changes as well.

If this is a reasonable request, I'll file a JIRA for it.

Thanks,
Steve

Re: erasure coding with 2.x clients

Posted by Steven Rand <st...@gmail.com>.
After reading some of the code, I believe that it should be possible to
backport the client-side changes without the server-side changes. I've
created https://issues.apache.org/jira/browse/HDFS-14352 to track this.

Best,
Steve
