
Posted to user@flink.apache.org by Stephan Ewen <se...@apache.org> on 2015/08/26 20:12:02 UTC

Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

I think that is a very good idea.

Originally, we wrapped the Hadoop FS classes for convenience (they were
changing, and we wanted to keep the system independent of Hadoop), but in
my opinion these reasons are no longer relevant.

Let's start with your proposal and see if we can actually get rid of the
wrapping in a way that is friendly to existing users.

Would you open an issue for this?

Greetings,
Stephan


On Wed, Aug 26, 2015 at 6:23 PM, LINZ, Arnaud <AL...@bouyguestelecom.fr>
wrote:

> Hi,
>
> I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to
> write into an HDFS file, calling
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a
> HadoopDataOutputStream that wraps an
> org.apache.hadoop.fs.FSDataOutputStream (under its
> org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).
>
> However, FSDataOutputStream exposes many methods, such as flush() and
> getPos(), whereas HadoopDataOutputStream only wraps write() and close().
>
> For instance, flush() calls the default, empty implementation of
> OutputStream instead of the Hadoop one, which is confusing. Moreover,
> because of the restrictive OutputStream interface, hsync() and hflush() are
> not exposed to Flink; a getWrappedStream() method might be convenient.
>
> (For now, that prevents me from using the Flink FileSystem object; I use
> Hadoop’s directly.)
>
> Regards,
>
> Arnaud
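
The flush() behaviour described in the quoted message can be reproduced without Hadoop. The following is a minimal sketch, not Flink's code: CountingStream, WriteCloseOnlyWrapper and ForwardingWrapper are hypothetical names invented for illustration. It shows that a wrapper extending java.io.OutputStream and overriding only write() and close() silently drops flush(), because the inherited OutputStream.flush() does nothing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for the wrapped Hadoop stream; counts flush() calls. */
class CountingStream extends ByteArrayOutputStream {
    int flushCalls = 0;

    @Override
    public void flush() throws IOException {
        flushCalls++;
        super.flush();
    }
}

/** Hypothetical wrapper that forwards only write() and close(). */
class WriteCloseOnlyWrapper extends OutputStream {
    private final OutputStream inner;

    WriteCloseOnlyWrapper(OutputStream inner) {
        this.inner = inner;
    }

    @Override
    public void write(int b) throws IOException {
        inner.write(b);
    }

    @Override
    public void close() throws IOException {
        inner.close();
    }
    // flush() is not overridden, so the empty default of OutputStream is used
    // and the wrapped stream never sees the call.
}

/** Same wrapper, but with flush() forwarded to the wrapped stream. */
class ForwardingWrapper extends WriteCloseOnlyWrapper {
    private final OutputStream inner;

    ForwardingWrapper(OutputStream inner) {
        super(inner);
        this.inner = inner;
    }

    @Override
    public void flush() throws IOException {
        inner.flush();
    }
}
```

Calling flush() on a WriteCloseOnlyWrapper leaves flushCalls on the wrapped CountingStream at 0, while the same call on a ForwardingWrapper increments it.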

Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

Posted by Stephan Ewen <se...@apache.org>.

Hi!

I pushed a fix to the master to expose more methods.

You can access the original Hadoop stream now, and you can also call
"flush()" and "sync()" in the Flink stream, which get forwarded as
"hflush()" and "hsync()" in Hadoop 2 (in Hadoop 1 these are not available).
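
As a rough illustration of the forwarding described above, here is a self-contained sketch. It is not the code that was pushed to master: FakeHadoopStream is a stand-in for Hadoop 2's FSDataOutputStream so the example runs without Hadoop on the classpath, and WrappedHadoopStream and getWrappedStream() are hypothetical names (the latter taken from the suggestion earlier in this thread).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Stand-in for Hadoop 2's FSDataOutputStream (assumption, not the real class). */
class FakeHadoopStream extends ByteArrayOutputStream {
    final List<String> calls = new ArrayList<>();

    void hflush() throws IOException {
        calls.add("hflush");
    }

    void hsync() throws IOException {
        calls.add("hsync");
    }
}

/** Hypothetical Flink-side wrapper; names are illustrative only. */
class WrappedHadoopStream extends OutputStream {
    private final FakeHadoopStream hadoopStream;

    WrappedHadoopStream(FakeHadoopStream hadoopStream) {
        this.hadoopStream = hadoopStream;
    }

    @Override
    public void write(int b) throws IOException {
        hadoopStream.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        hadoopStream.write(b, off, len);
    }

    /** flush() on the wrapper is forwarded as hflush(). */
    @Override
    public void flush() throws IOException {
        hadoopStream.hflush();
    }

    /** sync() on the wrapper is forwarded as hsync(). */
    public void sync() throws IOException {
        hadoopStream.hsync();
    }

    @Override
    public void close() throws IOException {
        hadoopStream.close();
    }

    /** Gives callers direct access to the wrapped stream. */
    public FakeHadoopStream getWrappedStream() {
        return hadoopStream;
    }
}
```

With this shape, code that only knows the wrapper can still request a flush or sync, and code that needs anything else can reach the underlying stream through the accessor.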

The fix will be part of the upcoming milestone release.

Greetings,
Stephan

Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

Posted by Ufuk Celebi <uf...@data-artisans.com>.

> On 27 Aug 2015, at 09:33, LINZ, Arnaud <AL...@bouyguestelecom.fr> wrote:
> 
> Hi,
>  
> Ok, I’ve created FLINK-2580 to track this issue (and FLINK-2579, which is totally unrelated).

Thanks :)

> I think I’m going to set up my dev environment to start contributing a little more than just complaining ☺.

If you need any help with the setup, let us know. There is also this guide: https://ci.apache.org/projects/flink/flink-docs-master/internals/ide_setup.html

– Ufuk

RE: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

Posted by "LINZ, Arnaud" <AL...@bouyguestelecom.fr>.

Hi,

Ok, I’ve created FLINK-2580 to track this issue (and FLINK-2579, which is totally unrelated).

I think I’m going to set up my dev environment to start contributing a little more than just complaining ☺.

Best regards,
Arnaud
