Posted to dev@beam.apache.org by Romain Manni-Bucau <rm...@gmail.com> on 2018/04/06 15:16:55 UTC

building on top of filesystem, can beam help?

Hi guys,

I have a use case where I'd like to expose file system navigation to a user
and let them visualize the file system (in the Beam sense).

Technically, it is a matter of being able to browse the file system with
glob patterns, using match(specs).
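For illustration only, here is a JDK-only sketch of glob-based browsing. It assumes a local directory plays the role of the file system. It uses the JDK's PathMatcher from java.nio.file.FileSystems, not Beam's org.apache.beam.sdk.io.FileSystems. Each spec is resolved to one list of matches, which loosely mirrors the one-result-per-spec shape of Beam's FileSystems.match(List<String>). GlobBrowser is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: a JDK-only stand-in for glob browsing with match(specs).
// It uses java.nio.file.FileSystems, NOT Beam's org.apache.beam.sdk.io.FileSystems.
final class GlobBrowser {
    private GlobBrowser() {}

    /** Returns one sorted list of matching relative paths (using '/') per glob spec. */
    static List<List<String>> match(Path root, List<String> specs) throws IOException {
        List<List<String>> results = new ArrayList<>();
        for (String spec : specs) {
            // "*" stays within one directory level; "**" crosses directory boundaries.
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + spec);
            try (Stream<Path> walk = Files.walk(root)) {
                results.add(walk
                        .filter(Files::isRegularFile)  // list files only, not directories
                        .map(root::relativize)         // match specs relative to the root
                        .filter(matcher::matches)
                        .map(p -> p.toString().replace('\\', '/'))
                        .sorted()
                        .collect(Collectors.toList()));
            }
        }
        return results;
    }
}
```

Suppose the root contains a.csv, b.txt and sub/c.csv. The specs "*.csv", "**/*.csv" and "*" then yield [a.csv], [sub/c.csv] and [a.csv, b.txt].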

What matters in this use case is that the visualization and the potential
runtime share the same implementation/view, instead of being split into two
code branches, which can lead to inconsistencies.

Therefore I'd like to reuse Beam's FileSystem, but I have a few blockers:

1. It is nested in sdk-java-core, which brings two drawbacks:
a. it brings in the whole Beam SDK, which is not desired in that part of the
app (it should not be visible on the classpath);
b. the dependency stack is simply impractical (guava, jackson, byte-buddy,
avro and joda, at least, are not wanted here at all), and shading makes it
far too fat to be a valid dependency for this usage.
2. I don't know how to configure a FS from one of its instances (I'd like to
be able to get its options class, e.g. through a
FileSystem#getConfigurationType method returning a PipelineOptions type).

Do you think it would be possible to extract the filesystem API into a
dependency-free Beam subproject (or at least a submodule) and add the
configuration hint to the API?
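Here is a minimal sketch of what a dependency-free filesystem API with a configuration hint might look like. Every name in it is hypothetical:

- FsOptions stands in for PipelineOptions.
- BrowsableFileSystem stands in for Beam's FileSystem.
- getConfigurationType() is the method proposed in point 2, not an existing Beam method.

Only JDK types are used, which is the point of the proposal.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical marker for filesystem settings; plays the role PipelineOptions
// plays in the proposal. Not an existing Beam type.
interface FsOptions {}

// Hypothetical dependency-free API: JDK types only.
interface BrowsableFileSystem<O extends FsOptions> {
    /** Configuration hint: the options type that configures this filesystem. */
    Class<O> getConfigurationType();

    /** Resolves each glob spec to the matching paths, one list per spec. */
    List<List<String>> match(List<String> specs);
}

// Illustrative options for a toy in-memory filesystem (hypothetical).
final class InMemoryOptions implements FsOptions {
    final String bucket;
    InMemoryOptions(String bucket) { this.bucket = Objects.requireNonNull(bucket); }
}

final class InMemoryFileSystem implements BrowsableFileSystem<InMemoryOptions> {
    private final List<String> paths;
    InMemoryFileSystem(List<String> paths) { this.paths = List.copyOf(paths); }

    @Override
    public Class<InMemoryOptions> getConfigurationType() { return InMemoryOptions.class; }

    @Override
    public List<List<String>> match(List<String> specs) {
        List<List<String>> out = new ArrayList<>();
        for (String spec : specs) {
            PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + spec);
            // Keeps the stored order of paths; only glob matches are retained.
            out.add(paths.stream().filter(p -> m.matches(Paths.get(p))).collect(Collectors.toList()));
        }
        return out;
    }
}

// Generic tooling (e.g. a UI) can discover and check the expected options
// type from any instance, without knowing the concrete filesystem.
final class FsTooling {
    private FsTooling() {}

    static <O extends FsOptions> O bind(BrowsableFileSystem<O> fs, FsOptions raw) {
        Class<O> type = fs.getConfigurationType();
        if (!type.isInstance(raw)) {
            throw new IllegalArgumentException("expected " + type.getName()
                    + " but got " + raw.getClass().getName());
        }
        return type.cast(raw);
    }
}
```

With this design, FsTooling.bind(fs, new InMemoryOptions("demo")) returns typed options. Passing options of any other type fails fast. A UI could use the same hint to decide which settings form to render.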

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>

Re: building on top of filesystem, can beam help?

Posted by Romain Manni-Bucau <rm...@gmail.com>.
I did a PR for that (a Beam FileSystem wrapping VFS), but it only brings VFS
connectivity to Beam. To solve my issue, the opposite direction is the only
valid option.

(In reply to Reuven Lax, 6 Apr 2018 22:31)

Re: building on top of filesystem, can beam help?

Posted by Reuven Lax <re...@google.com>.
In the other thread, we suggested writing a Beam FileSystem implementation
that wraps VFS. Is that a path forward here? Then you can build on top of VFS
instead, and simply layer VfsFilesystem on top of it when running on Beam.
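The suggested layering can be sketched as a plain adapter. The sketch uses these stand-ins:

- CoreFs stands in for a VFS-style API such as Apache Commons VFS.
- RuntimeFs stands in for the Beam-side FileSystem contract.
- CoreBackedRuntimeFs plays the role of the suggested VfsFilesystem.

All of these names and interfaces are hypothetical simplifications, not the real Commons VFS or Beam APIs. What the sketch shows is that the browsing UI and the runtime both delegate to a single implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical core API standing in for a VFS-style abstraction
// (e.g. Apache Commons VFS). Not the real Commons VFS interface.
interface CoreFs {
    /** Lists the direct children of a directory URI, sorted by name. */
    List<String> children(String dirUri);
}

// Toy in-memory CoreFs so the sketch runs with no dependencies.
final class MapCoreFs implements CoreFs {
    private final Map<String, List<String>> tree;
    MapCoreFs(Map<String, List<String>> tree) { this.tree = Map.copyOf(tree); }

    @Override
    public List<String> children(String dirUri) {
        return tree.getOrDefault(dirUri, List.of()).stream().sorted().collect(Collectors.toList());
    }
}

// Hypothetical runtime-side contract, standing in for what a Beam FileSystem
// wrapper must expose. Not Beam's actual FileSystem class.
interface RuntimeFs {
    List<String> list(String dirUri);
}

// Adapter playing the role of the suggested "VfsFilesystem": it adds no logic
// of its own, so the UI (calling CoreFs) and the runtime (calling RuntimeFs)
// see the same listing.
final class CoreBackedRuntimeFs implements RuntimeFs {
    private final CoreFs core;
    CoreBackedRuntimeFs(CoreFs core) { this.core = core; }

    @Override
    public List<String> list(String dirUri) { return core.children(dirUri); }
}
```

In a real setup, the adapter would extend Beam's FileSystem and be registered through Beam's FileSystemRegistrar mechanism. That is where the Beam dependency re-enters, while the core layer stays free of it.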

(In reply to Romain Manni-Bucau, Fri, Apr 6, 2018 at 1:23 PM)

Re: building on top of filesystem, can beam help?

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Partially. It will run with Beam in half of the cases and without it in the
remaining 50% (in which case the dependencies and the API are currently
blocking). My constraint is that, to enable any feature, I must be able to
cover both cases.
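One common way to serve both deployments from a single code base is to probe the classpath for Beam at runtime and pick a backend accordingly. This is an assumption here, not something proposed in the thread. The probe class org.apache.beam.sdk.io.FileSystems is Beam's real filesystem registry class. RuntimeMode and the backend names are hypothetical. This only helps if the standalone path never touches Beam classes itself, which is why the dependency split requested in the opening message matters.

```java
// Hypothetical helper: decides at runtime whether a Beam-backed or a
// standalone filesystem backend should be used.
final class RuntimeMode {
    private RuntimeMode() {}

    /** Beam's filesystem registry class, used only as a presence probe. */
    static final String BEAM_PROBE = "org.apache.beam.sdk.io.FileSystems";

    /** True if the class can be loaded; initialize=false avoids running static initializers. */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns "beam" when the probe class is loadable, else "standalone". */
    static String selectBackend(String probeClass, ClassLoader loader) {
        return isPresent(probeClass, loader) ? "beam" : "standalone";
    }
}
```

A typical call would be RuntimeMode.selectBackend(RuntimeMode.BEAM_PROBE, RuntimeMode.class.getClassLoader()).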

(In reply to Reuven Lax, 6 Apr 2018 22:14)

Re: building on top of filesystem, can beam help?

Posted by Reuven Lax <re...@google.com>.
So is this project of yours also built on top of Beam, or is it unrelated?

(In reply to Romain Manni-Bucau, Fri, Apr 6, 2018 at 1:12 PM)

Re: building on top of filesystem, can beam help?

Posted by Romain Manni-Bucau <rm...@gmail.com>.
The issues with forking are:

1. I would have to drop Beam's FileIO (in all its flavors), which means
getting no benefit from Beam in that area, and that is 50% of Beam's value
(the other half being portability).
2. I would have to maintain a bridge for every filesystem implementation;
that said, it still lacks some information at the moment.
3. It is still in the Beam SDK, so it is still "here" (on the classpath),
which is misleading for developers (this could be fixed if Beam became
modular).

As a side note, and to link with the topic of another thread: with VFS as
the abstraction, I don't have that issue at all.

(In reply to Reuven Lax, 6 Apr 2018 20:35)

Re: building on top of filesystem, can beam help?

Posted by Reuven Lax <re...@google.com>.
Personally, this is a case where I think forking might be a better option,
even though I'm not generally a fan of duplicating code.

In past projects, depending on internal modules of other projects never
seems to lead to good outcomes. FileSystem exists for Beam today, and Beam
might make changes to it that cause problems for your other project. I
would recommend starting by forking if it serves your needs.

Reuven

(In reply to Romain Manni-Bucau, Fri, Apr 6, 2018 at 8:17 AM)