Posted to dev@hop.apache.org by Matt Casters <ma...@neo4j.com.INVALID> on 2022/02/25 14:12:22 UTC

[DISCUSS] Apache VFS & Authentication

Dear Hoppists,

Once in a blue moon the topic of authentication against Apache VFS
protocols comes up again.
The discussion centers on a number of requirements that come up, in various
forms, on the chat channels and elsewhere.

* It should be possible to use non-system-level variables when configuring
VFS features.  Right now we don't pass any IVariables objects into the
various HopVfs calls that resolve FileObjects, InputStreams/OutputStreams,
and so on.  Important features should be configurable through a standard
environment configuration file, to match our lifecycle management vision.

* We should be able to tie the authentication of VFS file systems to the
Authentication Provider plugins which are already present in Apache Hop
but currently go unused.  For example, if we have a secure FTP target and
we want to authenticate using a private key, we should be able to provide
that key and use it to authenticate against a certain hostname.

* It should be possible to authenticate differently to several systems in
the same project or environment.  For example, it should be possible to
authenticate to different hdfs:// servers at the same time, or to use
multiple Amazon s3:// accounts at the same time.  This would allow us, for
example, to transfer data from one account to another.

* It should be possible to create symbolic links through our VFS driver
usage, where we map one folder to another, hiding the implementation
details.
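As a sketch of the first requirement above: a HopVfs-style call could
resolve variables from an explicitly supplied scope instead of reading
system properties.  The class and method names below are illustrative
only, not actual Hop APIs, and the map is a hypothetical stand-in for an
IVariables scope.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VfsVariableSketch {

  // Hypothetical stand-in for IVariables: each ${NAME} token in the path
  // is resolved from an explicitly supplied scope, not from the JVM's
  // system-level properties.  Unknown tokens are left as-is.
  static String resolve(String path, Map<String, String> scope) {
    Matcher m = Pattern.compile("\\$\\{([^}]+)\\}").matcher(path);
    StringBuilder sb = new StringBuilder();
    while (m.find()) {
      String value = scope.getOrDefault(m.group(1), m.group(0));
      m.appendReplacement(sb, Matcher.quoteReplacement(value));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    // The scope would come from an environment configuration file.
    Map<String, String> env = Map.of("INPUT_DIR", "sftp://server1/data");
    System.out.println(resolve("${INPUT_DIR}/orders.csv", env));
  }
}
```

The point of the sketch is only that the scope is an argument: the same
pipeline resolves differently per project or environment without touching
system properties.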

See also: https://issues.apache.org/jira/browse/HOP-3795

Architecturally, I think this means we first need to create Authentication
metadata elements which use the Authentication Provider plugins.

https://issues.apache.org/jira/browse/HOP-3796

After that, we can use those authentication provider metadata elements to
make it very easy to configure VFS authentication: we simply point to
whichever method we want to use, in Hop VFS Authentication metadata
elements.

Technically, this means that almost 1000 call sites of the HopVfs class
need to be fed the appropriate IVariables and IHopMetadataProvider
instances at the location where the method is used (the lowest level
possible).  This sounds like a lot of work, but I feel we should get it
out of the way sooner rather than later, since the topic keeps coming
back.  The idea that nobody else out there is doing anything even remotely
like it is not a deterrent, it's an incentive :-)
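To make that concrete, here is a minimal sketch of what a call site could
do once it has a metadata provider: pick the authentication element that
matches the file system being resolved.  All names here (VfsAuth, authFor,
the fields) are hypothetical illustrations, not actual Hop metadata
classes.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;

public class VfsAuthLookupSketch {

  // Hypothetical VFS Authentication metadata element: which host it
  // applies to, how to authenticate, and a reference to the credential.
  public record VfsAuth(String host, String method, String credentialRef) {}

  // Stand-in for an IHopMetadataProvider lookup: select the element whose
  // host matches the URL being resolved.  This is what would let two
  // hdfs:// servers or two s3:// accounts coexist in one pipeline.
  public static Optional<VfsAuth> authFor(String url, Map<String, VfsAuth> elements) {
    String host = URI.create(url).getHost();
    return elements.values().stream()
        .filter(a -> a.host().equals(host))
        .findFirst();
  }

  public static void main(String[] args) {
    Map<String, VfsAuth> elements = Map.of(
        "ftp1", new VfsAuth("server1", "ssh-key", "key-ftp1"),
        "ftp2", new VfsAuth("server2", "password", "pw-ftp2"));
    System.out.println(authFor("sftp://server2/in/file.csv", elements).orElseThrow());
  }
}
```

In the real implementation the matching would presumably be richer than a
hostname comparison (scheme, port, path prefixes), but the shape is the
same: the call site needs the provider in hand to do the lookup.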

I would love to hear your ideas, and to check whether anyone else has
requirements that we should take into account.

Thanks in advance,

Matt

Re: [DISCUSS] Apache VFS & Authentication

Posted by Matt Casters <ma...@neo4j.com.INVALID>.
Indeed, I also know of cases where a company has over 30 FTP accounts to
access.

My idea was to have them all listed as VFS Authentication metadata objects,
so that you could shield the technical details in the metadata behind a
physical-to-logical mapping. Maybe it could be a new VFS driver, like
link://ftp1/, which maps to ftp://user:pass@server1:/somefolder, essentially
protecting you from changes to the IP address or authentication.

There's also a Metadata Input transform, which would then allow us to list
all these URLs, and so on.
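A minimal sketch of that link:// idea, assuming a simple map from logical
names to real URLs (nothing here is an existing Hop driver; the names are
illustrative):

```java
import java.util.Map;

public class VfsLinkSketch {

  // Hypothetical link:// resolution: map a logical name to the real
  // connection URL, so pipelines never embed hosts or credentials.
  static String resolveLink(String url, Map<String, String> links) {
    if (!url.startsWith("link://")) {
      return url; // not a logical link, pass through unchanged
    }
    String rest = url.substring("link://".length());
    int slash = rest.indexOf('/');
    String name = slash < 0 ? rest : rest.substring(0, slash);
    String path = slash < 0 ? "" : rest.substring(slash);
    String target = links.get(name);
    if (target == null) {
      throw new IllegalArgumentException("Unknown link: " + name);
    }
    return target + path;
  }

  public static void main(String[] args) {
    // In Hop this mapping would live in metadata, not in a literal map.
    Map<String, String> links =
        Map.of("ftp1", "ftp://user:pass@server1:/somefolder");
    System.out.println(resolveLink("link://ftp1/in.csv", links));
  }
}
```

If server1's address or credentials change, only the mapping changes; every
pipeline that says link://ftp1/ keeps working untouched.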

Just an idea.

Cheers
Matt


On Fri, Feb 25, 2022 at 19:29, Brandon Jackson <us...@gmail.com> wrote:

> I definitely agree with the use cases. Frequently, I gather data from
> different accounts.  My examples below are probably more of the same that
> you have in your original requirements observation.
>
> Getting files from one SFTP site using keys for auth and putting the files
> to a different location using passwords.
> Dropbox two different accounts. - Same scenario.
> Different google accounts with access to different projects.
> In the past, with SSH, when I configured it at the user level with key
> authorization, I remembered being able to use VFS to access the locations
> and not worry about the system prompting for passwords.  That was really
> nice because I did not have to bury the credentials in transforms.
> Getting and placing files using Box.
>
> One thing I find myself doing more of these days is writing and running
> python scripts using the requests and json modules to authenticate and hit
> against rest endpoints and save the files for further processing in Hop.
> It is otherwise pretty tricky to gather all the headers, sessions, and
> cookies to interact with endpoints.  I mention this because the
> authentication against sharepoint is a bit tricky.  I wanted enough
> authentication landscape out there to come up with a good scheme for
> storing and using credentials, whether it is VFS or other parts of the
> system like Rest Client etc.  It would be nice to have visibility of all
> the different credentials enabling access from a centralized standpoint,
> even if in practice certs are placed here and there or environment
> variables walk in, or parameters pass in.  I need a way to see the auth
> landscape.
>
> Brandon
>
> On Fri, Feb 25, 2022 at 8:12 AM Matt Casters <matt.casters@neo4j.com
> .invalid>
> wrote:
>
> > Dear Hoppists,
> >
> > Once in a blue moon the topic of authentication against Apache VFS
> > protocols comes up again.
> > The issues discussed focus on a number of different requirements that
> come
> > up during discussions in various forms on the chat channels and
> elsewhere.
> >
> > * It should be possible to use non-system level variables when
> configuring
> > VFS features.  This is because we don't expose any IVariables objects
> into
> > the various HopVfs calls that resolve FileObjects,
> InputStream/OutputStream
> > and so on.  Important features should be configurable through a standard
> > environment configuration file to match our lifecycle management vision.
> >
> > * We should be able to tie the authentication of VFS file systems against
> > the Authentication Provider plugins which are already present in Apache
> Hop
> > but go unused.  For example if we have a secure FTP target and we want to
> > authenticate using a secure key we should be able to provide that key and
> > use that to authenticate against a certain hostname.
> >
> > * It should be possible to authenticate differently to several systems in
> > the same project or environment.  For example it should be possible to
> > authenticate to different hdfs:// servers at the same time or use
> multiple
> > Amazon s3:// accounts at the same time.  This would allow us to transfer
> > data from one account to another for example.
> >
> > * It should be possible to create symbolic links through our VFS driver
> > usage where we map one folder to another hiding the implementation
> > details.
> >
> > See also: https://issues.apache.org/jira/browse/HOP-3795
> >
> > Architecturally I think that what this means is that we first need to
> > create Authentication metadata elements which use the Authentication
> > Provider plugins.
> >
> > https://issues.apache.org/jira/browse/HOP-3796
> >
> > After that we can use those authentication provider metadata elements to
> > make it super easy to configure VFS authentication simply by pointing to
> > whichever method we want to use.  We can do that in Hop VFS
> Authentication
> > metadata elements.
> >
> > Technically what this means is that almost 1000 method uses of class
> HopVfs
> > need to be fed the appropriate IVariables and IHopMetadataProvider
> instance
> > at the location where the method is used (the lowest level possible).
> This
> > sounds like a lot of work but I feel like that we should get this out of
> > the way sooner rather than later since the topic just keeps coming back.
> > The idea that nobody else out there is doing even remotely like it is
> not a
> > deterrent, it's an incentive :-)
> >
> > I would love to hear your ideas and check if anyone else had any
> > requirements that we should take into account.
> >
> > Thanks in advance,
> >
> > Matt
> >
>

Re: [DISCUSS] Apache VFS & Authentication

Posted by Brandon Jackson <us...@gmail.com>.
I definitely agree with the use cases. Frequently, I gather data from
different accounts.  My examples below are probably more of the same
requirements you observed in your original message.

* Getting files from one SFTP site using keys for auth, and putting the
files to a different location using passwords.
* Two different Dropbox accounts - same scenario.
* Different Google accounts with access to different projects.
* SSH: in the past, when I configured it at the user level with key
authorization, I remember being able to use VFS to access the locations
without the system prompting for passwords.  That was really nice because
I did not have to bury the credentials in transforms.
* Getting and placing files using Box.

One thing I find myself doing more of these days is writing and running
Python scripts, using the requests and json modules, to authenticate, hit
REST endpoints, and save the files for further processing in Hop.  It is
otherwise pretty tricky to gather all the headers, sessions, and cookies
needed to interact with those endpoints.  I mention this because
authentication against SharePoint is a bit tricky.  I'd want a broad
enough view of the authentication landscape to come up with a good scheme
for storing and using credentials, whether for VFS or for other parts of
the system like the REST Client.  It would be nice to have centralized
visibility of all the different credentials enabling access, even if in
practice certs are placed here and there, or environment variables or
parameters are passed in.  I need a way to see the auth landscape.

Brandon

On Fri, Feb 25, 2022 at 8:12 AM Matt Casters <ma...@neo4j.com.invalid>
wrote:

> Dear Hoppists,
>
> Once in a blue moon the topic of authentication against Apache VFS
> protocols comes up again.
> The issues discussed focus on a number of different requirements that come
> up during discussions in various forms on the chat channels and elsewhere.
>
> * It should be possible to use non-system level variables when configuring
> VFS features.  This is because we don't expose any IVariables objects into
> the various HopVfs calls that resolve FileObjects, InputStream/OutputStream
> and so on.  Important features should be configurable through a standard
> environment configuration file to match our lifecycle management vision.
>
> * We should be able to tie the authentication of VFS file systems against
> the Authentication Provider plugins which are already present in Apache Hop
> but go unused.  For example if we have a secure FTP target and we want to
> authenticate using a secure key we should be able to provide that key and
> use that to authenticate against a certain hostname.
>
> * It should be possible to authenticate differently to several systems in
> the same project or environment.  For example it should be possible to
> authenticate to different hdfs:// servers at the same time or use multiple
> Amazon s3:// accounts at the same time.  This would allow us to transfer
> data from one account to another for example.
>
> * It should be possible to create symbolic links through our VFS driver
> usage where we map one folder to another hiding the implementation
> details.
>
> See also: https://issues.apache.org/jira/browse/HOP-3795
>
> Architecturally I think that what this means is that we first need to
> create Authentication metadata elements which use the Authentication
> Provider plugins.
>
> https://issues.apache.org/jira/browse/HOP-3796
>
> After that we can use those authentication provider metadata elements to
> make it super easy to configure VFS authentication simply by pointing to
> whichever method we want to use.  We can do that in Hop VFS Authentication
> metadata elements.
>
> Technically what this means is that almost 1000 method uses of class HopVfs
> need to be fed the appropriate IVariables and IHopMetadataProvider instance
> at the location where the method is used (the lowest level possible).  This
> sounds like a lot of work but I feel like that we should get this out of
> the way sooner rather than later since the topic just keeps coming back.
> The idea that nobody else out there is doing even remotely like it is not a
> deterrent, it's an incentive :-)
>
> I would love to hear your ideas and check if anyone else had any
> requirements that we should take into account.
>
> Thanks in advance,
>
> Matt
>