Posted to user@arrow.apache.org by "Badger, Bruce" <br...@bofa.com> on 2021/07/08 08:58:10 UTC

RE: [Python] testing custom pyarrow.fs filesystems

Hi Antoine,

Further to our recent exchange of messages ...

I have been able to monkey-patch self.pyarrow.tests.test_fs.filesystem_config so that our filesystem classes are tested alongside (or instead of) the filesystems already defined there. I have also created a lightweight implementation of our filesystem which passes all the tests, just to show (to myself) that it can be done.
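
[Editor's sketch] The patching pattern described above can be illustrated without pyarrow installed. This is only a minimal sketch under stated assumptions: `upstream` is a hypothetical stand-in for an upstream test module, and `InMemoryFS`, `BrokenFS`, `run_roundtrip_tests` and `extended_config` are invented for illustration. The real pyarrow.tests.test_fs is pytest-based and is not reproduced here; only the name `filesystem_config` is borrowed from this thread.

```python
import types
import unittest.mock as mock


class InMemoryFS:
    """Hypothetical stand-in for a custom filesystem (not a pyarrow class)."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = bytes(data)

    def read(self, path):
        return self._files[path]


class BrokenFS(InMemoryFS):
    """Deliberately wrong implementation: keeps only the first byte."""

    def write(self, path, data):
        self._files[path] = bytes(data)[:1]


# Hypothetical stand-in for an upstream test module.
upstream = types.ModuleType("upstream_test_fs")


def _filesystem_config():
    # Upstream only knows about its own filesystems.
    return [{"id": "builtin", "fs": InMemoryFS()}]


def _run_roundtrip_tests():
    # The config is looked up through the module at call time,
    # so a patched `filesystem_config` is honoured.
    results = {}
    for cfg in upstream.filesystem_config():
        fs = cfg["fs"]
        fs.write("a.bin", b"payload")
        results[cfg["id"]] = fs.read("a.bin") == b"payload"
    return results


upstream.filesystem_config = _filesystem_config
upstream.run_roundtrip_tests = _run_roundtrip_tests


def extended_config():
    # Keep the upstream entries and add our own alongside them.
    return _filesystem_config() + [
        {"id": "custom", "fs": InMemoryFS()},
        {"id": "broken", "fs": BrokenFS()},
    ]


def run_with_custom_filesystems():
    # Patch only for the duration of the run; the original is restored on exit.
    with mock.patch.object(upstream, "filesystem_config", extended_config):
        return upstream.run_roundtrip_tests()


if __name__ == "__main__":
    print(run_with_custom_filesystems())
```

Because the patch is scoped to a `with` block, the stand-in upstream module is left unchanged afterwards, which keeps the custom entries from leaking into any other run.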

The tests in self.pyarrow.tests.test_fs have proved to be very useful. While our existing tests ensure that our filesystems interact with our underlying systems as intended, running your tests helps us to conform to your (pyarrow) expectations. I understand that passing your tests is not a certification of any kind, but it is reassuring, and the exercise of running against your tests has been a useful catalyst; the lessons learned from getting the lightweight implementation of our filesystem to pass all your tests have been helpful to our main body of work.

I understand that you did not intend that your tests be used in this way, and the caveats you gave about the scope of the tests (i.e. most tests are in C++),  but it really has been very useful for us so thank you anyway!

Needless to say, having found out how to do this we would like to be able to do the same with future Arrow releases.

Could we ask that you maintain the shape of the tests around self.pyarrow.tests.test_fs.filesystem_config or, if you have to change that, that you provide an alternative way to plug in external filesystem classes?   We would be grateful and I'm sure the developers of other filesystems in the future would be grateful too.

Best wishes,
    Bruce

-----Original Message-----
From: Antoine Pitrou [mailto:antoine@python.org] 
Sent: Monday, June 21, 2021 10:44 AM
To: user@arrow.apache.org
Subject: Re: [Python] testing custom pyarrow.fs filesystems


Hello Bruce,

On Fri, 18 Jun 2021 12:08:39 +0000
"Badger, Bruce" <br...@bofa.com> wrote:
> Dear pyarrow.fs team,
> 
> We are implementing a custom pyarrow.fs filesystem to map the contents of internal file stores as filesystems for use in Arrow.
> 
> We have a suite of unit tests which exercise the internal parts of our implementation, and we can run pyarrow.tests.test_fs to ensure that the supplied pyarrow.fs filesystems work as we have them installed.
> 
> I would like to include our custom filesystem as a sibling of the included pyarrow.fs filesystems in the pyarrow.tests.test_fs tests in order to ensure that our filesystem conforms to the expectations of the pyarrow.fs implementation, and continues to conform as pyarrow.fs and our internal systems evolve.
> 
> Are the pyarrow.tests.test_fs tests extensible to allow the testing of custom filesystems in addition to, and as peer of, the supplied filesystems?  If so, how is this intended to work?  ... and if not, may I suggest that this be added as a feature for the (hopefully not too distant) future?

I don't think we intend to make the PyArrow test suite extensible.  It
is a test suite for PyArrow, not for third-party libraries.

That said, it's probably easy to take those tests and copy/adapt them
inside your own project.

Be aware, however, that most tests for the PyArrow filesystems are
written in C++.  The Python tests do not intend to cover all
functionality in detail.

Regards

Antoine.


----------------------------------------------------------------------
This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer .  If you are not the intended recipient, please delete this message. Please note you may be contacted by a different BofA entity acting for and on behalf of your service provider where permitted by applicable law. This does not change your service provider.

Where applicable please note:
Merrill Lynch International is Registered in England (No.2312079). Registered Office: 2 King Edward Street, London EC1A, 1HQ. VAT No. GB 245 1224 93. Authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority. Member of the London Stock Exchange.

RE: [Python] testing custom pyarrow.fs filesystems

Posted by "Badger, Bruce" <br...@bofa.com>.
Antoine,

Thank you for responding.  I really appreciate you taking the time to discuss this.

I agree that this is the key question:
> ...  I'm not sure why you can't just copy and paste the tests in
> your own source tree. 

I'm looking at this from within a large bureaucratic organisation in which people move from job to job over time, and internal systems evolve, as do the libraries we bring in and rely upon, such as pyarrow.

In this case my intent is to ensure that tests are present which exercise the relationship between our implementations and your base libraries, now and over time as the code evolves.  Taking a snapshot of your tests now would mean that future tests would be static (as of 'now') even if your implementation and testing evolved to address new needs, to tackle corner case issues etc.

By testing our code using your tests we'll get immediate notice if our assumptions / implementation no longer match your expectations, whether the change was at our end or yours.

In an organisation which evolves over time, having tests which evolve with the libraries we rely on is a Good Thing.

Thanks to you Antoine, and to your team mates, for your work on Arrow.

Very best wishes,
    Bruce

-----Original Message-----
From: Antoine Pitrou [mailto:antoine@python.org] 
Sent: Thursday, July 8, 2021 6:42 PM
To: user@arrow.apache.org
Subject: Re: [Python] testing custom pyarrow.fs filesystems


Hi Bruce,

On Thu, 08 Jul 2021 08:58:10 +0000
"Badger, Bruce" <br...@bofa.com> wrote:
> 
> I understand that you did not intend that your tests be used in this way, and the caveats you gave about the scope of the tests (i.e. most tests are in C++),  but it really has been very useful for us so thank you anyway!
> 
> Needless to say, having found out how to do this we would like to be able to do the same with future Arrow releases.
> 
> Could we ask that you maintain the shape of the tests around self.pyarrow.tests.test_fs.filesystem_config or, if you have to change that, that you provide an alternative way to plug in external filesystem classes?

Well, I don't think we have any plans to significantly refactor those
tests, so you *can* probably rely on them in the near future.

That said, I'm not sure why you can't just copy and paste the tests in
your own source tree.  They're under the Apache license after all, so
there should not be much impediment in doing so.

Best regards

Antoine.



Re: [Python] testing custom pyarrow.fs filesystems

Posted by Antoine Pitrou <an...@python.org>.
Hi Bruce,

On Thu, 08 Jul 2021 08:58:10 +0000
"Badger, Bruce" <br...@bofa.com> wrote:
> 
> I understand that you did not intend that your tests be used in this way, and the caveats you gave about the scope of the tests (i.e. most tests are in C++),  but it really has been very useful for us so thank you anyway!
> 
> Needless to say, having found out how to do this we would like to be able to do the same with future Arrow releases.
> 
> Could we ask that you maintain the shape of the tests around self.pyarrow.tests.test_fs.filesystem_config or, if you have to change that, that you provide an alternative way to plug in external filesystem classes?

Well, I don't think we have any plans to significantly refactor those
tests, so you *can* probably rely on them in the near future.

That said, I'm not sure why you can't just copy and paste the tests in
your own source tree.  They're under the Apache license after all, so
there should not be much impediment in doing so.

Best regards

Antoine.