Posted to dev@arrow.apache.org by Andy Grove <an...@gmail.com> on 2019/01/27 16:28:12 UTC

[Testing] Create csv-testing submodule?

I like the fact that we have a parquet-testing submodule that is shared
across implementations. Is there any interest in having an equivalent for
CSV files?

Andy.

Re: [Testing] Create csv-testing submodule?

Posted by Wes McKinney <we...@gmail.com>.
That was my intent when I created that repo, so SGTM

Re: [Testing] Create csv-testing submodule?

Posted by Andy Grove <an...@gmail.com>.
I see we have an arrow-testing repo already (although it seems to be mostly
empty). Would this be the correct place to create a PR to add test files?

Re: [Testing] Create csv-testing submodule?

Posted by Wes McKinney <we...@gmail.com>.
I'm in favor of using a submodule for testing data files to avoid
bloating the git repository. So far this hasn't been too painful with
the Parquet test data files.

Re: [Testing] Create csv-testing submodule?

Posted by Andy Grove <an...@gmail.com>.
That's a fair point about not needing a submodule... I was thinking about
converting some of the shared Parquet files to CSV to help with testing
DataFusion. I guess I can just put them there for now and if other
implementations are interested we can just move them to a shared directory.

Thanks,

Andy.

Re: [Testing] Create csv-testing submodule?

Posted by Antoine Pitrou <an...@python.org>.
Well, CSV isn't a standard like Parquet is, meaning each implementation
can choose their own middle grounds and interpretations.

Also, the parquet-testing submodule exists because Parquet
implementations are spread across different repositories.  If we want a
common location for CSV files across Arrow implementations, we don't
really need a submodule ;-)

Regards

Antoine.

