Posted to dev@mxnet.apache.org by kellen sunderland <ke...@gmail.com> on 2018/01/06 23:26:37 UTC

R Build failure

FYI PRs are currently failing to build.  The R "Matrix Factorization" test
is failing to download this dataset:
http://files.grouplens.org/datasets/movielens/ml-100k.zip .  The site
https://grouplens.org/ appears to be down.

Issue here: https://github.com/apache/incubator-mxnet/issues/9332
PR to skip the test here:
https://github.com/apache/incubator-mxnet/pull/9333

-Kellen

Re: R Build failure

Posted by pracheer gupta <pr...@hotmail.com>.
+1 on only using free datasets if possible.

________________________________
From: Pedro Larroy <pe...@gmail.com>
Sent: Sunday, January 7, 2018 8:02:43 AM
To: dev@mxnet.incubator.apache.org
Subject: Re: R Build failure

I don't think it is ideal to run tests on datasets that prevent
redistribution and are essentially non-free. Are there alternatives
for this? I would be in favor of using only free datasets.

On Sun, Jan 7, 2018 at 2:26 AM, Marco de Abreu
<ma...@googlemail.com> wrote:
> I have been thinking about creating a private S3 bucket, but this would
> make it impossible to run the tests locally. On the other hand, the
> licenses of many datasets like MovieLens forbid redistribution, which
> means setting the S3 bucket to public is not allowed. We could think about
> a hybrid solution which tries to query the S3 bucket and downloads the
> file from an alternative address (i.e. the original source) if the S3
> bucket is not reachable.
>
> On Sun, Jan 7, 2018 at 12:29 AM, Marco de Abreu <
> marco.g.abreu@googlemail.com> wrote:
>
>> I could offer to download the dataset and create an S3 bucket to store all
>> used datasets. This would also reduce external dependencies.
>>
>> Wdyt?
>>
>> -Marco
>>
>> On 07.01.2018 at 12:26 AM, "kellen sunderland" <
>> kellen.sunderland@gmail.com> wrote:
>>
>>> FYI PRs are currently failing to build.  The R "Matrix Factorization" test
>>> is failing to download this dataset:
>>> http://files.grouplens.org/datasets/movielens/ml-100k.zip .  The site
>>> https://grouplens.org/ appears to be down.
>>>
>>> Issue here: https://github.com/apache/incubator-mxnet/issues/9332
>>> PR to skip the test here:
>>> https://github.com/apache/incubator-mxnet/pull/9333
>>>
>>> -Kellen
>>>
>>

Re: R Build failure

Posted by Pedro Larroy <pe...@gmail.com>.
I don't think it is ideal to run tests on datasets that prevent
redistribution and are essentially non-free. Are there alternatives
for this? I would be in favor of using only free datasets.

On Sun, Jan 7, 2018 at 2:26 AM, Marco de Abreu
<ma...@googlemail.com> wrote:
> I have been thinking about creating a private S3 bucket, but this would
> make it impossible to run the tests locally. On the other hand, the
> licenses of many datasets like MovieLens forbid redistribution, which
> means setting the S3 bucket to public is not allowed. We could think about
> a hybrid solution which tries to query the S3 bucket and downloads the
> file from an alternative address (i.e. the original source) if the S3
> bucket is not reachable.
>
> On Sun, Jan 7, 2018 at 12:29 AM, Marco de Abreu <
> marco.g.abreu@googlemail.com> wrote:
>
>> I could offer to download the dataset and create an S3 bucket to store all
>> used datasets. This would also reduce external dependencies.
>>
>> Wdyt?
>>
>> -Marco
>>
>> On 07.01.2018 at 12:26 AM, "kellen sunderland" <
>> kellen.sunderland@gmail.com> wrote:
>>
>>> FYI PRs are currently failing to build.  The R "Matrix Factorization" test
>>> is failing to download this dataset:
>>> http://files.grouplens.org/datasets/movielens/ml-100k.zip .  The site
>>> https://grouplens.org/ appears to be down.
>>>
>>> Issue here: https://github.com/apache/incubator-mxnet/issues/9332
>>> PR to skip the test here:
>>> https://github.com/apache/incubator-mxnet/pull/9333
>>>
>>> -Kellen
>>>
>>

Re: R Build failure

Posted by Marco de Abreu <ma...@googlemail.com>.
I have been thinking about creating a private S3 bucket, but this would
make it impossible to run the tests locally. On the other hand, the
licenses of many datasets like MovieLens forbid redistribution, which
means setting the S3 bucket to public is not allowed. We could think about
a hybrid solution which tries to query the S3 bucket and downloads the
file from an alternative address (i.e. the original source) if the S3
bucket is not reachable.
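
The hybrid lookup described above could be sketched roughly as follows.
This is an illustration only, not code from the MXNet repository: the
function name fetch_with_fallback, its parameters, and the ".part"
temporary-file convention are all assumptions, and it treats the bucket as
a plain HTTP(S) URL (a genuinely private bucket would need authenticated
access instead).

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_with_fallback(urls, dest, opener=urllib.request.urlopen, timeout=30):
    """Save the first URL in `urls` that can be downloaded to `dest`.

    Sources are tried in order, so list the mirror (e.g. the bucket) first
    and the original source last. Returns the URL that worked, or raises
    RuntimeError if every source failed.

    `opener` is a hypothetical injection point so the network call can be
    replaced in tests; it defaults to urllib.request.urlopen.
    """
    dest = Path(dest)
    # Download to a temporary name so a failed transfer never leaves a
    # truncated file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    errors = []
    for url in urls:
        try:
            with opener(url, timeout=timeout) as response, open(tmp, "wb") as out:
                shutil.copyfileobj(response, out)
            tmp.replace(dest)
            return url
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            tmp.unlink(missing_ok=True)
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))
```

A test would then call it with the bucket address first and
http://files.grouplens.org/datasets/movielens/ml-100k.zip second, so that
local runs without bucket access still fall through to the original
source.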

On Sun, Jan 7, 2018 at 12:29 AM, Marco de Abreu <
marco.g.abreu@googlemail.com> wrote:

> I could offer to download the dataset and create an S3 bucket to store all
> used datasets. This would also reduce external dependencies.
>
> Wdyt?
>
> -Marco
>
> On 07.01.2018 at 12:26 AM, "kellen sunderland" <
> kellen.sunderland@gmail.com> wrote:
>
>> FYI PRs are currently failing to build.  The R "Matrix Factorization" test
>> is failing to download this dataset:
>> http://files.grouplens.org/datasets/movielens/ml-100k.zip .  The site
>> https://grouplens.org/ appears to be down.
>>
>> Issue here: https://github.com/apache/incubator-mxnet/issues/9332
>> PR to skip the test here:
>> https://github.com/apache/incubator-mxnet/pull/9333
>>
>> -Kellen
>>
>

Re: R Build failure

Posted by Marco de Abreu <ma...@googlemail.com>.
I could offer to download the dataset and create an S3 bucket to store all
used datasets. This would also reduce external dependencies.

Wdyt?

-Marco

On 07.01.2018 at 12:26 AM, "kellen sunderland" <
kellen.sunderland@gmail.com> wrote:

> FYI PRs are currently failing to build.  The R "Matrix Factorization" test
> is failing to download this dataset:
> http://files.grouplens.org/datasets/movielens/ml-100k.zip .  The site
> https://grouplens.org/ appears to be down.
>
> Issue here: https://github.com/apache/incubator-mxnet/issues/9332
> PR to skip the test here:
> https://github.com/apache/incubator-mxnet/pull/9333
>
> -Kellen
>

Re: R Build failure

Posted by Marco de Abreu <ma...@googlemail.com>.
This is a to-do item. So far, Sheng has managed the datasets. I'd recommend
that we review all datasets in terms of their license as well as their
necessity. Some datasets are overkill for a unit test.

-Marco

On 12.01.2018 at 5:03 PM, "Bhavin Thaker" <bh...@gmail.com> wrote:

> Ok, Marco. Do all the permitted datasets reside in S3 or is this a todo
> item?
>
> Bhavin Thaker.
>
> On Fri, Jan 12, 2018 at 7:28 AM Marco de Abreu <
> marco.g.abreu@googlemail.com>
> wrote:
>
> > It would make sense, but the license does not permit redistribution of
> > the GroupLens-Movie dataset. We already have a few datasets in an S3
> > bucket which is backed up by a CDN, but this is only possible for
> > entirely free datasets.
> >
> > -Marco
> >
> > On Fri, Jan 12, 2018 at 4:03 PM, Bhavin Thaker <bh...@gmail.com>
> > wrote:
> >
> > > Does it make sense to cache the datasets into a (reliable) S3 bucket
> > > so the tests run reliably?
> > >
> > > Does the dataset licensing allow downloading the dataset?
> > >
> > > Bhavin Thaker.
> > >
> > > On Fri, Jan 12, 2018 at 5:52 AM kellen sunderland <
> > > kellen.sunderland@gmail.com> wrote:
> > >
> > > > Hey all, since this server seems to be back up and somewhat stable
> > > > I've created a revert PR.
> > > > https://github.com/apache/incubator-mxnet/pull/9379
> > > >
> > > > Do you all think we should leave this one disabled until it has been
> > > > refactored, or should we re-enable the test?
> > > >
> > > > On Thu, Jan 11, 2018 at 11:04 PM, Haibin Lin <ha...@apache.org>
> > > > wrote:
> > > >
> > > > > +1 for using free datasets or datasets without license issues and
> > > > > hosting them on S3 buckets to reduce external dependencies.
> > > > >
> > > > > On 2018-01-06 15:26, kellen sunderland <
> > > > > kellen.sunderland@gmail.com> wrote:
> > > > > > FYI PRs are currently failing to build.  The R "Matrix
> > > > > > Factorization" test is failing to download this dataset:
> > > > > > http://files.grouplens.org/datasets/movielens/ml-100k.zip .
> > > > > > The site https://grouplens.org/ appears to be down.
> > > > > >
> > > > > > Issue here:
> > > > > > https://github.com/apache/incubator-mxnet/issues/9332
> > > > > > PR to skip the test here:
> > > > > > https://github.com/apache/incubator-mxnet/pull/9333
> > > > > >
> > > > > > -Kellen
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: R Build failure

Posted by Bhavin Thaker <bh...@gmail.com>.
Ok, Marco. Do all the permitted datasets reside in S3 or is this a todo
item?

Bhavin Thaker.

On Fri, Jan 12, 2018 at 7:28 AM Marco de Abreu <ma...@googlemail.com>
wrote:

> It would make sense, but the license does not permit redistribution of the
> GroupLens-Movie dataset. We already have a few datasets in an S3 bucket
> which is backed up by a CDN, but this is only possible for entirely free
> datasets.
>
> -Marco
>
> On Fri, Jan 12, 2018 at 4:03 PM, Bhavin Thaker <bh...@gmail.com>
> wrote:
>
> > Does it make sense to cache the datasets into a (reliable) S3 bucket so
> > the tests run reliably?
> >
> > Does the dataset licensing allow downloading the dataset?
> >
> > Bhavin Thaker.
> >
> > On Fri, Jan 12, 2018 at 5:52 AM kellen sunderland <
> > kellen.sunderland@gmail.com> wrote:
> >
> > > Hey all, since this server seems to be back up and somewhat stable I've
> > > created a revert PR.
> > > https://github.com/apache/incubator-mxnet/pull/9379
> > >
> > > Do you all think we should leave this one disabled until it has been
> > > refactored, or should we re-enable the test?
> > >
> > > On Thu, Jan 11, 2018 at 11:04 PM, Haibin Lin <ha...@apache.org>
> > > wrote:
> > >
> > > > +1 for using free datasets or datasets without license issues and
> > > > hosting them on S3 buckets to reduce external dependencies.
> > > >
> > > > On 2018-01-06 15:26, kellen sunderland <ke...@gmail.com>
> > > > wrote:
> > > > > FYI PRs are currently failing to build.  The R "Matrix
> > > > > Factorization" test is failing to download this dataset:
> > > > > http://files.grouplens.org/datasets/movielens/ml-100k.zip .
> > > > > The site https://grouplens.org/ appears to be down.
> > > > >
> > > > > Issue here: https://github.com/apache/incubator-mxnet/issues/9332
> > > > > PR to skip the test here:
> > > > > https://github.com/apache/incubator-mxnet/pull/9333
> > > > >
> > > > > -Kellen
> > > > >
> > > >
> > >
> >
>

Re: R Build failure

Posted by Marco de Abreu <ma...@googlemail.com>.
It would make sense, but the license does not permit redistribution of the
GroupLens-Movie dataset. We already have a few datasets in an S3 bucket
which is backed up by a CDN, but this is only possible for entirely free
datasets.

-Marco

On Fri, Jan 12, 2018 at 4:03 PM, Bhavin Thaker <bh...@gmail.com>
wrote:

> Does it make sense to cache the datasets into a (reliable) S3 bucket so the
> tests run reliably?
>
> Does the dataset licensing allow downloading the dataset?
>
> Bhavin Thaker.
>
> On Fri, Jan 12, 2018 at 5:52 AM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > Hey all, since this server seems to be back up and somewhat stable I've
> > created a revert PR.  https://github.com/apache/incubator-mxnet/pull/9379
> >
> > Do you all think we should leave this one disabled until it has been
> > refactored, or should we re-enable the test?
> >
> > On Thu, Jan 11, 2018 at 11:04 PM, Haibin Lin <ha...@apache.org> wrote:
> >
> > > +1 for using free datasets or datasets without license issues and
> > > hosting them on S3 buckets to reduce external dependencies.
> > >
> > > On 2018-01-06 15:26, kellen sunderland <ke...@gmail.com>
> > > wrote:
> > > > FYI PRs are currently failing to build.  The R "Matrix Factorization"
> > > > test is failing to download this dataset:
> > > > http://files.grouplens.org/datasets/movielens/ml-100k.zip .  The site
> > > > https://grouplens.org/ appears to be down.
> > > >
> > > > Issue here: https://github.com/apache/incubator-mxnet/issues/9332
> > > > PR to skip the test here:
> > > > https://github.com/apache/incubator-mxnet/pull/9333
> > > >
> > > > -Kellen
> > > >
> > >
> >
>

Re: R Build failure

Posted by Bhavin Thaker <bh...@gmail.com>.
Does it make sense to cache the datasets into a (reliable) S3 bucket so the
tests run reliably?

Does the dataset licensing allow downloading the dataset?

Bhavin Thaker.

On Fri, Jan 12, 2018 at 5:52 AM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Hey all, since this server seems to be back up and somewhat stable I've
> created a revert PR.  https://github.com/apache/incubator-mxnet/pull/9379
>
> Do you all think we should leave this one disabled until it has been
> refactored, or should we re-enable the test?
>
> On Thu, Jan 11, 2018 at 11:04 PM, Haibin Lin <ha...@apache.org> wrote:
>
> > +1 for using free datasets or datasets without license issues and
> > hosting them on S3 buckets to reduce external dependencies.
> >
> > On 2018-01-06 15:26, kellen sunderland <ke...@gmail.com>
> > wrote:
> > > FYI PRs are currently failing to build.  The R "Matrix Factorization"
> > > test is failing to download this dataset:
> > > http://files.grouplens.org/datasets/movielens/ml-100k.zip .  The site
> > > https://grouplens.org/ appears to be down.
> > >
> > > Issue here: https://github.com/apache/incubator-mxnet/issues/9332
> > > PR to skip the test here:
> > > https://github.com/apache/incubator-mxnet/pull/9333
> > >
> > > -Kellen
> > >
> >
>

Re: R Build failure

Posted by kellen sunderland <ke...@gmail.com>.
Hey all, since this server seems to be back up and somewhat stable I've
created a revert PR.  https://github.com/apache/incubator-mxnet/pull/9379

Do you all think we should leave this one disabled until it has been
refactored, or should we re-enable the test?

On Thu, Jan 11, 2018 at 11:04 PM, Haibin Lin <ha...@apache.org> wrote:

> +1 for using free datasets or datasets without license issues and
> hosting them on S3 buckets to reduce external dependencies.
>
> On 2018-01-06 15:26, kellen sunderland <ke...@gmail.com>
> wrote:
> > FYI PRs are currently failing to build.  The R "Matrix Factorization"
> > test is failing to download this dataset:
> > http://files.grouplens.org/datasets/movielens/ml-100k.zip .  The site
> > https://grouplens.org/ appears to be down.
> >
> > Issue here: https://github.com/apache/incubator-mxnet/issues/9332
> > PR to skip the test here:
> > https://github.com/apache/incubator-mxnet/pull/9333
> >
> > -Kellen
> >
>

Re: R Build failure

Posted by Haibin Lin <ha...@apache.org>.
+1 for using free datasets or datasets without license issues and hosting
them on S3 buckets to reduce external dependencies.

On 2018-01-06 15:26, kellen sunderland <ke...@gmail.com> wrote: 
> FYI PRs are currently failing to build.  The R "Matrix Factorization" test
> is failing to download this dataset:
> http://files.grouplens.org/datasets/movielens/ml-100k.zip .  The site
> https://grouplens.org/ appears to be down.
> 
> Issue here: https://github.com/apache/incubator-mxnet/issues/9332
> PR to skip the test here:
> https://github.com/apache/incubator-mxnet/pull/9333
> 
> -Kellen
>