Posted to dev@systemds.apache.org by Shafaq Siddiqi <sh...@tugraz.at.INVALID> on 2021/03/24 14:04:55 UTC
Refactoring datasets in SystemDS
Hi,
Some of the test suites in SystemDS use external data files that are
stored in the test package along with the test files. I have observed
that some tests use a dataset that resides in another test package. For
example, the Iris dataset is used by the gmm and gmmPredict tests but is
stored inside the transform test package. The same is the case for the
Salary dataset, which is used by several test files.
In my opinion, it would be more effective to store all datasets inside the
resource folder. The existing datasets would then be available up front
and could be reused instead of introducing a new dataset every now and
then, and referencing datasets across test suites would become simpler.
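To illustrate how a single shared folder could simplify referencing, here is a minimal sketch of a lookup helper. The class name TestDatasets, the folder src/test/resources/datasets, and the file name iris/iris.csv are assumptions made for illustration; they are not existing SystemDS code or paths.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of SystemDS: one place to resolve shared test datasets. */
final class TestDatasets {
    // Assumed location of the shared dataset folder, relative to the project root.
    static final Path ROOT = Paths.get("src", "test", "resources", "datasets");

    private TestDatasets() {}

    /** Resolves a name such as "iris/iris.csv" against the given shared root. */
    static Path resolve(Path root, String relativeName) {
        Path normalizedRoot = root.normalize();
        Path resolved = normalizedRoot.resolve(relativeName).normalize();
        // Reject names that point outside the shared folder, e.g. "../transform/iris.csv".
        if (!resolved.startsWith(normalizedRoot)) {
            throw new IllegalArgumentException("Dataset outside shared folder: " + relativeName);
        }
        return resolved;
    }

    /** Resolves a name against the assumed default root. */
    static Path resolve(String relativeName) {
        return resolve(ROOT, relativeName);
    }
}
```

With something like this, the gmm and transform tests would both call TestDatasets.resolve("iris/iris.csv") instead of hard-coding a path into another test package.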
--br,
Shafaq Siddiqi
Re: Refactoring datasets in SystemDS
Posted by Mark Dokter <md...@know-center.at>.
On 24.03.2021 17:15, Matthias Boehm wrote:
> thanks for bringing this up - sounds good to me as well.
>
+1
I also think it's a good suggestion.
However, I see another cause of repository bloat here. As with the
binaries, the datasets could be separated out, since not everybody who
checks out the source necessarily needs all of them. *If* Maven can be
configured to download the needed files on the first mvn package from a
third-party source (which we control, of course), that would be great and
would let us modularize a bit better.
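The download-on-first-use idea can also be sketched independently of Maven. The following is not a Maven configuration; it is a plain Java sketch of the same behaviour (fetch a file once, then reuse the local copy). The class DatasetCache and the method fetchIfMissing are hypothetical names, and the source location is whatever URI the project would control.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, not part of SystemDS or Maven. */
final class DatasetCache {
    private DatasetCache() {}

    /** Returns target, downloading it from source only if it does not exist yet. */
    static Path fetchIfMissing(URI source, Path target) {
        if (Files.exists(target)) {
            return target; // already fetched earlier, no network access needed
        }
        try {
            Path parent = target.toAbsolutePath().getParent();
            Files.createDirectories(parent);
            // Download to a temporary file first so that an interrupted
            // transfer never leaves a partial file under the target name.
            Path tmp = Files.createTempFile(parent, "download", ".part");
            try (InputStream in = source.toURL().openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                Files.deleteIfExists(tmp); // no-op after a successful move
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would call fetchIfMissing before reading the dataset, so a plain source checkout stays small and only the datasets that are actually used get downloaded.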
> Regards,
> Matthias
>
Regards, Mark
> On 3/24/2021 3:21 PM, arnab phani wrote:
>> I agree.
>> That way, we don't need to look through the folders for datasets while
>> writing a new test.
>> In addition to that, is it possible to write the test functions in a way
>> that the test will automatically apply to all the datasets in
>> /resource? If
>> so, then it will be much easier to test with a new dataset --- we will
>> just
>> need to add it in the designated folder.
>>
>> Regards,
>> Arnab..
Re: Refactoring datasets in SystemDS
Posted by Matthias Boehm <mb...@gmail.com>.
Thanks for bringing this up - sounds good to me as well.
Regards,
Matthias
On 3/24/2021 3:21 PM, arnab phani wrote:
> I agree.
> That way, we don't need to look through the folders for datasets while
> writing a new test.
> In addition to that, is it possible to write the test functions in a way
> that the test will automatically apply to all the datasets in /resource? If
> so, then it will be much easier to test with a new dataset --- we will just
> need to add it in the designated folder.
>
> Regards,
> Arnab..
Re: Refactoring datasets in SystemDS
Posted by arnab phani <ph...@gmail.com>.
I agree.
That way, we don't need to look through the folders for datasets while
writing a new test.
In addition, is it possible to write the test functions so that a test
automatically applies to all the datasets in /resource? If so, testing
with a new dataset would be much easier: we would just need to add it to
the designated folder.
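As a rough sketch of what I mean, a test could discover every dataset in the folder and apply the same check to each one. The class AllDatasetsCheck, the restriction to .csv files, and the idea of expressing the check as a Predicate are my assumptions for illustration, not existing SystemDS test utilities.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of SystemDS. */
final class AllDatasetsCheck {
    private AllDatasetsCheck() {}

    /** Lists all regular *.csv files below root (recursively), in sorted order. */
    static List<Path> discover(Path root) {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".csv"))
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Applies the same check to every discovered dataset and returns the ones that fail. */
    static List<Path> failing(Path root, Predicate<Path> check) {
        List<Path> failed = new ArrayList<>();
        for (Path dataset : discover(root)) {
            if (!check.test(dataset)) {
                failed.add(dataset);
            }
        }
        return failed;
    }
}
```

A test would then assert that failing(root, check) is empty, and dropping a new file into the folder would include it in the next run without touching the test code.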
Regards,
Arnab..