Posted to dev@systemds.apache.org by Mark Dokter <md...@know-center.at> on 2021/06/16 17:50:32 UTC

Jars in the release artifacts

Hey there!

When testing the release candidate, Shafaq and I found jars without
which SystemDS crashes under certain conditions.

* janino*.jar is needed for map() and for codegen.
* spark-core*.jar is needed for -exec hybrid when running locally
(otherwise provided by Spark)
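
A quick way to check an unpacked binary release for these two jars is to
glob its lib directory. This is only a minimal sketch: the helper name
missing_jars is an illustrative assumption and not part of SystemDS, and
the patterns are simply the two from the list above.

```python
from pathlib import Path

# Patterns taken from the list above; extend as further gaps are found.
REQUIRED_PATTERNS = ["janino*.jar", "spark-core*.jar"]


def missing_jars(lib_dir, patterns=REQUIRED_PATTERNS):
    """Return the patterns that match no file in lib_dir (hypothetical helper)."""
    lib = Path(lib_dir)
    # A pattern counts as satisfied as soon as one file matches it.
    return [p for p in patterns if not any(lib.glob(p))]
```

Calling missing_jars("lib") from the root of the unpacked release (the
path is an assumption) returns the patterns that still have no matching
jar, so an empty list means both are present.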

These problems do not occur with the source distribution, because Maven
pulls in everything that's needed.

The safe solution is to include in the lib directory everything that
Maven pulls in. But this directory is 190 MB in size, so aside from any
legal redistribution issues (I hope there are none?), the release
artifacts would grow by quite a bit.
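
To see which jars account for most of that size, the directory can be
summarised per file. The sketch below is hedged: the function names are
made up for illustration, and 1 MB is taken as 10**6 bytes, which may
differ from how the 190 MB figure was measured.

```python
from pathlib import Path


def jar_sizes_mb(lib_dir):
    """Map each jar file name in lib_dir to its size in MB (1 MB = 10**6 bytes)."""
    return {p.name: p.stat().st_size / 1e6 for p in Path(lib_dir).glob("*.jar")}


def report(lib_dir, top=10):
    """Return (total size in MB, the `top` largest jars as (name, MB) pairs)."""
    sizes = jar_sizes_mb(lib_dir)
    # Sort by size, largest first, to show which jars dominate the total.
    largest = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return sum(sizes.values()), largest
```

The list of largest jars is the useful part: it shows whether a few big
dependencies dominate or whether the size is spread across many small
ones.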

Any thoughts?

Regards, Mark


Re: Jars in the release artifacts

Posted by Matthias Boehm <mb...@gmail.com>.
Thanks for catching this issue. We have to distinguish two scenarios:

1) Deployed Spark cluster: here we need none of these libraries
(Spark's assembly jar is enough), only the SystemDS jar, which
includes the very few libraries that are not provided.

2) Standalone: here we need all libraries that are reachable through
local operations, including pseudo-distributed Spark operations. In this
mode, for example, DFS interactions are redirected to the local file
system, which does not require all libraries.

I like Arnab's proposal to additionally include the must-have libraries
like Janino and Spark core. In the past we took an orthogonal approach:
we created SystemML-lite, a minimal self-contained jar, by running a
variety of algorithms under different configurations and tracing the
class loader to get a fine-grained list of things to pack in. However,
for the binary release we should work at the jar level.
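
How the class loader was traced is not described here. One possible
approach, assumed purely for illustration, is to run the JVM with
-verbose:class and aggregate the output per jar. The sketch below parses
lines of the form "[Loaded <class> from <source>]", as printed by older
JVMs such as Java 8; newer versions use a different log format and would
need a different pattern. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches Java 8 style lines, e.g. "[Loaded a.b.C from file:/path/x.jar]".
LOADED = re.compile(r"^\[Loaded (\S+) from (.+)\]$")


def classes_per_jar(trace_lines):
    """Count loaded classes per jar from class-loading trace lines."""
    counts = Counter()
    for line in trace_lines:
        m = LOADED.match(line.strip())
        # Skip non-matching lines and classes that do not come from a jar.
        if m and m.group(2).endswith(".jar"):
            # Keep only the file name so the result is at jar level.
            counts[m.group(2).rsplit("/", 1)[-1]] += 1
    return counts
```

Any jar that shows up in the result for a standalone run is a candidate
for the lib directory. Note that JDK jars such as rt.jar are counted too
and would have to be filtered out.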

Regards,
Matthias

On 6/16/2021 7:56 PM, arnab phani wrote:
> Today we include the libraries that are needed to run in standalone
> mode. I like this balance.
> However, we might be missing some libraries, such as Janino, that are
> needed for standalone execution.
> 
> Regards,
> Arnab..

Re: Jars in the release artifacts

Posted by Shafaq Siddiqi <sh...@tugraz.at.INVALID>.
If there are no size constraints, I would suggest adding Janino to
the lib directory, since we use it to provide string-processing
functionality in DML.

Shafaq Siddiqi


Re: Jars in the release artifacts

Posted by arnab phani <ph...@gmail.com>.
Today we include the libraries that are needed to run in standalone
mode. I like this balance.
However, we might be missing some libraries, such as Janino, that are
needed for standalone execution.

Regards,
Arnab..
