Posted to user@spark.apache.org by Xi Shen <da...@gmail.com> on 2015/03/24 12:13:58 UTC

How to deploy binary dependencies to workers?

Hi,

I am doing ML using Spark MLlib. However, I do not have full control of the
cluster; I am using Microsoft Azure HDInsight.

I want to deploy BLAS, or whatever native dependencies are required, to
accelerate the computation. But I don't know how to deploy those DLLs when
I submit my JAR to the cluster.

I know how to pack those DLLs into a jar. The real challenge is how to let
the system find them...


Thanks,
David

Re: How to deploy binary dependencies to workers?

Posted by Dean Wampler <de...@gmail.com>.
Both spark-submit and spark-shell have a --jars option for passing
additional jars to the cluster. They will be added to the appropriate
classpaths.
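
For example (a sketch, written for a Windows command prompt since the
HDInsight nodes run Windows; the jar names, paths, and main class are
hypothetical):

    spark-submit ^
      --master yarn ^
      --jars C:\libs\dep1.jar,C:\libs\dep2.jar ^
      --class com.example.MyApp ^
      myapp.jar

Note that --jars only extends the classpath; it does not put native
libraries such as DLLs on java.library.path.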

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com


Re: How to deploy binary dependencies to workers?

Posted by Xi Shen <da...@gmail.com>.
OK, after various tests, I found the native library can be loaded when
running in yarn-cluster mode. But I still cannot figure out why it won't
load when running in yarn-client mode...
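
One plausible explanation, not confirmed in this thread: in yarn-cluster
mode the driver runs inside a YARN container, where files shipped to the
cluster are localized into the container's working directory, while in
yarn-client mode the driver JVM runs on the client machine and never sees
those files, so its java.library.path has to be set explicitly. A minimal
sketch of forcing the library path on both driver and executors, assuming
the DLLs sit in C:\openblas on every machine (the path and main class are
hypothetical):

    spark-submit ^
      --master yarn ^
      --deploy-mode client ^
      --conf spark.driver.extraLibraryPath=C:\openblas ^
      --conf spark.executor.extraLibraryPath=C:\openblas ^
      --class com.example.MyApp ^
      myapp.jar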


Thanks,
David



Re: How to deploy binary dependencies to workers?

Posted by Jörn Franke <jo...@gmail.com>.
You probably need to add the DLL directory to the PATH (not the classpath!)
environment variable on all nodes.
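
On Windows nodes that could look like the following (a sketch; C:\openblas
is an assumed location, the command needs administrator rights, and already
running JVMs will not pick up the change):

    REM Append the DLL directory to the machine-wide PATH so that JVMs
    REM started afterwards can resolve the native libraries from it.
    setx /M PATH "%PATH%;C:\openblas"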

Re: How to deploy binary dependencies to workers?

Posted by Xi Shen <da...@gmail.com>.
Of course not...all machines in HDInsight are Windows 64-bit servers. And I
have made sure all my DLLs are for 64-bit machines. I have managed to get
those DLLs loaded on my local machine, which is also Windows 64-bit.
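
For reference, a minimal sketch of loading the DLLs explicitly by absolute
path, which is presumably how it works locally (the directory is
hypothetical; on Windows, a DLL's dependencies must be loaded before the
DLL itself):

    // Load the OpenBLAS stack in dependency order via absolute paths.
    object LoadNativeBlas {
      def main(args: Array[String]): Unit = {
        Seq("libgcc_s_seh-1.dll",   // GCC runtime, needed by the others
            "libquadmath-0.dll",    // needed by libgfortran
            "libgfortran-3.dll",    // Fortran runtime, needed by BLAS/LAPACK
            "libblas3.dll",
            "liblapack3.dll")
          .foreach(dll => System.load(s"C:\\openblas\\$dll"))
        println("native BLAS libraries loaded")
      }
    }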




Xi Shen
about.me/davidshen


Re: How to deploy binary dependencies to workers?

Posted by DB Tsai <db...@dbtsai.com>.
Are you deploying the Windows DLLs to a Linux machine?

Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com




Re: How to deploy binary dependencies to workers?

Posted by Xi Shen <da...@gmail.com>.
I think you meant to use "--files" to deploy the DLLs. I gave it a try,
but it did not work.

From the Spark UI's Environment tab, I can see:

spark.yarn.dist.files

file:/c:/openblas/libgcc_s_seh-1.dll,file:/c:/openblas/libblas3.dll,file:/c:/openblas/libgfortran-3.dll,file:/c:/openblas/liblapack3.dll,file:/c:/openblas/libquadmath-0.dll

I think my DLLs are all deployed. But I still get the warning that the
native BLAS library cannot be loaded.
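
One way to narrow this down is to print java.library.path on the executors
and check whether the directory the DLLs were shipped to is actually
searched (a sketch; assumes a live SparkContext named sc):

    // Report each executor host together with its java.library.path.
    sc.parallelize(1 to 100, 10)
      .map(_ => java.net.InetAddress.getLocalHost.getHostName + " -> " +
                System.getProperty("java.library.path"))
      .distinct()
      .collect()
      .foreach(println)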

Any idea?


Thanks,
David



Re: How to deploy binary dependencies to workers?

Posted by DB Tsai <db...@dbtsai.com>.
I would recommend uploading those jars to HDFS and passing HDFS URIs to the
--jars option of spark-submit instead of local filesystem URIs. That avoids
fetching the jars from the driver, which can be a bottleneck.
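
For example (a sketch; the paths, jar name, and main class are
hypothetical):

    hadoop fs -mkdir -p /libs
    hadoop fs -put native-deps.jar /libs/

    spark-submit ^
      --master yarn ^
      --jars hdfs:///libs/native-deps.jar ^
      --class com.example.MyApp ^
      myapp.jar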

Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com

