Posted to user@mesos.apache.org by David M <da...@gmail.com> on 2015/10/08 22:13:03 UTC

mesos fetch uri when behind a squid proxy

Hi everyone.

I have a Mesos cluster (0.24.1) for running Spark (1.5.2) that runs great.

I have a requirement to move my Mesos cluster nodes behind a Squid HTTP
proxy.
All cluster nodes previously had direct outbound Internet access so
accessing SPARK_EXECUTOR_URI from a public source was not a problem.

System-wide I have the http_proxy and https_proxy environment variables set.
Command-line tools like curl and wget work just fine against Internet
resources.
After configuring Maven's proxy settings, the Mesos build completed
successfully.

I copied my /etc/hosts file to HDFS and attempted the WordCount example
from:
http://documentation.altiscale.com/spark-shell-examples-1-1

It failed with this in the executor's stderr file:

I1008 15:39:48.417644 21698 logging.cpp:172] INFO level logging started!
I1008 15:39:48.417819 21698 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20151007-154648-2701359370-5050-25191-S3\/spark","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/d3kbcqa49mib13.cloudfront.net\/spark-1.5.1-bin-hadoop2.6.tgz"}}],"sandbox_directory":"\/var\/run\/mesos\/slaves\/20151007-154648-2701359370-5050-25191-S3\/frameworks\/20151008-123957-2701359370-5050-6382-0001\/executors\/20151007-154648-2701359370-5050-25191-S3\/runs\/507827fb-cfb0-4a1d-977d-9b9afb972c29","user":"spark"}
I1008 15:39:48.418918 21698 fetcher.cpp:369] Fetching URI 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz'
I1008 15:39:48.418936 21698 fetcher.cpp:243] Fetching directly into the sandbox directory
I1008 15:39:48.418949 21698 fetcher.cpp:180] Fetching URI 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz'
I1008 15:39:48.418958 21698 fetcher.cpp:127] Downloading resource from 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz' to '/var/run/mesos/slaves/20151007-154648-2701359370-5050-25191-S3/frameworks/20151008-123957-2701359370-5050-6382-0001/executors/20151007-154648-2701359370-5050-25191-S3/runs/507827fb-cfb0-4a1d-977d-9b9afb972c29/spark-1.5.1-bin-hadoop2.6.tgz'
Failed to fetch 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz': Error downloading resource, received HTTP return code 400
Failed to synchronize with slave (it's probably exited)

Long troubleshooting story short: it appears that libcurl isn't picking up
my proxy settings.

In ./3rdparty/libprocess/3rdparty/stout/include/stout/posix/net.hpp

I added

  curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
  curl_easy_setopt(curl, CURLOPT_PROXY, "<my squid server hostname here>");
  curl_easy_setopt(curl, CURLOPT_PROXYPORT, <my squid server port here>);

before

CURLcode curlErrorCode = curl_easy_perform(curl);

I then recompiled Mesos, and the WordCount example succeeded.

What is the correct way to set the proxy so that libcurl will make use of it?

Thank you.
David

Re: mesos fetch uri when behind a squid proxy

Posted by Greg Mann <gr...@mesosphere.io>.
It's possible that Spark sets the executor environment explicitly, which
would lead to the http_proxy and https_proxy environment variables not
being passed along to the executor. You could try using the
`--executor_environment_variables` command-line flag when running the agent
to specify these environment variables, ensuring that they get passed
through.
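For example, something along these lines when starting the agent (a sketch, not a tested invocation; the master address and the Squid host/port are placeholders for your own values):

```shell
mesos-slave \
  --master=zk://zk.example.com:2181/mesos \
  --executor_environment_variables='{
    "http_proxy": "http://squid.example.com:3128",
    "https_proxy": "http://squid.example.com:3128",
    "no_proxy": "localhost,127.0.0.1"
  }'
```

I believe JSON-typed Mesos flags like this one also accept a file:// path to a JSON file, if you'd rather not inline the object on the command line.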

On Sat, Oct 31, 2015 at 12:06 AM, Zhongyue Luo <zh...@gmail.com>
wrote:

> Any advice on this issue? I'm having the same problem.

Re: mesos fetch uri when behind a squid proxy

Posted by Zhongyue Luo <zh...@gmail.com>.
Any advice on this issue? I'm having the same problem.



-- 
*Intel SSG/STO/BDT*
880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai,
China
+862161166500