Posted to users@zeppelin.apache.org by Sofiane Cherchalli <so...@gmail.com> on 2017/05/07 08:22:55 UTC

Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Hi,

I have a standalone cluster with one master and one worker, running on
separate nodes. Zeppelin runs on a third node, in client mode.

When I run a notebook that reads a CSV file located on the worker node with
the Spark-CSV package, Zeppelin tries to read the CSV locally and fails because
the CSV is on the worker node and not on the Zeppelin node.

Is this the expected behavior?

Thanks.

Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Posted by Sofiane Cherchalli <so...@gmail.com>.

I put the CSV on the worker node since the job runs on the worker. I
didn't put it on the master because I believe the master doesn't run jobs.

If I put the CSV on the Zeppelin node at the same path as on the worker, it
reads the CSV and writes a _SUCCESS file locally. The job also runs on the
worker but doesn't terminate. The result is saved under a _temporary
directory on the worker.

worker - ls -laRt /data/02.csv/


02.csv/:
total 0
drwxr-xr-x. 3 root root 24 Apr 28 09:55 .
drwxr-xr-x. 3 root root 15 Apr 28 09:55 _temporary
drwxr-xr-x. 3 root root 64 Apr 28 09:55 ..

02.csv/_temporary:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 0
drwxr-xr-x. 3 root root  15 Apr 28 09:55 .
drwxr-xr-x. 3 root root  24 Apr 28 09:55 ..

02.csv/_temporary/0:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 .
drwxr-xr-x. 2 root root   6 Apr 28 09:56 _temporary
drwxr-xr-x. 2 root root 129 Apr 28 09:56 task_20170428095632_0005_m_000000
drwxr-xr-x. 2 root root 129 Apr 28 09:55 task_20170428095516_0002_m_000000
drwxr-xr-x. 3 root root  15 Apr 28 09:55 ..

02.csv/_temporary/0/_temporary:
total 0
drwxr-xr-x. 2 root root   6 Apr 28 09:56 .
drwxr-xr-x. 5 root root 106 Apr 28 09:56 ..

02.csv/_temporary/0/task_20170428095632_0005_m_000000:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:56 .part-00000-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:56 part-00000-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv
drwxr-xr-x. 2 root root   129 Apr 28 09:56 .

02.csv/_temporary/0/task_20170428095516_0002_m_000000:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:55 .part-00000-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:55 part-00000-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv


zeppelin - ls -laRt 02.csv/


02.csv/:
total 12
drwxr-sr-x    2 root     10000700      4096 Apr 28 09:56 .
-rw-r--r--    1 root     10000700         8 Apr 28 09:56 ._SUCCESS.crc
-rw-r--r--    1 root     10000700         0 Apr 28 09:56 _SUCCESS
drwxrwsr-x    5 root     10000700      4096 Apr 28 09:56 ..
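
The two listings above can be compared programmatically. The following is a minimal sketch, assuming the usual Hadoop-style output layout in which part files are moved out of _temporary and a _SUCCESS marker is written when a job commit completes. The helper name describe_output_dir is hypothetical and not part of Spark; it only inspects the filesystem of the node it runs on.

```python
from pathlib import Path


def describe_output_dir(path):
    """Summarise the state of an output directory as seen from this node only."""
    root = Path(path)
    return {
        # _SUCCESS is the marker file seen on the Zeppelin node in the listing.
        "has_success_marker": (root / "_SUCCESS").is_file(),
        # Committed part files would sit directly under the output directory.
        "committed_parts": sorted(p.name for p in root.glob("part-*")),
        # Part files still below _temporary have not been moved into place.
        "uncommitted_parts": sorted(
            str(p.relative_to(root)) for p in root.glob("_temporary/**/part-*")
        ),
    }
```

Run on the worker, this would be expected to report uncommitted part files and no marker; run on the Zeppelin node, a marker and no part files. That split is consistent with the output path not being shared between the two nodes.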




On Wed, May 10, 2017 at 14:06, Meethu Mathew <me...@flytxt.com>
wrote:

> Try putting the CSV at the same path on all the nodes, or on a mount point
> that is accessible from all the nodes.
>
> Regards,
>
>
> Meethu Mathew

Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Posted by Meethu Mathew <me...@flytxt.com>.

Try putting the CSV at the same path on all the nodes, or on a mount point
that is accessible from all the nodes.

Regards,
Meethu Mathew
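
A minimal PySpark sketch of this suggestion follows. It assumes Spark 2.x, an existing SparkSession passed in as spark, and a file that has already been placed at the same absolute path on the Zeppelin node and on every worker (or on a shared mount). The helper names to_file_uri and read_shared_csv are hypothetical, and header=True is an assumption about the file rather than something stated in this thread.

```python
from pathlib import Path


def to_file_uri(path):
    """Turn an absolute local path into a file:// URI."""
    p = Path(path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    return p.as_uri()


def read_shared_csv(spark, path):
    """Read a CSV assumed to exist at `path` on the driver and on every worker."""
    # `spark` is an existing SparkSession supplied by the caller.
    # header=True is an assumption about the file layout.
    return spark.read.csv(to_file_uri(path), header=True)
```

For example, read_shared_csv(spark, "/data/file.csv") reads from file:///data/file.csv. The code does not copy the file anywhere; making the path available on all nodes remains a deployment step.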


On Wed, May 10, 2017 at 3:36 PM, Sofiane Cherchalli <so...@gmail.com>
wrote:

> Yes, I already tested with spark-shell and pyspark, with the same result.
>
> Can't I use the Linux filesystem to read the CSV, with a path such as
> file:///data/file.csv? My understanding is that the job is sent to the
> worker and interpreted there, isn't it?
>
> Thanks.

Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Posted by Sofiane Cherchalli <so...@gmail.com>.

Yes, I already tested with spark-shell and pyspark, with the same result.

Can't I use the Linux filesystem to read the CSV, with a path such as
file:///data/file.csv? My understanding is that the job is sent to the
worker and interpreted there, isn't it?

Thanks.

On Tue, May 9, 2017 at 20:23, Jongyoul Lee <jo...@gmail.com>
wrote:

> Could you test if it works with spark-shell?

Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Posted by Jongyoul Lee <jo...@gmail.com>.

Could you test if it works with spark-shell?

On Sun, May 7, 2017 at 5:22 PM, Sofiane Cherchalli <so...@gmail.com>
wrote:

> Hi,
>
> I have a standalone cluster with one master and one worker, running on
> separate nodes. Zeppelin runs on a third node, in client mode.
>
> When I run a notebook that reads a CSV file located on the worker
> node with the Spark-CSV package, Zeppelin tries to read the CSV locally and
> fails because the CSV is on the worker node and not on the Zeppelin node.
>
> Is this the expected behavior?
>
> Thanks.
>



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net