Posted to user@spark.apache.org by Javier Rey <jr...@gmail.com> on 2016/07/11 02:42:40 UTC

Spark crashes with two parquet files

Hi everybody,

I installed Spark 1.6.1 and I have two parquet files. When I try to show
records after a unionAll, Spark crashes and I don't understand what happens.

But when I use show() on only one parquet file, it works correctly.

Code that fails:

path = '/data/train_parquet/'
train_df = sqlContext.read.parquet(path)
train_df.take(1)

Code that works:

path = '/data/train_parquet/0_0_0.parquet'
train0_df = sqlContext.read.load(path)
train0_df.take(1)
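
A minimal sketch of the failing unionAll call (the second file name,
0_0_1.parquet, is a placeholder; only 0_0_0.parquet appears above):

df0 = sqlContext.read.parquet('/data/train_parquet/0_0_0.parquet')
df1 = sqlContext.read.parquet('/data/train_parquet/0_0_1.parquet')  # placeholder name
train_df = df0.unionAll(df1)  # Spark 1.6 unionAll matches columns by position
train_df.take(1)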

Thanks in advance.

Samir

Re: Spark crashes with two parquet files

Posted by Takeshi Yamamuro <li...@gmail.com>.
The log explicitly says "java.lang.OutOfMemoryError: Java heap space", so
you probably need to allocate more JVM heap memory to Spark.
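
Since the trace shows tasks running on localhost, the driver heap is the
one that matters, and it has to be set when the shell is launched. For
example (4g is just a placeholder to tune for your data):

pyspark --driver-memory 4g

or, for a standalone script (your_script.py is a placeholder):

spark-submit --driver-memory 4g --executor-memory 4g your_script.py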

// maropu

On Mon, Jul 11, 2016 at 11:59 AM, Javier Rey <jr...@gmail.com> wrote:

> Also, the problem appears when I use unionAll.
>
> 2016-07-10 21:58 GMT-05:00 Javier Rey <jr...@gmail.com>:
>
>> This is a part of trace log.
>>
>>  WARN TaskSetManager: Lost task 4.0 in stage 2.0 (TID 13, localhost):
>> java.lang.OutOfMemoryError: Java heap space
>>     at
>> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:755)
>>     at
>> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:494)
>>     at
>> org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader.checkEndOfRowGroup(UnsafeRowParquetRecord
>>
>> 2016-07-10 21:47 GMT-05:00 Takeshi Yamamuro <li...@gmail.com>:
>>
>>> Hi,
>>>
>>> What's the schema of the parquet files?
>>> Also, could you show us the stack trace when the error happens?
>>>
>>> // maropu
>>>
>>> On Mon, Jul 11, 2016 at 11:42 AM, Javier Rey <jr...@gmail.com> wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> I installed Spark 1.6.1 and I have two parquet files. When I try to show
>>>> records after a unionAll, Spark crashes and I don't understand what happens.
>>>>
>>>> But when I use show() on only one parquet file, it works correctly.
>>>>
>>>> Code that fails:
>>>>
>>>> path = '/data/train_parquet/'
>>>> train_df = sqlContext.read.parquet(path)
>>>> train_df.take(1)
>>>>
>>>> Code that works:
>>>>
>>>> path = '/data/train_parquet/0_0_0.parquet'
>>>> train0_df = sqlContext.read.load(path)
>>>> train0_df.take(1)
>>>>
>>>> Thanks in advance.
>>>>
>>>> Samir
>>>>
>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>


-- 
---
Takeshi Yamamuro

Re: Spark crashes with two parquet files

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

What's the schema of the parquet files?
Also, could you show us the stack trace when the error happens?

// maropu
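
For example, to print the schema of one of the files (the path follows
your example):

df0 = sqlContext.read.parquet('/data/train_parquet/0_0_0.parquet')
df0.printSchema()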

On Mon, Jul 11, 2016 at 11:42 AM, Javier Rey <jr...@gmail.com> wrote:

> Hi everybody,
>
> I installed Spark 1.6.1 and I have two parquet files. When I try to show
> records after a unionAll, Spark crashes and I don't understand what happens.
>
> But when I use show() on only one parquet file, it works correctly.
>
> Code that fails:
>
> path = '/data/train_parquet/'
> train_df = sqlContext.read.parquet(path)
> train_df.take(1)
>
> Code that works:
>
> path = '/data/train_parquet/0_0_0.parquet'
> train0_df = sqlContext.read.load(path)
> train0_df.take(1)
>
> Thanks in advance.
>
> Samir
>



-- 
---
Takeshi Yamamuro