Posted to user@drill.apache.org by Kunal Ghosh <ku...@icedq.com> on 2015/08/28 15:47:20 UTC

Regarding drill jdbc with big file

Hi,

I am new to Apache Drill. I have configured Apache Drill on a machine running CentOS.

"DRILL_MAX_DIRECT_MEMORY" = 25g
"DRILL_HEAP" = 4g

I have a 600 MB and a 3 GB JSON file [sample file attached]. When I run the query
on a relatively small file everything works fine, but when I run the same query
against the 600 MB and 3 GB files it gives the following error.

Query -
select tbl5.product_id product_id, tbl5.gender gender, tbl5.item_number item_number,
       tbl5.price price, tbl5.description description,
       tbl5.color_swatch.image image, tbl5.color_swatch.color color
from (
  select tbl4.product_id product_id, tbl4.gender gender, tbl4.item_number item_number,
         tbl4.price price, tbl4.size.description description,
         FLATTEN(tbl4.size.color_swatch) color_swatch
  from (
    select tbl3.product_id product_id, tbl3.catalog_item.gender gender,
           tbl3.catalog_item.item_number item_number, tbl3.catalog_item.price price,
           FLATTEN(tbl3.catalog_item.size) size
    from (
      select tbl2.product.product_id as product_id,
             FLATTEN(tbl2.product.catalog_item) as catalog_item
      from (
        select FLATTEN(tbl1.catalog.product) product
        from dfs.root.`demo.json` tbl1
      ) tbl2
    ) tbl3
  ) tbl4
) tbl5
--------------------------------------------------------------------------------------------------
Error -

SYSTEM ERROR: IllegalArgumentException: initialCapacity: -2147483648 (expectd:
0+)

Fragment 0:0

[Error Id: 60cf1b95-762d-4a0d-8cae-a2db418d4ea9 on sinhagad:31010]

--------------------------------------------------------------------------------------------------

1) Am I doing something wrong or missing something (probably because I am not
using a cluster)?

Please guide me through this.

Thanks & Regards

Kunal Ghosh

Re: Regarding drill jdbc with big file

Posted by Andries Engelbrecht <ae...@maprtech.com>.

I also commented on the JIRA.

How much memory is available on the system for Drill?

Also see what happens when you increase the planner query memory on the node, as the files are large and each will be read in a single thread. Normally it is better to keep JSON files in the 128-256 MB size range, depending on the use case, as that allows better execution with more threads than a single large file does.

Check what the query memory per node is set to and increase it to see if that resolves your problem.
The parameter is planner.memory.max_query_memory_per_node.
Query sys.options to see its current value and use ALTER SYSTEM to modify it.
https://drill.apache.org/docs/configuring-drill-memory/
https://drill.apache.org/docs/alter-system/
https://drill.apache.org/docs/configuration-options-introduction/
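
As a rough sketch of the two steps above, the statements would look something like the following when run from sqlline or any JDBC client. The option value is given in bytes; the 4294967296 (4 GB) figure is only an illustrative assumption, not a recommendation, so pick a value that fits the memory actually available on the node.

```sql
-- Inspect the current setting of the per-node query memory limit.
SELECT *
FROM sys.options
WHERE name = 'planner.memory.max_query_memory_per_node';

-- Raise the limit for the whole system (value in bytes; 4 GB here is an
-- assumed example value, adjust it to your hardware).
ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 4294967296;
```

Re-running the first query afterwards should show the new value, and the failing query can then be retried.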

—Andries

> On Aug 28, 2015, at 7:00 AM, rahul challapalli <ch...@gmail.com> wrote:
> 
> Can you search for the error id in the logs and post the stack trace?
> 
> It looks like an overflow bug to me.
> 
> - Rahul

Re: Regarding drill jdbc with big file

Posted by rahul challapalli <ch...@gmail.com>.

Can you search for the error id in the logs and post the stack trace?

It looks like an overflow bug to me.

- Rahul