Posted to user@drill.apache.org by Stefan Sedich <st...@gmail.com> on 2016/05/13 16:07:41 UTC

CTAS Out of Memory

Just trying to do a CTAS on a Postgres table. It is not huge, only 16-odd
million rows, but I end up with an out-of-memory error after a while.

Unable to handle out of memory condition in FragmentExecutor.

java.lang.OutOfMemoryError: GC overhead limit exceeded


Is there a way to avoid this without needing to do the CTAS on a subset of
my table?

Re: CTAS Out of Memory

Posted by Stefan Sedich <st...@gmail.com>.
Jacques,

I will look into it at some point this week. As it is a side project I might
not get to it until later in the week, and since I am not overly familiar with
Java profiling I will need to work out that part too!

Will get back to you when I know more.



--
Stefan


Re: CTAS Out of Memory

Posted by Jacques Nadeau <ja...@dremio.com>.
Any chance you can do a jmap/jhat combo and take a look at what is holding
the most memory? I'm guessing we're not managing the Postgres JDBC cursor
or backpressure correctly.
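
For reference, a minimal jmap/jhat sequence looks something like this (the pid
and file name are just placeholders):

  jmap -dump:live,format=b,file=drillbit-heap.hprof <drillbit-pid>
  jhat drillbit-heap.hprof    # then browse the histogram at http://localhost:7000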

--
Jacques Nadeau
CTO and Co-Founder, Dremio


Re: CTAS Out of Memory

Posted by Stefan Sedich <st...@gmail.com>.
Interesting,

I wonder if it is related to the varchar issue Zelaine mentioned above. Even
with specific columns listed in the select, the query plan shows a SELECT *
being pushed down to Postgres; does the select not push down the specific
columns?
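
For reference, the plan can be inspected with something along these lines
(table and column names here are placeholders):

  explain plan for select id, created_at from my_large_table;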

I will create another table with only the columns I want and try again to
see if it is in fact due to the varchar columns.
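
Something along these lines on the Postgres side, for example (table and
column names here are placeholders):

  create table my_large_table_narrow as
  select id, amount, created_at from my_large_table;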


Thanks


Re: CTAS Out of Memory

Posted by Stefan Sedich <st...@gmail.com>.
Jason,

Ran the following:

alter session set `store.format`='csv';
create table dfs.tmp.foo as select * from my_large_table;


Same end result: it chews through memory until it exhausts my heap and
eventually hits the OOM. This table has a number of varchar columns, but I
only selected a couple of columns in my select, so I was hoping it would avoid
the varchar issue mentioned above. I will create some other test tables later
with only the values I need and see how that works out.



Thanks


Re: CTAS Out of Memory

Posted by Jason Altekruse <ja...@dremio.com>.
I am curious if this is a bug in the JDBC plugin. Can you try to change the
output format to CSV? In that case we don't do any large buffering.
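
That is, something along the lines of:

  alter session set `store.format`='csv';

before running the CTAS.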

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer


Re: CTAS Out of Memory

Posted by Stefan Sedich <st...@gmail.com>.
Seems like it just ran out of memory again rather than hanging. I tried
appending a LIMIT 100 to the select query and it still runs out of memory;
the same CTAS against some other smaller tables works fine.
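
The limit test was along these lines (the target table name is just a
placeholder):

  create table dfs.tmp.foo_sample as select * from my_large_table limit 100;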

I will play around with this some more on the weekend. I can only assume I am
messing something up here, since I have created Parquet files from large
tables in the past without any issue; will report back.



Thanks


Re: CTAS Out of Memory

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Stefan,

Can you share the query profile for the query that seems to be running
forever? You won't find it on disk, but you can append .json to the profile
web URL and save the file.
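
For example, assuming the default web UI port, a profile at
http://localhost:8047/profiles/<query_id> can be saved as
http://localhost:8047/profiles/<query_id>.json.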

Thanks




-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: CTAS Out of Memory

Posted by Stefan Sedich <st...@gmail.com>.
Zelaine,

It does; I forgot about those ones. I will do a test where I filter those out
and see how I go. In my test with a 12GB heap size it seemed to just sit there
forever and not finish.


Thanks


Re: CTAS Out of Memory

Posted by Zelaine Fong <zf...@maprtech.com>.
Stefan,

Does your source data contain varchar columns? We've seen instances where
Drill isn't as efficient as it can be when Parquet is dealing with
variable-length columns.

-- Zelaine


Re: CTAS Out of Memory

Posted by Stefan Sedich <st...@gmail.com>.
Thanks for getting back to me so fast!

I was just playing with that now: I went up to 8GB and still ran into it, and
I am trying to go higher to see if I can find the sweet spot; I only have 16GB
of total RAM on this laptop :)

Is this an expected amount of memory for a table that is not overly huge (16
million rows, 6 columns of integers)? Even now, a 12GB heap seems to have
filled up again.



Thanks


Re: CTAS Out of Memory

Posted by Jason Altekruse <ja...@dremio.com>.
I could not find anywhere this is mentioned in the docs, but it has come up a
few times on the list. While we have made a number of efforts to move our
interactions with the Parquet library to off-heap memory (which we use
everywhere else in the engine during processing), the version of the writer we
are using still buffers a non-trivial amount of data into heap memory when
writing Parquet files. Try raising your JVM heap memory in drill-env.sh on
startup and see if that prevents the out-of-memory issue.
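
For example, in drill-env.sh you could try something like the following
(variable names can differ between Drill versions, so treat this as a sketch):

  DRILL_HEAP="8G"
  DRILL_MAX_DIRECT_MEMORY="8G"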

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer
