Posted to users@groovy.apache.org by Daniel Price <da...@gmail.com> on 2016/11/21 20:52:17 UTC

Groovy Script Memory Management Anti-patterns

Good afternoon, all.  I've a Groovy script with what might be a code-caused
memory leak, but I can't find the cause.  Basically, I'm using a script to
take 2 GB chunks of data from a SQL Server DB, manipulate it, then insert
it into a different SQL Server DB.  I've a few scripts that do this using
Groovy SQL, and the others work well, but this one always ends in a Java
heap OOME.  The difference, I think, is that this script is migrating a lot
more data (TBs), so it runs much longer.

I've done heap dump analysis, but I'm not ready to get into the details of
that yet.  Seems the script might be holding onto every invocation of:

List datalist = sourceDB.rows("${Sql.expand getDataQuery}")

even though I've been careful to use static variables as well as datalist =
null...

I'm starting to think I'm missing something fundamental, rather than just a
coding error.

Any advice appreciated...

D

Re: Groovy Script Memory Management Anti-patterns

Posted by Jacopo Cappellato <ja...@gmail.com>.
Hi Daniel,

I think that the problem you are facing is that when your code calls:

List datalist = sourceDB.rows(...)

all the records in the result set are retrieved by Groovy and added to the
datalist List (implemented by an ArrayList).
If the result set is large (e.g. 2 GB), the heap available to the JVM may
not be enough to hold the whole ArrayList in memory.
A simple solution may be to replace the call to rows(...) with a
call to eachRow:

sql.eachRow('select * from ...') { row ->
    // add your data manipulation here and insert to the other db
}

With the above code your closure receives one row at a time instead of a
list holding every row, so you should be able to process a large result set.
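
As a rough sketch of how the eachRow approach could be combined with batched
inserts into the second database: the method name streamCopy, the SQL strings,
the two-column layout and the batch size below are all hypothetical, and
'source' and 'target' are assumed to be groovy.sql.Sql instances. It is meant
only to illustrate the idea, not to describe the actual script.

```groovy
// Sketch only: 'source' and 'target' are assumed to be groovy.sql.Sql instances.
// The SQL strings, the two columns and the default batch size are hypothetical.
def streamCopy(source, target, String selectSql, String insertSql, int batchSize = 500) {
    int copied = 0
    // withBatch executes the queued inserts every 'batchSize' rows
    target.withBatch(batchSize, insertSql) { ps ->
        // eachRow hands rows to the closure one by one instead of returning a List
        source.eachRow(selectSql) { row ->
            // manipulate the values here before queuing them, if needed
            ps.addBatch(row[0], row[1])
            copied++
        }
    }
    return copied
}
```

Because nothing collects the rows into a list, no per-chunk list has to be
set to null afterwards. How much the JDBC driver itself buffers is a separate
matter and depends on the driver settings.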

Regards,

Jacopo


Re: Groovy Script Memory Management Anti-patterns

Posted by Daniel Price <da...@gmail.com>.
I've 'fixed' the problem, but I don't understand what caused the OOME.


Basically, my script retrieves images from a database, appends those images
to a list, and then batch inserts the list with images into another
database.  I used VisualVM to monitor heap allocation and narrowed the
issue down to the method that appends images to the list.

This method:

def appendImages(List emptyList, Map guidImageMap)
{
    List fullList = []
    emptyList.each{
        String guid = it[0]
        def image = guidImageMap.get(guid,null)
        List tempList = it.clone()
        tempList[-1] = image
        fullList.add(tempList.clone())
        tempList = []
        guidImageMap.remove(guid)
    }
    return fullList
}

Results in a consistent heap use profile.  All heap allocated when images
are retrieved is de-allocated by garbage collection after the batch insert.

Using either of the following methods eventually ends in an OOME, as not all
heap allocated during image retrieval is recovered after the batch insert:

def appendImages(List emptyList, Map guidImageMap)
{
    List fullList = []
    emptyList.each{
        String guid = it[0]
        def image = guidImageMap.get(guid,null)
        List tempList = it
        tempList[-1] = image
        fullList.add(tempList)
        tempList = []
        guidImageMap.remove(guid)
    }
    return fullList
}

def appendImages(List emptyList, Map guidImageMap)
{
    emptyList.each{
        String guid = it[0]
        def image = guidImageMap.get(guid,null)
        it[-1] = image
        guidImageMap.remove(guid)
    }
    return emptyList
}

In all cases, after the batch insert method, the list with images is set to
null.  It seems some object references are maintained somehow, even though
all methods and my only class (main) use local variables.  Likely something
fundamental I've done wrong, so I'd appreciate any explanations!
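
One difference between the variants can be shown in isolation.  The small
sketch below uses made-up rows and hypothetical helper names (fillByAlias,
fillByClone).  Assigning a row to another variable only aliases it, so the
image is written into the caller's original row; clone() makes a shallow copy
and leaves the original row without the image.  Whether this is what kept the
images reachable in the real script is an assumption, since that depends on
what else still references the original rows.

```groovy
// Hypothetical helpers mirroring the two styles above; rows are plain Lists
// whose last slot is reserved for the image.

// Alias style: the returned rows ARE the caller's rows, now holding the image.
def fillByAlias(List rows, Object image) {
    rows.collect { row -> row[-1] = image; row }
}

// Clone style: each returned row is a shallow copy; the caller's rows are untouched.
def fillByClone(List rows, Object image) {
    rows.collect { row -> def copy = row.clone(); copy[-1] = image; copy }
}

def original = [['guid-1', null], ['guid-2', null]]
def cloned = fillByClone(original, 'IMG')
assert original[0][-1] == null          // original row still has no image
assert cloned[0][-1] == 'IMG'

def aliased = fillByAlias(original, 'IMG')
assert aliased[0].is(original[0])       // same object, not a copy
assert original[0][-1] == 'IMG'         // the original row now references the image
```

In the alias style, setting only the returned list to null does not release
the images as long as the original row list (or anything holding its rows) is
still reachable.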

Regards,
D




Re: Groovy Script Memory Management Anti-patterns

Posted by Jochen Theodorou <bl...@gmx.org>.
There is not really enough information in this post for me to make any
well-founded speculation. In other words: to really say something I will
need more detail.
