Posted to user@ignite.apache.org by camer314 <ca...@towerswatson.com> on 2019/11/14 07:12:15 UTC

Question about memory when uploading CSV using .NET DataStreamer

I have a large CSV file (50 million rows) that I wish to upload to a cache. I
am using .NET and a DataStreamer from my application, which is designated as
a client-only node.

What I don't understand is why I quickly run out of memory in my C# streaming
(client) application, while my data node (an instance of Apache.Ignite.exe)
slowly increases its RAM usage, though not at the rate my client app does.

So it would seem that either (A) my client IS actually being used to cache
data, or (B) there is a memory leak where data that has been sent to the
cache is not released.

As for figures, Apache.Ignite.exe uses 165 MB when first started. After
loading 1 million records and letting everything settle down,
Apache.Ignite.exe sits at 450 MB, while my client app (the one streaming)
sits at 1.5 GB.

The total size of the input file is 5 GB, so 1 million records should really
only be about 100 MB; I don't know how my client even gets to 1.5 GB to begin
with. If I comment out the AddData() call, my client never gets past 200 MB,
so it's certainly something happening in the cache.

Is this expected behaviour? If so, I don't know how to import huge CSV
files without memory issues on the streaming machine.
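
For reference, here is a minimal sketch of the setup described above. This is
not the actual code: the cache name, value type, and CSV handling are
illustrative placeholders, and it assumes the Apache.Ignite NuGet package and
a running Apache.Ignite.exe data node.

```csharp
using System.IO;
using System.Linq;
using Apache.Ignite.Core;

class Program
{
    static void Main()
    {
        // Start this process as a client-only node; data is stored on Apache.Ignite.exe.
        var cfg = new IgniteConfiguration { ClientMode = true };

        using (var ignite = Ignition.Start(cfg))
        {
            ignite.GetOrCreateCache<long, string>("csvCache");

            // The streamer batches entries and ships them to the data node.
            using (var ldr = ignite.GetDataStreamer<long, string>("csvCache"))
            {
                long id = 0;

                // File.ReadLines streams the file lazily, so the 5 GB CSV
                // is never loaded into memory all at once.
                foreach (var line in File.ReadLines("data.csv").Skip(1))
                {
                    ldr.AddData(id++, line);
                }
            } // Dispose() flushes any remaining buffered entries.
        }
    }
}
```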





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by Mikael <mi...@telia.com>.
Hi!

If each row is stored as an entry in the cache, you can expect an
overhead of around 200 bytes per entry, so about 200 MB just for the
1 million entries themselves, not counting your data (more if you have
any indexes).

You can control the streamer: how much data it buffers and when it should
be flushed. I have no idea how this works in the .NET client, though, so
maybe something there. You could try manually calling flush on the
streamer at intervals (this should not be needed, but just to see if it
makes any difference). I use a lot of streamers (from Java) and have never
had any problems with them, so maybe it is something on the .NET side.
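
A minimal sketch of the manual-flush idea in C# (assuming the
Apache.Ignite.Core API; `ignite`, the cache name "csvCache", `rows`, and the
interval values are illustrative, not taken from the original code):

```csharp
using System;
using Apache.Ignite.Core.Datastream;

// 'ignite' is an already-started client node; "csvCache" already exists.
using (IDataStreamer<long, string> ldr =
           ignite.GetDataStreamer<long, string>("csvCache"))
{
    // Option 1: have the streamer flush its buffers on a timer.
    ldr.AutoFlushFrequency = TimeSpan.FromSeconds(1);

    long id = 0;
    foreach (string row in rows)
    {
        ldr.AddData(id++, row);

        // Option 2: flush by hand at intervals to bound client-side buffering.
        if (id % 100000 == 0)
            ldr.Flush();
    }
} // Disposing the streamer performs a final flush.
```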

Mikael


Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by Pavel Tupitsyn <pt...@apache.org>.
Here is what I tried:
https://gist.github.com/ptupitsyn/7dacefd1cebb936d5f516d8afeba7efe

Ran it for a minute or so: 200 MB used on the client, 5 GB on the server.
Seems to work as expected to me.


Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by Pavel Tupitsyn <pt...@apache.org>.
Sounds nasty, can you share a reproducer please?


Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by Pavel Tupitsyn <pt...@apache.org>.
I would not recommend doing so, because it may affect Ignite performance,
but you can tweak the JVM to use less memory and return it to the OS more
frequently, like this:

var cfg = new IgniteConfiguration
{
    ClientMode = true,

    // Shrink the heap more aggressively and return freed memory to the OS sooner.
    JvmOptions = new[] { "-XX:MaxHeapFreeRatio=30", "-XX:MinHeapFreeRatio=10" },

    // Initial and maximum JVM heap size (-Xms / -Xmx) in megabytes.
    JvmInitialMemoryMb = 100,
    JvmMaxMemoryMb = 900
};





Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by camer314 <ca...@towerswatson.com>.
Ok, yes, I see. It seems that with the code changes I made to provide the
example, the memory consumption is far more in line with expectations, so I
guess it was a code error on my part.

However, it seems strange that my client node, which has no cache, still
wants to hang onto over 1 GB of heap space even though it's using less than
100 MB. Is there no way to release that back?




Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by Pavel Tupitsyn <pt...@apache.org>.
I've run your code under .NET and Java memory profilers.
In short, everything is working fine; there is nothing to worry about.


*DotMemory*:
[image: image.png]

.NET managed memory usage is under 1 MB; unmanaged memory is much higher,
and that is what the Java part allocates.

*jvisualvm*:
[image: image.png]
(I clicked Perform GC; this corresponds to the last drop in used heap.)

As we can see, streamer usage caused some heap allocations, but in the end
it settled down to 17 MB.
To put it simply, the JVM reserves more memory from the OS than it actually
uses, so Task Manager reports high memory usage to you.




>

Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by camer314 <ca...@towerswatson.com>.
In my sample code I had a bit of a bug; this should be the line to add:

var _ = ldr.AddData(id++, data);

However, it doesn't appear to make any difference. This is the state of
memory (with ignite.exe being my client executable), paused after inserting
1 million rows. Why is my client memory usage still so high?

<http://apache-ignite-users.70518.x6.nabble.com/file/t2675/Untitled.png> 

If I comment out the AddData call, I get:

<http://apache-ignite-users.70518.x6.nabble.com/file/t2675/Untitled2.png> 




Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by camer314 <ca...@towerswatson.com>.
Here are my source file and a 1-million-row CSV file.

I am not sure what's different between my code and yours, but my version
quickly consumes memory on the client side for some reason.

Caveat: I am normally a Python programmer, so I might have missed something
obvious...

https://wtwdeeplearning.blob.core.windows.net/ignite/Program.zip?st=2019-11-15T00%3A58%3A20Z&se=2019-11-25T00%3A58%3A00Z&sp=rl&sv=2018-03-28&sr=b&sig=IkMuGbNJ4YAp5Ko%2BmcqC5PkbSLeuUfQLegMXpj3WNQ0%3D






Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by Pavel Tupitsyn <pt...@apache.org>.
> Since we're in 2019, we don't recommend running any Ignite nodes with
> less than -Xmx2G (that is, 2 gigabytes of heap allowance).

Does 2019 somehow allow us to consume 2 GB for nothing?
I don't think a client node needs that much.

Let's see a reproducer.
My testing shows that streaming works out of the box on a client node; no
custom JVM tuning or anything else is required.


Re: Question about memory when uploading CSV using .NET DataStreamer

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Since we're in 2019, we don't recommend running any Ignite nodes with less
than -Xmx2G (that is, 2 gigabytes of heap allowance).

It is certainly possible to run Ignite with less heap, but the reasoning for
doing so is not very clear.

Please also note that our JDBC thin driver supports streaming, and it
should be usable from .NET in some way. In that case, the memory overhead is
supposed to be small.

Regards,
-- 
Ilya Kasnacheev

