Posted to users@nifi.apache.org by "Hengesbach, Martin" <Ma...@fiz-Karlsruhe.de> on 2020/05/11 09:20:13 UTC

Possible memory leak in PutHive3Streaming?

Hi,

I'm using the PutHive3Streaming processor to write roughly 30,000 Avro records into a Hive table (roughly 150 GB in total). This works for a while, but the used memory keeps increasing over time, and after a few thousand records the memory is full.

When I remove the PutHive3Streaming processor from my flow, everything works fine, so the problem must be the PutHive3Streaming processor.

As Record Reader I'm using the AvroReader; the NiFi version is 1.11.4 and Hive is version 3.1.0.

The Hive table is created as follows:
create table yyyy(id string, data string) clustered by (id) into 50 buckets stored as orc tblproperties("transactional"="true")

Any ideas whether I'm doing something wrong, or is there really a memory leak in the processor?

Thanks in advance and best regards

Martin



2020-05-08 09:37:22,076 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Creating metastore client for streaming-connection
2020-05-08 09:37:22,091 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with URI thrift://xxxxxx:9083
2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, current connections: 1
2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) retries=24 delay=5 lifetime=0
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Creating metastore client for streaming-connection-heartbeat
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with URI thrift://xxxx:9083
2020-05-08 09:37:22,094 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, current connections: 2
2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) retries=24 delay=5 lifetime=0
2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection STREAMING CONNECTION INFO: { metastore-uri: thrift://xxxxxxxx:9083, database: default, table: yyyy, partitioned-table: false, dynamic-partitioning: false, username: nifi, secure-mode: false, record-writer: HiveRecordWriter, agent-info: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44] }
2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Starting heartbeat thread with interval: 150000 ms initialDelay: 64376 ms for agentInfo: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44]
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Created new filesystem instance: 484197097
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Memory monitorings settings - autoFlush: true memoryUsageThreshold: 0.7 ingestSizeThreshold: 0
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hadoop.hive.common.HeapMemoryMonitor Setting collection usage threshold to 2147483647
2020-05-08 09:37:22,233 WARN [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter LOW MEMORY ALERT! Tenured gen memory is already low. Increase memory to improve performance. Used: 129.54GB Max: 130.00GB
2020-05-08 09:37:22,233 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Opened new transaction batch TxnId/WriteIds=[292449/7336...292449/7336] on connection = { metaStoreUri: thrift://xxxxxxxx:9083, database: default, table: yyyy };  TxnStatus[O] LastUsed txnid:0
2020-05-08 09:37:22,272 INFO [Timer-Driven Process Thread-44] org.apache.orc.impl.PhysicalFsWriter ORC writer created for path: hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
2020-05-08 09:37:22,292 INFO [Timer-Driven Process Thread-44] org.apache.orc.impl.WriterImpl ORC writer created for path: hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters: 50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Flushing record updater for partitions: default.yyyy
2020-05-08 09:37:22,578 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats after flush: [record-updaters: 50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats before close: [record-updaters: 50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Closing updater for partitions: default.yyyy
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 1
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 0
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Closed streaming connection. Agent: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44] Stats: {records-written: 1, records-size: 0, committed-transactions: 1, aborted-transactions: 0, auto-flushes: 0, metastore-calls: 7 }

At the beginning, the memory starts at Used: 2.63GB Max: 130.00GB.


Re: Possible memory leak in PutHive3Streaming?

Posted by Mark Payne <ma...@hotmail.com>.
Thanks Martin,

This is not particularly surprising. What you’re seeing here is that you’ve got 70+ GB of heap used, but no full garbage collections. Java does not free up memory until it needs to do so because it’s more efficient this way. If you were to send that same dataset through 10 more times you’d likely see the heap staying steady. If you get to the point where the heap fills up and remains full even after full garbage collections occur, that’s when you need to be concerned.
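
One quick way to check this, assuming you can run the JDK tools against the NiFi JVM, is something like:

jstat -gcutil <nifi-pid> 5000

and watch the FGC/FGCT columns (full garbage collection count and time). If FGC stays at 0 while the old generation fills up, you're just seeing lazy collection rather than a leak. NiFi's System Diagnostics view shows similar garbage collection counts.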

That being said, a 130 GB heap is enormous for NiFi. Typically I wouldn’t recommend running higher than 8-16 GB unless you’ve got a particularly specific use case. Is there a reason that you’ve allocated such a large heap?
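
For reference, the heap is set via the java.arg entries in NiFi's conf/bootstrap.conf, e.g. something like:

java.arg.2=-Xms8g
java.arg.3=-Xmx8g

for an 8 GB heap (the exact argument numbers can differ between installations, so treat this as a sketch rather than the exact lines in your file).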

Thanks
-Mark


On May 12, 2020, at 9:03 AM, Hengesbach, Martin <Ma...@fiz-Karlsruhe.de>> wrote:

Hi Mark and Matt,

Thank you for your answers. I have checked the versions of the Hive client libraries. The output shows the expected version, 3.1.2:

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-hcatalog-core-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-upgrade-acid-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-exec-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-streaming-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-serde-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-common-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-vector-code-gen-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-tez-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-classification-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-0.23-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-common-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-client-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-standalone-metastore-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-service-rpc-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-scheduler-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-jdbc-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-common-3.1.2-tests.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-server-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-metastore-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-common-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-cli-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-service-3.1.2.jar
work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-storage-api-2.7.0.jar

I have run a new test: 12,000 XML documents totaling about 55 GB. I restarted NiFi before the test.

At the beginning I have:
2020-05-12 08:13:07,426 INFO [Timer-Driven Process Thread-232] o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters: 50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 659.08MB/130.00GB]

At the end of the test:

2020-05-12 10:28:12,911 INFO [Timer-Driven Process Thread-257] o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 72.18GB/130.00GB]

The increase is roughly the same as the total size of my documents. It seems that memory is allocated somewhere in the size of each document but never freed.
Matt, HIVE-20979 has a comment from February saying the bug is not actually fixed in 3.1.2, but I'm not sure whether that bug alone can lead to such a large memory leak.
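
If it helps, I could also capture a heap dump the next time usage is that high, with something like:

jmap -dump:live,format=b,file=nifi-heap.hprof <nifi-pid>

and look at which objects are retaining the memory (just an idea from my side, I have not done this yet).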

Any ideas?


Best regards,
Martin


-----Original Message-----
From: Matt Burgess [mailto:mattyb149@apache.org]
Sent: Monday, May 11, 2020, 23:00
To: users@nifi.apache.org
Subject: Re: Possible memory leak in PutHive3Streaming?

Martin,

There were two memory leaks a while back, one in NiFi code and one in Hive client code (brought in as a dependency for the Hive 3 components). NiFi fixed its side in 1.9.0 (via NIFI-5841 [1]) and Hive fixed its side in 3.1.1 (via HIVE-20979 [2]). Until NiFi 1.11.4 we were still using Hive 3.1.0 dependencies, which still have the leak. But as of 1.11.4 (via NIFI-7239 [3]) we upgraded our Hive 3 dependencies to 3.1.2, which means those memory leaks should be gone. Can you verify the versions of Hive 3 in your NiFi installation with something like:

find work -name "hive*.jar" | grep hive3

That should show you the Hive JARs in the Hive 3 NAR, and the version is part of the filename of each.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-5841
[2] https://issues.apache.org/jira/browse/HIVE-20979
[3] https://issues.apache.org/jira/browse/NIFI-7239

On Mon, May 11, 2020 at 4:42 PM Mark Payne <ma...@hotmail.com>> wrote:
>
> Martin,
>
> I don’t know a lot about Hive. The log message you’re noting there is definitely coming from Hive’s client, but I can’t say whether or not it’s anything that should actually be concerning in terms of Java memory/heap. Is the NiFi heap size actually set to a max of 130 GB? (That would be a really massive heap for NiFi). Are you seeing any full garbage collections occurring? This information would be available in the Cluster table if you’re running clustered, or else in the Summary table (hamburger menu in the top-right -> Summary -> System Diagnostics in the bottom-right).
>
> Thanks
> -Mark


AW: Possible memory leak in PutHive3Streaming?

Posted by "Hengesbach, Martin" <Ma...@fiz-Karlsruhe.de>.
Hi Matt and Mark,

@Matt: thank you, I think this will be the solution.

@Mark: The memory isn't freed by the garbage collector. My NiFi stopped 2 hours ago and isn't reachable anymore. The last error message is:

2020-05-13 04:28:34,963 WARN [Timer-Driven Process Thread-232] o.a.hive.streaming.AbstractRecordWriter LOW MEMORY ALERT! Tenured gen memory is already low. Increase memory to improve performance. Used: 128.79GB Max: 130.00GB

Conclusion: only a fixed Hive client will solve the problem.

Thank you both
Martin

From: Matt Burgess [mailto:mattyb149@apache.org]
Sent: Tuesday, May 12, 2020, 19:24
To: users@nifi.apache.org
Subject: Re: Possible memory leak in PutHive3Streaming?

Martin,

I confirmed the comment you found in the Jira: the fix is indeed not in Apache Hive 3.1.1 or 3.1.2, which is very unfortunate (especially given the incorrect Fix Version). Wearing my "vendor hat", I know there is a solution, as I work heavily on NiFi / Hive integration. Wearing my "Apache hat", I imagine we'd need a 3.1.3 release of Hive with the fix actually in it, and an update to NiFi to use that version. I have reached out to the Hive community asking whether there will be a 3.1.3 release and, if so, whether it can include this fix. If it does, I will write up the NiFi Jira and upgrade the Hive 3 library for the subsequent release.

Regards,
Matt



Re: Possible memory leak in PutHive3Streaming?

Posted by Matt Burgess <ma...@apache.org>.
Martin,

I confirmed the comment you found in the Jira: the fix is indeed not in Apache Hive 3.1.1 or 3.1.2, which is very unfortunate (especially given the incorrect Fix Version). Wearing my "vendor hat", I know there is a solution, as I work heavily on NiFi / Hive integration. Wearing my "Apache hat", I imagine we'd need a 3.1.3 release of Hive with the fix actually in it, and an update to NiFi to use that version. I have reached out to the Hive community asking whether there will be a 3.1.3 release and, if so, whether it can include this fix. If it does, I will write up the NiFi Jira and upgrade the Hive 3 library for the subsequent release.

Regards,
Matt



AW: Possible memory leak in PutHive3Streaming?

Posted by "Hengesbach, Martin" <Ma...@fiz-Karlsruhe.de>.
Hi Mark and Matt,



Thank you for your answers. I have checked the versions of the Hive client libraries. The output is as expected version 3.1.2:



work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-hcatalog-core-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-upgrade-acid-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-exec-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-streaming-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-serde-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-common-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-vector-code-gen-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-tez-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-classification-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-0.23-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-common-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-client-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-standalone-metastore-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-service-rpc-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-scheduler-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-jdbc-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-shims-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-common-3.1.2-tests.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-server-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-metastore-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-llap-common-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-cli-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-service-3.1.2.jar

work/nar/extensions/nifi-hive3-nar-1.11.4.nar-unpacked/NAR-INF/bundled-dependencies/hive-storage-api-2.7.0.jar


I have done a new test: 12,000 XML documents with a total of 55 GB in size. I have restarted NiFi before the test.

At the beginning I have:
[cid:image005.png@01D6286E.86C99600]
2020-05-12 08:13:07,426 INFO [Timer-Driven Process Thread-232] o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters: 50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 659.08MB/130.00GB]

At the end of the test:

2020-05-12 10:28:12,911 INFO [Timer-Driven Process Thread-257] o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 72.18GB/130.00GB]

The increase is roughly the same as the total size of my documents. It seems that memory is allocated somewhere in the size of each document but never freed.
Matt, HIVE-20979 has a comment from February saying that the bug is not fixed in 3.1.2, but I'm not sure whether this bug can lead to such a large memory leak.
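
To narrow this down further, I could take a heap histogram while the usage is climbing to see which classes retain the memory. With standard JDK tools (assuming <nifi-pid> is the process id of the NiFi JVM) that could look like:

# class histogram of live objects (the :live option forces a full GC first)
jmap -histo:live <nifi-pid> | head -n 30
# or, with jcmd:
jcmd <nifi-pid> GC.class_histogram | head -n 30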



Any ideas?





Best regards,

Martin





-----Original Message-----
From: Matt Burgess [mailto:mattyb149@apache.org]
Sent: Monday, May 11, 2020 23:00
To: users@nifi.apache.org
Subject: Re: Possible memory leak in PutHive3Streaming?



Martin,



There were two memory leaks a while back, one in NiFi code and one in Hive client code (brought in as a dependency for the Hive 3 components). NiFi has fixed their side in 1.9.0 (via NIFI-5841 [1]) and Hive has fixed their side in 3.1.1 (via HIVE-20979 [2]). Until NiFi 1.11.4, we were still using Hive 3.1.0 dependencies which still has the leak. But as of 1.11.4 (via NIFI-7239 [3]) we upgraded our Hive 3 dependencies to 3.1.2, which means those memory leaks should be gone. Can you verify the versions of Hive 3 in your NiFi installation with something like:



find work -name "hive*.jar" | grep hive3



That should show you the Hive JARs in the Hive 3 NAR, and the version is part of the filename of each.



Regards,

Matt



[1] https://issues.apache.org/jira/browse/NIFI-5841

[2] https://issues.apache.org/jira/browse/HIVE-20979

[3] https://issues.apache.org/jira/browse/NIFI-7239



------------------------------------------------------------------------------

FIZ Karlsruhe - Leibniz-Institut für Informationsinfrastruktur GmbH.
Registered office: Eggenstein-Leopoldshafen, Amtsgericht Mannheim HRB 101892.
Managing Director: Sabine Brünger-Weilandt.
Chair of the Supervisory Board: MinDirig'in Dr. Angelika Willms-Herget.

FIZ Karlsruhe is certified with the "audit berufundfamilie" seal.

Re: Possible memory leak in PutHive3Streaming?

Posted by Matt Burgess <ma...@apache.org>.
Martin,

There were two memory leaks a while back, one in NiFi code and one in
Hive client code (brought in as a dependency for the Hive 3
components). NiFi has fixed their side in 1.9.0 (via NIFI-5841 [1])
and Hive has fixed their side in 3.1.1 (via HIVE-20979 [2]). Until
NiFi 1.11.4, we were still using Hive 3.1.0 dependencies which still
has the leak. But as of 1.11.4 (via NIFI-7239 [3]) we upgraded our
Hive 3 dependencies to 3.1.2, which means those memory leaks should be
gone. Can you verify the versions of Hive 3 in your NiFi installation
with something like:

find work -name "hive*.jar" | grep hive3

That should show you the Hive JARs in the Hive 3 NAR, and the version
is part of the filename of each.
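
To see the versions at a glance you could also reduce the filenames to
just the version strings, for example (a quick sketch; assumes a sed
that supports -E):

find work -name "hive*.jar" | grep hive3 | sed -E 's/.*-([0-9][0-9.]*)(-tests)?\.jar$/\1/' | sort -u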

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-5841
[2] https://issues.apache.org/jira/browse/HIVE-20979
[3] https://issues.apache.org/jira/browse/NIFI-7239

On Mon, May 11, 2020 at 4:42 PM Mark Payne <ma...@hotmail.com> wrote:
>
> Martin,
>
> I don’t know a lot about Hive. The log message you’re noting there is definitely coming from Hive’s client, but I can’t say whether or not it’s anything that should actually be concerning in terms of Java memory/heap. Is the NiFi heap size actually set to a max of 130 GB? (That would be a really massive heap for NiFi). Are you seeing any full garbage collections occurring? This information would be available in the Cluster table if you’re running clustered, or else in the Summary table (hamburger menu in the top-right -> Summary -> System Diagnostics in the bottom-right).
>
> Thanks
> -Mark
>
>
> On May 11, 2020, at 5:20 AM, Hengesbach, Martin <Ma...@fiz-Karlsruhe.de> wrote:
>
> Hi,
>
> I’m using the PutHive3Streaming processor to write round about 30,000 avro records into a Hive table (round about 150 GB). This works for a while but the used memory is increasing over the time. And after some thousands records, the memory is full.
>
> When I remove the PutHive3Streaming processor from my flow, everything works OK. So, the problem must be the PutHive3Streaming processor.
>
> As Record Reader I’m using the AvroReader, the NiFi version is 1.11.4, Hive is version 3.1.0.
>
> The Hive Table is created as follow:
> create table yyyy(id string, data string) clustered by (id) into 50 buckets stored as orc tblproperties("transactional"="true")
>
> Any ideas if I make something wrong or is there really a memory leak in the processor?
>
> Thanks in advance and best regards
>
> Martin
>
>
>
> 2020-05-08 09:37:22,076 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Creating metastore client for streaming-connection
> 2020-05-08 09:37:22,091 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with URI thrift://xxxxxx:9083
> 2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, current connections: 1
> 2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
> 2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) retries=24 delay=5 lifetime=0
> 2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Creating metastore client for streaming-connection-heartbeat
> 2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with URI thrift://xxxx:9083
> 2020-05-08 09:37:22,094 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, current connections: 2
> 2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
> 2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) retries=24 delay=5 lifetime=0
> 2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection STREAMING CONNECTION INFO: { metastore-uri: thrift://xxxxxxxx:9083, database: default, table: yyyy, partitioned-table: false, dynamic-partitioning: false, username: nifi, secure-mode: false, record-writer: HiveRecordWriter, agent-info: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44] }
> 2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Starting heartbeat thread with interval: 150000 ms initialDelay: 64376 ms for agentInfo: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44]
> 2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Created new filesystem instance: 484197097
> 2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Memory monitorings settings - autoFlush: true memoryUsageThreshold: 0.7 ingestSizeThreshold: 0
> 2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hadoop.hive.common.HeapMemoryMonitor Setting collection usage threshold to 2147483647
> 2020-05-08 09:37:22,233 WARN [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter LOW MEMORY ALERT! Tenured gen memory is already low. Increase memory to improve performance. Used: 129.54GB Max: 130.00GB
> 2020-05-08 09:37:22,233 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Opened new transaction batch TxnId/WriteIds=[292449/7336...292449/7336] on connection = { metaStoreUri: thrift://xxxxxxxx:9083, database: default, table: yyyy };  TxnStatus[O] LastUsed txnid:0
> 2020-05-08 09:37:22,272 INFO [Timer-Driven Process Thread-44] org.apache.orc.impl.PhysicalFsWriter ORC writer created for path: hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
> 2020-05-08 09:37:22,292 INFO [Timer-Driven Process Thread-44] org.apache.orc.impl.WriterImpl ORC writer created for path: hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
> 2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters: 50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
> 2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Flushing record updater for partitions: default.yyyy
> 2020-05-08 09:37:22,578 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats after flush: [record-updaters: 50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
> 2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats before close: [record-updaters: 50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
> 2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Closing updater for partitions: default.yyyy
> 2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
> 2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 1
> 2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 0
> 2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Closed streaming connection. Agent: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44] Stats: {records-written: 1, records-size: 0, committed-transactions: 1, aborted-transactions: 0, auto-flushes: 0, metastore-calls: 7 }
>
> At the beginning the memory starts with Used: 2.63GB Max: 130.00GB
>

Re: Possible memory leak in PutHive3Streaming?

Posted by Mark Payne <ma...@hotmail.com>.
Martin,

I don’t know a lot about Hive. The log message you’re noting there is definitely coming from Hive’s client, but I can’t say whether or not it’s anything that should actually be concerning in terms of Java memory/heap. Is the NiFi heap size actually set to a max of 130 GB? (That would be a really massive heap for NiFi). Are you seeing any full garbage collections occurring? This information would be available in the Cluster table if you’re running clustered, or else in the Summary table (hamburger menu in the top-right -> Summary -> System Diagnostics in the bottom-right).
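
If the UI is awkward to reach, something like jstat against the NiFi JVM would also show old-gen occupancy and full-GC activity (assuming the JDK tools are installed on that host and <nifi-pid> is the NiFi process id):

# print heap utilization and GC counters every 5 seconds
jstat -gcutil <nifi-pid> 5000
# the O column is old-gen occupancy in percent; FGC/FGCT are full-GC count and total time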

Thanks
-Mark


On May 11, 2020, at 5:20 AM, Hengesbach, Martin <Ma...@fiz-Karlsruhe.de>> wrote:

Hi,

I’m using the PutHive3Streaming processor to write round about 30,000 avro records into a Hive table (round about 150 GB). This works for a while but the used memory is increasing over the time. And after some thousands records, the memory is full.

When I remove the PutHive3Streaming processor from my flow, everything works OK. So, the problem must be the PutHive3Streaming processor.

As Record Reader I’m using the AvroReader, the NiFi version is 1.11.4, Hive is version 3.1.0.

The Hive Table is created as follow:
create table yyyy(id string, data string) clustered by (id) into 50 buckets stored as orc tblproperties("transactional"="true")

Any ideas if I make something wrong or is there really a memory leak in the processor?

Thanks in advance and best regards

Martin



2020-05-08 09:37:22,076 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Creating metastore client for streaming-connection
2020-05-08 09:37:22,091 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with URI thrift://xxxxxx:9083
2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, current connections: 1
2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) retries=24 delay=5 lifetime=0
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Creating metastore client for streaming-connection-heartbeat
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with URI thrift://xxxx:9083
2020-05-08 09:37:22,094 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, current connections: 2
2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) retries=24 delay=5 lifetime=0
2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection STREAMING CONNECTION INFO: { metastore-uri: thrift://xxxxxxxx:9083, database: default, table: yyyy, partitioned-table: false, dynamic-partitioning: false, username: nifi, secure-mode: false, record-writer: HiveRecordWriter, agent-info: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44] }
2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Starting heartbeat thread with interval: 150000 ms initialDelay: 64376 ms for agentInfo: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44]
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Created new filesystem instance: 484197097
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Memory monitorings settings - autoFlush: true memoryUsageThreshold: 0.7 ingestSizeThreshold: 0
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] o.a.hadoop.hive.common.HeapMemoryMonitor Setting collection usage threshold to 2147483647
2020-05-08 09:37:22,233 WARN [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter LOW MEMORY ALERT! Tenured gen memory is already low. Increase memory to improve performance. Used: 129.54GB Max: 130.00GB
2020-05-08 09:37:22,233 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Opened new transaction batch TxnId/WriteIds=[292449/7336...292449/7336] on connection = { metaStoreUri: thrift://xxxxxxxx:9083, database: default, table: yyyy };  TxnStatus[O] LastUsed txnid:0
2020-05-08 09:37:22,272 INFO [Timer-Driven Process Thread-44] org.apache.orc.impl.PhysicalFsWriter ORC writer created for path: hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
2020-05-08 09:37:22,292 INFO [Timer-Driven Process Thread-44] org.apache.orc.impl.WriterImpl ORC writer created for path: hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters: 50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Flushing record updater for partitions: default.yyyy
2020-05-08 09:37:22,578 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats after flush: [record-updaters: 50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats before close: [record-updaters: 50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Closing updater for partitions: default.yyyy
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 1
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, current connections: 0
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] o.a.h.streaming.HiveStreamingConnection Closed streaming connection. Agent: NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 152[Timer-Driven Process Thread-44] Stats: {records-written: 1, records-size: 0, committed-transactions: 1, aborted-transactions: 0, auto-flushes: 0, metastore-calls: 7 }

At the beginning the memory starts with Used: 2.63GB Max: 130.00GB
