You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@avro.apache.org by "zcsizmadia (via GitHub)" <gi...@apache.org> on 2023/08/14 13:53:27 UTC

[GitHub] [avro] zcsizmadia commented on a diff in pull request #2439: AVRO-3802: [Csharp] Fix memory leak on deflate codec decompression

zcsizmadia commented on code in PR #2439:
URL: https://github.com/apache/avro/pull/2439#discussion_r1293479830


##########
lang/csharp/src/apache/main/IO/BinaryEncoder.cs:
##########
@@ -228,5 +228,8 @@ public void Flush()
         {
             Stream.Flush();
         }
+
+        /// <inheritdoc />
+        public void Dispose() => Stream?.Dispose();

Review Comment:
   Same issue as BinaryDecoder.Dispose



##########
lang/csharp/src/apache/test/File/FileTests.cs:
##########
@@ -668,6 +668,55 @@ public void TestPartialReadAll([Values(specificSchema)] string schemaStr, [Value
             }
         }
 
+        [Test]
+        public void TestDeflateReadMemoryUsage([Values(specificSchema)] string schemaStr)
+        {
+            // create and write out
+            IList<Foo> records = MakeRecords(GetTestFooObject());
+
+            Process currentProcess = Process.GetCurrentProcess();
+
+            MemoryStream dataFileOutputStream = new MemoryStream();
+
+            Schema schema = Schema.Parse(schemaStr);
+            DatumWriter<Foo> writer = new SpecificWriter<Foo>(schema);
+            using (IFileWriter<Foo> dataFileWriter = DataFileWriter<Foo>.OpenWriter(writer, dataFileOutputStream, Codec.CreateCodec(Codec.Type.Deflate)))
+            {
+                for (int i = 0; i < 10; ++i)
+                {
+                    foreach (Foo foo in records)
+                    {
+                        dataFileWriter.Append(foo);
+                    }
+
+                    // write out block
+                    if (i == 1 || i == 4)
+                    {
+                        dataFileWriter.Sync();
+                    }
+                }
+            }
+
+            long startMemoryUsedBytes = currentProcess.WorkingSet64;
+
+            MemoryStream dataFileInputStream = new MemoryStream(dataFileOutputStream.ToArray());
+            dataFileInputStream.Position = 0;
+
+            // read back
+            IList<Foo> readRecords = new List<Foo>();
+            using (IFileReader<Foo> reader = DataFileReader<Foo>.OpenReader(dataFileInputStream, schema))
+            {
+                // read records from synced position
+                foreach (Foo rec in reader.NextEntries)
+                    readRecords.Add(rec);
+            }
+
+            long totalMemoryUsedBytes = currentProcess.WorkingSet64 - startMemoryUsedBytes;
+
+            Assert.IsTrue(totalMemoryUsedBytes  == 0, "Total memory usage in working set");

Review Comment:
   Agreed with @KalleOlaviNiemitalo. This test is unreliable and IMO not needed.



##########
lang/csharp/src/apache/main/File/DeflateCodec.cs:
##########
@@ -58,18 +58,19 @@ public override void Compress(MemoryStream inputStream, MemoryStream outputStrea
         /// <inheritdoc/>
         public override byte[] Decompress(byte[] compressedData, int length)
         {
-
-            MemoryStream inStream = new MemoryStream(compressedData);
-            MemoryStream outStream = new MemoryStream();
-
-            using (DeflateStream Decompress =
-                        new DeflateStream(inStream,
-                        CompressionMode.Decompress))
+            using (MemoryStream inStream = new MemoryStream(compressedData))
             {
-                CopyTo(Decompress, outStream);
+                using (MemoryStream outStream = new MemoryStream(inStream.Capacity))

Review Comment:
   Personally I like to stack the using in the beginning in cases like this. e.g. https://github.com/apache/avro/blob/master/lang/csharp/src/apache/codec/Avro.File.BZip2/BZip2.cs#L78-L79
   
   Addistionaly use the MemoryStream constructor to limit the inSTream access only to length, mentioned by @KalleOlaviNiemitalo 



##########
lang/csharp/src/apache/main/IO/BinaryDecoder.cs:
##########
@@ -296,5 +296,8 @@ private void Skip(long p)
         {
             stream.Seek(p, SeekOrigin.Current);
         }
+
+        /// <inheritdoc />
+        public void Dispose() => stream?.Dispose();

Review Comment:
   This is a breaking change. This assumes, that the BinaryDecoder object takes ownership if the stream and once the BinaryDecoder is disposed, the stream is disposed as well and cannot be used any more. Most of the nurmal use cases the stream can be disposed here without any harm, however the calling code might need to keep trhe stream alive in special cases, e.g. there is still some additional data in the stream for other purposes.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org