Posted to issues@arrow.apache.org by "Vimal Varghese (Jira)" <ji...@apache.org> on 2022/06/28 09:04:00 UTC

[jira] [Created] (ARROW-16920) [Java]: DictionaryProvider leaks memory while adding dictionaries with duplicate encoding

[Vimal Varghese](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=jaguarpo) **created** an issue

[Apache Arrow](https://issues.apache.org/jira/browse/ARROW) / [ARROW-16920](https://issues.apache.org/jira/browse/ARROW-16920)

| Field | Value |
|---|---|
| Issue Type | Bug |
| Affects Versions | 7.0.0 |
| Assignee | Unassigned |
| Components | Java |
| Created | 28/Jun/22 09:03 |
| Priority | Major |
| Reporter | [Vimal Varghese](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=jaguarpo) |

DictionaryProvider leaks memory when dictionaries with a duplicate
encoding are added. Is this expected? Should the provider release the memory of
the existing dictionary vector when it accepts another one with the same
encoding ID?

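Sample code:

    import org.apache.arrow.memory.RootAllocator
    import org.apache.arrow.vector.complex.ListVector
    import org.apache.arrow.vector.complex.impl.UnionListWriter
    import org.apache.arrow.vector.dictionary.{Dictionary, DictionaryProvider}
    import org.apache.arrow.vector.types.pojo.DictionaryEncoding
    import scala.jdk.CollectionConverters._ // Scala 2.13+; on 2.12 use scala.collection.JavaConverters._

    // ScalaTest case; it belongs inside a FlatSpec-style spec that mixes in Matchers.
    "DictionaryProvider" should "not leak memory when adding dictionaries with a duplicate encoding ID" in {

      val allocator: RootAllocator = new RootAllocator()

      val vector: ListVector = ListVector.empty("vector", allocator)
      val dictionaryVector1: ListVector = ListVector.empty("dict1", allocator)
      val dictionaryVector2: ListVector = ListVector.empty("dict2", allocator)

      // Allocate buffers for the data vector and for both dictionary vectors.
      val writer1: UnionListWriter = vector.getWriter
      writer1.allocate()
      writer1.setValueCount(1)

      val dictWriter1: UnionListWriter = dictionaryVector1.getWriter
      dictWriter1.allocate()
      dictWriter1.setValueCount(1)

      val dictWriter2: UnionListWriter = dictionaryVector2.getWriter
      dictWriter2.allocate()
      dictWriter2.setValueCount(1)

      // Both dictionaries use the same encoding ID (1L) and no explicit index type.
      val dictionary1: Dictionary = new Dictionary(dictionaryVector1, new DictionaryEncoding(1L, false, None.orNull))
      val dictionary2: Dictionary = new Dictionary(dictionaryVector2, new DictionaryEncoding(1L, false, None.orNull))

      val provider = new DictionaryProvider.MapDictionaryProvider
      provider.put(dictionary1)
      provider.put(dictionary2) // same ID as dictionary1

      // Release everything that is still reachable through the provider.
      vector.clear()
      provider.getDictionaryIds.asScala.foreach(id => provider.lookup(id).getVector.clear())

      // Fails: the buffers of dictionaryVector1 were never released.
      allocator.getAllocatedMemory shouldBe 0
    }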
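
Until the intended behaviour is confirmed, one possible caller-side workaround is sketched below. It assumes that `MapDictionaryProvider` stores dictionaries in a map keyed by encoding ID, so a second `put` with the same ID replaces the entry, and that `lookup` returns `null` for an unknown ID. `ReleasingPut` and `putReleasing` are hypothetical names, not part of Arrow, and the sketch uses `IntVector` instead of `ListVector` only to stay short.

```scala
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.IntVector
import org.apache.arrow.vector.dictionary.{Dictionary, DictionaryProvider}
import org.apache.arrow.vector.types.pojo.DictionaryEncoding

// Hypothetical helper object; not part of the Arrow API.
object ReleasingPut {

  // Registers `dictionary`, first closing the vector of any dictionary that is
  // already stored under the same encoding ID (assumption: lookup returns null
  // when the ID is not registered).
  def putReleasing(provider: DictionaryProvider.MapDictionaryProvider, dictionary: Dictionary): Unit = {
    val previous = provider.lookup(dictionary.getEncoding.getId)
    // Do not close the vector if the caller re-registers the very same one.
    if (previous != null && (previous.getVector ne dictionary.getVector)) {
      previous.getVector.close()
    }
    provider.put(dictionary)
  }

  def main(args: Array[String]): Unit = {
    val allocator = new RootAllocator()
    val provider = new DictionaryProvider.MapDictionaryProvider()

    // Builds a one-element dictionary that always uses encoding ID 1.
    def newDictionary(name: String): Dictionary = {
      val values = new IntVector(name, allocator)
      values.allocateNew(1)
      values.setSafe(0, 42)
      values.setValueCount(1)
      new Dictionary(values, new DictionaryEncoding(1L, false, null))
    }

    putReleasing(provider, newDictionary("dict1"))
    putReleasing(provider, newDictionary("dict2")) // closes the vector of dict1

    // Release the dictionary that is still registered, then report what remains.
    provider.lookup(1L).getVector.close()
    println(s"allocated bytes: ${allocator.getAllocatedMemory}")
    allocator.close()
  }
}
```

Closing the previous vector inside the helper is only appropriate when the provider is the sole owner of its dictionary vectors; if the replaced vector is still used elsewhere, the caller would need to keep it alive and release it separately.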
  