Posted to jira@arrow.apache.org by "David Dali Susanibar Arce (Jira)" <ji...@apache.org> on 2022/07/01 15:46:00 UTC

[jira] [Assigned] (ARROW-16920) [Java] DictionaryProvider leaks memory while adding dictionaries with duplicate encoding

     [ https://issues.apache.org/jira/browse/ARROW-16920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Dali Susanibar Arce reassigned ARROW-16920:
-------------------------------------------------

    Assignee: David Dali Susanibar Arce

> [Java] DictionaryProvider leaks memory while adding dictionaries with duplicate encoding
> ----------------------------------------------------------------------------------------
>
>                 Key: ARROW-16920
>                 URL: https://issues.apache.org/jira/browse/ARROW-16920
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 7.0.0
>            Reporter: Vimal Varghese
>            Assignee: David Dali Susanibar Arce
>            Priority: Major
>
> DictionaryProvider leaks memory when dictionaries with a duplicate encoding ID are added. Is this expected? Should the provider release the memory held by the existing dictionary vector when it accepts another dictionary with the same encoding ID?
> Sample code:
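> {code:java}
> // Imports assumed for this snippet (Arrow Java 7.0.0 with ScalaTest); adjust to your build.
> import org.apache.arrow.memory.RootAllocator
> import org.apache.arrow.vector.complex.ListVector
> import org.apache.arrow.vector.complex.impl.UnionListWriter
> import org.apache.arrow.vector.dictionary.{Dictionary, DictionaryProvider}
> import org.apache.arrow.vector.types.pojo.DictionaryEncoding
> import scala.collection.JavaConverters._
>
> "dictionaryProvider" should "not leak memory while adding dictionaries with duplicate encoding" in {
>   val allocator: RootAllocator = new RootAllocator()
>   val vector: ListVector = ListVector.empty("vector", allocator)
>   val dictionaryVector1: ListVector = ListVector.empty("dict1", allocator)
>   val dictionaryVector2: ListVector = ListVector.empty("dict2", allocator)
>   // Allocate buffers for the data vector and for both dictionary vectors.
>   val writer1: UnionListWriter = vector.getWriter
>   writer1.allocate()
>   writer1.setValueCount(1)
>   val dictWriter1: UnionListWriter = dictionaryVector1.getWriter
>   dictWriter1.allocate()
>   dictWriter1.setValueCount(1)
>   val dictWriter2: UnionListWriter = dictionaryVector2.getWriter
>   dictWriter2.allocate()
>   dictWriter2.setValueCount(1)
>   // Both dictionaries deliberately share encoding id 1L; the index type is left null.
>   val dictionary1: Dictionary = new Dictionary(dictionaryVector1, new DictionaryEncoding(1L, false, null))
>   val dictionary2: Dictionary = new Dictionary(dictionaryVector2, new DictionaryEncoding(1L, false, null))
>   val provider = new DictionaryProvider.MapDictionaryProvider()
>   provider.put(dictionary1)
>   provider.put(dictionary2)
>   vector.clear()
>   // Clear every dictionary vector that is still reachable through the provider.
>   provider.getDictionaryIds.asScala.foreach(id => provider.lookup(id).getVector.clear())
>   // Reported to fail: memory is still allocated at this point.
>   allocator.getAllocatedMemory shouldBe 0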
> } {code}
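
The leak described in the report can be illustrated with a small dependency-free model. This is only a sketch: it assumes the provider keeps dictionaries in a map keyed by encoding ID and overwrites the entry on a second put, which the report suggests but does not confirm. CountingAllocator, TrackedVector, MapProvider, putReleasingPrevious and leakedBytes are hypothetical names invented for the illustration, not Arrow classes or methods.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration only; none of these are Arrow classes.
object DictionaryLeakModel {

  /** Tracks outstanding bytes, loosely playing the role of an allocator. */
  final class CountingAllocator {
    private var allocated = 0L
    def allocate(bytes: Long): Unit = allocated += bytes
    def release(bytes: Long): Unit = allocated -= bytes
    def allocatedMemory: Long = allocated
  }

  /** Holds `bytes` from construction until clear() is called; clear() is idempotent. */
  final class TrackedVector(allocator: CountingAllocator, bytes: Long) {
    private var held = true
    allocator.allocate(bytes)
    def clear(): Unit = if (held) { allocator.release(bytes); held = false }
  }

  /** Map-backed provider keyed by encoding id (assumed behaviour, see lead-in). */
  final class MapProvider {
    private val byId = mutable.Map.empty[Long, TrackedVector]

    // Plain overwrite: the displaced vector is no longer reachable via the provider.
    def put(id: Long, vector: TrackedVector): Unit = byId.update(id, vector)

    // Alternative: release the displaced vector before dropping the reference to it.
    def putReleasingPrevious(id: Long, vector: TrackedVector): Unit =
      byId.put(id, vector).filter(_ ne vector).foreach(_.clear())

    // Clears only the vectors that are still registered.
    def clearAll(): Unit = byId.values.foreach(_.clear())
  }

  /** Registers two vectors under the same id, clears the provider, returns what is left. */
  def leakedBytes(releasePrevious: Boolean): Long = {
    val allocator = new CountingAllocator
    val provider = new MapProvider
    val first = new TrackedVector(allocator, 64L)
    val second = new TrackedVector(allocator, 64L)
    if (releasePrevious) {
      provider.putReleasingPrevious(1L, first)
      provider.putReleasingPrevious(1L, second)
    } else {
      provider.put(1L, first)
      provider.put(1L, second)
    }
    provider.clearAll()
    allocator.allocatedMemory
  }
}
```

In this model, leakedBytes(releasePrevious = false) leaves the first vector's 64 bytes allocated because nothing holds a reference to it once the second put replaces the map entry, while leakedBytes(releasePrevious = true) returns 0. Whether the provider should release the displaced vector itself, or leave that responsibility to the caller, is the design question the report raises.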



--
This message was sent by Atlassian Jira
(v8.20.10#820010)