Posted to jira@arrow.apache.org by "Apache Arrow JIRA Bot (Jira)" <ji...@apache.org> on 2022/09/29 17:52:00 UTC

[jira] [Assigned] (ARROW-16920) [Java] DictionaryProvider leaks memory while adding dictionaries with duplicate encoding

     [ https://issues.apache.org/jira/browse/ARROW-16920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Arrow JIRA Bot reassigned ARROW-16920:
---------------------------------------------

    Assignee:     (was: David Dali Susanibar Arce)

> [Java] DictionaryProvider leaks memory while adding dictionaries with duplicate encoding
> ----------------------------------------------------------------------------------------
>
>                 Key: ARROW-16920
>                 URL: https://issues.apache.org/jira/browse/ARROW-16920
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 7.0.0
>            Reporter: Vimal Varghese
>            Priority: Major
>
> DictionaryProvider leaks memory when dictionaries with a duplicate encoding ID are added. Is this expected? Should the provider release the memory of the existing dictionary vector when it accepts another one with the same encoding ID?
> Sample code:
> {code:java}
> "dictionaryProvider" should "not leak memory while adding dictionaries with duplicate encoding" in {
>   val allocator: RootAllocator = new RootAllocator()
>   val vector: ListVector = ListVector.empty("vector", allocator)
>   val dictionaryVector1: ListVector = ListVector.empty("dict1", allocator)
>   val dictionaryVector2: ListVector = ListVector.empty("dict2", allocator)
>   val writer1: UnionListWriter = vector.getWriter
>   writer1.allocate
>   writer1.setValueCount(1)
>   val dictWriter1: UnionListWriter = dictionaryVector1.getWriter
>   dictWriter1.allocate
>   dictWriter1.setValueCount(1)
>   val dictWriter2: UnionListWriter = dictionaryVector2.getWriter
>   dictWriter2.allocate
>   dictWriter2.setValueCount(1)
>   val dictionary1: Dictionary = new Dictionary(dictionaryVector1, new DictionaryEncoding(1L, false, None.orNull))
>   val dictionary2: Dictionary = new Dictionary(dictionaryVector2, new DictionaryEncoding(1L, false, None.orNull))
>   val provider = new DictionaryProvider.MapDictionaryProvider
>   provider.put(dictionary1)
>   provider.put(dictionary2)
>   vector.clear()
>   provider.getDictionaryIds.asScala.foreach(id => provider.lookup(id).getVector.clear())
>   allocator.getAllocatedMemory shouldBe 0
> } {code}
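
In the sample above, the second put replaces the entry for encoding ID 1, so the final cleanup loop can no longer reach dictionaryVector1 through the provider. One possible caller-side workaround, sketched below, is to look up and release any dictionary already registered under the same ID before calling put. This is only a sketch under stated assumptions: putReleasingPrevious and allocatedAfterCleanup are hypothetical names that are not part of Arrow, and the sketch assumes the caller owns the replaced vector and nothing else still references it.

```scala
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.complex.ListVector
import org.apache.arrow.vector.dictionary.{Dictionary, DictionaryProvider}
import org.apache.arrow.vector.types.pojo.DictionaryEncoding

object ReplaceDictionaryExample {

  // Hypothetical helper (not Arrow API): release the vector of any dictionary
  // already registered under the same encoding ID, then register the new one.
  def putReleasingPrevious(
      provider: DictionaryProvider.MapDictionaryProvider,
      dictionary: Dictionary): Unit = {
    // MapDictionaryProvider.lookup returns null when the ID is not registered.
    val previous = provider.lookup(dictionary.getEncoding.getId)
    if (previous != null && (previous.getVector ne dictionary.getVector)) {
      previous.getVector.close()
    }
    provider.put(dictionary)
  }

  // Hypothetical driver: registers two dictionaries under the same ID and
  // returns the bytes still held by the allocator after cleanup.
  def allocatedAfterCleanup(): Long = {
    val allocator = new RootAllocator()
    val dictionaryVector1 = ListVector.empty("dict1", allocator)
    val dictionaryVector2 = ListVector.empty("dict2", allocator)

    // Allocate buffers the same way the sample in the report does.
    for (v <- Seq(dictionaryVector1, dictionaryVector2)) {
      val writer = v.getWriter
      writer.allocate()
      writer.setValueCount(1)
    }

    // Both encodings use ID 1; a null index type is the equivalent of
    // None.orNull in the report.
    val dictionary1 = new Dictionary(dictionaryVector1, new DictionaryEncoding(1L, false, null))
    val dictionary2 = new Dictionary(dictionaryVector2, new DictionaryEncoding(1L, false, null))

    val provider = new DictionaryProvider.MapDictionaryProvider()
    putReleasingPrevious(provider, dictionary1)
    putReleasingPrevious(provider, dictionary2) // closes dictionaryVector1 first

    // Release the dictionary that is still registered.
    provider.lookup(1L).getVector.close()
    allocator.getAllocatedMemory
  }
}
```

Whether MapDictionaryProvider itself should release a replaced vector is the open question in this issue; the helper only keeps that decision on the caller's side.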



--
This message was sent by Atlassian Jira
(v8.20.10#820010)