Posted to issues@orc.apache.org by "Renat Valiullin (JIRA)" <ji...@apache.org> on 2019/01/10 01:46:00 UTC

[jira] [Updated] (ORC-458) [C++] Redesign of ColumnVectorBatch/ColumnWriter

     [ https://issues.apache.org/jira/browse/ORC-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renat Valiullin updated ORC-458:
--------------------------------
    Description: 
The current implementation is inconvenient for nested types and has memory overhead, since the whole batch must be constructed before it is added to the writer.

It would be better to give each batch a link to its ColumnWriter, so that data can be flushed as soon as the batch is full:
{code:java}
auto listBatch = writer->createRowBatch(batchSize); // create batch tree
auto elementsBatch = listBatch->elements.get();

for (const auto& array : arrays) {
  for (const auto& element : array) {
    // a full batch is flushed to its ColumnWriter; add() resets the batch size to 0
    if (elementsBatch->size == batchSize) elementsBatch->add();
    elementsBatch->data[elementsBatch->size++] = element;
  }

  if (listBatch->size == batchSize) listBatch->add();
  listBatch->data[listBatch->size++] = array.size(); // sizes, not offsets
}

writer->add(listBatch); // writeStripe() if needed
{code}
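
To make the idea concrete outside of ORC, here is a minimal self-contained sketch of a batch that holds a link to its writer and flushes itself when full. Every name in it (ToyColumnWriter, ToyBatch, push, writeArrays) is hypothetical and exists only for illustration; none of them is part of the ORC C++ API, and the real ColumnWriter would encode data rather than copy it into a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a column writer: it only records what was flushed.
struct ToyColumnWriter {
  std::vector<int64_t> written;
  std::size_t flushes = 0;

  void write(const int64_t* values, std::size_t count) {
    written.insert(written.end(), values, values + count);
    ++flushes;
  }
};

// Hypothetical batch that keeps a link to its writer (assumption: capacity > 0).
struct ToyBatch {
  ToyBatch(ToyColumnWriter& w, std::size_t capacity)
      : writer(w), data(capacity), size(0) {
    assert(capacity > 0);
  }

  // Hand the buffered values to the writer and reset the batch size to 0.
  void add() {
    if (size > 0) {
      writer.write(data.data(), size);
      size = 0;
    }
  }

  // Append one value, flushing first if the batch is already full.
  void push(int64_t value) {
    if (size == data.size()) add();
    data[size++] = value;
  }

  ToyColumnWriter& writer;
  std::vector<int64_t> data;
  std::size_t size;
};

// Write nested arrays as two columns: element values and per-array sizes.
inline void writeArrays(const std::vector<std::vector<int64_t>>& arrays,
                        ToyBatch& elementsBatch, ToyBatch& listBatch) {
  for (const auto& array : arrays) {
    for (int64_t element : array) elementsBatch.push(element);
    listBatch.push(static_cast<int64_t>(array.size()));  // sizes, not offsets
  }
  // Flush whatever is left in the partially filled batches.
  elementsBatch.add();
  listBatch.add();
}
```

With this shape, memory use is bounded by the batch capacity rather than by the total number of elements, because neither column ever buffers more than one batch before handing it to its writer.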
 

> [C++] Redesign of ColumnVectorBatch/ColumnWriter 
> -------------------------------------------------
>
>                 Key: ORC-458
>                 URL: https://issues.apache.org/jira/browse/ORC-458
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Renat Valiullin
>            Priority: Major