Posted to issues@orc.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2017/08/10 22:49:00 UTC

[jira] [Updated] (ORC-228) Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configurable

     [ https://issues.apache.org/jira/browse/ORC-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated ORC-228:
-------------------------------
    Description: 
Currently, MemoryManagerImpl.addedRow() looks like this:
{noformat}
public void addedRow(int rows) throws IOException {
  rowsAddedSinceCheck += rows;
  if (rowsAddedSinceCheck >= ROWS_BETWEEN_CHECKS) {
    notifyWriters();
  }
}
{noformat}

It would be convenient for testing to set ROWS_BETWEEN_CHECKS to a low value, so that multiple stripes can be generated from very little data.

Currently, the only way to do this is to create a new MemoryManager that overrides this method and install it via OrcFile.WriterOptions, but that only works when you control the creation of the Writer.
For example, see _org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderNewBaseAndDelta()_

There is no way to do this through configuration parameters alone, for example to make a Hive query create multiple stripes with little data.
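
A minimal sketch of what a configurable threshold could look like. This is a self-contained toy class, not the actual ORC MemoryManagerImpl: the property key "example.rows.between.checks", the fallback value, the use of java.util.Properties in place of a Hadoop Configuration, and the counter reset inside notifyWriters() are all assumptions made for illustration.

```java
import java.util.Properties;

/** Toy stand-in for a memory manager; NOT the ORC class. */
class RowCheckSketch {
  // Hypothetical property key, chosen for this example only.
  static final String ROWS_BETWEEN_CHECKS_KEY = "example.rows.between.checks";
  // Illustrative fallback value; an assumption, not taken from the issue.
  static final long DEFAULT_ROWS_BETWEEN_CHECKS = 5000;

  private final long rowsBetweenChecks;
  private long rowsAddedSinceCheck = 0;
  private int notifications = 0;

  RowCheckSketch(Properties conf) {
    // Read the threshold from configuration instead of a hard-coded constant.
    String raw = conf.getProperty(ROWS_BETWEEN_CHECKS_KEY);
    long parsed = (raw == null)
        ? DEFAULT_ROWS_BETWEEN_CHECKS
        : Long.parseLong(raw.trim());
    if (parsed < 1) {
      throw new IllegalArgumentException(
          ROWS_BETWEEN_CHECKS_KEY + " must be at least 1, got " + parsed);
    }
    this.rowsBetweenChecks = parsed;
  }

  // Same shape as addedRow() in the description, with a per-instance threshold.
  void addedRow(int rows) {
    rowsAddedSinceCheck += rows;
    if (rowsAddedSinceCheck >= rowsBetweenChecks) {
      notifyWriters();
    }
  }

  // Stand-in for the real notification: count the call and reset the counter.
  private void notifyWriters() {
    notifications++;
    rowsAddedSinceCheck = 0;
  }

  int getNotifications() {
    return notifications;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(ROWS_BETWEEN_CHECKS_KEY, "10");
    RowCheckSketch mm = new RowCheckSketch(conf);
    for (int i = 0; i < 25; i++) {
      mm.addedRow(1);
    }
    System.out.println(mm.getNotifications()); // prints 2
  }
}
```

With the threshold set to 10, a test that writes only 25 rows already triggers two checks, which is the kind of behavior the request is after for producing multiple stripes from a small data set.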

  was:
currently addedRow() looks like
{noformat}
public void addedRow(int rows) throws IOException {
    rowsAddedSinceCheck += rows;
    if (rowsAddedSinceCheck >= ROWS_BETWEEN_CHECKS) {
      notifyWriters();
    }
  }
{noformat}

it would be convenient for testing to set ROWS_BETWEEN_CHECKS to a low value so that we can generate multiple stripes with very little data.

Currently the only way to do this is to create a new MemoryManager that overrides this method and install it via OrcFile.WriterOptions but this only works when you have control over creating the Writer.

There is no way to do this via some set of config params to make Hive query for example, create multiple stripes with little data.


> Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configurable
> -------------------------------------------------------
>
>                 Key: ORC-228
>                 URL: https://issues.apache.org/jira/browse/ORC-228
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman


