Posted to issues@hive.apache.org by "Tao Li (JIRA)" <ji...@apache.org> on 2016/07/19 05:20:20 UTC

[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

    [ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383593#comment-15383593 ] 

Tao Li edited comment on HIVE-14170 at 7/19/16 5:19 AM:
--------------------------------------------------------

~Sahil Takiar I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set "--incremental" to true by default. It may be even better to deprecate the buffered row mode completely because of the OOM issue. I don't think this is a breaking change, since it does not affect the query result. I am not sure what the correct behavior with "--incremental=false" should be, though (maybe we have to stick with the "buffered page" mode in that case).

2. We may want to keep the IncrementalRows class unchanged and define a subclass (e.g. IncrementalRowsWithNormalization). The reason is that the non-table formats don't require column width normalization at all, so it's better to isolate the normalization-related code from those formats. With no code change other than setting the default of "--incremental" to true, the non-table formats should work fine. Only the table format would go through the normalization code path (e.g. your incremental normalization code).
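
To make point 2 concrete, here is a rough sketch of the split I have in mind. It is not Beeline's actual code: SketchRow, SketchIncrementalRows and SketchIncrementalRowsWithNormalization are hypothetical stand-ins, and the real IncrementalRows reads from a ResultSet rather than an Iterator. The base class passes rows through untouched, and only the subclass buffers up to "x" rows and recalculates the column widths per batch. It also assumes every row has the same number of columns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a result row: cell values plus the widths to pad them to. */
final class SketchRow {
    final String[] values;
    int[] widths; // defaults to each value's own length, i.e. no normalization

    SketchRow(String... values) {
        this.values = values;
        this.widths = new int[values.length];
        for (int i = 0; i < values.length; i++) widths[i] = values[i].length();
    }

    /** Renders the row as a table line, padding each cell to its assigned width. */
    String render() {
        StringBuilder sb = new StringBuilder("|");
        for (int i = 0; i < values.length; i++) {
            sb.append(' ').append(values[i]);
            for (int p = values[i].length(); p < widths[i]; p++) sb.append(' ');
            sb.append(" |");
        }
        return sb.toString();
    }
}

/** Plays the role of the unchanged IncrementalRows: one row at a time, no buffering. */
class SketchIncrementalRows implements Iterator<SketchRow> {
    private final Iterator<SketchRow> source;

    SketchIncrementalRows(Iterator<SketchRow> source) { this.source = source; }

    @Override public boolean hasNext() { return source.hasNext(); }
    @Override public SketchRow next() { return source.next(); }
}

/** Plays the role of the proposed subclass, used only by the table format. */
class SketchIncrementalRowsWithNormalization extends SketchIncrementalRows {
    private final int batchSize; // the "x" from the issue description
    private Iterator<SketchRow> batch = Collections.emptyIterator();

    SketchIncrementalRowsWithNormalization(Iterator<SketchRow> source, int batchSize) {
        super(source);
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.batchSize = batchSize;
    }

    @Override public boolean hasNext() { return batch.hasNext() || super.hasNext(); }

    @Override public SketchRow next() {
        if (!batch.hasNext()) batch = fillAndNormalize();
        return batch.next(); // throws NoSuchElementException when the source is exhausted
    }

    /** Buffers up to batchSize rows and gives them all the per-column maximum width. */
    private Iterator<SketchRow> fillAndNormalize() {
        List<SketchRow> buffer = new ArrayList<>();
        while (buffer.size() < batchSize && super.hasNext()) buffer.add(super.next());
        int cols = buffer.isEmpty() ? 0 : buffer.get(0).values.length;
        int[] max = new int[cols];
        for (SketchRow r : buffer)
            for (int i = 0; i < cols; i++) max[i] = Math.max(max[i], r.values[i].length());
        for (SketchRow r : buffer) r.widths = max;
        return buffer.iterator();
    }
}
```

With this shape, memory use is bounded by the batch size instead of the whole result set, and the non-table formats keep using the base class without touching any of the width logic. The trade-off is that widths can still change from one batch to the next.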


was (Author: taoli-hwx):
[~Sahil Takiar] I am pretty new to Hive, so correct me if I am wrong. 

1. We probably want to set the incremental to true by default. Maybe it's even better to deprecate the buffered row mode completely due to OOM issue. I don't think this is a breaking change since it does not affect the query result. I am not sure about the correct behavior with "--incremental=false" though (maybe we have to stick with "buffered page" mode for that case). 

2. We may want to keep IncrementalRows class unchanged and define a subclass (e.g. IncrementalRowsWithNormalization). The reason is that the non-table formats don't require column width normalization at all so it's better to isolate the normalization related code from these formats. Without any code change (other than setting default of incremental to true), the non-table formats should just work fine. Only the table format will involve the normalization code path (e.g. your incremental normalization code).

> Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14170
>                 URL: https://issues.apache.org/jira/browse/HIVE-14170
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Beeline
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed out immediately. However, if {{TableOutputFormat}} is used with this option the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of the optimal column width for {{TableOutputFormat}} (it can't because it only sees one row at a time). The output of {{BufferedRows}} looks much better because it can do this global calculation.
> If both {{--incremental}} and {{TableOutputFormat}} are used, the width should be re-calculated every "x" rows ("x" can be configurable, with a default of 1000).


