Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/07/20 11:40:04 UTC

[jira] [Commented] (TAJO-1486) Tajo should be able to skip header and footer rows when creating external table

    [ https://issues.apache.org/jira/browse/TAJO-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633226#comment-14633226 ] 

ASF GitHub Bot commented on TAJO-1486:
--------------------------------------

Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/615#discussion_r34979963
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java ---
    @@ -197,6 +197,6 @@ public ByteBuf readLineBuf(AtomicInteger reads) throws IOException {
           }
         }
         reads.set(readBytes);
    -    return buffer.slice(startIndex, readBytes - newlineLength);
    +    return buffer.slice(startIndex, readBytes - newlineLength).retain();
    --- End diff --
    
    This buffer is shared until the ByteBufLineReader is closed. If you want to keep the sliced buffer, you must copy it to a new buffer.
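
    The sharing hazard described above can be illustrated with a minimal sketch. It uses java.nio.ByteBuffer from the JDK as a stand-in for Netty's ByteBuf so that it runs without extra dependencies; the class SliceVsCopy and the helpers sliceLine, copyLine and text are hypothetical names, not part of Tajo. In Netty terms, slice(index, length) returns a view over the same memory and retain() only changes the reference count, whereas copy(index, length) returns an independent buffer. The overwrite of the first byte below is an assumed simulation of the reader reusing its buffer for later reads.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SliceVsCopy {
    // Returns a view over buffer[start, start + length); it shares memory with `buffer`.
    static ByteBuffer sliceLine(ByteBuffer buffer, int start, int length) {
        ByteBuffer view = buffer.duplicate();   // independent position/limit, same bytes
        view.position(start).limit(start + length);
        return view.slice();
    }

    // Returns an independent copy of buffer[start, start + length).
    static ByteBuffer copyLine(ByteBuffer buffer, int start, int length) {
        ByteBuffer copy = ByteBuffer.allocate(length);
        copy.put(sliceLine(buffer, start, length));
        copy.flip();
        return copy;
    }

    // Decodes the remaining bytes without disturbing the buffer's position.
    static String text(ByteBuffer b) {
        return StandardCharsets.US_ASCII.decode(b.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.wrap(
            "header\nrow-1\n".getBytes(StandardCharsets.US_ASCII));

        ByteBuffer view = sliceLine(shared, 0, 6); // shares bytes with `shared`
        ByteBuffer copy = copyLine(shared, 0, 6);  // owns its own bytes

        // Assumption: simulate the reader overwriting its buffer on a later read.
        shared.put(0, (byte) 'X');

        System.out.println(text(view)); // prints "Xeader"
        System.out.println(text(copy)); // prints "header"
    }
}
```

    The view changes when the shared buffer is overwritten, while the copy keeps the original line. That is why a line that must outlive the next read needs to be copied rather than sliced.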


> Tajo should be able to skip header and footer rows when creating external table
> -------------------------------------------------------------------------------
>
>                 Key: TAJO-1486
>                 URL: https://issues.apache.org/jira/browse/TAJO-1486
>             Project: Tajo
>          Issue Type: Improvement
>    Affects Versions: 0.10.0
>            Reporter: Youngkyong Ko
>            Assignee: Jongyoung Park
>            Priority: Minor
>             Fix For: 0.11.0
>
>         Attachments: TAJO-1486-1.patch, TAJO-1486.patch
>
>
> It is quite common to see header/footer lines in real-world data sets, so skipping the first/last N lines in the "create external table" DDL can be a useful feature for Tajo users. This way, users do not need additional processing of data that another application generated with a header or footer, and can use the file directly for table operations.
> cf. The same feature was added in Hive 0.13: https://issues.apache.org/jira/browse/HIVE-5795



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)