Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/11/27 04:55:12 UTC

[jira] [Commented] (TAJO-1209) Pluggable line (de)serializer for DelimitedTextFile

    [ https://issues.apache.org/jira/browse/TAJO-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227233#comment-14227233 ] 

ASF GitHub Bot commented on TAJO-1209:
--------------------------------------

GitHub user hyunsik opened a pull request:

    https://github.com/apache/tajo/pull/271

    TAJO-1209: Pluggable line (de)serializer for DelimitedTextFile.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hyunsik/tajo TAJO-1209

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #271
    
----
commit 7837dde631cb4d66635663b5c80db47a944db044
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-11-27T03:39:12Z

    TAJO-1209: Pluggable line (de)serializer for DelimitedTextFile.

----


> Pluggable line (de)serializer for DelimitedTextFile
> ---------------------------------------------------
>
>                 Key: TAJO-1209
>                 URL: https://issues.apache.org/jira/browse/TAJO-1209
>             Project: Tajo
>          Issue Type: Improvement
>          Components: storage
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.9.1
>
>
> DelimitedTextFile currently reads line-delimited text files and parses each line directly into CSV or TSV fields. This hard-wired parsing is limiting when we deal with custom text-based file formats.
> This patch enables DelimitedTextFile to use a pluggable line (de)serializer.
> First, I added an abstract class for user-defined line serde classes, as follows:
> {code:java}
> public abstract class TextLineSerde {
>   protected Schema schema;
>   protected TableMeta meta;
>   protected int[] targetColumnIndexes;
>   public TextLineSerde(Schema schema, TableMeta meta, int[] targetColumnIndexes) {
>     this.schema = schema;
>     this.meta = meta;
>     this.targetColumnIndexes = targetColumnIndexes;
>   }
>   public abstract void init();
>   public abstract void buildTuple(final ByteBuf buf, Tuple tuple) throws IOException;
>   public abstract void release();
> }
> {code}
> I also added a table property {{text.serde.class}}, which allows users to specify a custom line serde. This table property affects only the {{TEXT}} file format. You can specify your own line serde as follows:
> {code:sql}
> CREATE TABLE XXX (x int, y int) USING TEXT WITH ('text.serde.class' = 'org.apache.tajo.storage.text.CSVLineSerder')
> {code}
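
To illustrate how the abstract class quoted above might be extended, here is a minimal sketch of a user-defined line serde. It is not taken from the patch. Schema, TableMeta, Tuple and ByteBuf below are simplified, hypothetical stand-ins so that the example compiles without Tajo or Netty on the classpath, and SimpleDelimitedLineSerde, the delimiter field and Tuple.put(int, String) are names invented for this example. Only the shape of TextLineSerde follows the issue description.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the Tajo and Netty types (assumptions, not the real APIs).
final class Schema {
  final String[] columnNames;
  Schema(String... columnNames) { this.columnNames = columnNames; }
}

final class TableMeta {
  final char delimiter; // assumed: the field delimiter is available from the table meta
  TableMeta(char delimiter) { this.delimiter = delimiter; }
}

final class Tuple {
  final String[] values;
  Tuple(int size) { this.values = new String[size]; }
  void put(int index, String value) { values[index] = value; }
}

final class ByteBuf {
  private final byte[] bytes;
  ByteBuf(byte[] bytes) { this.bytes = bytes; }
  String toString(Charset charset) { return new String(bytes, charset); }
}

// Same shape as the abstract class in the issue description.
abstract class TextLineSerde {
  protected Schema schema;
  protected TableMeta meta;
  protected int[] targetColumnIndexes;

  TextLineSerde(Schema schema, TableMeta meta, int[] targetColumnIndexes) {
    this.schema = schema;
    this.meta = meta;
    this.targetColumnIndexes = targetColumnIndexes;
  }

  public abstract void init();
  public abstract void buildTuple(final ByteBuf buf, Tuple tuple) throws java.io.IOException;
  public abstract void release();
}

// Hypothetical serde: splits one line on the delimiter and fills only the projected columns.
class SimpleDelimitedLineSerde extends TextLineSerde {
  private String splitPattern;

  SimpleDelimitedLineSerde(Schema schema, TableMeta meta, int[] targetColumnIndexes) {
    super(schema, meta, targetColumnIndexes);
  }

  @Override
  public void init() {
    // Quote the delimiter so characters such as '|' are not treated as regex operators.
    splitPattern = Pattern.quote(String.valueOf(meta.delimiter));
  }

  @Override
  public void buildTuple(final ByteBuf buf, Tuple tuple) {
    // A limit of -1 keeps trailing empty fields.
    String[] fields = buf.toString(StandardCharsets.UTF_8).split(splitPattern, -1);
    for (int index : targetColumnIndexes) {
      // Columns missing from a short line are set to null.
      tuple.put(index, index < fields.length ? fields[index] : null);
    }
  }

  @Override
  public void release() {
    splitPattern = null;
  }
}
```

In this sketch, init() prepares per-scan state, buildTuple() is called once per line, and release() clears that state. Columns outside targetColumnIndexes are left untouched, so a projection of columns 0 and 2 never writes the value of column 1.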



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)