Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/11/27 04:55:12 UTC
[jira] [Commented] (TAJO-1209) Pluggable line (de)serializer for DelimitedTextFile
[ https://issues.apache.org/jira/browse/TAJO-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227233#comment-14227233 ]
ASF GitHub Bot commented on TAJO-1209:
--------------------------------------
GitHub user hyunsik opened a pull request:
https://github.com/apache/tajo/pull/271
TAJO-1209: Pluggable line (de)serializer for DelimitedTextFile.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hyunsik/tajo TAJO-1209
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/271.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #271
----
commit 7837dde631cb4d66635663b5c80db47a944db044
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-11-27T03:39:12Z
TAJO-1209: Pluggable line (de)serializer for DelimitedTextFile.
----
> Pluggable line (de)serializer for DelimitedTextFile
> ---------------------------------------------------
>
> Key: TAJO-1209
> URL: https://issues.apache.org/jira/browse/TAJO-1209
> Project: Tajo
> Issue Type: Improvement
> Components: storage
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Fix For: 0.9.1
>
>
> DelimitedTextFile reads line-delimited text files and parses each line directly into CSV or TSV fields. This is limiting when we deal with custom text-based file formats.
> This patch enables DelimitedTextFile to use a pluggable line (de)serializer.
> First, I added an abstract class for user-defined line serde classes, as follows:
> {code:java}
> public abstract class TextLineSerde {
>   protected Schema schema;
>   protected TableMeta meta;
>   protected int[] targetColumnIndexes;
>
>   public TextLineSerde(Schema schema, TableMeta meta, int[] targetColumnIndexes) {
>     this.schema = schema;
>     this.meta = meta;
>     this.targetColumnIndexes = targetColumnIndexes;
>   }
>
>   public abstract void init();
>
>   public abstract void buildTuple(final ByteBuf buf, Tuple tuple) throws IOException;
>
>   public abstract void release();
> }
> {code}
> I also added a table property {{text.serde.class}}, which allows users to specify a custom line serde. This table property affects only the {{TEXT}} file format. You can specify your own line serde as follows:
> {code:sql}
> CREATE TABLE XXX (x int, y int) USING TEXT WITH ('text.serde.class' = 'org.apache.tajo.storage.text.CSVLineSerder')
> {code}
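To illustrate how the abstract class quoted above might be extended, here is a minimal, self-contained sketch of a pipe-delimited line serde. It is not code from the patch. Schema, TableMeta and Tuple are hypothetical stand-ins defined inline so the example compiles without Tajo on the classpath, java.nio.ByteBuffer is used in place of the ByteBuf type named in the issue, and PipeLineSerde and parseLine are names invented for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PipeLineSerdeSketch {
  // Hypothetical stand-ins for Tajo's Schema, TableMeta and Tuple (assumptions, not the real classes).
  static class Schema {
    final String[] columnNames;
    Schema(String... columnNames) { this.columnNames = columnNames; }
  }
  static class TableMeta { }
  static class Tuple {
    final String[] values;
    Tuple(int size) { values = new String[size]; }
    void put(int index, String value) { values[index] = value; }
  }

  // Mirrors the abstract class from the issue; ByteBuffer replaces ByteBuf in this sketch.
  abstract static class TextLineSerde {
    protected Schema schema;
    protected TableMeta meta;
    protected int[] targetColumnIndexes;

    TextLineSerde(Schema schema, TableMeta meta, int[] targetColumnIndexes) {
      this.schema = schema;
      this.meta = meta;
      this.targetColumnIndexes = targetColumnIndexes;
    }

    public abstract void init();
    public abstract void buildTuple(final ByteBuffer buf, Tuple tuple) throws IOException;
    public abstract void release();
  }

  // Hypothetical serde: splits a line on '|' and fills only the projected columns.
  static class PipeLineSerde extends TextLineSerde {
    PipeLineSerde(Schema schema, TableMeta meta, int[] targetColumnIndexes) {
      super(schema, meta, targetColumnIndexes);
    }

    @Override public void init() { /* nothing to allocate in this sketch */ }

    @Override
    public void buildTuple(final ByteBuffer buf, Tuple tuple) throws IOException {
      // Decode a copy of the buffer view so the caller's read position is untouched.
      String line = StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
      String[] fields = line.split("\\|", -1); // -1 keeps trailing empty fields
      for (int index : targetColumnIndexes) {
        if (index >= fields.length) {
          throw new IOException("Line has too few fields: " + line);
        }
        tuple.put(index, fields[index]);
      }
    }

    @Override public void release() { /* nothing to free in this sketch */ }
  }

  // Runs one line through the serde against a fixed three-column schema (x, y, z).
  public static String[] parseLine(String line, int[] targetColumnIndexes) {
    Schema schema = new Schema("x", "y", "z");
    PipeLineSerde serde = new PipeLineSerde(schema, new TableMeta(), targetColumnIndexes);
    serde.init();
    Tuple tuple = new Tuple(schema.columnNames.length);
    try {
      serde.buildTuple(ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8)), tuple);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      serde.release();
    }
    return tuple.values;
  }

  public static void main(String[] args) {
    // Only columns 0 and 2 are projected, so column 1 stays null.
    System.out.println(Arrays.toString(parseLine("1|2|3", new int[] {0, 2}))); // prints [1, null, 3]
  }
}
```

The sketch follows the init / buildTuple / release lifecycle suggested by the abstract methods. How the actual patch handles type conversion, null values and buffer ownership is not shown in the issue and is not modeled here.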
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)