Posted to issues@carbondata.apache.org by "Zhichao Zhang (JIRA)" <ji...@apache.org> on 2018/02/08 17:22:00 UTC

[jira] [Created] (CARBONDATA-2148) Use Row parser to replace current default parser:CSVStreamParserImp

Zhichao  Zhang created CARBONDATA-2148:
------------------------------------------

             Summary: Use Row parser to replace current default parser:CSVStreamParserImp
                 Key: CARBONDATA-2148
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2148
             Project: CarbonData
          Issue Type: Improvement
          Components: data-load, spark-integration
    Affects Versions: 1.3.0
            Reporter: Zhichao  Zhang
            Assignee: Zhichao  Zhang
             Fix For: 1.3.0


Currently the default value of 'carbon.stream.parser' is CSVStreamParserImp, which transforms InternalRow(0) into Array[Object]. InternalRow(0) holds the value of one line received from a socket. When data is received from Kafka, the schema of InternalRow is different, so one must either assemble the fields of the Kafka data Row into a String and store it as InternalRow(0), or define a new parser that converts the Kafka data Row to Array[Object]. The same work has to be repeated for every table.
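
The first workaround described above can be sketched roughly as follows. This is a Spark-free illustration only: the class CsvLineWorkaround and its methods toCsvLine and parseCsvLine are hypothetical names, not CarbonData APIs, and the comma delimiter is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is not CarbonData's CSVStreamParserImp.
public class CsvLineWorkaround {

    // Per-table glue code: flatten the structured fields into one delimited
    // string, which would then be stored as the single column of the row.
    public static String toCsvLine(List<Object> fields) {
        return fields.stream()
                .map(f -> f == null ? "" : f.toString())
                .collect(Collectors.joining(","));
    }

    // What a CSV-style parser then does: split the single string column
    // back into an array of field values (all of them strings).
    public static Object[] parseCsvLine(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // Assumed example fields standing in for one Kafka record.
        List<Object> kafkaFields = Arrays.asList(1, "alice", 3.5);
        String line = toCsvLine(kafkaFields);
        Object[] parsed = parseCsvLine(line);
        System.out.println(line);
        System.out.println(Arrays.toString(parsed));
    }
}
```

Note that the type information of each field is lost on the way through the string, and the flattening step has to be written again for each source schema.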

*Solution:*
Use a new parser called RowStreamParserImpl as the default parser instead of CSVStreamParserImp. This new parser automatically converts InternalRow to Array[Object] according to the schema. In general, source data is transformed into a structured Row object, so with this approach we do not need to define a parser for every table.
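
The idea of a schema-driven conversion can be sketched as below. This is a minimal, Spark-free illustration under stated assumptions, not the proposed implementation: SchemaDrivenRowParser is a hypothetical class, the schema is reduced to an ordered list of field names, and a Map stands in for the structured Row.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not CarbonData's row parser.
public class SchemaDrivenRowParser {

    // Ordered field names, used here as a stand-in for a table schema.
    private final String[] fieldNames;

    public SchemaDrivenRowParser(String... fieldNames) {
        this.fieldNames = fieldNames.clone();
    }

    // Walk the schema and pull each field out of the structured row,
    // producing one Object per column in schema order.
    public Object[] parse(Map<String, Object> row) {
        Object[] out = new Object[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            out[i] = row.get(fieldNames[i]); // fields absent from the row become null
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed example record standing in for a structured Row.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "alice");
        row.put("score", 3.5);

        // The same class serves tables with different schemas.
        SchemaDrivenRowParser users = new SchemaDrivenRowParser("id", "name", "score");
        SchemaDrivenRowParser scores = new SchemaDrivenRowParser("name", "score");
        System.out.println(Arrays.toString(users.parse(row)));
        System.out.println(Arrays.toString(scores.parse(row)));
    }
}
```

Because the conversion is driven by the schema rather than by table-specific code, only the schema changes from one table to the next.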

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)