Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/03/24 09:11:53 UTC

[jira] Created: (HIVE-360) Generalize the FileFormat Interface in Hive

Generalize the FileFormat Interface in Hive
-------------------------------------------

                 Key: HIVE-360
                 URL: https://issues.apache.org/jira/browse/HIVE-360
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao


Currently the FileFormat support in Hive is not generalized: we use "if ... else" to support TextFileFormat and SequenceFileFormat. There is no way to support a third format without changing the "if ... else" structure. We should define an interface for the FileFormat that Hive needs.

The OutputFileFormat interface that Hive requires will contain one more method than the Hadoop OutputFileFormat - create a File with a specific name.

Hive.g:409 (Hive.g already supports the custom file format, but DDLSemanticAnalyzer.java does not recognize it yet):
{code}
KW_STORED KW_AS KW_INPUTFORMAT inFmt=StringLiteral KW_OUTPUTFORMAT outFmt=StringLiteral
{code}

Please add the handling of TOK_TABLEFILEFORMAT here:
DDLSemanticAnalyzer.java:223
{code}
        case HiveParser.TOK_TBLSEQUENCEFILE:
        ...
{code}

Please add the handling of custom outputFormat here by adding a new interface (and cast the user-provided file format to that interface), instead of doing "if ... else"
FileSinkOperator.java:129-174:
{code}
      if(outputFormat instanceof IgnoreKeyTextOutputFormat) {
        finalPath = new Path(Utilities.toTempPath(conf.getDirName()), Utilities.getTaskId(hconf) +
                             Utilities.getFileExtension(jc, isCompressed));
      ...
{code}
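
A minimal sketch of the direction proposed above, using stand-in types rather than the real Hive or Hadoop classes. The names SketchOutputFormat, createWriter, SketchSink, and the two implementations are hypothetical; the point is only that the sink calls one interface method instead of branching on the concrete format class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the proposed Hive-side interface: the extra
// method creates a writer for a file with a caller-chosen name.
interface SketchOutputFormat {
    SketchWriter createWriter(String fileName);
}

// Hypothetical writer that only records what it was asked to write.
final class SketchWriter {
    final String fileName;
    final String kind;
    final List<String> rows = new ArrayList<>();

    SketchWriter(String fileName, String kind) {
        this.fileName = fileName;
        this.kind = kind;
    }

    void write(String row) {
        rows.add(row);
    }
}

final class SketchTextFormat implements SketchOutputFormat {
    @Override
    public SketchWriter createWriter(String fileName) {
        return new SketchWriter(fileName, "text");
    }
}

final class SketchSequenceFormat implements SketchOutputFormat {
    @Override
    public SketchWriter createWriter(String fileName) {
        return new SketchWriter(fileName, "sequence");
    }
}

final class SketchSink {
    // No "if ... else" on the concrete class: a third format only needs to
    // implement SketchOutputFormat.
    static SketchWriter open(SketchOutputFormat format, String fileName) {
        return format.createWriter(fileName);
    }
}
```

With this shape, adding a third format means adding one class; SketchSink.open stays unchanged.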


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694543#action_12694543 ] 

He Yongqiang commented on HIVE-360:
-----------------------------------

Thanks, Thusoo.
Actually, I am refactoring the code now.
I have talked with Zheng about the current patch. There are some improvements:
(1) Make HiveOutputFormat an interface that extends OutputFormat, and add a new getRecordWriter. The main difference between its getRecordWriter and Hadoop OutputFormat's getRecordWriter is that the new one accepts a path parameter and creates the output file when it is called.
(2) Make HiveSequenceFileOutputFormat extend Hadoop's SequenceFileOutputFormat and implement the new HiveOutputFormat.
(3) Deprecate Hive's IgnoreKeyOutputFormat and replace it with a new IgnoreKeyOutputFormat that uses the new HiveOutputFormat.

This way, the code will be clearer. The disadvantage is that HiveOutputFormat's signature looks like this:
{code}
HiveOutputFormat extends
    OutputFormat<WritableComparable, Writable>
{code} 
It can only use subclasses of WritableComparable as its key and subclasses of Writable as its value. I think that is OK in Hive, isn't it?

Should I cancel the patch now and resubmit once the refactoring is done?
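
To make the shape of proposal (1) concrete, here is a sketch with stand-in types. Key, Value, PathRecordWriter, PathOutputFormat, and Cell are hypothetical placeholders for WritableComparable, Writable, the record writer, and HiveOutputFormat, and an in-memory map stands in for the file system; the real signatures are in the patch, not here. It shows a getRecordWriter variant that receives the output path from the caller and creates the file at call time, with key and value types bounded as described.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical placeholders for Hadoop's Writable and WritableComparable.
interface Value { String render(); }
interface Key extends Value, Comparable<Key> { }

// Hypothetical record writer bound to those key and value types.
interface PathRecordWriter<K extends Key, V extends Value> {
    void write(K key, V value);
}

// Hypothetical counterpart of the proposed interface: the caller supplies
// the path. The map is an in-memory stand-in for a file system.
interface PathOutputFormat<K extends Key, V extends Value> {
    PathRecordWriter<K, V> getRecordWriter(Map<String, List<String>> fs, String outPath);
}

// Text-like format that ignores the key and stores one value per line.
final class IgnoreKeyLineFormat<K extends Key, V extends Value> implements PathOutputFormat<K, V> {
    @Override
    public PathRecordWriter<K, V> getRecordWriter(Map<String, List<String>> fs, String outPath) {
        // The "file" is created here, at call time, under the caller's name.
        List<String> lines = new ArrayList<>();
        fs.put(outPath, lines);
        return (key, value) -> lines.add(value.render());
    }
}

// Simple type usable as both key and value in this sketch.
final class Cell implements Key {
    private final String text;
    Cell(String text) { this.text = text; }
    @Override public String render() { return text; }
    @Override public int compareTo(Key other) { return render().compareTo(other.render()); }
}
```

Because of the bounds, a declaration such as IgnoreKeyLineFormat<String, String> would not compile in this sketch, which mirrors the restriction on key and value types mentioned above.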

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694957#action_12694957 ] 

Zheng Shao commented on HIVE-360:
---------------------------------

Currently we need a tableDesc in the getHiveRecordWriter method. Can we change it to java.util.Properties instead?
We don't need all the information about the table to create the File.
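
A sketch of what that narrower dependency could look like. Only the method name getHiveRecordWriter comes from the discussion; the parameter list, the PropsWriter type, and the property key are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical writer that only remembers the settings it was built with.
final class PropsWriter {
    final String path;
    final String delimiter;

    PropsWriter(String path, String delimiter) {
        this.path = path;
        this.delimiter = delimiter;
    }

    String format(String... fields) {
        return String.join(delimiter, fields);
    }
}

final class PropsWriterFactory {
    // Illustrative property key; not taken from the patch.
    static final String DELIM_KEY = "sketch.field.delim";

    // Takes plain Properties instead of a full table descriptor, so the
    // output format depends only on the settings it actually reads.
    static PropsWriter getHiveRecordWriter(String path, Properties tableProperties) {
        String delim = tableProperties.getProperty(DELIM_KEY, "\t");
        return new PropsWriter(path, delim);
    }
}
```

A side effect of this choice is that a custom format can be exercised in a unit test with a hand-built Properties object, without constructing any table metadata.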


> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696163#action_12696163 ] 

Zheng Shao commented on HIVE-360:
---------------------------------

This piece of code looks bad in FileSinkOperator.
Can you move it into the HiveOutputFileFormat by adding a new method, such as getFileExtension(jc, isCompressed)?

{code}
      if (outputFormat instanceof IgnoreKeyTextOutputFormat) {
        finalPath = new Path(Utilities.toTempPath(conf.getDirName()), Utilities
            .getTaskId(jc)
            + Utilities.getFileExtension(jc, isCompressed));
      }
{code}

Can you also make "initRecordWriter" in FileSinkOperator static?  There is just one member variable that is referenced: outPath, and you can make initRecordWriter return the outWriter value.
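
The static-method request can be sketched as follows with stand-in types: the helper takes everything it needs as parameters and returns the writer, so it no longer reads or assigns operator fields. SinkFormat, SinkWriter, and SinkOperatorSketch are hypothetical, and the parameter list is an assumption.

```java
// Hypothetical stand-ins; not the real FileSinkOperator types.
interface SinkFormat {
    SinkWriter open(String path);
}

final class SinkWriter {
    final String path;

    SinkWriter(String path) {
        this.path = path;
    }
}

final class SinkOperatorSketch {
    private String outPath;       // stand-in for the outPath member
    private SinkWriter outWriter; // stand-in for the outWriter member

    // Static: depends only on its arguments and returns the writer instead
    // of assigning a member variable.
    static SinkWriter initRecordWriter(SinkFormat format, String outPath) {
        return format.open(outPath);
    }

    // The instance method keeps ownership of the fields.
    void init(SinkFormat format, String dir, String taskId) {
        outPath = dir + "/" + taskId;
        outWriter = initRecordWriter(format, outPath);
    }

    SinkWriter writer() {
        return outWriter;
    }
}
```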


> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch
>
>


[jira] Assigned: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo reassigned HIVE-360:
----------------------------------

    Assignee: He Yongqiang

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-360:
----------------------------

    Attachment: HIVE-360.patch

Almost the same as Yongqiang's patch, except for fixing a few typos and similar small issues.

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, HIVE-360.patch, qfile.tar
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695227#action_12695227 ] 

Zheng Shao commented on HIVE-360:
---------------------------------

Good point. We should do an INSERT after the CREATE TABLE as well.


> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-360:
-------------------------------

    Affects Version/s: 0.4.0
               Status: Patch Available  (was: Open)

Marking this as patch submitted. That helps to get this on the radar for the reviewers.

Thanks,
Ashish

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: hive-360-2009-04-08.patch

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, hive-360-2009-04-08.patch, qfile.tar
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Status: Open  (was: Patch Available)

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697571#action_12697571 ] 

Joydeep Sen Sarma commented on HIVE-360:
----------------------------------------

Looked at this a bit - looks great to me.

One comment is that the getFinalPath call should be made part of the HiveOutputFormat as well. Actually, all we need is to let the output format determine the file extension; the rest of the path name is always the same. But it's not a big deal.
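
A sketch of that split, assuming stand-in types: the shared code builds the directory and task id part of the path, and only the extension is delegated to the format. ExtensionAware, PlainTextLike, BinaryLike, and the extensions they return are illustrative and not taken from Hive.

```java
// Hypothetical stand-in for the part of the output format that knows its
// own file extension.
interface ExtensionAware {
    String getFileExtension(boolean isCompressed);
}

final class PlainTextLike implements ExtensionAware {
    @Override
    public String getFileExtension(boolean isCompressed) {
        // Illustrative choice of extensions.
        return isCompressed ? ".gz" : ".txt";
    }
}

final class BinaryLike implements ExtensionAware {
    @Override
    public String getFileExtension(boolean isCompressed) {
        return ""; // this format adds no extension in the sketch
    }
}

final class FinalPathSketch {
    // The directory and task id part is the same for every format; only the
    // extension comes from the format itself.
    static String getFinalPath(String tmpDir, String taskId,
                               ExtensionAware format, boolean isCompressed) {
        return tmpDir + "/" + taskId + format.getFileExtension(isCompressed);
    }
}
```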

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, HIVE-360.patch, qfile.tar
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696166#action_12696166 ] 

Zheng Shao commented on HIVE-360:
---------------------------------

MoveTask.java needs to be modified as well. It contains some code that checks the file format, and we need to change that code accordingly.

{code}
          boolean fileIsSequenceFile = true;   
          try {
            SequenceFile.Reader reader = new SequenceFile.Reader(
              fs, files.get(fileId).getPath(), conf);
            reader.close();
          } catch (IOException e) {
            fileIsSequenceFile = false;
          }
{code}
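
One possible way to make a check like this format-agnostic is to ask the table's format whether it recognizes each file, rather than hard-coding a SequenceFile probe. The sketch below is an assumption about how that could look, not what the patch does: FormatChecker, SeqMagicChecker, AnyChecker, and MoveCheckSketch are hypothetical names, and headers are passed as plain byte arrays. The only detail borrowed from Hadoop is that a SequenceFile begins with the bytes "SEQ".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical hook: each file format decides whether it recognizes a header.
interface FormatChecker {
    boolean accepts(byte[] header);
}

// Recognizes the three-byte "SEQ" magic at the start of a SequenceFile.
final class SeqMagicChecker implements FormatChecker {
    private static final byte[] MAGIC = "SEQ".getBytes(StandardCharsets.US_ASCII);

    @Override
    public boolean accepts(byte[] header) {
        if (header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}

// Accepts anything; stands in for a format with no recognizable header.
final class AnyChecker implements FormatChecker {
    @Override
    public boolean accepts(byte[] header) {
        return true;
    }
}

final class MoveCheckSketch {
    // Returns true only if every file header is accepted by the table's format.
    static boolean allFilesMatch(FormatChecker checker, byte[][] headers) {
        for (byte[] header : headers) {
            if (!checker.accepts(header)) {
                return false;
            }
        }
        return true;
    }
}
```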


> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: hive-360-2009-03-31.patch

Attached hive-360-2009-03-31.patch, a draft version.
1) Add a new HiveOutputFormat.
2) Wrap the existing IgnoreKeyTextOutputFormat and SequenceFileOutputFormat as HiveIgnoreKeyTextOutputFormat and HiveSequenceFileOutputFormat respectively.
3) Refactor FileSinkOperator to use HiveOutputFormat to create the writer.
4) Add a HiveOutputFormatUtils for backward compatibility.

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>         Attachments: hive-360-2009-03-31.patch
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696854#action_12696854 ] 

Zheng Shao commented on HIVE-360:
---------------------------------

Some more details on 5): SequenceFileOutputFormat inherits from FileOutputFormat with different generic arguments in different Hadoop versions. That makes it impossible for HiveOutputFormat to extend OutputFormat<K,V> while HiveSequenceFileOutputFormat both inherits from SequenceFileOutputFormat and implements HiveOutputFormat.

As a result, we have to drop the inheritance of HiveOutputFormat from OutputFormat.
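
A small Java sketch of the generics clash described above. All names here are hypothetical stand-ins chosen for illustration, not Hadoop or Hive classes: GenericFormat mirrors the shape of a generic output format, and FixedArgsFormat mirrors an existing format whose type arguments are already fixed by its superclass.

```java
// Hypothetical generic interface with key and value type parameters.
interface GenericFormat<K, V> {
    String describe(K key, V value);
}

// An existing format whose generic arguments are already fixed.
class FixedArgsFormat implements GenericFormat<String, Integer> {
    @Override
    public String describe(String key, Integer value) {
        return key + "=" + value;
    }
}

// If the Hive-side interface extended GenericFormat<K, V>, a wrapper would
// have to use exactly the same K and V as FixedArgsFormat. Any other choice
// is rejected by the compiler, for example:
//
//   interface HiveSide<K, V> extends GenericFormat<K, V> { }
//   class Wrapper extends FixedArgsFormat implements HiveSide<Object, Object> { }
//   // error: GenericFormat cannot be inherited with different arguments
//
// When the superclass fixes its arguments differently from one library
// version to the next, no single declaration compiles against both.

// Keeping the Hive-side interface independent of GenericFormat avoids the clash.
interface StandaloneHiveFormat {
    String openWriter(String fileName);
}

class WrapperFormat extends FixedArgsFormat implements StandaloneHiveFormat {
    @Override
    public String openWriter(String fileName) {
        return "writer:" + fileName;
    }
}
```

WrapperFormat still is a GenericFormat through its superclass, so code that expects the generic type keeps working, while Hive-specific code depends only on the standalone interface.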

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: hive-360-2009-04-08-3.patch

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, qfile.tar
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: hive-360-2009-04-07-5.patch

1) 
HiveOutputFileFormat by adding a new Method? Like getFileExtension(jc, isCompressed)?

if (outputFormat instanceof IgnoreKeyTextOutputFormat) {
        finalPath = new Path(Utilities.toTempPath(conf.getDirName()), Utilities
            .getTaskId(jc)
            + Utilities.getFileExtension(jc, isCompressed));
      }

Done. 
A new method called getOutputFormatFinalPath (..) is added in HiveFileFormatUtils 

2) HiveOutputFormatUtils is renamed to HiveFileFormatUtils 

3) "initRecordWriter" in FileSinkOperator is made static and renamed to getRecordWriter which returns a RecordWriter.

4) code from MoveTask.java is moved to a method called checkInputFormat in HiveFileFormatUtils 


In addition to Zheng's suggestions on the JIRA page, we discussed two other modifications:
5) drop the inheritance of HiveOutputFormat from OutputFormat<K,V>, because SequenceFileOutputFormat inherits from FileOutputFormat differently in Hadoop 0.17 and 0.19.
6) modify tableDesc to support HiveOutputFormat directly
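
For item 1), the path computation quoted above can be sketched without Hadoop's Path class. This is a hedged illustration, not the body of getOutputFormatFinalPath in the patch: the class FinalPathSketch, its parameters, and the "_tmp." prefix rule are assumptions made for the example, and the extension is passed in rather than looked up from the job configuration.

```java
class FinalPathSketch {

    // Assumption: a temporary directory is the target directory with
    // "_tmp." prepended to its last path component.
    static String toTempPath(String dirName) {
        int slash = dirName.lastIndexOf('/');
        return dirName.substring(0, slash + 1) + "_tmp." + dirName.substring(slash + 1);
    }

    // Builds <temp dir>/<task id>[<extension>]. Only text output gets an
    // extension, mirroring the instanceof branch in the quoted snippet.
    static String finalPath(String dirName, String taskId,
                            boolean isTextFormat, boolean isCompressed,
                            String codecExtension) {
        String fileName = taskId;
        if (isTextFormat && isCompressed) {
            fileName += codecExtension;
        }
        return toTempPath(dirName) + "/" + fileName;
    }
}
```

Moving this logic into one utility method keeps the format-specific naming decision out of FileSinkOperator, which is the point of the refactoring listed above.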

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: hive-360-2009-04-09.patch

1) modified the .out files using -Doverwrite=true
2) removed fileformat_void.q
3) checked errors from Hive's test
" 
    [junit] Test org.apache.hadoop.hive.cli.TestCliDriver FAILED
    [junit] < FAILED: Error in semantic analysis: Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat
    [junit] > FAILED: Error in metadata: Class not found: ClassDoesNotExist
    [junit] > FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
    [junit] < FAILED: Error in semantic analysis: Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat
    [junit] > FAILED: Error in semantic analysis: line 4:23 Output Format must implement OutputFormat dest1
    [junit] Test org.apache.hadoop.hive.cli.TestNegativeCliDriver FAILED
    [junit] Test org.apache.hadoop.hive.ql.TestMTQueries FAILED
    [junit] Test org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker FAILED
    [junit] Test org.apache.hadoop.hive.ql.parse.TestParse FAILED
"
Fixed them; the tests now pass in my local environment.


> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, hive-360-2009-04-09.patch, qfile.tar
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: hive-360-2009-04-09-3.patch

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, qfile.tar
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693637#action_12693637 ] 

He Yongqiang commented on HIVE-360:
-----------------------------------

TOK_TABLEFILEFORMAT is already handled in DDLSemanticAnalyzer.
{code}
        case HiveParser.TOK_TBLSEQUENCEFILE:
          inputFormat = SEQUENCEFILE_INPUT;
          outputFormat = SEQUENCEFILE_OUTPUT;
          break;
{code}



> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697696#action_12697696 ] 

Zheng Shao commented on HIVE-360:
---------------------------------

@Joydeep: Agreed, although currently the only use case for that is inside the text file format. It's rare enough that I think we can generalize it when we have a second use case.
We should also move the file format check function into the specific file format.


> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, HIVE-360.patch, qfile.tar
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694959#action_12694959 ] 

Zheng Shao commented on HIVE-360:
---------------------------------

Also, please include two tests that use the syntax CREATE TABLE ... STORED AS INPUTFORMAT xxxx OUTPUTFORMAT xxx:
one positive test, and one negative test with invalid file formats (e.g., just a Hadoop file format with no corresponding Hive file format).
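
The kind of check such a negative test would exercise can be sketched as follows. This is an assumption-laden illustration, not the validation code in the patch: HiveFormatMarker, GoodFormat, PlainFormat and FormatCheckSketch are hypothetical names, and the returned strings are only modelled on the error messages quoted elsewhere in this thread.

```java
// Hypothetical marker for "a format Hive knows how to write with".
interface HiveFormatMarker { }

// A user-supplied format that implements the Hive-side interface.
class GoodFormat implements HiveFormatMarker { }

// A format class that exists but lacks the Hive-side interface.
class PlainFormat { }

class FormatCheckSketch {

    // Returns "OK" or a human-readable error for the given class name.
    static String check(String className) {
        Class<?> cls;
        try {
            cls = Class.forName(className);
        } catch (ClassNotFoundException e) {
            return "Class not found: " + className;
        }
        if (!HiveFormatMarker.class.isAssignableFrom(cls)) {
            return "Output Format must implement HiveFormatMarker: " + className;
        }
        return "OK";
    }
}
```

A positive test would pass a class like GoodFormat; the negative cases are a class that does not exist and a class like PlainFormat that loads but does not implement the required interface.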



> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: hive-360-2009-03-31.patch

Attached hive-360-2009-03-31.patch, a draft version.
1) add a new HiveOutputFormat
2) wrap the existing IgnoreKeyTextOutputFormat and SequenceFileOutputFormat as HiveIgnoreKeyTextOutputFormat and HiveSequenceFileOutputFormat respectively
3) refactor FileSinkOperator to use HiveOutputFormat to create the writer
4) add a HiveOutputFormatUtils for backward compatibility

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>         Attachments: hive-360-2009-03-31.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: qfile.tar

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, qfile.tar
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695219#action_12695219 ] 

He Yongqiang commented on HIVE-360:
-----------------------------------

Is CREATE TABLE ... STORED AS INPUTFORMAT xxxx OUTPUTFORMAT xxx a good test case for OutputFormat?
Hive only uses OutputFormat in its FileSinkOperator, and it seems CREATE only stores the information.

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: hive-360-2009-04-01.patch

hive-360-2009-04-01.patch is a refactored version of hive-360-2009-03-31.patch.
It adds javadoc and the Apache license header to each new file.

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment:     (was: hive-360-2009-03-31.patch)

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>         Attachments: hive-360-2009-03-31.patch
>
>


[jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697408#action_12697408 ] 

Zheng Shao commented on HIVE-360:
---------------------------------

+1


> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, qfile.tar
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Attachment: hive-360-2009-04-04-4.patch

1) Changed the tableDesc parameter of the getHiveRecordWriter method to java.util.Properties.
2) Added two .q files: one in clientpositive and the other in clientnegative.
Thanks, Zheng.
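
For readers following the thread, here is a minimal sketch of the idea behind the change: load the user-provided format class by name, cast it to one Hive-side interface, and call a single method that creates a file with a specific name, instead of branching on the format type. Everything in it is an assumption made for illustration and is not taken from the patch: the names FileFormatSketch, RecordWriter, HiveStyleOutputFormat, PlainTextFormat and the property "sketch.line.terminator" are hypothetical, and the two-argument getHiveRecordWriter shown here is a simplified stand-in, since the actual signature in the patch is not shown in this thread.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class FileFormatSketch {

    /** Hypothetical stand-in for the record writer that a format hands back. */
    interface RecordWriter {
        void write(String row) throws IOException;
        void close() throws IOException;
    }

    /** Hypothetical Hive-side interface: its one method creates a file with a specific name. */
    interface HiveStyleOutputFormat {
        RecordWriter getHiveRecordWriter(Path finalPath, Properties tableProperties)
                throws IOException;
    }

    /** Example format. "sketch.line.terminator" is a made-up property name. */
    static class PlainTextFormat implements HiveStyleOutputFormat {
        @Override
        public RecordWriter getHiveRecordWriter(Path finalPath, Properties tableProperties)
                throws IOException {
            final String terminator =
                    tableProperties.getProperty("sketch.line.terminator", "\n");
            final BufferedWriter out =
                    Files.newBufferedWriter(finalPath, StandardCharsets.UTF_8);
            return new RecordWriter() {
                @Override
                public void write(String row) throws IOException {
                    out.write(row);
                    out.write(terminator);
                }

                @Override
                public void close() throws IOException {
                    out.close();
                }
            };
        }
    }

    /** Loads the user-provided class by name and casts it to the interface. */
    static HiveStyleOutputFormat load(String className) throws ReflectiveOperationException {
        return (HiveStyleOutputFormat)
                Class.forName(className).getDeclaredConstructor().newInstance();
    }

    /** Writes rows through whichever format was named; no per-format "if ... else". */
    static void writeRows(String formatClass, Path finalPath, Properties props, String... rows)
            throws IOException, ReflectiveOperationException {
        RecordWriter writer = load(formatClass).getHiveRecordWriter(finalPath, props);
        try {
            for (String row : rows) {
                writer.write(row);
            }
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempDirectory("hive360-sketch").resolve("task_0001.txt");
        writeRows(PlainTextFormat.class.getName(), out, new Properties(), "a\t1", "b\t2");
        System.out.print(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

Passing a plain java.util.Properties keeps the format class independent of any Hive plan object, which appears to be the motivation for change 1) above. A third format would only need another class implementing the same interface.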

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-360:
------------------------------

    Status: Patch Available  (was: Open)

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch
>
>


[jira] Updated: (HIVE-360) Generalize the FileFormat Interface in Hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-360:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.4.0
     Release Note: HIVE-360. Generalize the FileFormat Interface in Hive. (He Yongqiang via zshao)
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed. Thanks Yongqiang.

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>             Fix For: 0.4.0
>
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch, hive-360-2009-04-07-5.patch, hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch, hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, HIVE-360.patch, qfile.tar
>
>