You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Owen O'Malley (Created) (JIRA)" <ji...@apache.org> on 2012/01/12 17:31:40 UTC

[jira] [Created] (HIVE-2711) Make the header of RCFile unique

Make the header of RCFile unique
--------------------------------

                 Key: HIVE-2711
                 URL: https://issues.apache.org/jira/browse/HIVE-2711
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
            Reporter: Owen O'Malley


The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.

I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274414#comment-13274414 ] 

Hudson commented on HIVE-2711:
------------------------------

Integrated in Hive-trunk-h0.21 #1428 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1428/])
    HIVE-3018 Make the new header for RC Files introduced in HIVE-2711 optional
(Kevin Wilfong via namit) (Revision 1337937)

     Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337937
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.10.0
>
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2115.3.patch, HIVE-2711.D2571.1.patch, rc-file-v0.rc
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2711) Make the header of RCFile unique

Posted by "Ashutosh Chauhan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-2711:
-----------------------------------

    Status: Open  (was: Patch Available)
    
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2711) Make the header of RCFile unique

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2711:
------------------------------

    Attachment: HIVE-2711.D2115.2.patch

omalley updated the revision "HIVE-2711 [jira] Make the header of RCFile unique".
Reviewers: JIRA

  updated to current trunk

REVISION DETAIL
  https://reviews.facebook.net/D2115

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
  ql/src/test/data/rc-file-v0.rc
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2571.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2711) Make the header of RCFile unique

Posted by "Owen O'Malley (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-2711:
--------------------------------

    Attachment: rc-file-v0.rc

Here's the binary file that arc didn't include in the patch.
                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2115.3.patch, HIVE-2711.D2571.1.patch, rc-file-v0.rc
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2711) Make the header of RCFile unique

Posted by "Owen O'Malley (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-2711:
--------------------------------

    Status: Patch Available  (was: Open)
    
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242932#comment-13242932 ] 

Phabricator commented on HIVE-2711:
-----------------------------------

ashutoshc has commented on the revision "HIVE-2711 [jira] Make the header of RCFile unique".

  It seems like you are backward compatible with SEQ6 but not for anything before that. If the intent is to break backward compatibility, then I think we should send an email on both dev and user list about this change, since folks might have historical data in this format, which can't be read then.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1205 By doing this, you are effectively dropping the ability to read data before version SEQ6. This is backward incompatible change.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1224 By removing this, you will not be able to read data before version SEQ2.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1252 This removes ability to read before version 6.
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1256 This removes ability to read before SEQ1
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1357 This removes ability to read before SEQ1

REVISION DETAIL
  https://reviews.facebook.net/D2115

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2711) Make the header of RCFile unique

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2711:
------------------------------

    Attachment: HIVE-2711.D2115.3.patch

omalley updated the revision "HIVE-2711 [jira] Make the header of RCFile unique".
Reviewers: JIRA

  Fixed unit tests based on the new file sizes.

REVISION DETAIL
  https://reviews.facebook.net/D2115

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
  ql/src/test/data/rc-file-v0.rc
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java
  ql/src/test/results/clientpositive/alter_concatenate_indexed_table.q.out
  ql/src/test/results/clientpositive/alter_merge.q.out
  ql/src/test/results/clientpositive/alter_merge_stats.q.out
  ql/src/test/results/clientpositive/create_merge_compressed.q.out
  ql/src/test/results/clientpositive/ctas.q.out
  ql/src/test/results/clientpositive/partition_wise_fileformat.q.out
  ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out
  ql/src/test/results/clientpositive/sample10.q.out

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2115.3.patch, HIVE-2711.D2571.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2711) Make the header of RCFile unique

Posted by "Ashutosh Chauhan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-2711:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.10
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Owen!
                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.10
>
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2115.3.patch, HIVE-2711.D2571.1.patch, rc-file-v0.rc
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2711) Make the header of RCFile unique

Posted by "Owen O'Malley (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-2711:
--------------------------------

    Status: Patch Available  (was: Open)

Sorry, I didn't re-run the test cases after removing the unused fields from the header. All of the unit tests pass now.
                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2115.3.patch, HIVE-2711.D2571.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253070#comment-13253070 ] 

Hudson commented on HIVE-2711:
------------------------------

Integrated in Hive-trunk-h0.21 #1370 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1370/])
    HIVE-2711 : Make the header of RCFile unique (Owen Omalley via Ashutosh Chauhan) (Revision 1325442)

     Result = SUCCESS
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1325442
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/data
* /hive/trunk/ql/src/test/data/rc-file-v0.rc
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java
* /hive/trunk/ql/src/test/results/clientpositive/alter_concatenate_indexed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge_stats.q.out
* /hive/trunk/ql/src/test/results/clientpositive/create_merge_compressed.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ctas.q.out
* /hive/trunk/ql/src/test/results/clientpositive/partition_wise_fileformat.q.out
* /hive/trunk/ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sample10.q.out

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.10
>
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2115.3.patch, HIVE-2711.D2571.1.patch, rc-file-v0.rc
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Owen O'Malley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238857#comment-13238857 ] 

Owen O'Malley commented on HIVE-2711:
-------------------------------------

The compatibility is very fragile in that SequenceFile can only read an RCFile if they have the RCFile class since the key and value classes are contained in RCFile.java. Additionally, SequenceFile would present the RCFile blocks instead of the RCFile rows.

Furthermore, in the future if either RCFile or SequenceFile has a new format revision, it breaks badly.
                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244404#comment-13244404 ] 

Phabricator commented on HIVE-2711:
-----------------------------------

omalley has abandoned the revision "HIVE-2711 [jira] Make the header of RCFile unique".

  I accidentally got a new revision. This is the same as 2511.

REVISION DETAIL
  https://reviews.facebook.net/D2571

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2571.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244981#comment-13244981 ] 

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

Patch results failures in TestCliDriver in following queries:

* alter_concatenate_indexed_table.q
* alter_merge.q
* alter_merge_stats.q
* create_merge_compressed.q
* ctas.q
* partition_wise_fileformat.q
* partition_wise_fileformat3.q
* sample10.q
                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2571.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2711) Make the header of RCFile unique

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2711:
------------------------------

    Attachment: HIVE-2711.D2571.1.patch

omalley requested code review of "HIVE-2711 [jira] Make the header of RCFile unique".
Reviewers: JIRA

  HIVE-2711

  Make the header of RCFile unique wrt SequenceFile

  The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.

  I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D2571

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
  ql/src/test/data/rc-file-v0.rc
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/5835/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2571.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242930#comment-13242930 ] 

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

That makes sense. I will take a look.
                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243910#comment-13243910 ] 

Phabricator commented on HIVE-2711:
-----------------------------------

omalley has commented on the revision "HIVE-2711 [jira] Make the header of RCFile unique".

  Ashutosh,

  My point is that RCFile was *always* distinct from Sequence Files. RCFile was a fork of Sequence File when the Sequence File version was 6, therefore nothing before version 6 can possibly be an RCFile.

  Headers:
    Sequence Files: SEQ1, SEQ2, SEQ3, SEQ4, SEQ5, SEQ6
    RCFiles: SEQ6, RCF1

  Also note that SEQ5 was last written by Hadoop 0.10 back in Feb 2007, a year and a half before Hive was created.

REVISION DETAIL
  https://reviews.facebook.net/D2115

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237371#comment-13237371 ] 

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

@Owen,
 I think original design of RCFile was done with compatibility with Sequence File in mind. This patch will break that. Whats the advantage of this change?
                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2711) Make the header of RCFile unique

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2711:
------------------------------

    Attachment: HIVE-2711.D2115.1.patch

omalley requested code review of "HIVE-2711 [jira] Make the header of RCFile unique".
Reviewers: JIRA

  HIVE-2711

  Make the header of RCFile unique wrt SequenceFile

  The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.

  I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D2115

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
  ql/src/test/data/rc-file-v0.rc
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/4587/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247463#comment-13247463 ] 

Phabricator commented on HIVE-2711:
-----------------------------------

ashutoshc has accepted the revision "HIVE-2711 [jira] Make the header of RCFile unique".

  +1

REVISION DETAIL
  https://reviews.facebook.net/D2115

BRANCH
  h-2711

                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch, HIVE-2711.D2115.2.patch, HIVE-2711.D2115.3.patch, HIVE-2711.D2571.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243920#comment-13243920 ] 

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

Patch fails to apply. Needs to be rebased.
                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HIVE-2711) Make the header of RCFile unique

Posted by "Owen O'Malley (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HIVE-2711:
-----------------------------------

    Assignee: Owen O'Malley
    
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2711) Make the header of RCFile unique

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243918#comment-13243918 ] 

Ashutosh Chauhan commented on HIVE-2711:
----------------------------------------

I see. Yeah, very first commit of RCFile http://svn.apache.org/viewvc/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java?view=markup&pathrev=770548 started with SEQ6 so there could possibly be no data written in RCFile format with version SEQ5 or earlier. So, backward compatibility with SEQ6 suffices. So, +1 will commit if tests pass.
                
> Make the header of RCFile unique
> --------------------------------
>
>                 Key: HIVE-2711
>                 URL: https://issues.apache.org/jira/browse/HIVE-2711
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-2711.D2115.1.patch
>
>
> The RCFile implementation was copied from Hadoop's SequenceFile and copied the 'magic' string in the header. This means that you can't use the header to distinguish between RCFiles and SequenceFiles.
> I'd propose that we create a new header for RCFiles (RCF?) to replace the current SEQ. To maintain compatibility, we'll need to continue to accept the current 'SEQ\06' and just make new files contain the new header.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira