Posted to issues@orc.apache.org by "Owen O'Malley (Jira)" <ji...@apache.org> on 2019/10/25 20:35:00 UTC

[jira] [Updated] (ORC-541) Extend CHAR behavior to STRING

     [ https://issues.apache.org/jira/browse/ORC-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated ORC-541:
------------------------------
    Fix Version/s:     (was: 1.5.7)

> Extend CHAR behavior to STRING
> ------------------------------
>
>                 Key: ORC-541
>                 URL: https://issues.apache.org/jira/browse/ORC-541
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 1.5.6
>            Reporter: Jerry Adair
>            Priority: Minor
>
> This issue is a dual-purpose animal of sorts: I'd like to offer a suggestion and a contribution to satisfy that suggestion, as well as to ask a question.  The question is why the ORC types CHAR and VARCHAR are processed differently from STRING.  I'm guessing that there was a reason, but I'm not certain what that reason might be.
>  
> The specific area that I am addressing is in regard to the maxLength attribute of the TypeImpl class.  With CHAR and VARCHAR, a user can define this maxLength attribute but with STRING they cannot.  Granted, there is a "convenience method" if you will for only the CHAR class, thus:
>  ORC_UNIQUE_PTR<Type> createCharType(TypeKind kind, uint64_t maxLength);
> In my lil' test program, I used this like so:
> container->addStructField( std::string( "char column" ), createCharType( orc::TypeKind::CHAR, 20 ) );
>  
> So at a minimum it would seem that there should be an equivalent for the VARCHAR type.  However I was able to "get crafty" and create one via the following:
> container->addStructField( std::string( "varchar column" ), std::unique_ptr<Type>(new TypeImpl(orc::TypeKind::VARCHAR, 20)));
>  
> These produce types of char(20) and varchar(20) respectively, and the getMaximumLength() method returns 20 for each.
>  
> However, none of this works for the STRING type.  As with VARCHAR, there is no "convenience method", and an attempt similar to the VARCHAR one shown above:
> container->addStructField( std::string( "string column" ), std::unique_ptr<Type>(new TypeImpl(orc::TypeKind::STRING, 20)));
> failed to produce the result I would have expected.  It was easy to see why the output type was just "string"; that is readily seen in the toString() method.  However, I was a bit surprised to see that getMaximumLength() returned 0 when I used the second variant of the TypeImpl constructor, i.e. the one that sets maxLength via the second parameter.
>  
> Unfortunately I didn't have time to dig into why that was happening, but I'd seen enough to warrant an issue report, albeit not of critical importance.
>  
> All that said, as a user of ORC, I'd like to see the STRING type handled in the same manner as the CHAR and VARCHAR types, with convenience methods for both VARCHAR and STRING, as there is for CHAR.  Or at least learn why there is only the one convenience method and why STRING is treated so differently.  We could use this functionality in our project (in which we use ORC), and this is the reason I am opening the issue ticket in the first place.
>  
> I'd be willing to contribute the fix, as it seems easy enough to do.  But I'll leave that up to Owen or other project folk to decide.
>  
> Thanks,
> Jerry
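
For illustration only, the behavior requested above can be sketched with a small self-contained model. Nothing here is ORC code: SketchKind, SketchType, and createLengthLimitedType are hypothetical names invented for this sketch, and rendering a bounded STRING as "string(20)" is an assumption about what the reporter might want, not something the ORC schema syntax is known to accept.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT the ORC classes.
enum class SketchKind { STRING, VARCHAR, CHAR };

class SketchType {
 public:
  // maxLength defaults to 0, meaning "no declared maximum length".
  explicit SketchType(SketchKind kind, uint64_t maxLength = 0)
      : kind_(kind), maxLength_(maxLength) {}

  // Returns the declared maximum length for every kind, STRING included.
  uint64_t getMaximumLength() const { return maxLength_; }

  // Renders "char(N)" and "varchar(N)", and (assumption) "string(N)" when a
  // length was declared. A STRING without a length stays plain "string".
  std::string toString() const {
    std::string name;
    switch (kind_) {
      case SketchKind::CHAR:    name = "char";    break;
      case SketchKind::VARCHAR: name = "varchar"; break;
      case SketchKind::STRING:  name = "string";  break;
    }
    if (kind_ == SketchKind::STRING && maxLength_ == 0) {
      return name;
    }
    return name + "(" + std::to_string(maxLength_) + ")";
  }

 private:
  SketchKind kind_;
  uint64_t maxLength_;
};

// Hypothetical single factory covering CHAR, VARCHAR, and STRING alike.
std::unique_ptr<SketchType> createLengthLimitedType(SketchKind kind,
                                                    uint64_t maxLength) {
  return std::unique_ptr<SketchType>(new SketchType(kind, maxLength));
}
```

In this model, createLengthLimitedType(SketchKind::STRING, 20) yields a type whose getMaximumLength() is 20, matching what the CHAR and VARCHAR variants do. Whether a bounded STRING should also change how the type is printed or stored in the file is a separate design question for the project.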



--
This message was sent by Atlassian Jira
(v8.3.4#803005)