Posted to common-dev@hadoop.apache.org by "Pete Wyckoff (JIRA)" <ji...@apache.org> on 2008/10/30 19:26:44 UTC

[jira] Commented: (HADOOP-4550) Make DynamicSerDe capable of skipping fields that will not be used in the query

    [ https://issues.apache.org/jira/browse/HADOOP-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644065#action_12644065 ] 

Pete Wyckoff commented on HADOOP-4550:
--------------------------------------

I propose:

1. We add a 'skip' attribute to the field specification in the DynamicSerDe grammar. When this attribute is set on a field, DynamicSerDeFieldList will call protocol.skip for that field.
2. We add an interface for protocols, something like: TFastSkippable { void skip(type); }, or maybe we need per-type methods: skipI32, skipI64, skipString, skipList, ...
3. For TCTLSeparatedProtocol, we implement TFastSkippable.
4. We modify the runtime to insert skip attributes into the runtime DDL passed to DynamicSerDe.

This will need to be prioritized against other optimizations, but for TCTLSeparatedProtocol this is certainly a performance issue, and it may block replacing TMetadataTypedColumnsetSerDe with DynamicSerDe, since the former handles only strings and so its cost of not skipping is low.
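
To make the proposal concrete, here is a rough Java sketch of how a skip flag on a field could drive a cheap skip call instead of a full parse. Everything in it is an assumption for illustration: the single-method shape of TFastSkippable, and the ToySeparatedReader, FieldSpec and SkippingFieldList classes, which are toy stand-ins and not the real TCTLSeparatedProtocol or DynamicSerDeFieldList.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interface from item 2; the name comes from the proposal, the shape is assumed. */
interface TFastSkippable {
    /** Advance past the next field without converting it. */
    void skip();
}

/** Toy separator-delimited reader; an illustration only, not TCTLSeparatedProtocol. */
final class ToySeparatedReader implements TFastSkippable {
    private final String row;
    private final char separator;
    private int pos = 0;
    private int fieldsParsed = 0;

    ToySeparatedReader(String row, char separator) {
        this.row = row;
        this.separator = separator;
    }

    /** Index of the separator that ends the current field, or end of row. */
    private int fieldEnd() {
        int end = row.indexOf(separator, pos);
        return end < 0 ? row.length() : end;
    }

    private void advancePast(int end) {
        pos = Math.min(end + 1, row.length());
    }

    String readString() {
        int end = fieldEnd();
        String value = row.substring(pos, end); // allocates a new String
        advancePast(end);
        fieldsParsed++;
        return value;
    }

    int readI32() { return Integer.parseInt(readString()); }

    long readI64() { return Long.parseLong(readString()); }

    /** Skipping only scans for the separator: no substring, no numeric conversion. */
    @Override
    public void skip() {
        advancePast(fieldEnd());
    }

    int fieldsParsed() { return fieldsParsed; }
}

/** Hypothetical field spec carrying the proposed 'skip' attribute. */
final class FieldSpec {
    final String name;
    final String type;
    final boolean skip;

    FieldSpec(String name, String type, boolean skip) {
        this.name = name;
        this.type = type;
        this.skip = skip;
    }
}

/** Hypothetical stand-in for the field-list loop described in item 1. */
final class SkippingFieldList {
    static Map<String, Object> deserialize(ToySeparatedReader in, List<FieldSpec> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (FieldSpec f : fields) {
            if (f.skip) {
                in.skip(); // field not needed by the query
                continue;
            }
            switch (f.type) {
                case "i32": out.put(f.name, in.readI32()); break;
                case "i64": out.put(f.name, in.readI64()); break;
                case "string": out.put(f.name, in.readString()); break;
                default: throw new IllegalArgumentException("unsupported type: " + f.type);
            }
        }
        return out;
    }
}
```

With a field list that marks every column except user and created as skipped, deserialize would return only those two values, and the reader would materialize two fields rather than all of them. A single skip() is enough here because every field in this toy format ends at the next separator; a binary protocol would more likely need the per-type variants mentioned in item 2.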


> Make DynamicSerDe capable of skipping fields that will not be used in the query
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-4550
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4550
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hive
>            Reporter: Pete Wyckoff
>
> Thrift/DynamicSerDe always deserializes every field in the record and converts it to the correct type. Often, only a few of the fields will be used.
> e.g., select foo.user from foo where foo.created < 'today'
> where foo is something like
> struct {
>   string user
>   i64 created
>   string fullname
>   string description
>   i32 something
>   i32 somethingelse
>   ...
> }
> Parsing fullname, description, something, and somethingelse is a waste in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.