Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/03/24 21:59:50 UTC

[jira] Issue Comment Edited: (HIVE-337) LazySimpleSerDe should support array and map types

    [ https://issues.apache.org/jira/browse/HIVE-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688599#action_12688599 ] 

Zheng Shao edited comment on HIVE-337 at 3/24/09 1:58 PM:
----------------------------------------------------------

Addressed all the review comments except comment 6.

I also renamed the setAll() function to init() to make it clearer.

Because we now pass TypeInfo around in the LazyObject hierarchy, we don't even need to create the LazyObject for an array element if that element is never accessed (we can create it on demand when it's first accessed).
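To illustrate the on-demand idea, here is a minimal sketch. It is not the code in the patch: the class names (LazyIntSketch, LazyIntArraySketch, LazyArrayDemo) and the objectsCreated counter are made up for this example, and the element type is hard-coded to int, whereas the real code would pick the element class from the TypeInfo. Empty input and null elements are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are not Hive's real classes.
class LazyIntSketch {
    private int value;

    // Parse a decimal int from bytes[start, start + length).
    void init(byte[] bytes, int start, int length) {
        value = Integer.parseInt(
            new String(bytes, start, length, StandardCharsets.US_ASCII));
    }

    int get() { return value; }
}

class LazyIntArraySketch {
    private byte[] bytes;
    private int[] starts;             // starts[i] = offset of element i; last entry = end + 1
    private LazyIntSketch[] elements; // slots stay null until the element is read
    int objectsCreated = 0;           // illustration only: counts element objects created

    // Record only the element boundaries; no element object is created here.
    void init(byte[] bytes, int start, int length, byte separator) {
        this.bytes = bytes;
        int end = start + length;
        List<Integer> offsets = new ArrayList<>();
        offsets.add(start);
        for (int i = start; i < end; i++) {
            if (bytes[i] == separator) {
                offsets.add(i + 1);
            }
        }
        offsets.add(end + 1);
        starts = new int[offsets.size()];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = offsets.get(i);
        }
        elements = new LazyIntSketch[starts.length - 1];
        objectsCreated = 0;
    }

    int size() { return elements.length; }

    // Create and parse the element object the first time it is requested.
    int get(int index) {
        if (elements[index] == null) {
            LazyIntSketch e = new LazyIntSketch();
            e.init(bytes, starts[index], starts[index + 1] - starts[index] - 1);
            elements[index] = e;
            objectsCreated++;
        }
        return elements[index].get();
    }
}

class LazyArrayDemo {
    public static void main(String[] args) {
        byte[] row = "10,20,30".getBytes(StandardCharsets.US_ASCII);
        LazyIntArraySketch arr = new LazyIntArraySketch();
        arr.init(row, 0, row.length, (byte) ',');
        System.out.println(arr.get(1));         // prints 20
        System.out.println(arr.objectsCreated); // prints 1: elements 0 and 2 were never built
    }
}
```

In this sketch init() only scans for separators, so a query that reads one element out of a long array pays for a single element object.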

The current code works fine without the change requested in comment 6. That change would require one of two things: either 12 more bytes of storage per primitive object (adding a byte[], an int, and an int to LazyPrimitive), or more complicated logic to remove int start and int length from LazyNonPrimitive. In the second case we would have to parse the data right in init(..), but we don't have access to the separators there because they are held by the next-level ObjectInspectors. We could add pointers from LazyObject to ObjectInspector, but that is extra overhead and complicates the data structure.

After all, the implementation of init() is private to each class, and I don't think there is a strong need to make it the same across LazyPrimitive and LazyNonPrimitive. The fact that parsing a LazyPrimitive does not require delimiters while parsing a LazyNonPrimitive does is reason enough for them to have different implementations.
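The two field layouts being compared can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: EagerIntSketch and DeferredIntSketch are hypothetical names, and the parsed flag is an extra field added here only so the deferred version can cache its result. The 12-byte figure matches one reference plus two ints only if a reference takes 4 bytes, which depends on the JVM.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names; a sketch of the two layouts, not Hive's classes.

// Layout A: parse inside init() and keep only the value.
// No delimiters are needed, so nothing else has to be remembered.
class EagerIntSketch {
    private int value;

    void init(byte[] bytes, int start, int length) {
        value = Integer.parseInt(
            new String(bytes, start, length, StandardCharsets.US_ASCII));
    }

    int get() { return value; }
}

// Layout B: init() only remembers where the data lives; parsing waits
// until the value is read. This costs a byte[] reference and two ints
// per object on top of the value itself.
class DeferredIntSketch {
    private byte[] bytes;   // extra reference
    private int start;      // extra int
    private int length;     // extra int
    private boolean parsed; // assumption of this sketch, not mentioned in the comment
    private int value;

    void init(byte[] bytes, int start, int length) {
        this.bytes = bytes;
        this.start = start;
        this.length = length;
        this.parsed = false;
    }

    int get() {
        if (!parsed) {
            value = Integer.parseInt(
                new String(bytes, start, length, StandardCharsets.US_ASCII));
            parsed = true;
        }
        return value;
    }
}
```

Both classes return the same value for the same input; they differ only in when the parse happens and in how many fields each object carries.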


Future improvements include:
1. Support escaping: HIVE-136;
2. Columnar storage: HIVE-352;
3. Use Writable/Text for values: HIVE-266;
4. Short-circuit serialization: HIVE-358;
5. Short-circuit expression evaluation: HIVE-359;
6. Common expression evaluation: HIVE-364.


  
> LazySimpleSerDe should support array and map types
> --------------------------------------------------
>
>                 Key: HIVE-337
>                 URL: https://issues.apache.org/jira/browse/HIVE-337
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Blocker
>         Attachments: HIVE-337.1.patch, HIVE-337.2.patch, HIVE-337.5.patch
>
>
> Once we do that, we can completely deprecate DynamicSerDe/TCTLSeparatedProtocol, and close any bugs that DynamicSerDe/TCTLSeparatedProtocol has.
