Posted to common-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2008/09/10 01:45:44 UTC

[jira] Created: (HADOOP-4138) Hive: refactor the SerDe library

Hive: refactor the SerDe library
--------------------------------

                 Key: HADOOP-4138
                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
             Project: Hadoop Core
          Issue Type: Improvement
          Components: contrib/hive
            Reporter: Zheng Shao
            Assignee: Zheng Shao


Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
We want to refactor the library to:

1. Split the Serializer and Deserializer interfaces
2. Split the Serializer/Deserializer and ObjectInspector interfaces
3. Change hive/metastore and hive/ql to use the new SerDe framework
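
For illustration only (not part of the patch): a minimal sketch of what splitting the three roles might look like. All type and method names below are hypothetical and chosen for the example; the interfaces actually committed under this issue may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical names throughout; a sketch of the three-way split, not the committed API.

/** Describes how to read fields out of a row object without knowing its class. */
interface RowInspector {
  int fieldCount(Object row);
  Object field(Object row, int index);
}

/** Turns raw text into an in-memory row and says how to inspect that row. */
interface TextDeserializer {
  Object deserialize(String raw);
  RowInspector inspector();
}

/** Turns a row into text, reading it only through the supplied inspector. */
interface TextSerializer {
  String serialize(Object row, RowInspector inspector);
}

/** Inspector for rows stored as a List. */
class ListRowInspector implements RowInspector {
  public int fieldCount(Object row) { return ((List<?>) row).size(); }
  public Object field(Object row, int index) { return ((List<?>) row).get(index); }
}

/** One class may still implement both sides, as a delimited-text format would. */
class DelimitedSerDe implements TextDeserializer, TextSerializer {
  private final String sep;
  private final RowInspector inspector = new ListRowInspector();

  DelimitedSerDe(String sep) { this.sep = sep; }

  public Object deserialize(String raw) {
    // Pattern.quote treats the separator literally; -1 keeps trailing empty fields.
    return new ArrayList<String>(Arrays.asList(raw.split(Pattern.quote(sep), -1)));
  }

  public RowInspector inspector() { return inspector; }

  public String serialize(Object row, RowInspector oi) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < oi.fieldCount(row); i++) {
      if (i > 0) out.append(sep);
      out.append(oi.field(row, i));
    }
    return out.toString();
  }
}
```

Because the serializer only touches the row through the inspector, it could in principle write rows produced by a different deserializer, which is the point of separating the inspector from the other two interfaces.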


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631180#action_12631180 ] 

Joydeep Sen Sarma commented on HADOOP-4138:
-------------------------------------------

i thought we pretty much ruled out objectinspectors being different for different rows in the same task - no?

and for the reflection stuff there is real caching going on. Would it be fair to say then that the factory code could be better refactored (removed even):
- reflectionoi does caching (the constructor could check an inbuilt static cache) and the thrift stuff would inherit (ie. no need for factory)
- for other standard* oi's - no caching is required - just straight constructors would be good enough.

the factory code is unnecessarily hard to understand IMHO and does not encapsulate things very well (considering that class-specific logic must be put in the factory - which is a strange pattern indeed). A programmer looking to add serde's/objectinspectors is likely to look at the factory class (with the normal expectations of a factory class) and be reasonably nonplussed.

i am not saying this is a blocker - just that complexity can be reduced and code be made more developer friendly.

> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metastore and hive/ql to use the new SerDe framework



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631258#action_12631258 ] 

Zheng Shao commented on HADOOP-4138:
------------------------------------

Actually I did implement caching for all object inspectors - not just the reflection ones but also the standard ones.

I had a second look at the code in the factory. All functions except one contain just 10 lines each - which will probably be the same amount as the equals and hashCode functions if we choose to do that instead of caching (and people may point out the potential inefficiency in the recursive implementation of equals and hashCode, which can be eliminated by caching all instances).

The only one long function (ObjectInspectorFactory.getReflectionObjectInspectorNoCache) is meant to allow ReflectionObjectInspectors to work with recursive types (e.g., linked list or trees). And for that we have to have caching.
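
For illustration only: a minimal sketch of why a reflection-based inspector needs a cache to handle recursive types. The classes below are hypothetical and are not the code in ObjectInspectorFactory. The key step is registering the new inspector in the cache before descending into its fields, so that a self-referencing field finds the entry instead of recursing forever.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical classes; illustrates the caching idea only.

/** Inspector for one class: maps each instance field name to the inspector of its type. */
class ReflectiveInspector {
  final Class<?> type;
  final Map<String, ReflectiveInspector> fields = new LinkedHashMap<>();

  ReflectiveInspector(Class<?> type) { this.type = type; }
}

class ReflectiveInspectorCache {
  private static final Map<Class<?>, ReflectiveInspector> CACHE = new HashMap<>();

  static synchronized ReflectiveInspector get(Class<?> type) {
    ReflectiveInspector cached = CACHE.get(type);
    if (cached != null) {
      return cached;
    }
    ReflectiveInspector created = new ReflectiveInspector(type);
    // Register first: a field of the same type will now hit the cache
    // rather than triggering another (endless) construction.
    CACHE.put(type, created);
    // Only descend into user-defined classes; primitives and JDK types are leaves here.
    if (!type.isPrimitive() && !type.getName().startsWith("java.")) {
      for (Field f : type.getDeclaredFields()) {
        if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
          continue;
        }
        created.fields.put(f.getName(), get(f.getType()));
      }
    }
    return created;
  }
}

/** A recursive type: each node refers to another node of the same class. */
class ListNode {
  int value;
  ListNode next;
}
```

With this arrangement the inspector for `ListNode` contains itself as the inspector of its `next` field, and repeated lookups return the same instance, so callers can compare inspectors by reference.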

For the developers, the semantics of these functions are also pretty clear from the name. For the implementation the only tricky point is the recursive thing that we won't be able to get rid of (unless we don't want to provide the support for recursive types).

So I am not sure whether we could simplify the code much, without considering performance.


But I do agree that one thing can be improved: the organization of the code. The recursive logic can be moved into the reflection oi, while keeping the common caching part in the factory and adding a signature class so I can merge all caching code together.  I can work on that after these huge commits (since the code organization is all internal and does not change the APIs).




[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631144#action_12631144 ] 

Joydeep Sen Sarma commented on HADOOP-4138:
-------------------------------------------

+1 code reviewed.

Just so i understand - why was the equals method taken out (of the StandardMapOI)? (I hadn't paid attention to this earlier - but don't you need equals/hashcode for the objectinspector containers?)



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630159#action_12630159 ] 

Hadoop QA commented on HADOOP-4138:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12389890/HADOOP-4138-1.txt
  against trunk revision 693705.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 36 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3242/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3242/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3242/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3242/console

This message is automatically generated.



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631100#action_12631100 ] 

Zheng Shao commented on HADOOP-4138:
------------------------------------

Pete - Yes but I could not directly overwrite the existing serde stuff because the plan is to commit the serde change and the execution part change separately.
I've added the replacement of class names in the SerDeUtils so the code will be compatible with existing meta data.

After the execution part is committed and all references removed, we can remove the old serde package and rename serde2 back to serde if needed. This route has more steps but less risk.
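
For illustration only: a minimal sketch of the class-name replacement idea, where a class name stored in existing metadata is translated to its replacement before being loaded. The class names and the `resolve` helper are hypothetical; the actual mapping in SerDeUtils is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; the real table of renamed classes is an assumption here.
class LegacySerDeNames {
  private static final Map<String, String> RENAMED = new HashMap<>();

  static {
    // Old name as stored in existing metadata -> name of its replacement.
    RENAMED.put("example.serde.OldTextSerDe", "example.serde2.NewTextSerDe");
  }

  /** Returns the replacement class name for a legacy one, or the input unchanged. */
  static String resolve(String storedClassName) {
    return RENAMED.getOrDefault(storedClassName, storedClassName);
  }
}
```

Keeping the translation in one lookup means existing metadata never has to be rewritten while both packages coexist.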




[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-4138:
------------------------------------

    Release Note: Introduced new SerDe library for src/contrib/hive.  (was: New SerDe library for src/contrib/hive.)



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Attachment: HADOOP-4138-1.txt

This is the implementation intended for check in. It has gone through an internal code review already.

The changes for hive/metastore and hive/ql are done and passed all tests, but are still in the process of internal code review. They will come as a separate jira issue.





[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Fix Version/s: 0.19.0
     Release Note: New SerDe library for src/contrib/hive.
           Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630838#action_12630838 ] 

Zheng Shao commented on HADOOP-4138:
------------------------------------

> let's move the clear() to an else clause for the preceding if().
done

> setStructField in Reflection and ThriftStructObjectInspector - is this used anywhere? what's the motivation behind this
removed.

> please remove references to TypeInfo in ObjectInspector.java comments and explain differently.
changed to type.

> the MetadataListStructObjectInspector.getStructFieldData looks pretty high overhead to me. we have gone to so much trouble to avoid creating objectinspectors - but those are just one time per map/reduce instance. but the getField() type of calls are per row. creating a list from an array type per evaluation seems unnecessary - we should be able to get directly from the backing array. there are quite a few function calls as well (nested function calls and class equality checks and so on).

The purpose of MetadataListStructObjectInspector is mainly backward compatibility - it can be thrown away once DynamicSerDe is out, so I am not sure whether it's worth optimizing it a lot, or whether we should focus more on DynamicSerDe.
And by the way, the code for dealing with Arrays in StandardStructObjectInspector never runs for MetadataListStructObjectInspector - because MetadataListStructObjectInspector is always inspecting a columnSet, which has a member col that is an ArrayList. So we won't create a List per row.
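
For illustration only: a minimal sketch of per-row field access that reads straight from the backing storage, whether the row is an array or a List, without allocating anything per call. `StructFieldAccess` and `fieldAt` are hypothetical names and not the code in StandardStructObjectInspector.

```java
import java.util.List;

// Hypothetical helper; shows allocation-free access for both storage kinds.
class StructFieldAccess {
  /** Returns the field at the given index, or null when the row has fewer fields. */
  static Object fieldAt(Object row, int index) {
    if (row instanceof Object[]) {
      // Index the array directly instead of wrapping it in a new List per row.
      Object[] fields = (Object[]) row;
      return index < fields.length ? fields[index] : null;
    }
    List<?> fields = (List<?>) row;
    return index < fields.size() ? fields.get(index) : null;
  }
}
```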

> minor: getCategory() calls in Standard* can be marked final.
Done.

> One thing that i found somewhat complicated is the way the ObjectInspectorFactory() is written. It sounds like this would be the factory for most objectinspectors - but it's constructed to be only for reflection and reflection derived ones. in particular - the metadatatyped... class has its own caching and factory-like methods. You might want to think about structuring this more cleanly (instead of 'Type' - there could be a more generic concept of a signature and inspector family and a cache per type X family).

This can be done in a separate jira if we find it absolutely necessary. In most cases, authors of a new SerDe will create an instance of the reflection-based object inspector, or create an instance using all the standard object inspectors. What we have now is sufficient for that. I agree the current factory structure is not optimal for authors of new ObjectInspectors, but we won't have such a need till we start to write lazy-deserialized SerDes.

The impact of this jira (and the subsequent execution code refactoring) is probably much bigger than this small change. Let's try to get these big things in for 0.19 and do small fixes later?

I will submit a diff with the other changes.




[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630430#action_12630430 ] 

Zheng Shao commented on HADOOP-4138:
------------------------------------

1. that's already fixed:
+  public static Object deserialize(ColumnSet c, String row, String sep, String nullString) throws Exception {
+    if (c.col == null) {
+      c.col = new ArrayList<String>();
+    }
+    c.col.clear();
...

2. There are 2 approaches for storing information for serde:
A. Assume ser and de are always implemented in the same class, as we do now. The good thing about this is that ser and de are kept together, so we won't forget to change one side when changing the other. This also allows us to use a single register function.
B. Pair ser and de up in the configuration of the table. This approach is used by the hadoop input/output file formats. The good thing is that we can pair them up arbitrarily (like SequenceFileAsTextInputFormat). The bad thing is that it's easy to make mistakes by pairing them incorrectly.

Not sure whether there is any use to register just a serializer - that means the user can write to a table but not be able to read from it.

If we don't want to support registerSerializer, then the name of registerSerDe should be fine - it assumes the parameter is always a Deserializer, but in case it supports Serializer then we can write to that table.
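
For illustration only: a minimal sketch of approach A with a single register function. The registry always receives a Deserializer, and a table counts as writable only when the same object also implements Serializer. The interfaces and the `SerDeRegistry` class are hypothetical stand-ins, not the signatures in the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal interfaces and registry for the example.
interface Deserializer {
  Object deserialize(String raw);
}

interface Serializer {
  String serialize(Object row);
}

class SerDeRegistry {
  private final Map<String, Deserializer> byName = new HashMap<>();

  /** Every registered format can be read; writing is an optional extra. */
  void registerSerDe(String name, Deserializer serde) {
    byName.put(name, serde);
  }

  /** Returns the registered reader, or null if the name is unknown. */
  Deserializer reader(String name) {
    return byName.get(name);
  }

  /** A format is writable only if its class also implements Serializer. */
  boolean isWritable(String name) {
    return byName.get(name) instanceof Serializer;
  }
}

class ReadOnlyFormat implements Deserializer {
  public Object deserialize(String raw) { return raw.trim(); }
}

class ReadWriteFormat implements Deserializer, Serializer {
  public Object deserialize(String raw) { return raw.trim(); }
  public String serialize(Object row) { return String.valueOf(row); }
}
```

This rules out the write-only case discussed above by construction, since nothing can be registered without a Deserializer.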




[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630449#action_12630449 ] 

Joydeep Sen Sarma commented on HADOOP-4138:
-------------------------------------------

1. let's move the clear() to an else clause for the preceding if().
2. sounds ok.
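
For illustration only: a minimal sketch of point 1, with clear() moved into an else branch so that a freshly allocated list is not cleared needlessly. Only the `col` field and the parameter names come from the quoted patch; the `ColumnSet` stand-in, the return type, and the splitting logic after the if/else are assumptions, since the rest of the method is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Stand-in for the ColumnSet class referenced in the patch; only `col` is taken from the thread.
class ColumnSet {
  List<String> col;
}

class DelimitedRowParser {
  static ColumnSet deserialize(ColumnSet c, String row, String sep, String nullString) {
    if (c.col == null) {
      c.col = new ArrayList<String>();
    } else {
      // Reuse the existing list across rows; only a reused list needs clearing.
      c.col.clear();
    }
    // Assumed parsing step: split on the literal separator and map the null marker to null.
    for (String field : row.split(Pattern.quote(sep), -1)) {
      c.col.add(field.equals(nullString) ? null : field);
    }
    return c;
  }
}
```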

a few more comments:

- setStructField in Reflection and ThriftStructObjectInspector - is this used anywhere? what's the motivation behind this
- please remove references to TypeInfo in ObjectInspector.java comments and explain differently.

the MetadataListStructObjectInspector.getStructFieldData looks pretty high overhead to me. we have gone to so much trouble to avoid creating objectinspectors - but those are just one time per map/reduce instance. but the getField() type of calls are per row. creating a list from an array type per evaluation seems unnecessary - we should be able to get directly from the backing array. there are quite a few function calls as well (nested function calls and class equality checks and so on).

minor: getCategory() calls in Standard* can be marked final. 

One thing that i found somewhat complicated is the way the ObjectInspectorFactory() is written. It sounds like this would be the factory for most objectinspectors - but it's constructed to be only for reflection and reflection derived ones. in particular - the metadatatyped... class has its own caching and factory-like methods. You might want to think about structuring this more cleanly (instead of 'Type' - there could be a more generic concept of a signature and inspector family and a cache per type X family).




[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4138:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Zheng!



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Attachment: HADOOP-4138-4.txt



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Attachment: HADOOP-4138-4.txt

submit again for hadoop QA.



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Attachment: HADOOP-4138-2.txt

Fixed a few minor bugs to make the library more robust against erroneously formatted data.


> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Status: Open  (was: Patch Available)

> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631150#action_12631150 ] 

Zheng Shao commented on HADOOP-4138:
------------------------------------

The factory (cache) makes sure there is only one instance of an ObjectInspector for a particular configuration, so the default Object.equals(), which compares references, is good enough. That makes the comparison really fast, which will be useful if we want to compare each row's ObjectInspector with that of the previous row.
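
As a rough illustration of the caching argument above, here is a minimal sketch of a factory that hands out one instance per configuration. The names StructInspector and InspectorFactory are hypothetical and are not taken from the patch; the only point is that with such a cache, identity comparison is enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FactoryCacheDemo {

    // Hypothetical stand-in for an ObjectInspector that describes a struct's fields.
    // equals()/hashCode() are deliberately NOT overridden, so equals() is identity.
    static final class StructInspector {
        final List<String> fieldNames;

        StructInspector(List<String> fieldNames) {
            this.fieldNames = fieldNames;
        }
    }

    // Hypothetical factory: returns at most one inspector per configuration.
    static final class InspectorFactory {
        private static final Map<List<String>, StructInspector> CACHE =
                new ConcurrentHashMap<>();

        static StructInspector get(List<String> fieldNames) {
            // Copy the key so later changes to the caller's list cannot corrupt the cache.
            List<String> key = Collections.unmodifiableList(new ArrayList<>(fieldNames));
            return CACHE.computeIfAbsent(key, StructInspector::new);
        }
    }

    public static void main(String[] args) {
        StructInspector a = InspectorFactory.get(Arrays.asList("id", "name"));
        StructInspector b = InspectorFactory.get(Arrays.asList("id", "name"));
        StructInspector c = InspectorFactory.get(Arrays.asList("id"));

        // Same configuration gives the same instance, so a reference check suffices.
        System.out.println("a == b: " + (a == b));
        System.out.println("a.equals(c): " + a.equals(c));
    }
}
```

The trade-off in this sketch is that every construction path has to go through the factory; an inspector created directly with a constructor would silently break the identity guarantee.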


> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630425#action_12630425 ] 

Joydeep Sen Sarma commented on HADOOP-4138:
-------------------------------------------

A couple of things:

1. MetadataTypedColumnsetSerDe does not really reuse objects (deserialize always allocates a new ArrayList).

2. Regarding registerSerDe(): I think this function needs to be renamed. ThriftDeserializer seems to be registered, but it is not a SerDe. Also, lookupDeserializer seems to assume that whatever was registered is a Deserializer, but I am not sure this is the case (can I register just a serializer?).
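
To make the first point concrete, here is a small sketch of what reusing the row object could look like. ReusingDeserializer is a hypothetical class written for this note, not the code in the patch, and it assumes tab-delimited text rows.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseDemo {

    // Hypothetical deserializer that refills one list instead of allocating per row.
    static final class ReusingDeserializer {
        private final List<String> row = new ArrayList<>();

        List<String> deserialize(String line) {
            row.clear(); // drop the previous row's fields, keep the backing array
            for (String field : line.split("\t", -1)) {
                row.add(field);
            }
            return row; // same List instance on every call
        }
    }

    public static void main(String[] args) {
        ReusingDeserializer d = new ReusingDeserializer();
        List<String> first = d.deserialize("1\talice");
        List<String> second = d.deserialize("2\tbob");
        System.out.println("same instance: " + (first == second));
        System.out.println("current row: " + second);
    }
}
```

The catch is that a caller must finish with one row before asking for the next, since the returned list is overwritten on each call.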


> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631984#action_12631984 ] 

Hadoop QA commented on HADOOP-4138:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12390228/HADOOP-4138-4.txt
  against trunk revision 696427.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 36 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3290/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3290/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3290/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3290/console

This message is automatically generated.

> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630936#action_12630936 ] 

Pete Wyckoff commented on HADOOP-4138:
--------------------------------------

Zheng, the latest patch is still serde2. I thought we were going to replace the existing serde?


> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633306#action_12633306 ] 

Hudson commented on HADOOP-4138:
--------------------------------

Integrated in Hadoop-trunk #611 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/611/])

> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631223#action_12631223 ] 

Joydeep Sen Sarma commented on HADOOP-4138:
-------------------------------------------

Ha, I didn't say one needed caching and the other didn't. I was just observing that we have implemented caching for one and not for the other (because I don't think different OIs in successive rows will be interesting for a long time to come). It's not clear to me why caching OIs was needed at all. Perhaps there is some performance improvement, but we don't know, and certainly there were much bigger fish to fry on that front.

Anyway, I would vote for simplicity if the performance goals/benefits are not clear.

> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Summary: [Hive] refactor the SerDe library  (was: Hive: refactor the SerDe library)

> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Attachment: HADOOP-4138-4.txt

Resubmitting for Hadoop QA.

> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630048#action_12630048 ] 

Zheng Shao commented on HADOOP-4138:
------------------------------------

Here is the implementation of the new serde interface.

The main principles of the design are:
1. Efficiency: we allow lazy (on-demand) deserialization to make it really efficient. One example use case is a column-based storage format that stores different columns in different files, or column-based compression inside a sequence file, in which the values of the same column from different rows are stored together and compressed.
2. Simplicity and extensibility: we want developers to be able to write a new serde very easily.
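
To illustrate what lazy (on-demand) deserialization means here, the following is a minimal sketch and not the interface from the patch. LazyRow is a hypothetical class that assumes tab-delimited UTF-8 rows; it keeps the raw bytes and only parses them the first time a field is requested. The parseCount field exists purely to make that visible.

```java
import java.nio.charset.StandardCharsets;

class LazyRowDemo {

    // Hypothetical lazily deserialized row.
    static final class LazyRow {
        private final byte[] raw;
        private String[] fields; // stays null until a field is first requested
        int parseCount = 0;      // demonstration only: how many times we parsed

        LazyRow(byte[] raw) {
            this.raw = raw;
        }

        String getField(int index) {
            if (fields == null) {
                // The parsing cost is paid only if someone actually reads a field.
                fields = new String(raw, StandardCharsets.UTF_8).split("\t", -1);
                parseCount++;
            }
            return index < fields.length ? fields[index] : null;
        }
    }

    public static void main(String[] args) {
        LazyRow row = new LazyRow("1\talice\t30".getBytes(StandardCharsets.UTF_8));
        System.out.println("parses before access: " + row.parseCount);
        System.out.println("field 1: " + row.getField(1));
        row.getField(2);
        System.out.println("parses after two accesses: " + row.parseCount);
    }
}
```

A row that is filtered out or never inspected pays no parsing cost in this sketch, which is the same idea that makes column-oriented layouts attractive: untouched columns need not be decoded.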


> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631196#action_12631196 ] 

Zheng Shao commented on HADOOP-4138:
------------------------------------

I understand the idea of pushing the caching logic down to each class, but why does the reflectionoi need caching while the standard oi does not?

Compared with removing the factory and spreading its caching logic across the individual classes, I prefer your earlier idea of creating a signature class for objectinspector.
That will keep all the caching logic in one place.

I think the changes needed to transform the current code to use the signature class are relatively small: we just need to push all arguments into a hashmap. I will start doing that once this big change (including the execution code) is committed, so I can unblock other people first.
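
A rough sketch of how the signature-class idea might look, under my own assumptions: Signature, Inspector and InspectorCache are hypothetical names, and the argument map is just a HashMap of whatever parameters define an inspector. It is meant only to show that one cache keyed by a signature can serve every inspector type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

class SignatureCacheDemo {

    // Hypothetical signature: the inspector kind plus all constructor arguments.
    static final class Signature {
        private final String kind;
        private final Map<String, Object> args;

        Signature(String kind, Map<String, Object> args) {
            this.kind = kind;
            this.args = new HashMap<>(args); // defensive copy keeps the key stable
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Signature)) return false;
            Signature other = (Signature) o;
            return kind.equals(other.kind) && args.equals(other.args);
        }

        @Override
        public int hashCode() {
            return Objects.hash(kind, args);
        }
    }

    // Hypothetical inspector built from a signature.
    static final class Inspector {
        final Signature signature;

        Inspector(Signature signature) {
            this.signature = signature;
        }
    }

    // All caching lives here, regardless of the inspector kind.
    static final class InspectorCache {
        private static final Map<Signature, Inspector> CACHE = new ConcurrentHashMap<>();

        static Inspector get(Signature signature) {
            return CACHE.computeIfAbsent(signature, Inspector::new);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> listArgs = new HashMap<>();
        listArgs.put("elementType", "string");

        Inspector a = InspectorCache.get(new Signature("list", listArgs));
        Inspector b = InspectorCache.get(new Signature("list", listArgs));
        System.out.println("same instance: " + (a == b));
    }
}
```

Two signatures with the same kind and arguments map to the same cached instance, so the single-instance property holds without any class-specific logic in the cache.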



> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Status: Open  (was: Patch Available)

> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt, HADOOP-4138-4.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework



[jira] Updated: (HADOOP-4138) [Hive] refactor the SerDe library

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-4138:
-------------------------------

    Attachment: HADOOP-4138-3.txt

> [Hive] refactor the SerDe library
> ---------------------------------
>
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt
>
>
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework
