Posted to dev@hive.apache.org by "Alex Kozlov (JIRA)" <ji...@apache.org> on 2010/05/25 21:28:23 UTC

[jira] Created: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

LazySimpleSerDe should be able to read classes that support some form of toString()
-----------------------------------------------------------------------------------

                 Key: HIVE-1369
                 URL: https://issues.apache.org/jira/browse/HIVE-1369
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Alex Kozlov
            Priority: Minor


Currently LazySimpleSerDe can deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements a toString() method.

Ideas or concerns?

Alex K


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902585#action_12902585 ] 

Namit Jain commented on HIVE-1369:
----------------------------------

@Carl, can you regenerate the patch ?

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875909#action_12875909 ] 

Edward Capriolo commented on HIVE-1369:
---------------------------------------

This seems interesting. In some cases toString() output can change between Java versions for some objects. Do we need to compensate for that?
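
To illustrate the concern above, here is a minimal sketch (not taken from the patch). The `Point` and `ToStringStability` classes are hypothetical. It contrasts the inherited `Object.toString()`, whose output embeds a hash code and is not guaranteed to be stable across runs or JVMs, with an explicitly written `toString(byte[])` whose format the class author controls.

```java
// Hypothetical value class, used only for illustration.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Explicit text form: the two fields joined by a caller-supplied separator byte.
    // The output depends only on the field values, not on the JVM.
    public String toString(byte[] sep) {
        return x + String.valueOf((char) sep[0]) + y;
    }
}

final class ToStringStability {
    // Serializes a Point with separator byte 0x01.
    static String stable(Point p) {
        return p.toString(new byte[] {1});
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        // Inherited Object.toString(): class name, '@', and a hex hash code.
        System.out.println(p.toString());
        // Explicit form, with the control character made visible for printing.
        System.out.println(stable(p).replace((char) 1, '|'));
    }
}
```

A class that writes its own text form this way sidesteps the version question for its own fields, although any nested objects it appends through their default `toString()` would still be exposed to it.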

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>         Attachments: HIVE-1369.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Updated: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1369:
-----------------------------

    Status: Open  (was: Patch Available)

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Updated: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1369:
---------------------------------

    Affects Version/s: 0.5.0
          Component/s: Serializers/Deserializers

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Assigned: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach reassigned HIVE-1369:
------------------------------------

    Assignee: Alex Kozlov

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>         Attachments: HIVE-1369.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Updated: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Alex Kozlov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Kozlov updated HIVE-1369:
------------------------------

    Attachment: HIVE-1369.patch

A patch that allows LazySimpleSerDe to deserialize an arbitrary object that implements a toString(byte[] separators) method.

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Alex Kozlov
>            Priority: Minor
>         Attachments: HIVE-1369.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872997#action_12872997 ] 

Ashish Thusoo commented on HIVE-1369:
-------------------------------------

I do not see any drawbacks here. I think another requirement here was that the serialization be order preserving wherever there is a notion of order, as this serde could also be used to serialize between map/reduce boundaries. So if the implementation takes care of that and does not introduce overhead, I think this would be good.

Others, what do you think about this?

Ashish
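
As an illustration of the order-preserving requirement mentioned above, here is a minimal sketch (not part of the patch, all names hypothetical). It shows that a plain decimal text form does not sort correctly under byte-wise comparison, while a fixed-width zero-padded form does, assuming non-negative values only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class OrderPreservingText {
    // Compares two strings the way raw serialized bytes are compared:
    // unsigned, byte by byte, shorter input first on a tie.
    static int compareBytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    // Fixed-width, zero-padded decimal form (19 digits covers Long.MAX_VALUE).
    // Assumption for this sketch: only non-negative values are supported.
    static String padded(long v) {
        if (v < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        return String.format(Locale.ROOT, "%019d", v);
    }

    public static void main(String[] args) {
        // Plain text: "9" sorts after "10" byte-wise, so numeric order is lost.
        System.out.println(compareBytes("9", "10") > 0);
        // Padded text: byte order matches numeric order.
        System.out.println(compareBytes(padded(9), padded(10)) < 0);
    }
}
```

Padding is only one possible approach and costs extra bytes per value, which is the kind of overhead the comment above asks implementers to keep in mind.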

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Alex Kozlov
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Updated: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1369:
---------------------------------

    Fix Version/s: 0.7.0
                       (was: 0.6.0)

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902589#action_12902589 ] 

Carl Steinbach commented on HIVE-1369:
--------------------------------------

@Namit: Will do.

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Updated: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Alex Kozlov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Kozlov updated HIVE-1369:
------------------------------

    Status: Patch Available  (was: Open)

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Alex Kozlov
>            Priority: Minor
>         Attachments: HIVE-1369.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879978#action_12879978 ] 

HBase Review Board commented on HIVE-1369:
------------------------------------------

Message from: "Carl Steinbach" <ca...@cloudera.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/202/
-----------------------------------------------------------

Review request for Hive Developers.


Summary
-------

Review for https://issues.apache.org/jira/secure/attachment/12447394/HIVE-1369.svn.patch


This addresses bug HIVE-1369.
    http://issues.apache.org/jira/browse/HIVE-1369


Diffs
-----

  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 211c733 
  serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleSerDe.java 6db9bc8 

Diff: http://review.hbase.org/r/202/diff


Testing
-------


Thanks,

Carl




> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Alex Kozlov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876509#action_12876509 ] 

Alex Kozlov commented on HIVE-1369:
-----------------------------------

Here is a more complete description of how to use the new functionality.

Say you have a Writable object in a SequenceFile: an implementation of a Session class that contains an array of events, where each Event object has a type, a timestamp, and a Map<String,String>.

You can define the following table in Hive:

CREATE EXTERNAL TABLE session (
  uid STRING,
  events ARRAY < STRUCT < type : INT, ts : BIGINT, map : MAP < STRING, STRING > > >
)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
 STORED AS SEQUENCEFILE
LOCATION 'location_of_your_sequence_file_with_your_writable_as_value'
;

Instead of implementing a fully functional SerDe for this class (even though that is probably a good exercise in the long run), with HIVE-1369 one can just write a toString(byte[]) method for the above Writable:

public String toString(byte[] sep) {
  // sep[i] is the separator for nesting level i:
  // 0 = top-level fields, 1 = array elements, 2 = struct fields,
  // 3 = map entries, 4 = map key/value.
  StringBuilder sb = new StringBuilder();
  sb.append(getUId());
  sb.append((char) sep[0]);
  boolean firstEvent = true;
  for (Event event : getEvents()) {
    if (firstEvent) {
      firstEvent = false;
    } else {
      sb.append((char) sep[1]);
    }
    sb.append(event.getType());
    sb.append((char) sep[2]);
    sb.append(event.getTimestamp());
    sb.append((char) sep[2]);
    Map<String, String> map = event.getMap();
    if (map != null && !map.isEmpty()) {
      boolean firstKey = true;
      for (String key : map.keySet()) {
        if (firstKey) {
          firstKey = false;
        } else {
          sb.append((char) sep[3]);
        }
        sb.append(key);
        sb.append((char) sep[4]);
        sb.append(map.get(key));
      }
    } else {
      // Hive's default NULL marker
      sb.append("\\N");
    }
  }
  return sb.toString();
}

This will obviously be less efficient than implementing a full SerDe, but it is much more flexible and faster to write.

Java's default toString() takes no parameters, so adding a toString(byte[]) overload does not conflict with it.  I considered adding other parameters, such as a null string or an escape character, but decided to keep it simple.  JSON serialization is another option, though probably slower.

Alex K
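
For readers who want to see how a deserializer could pick up such an overload, here is a minimal sketch. It is an assumption about one possible mechanism, not the code in the attached patch: the `TextFormDispatch` and `Pair` classes are hypothetical, and reflection is used only to keep the example self-contained. It calls `toString(byte[])` when the object's class has a public one and falls back to the ordinary `toString()` otherwise.

```java
import java.lang.reflect.Method;

// Hypothetical record type that offers the separator-aware overload.
final class Pair {
    private final String left;
    private final String right;

    Pair(String left, String right) {
        this.left = left;
        this.right = right;
    }

    public String toString(byte[] sep) {
        return left + (char) sep[0] + right;
    }
}

final class TextFormDispatch {
    // Returns obj.toString(separators) if the class has a public
    // String toString(byte[]) method, otherwise obj.toString().
    static String textOf(Object obj, byte[] separators) {
        try {
            Method m = obj.getClass().getMethod("toString", byte[].class);
            if (m.getReturnType() == String.class) {
                return (String) m.invoke(obj, (Object) separators);
            }
        } catch (NoSuchMethodException e) {
            // No overload present: fall through to the default text form.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("toString(byte[]) failed", e);
        }
        return obj.toString();
    }

    public static void main(String[] args) {
        byte[] sep = {'|'};
        System.out.println(textOf(new Pair("a", "b"), sep));
        System.out.println(textOf(Integer.valueOf(7), sep));
    }
}
```

A reflective lookup on every row would be slow in practice; caching the `Method` per class, or having such classes implement a small interface, would be the more likely choice in a real SerDe.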


> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>         Attachments: HIVE-1369.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K



[jira] Updated: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1369:
---------------------------------

    Attachment: HIVE-1369.svn.patch

Looks like the original patch was generated with "git diff" without the --no-prefix switch. This causes patch to barf. HIVE-1369.svn.patch is an updated copy that applies cleanly with patch.

> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1369.patch, HIVE-1369.svn.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.  It should be pretty easy to extend the class to read any object that implements toString() method.
> Ideas or concerns?
> Alex K
