Posted to dev@hive.apache.org by "Steven Wong (JIRA)" <ji...@apache.org> on 2010/08/20 03:40:16 UTC

[jira] Created: (HIVE-1575) get_json_object does not support JSON array at the root level

get_json_object does not support JSON array at the root level
-------------------------------------------------------------

                 Key: HIVE-1575
                 URL: https://issues.apache.org/jira/browse/HIVE-1575
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: UDF
    Affects Versions: 0.7.0
            Reporter: Steven Wong


Currently, get_json_object(json_txt, path) always returns null if json_txt is not a JSON object (e.g. is a JSON array) at the root level.

I have a table column of JSON arrays at the root level, but I can't parse it because of that.

get_json_object should accept any JSON value (string, number, object, array, true, false, null), not just object, at the root level. In other words, it should behave as if it were named get_json_value or simply get_json.
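
To illustrate the request, here is a minimal Java sketch of how the seven kinds of root value could be told apart by looking at the first non-whitespace character. The class `JsonRootKind` and its methods are hypothetical names made up for this example; they are not part of Hive or of any patch on this issue, and the check only inspects one character rather than validating the whole text.

```java
// Hypothetical helper, not part of Hive: guesses the kind of the root JSON
// value from its first non-whitespace character. It does not validate the
// rest of the text.
final class JsonRootKind {
    enum Kind { OBJECT, ARRAY, STRING, NUMBER, TRUE, FALSE, NULL, INVALID }

    static Kind classify(String json) {
        if (json == null) {
            return Kind.INVALID;
        }
        // Skip the four whitespace characters allowed by the JSON grammar.
        int i = 0;
        while (i < json.length() && isJsonWhitespace(json.charAt(i))) {
            i++;
        }
        if (i == json.length()) {
            return Kind.INVALID; // empty or whitespace-only input
        }
        char c = json.charAt(i);
        switch (c) {
            case '{': return Kind.OBJECT;
            case '[': return Kind.ARRAY;
            case '"': return Kind.STRING;
            case 't': return Kind.TRUE;
            case 'f': return Kind.FALSE;
            case 'n': return Kind.NULL;
            default:
                // JSON numbers start with '-' or a digit.
                return (c == '-' || (c >= '0' && c <= '9')) ? Kind.NUMBER : Kind.INVALID;
        }
    }

    static boolean isJsonWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }
}
```

A caller could use the result to pick an object parser, an array parser, or a scalar path, instead of assuming the root is always an object.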


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-1575:
--------------------------------

    Assignee: Mike Lewis



[jira] Commented: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925059#action_12925059 ] 

Paul Yang commented on HIVE-1575:
---------------------------------

It's hard to say, but my guess is that a regex will be slower than those string operations. Same thing with the cache. What might be good to do is compare the performance before and after these changes. Do you have a dataset that you could use to test?
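
As a rough way to check that guess in isolation, the sketch below times `trim().charAt(0)` against a precompiled `\S` regex on one input string. Everything here (the class `FirstCharBench`, the `Probe` interface, the sample input) is made up for illustration and is not from the patch. A hand-rolled `System.nanoTime()` loop like this is easily skewed by JIT warm-up, so treat the numbers as indicative only; a harness such as JMH, or a before/after run of the real UDF on a real dataset, would be more trustworthy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical micro-benchmark sketch; names and input are illustrative only.
final class FirstCharBench {
    // Compiled once, as a fair comparison would require.
    private static final Pattern FIRST_NON_WS = Pattern.compile("\\S");

    interface Probe {
        char first(String s);
    }

    // The approach from the patch discussion: trim, then take index 0.
    static char viaTrim(String s) {
        return s.trim().charAt(0);
    }

    // Regex alternative: find the first non-whitespace match.
    // Note: \S and trim() differ slightly on unusual control characters.
    static char viaRegex(String s) {
        Matcher m = FIRST_NON_WS.matcher(s);
        if (!m.find()) {
            throw new IllegalArgumentException("no non-whitespace character");
        }
        return s.charAt(m.start());
    }

    // Runs the probe repeatedly and returns the elapsed wall time in nanoseconds.
    static long nanosFor(Probe p, String input, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += p.first(input);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.print(""); // use the result so the loop is not optimized away
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String input = "   [1, 2, 3]";
        // Warm-up passes before the measured passes.
        nanosFor(FirstCharBench::viaTrim, input, 100_000);
        nanosFor(FirstCharBench::viaRegex, input, 100_000);
        System.out.println("trim:  " + nanosFor(FirstCharBench::viaTrim, input, 1_000_000) + " ns");
        System.out.println("regex: " + nanosFor(FirstCharBench::viaRegex, input, 1_000_000) + " ns");
    }
}
```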



[jira] Commented: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924077#action_12924077 ] 

Paul Yang commented on HIVE-1575:
---------------------------------

Cool, could you modify your patch so that it applies cleanly with 'patch -p0 -i <filename>'? Right now, it requires -p1. And could you add a test case to udf_json.q? Though, that file is empty at the moment - seems like the test cases were removed accidentally.



[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "Mike Lewis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Lewis updated HIVE-1575:
-----------------------------

    Attachment: 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch

Sorry if this is a repeat post, wasn't sure how to attach the patch.



[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "Mike Lewis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Lewis updated HIVE-1575:
-----------------------------

    Status: Patch Available  (was: Open)

Here's a quick patch I made to detect whether the root is an object or an array. If it's an array, it will create a new JSONArray.



[jira] Commented: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "Mike Lewis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924085#action_12924085 ] 

Mike Lewis commented on HIVE-1575:
----------------------------------

Yep, I will take care of it once I get a few free cycles.  One concern I have in my patch is this part:

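{code:java}
// Note: charAt(0) throws StringIndexOutOfBoundsException if jsonString
// is empty or contains only whitespace.
char firstChar = jsonString.trim().charAt(0);
if (firstChar == '{') {
	// object logic ...
} else if (firstChar == '[') {
	// array logic ...
}
{code}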

Basically, I want to get the first non-whitespace character. Is there a faster way to do this than trimming (which essentially creates a new string)? Would a regexp be more efficient?
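
One alternative to both trimming and a regexp is a plain index scan, sketched below. The class `FirstChar` and the method `firstNonWhitespace` are hypothetical names for this example and are not taken from the patch. The scan treats every character at or below U+0020 as whitespace, which matches what `String.trim()` strips, and it returns -1 instead of throwing when there is nothing to find.

```java
// Hypothetical helper, not from the patch: finds the first non-whitespace
// character without allocating a trimmed copy of the string.
final class FirstChar {
    /** Returns the first character above U+0020, or -1 if there is none. */
    static int firstNonWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > ' ') {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println((char) firstNonWhitespace("  \n[1, 2]")); // prints [
        System.out.println(firstNonWhitespace("   "));               // prints -1
    }
}
```

For input with no leading or trailing whitespace, `trim()` returns the same string object, so the saving only shows up on padded input. The scan also avoids the exception that `charAt(0)` raises on an empty result.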

Also, changing the extractObjectCache to have an array as a key might reduce the efficiency a bit. Is there an alternative that I may not have thought of?
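
The thread does not show how the patch keys the cache, so the following is only an illustration of one possible reading. If "array" here means a Java array used directly as a map key, the bigger risk is correctness rather than speed: Java arrays use identity-based `equals` and `hashCode`, so two arrays with equal contents are different keys and the cache would never hit. A `List` key compares by content. The class `CacheKeyDemo` and the sample keys are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo, not from the patch: compares array keys with List keys.
final class CacheKeyDemo {
    // An equal-content array is a different key, so the lookup misses.
    static boolean arrayKeyHits() {
        Map<String[], Object> cache = new HashMap<>();
        cache.put(new String[] {"[1,2]", "$[0]"}, 1);
        return cache.containsKey(new String[] {"[1,2]", "$[0]"});
    }

    // Lists implement content-based equals and hashCode, so the lookup hits.
    static boolean listKeyHits() {
        Map<List<String>, Object> cache = new HashMap<>();
        cache.put(Arrays.asList("[1,2]", "$[0]"), 1);
        return cache.containsKey(Arrays.asList("[1,2]", "$[0]"));
    }

    public static void main(String[] args) {
        System.out.println("array key hit: " + arrayKeyHits()); // false
        System.out.println("list key hit:  " + listKeyHits());  // true
    }
}
```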




[jira] Commented: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "Mike Lewis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923644#action_12923644 ] 

Mike Lewis commented on HIVE-1575:
----------------------------------

Apologies.  The patch I submitted was broken.  I think I may have messed up the state of this issue :(



[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "Mike Lewis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Lewis updated HIVE-1575:
-----------------------------

    Attachment: 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch

This patch actually works, although it is a bit sloppy.



[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "Paul Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Yang updated HIVE-1575:
----------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

Posted by "Mike Lewis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Lewis updated HIVE-1575:
-----------------------------

    Attachment:     (was: 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch)
