Posted to dev@hive.apache.org by "Hao Liu (JIRA)" <ji...@apache.org> on 2008/12/11 07:27:44 UTC

[jira] Created: (HIVE-163) support loading json data into hive

support loading json data into hive
-----------------------------------

                 Key: HIVE-163
                 URL: https://issues.apache.org/jira/browse/HIVE-163
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Serializers/Deserializers
            Reporter: Hao Liu


The JSON format is commonly used for transmitting structured data over a network, especially in Ajax web applications. People also use JSON to store log data.
Support for loading and querying JSON data would be a desirable feature in Hive.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-163) support loading json data into hive

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663207#action_12663207 ] 

Joydeep Sen Sarma commented on HIVE-163:
----------------------------------------

There are a couple of debug printlns left in the last patch posted on this JIRA:

+          System.out.println("ZSHAO: " + System.getenv("CLASSPATH"));

+      System.out.println("ExecDriver CMD:PC " + cmdLine);

?

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>             Fix For: 0.2.0
>
>         Attachments: HIVE-163.2.patch, HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Updated: (HIVE-163) support loading json data into hive

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-163:
--------------------------------

    Fix Version/s: 0.3.0
                       (was: 0.6.0)

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>             Fix For: 0.3.0
>
>         Attachments: HIVE-163.2.patch, HIVE-163.3.patch, HIVE-163.4.patch, HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Resolved: (HIVE-163) support loading json data into hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao resolved HIVE-163.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.2.0
     Release Note: HIVE-163. JSON udf function added. (Hao Liu via zshao)
     Hadoop Flags: [Reviewed]

Committed revision 733992. Thanks Hao!

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>             Fix For: 0.2.0
>
>         Attachments: HIVE-163.2.patch, HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Commented: (HIVE-163) support loading json data into hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663212#action_12663212 ] 

Zheng Shao commented on HIVE-163:
---------------------------------

The committed code does not contain the extra printlns.
Thanks for checking this, Joy.


> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>             Fix For: 0.2.0
>
>         Attachments: HIVE-163.2.patch, HIVE-163.3.patch, HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Updated: (HIVE-163) support loading json data into hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-163:
----------------------------

    Attachment: HIVE-163.3.patch

This is the right patch. HIVE-163.2.patch was an outdated one with additional printfs.


> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>             Fix For: 0.2.0
>
>         Attachments: HIVE-163.2.patch, HIVE-163.3.patch, HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Commented: (HIVE-163) support loading json data into hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663187#action_12663187 ] 

Ashish Thusoo commented on HIVE-163:
------------------------------------

Hao just showed me that the license is there :). So +1 from my side as well.

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>         Attachments: HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Assigned: (HIVE-163) support loading json data into hive

Posted by "Hao Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Liu reassigned HIVE-163:
----------------------------

    Assignee: Hao Liu

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>         Attachments: HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Updated: (HIVE-163) support loading json data into hive

Posted by "Hao Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Liu updated HIVE-163:
-------------------------

    Attachment: HIVE-163.4.patch

Added the .classpath update to the patch. It also looks like build-common.xml was not in HIVE-163.3.patch, so I added it to this one.

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>             Fix For: 0.2.0
>
>         Attachments: HIVE-163.2.patch, HIVE-163.3.patch, HIVE-163.4.patch, HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Commented: (HIVE-163) support loading json data into hive

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663401#action_12663401 ] 

Prasad Chakka commented on HIVE-163:
------------------------------------

The .classpath file should be updated to include the new library, json.jar.

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>             Fix For: 0.2.0
>
>         Attachments: HIVE-163.2.patch, HIVE-163.3.patch, HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Commented: (HIVE-163) support loading json data into hive

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663185#action_12663185 ] 

Ashish Thusoo commented on HIVE-163:
------------------------------------

We also have to include the license text for json.jar. Can you include that in the patch? You should probably remove the README for json.jar.

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>         Attachments: HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Commented: (HIVE-163) support loading json data into hive

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663573#action_12663573 ] 

Prasad Chakka commented on HIVE-163:
------------------------------------

Hao, the build-common.xml change is not needed, since json.jar is not an auxiliary lib but an ordinary third-party library, like commons-lang.jar.

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>             Fix For: 0.2.0
>
>         Attachments: HIVE-163.2.patch, HIVE-163.3.patch, HIVE-163.4.patch, HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Updated: (HIVE-163) support loading json data into hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-163:
----------------------------

    Attachment: HIVE-163.2.patch

We had to unpack json.jar and put it into hive_exec.jar to work with hadoop 0.17.


> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>             Fix For: 0.2.0
>
>         Attachments: HIVE-163.2.patch, HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Updated: (HIVE-163) support loading json data into hive

Posted by "Hao Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Liu updated HIVE-163:
-------------------------

    Attachment: HIVE-163.patch

Added a patch that supports a JSON UDF, as suggested.

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>         Attachments: HIVE-163.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Commented: (HIVE-163) support loading json data into hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663136#action_12663136 ] 

Zheng Shao commented on HIVE-163:
---------------------------------

+1

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>            Assignee: Hao Liu
>         Attachments: HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Updated: (HIVE-163) support loading json data into hive

Posted by "Hao Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Liu updated HIVE-163:
-------------------------

    Attachment: json.jar

This is json.jar from the Hadoop project. It should be included in ${hive.root}/lib.

> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>         Attachments: HIVE-163.patch, json.jar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] Commented: (HIVE-163) support loading json data into hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655556#action_12655556 ] 

Zheng Shao commented on HIVE-163:
---------------------------------

I would suggest first adding a JSONField function that extracts a JSON field:

{code}
class UDFJsonField {
  /** Extracts a field of a JSON object.
    * @param jsonObject The main JSON Object in string format
    * @param field The field expression, e.g. "[1]a.b[3].c"
    * @return the JSON Object that represents the field in string format
    */
  String evaluate(String jsonObject, String field) {
     ...
  }
}
{code}

This function will support most of the requests.
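
To make the field expression concrete, here is a minimal sketch of how a path such as "[1]a.b[3].c" might be split into steps and applied. It is not the code from any of the attached patches. The class name JsonFieldPath and its methods are hypothetical, and the sketch assumes the JSON string has already been parsed (for example by a library such as json.jar) into java.util.Map objects for JSON objects and java.util.List objects for JSON arrays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of the HIVE-163 patches.
final class JsonFieldPath {

    /** Splits an expression like "[1]a.b[3].c" into steps: Integer = array index, String = object key. */
    public static List<Object> parse(String expr) {
        List<Object> steps = new ArrayList<>();
        int i = 0;
        while (i < expr.length()) {
            char ch = expr.charAt(i);
            if (ch == '[') {
                // Array index: read everything up to the matching ']'.
                int close = expr.indexOf(']', i);
                if (close < 0) {
                    throw new IllegalArgumentException("unclosed '[' in " + expr);
                }
                steps.add(Integer.parseInt(expr.substring(i + 1, close)));
                i = close + 1;
            } else if (ch == '.') {
                // Separator between keys: skip it.
                i++;
            } else {
                // Object key: read up to the next '.' or '['.
                int end = i;
                while (end < expr.length() && expr.charAt(end) != '.' && expr.charAt(end) != '[') {
                    end++;
                }
                steps.add(expr.substring(i, end));
                i = end;
            }
        }
        return steps;
    }

    /** Walks already-parsed JSON (Map for objects, List for arrays); returns null if a step does not match. */
    public static Object extract(Object root, String expr) {
        Object cur = root;
        for (Object step : parse(expr)) {
            if (step instanceof Integer && cur instanceof List) {
                List<?> list = (List<?>) cur;
                int idx = (Integer) step;
                if (idx < 0 || idx >= list.size()) {
                    return null;
                }
                cur = list.get(idx);
            } else if (step instanceof String && cur instanceof Map) {
                cur = ((Map<?, ?>) cur).get(step);
            } else {
                return null;
            }
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }
}
```

Returning null for a missing key or out-of-range index, rather than throwing, is a design choice of this sketch: it lets a query skip rows whose JSON lacks the field. A real UDF would additionally parse the input string and serialize the result back to a string, as the evaluate signature above implies.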


> support loading json data into hive
> -----------------------------------
>
>                 Key: HIVE-163
>                 URL: https://issues.apache.org/jira/browse/HIVE-163
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Hao Liu
>   Original Estimate: 168h
>  Remaining Estimate: 168h