You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by TJay Belt <tj...@yahoo.com> on 2015/03/17 23:15:08 UTC

Fw: help with getting the summary of a list of numbers from a json document




 
I have a json document that has several sub documents. I can query this doc and pull out numbers from a repetitive subdocument. The result is a list within brackets. I want to sum these numbers, and cannot figure out how.

Here is a sample document. 
{    "_id": "005f2c46-4a70-4fed-a8e9-a445011d2aba",    "Revision": 7,    "ActivityCountedCollection": [        {            "Lesson": "98d66ab9-1ef4-4b61-a05d-857b3e07e0f8",            "DateTime": "2015-03-02T14:11:05.5407801+00:00",            "ElapsedSeconds": 66.80226        },        {            "Lesson": "30698aac-5a3d-4464-935c-16de4ba9db70",
            "DateTime": "2015-03-02T14:18:29.7132608+00:00",
            "ElapsedSeconds": 435.24593        },        {            "Lesson": "30698aac-5a3d-4464-935c-16de4ba9db70",
            "DateTime": "2015-03-02T14:21:38.4381108+00:00",
            "ElapsedSeconds": 182.47075        }    ]}


I can use the following hive code from within HDInsight to pull out the values. 
DROP TABLE IF EXISTS HDStudents;CREATE EXTERNAL TABLE HDStudents(  json_response STRING)STORED AS TEXTFILE LOCATION 'wasb://blob@tjaystorage.blob.core.windows.net/HDStudentData';
DROP TABLE IF EXISTS HDStudent_Usage;CREATE TABLE HDStudent_Usage(  id STRING,  sec STRING); FROM HDStudentsINSERT OVERWRITE TABLE HDStudent_UsageSELECT    CAST(get_json_object(json_response, '$.id') as STRING),    get_json_object(json_response, '$.ActivityCountedCollection.ElapsedSeconds') as STRING;
and the result ends up being something like the following
    \N[66.80226,435.24593,182.47075]

I dont know whats happening to my ID, but that is not important now. But the numbers from the above example come out as a list, within brackets. Its not even formed as JSon anymore, so the previous tricks to pull them out of their key value pair do not apply. Since it is reading it as a string, with commas and numbers, i cannot perform aggregate functions on it.Online, i get suggestions to create some maven compiled code function that is referenced, and i get lost attempting this too. I thought about pushing these results into yet another table, and doing some processing on it. But do not know how to extract the items from the list, or array, or whatever it is in now. 

Help?