Posted to dev@mahout.apache.org by "Terry Blankers (JIRA)" <ji...@apache.org> on 2014/04/02 23:25:14 UTC

[jira] [Created] (MAHOUT-1505) structure of clusterdump's JSON output

Terry Blankers created MAHOUT-1505:
--------------------------------------

             Summary: structure of clusterdump's JSON output
                 Key: MAHOUT-1505
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1505
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.9
            Reporter: Terry Blankers


Hi all, I'm working on some automated analysis of the clusterdump output using '-of = JSON'. While digging into the structure of the representation of the data I've noticed something that seems a little odd to me.

The data for a particular cluster is not directly accessible: the 'cluster', 'n', 'c' & 'r' values are all in one continuous string. For example:

{noformat}
{"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223, administration:0.011] r=[action:0.446, adherence:1.501, administration:0.306]}"}
{noformat}

This is also the case for the "point":

{noformat}
{"point":"013FFD34580BA31AECE5D75DE65478B3D691D138 = [body:6.904, harm:10.101]","vector_name":"013FFD34580BA31AECE5D75DE65478B3D691D138","weight":"1.0"}
{noformat}

This leads me to believe that the only way I can get to the individual data in these items is by string parsing. For JSON deserialization I would have expected to see something along the lines of:

{noformat}
{
    "cluster":"VL-10515",
    "n":5924,
    "c":
    [
        {"action":0.023},
        {"adherence":0.223},
        {"administration":0.011}
    ],
    "r":
    [
        {"action":0.446},
        {"adherence":1.501},
        {"administration":0.306}
    ]
}
{noformat}

and:

{noformat}
{
    "point": {
        "body": 6.904,
        "harm": 10.101
    },
    "vector_name": "013FFD34580BA31AECE5D75DE65478B3D691D138",
    "weight": 1.0
} 
{noformat}


Andrew Musselman replied:

{quote}
Looks like a bug to me as well. I would have expected something similar to
what you were expecting, except with the "c" and "r" values in objects
rather than arrays of single-element objects:

{noformat}
{
    "cluster":"VL-10515",
    "n":5924,
    "c":
    {
        "action":0.023,
        "adherence":0.223,
        "administration":0.011
    },
    "r":
    {
        "action":0.446,
        "adherence":1.501,
        "administration":0.306
    }
}
{noformat}
{quote}
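
Until clusterdump emits structured JSON, the string parsing mentioned above could be done roughly as follows. This is only a minimal sketch in Python, assuming the two string layouts shown in the examples in this issue and assuming that terms contain no commas. The names parse_weights, parse_cluster and parse_point are hypothetical helpers and not part of Mahout. The result uses objects for "c" and "r", as in the reply above.

```python
import json
import re

# Assumed layout: ID{n=COUNT c=[term:weight, ...] r=[term:weight, ...]}
# The "]" closing the c list is treated as optional.
CLUSTER_RE = re.compile(
    r"^(?P<id>[^{]+)\{n=(?P<n>\d+)\s+"
    r"c=\[(?P<c>.*?)\]?\s*r=\[(?P<r>.*?)\]\}$"
)


def parse_weights(text):
    """Turn 'term:1.0, other:2.0' into {'term': 1.0, 'other': 2.0}."""
    weights = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so a term may itself contain a colon.
        term, _, value = item.rpartition(":")
        weights[term] = float(value)
    return weights


def parse_cluster(line):
    """Parse one {"cluster": "..."} line into a nested dict."""
    raw = json.loads(line)["cluster"]
    match = CLUSTER_RE.match(raw)
    if match is None:
        raise ValueError("unrecognized cluster string: %r" % raw)
    return {
        "cluster": match.group("id"),
        "n": int(match.group("n")),
        "c": parse_weights(match.group("c")),
        "r": parse_weights(match.group("r")),
    }


def parse_point(line):
    """Parse one {"point": "...", ...} line into a nested dict."""
    record = json.loads(line)
    # The point string is assumed to look like 'NAME = [term:weight, ...]'.
    _, _, vector = record["point"].partition(" = ")
    return {
        "point": parse_weights(vector.strip("[]")),
        "vector_name": record["vector_name"],
        "weight": float(record["weight"]),
    }


if __name__ == "__main__":
    cluster_line = (
        '{"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223] '
        'r=[action:0.446, adherence:1.501]}"}'
    )
    print(json.dumps(parse_cluster(cluster_line), indent=4))
```

A regular expression is fragile here, which is the point of this issue: any change to the string layout, or a term containing a comma, would break the workaround.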


