Posted to dev@mahout.apache.org by "Terry Blankers (JIRA)" <ji...@apache.org> on 2014/04/02 23:25:14 UTC
[jira] [Created] (MAHOUT-1505) structure of clusterdump's JSON output
Terry Blankers created MAHOUT-1505:
--------------------------------------
Summary: structure of clusterdump's JSON output
Key: MAHOUT-1505
URL: https://issues.apache.org/jira/browse/MAHOUT-1505
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.9
Reporter: Terry Blankers
Hi all, I'm working on some automated analysis of the clusterdump output using '-of JSON'. While digging into how the data is represented, I've noticed something that seems a little odd to me.
For each cluster, the 'cluster', 'n', 'c' and 'r' values are all packed into one continuous string rather than exposed as separate fields. For example:
{noformat}
{"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223, administration:0.011] r=[action:0.446, adherence:1.501, administration:0.306]}"}
{noformat}
This is also the case for the "point":
{noformat}
{"point":"013FFD34580BA31AECE5D75DE65478B3D691D138 = [body:6.904, harm:10.101]","vector_name":"013FFD34580BA31AECE5D75DE65478B3D691D138","weight":"1.0"}
{noformat}
This leads me to believe that the only way to get at the individual values in these items is by string parsing. For JSON deserialization I would have expected something along the lines of:
{noformat}
{
"cluster":"VL-10515",
"n":5924,
"c":
[
{"action":0.023},
{"adherence":0.223},
{"administration":0.011}
],
"r":
[
{"action":0.446},
{"adherence":1.501},
{"administration":0.306}
]
}
{noformat}
and:
{noformat}
{
"point": {
"body": 6.904,
"harm": 10.101
},
"vector_name": "013FFD34580BA31AECE5D75DE65478B3D691D138",
"weight": 1.0
}
{noformat}
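Until the output is restructured, the string-parsing workaround mentioned above can be sketched in Python. This is a minimal sketch, assuming the cluster string always has the shape VL-<id>{n=<count> c=[term:value, ...] r=[term:value, ...]} and the point string has the shape <name> = [term:value, ...], exactly as in the examples in this issue; the helper names (parse_terms, parse_cluster, parse_point) are hypothetical and not part of Mahout.

```python
import json
import re

# Assumed shape of the cluster string (taken from the example in this issue):
#   VL-<id>{n=<count> c=[term:value, ...] r=[term:value, ...]}
CLUSTER_RE = re.compile(
    r"^(?P<cluster>[^{\s]+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}$"
)
# Assumed shape of the point string: <vector name> = [term:value, ...]
POINT_RE = re.compile(r"^(?P<name>.*?) = \[(?P<terms>[^\]]*)\]$")


def parse_terms(text):
    """Turn 'action:0.023, adherence:0.223' into {'action': 0.023, 'adherence': 0.223}.

    Assumes term names contain no ',' or ']' characters.
    """
    terms = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last ':' so the numeric weight is always the right-hand part.
        term, value = item.rsplit(":", 1)
        terms[term] = float(value)
    return terms


def parse_cluster(record):
    """Hypothetical helper: split one JSON 'cluster' record into separate fields."""
    m = CLUSTER_RE.match(json.loads(record)["cluster"])
    if m is None:
        raise ValueError("unrecognised cluster string")
    return {
        "cluster": m.group("cluster"),
        "n": int(m.group("n")),
        "c": parse_terms(m.group("c")),
        "r": parse_terms(m.group("r")),
    }


def parse_point(record):
    """Hypothetical helper: split one JSON 'point' record into separate fields."""
    data = json.loads(record)
    m = POINT_RE.match(data["point"])
    if m is None:
        raise ValueError("unrecognised point string")
    return {
        "point": parse_terms(m.group("terms")),
        "vector_name": data["vector_name"],
        # The weight arrives as a string ("1.0"), so convert it explicitly.
        "weight": float(data["weight"]),
    }


cluster_line = ('{"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223, '
                'administration:0.011] r=[action:0.446, adherence:1.501, administration:0.306]}"}')
point_line = ('{"point":"013FFD34580BA31AECE5D75DE65478B3D691D138 = [body:6.904, harm:10.101]",'
              '"vector_name":"013FFD34580BA31AECE5D75DE65478B3D691D138","weight":"1.0"}')

print(parse_cluster(cluster_line))
print(parse_point(point_line))
```

The regular expressions break as soon as a term name contains a comma or a closing bracket, which is one reason a structured JSON layout would be more robust than parsing the formatted strings.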
Andrew Musselman replied:
{quote}
Looks like a bug to me as well. I would have expected something similar to
what you describe, except perhaps like this, which puts the "c" and "r"
values in objects rather than in arrays of single-element objects:
{noformat}
{
"cluster":"VL-10515",
"n":5924,
"c":
{
"action":0.023,
"adherence":0.223,
"administration":0.011
},
"r":
{
"action":0.446,
"adherence":1.501,
"administration":0.306
}
}
{noformat}
{quote}
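To illustrate the practical difference between the two suggested layouts, here is a small Python sketch. The records are trimmed to the "c" field and the function names are hypothetical. With the object form, the deserialized "c" value is already a term-to-weight mapping; with an array of single-element objects, the entries have to be merged before a term can be looked up.

```python
import json

# The two candidate layouts from this issue, trimmed to the "c" field.
ARRAY_FORM = ('{"cluster":"VL-10515","n":5924,'
              '"c":[{"action":0.023},{"adherence":0.223},{"administration":0.011}]}')
OBJECT_FORM = ('{"cluster":"VL-10515","n":5924,'
               '"c":{"action":0.023,"adherence":0.223,"administration":0.011}}')


def centroid_from_array_form(record):
    """Array of single-element objects: merge them into one dict before lookup."""
    merged = {}
    for entry in json.loads(record)["c"]:
        merged.update(entry)
    return merged


def centroid_from_object_form(record):
    """Object form: the deserialized value is already a term -> weight mapping."""
    return json.loads(record)["c"]


print(centroid_from_array_form(ARRAY_FORM)["adherence"])   # 0.223
print(centroid_from_object_form(OBJECT_FORM)["adherence"])  # 0.223
```

One trade-off worth noting: the JSON specification treats an object as an unordered collection, whereas an array preserves order, so the array form would only be preferable if consumers need the terms in a guaranteed order.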
--
This message was sent by Atlassian JIRA
(v6.2#6252)