Posted to issues@mesos.apache.org by "Steven Schlansker (JIRA)" <ji...@apache.org> on 2015/10/20 20:06:27 UTC
[jira] [Created] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
Steven Schlansker created MESOS-3771:
----------------------------------------
Summary: Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
Key: MESOS-3771
URL: https://issues.apache.org/jira/browse/MESOS-3771
Project: Mesos
Issue Type: Bug
Components: HTTP API
Affects Versions: 0.24.1
Reporter: Steven Schlansker
Priority: Critical
Spark encodes some binary data into the ExecutorInfo.data field. This field is sent as a "bytes" Protobuf value, which can hold arbitrary non-UTF-8 data.
If such a field is present, its contents appear to be splatted out into the JSON without any regard for proper character encoding:
{quote}
0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.|
0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac|
0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".|
0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u0000\u0005ur\|
0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u0000\u000f[Lsca|
0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00|
{quote}
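The raw bytes {{ac ed}} visible in the hexdump are the Java serialization stream magic, and they are not a valid UTF-8 sequence, so no JSON parser is required to accept them. As a minimal sketch (the function name is mine, not stout's, and this check skips finer points such as overlong encodings and surrogates), a UTF-8 validity test rejects exactly this kind of payload:

{code}
#include <cassert>
#include <string>

// Minimal UTF-8 validity check (illustrative sketch only).
// Rejects lead bytes like 0xAC that start no valid UTF-8 sequence.
bool isValidUtf8(const std::string& s)
{
  size_t i = 0;
  while (i < s.size()) {
    unsigned char c = s[i];
    size_t len;
    if (c < 0x80) len = 1;                 // ASCII
    else if ((c >> 5) == 0x06) len = 2;    // 110xxxxx
    else if ((c >> 4) == 0x0E) len = 3;    // 1110xxxx
    else if ((c >> 3) == 0x1E) len = 4;    // 11110xxx
    else return false;                     // invalid lead byte (e.g. 0xAC)
    if (i + len > s.size()) return false;  // truncated sequence
    for (size_t j = 1; j < len; j++) {
      if ((static_cast<unsigned char>(s[i + j]) >> 6) != 0x02)
        return false;                      // continuation must be 10xxxxxx
    }
    i += len;
  }
  return true;
}
{code}

For example, {{isValidUtf8(std::string("\xac\xed\x00\x05", 4))}} is false, so a JSON document containing those bytes raw inside a string is not even well-formed UTF-8, let alone valid JSON.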
I suspect this is because the HTTP API emits executorInfo.data directly:
{code}
JSON::Object model(const ExecutorInfo& executorInfo)
{
  JSON::Object object;
  object.values["executor_id"] = executorInfo.executor_id().value();
  object.values["name"] = executorInfo.name();
  object.values["data"] = executorInfo.data();
  object.values["framework_id"] = executorInfo.framework_id().value();
  object.values["command"] = model(executorInfo.command());
  object.values["resources"] = model(executorInfo.resources());
  return object;
}
{code}
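One way out (purely a suggestion on my part, not an existing Mesos API) would be to base64-encode the bytes field before putting it into the JSON object, which guarantees ASCII-safe output. A minimal encoder sketch:

{code}
#include <cassert>
#include <cstdint>
#include <string>

// Minimal base64 encoder (illustrative sketch; name is hypothetical).
// Emitting base64Encode(executorInfo.data()) instead of the raw bytes
// would keep the resulting JSON pure ASCII.
std::string base64Encode(const std::string& input)
{
  static const char table[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  size_t i = 0;
  while (i + 2 < input.size()) {
    // Pack three bytes into 24 bits, emit four 6-bit symbols.
    uint32_t n = (static_cast<unsigned char>(input[i]) << 16) |
                 (static_cast<unsigned char>(input[i + 1]) << 8) |
                 static_cast<unsigned char>(input[i + 2]);
    out += table[(n >> 18) & 63];
    out += table[(n >> 12) & 63];
    out += table[(n >> 6) & 63];
    out += table[n & 63];
    i += 3;
  }
  if (i < input.size()) {
    // One or two trailing bytes: pad with '='.
    uint32_t n = static_cast<unsigned char>(input[i]) << 16;
    bool two = (i + 1 < input.size());
    if (two) n |= static_cast<unsigned char>(input[i + 1]) << 8;
    out += table[(n >> 18) & 63];
    out += table[(n >> 12) & 63];
    out += two ? table[(n >> 6) & 63] : '=';
    out += '=';
  }
  return out;
}
{code}

Notably, the Java serialization magic {{ac ed}} from the hexdump encodes to the familiar "rO0" prefix, which is safe to embed in any JSON string.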
I think this may be because the custom JSON processing library in stout has no notion of a byte array. I'm guessing that some implicit conversion causes the data to be written as a String instead, but:
{code}
inline std::ostream& operator<<(std::ostream& out, const String& string)
{
  // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
  // See RFC 4627 for the JSON string specification.
  return out << picojson::value(string.value).serialize();
}
{code}
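That would explain the mixed output in the hexdump: ASCII control characters get escaped as {{\uXXXX}}, while bytes >= 0x80 pass through raw. A sketch of escaping with exactly that behavior (my own reconstruction of the symptom, not the actual picojson code) reproduces the observed byte pattern:

{code}
#include <cassert>
#include <cstdio>
#include <string>

// Sketch of ASCII-only escaping matching the observed behavior:
// control characters become \uXXXX, but bytes >= 0x80 are emitted
// raw -- which is how 0xAC 0xED ends up next to \u0000 in the dump.
std::string escapeAsciiOnly(const std::string& s)
{
  std::string out;
  for (unsigned char c : s) {
    if (c < 0x20) {
      char buf[8];
      std::snprintf(buf, sizeof(buf), "\\u%04x", c);
      out += buf;                       // control char -> \uXXXX escape
    } else {
      out += static_cast<char>(c);      // high bytes pass through raw
    }
  }
  return out;
}
{code}

Feeding it the first four bytes of a Java serialization stream yields raw {{ac ed}} followed by the literal text {{\u0000\u0005}}, exactly the pattern in the hexdump above.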
Thank you for any assistance here. Our cluster is currently entirely down -- the frameworks cannot parse the invalid JSON produced (it is not even valid UTF-8).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)