Posted to issues@flink.apache.org by "Yunfeng Zhou (Jira)" <ji...@apache.org> on 2022/11/23 12:18:00 UTC

[jira] [Commented] (FLINK-30122) Flink ML KMeans getting model data throws TypeError

    [ https://issues.apache.org/jira/browse/FLINK-30122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637718#comment-17637718 ] 

Yunfeng Zhou commented on FLINK-30122:
--------------------------------------

It turns out that PyFlink has a bug in handling object arrays:

https://issues.apache.org/jira/browse/FLINK-30168

Flink ML has worked around this problem with a temporary fix, as shown in this PR:

https://github.com/apache/flink-ml/pull/181

> Flink ML KMeans getting model data throws TypeError
> ---------------------------------------------------
>
>                 Key: FLINK-30122
>                 URL: https://issues.apache.org/jira/browse/FLINK-30122
>             Project: Flink
>          Issue Type: Bug
>          Components: Library / Machine Learning
>    Affects Versions: ml-2.1.0
>            Reporter: Yunfeng Zhou
>            Priority: Major
>
> When the following test case is added to flink-ml-python/pyflink/ml/lib/clustering/tests/test_kmeans.py,
> {code:java}
> def test_get_model_data(self):
>     kmeans = KMeans().set_max_iter(2).set_k(2)
>     model = kmeans.fit(self.data_table)
>     model_data = model.get_model_data()[0]
>     expected_field_names = ['centroids', 'weights']
>     self.assertEqual(expected_field_names, model_data.get_schema().get_field_names())
>     self.t_env.to_data_stream(model_data).execute_and_collect().next()
> {code}
> the following exception is thrown:
> {code:java}
> data = 0, field_type = DenseVectorTypeInfo
>     def pickled_bytes_to_python_converter(data, field_type):
>         if isinstance(field_type, RowTypeInfo):
>             row_kind = RowKind(int.from_bytes(data[0], 'little'))
>             data = zip(list(data[1:]), field_type.get_field_types())
>             fields = []
>             for d, d_type in data:
>                 fields.append(pickled_bytes_to_python_converter(d, d_type))
>             row = Row.of_kind(row_kind, *fields)
>             return row
>         else:
> >           data = pickle.loads(data)
> E           TypeError: a bytes-like object is required, not 'int'
> {code}
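
The TypeError at the bottom of the traceback can be reproduced outside Flink with the standard pickle module alone. The sketch below is only an illustration of the error itself, not of the PyFlink code path: the helper name describe_unpickle and the sample payload are hypothetical, and the idea that the converter ends up iterating over a bytes object (which yields ints) is an assumption that the ticket does not confirm.

```python
import pickle


def describe_unpickle(data):
    """Return the unpickled value, or the TypeError message if data is not bytes-like.

    Hypothetical helper for illustration; it is not part of PyFlink.
    """
    try:
        return pickle.loads(data)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# Stand-in for a pickled field value (the real payload is not shown in the ticket).
payload = pickle.dumps([1.0, 2.0])

# Passing the whole bytes object to pickle.loads works as expected.
print(describe_unpickle(payload))

# Iterating over a bytes object yields ints, so each element is an int
# rather than a pickled blob. This is one assumed way to end up with an int.
first_element = list(payload)[0]
print(type(first_element).__name__)
print(describe_unpickle(first_element))

# The value reported in the traceback header (data = 0) fails the same way.
print(describe_unpickle(0))
```

In the traceback, data is already the int 0 when it reaches the else branch, so pickle.loads fails regardless of what DenseVectorTypeInfo would otherwise expect.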


