Posted to user@ignite.apache.org by DocDVZ <do...@gmail.com> on 2018/08/20 14:53:10 UTC
NPE exception in KMeansTrainer
Hello,
Since I'm new to data science, I'm not sure whether this is a bug or a
problem with my input data, so I decided to ask here for advice before
submitting a ticket. I tried to apply the KMeans algorithm to my bag-of-words
data with ~8k features, so I copy-pasted some lines from the example:
    IgniteCache<String, double[]> dataCache = ignite.cache(storageName);
    KMeansTrainer trainer = new KMeansTrainer().withSeed(1234L);
    KMeansModel mdl = trainer.fit(
        ignite,
        dataCache,
        (k, v) -> Arrays.copyOfRange(v, 1, v.length),
        (k, v) -> v[0]
    );
But this leads to a NullPointerException in KMeansTrainer.class:
    Caused by: java.lang.NullPointerException
        at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.lambda$initClusterCentersRandomly$4dba08e1$1(KMeansTrainer.java:190)
        at org.apache.ignite.ml.dataset.impl.cache.CacheBasedDataset.computeForAllPartitions(CacheBasedDataset.java:158)
        at org.apache.ignite.ml.dataset.impl.cache.CacheBasedDataset.compute(CacheBasedDataset.java:122)
        at org.apache.ignite.ml.dataset.Dataset.compute(Dataset.java:102)
        at org.apache.ignite.ml.dataset.Dataset.compute(Dataset.java:156)
        at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.initClusterCentersRandomly(KMeansTrainer.java:186)
        at org.apache.ignite.ml.clustering.kmeans.KMeansTrainer.fit(KMeansTrainer.java:86)
at line:
    List<LabeledVector> rndPnts = dataset.compute(data -> {
        List<LabeledVector> rndPnt = new ArrayList<>();
        rndPnt.add(data.getRow(new Random(seed).nextInt(data.rowSize())));
        return rndPnt;
    }, (a, b) -> a == null ? b : Stream.concat(a.stream(), b.stream()).collect(Collectors.toList()));
The reducer receives a null value for b, and since there is no null check,
b.stream() throws an NPE. The Ignite version is 2.6. This looks like a bug to
me; is there any way to work around this issue?
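
The reducer quoted above guards against a null a but not against a null b. A minimal sketch of a symmetric, null-safe merge follows, written in plain Java with no Ignite dependency. The class and method names (NullSafeMerge, merge) are hypothetical, and this is not necessarily how the issue was addressed upstream.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Ignite: merges two partial results,
// either of which may be null.
class NullSafeMerge {
    static <T> List<T> merge(List<T> a, List<T> b) {
        if (a == null)
            return b; // may itself be null when both inputs are null
        if (b == null)
            return a;
        // Both sides are present: concatenate them into a new list.
        return Stream.concat(a.stream(), b.stream()).collect(Collectors.toList());
    }
}
```

Since the reducer lives inside KMeansTrainer, this only illustrates the missing check; it is not a drop-in workaround for 2.6.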
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Re: NPE exception in KMeansTrainer
Posted by Alexey Zinoviev <za...@gmail.com>.
Thank you, I think I've found this bug (or a related one) here:
https://issues.apache.org/jira/browse/IGNITE-9239
The fix will be delivered in 2.7 (currently it's in the master branch).
To be 100% sure that the bug is closed, could @DocDVZ describe how the cache
is populated?
I mean this cache:
    IgniteCache<String, double[]> dataCache = ignite.cache(storageName);
Thank you.
Fwd: NPE exception in KMeansTrainer
Posted by Denis Magda <dm...@apache.org>.
Hey, ML experts,
Here is an ML issue reported. Please have a look.
--
Denis
Re: NPE exception in KMeansTrainer
Posted by zaleslaw <za...@gmail.com>.
Hi, try the current KMeans from master.
I hope this bug was identified correctly and fixed in
https://issues.apache.org/jira/browse/IGNITE-9393
Could you post the results of your experiments here?
Re: NPE exception in KMeansTrainer
Posted by DocDVZ <do...@gmail.com>.
Also, is there any way to work around this issue in 2.6 to make it work?
Re: NPE exception in KMeansTrainer
Posted by DocDVZ <do...@gmail.com>.
@zaleslaw
I have a Map<String, Map<String, Long>> where the outer map is userId ->
hitsMap and the inner hitsMap is domain -> hits, so I need to build a domain
vector for each user. The cache is populated through Cache.put operations.
    private Map<String, Map<String, Long>> hits;
    <...>
    storageName = "profile-clustering-" + LocalDateTime.now();
    IgniteCache<String, double[]> cache = ignite.getOrCreateCache(storageName);
    <...>
    AtomicInteger ai = new AtomicInteger(0);
    hits.forEach((profileId, hit) -> {
        List<Long> features = new ArrayList<>();
        // surrogate label for now
        features.add(Long.valueOf(ai.incrementAndGet()));
        ai.compareAndSet(5, 0);
        sortedDomains.forEach(d -> features.add(hit.getOrDefault(d, 0L)));
        double[] doubles = features.stream().mapToDouble(p -> p).toArray();
        if (profileId != null && !profileId.isEmpty()) {
            cache.put(profileId, doubles);
        }
    });
No specific affinity function is defined; the total dataset contains ~3.1k
users and ~8k domains.
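
To make the row layout concrete, here is a small self-contained sketch (plain Java, no Ignite) of how one cache value is laid out and how the two extractor lambdas passed to trainer.fit split it. The class name RowLayoutSketch, its method names, and the sample domains in the note below are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Ignite code.
class RowLayoutSketch {
    // Builds one row: index 0 holds the label, the remaining slots hold
    // per-domain hit counts in sortedDomains order (0 for unseen domains).
    static double[] buildRow(double label, List<String> sortedDomains, Map<String, Long> hit) {
        double[] row = new double[sortedDomains.size() + 1];
        row[0] = label;
        for (int i = 0; i < sortedDomains.size(); i++)
            row[i + 1] = hit.getOrDefault(sortedDomains.get(i), 0L);
        return row;
    }

    // Same split as the feature extractor (k, v) -> Arrays.copyOfRange(v, 1, v.length).
    static double[] features(double[] v) {
        return Arrays.copyOfRange(v, 1, v.length);
    }

    // Same split as the label extractor (k, v) -> v[0].
    static double label(double[] v) {
        return v[0];
    }
}
```

For a user with 3 hits on b.example and sorted domains [a.example, b.example], buildRow(2.0, ...) yields {2.0, 0.0, 3.0}: the label is 2.0 and the features are {0.0, 3.0}.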