Posted to dev@ignite.apache.org by otorreno <os...@shapelets.io> on 2019/01/02 14:24:05 UTC

Ignite ML withKeepBinary cache

Hi everyone, 

I posted the following message on the Ignite Users list, but stephendarlington
<http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=user_nodes&user=1881>
suggested that it would be better to post it on the dev list.

Original message:

After the new release (2.7.0), I have been playing around with the machine
learning algorithms a bit. We have some data in a cache created with the
"withKeepBinary()" option, and I wanted to test whether the machine learning
algos would work with such a cache. I tried, but it fails with the following
stack trace:

org.apache.ignite.IgniteException: testType
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1858)
    at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
    at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6816)
    at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
    at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ignite.binary.BinaryInvalidTypeException: testType
    at org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:707)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1757)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
    at org.apache.ignite.internal.binary.BinaryObjectImpl.deserializeValue(BinaryObjectImpl.java:798)
    at org.apache.ignite.internal.binary.BinaryObjectImpl.value(BinaryObjectImpl.java:143)
    at org.apache.ignite.internal.processors.cache.CacheObjectUtils.unwrapBinary(CacheObjectUtils.java:177)
    at org.apache.ignite.internal.processors.cache.CacheObjectUtils.unwrapBinaryIfNeeded(CacheObjectUtils.java:39)
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$ScanQueryIterator.advance(GridCacheQueryManager.java:3063)
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager$ScanQueryIterator.onHasNext(GridCacheQueryManager.java:2965)
    at org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
    at org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
    at org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.computeCount(ComputeUtils.java:313)
    at org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.computeCount(ComputeUtils.java:300)
    at org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.lambda$initContext$9b68d858$1(ComputeUtils.java:222)
    at org.apache.ignite.ml.dataset.impl.cache.util.ComputeUtils.lambda$affinityCallWithRetries$b46c4136$1(ComputeUtils.java:90)
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1855)
    ... 8 common frames omitted
Caused by: java.lang.ClassNotFoundException: testType
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:8771)
    at org.apache.ignite.internal.MarshallerContextImpl.getClass(MarshallerContextImpl.java:349)
    at org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:698)
    ... 23 common frames omitted

While debugging, I found the source of the error. At some point the code takes
only the name of the upstreamCache (where the data resides) and uses that name
to create a new IgniteCache object before copying the data to a dataset cache.
That new object does not carry over the keepBinary property of the original
cache. I hardcoded "withKeepBinary()" into the following lines:
https://github.com/apache/ignite/blob/2.7.0/modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/util/ComputeUtils.java#L162
https://github.com/apache/ignite/blob/2.7.0/modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/util/ComputeUtils.java#L215
https://github.com/apache/ignite/blob/2.7.0/modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/CacheBasedDatasetBuilder.java#L99

That change made it work. I tried to retrieve the keepBinary property from the
upstreamCache, but I could not find a method that exposes it. (I saw that the
property is stored in the operation context field (opCtx), but it is private
and cannot be accessed from the lines I modified.)
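
The failure mode described above can be illustrated with a small,
self-contained sketch. It is a toy model in plain Java and deliberately does
not use the Ignite API: BinaryRow, CacheView and readValue are hypothetical
names invented for this illustration. It assumes only what the stack trace
shows, namely that reading through a view without keepBinary tries to load the
class named by the type ("testType"), which does not exist on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeepBinarySketch {
    /** Toy stand-in for a binary object: a type name plus named fields. */
    static final class BinaryRow {
        final String typeName;
        final Map<String, Object> fields = new LinkedHashMap<>();

        BinaryRow(String typeName) {
            this.typeName = typeName;
        }

        /** Building a typed object requires the class to be on the classpath. */
        Object deserialize() {
            try {
                return Class.forName(typeName).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot deserialize " + typeName, e);
            }
        }
    }

    /** Toy cache view: keepBinary belongs to the view, not to the stored data. */
    static final class CacheView {
        private final Map<Integer, BinaryRow> store;
        private final boolean keepBinary;

        CacheView(Map<Integer, BinaryRow> store, boolean keepBinary) {
            this.store = store;
            this.keepBinary = keepBinary;
        }

        CacheView withKeepBinary() {
            return new CacheView(store, true);
        }

        Object get(int key) {
            BinaryRow row = store.get(key);
            return keepBinary ? row : row.deserialize();
        }
    }

    /** Reads the "value" field straight from the binary form. */
    static Double readValue(CacheView view, int key) {
        BinaryRow row = (BinaryRow) view.get(key);
        return (Double) row.fields.get("value");
    }

    public static void main(String[] args) {
        Map<Integer, BinaryRow> store = new LinkedHashMap<>();
        BinaryRow row = new BinaryRow("testType");
        row.fields.put("value", 1.5);
        store.put(1, row);

        // Re-opening the data by name gives a plain view: the flag is lost.
        CacheView reopened = new CacheView(store, false);
        try {
            reopened.get(1);
        } catch (IllegalStateException e) {
            // prints "plain view failed: ClassNotFoundException"
            System.out.println("plain view failed: " + e.getCause().getClass().getSimpleName());
        }

        // Re-applying the flag lets the same data be read field by field.
        // prints "keepBinary view value: 1.5"
        System.out.println("keepBinary view value: " + readValue(reopened.withKeepBinary(), 1));
    }
}
```

In this model the hardcoded workaround corresponds to calling withKeepBinary()
on the re-opened view; a clean fix would pass the flag along with the cache
name instead of always re-applying it.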

My example code is available at: 
https://gist.github.com/otorreno/ca6c5347c1bbde2d4fedd02b51d02cbb

Are there any plans to make the machine learning algorithms work with caches
that have keepBinary set to true?



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: Ignite ML withKeepBinary cache

Posted by Yuriy Babak <y....@gmail.com>.
Hi all,

Ticket IGNITE-10700 [1] is resolved. This ticket added support for training
models over a cache with binary objects (a cache with the keepBinary flag
enabled). For more details, please take a look at the mentioned ticket or the
added example [2].

[1] - https://issues.apache.org/jira/browse/IGNITE-10700
[2] - org.apache.ignite.examples.ml.TrainingWithBinaryObjectExample

Sincerely,
Best regards,
Yuriy Babak


On Thu, Jan 10, 2019 at 14:07, Alexey Zinoviev <za...@gmail.com> wrote:

> Thanks a lot for the example. Will write later about keepBinary support in
> this thread.
>
> On Thu, Jan 10, 2019 at 13:28, otorreno <os...@shapelets.io> wrote:
>
> > Alexey, thanks for your support.
> >
> > Answer to your questions:
> > 1) At the moment the types are: String, Long and Double. But this could
> > actually change in the future to any other user-defined types/classes (We
> > know we would need to provide data encoders for such types)
> > 2) Yes, all data series have the same schema (same number of columns and
> > same types)
> > 3)  all_sites.csv
> > <http://apache-ignite-developers.2346864.n4.nabble.com/file/t659/all_sites.csv>
> > contains an example of what data we are trying to work with. Each of the
> > rows of such file contains the metadata of a given data series, as I
> > described in my first post of this thread.
> >
> > Remember that we have such table stored in a cache which uses the
> > withKeepBinary method. And the problem I faced was not being able to use
> > such cache as input to the ML algos (a copy of such cache to a cache without
> > the keepBinary property would work, but that is not the solution we want to
> > apply). What I would like to do is add support to caches with keepBinary to
> > Ignite ML.
> >
> > Best,
> > Oscar
>

Re: Ignite ML withKeepBinary cache

Posted by Alexey Zinoviev <za...@gmail.com>.
Thanks a lot for the example. Will write later about keepBinary support in
this thread.

On Thu, Jan 10, 2019 at 13:28, otorreno <os...@shapelets.io> wrote:

> Alexey, thanks for your support.
>
> Answer to your questions:
> 1) At the moment the types are: String, Long and Double. But this could
> actually change in the future to any other user-defined types/classes (We
> know we would need to provide data encoders for such types)
> 2) Yes, all data series have the same schema (same number of columns and
> same types)
> 3)  all_sites.csv
> <http://apache-ignite-developers.2346864.n4.nabble.com/file/t659/all_sites.csv>
> contains an example of what data we are trying to work with. Each of the
> rows of such file contains the metadata of a given data series, as I
> described in my first post of this thread.
>
> Remember that we have such table stored in a cache which uses the
> withKeepBinary method. And the problem I faced was not being able to use
> such cache as input to the ML algos (a copy of such cache to a cache without
> the keepBinary property would work, but that is not the solution we want to
> apply). What I would like to do is add support to caches with keepBinary to
> Ignite ML.
>
> Best,
> Oscar

Re: Ignite ML withKeepBinary cache

Posted by otorreno <os...@shapelets.io>.
Alexey, thanks for your support.

Answers to your questions:
1) At the moment the types are: String, Long and Double. But this could
actually change in the future to any other user-defined types/classes (We
know we would need to provide data encoders for such types)
2) Yes, all data series have the same schema (same number of columns and
same types)
3)  all_sites.csv
<http://apache-ignite-developers.2346864.n4.nabble.com/file/t659/all_sites.csv>  
contains an example of what data we are trying to work with. Each of the
rows of such file contains the metadata of a given data series, as I
described in my first post of this thread.

Remember that we have this table stored in a cache that uses the
withKeepBinary method. The problem I faced was that I could not use such a
cache as input to the ML algos (copying it to a cache without the keepBinary
property would work, but that is not the solution we want to apply). What I
would like to do is add support for caches with keepBinary to Ignite ML.
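
As an illustration of the "data encoders" mentioned in answer 1, here is a
minimal sketch of turning one row of String, Long and Double properties into a
numeric feature vector. It is plain Java and not part of Ignite ML:
RowEncoderSketch and encode are hypothetical names, the sample values are made
up, and mapping each string to a per-column integer index is only one possible
assumption about how such properties could be encoded.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class RowEncoderSketch {
    /** Per-column dictionaries: each distinct string gets the next free index. */
    private final Map<Integer, Map<String, Integer>> dictionaries = new HashMap<>();

    /** Encodes one row of String/Long/Double properties as doubles. */
    public double[] encode(Object[] row) {
        double[] features = new double[row.length];
        for (int i = 0; i < row.length; i++) {
            Object value = row[i];
            if (value instanceof Number) {
                // Long and Double are used as-is.
                features[i] = ((Number) value).doubleValue();
            } else if (value instanceof String) {
                Map<String, Integer> dict =
                    dictionaries.computeIfAbsent(i, k -> new LinkedHashMap<>());
                Integer idx = dict.get(value);
                if (idx == null) {
                    idx = dict.size();
                    dict.put((String) value, idx);
                }
                features[i] = idx;
            } else {
                // User-defined types would need their own encoder.
                throw new IllegalArgumentException("No encoder for column " + i);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        RowEncoderSketch encoder = new RowEncoderSketch();
        // prints [0.0, 42.0, 3.5]
        System.out.println(Arrays.toString(encoder.encode(new Object[] {"siteA", 42L, 3.5})));
        // prints [1.0, 7.0, 1.25]
        System.out.println(Arrays.toString(encoder.encode(new Object[] {"siteB", 7L, 1.25})));
    }
}
```

Integer indices impose an artificial order on string values, so for
distance-based algorithms such as k-means a one-hot encoding may be a better
choice.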

Best,
Oscar




Re: Ignite ML withKeepBinary cache

Posted by Alexey Zinoviev <za...@gmail.com>.
Oscar, great use case. The ML-related developers (me included) will be happy
to help you with it.

Could you please answer three questions?

1) What kinds of types (Java types or SQL types) could the column properties
have?
2) Do all data series (observations) have the same schema, with an equal number
of columns and equal types in them?
3) Could you provide an obfuscated data example in CSV or another easily
readable format to make our experiments more efficient?

Also, if you have any issues related to algorithm usage (what to use, how
to calibrate features, etc.), write to the user list with me in copy.

Alex

On Thu, Jan 3, 2019 at 22:59, otorreno <os...@shapelets.io> wrote:

> Denis,
>
> That's great news! I will wait till your ML expert is back from holidays to
> work with him in a clean solution.
>
> Regarding the blog post, sure, it could be interesting and useful writing
> about how to use Ignite ML with BinaryObject caches IMHO.
>
> Thanks,
> Oscar

Re: Ignite ML withKeepBinary cache

Posted by otorreno <os...@shapelets.io>.
Denis,

That's great news! I will wait until your ML expert is back from the holidays
to work with him on a clean solution.

Regarding the blog post: sure, I think writing about how to use Ignite ML with
BinaryObject caches could be interesting and useful.

Thanks,
Oscar




Re: Ignite ML withKeepBinary cache

Posted by Denis Magda <dm...@apache.org>.
Oscar,

Sounds like Ignite ML is a perfect fit for your task. Our ML expert will
help you come up with a clean solution once the holiday season is over.

In general, will you be able to write a blog post on how Ignite ML is used
for your task once the issues are addressed?

--
Denis

On Wed, Jan 2, 2019 at 11:25 PM otorreno <os...@shapelets.io> wrote:

> Denis,
>
> We have some metadata stored in an Ignite Cache where each row describes a
> certain data series, and each column is a property (could be actually of any
> type: strings, doubles, etc.). You can think about it as a table describing
> our data series. This table might be potentially quite big, given a high
> number of series and properties.
>
> Based on this table we would like to clusterize our data using different
> algorithms (e.g. k-means, decision tree).
>
> I started looking at it and I liked pretty much the way you have done the
> pre-processing pipeline for feature selection, transformation, normalization
> and scaling. The only stone I found on my way was the BinaryObject problem I
> mentioned.
>
> In fact I made it work as I described in my first post, but with a dirty
> solution as I didn't find the way to access the keepBinary property of the
> cache used as input. In any case, I will be glad to help in finding a clean
> solution to the problem if needed.
>
> Best,
> Oscar

Re: Ignite ML withKeepBinary cache

Posted by otorreno <os...@shapelets.io>.
Denis,

We have some metadata stored in an Ignite cache where each row describes a
certain data series and each column is a property (which could be of any
type: strings, doubles, etc.). You can think of it as a table describing
our data series. This table could be quite big, given a high number of series
and properties.

Based on this table we would like to cluster our data using different
algorithms (e.g. k-means, decision tree).

I started looking at it and I quite liked the way you have done the
pre-processing pipeline for feature selection, transformation, normalization
and scaling. The only obstacle I ran into was the BinaryObject problem I
mentioned.

In fact, I made it work as I described in my first post, but with a dirty
solution, as I could not find a way to access the keepBinary property of the
cache used as input. In any case, I will be glad to help find a clean
solution to the problem if needed.

Best,
Oscar




Re: Ignite ML withKeepBinary cache

Posted by Denis Magda <dm...@apache.org>.
Anton,

It shouldn't be hard to support binary objects, right? Are you guys
committed to releasing it with the next Ignite version?

Oscar, could you please share your Ignite ML use case with us?

--
Denis

On Wed, Jan 2, 2019 at 7:34 AM dmitrievanthony <dm...@gmail.com>
wrote:

> Hi, I guess we have plans to support caches with binary objects in ML. Please
> have a look the following JIRA for details:
> https://issues.apache.org/jira/browse/IGNITE-10700.
>
> Best regards,
> Anton Dmitriev.

Re: Ignite ML withKeepBinary cache

Posted by dmitrievanthony <dm...@gmail.com>.
Hi, I guess we have plans to support caches with binary objects in ML. Please
have a look at the following JIRA issue for details:
https://issues.apache.org/jira/browse/IGNITE-10700.

Best regards,
Anton Dmitriev.


