Posted to dev@spark.apache.org by Piotr Kołaczkowski <pk...@datastax.com> on 2014/04/24 12:14:04 UTC

Problem creating objects through reflection

Hi,

I'm working on Cassandra-Spark integration and I've hit a pretty severe
problem. One of the provided features is mapping Cassandra rows into
objects of user-defined classes, e.g. like this:

class MyRow(val key: String, val data: Int)
sc.cassandraTable("keyspace", "table").select("key", "data").as[MyRow]  // returns CassandraRDD[MyRow]

In this example, CassandraRDD creates MyRow instances by reflection, i.e.
it matches the selected fields of the Cassandra table and passes them to
the constructor.
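
For context, the construction works roughly like this (a simplified
sketch, not the actual connector code - in particular, matching column
names to constructor parameters is more involved in practice):

import java.lang.reflect.Constructor

// Simplified sketch: instantiate the target class reflectively from
// column values, assuming they already arrive in constructor order.
def instantiate[T](cls: Class[T], columnValues: Seq[AnyRef]): T = {
  val ctor = cls.getConstructors()(0).asInstanceOf[Constructor[T]]
  ctor.newInstance(columnValues: _*)
}

// e.g. instantiate(classOf[MyRow], Seq("some-key", Int.box(42)))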

Unfortunately this does not work in the Spark REPL.
It turns out that any class declared in the REPL is an inner class, and to
be successfully instantiated, it needs a reference to the outer object,
even though it doesn't really use anything from the outer context.

scala> class SomeClass
defined class SomeClass

scala> classOf[SomeClass].getConstructors()(0)
res11: java.lang.reflect.Constructor[_] = public
$iwC$$iwC$SomeClass($iwC$$iwC)

I tried passing null as a temporary workaround, and that doesn't work
either - I get an NPE.
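Roughly, the failing attempt looks like this:

scala> classOf[SomeClass].getConstructors()(0).newInstance(null)
java.lang.NullPointerException
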
How can I get a reference to the current outer object representing the
context of the current line?

Also, the plain (non-Spark) Scala REPL doesn't exhibit this behaviour -
classes declared in the REPL are proper top-level classes, not inner ones.
Why?

Thanks,
Piotr

-- 
Piotr Kolaczkowski, Lead Software Engineer
pkolaczk@datastax.com

777 Mariners Island Blvd., Suite 510
San Mateo, CA 94404

Re: Problem creating objects through reflection

Posted by Rohit Rai <ro...@tuplejump.com>.
Hi Piotr,

The easiest solution to this for now is to write all your code (including
the case class) inside an object, with the execution part in a method of
that object. Then you can call the method from the Spark shell to execute
your code.
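
Something along these lines (an untested sketch - I'm assuming whatever
import brings the cassandraTable implicit into scope in your integration):

import org.apache.spark.SparkContext
// plus the import that adds the cassandraTable method to SparkContext

object Analysis {
  // Defined inside a top-level object, MyRow should not need a hidden
  // outer-instance constructor parameter.
  class MyRow(val key: String, val data: Int)

  // The execution part lives in a method; call Analysis.run(sc) from the shell.
  def run(sc: SparkContext) =
    sc.cassandraTable("keyspace", "table").select("key", "data").as[MyRow]
}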

Cheers,
Rohit


Founder & CEO, Tuplejump, Inc.
____________________________
www.tuplejump.com
The Data Engineering Platform


Re: Problem creating objects through reflection

Posted by Piotr Kołaczkowski <pk...@datastax.com>.
Yeah, this is related.

From https://groups.google.com/forum/#!msg/spark-users/bwAmbUgxWrA/HwP4Nv4adfEJ :
"This is a limitation that will hopefully go away in Scala 2.10 or 2.10.1,
when we'll use macros to remove the need to do this. (Or more generally if
we get some changes in the Scala interpreter to do something smarter in
this case.)"

We're using Spark 0.9.0 with Scala 2.10.3, and the limitation is still
there. Any idea when it is going to be fixed?

The workaround of embedding everything inside a singleton object does not
work for me, because nested classes defined there are still inner classes
and require an additional constructor argument (when invoked by
reflection).

If only I had some reliable way to obtain a reference to that outer object
by reflection, we could work around it somehow, e.g. by saving it in some
singleton object. However, a proper fix would be to make non-inner classes
properly non-inner.
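
For example, something like this might do it - an untested sketch,
assuming the compiler stores the outer reference in a field named $outer
and that we already have some instance created in the same REPL line
context:

// Untested sketch: recover the enclosing REPL line object from an
// existing instance via the compiler-generated $outer field, then
// reuse it as the hidden first constructor argument.
def outerOf(instance: AnyRef): AnyRef = {
  val field = instance.getClass.getDeclaredField("$outer")
  field.setAccessible(true)
  field.get(instance)
}

// val outer = outerOf(someInstanceFromTheSameLine)
// classOf[SomeClass].getConstructors()(0).newInstance(outer)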

Thanks,
Piotr

-- 
Piotr Kolaczkowski, Lead Software Engineer
pkolaczk@datastax.com

777 Mariners Island Blvd., Suite 510
San Mateo, CA 94404

Re: Problem creating objects through reflection

Posted by Michael Armbrust <mi...@databricks.com>.
The Spark REPL is slightly modified from the normal Scala REPL to prevent
work from being done twice when closures are deserialized on the workers.
I'm not sure exactly why this causes your problem, but it's probably worth
filing a JIRA about it.

Here is another issue with classes defined in the REPL. Not sure if it is
related, but I'd be curious whether the workaround helps you:
https://issues.apache.org/jira/browse/SPARK-1199

Michael

