Posted to user@spark.apache.org by tgodden <tg...@vub.ac.be> on 2016/05/13 07:40:53 UTC

Java: Return type of RDDFunctions.sliding(int, int)

Hello,

We're trying to use PrefixSpan on sequential data by passing a sliding
window over it. Spark Streaming is not an option.
RDDFunctions.sliding() returns an item of class RDD<java.lang.Object>,
regardless of the original element type of the RDD. Because of this, the
returned RDD seems to be pretty much unusable.
Is this a bug, or simply not yet implemented? Is there a way to work around it?

Official docs:
https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/mllib/rdd/RDDFunctions.html

Thanks




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Java-Return-type-of-RDDFunctions-sliding-int-int-tp26948.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Java: Return type of RDDFunctions.sliding(int, int)

Posted by Tom Godden <tg...@vub.ac.be>.
I corrected the type to RDD<ArrayList<Integer>[]>, but it's still giving
me the error.
I believe I have found the reason, though. The vals variable is created
by a map() call on another RDD. Although it is declared as a
JavaRDD<ArrayList<Integer>>, the ClassTag it returns is for Object. I
think that because of this, the RDD returned from sliding() is only
typed as Object.
I have no idea how to fix this, though.
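[Editor's note: one possible way forward, sketched here rather than taken from the thread, is to build the ClassTag for the element type explicitly instead of relying on vals.classTag(), and then apply the unchecked cast Sean describes. This is a sketch assuming Spark 1.6 MLlib on the classpath; the class and method names (SlidingSketch, slidingWindows) are made up for illustration.]

```java
import java.util.ArrayList;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.rdd.RDDFunctions;
import org.apache.spark.rdd.RDD;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

public class SlidingSketch {
    // vals is assumed to be the JavaRDD<ArrayList<Integer>> from the thread.
    @SuppressWarnings("unchecked")
    static RDD<ArrayList<Integer>[]> slidingWindows(JavaRDD<ArrayList<Integer>> vals) {
        // Build a ClassTag for the element type explicitly, rather than using
        // vals.classTag(), which may only carry Object after a map().
        ClassTag<ArrayList<Integer>> tag =
            (ClassTag<ArrayList<Integer>>) (ClassTag<?>) ClassTag$.MODULE$.apply(ArrayList.class);

        // From Java, sliding() is seen as returning RDD<Object>; the elements
        // really are arrays, so an unchecked double cast recovers the type.
        RDD<Object> raw = RDDFunctions.fromRDD(vals.rdd(), tag).sliding(20, 1);
        return (RDD<ArrayList<Integer>[]>) (RDD<?>) raw;
    }
}
```

The cast goes through the wildcard RDD<?> because Java forbids a direct cast between two unrelated parameterizations; nothing is converted at runtime, since the type parameter is erased anyway.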

On 13-05-16 13:12, Sean Owen wrote:
> The Java docs won't help since they only show "Object", yes. Have a
> look at the Scala docs:
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.rdd.RDDFunctions
>
> An RDD of T produces an RDD of T[].


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Java: Return type of RDDFunctions.sliding(int, int)

Posted by Sean Owen <so...@cloudera.com>.
The Java docs won't help since they only show "Object", yes. Have a
look at the Scala docs:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.rdd.RDDFunctions

An RDD of T produces an RDD of T[].

On Fri, May 13, 2016 at 12:10 PM, Tom Godden <tg...@vub.ac.be> wrote:
> I assumed the "fixed size blocks" mentioned in the documentation
> (https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/mllib/rdd/RDDFunctions.html#sliding%28int,%20int%29)
> were RDDs, but I guess they're arrays? Even when I change the RDD to
> arrays (so it looks like RDD<ArrayList<Integer>[]>), it doesn't work.
> I'm passing an RDD of ArrayLists of Integers to the sliding function,
> so that's where the ArrayList comes from.
> I can't seem to find up-to-date example code; could you maybe give an
> example?



Re: Java: Return type of RDDFunctions.sliding(int, int)

Posted by Tom Godden <tg...@vub.ac.be>.
I assumed the "fixed size blocks" mentioned in the documentation
(https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/mllib/rdd/RDDFunctions.html#sliding%28int,%20int%29)
were RDDs, but I guess they're arrays? Even when I change the RDD to
arrays (so it looks like RDD<ArrayList<Integer>[]>), it doesn't work.
I'm passing an RDD of ArrayLists of Integers to the sliding function,
so that's where the ArrayList comes from.
I can't seem to find up-to-date example code; could you maybe give an
example?

On 13-05-16 12:53, Sean Owen wrote:
> I'm not sure what you're trying there. The return type is an RDD of
> arrays, not of RDDs or of ArrayLists. There may be another catch but
> that is not it.




Re: Java: Return type of RDDFunctions.sliding(int, int)

Posted by Sean Owen <so...@cloudera.com>.
I'm not sure what you're trying there. The return type is an RDD of
arrays, not of RDDs or of ArrayLists. There may be another catch but
that is not it.

On Fri, May 13, 2016 at 11:50 AM, Tom Godden <tg...@vub.ac.be> wrote:
> I believe it's an illegal cast. This is the line of code:
>> RDD<RDD<ArrayList<Integer>>> windowed =
>> RDDFunctions.fromRDD(vals.rdd(), vals.classTag()).sliding(20, 1);
> with vals being a JavaRDD<ArrayList<Integer>>.  Explicitly casting
> doesn't work either:
>> RDD<RDD<ArrayList<Integer>>> windowed = (RDD<RDD<ArrayList<Integer>>>)
>> RDDFunctions.fromRDD(vals.rdd(), vals.classTag()).sliding(20, 1);
> Did I miss something?



Re: Java: Return type of RDDFunctions.sliding(int, int)

Posted by Tom Godden <tg...@vub.ac.be>.
I believe it's an illegal cast. This is the line of code:
> RDD<RDD<ArrayList<Integer>>> windowed =
> RDDFunctions.fromRDD(vals.rdd(), vals.classTag()).sliding(20, 1);
with vals being a JavaRDD<ArrayList<Integer>>.  Explicitly casting
doesn't work either:
> RDD<RDD<ArrayList<Integer>>> windowed = (RDD<RDD<ArrayList<Integer>>>)
> RDDFunctions.fromRDD(vals.rdd(), vals.classTag()).sliding(20, 1);
Did I miss something?

On 13-05-16 09:44, Sean Owen wrote:
> The problem is there's no Java-friendly version of this, and the Scala
> API return type actually has no analog in Java (an array of any type,
> not just of objects) so it becomes Object. You can just cast it to the
> type you know it will be -- RDD<String[]> or RDD<long[]> or whatever.




Re: Java: Return type of RDDFunctions.sliding(int, int)

Posted by Sean Owen <so...@cloudera.com>.
The problem is there's no Java-friendly version of this, and the Scala
API return type actually has no analog in Java (an array of any type,
not just of objects) so it becomes Object. You can just cast it to the
type you know it will be -- RDD<String[]> or RDD<long[]> or whatever.
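[Editor's note: the cast pattern Sean suggests can be illustrated without Spark. In this minimal pure-Java sketch, List<Object> stands in for the erased RDD<Object> that Java sees; the names CastDemo and slidingErased are made up for illustration.]

```java
import java.util.ArrayList;
import java.util.List;

public class CastDemo {
    // Stand-in for the Java view of sliding(): the declared element type has
    // been erased to Object, but at runtime each element is really an array.
    static List<Object> slidingErased() {
        List<Object> out = new ArrayList<>();
        out.add(new Integer[] {1, 2});
        out.add(new Integer[] {2, 3});
        return out;
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        // The suggested cast: go through a wildcard to reinterpret the erased
        // element type as the array type it actually is. The compiler emits an
        // unchecked warning, but nothing is converted at runtime.
        List<Integer[]> windows = (List<Integer[]>) (List<?>) slidingErased();
        System.out.println(windows.get(1)[0]);  // prints 2
    }
}
```

The same double cast works for RDD: for example, (RDD<String[]>) (RDD<?>) on the result of sliding(), matching the types Sean names above.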

On Fri, May 13, 2016 at 8:40 AM, tgodden <tg...@vub.ac.be> wrote:
> Hello,
>
> We're trying to use PrefixSpan on sequential data, by passing a sliding
> window over it. Spark Streaming is not an option.
> RDDFunctions.sliding() returns an item of class RDD<java.lang.Object>,
> regardless of the original element type of the RDD. Because of this, the
> returned RDD seems to be pretty much unusable.
> Is this a bug, or simply not yet implemented? Is there a way to work around it?
>
> Official docs:
> https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/mllib/rdd/RDDFunctions.html
>
> Thanks
