Posted to user@spark.apache.org by Bharath Ravi Kumar <re...@gmail.com> on 2014/12/02 03:40:51 UTC

Re: ALS failure with size > Integer.MAX_VALUE

Yes, the issue appears to be due to the 2GB block size limitation. I am
hence looking for (user, product) block sizing suggestions to work around
the block size limitation.
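
For context, the 2GB cap comes from the memory-mapping call visible in the stack
trace quoted below (DiskStore.getBytes calls FileChannelImpl.map): FileChannel.map
refuses any mapped region larger than Integer.MAX_VALUE bytes. A self-contained
Scala REPL sketch of that behaviour (the temp file is only a stand-in for a block
file):

import java.io.{File, RandomAccessFile}
import java.nio.channels.FileChannel

// Any request to map more than Int.MaxValue bytes fails with the exact
// exception from the stack trace, regardless of how much data is on disk.
val tmp = File.createTempFile("block-demo", ".bin")
tmp.deleteOnExit()
val channel = new RandomAccessFile(tmp, "rw").getChannel
try {
  channel.map(FileChannel.MapMode.READ_ONLY, 0, Int.MaxValue.toLong + 1)
} catch {
  case e: IllegalArgumentException =>
    println(e) // java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
}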

On Sun, Nov 30, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:

> (It won't be that, since you see that the error occurs when reading a
> block from disk. I think this is an instance of the 2GB block size
> limitation.)
>
> On Sun, Nov 30, 2014 at 4:36 AM, Ganelin, Ilya
> <Il...@capitalone.com> wrote:
> > Hi Bharath – I’m unsure if this is your problem, but the
> > MatrixFactorizationModel in MLLIB, which is the underlying component for
> > ALS, expects your User/Product fields to be integers. Specifically, the
> > input to ALS is an RDD[Rating], and Rating is an (Int, Int, Double). I am
> > wondering if perhaps one of your identifiers exceeds MAX_INT; could you
> > write a quick check for that?
> >
> > I have been running a very similar use case to yours (with more
> > constrained hardware resources) and I haven’t seen this exact problem,
> > but I’m sure we’ve seen similar issues. Please let me know if you have
> > other questions.
> >
> > From: Bharath Ravi Kumar <re...@gmail.com>
> > Date: Thursday, November 27, 2014 at 1:30 PM
> > To: "user@spark.apache.org" <us...@spark.apache.org>
> > Subject: ALS failure with size > Integer.MAX_VALUE
> >
> > We're training a recommender with ALS in mllib 1.1 against a dataset of
> > 150M users and 4.5K items, with the total number of training records
> > being 1.2 billion (~30GB of data). The input data is spread across 1200
> > partitions on HDFS. For the training, rank=10, and we've configured
> > {number of user data blocks = number of item data blocks}. The number of
> > user/item blocks was varied between 50 and 1200. Irrespective of the
> > number of blocks (e.g. at 1200 blocks each), there are at least a couple
> > of tasks that end up shuffle reading > 9.7G each in the aggregate stage
> > (ALS.scala:337) and failing with the following exception:
> >
> > java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
> >         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
> >         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:108)
> >         at org.apache.spark.storage.DiskStore.getValues(DiskStore.scala:124)
> >         at org.apache.spark.storage.BlockManager.getLocalFromDisk(BlockManager.scala:332)
> >         at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:204)
> >
>
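
As a footnote to Ilya's suggestion above, the MAX_INT check could look roughly
like the sketch below, assuming the raw ratings are first parsed with Long IDs
before being narrowed to Int for Rating; the names are illustrative, not taken
from the original job.

import org.apache.spark.rdd.RDD

// rawRatings: (userId, itemId, rating) parsed with Long IDs, before they are
// narrowed to Int for mllib's Rating(user: Int, product: Int, rating: Double).
def checkIdRange(rawRatings: RDD[(Long, Long, Double)]): Unit = {
  val maxUser = rawRatings.map(_._1).max()
  val maxItem = rawRatings.map(_._2).max()
  // IDs above Int.MaxValue (2147483647) cannot be used directly as ALS user/product fields.
  println(s"max user id = $maxUser, max item id = $maxItem, Int.MaxValue = ${Int.MaxValue}")
}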

Re: ALS failure with size > Integer.MAX_VALUE

Posted by Bharath Ravi Kumar <re...@gmail.com>.
Ok. We'll try using it in a test cluster running 1.2.
On 16-Dec-2014 1:36 am, "Xiangrui Meng" <me...@gmail.com> wrote:

Unfortunately, it will depend on the Sorter API in 1.2. -Xiangrui


Re: ALS failure with size > Integer.MAX_VALUE

Posted by Xiangrui Meng <me...@gmail.com>.
Unfortunately, it will depend on the Sorter API in 1.2. -Xiangrui

On Mon, Dec 15, 2014 at 11:48 AM, Bharath Ravi Kumar
<re...@gmail.com> wrote:
> Hi Xiangrui,
>
> The block size limit was encountered even with a reduced number of item
> blocks, as you had expected. I'm wondering if I could try the new
> implementation as a standalone library against a 1.1 deployment. Does it
> have dependencies on any core APIs in the current master?
>
> Thanks,
> Bharath

Re: ALS failure with size > Integer.MAX_VALUE

Posted by Bharath Ravi Kumar <re...@gmail.com>.
Hi Xiangrui,

The block size limit was encountered even with a reduced number of item
blocks, as you had expected. I'm wondering if I could try the new
implementation as a standalone library against a 1.1 deployment. Does it
have dependencies on any core APIs in the current master?

Thanks,
Bharath

On Wed, Dec 3, 2014 at 10:10 PM, Bharath Ravi Kumar <re...@gmail.com>
wrote:
>
> Thanks Xiangrui. I'll try out setting a smaller number of item blocks. And
> yes, I've been following the JIRA for the new ALS implementation. I'll try
> it out when it's ready for testing.

Re: ALS failure with size > Integer.MAX_VALUE

Posted by Bharath Ravi Kumar <re...@gmail.com>.
Thanks Xiangrui. I'll try out setting a smaller number of item blocks. And
yes, I've been following the JIRA for the new ALS implementation. I'll try
it out when it's ready for testing.

On Wed, Dec 3, 2014 at 4:24 AM, Xiangrui Meng <me...@gmail.com> wrote:

> Hi Bharath,
>
> You can try setting a small number of item blocks in this case. 1200 is
> definitely too large for ALS. Please try 30 or even smaller. I'm not
> sure whether this could solve the problem because you have 100 items
> connected with 10^8 users. There is a JIRA for this issue:
>
> https://issues.apache.org/jira/browse/SPARK-3735
>
> which I will try to implement in 1.3. I'll ping you when it is ready.
>
> Best,
> Xiangrui
>

Re: ALS failure with size > Integer.MAX_VALUE

Posted by Xiangrui Meng <me...@gmail.com>.
Hi Bharath,

You can try setting a small number of item blocks in this case. 1200 is
definitely too large for ALS. Please try 30 or even smaller. I'm not
sure whether this could solve the problem because you have 100 items
connected with 10^8 users. There is a JIRA for this issue:

https://issues.apache.org/jira/browse/SPARK-3735

which I will try to implement in 1.3. I'll ping you when it is ready.

Best,
Xiangrui
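
A minimal sketch of what the smaller-block setting looks like with the mllib 1.1
builder API follows; setBlocks configures both user and product block counts, and
the iteration/lambda values are placeholders rather than settings reported in this
thread.

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

// ratings: RDD[Rating] exactly as in the original post.
def trainWithFewerBlocks(ratings: RDD[Rating]) =
  new ALS()
    .setRank(10)        // rank=10, as in the original post
    .setIterations(10)  // placeholder value
    .setLambda(0.01)    // placeholder value
    .setBlocks(30)      // fewer, larger blocks, as suggested above
    .run(ratings)       // returns a MatrixFactorizationModel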

On Tue, Dec 2, 2014 at 10:40 AM, Bharath Ravi Kumar <re...@gmail.com> wrote:
> Yes, the issue appears to be due to the 2GB block size limitation. I am
> hence looking for (user, product) block sizing suggestions to work around
> the block size limitation.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org