Posted to user@spark.apache.org by sudhir vaidya <v....@gmail.com> on 2013/11/21 01:41:05 UTC

Trouble with MLbase exercise Documentation

I am a beginner and have started working through the MLbase exercises.

But I get an IndexOutOfBoundsException when I run the first command
of step 2.1 here:

http://ampcamp.berkeley.edu/3/exercises/mli-document-categorization.html

All I am doing is copying the command and pasting it into the Spark shell.

I tried splitting the command into two steps, loading the data set first
and filtering afterwards, but that didn't work.

I also tried changing "r(0)" to "r(1)" in that step, but I still get the
same error.

Any help is much appreciated.

-Sudhir

Re: Trouble with MLbase exercise Documentation

Posted by sudhir vaidya <v....@gmail.com>.

I don't quite know what you mean by "Docker container", but I followed
the instructions for installing Spark 0.8.0 here:

http://spark.incubator.apache.org/screencasts/1-first-steps-with-spark.html

I also installed Scala 2.9.3 beforehand.

Did I miss something?

In reply to Evan R. Sparks <ev...@gmail.com> (Thu, Nov 21, 2013 at 1:52 PM)

Re: Trouble with MLbase exercise Documentation

Posted by "Evan R. Sparks" <ev...@gmail.com>.

Ah, how have you configured your machine for Spark? Inside a Docker
container?

The numRows call actually has to scan the entire file (it calls rdd.count()
under the hood), so 10 minutes sounds a little long but not unreasonable on
a single machine.
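To make the take(5) versus numRows difference concrete, here is a small Spark-free analogy in plain Scala. It is not MLI or Spark code: CountingIterator, fakeFile, takeK and countAll are hypothetical names made up for this sketch, and an in-memory range of strings stands in for the 40 GB input. take(5) stops after five records, while a full count has to read every record, which is roughly why x.take(5) returns quickly and x.numRows does not. Real Spark splits the work into partitions and may process them in parallel, so this only illustrates how much data each call touches, not how long it takes.

```scala
// Spark-free analogy (assumption): an iterator that counts how many records
// were actually read, standing in for a large input file.
object TakeVsCount {
  final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    var consumed: Long = 0L
    def hasNext: Boolean = underlying.hasNext
    def next(): A = { consumed += 1; underlying.next() }
  }

  // Hypothetical stand-in for the lines of a big text file.
  def fakeFile(n: Int): CountingIterator[String] =
    new CountingIterator(Iterator.range(0, n).map(i => s"line $i"))

  // Like x.take(k): returns the first k records and how many records were read.
  def takeK(n: Int, k: Int): (List[String], Long) = {
    val it = fakeFile(n)
    val firstK = it.take(k).toList
    (firstK, it.consumed)
  }

  // Like x.numRows / rdd.count(): returns the row count and how many records were read.
  def countAll(n: Int): (Int, Long) = {
    val it = fakeFile(n)
    val total = it.size
    (total, it.consumed)
  }

  def main(args: Array[String]): Unit = {
    val (firstFive, readByTake) = takeK(1000000, 5)
    println(s"take(5) returned ${firstFive.size} rows after reading $readByTake records")
    val (total, readByCount) = countAll(1000000)
    println(s"count returned $total after reading $readByCount records")
  }
}
```

Running main shows take(5) reading only 5 records, while the full count reads all 1,000,000.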

In reply to sudhir vaidya <v....@gmail.com> (Thu, Nov 21, 2013 at 9:24 AM)

Re: Trouble with MLbase exercise Documentation

Posted by sudhir vaidya <v....@gmail.com>.

Hey Evan,

I do get output when I load the file, and I also see output from the
x.take(5) command.

But x.numRows takes a long time to execute. I waited about 10 minutes and
had to press Ctrl+C. My guess is that because the file is around 40 GB and
I am running on a single quad-core machine (not a very high-end one, and
not a cluster), it just takes a lot longer. I am not sure, though.

Regards,
Sudhir

In reply to Evan R. Sparks <ev...@gmail.com> (Thu, Nov 21, 2013 at 11:18 AM)

Re: Trouble with MLbase exercise Documentation

Posted by "Evan R. Sparks" <ev...@gmail.com>.

What happens when you run:

    val x = mc.loadFile("/enwiki_txt")

and then

    x.numRows

or

    x.take(5)

Do you see any output there?

In reply to sudhir vaidya <v....@gmail.com> (Wed, Nov 20, 2013 at 4:41 PM)