Posted to dev@spark.apache.org by Tim Harsch <th...@cray.com> on 2014/12/12 01:06:20 UTC

running the Terasort example

Hi all,
I just joined the list, so I don't have a message history that would allow
me to reply to this post:
http://apache-spark-developers-list.1001551.n3.nabble.com/Terasort-example-td9284.html

I am interested in running the terasort example.  I cloned the repo
https://github.com/ehiggs/spark and checked out the terasort branch.
In the post referenced above, Ewan gives this example:

# Generate 1M 100 byte records:
  ./bin/run-example terasort.TeraGen 100M ~/data/terasort_in


I don't see a "run-example" in that repo.  I'm sure I am missing something
basic, or less likely, maybe some changes weren't pushed?

Thanks for any help,
Tim
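
A note on the size argument: the comment and the command agree if 100M is
read as 100,000,000 bytes, since 100,000,000 bytes at 100 bytes per record
gives 1,000,000 records. The sketch below spells out that arithmetic. The
helper name records_for_size and the decimal reading of the suffixes (K, M,
G, T as powers of 1000) are assumptions for illustration; the actual TeraGen
argument parser in the terasort branch may behave differently.

```shell
# Hypothetical helper: convert a size string such as 100M into a count of
# 100-byte records. ASSUMPTION: decimal suffixes (K=10^3, M=10^6, G=10^9,
# T=10^12); the real TeraGen parser may differ.
records_for_size() {
  size=$1
  record_bytes=100
  case $size in
    *[kK]) mult=1000 ;;
    *[mM]) mult=1000000 ;;
    *[gG]) mult=1000000000 ;;
    *[tT]) mult=1000000000000 ;;
    *)     mult=1 ;;
  esac
  # Strip a single trailing suffix letter, if present, to leave the number.
  num=${size%[kKmMgGtT]}
  echo $(( num * mult / record_bytes ))
}

records_for_size 100M   # prints 1000000
```

Under the same assumption, records_for_size 1G would correspond to ten
million records.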


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: running the Terasort example

Posted by Tim Harsch <th...@cray.com>.
On 12/16/14, 11:42 PM, "Ewan Higgs" <ew...@ugent.be> wrote:

>Hi Tim,
>
>> On 16 Dec 2014, at 19:27, Tim Harsch <th...@cray.com> wrote:
>> 
>> Hi Ewan,
>> Thanks. I think I was just a bit confused at the time: I was looking at
>> the spark-perf repo when I hit the problem (uh.. ok)…
>> 
>The PR that I am working on is indeed for spark-perf.
Yes, but the example usage you gave is for the code in ehiggs/spark (which
is where I got myself confused):

$ git remote show origin
* remote origin
  Fetch URL: git@github.com:ehiggs/spark.git
  Push  URL: git@github.com:ehiggs/spark.git
…

$ ll bin/run-example
-rwxr-xr-x  1 tharsch  513   2.1K Dec 11 21:02 bin/run-example


run-example is not in spark-perf. What is the expected usage for the code
that is in spark-perf?  I’m hoping I’ll have time to run it later today,
so hopefully I will figure it out on my own.



> 
>
>> …snip...
>> 
>> 
>> I can get past this by setting hadoop.version to 2.5.0 in the parent pom.
>> 
>I wasn’t sure how to get this working across all the Hadoop versions, so I
>made it work with 2.4.0 and above. If you have advice on backporting
>this, then I’m happy to implement it.

I would like to try; hopefully I can find the time.

>
>NB, TeraValidate may not be functioning appropriately. If you have
>trouble with it, I recommend using the Hadoop version.

Thanks for the warning, I bet I could have banged my head on that for
hours.

>
>Yours,
>Ewan
>


Re: running the Terasort example

Posted by Ewan Higgs <ew...@ugent.be>.
Hi Tim,

> On 16 Dec 2014, at 19:27, Tim Harsch <th...@cray.com> wrote:
> 
> Hi Ewan,
> Thanks. I think I was just a bit confused at the time: I was looking at
> the spark-perf repo when I hit the problem (uh.. ok)…
> 
The PR that I am working on is indeed for spark-perf. 

> …snip...
> 
> 
> I can get past this by setting hadoop.version to 2.5.0 in the parent pom.
> 
I wasn’t sure how to get this working across all the Hadoop versions, so I made it work with 2.4.0 and above. If you have advice on backporting this, then I’m happy to implement it.

NB, TeraValidate may not be functioning appropriately. If you have trouble with it, I recommend using the Hadoop version.

Yours,
Ewan
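
To make the version constraint concrete, here is a minimal sketch of a
pre-build guard. It is based only on the statement in this message that the
code works with Hadoop 2.4.0 and above. The function name
hadoop_version_supported is hypothetical and is not part of Spark or the
terasort branch.

```shell
# Hypothetical guard: succeed when a Hadoop version string is 2.4.0 or newer.
# ASSUMPTION: the version looks like major.minor[.patch...]; only the first
# two fields are compared.
hadoop_version_supported() {
  # Split the version on dots into positional parameters.
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  major=${1:-0}
  minor=${2:-0}
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 4 ]; }
}

for v in 1.0.4 2.3.0 2.4.0 2.5.0; do
  if hadoop_version_supported "$v"; then
    echo "$v: within the range reported as working"
  else
    echo "$v: outside the range reported as working"
  fi
done
```

A check like this could run before the build so that an unsupported
hadoop.version is reported up front rather than as a compile failure.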





Re: running the Terasort example

Posted by Tim Harsch <th...@cray.com>.
Hi Ewan,
Thanks. I think I was just a bit confused at the time: I was looking at
the spark-perf repo when I hit the problem (uh.. ok)…

After pulling again just a few minutes ago, I notice that I still get a
compile problem:
[ERROR] /Users/tharsch/git/ehiggs/spark/examples/src/main/scala/org/apache/spark/examples/terasort/TeraInputFormat.scala:40: object task is not a member of package org.apache.hadoop.mapreduce
[ERROR] import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
[ERROR]                                    ^
[ERROR] /Users/tharsch/git/ehiggs/spark/examples/src/main/scala/org/apache/spark/examples/terasort/TeraInputFormat.scala:132: not found: type TaskAttemptContextImpl
[ERROR]             val context = new TaskAttemptContextImpl(
[ERROR]                               ^
[ERROR] /Users/tharsch/git/ehiggs/spark/examples/src/main/scala/org/apache/spark/examples/terasort/TeraOutputFormat.scala:76: value hsync is not a member of org.apache.hadoop.fs.FSDataOutputStream
[ERROR]         out.hsync();
[ERROR]             ^




I can get past this by setting hadoop.version to 2.5.0 in the parent pom.

Thanks,
Tim
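
As an alternative to editing the parent pom, Maven allows a property to be
overridden with -D on the command line. The following is a minimal sketch,
assuming the build reads the hadoop.version property as described in this
message. The helper name spark_build_cmd is hypothetical, and depending on
the Spark version an additional Hadoop build profile may also be needed.

```shell
# Hypothetical helper: print (not run) a Maven command that overrides
# hadoop.version on the command line instead of editing the parent pom.
# ASSUMPTION: the build honors the hadoop.version property.
spark_build_cmd() {
  hadoop_version=${1:-2.5.0}
  echo "mvn -Dhadoop.version=$hadoop_version -DskipTests clean package"
}

spark_build_cmd 2.5.0
# To build, run the printed command from the top level of the checkout.
```

Printing the command first makes it easy to confirm the version being
requested before starting a long build.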




Re: running the Terasort example

Posted by Ewan Higgs <ew...@ugent.be>.
Hi Tim,
run-example is here:
https://github.com/ehiggs/spark/blob/terasort/bin/run-example

It should be in the repository that you cloned. So if you were at the 
top level of the checkout, run-example would be run as ./bin/run-example.

Yours,
Ewan Higgs
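
The check described in this message can be sketched as a small shell
function. The name has_run_example is hypothetical; it only tests whether an
executable bin/run-example exists under a given directory, which is a quick
way to tell the ehiggs/spark checkout apart from another repository such as
spark-perf.

```shell
# Hypothetical check: does this directory look like the top level of a
# checkout that ships an executable bin/run-example?
has_run_example() {
  [ -x "${1:-.}/bin/run-example" ]
}

if has_run_example .; then
  echo "found ./bin/run-example"
else
  echo "no bin/run-example here; check the repository and branch"
fi
```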


