
Posted to user@crunch.apache.org by John Jensen <je...@richrelevance.com> on 2013/03/02 21:43:42 UTC

Current state of scrunch

Hey,

I am considering taking a closer look at migrating some of our existing Crunch code to Scala, and I was wondering about the current state of the Scrunch code.
Is it generally being kept in sync with mainline Crunch development?

I assume most functionality is delegated to the underlying Crunch implementation, but presumably there is still work needed as features are added to Crunch. No?

-- John

Re: Current state of scrunch

Posted by Josh Wills <jw...@cloudera.com>.

Hey John,

Not a silly question, nor a beginner one. We have a Maven profile in
Scrunch that builds all of the packaging you need to run it from the Scala
interpreter. Run:

mvn clean package -P scrunch

in the crunch-scrunch directory, and it will create
a crunch-scrunch-<version>-release.tar.gz file in the target/ directory
with all of the scripts and libs set up for you to run from the interpreter
(at least as of Scala 2.9.2). Note that you'll need to specify
-Dcrunch.platform=2 to build against the Hadoop 2.x APIs (e.g., if
you're using CDH4).
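A minimal sketch of the build step described above, assuming a POSIX shell. The helper name `scrunch_mvn_args` and the idea of picking the flags from a Hadoop version string are hypothetical; only the `scrunch` profile and the `-Dcrunch.platform=2` property come from this thread.

```shell
# Hypothetical helper (not part of Crunch): print the Maven arguments for
# building the Scrunch release tarball for a given Hadoop version string.
# The body runs in a subshell so its variables do not leak to the caller.
scrunch_mvn_args() (
  hadoop_version=$1                 # e.g. "1.0.4" or "2.0.0" (illustrative)
  args="clean package -P scrunch"   # profile named in this thread
  case $hadoop_version in
    2.*) args="$args -Dcrunch.platform=2" ;;  # build against Hadoop 2.x APIs
  esac
  printf '%s\n' "$args"
)
```

For example, `mvn $(scrunch_mvn_args 2.0.0)` run from the crunch-scrunch directory would expand to the Hadoop 2.x build, and the archive would then appear as target/crunch-scrunch-<version>-release.tar.gz. Any version string that does not start with `2.` falls back to the default build, which is a simplification.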

J



On Sat, Mar 2, 2013 at 3:29 PM, John Jensen <je...@richrelevance.com> wrote:

>
>  Thanks. I'm just a little seduced by the syntactical simplicity of
> writing in Scala, so I figured I'd take a look.
>
>  BTW (silly beginner question), do you have any pointers on how to run
> Scrunch from the Scala interpreter?
>
>  If I just try something like:
> ../scala-2.9.3/bin/scala -classpath `hadoop
> classpath`:0.5.0-incubating.jar:crunch-scrunch-0.5.0-incubating.jar:lib/guava-11.0.2.jar:`hadoop
> classpath`:lib/avro-1.7.0.jar:lib/avro-mapred-1.7.0.jar
>
>  scala> val pipeline = Pipeline.mapReduce
>
>  I get
>  java.io.IOException: No FileSystem for scheme: file
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2250)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2257)
>  …
>
>  I figure there must be a simpler way.



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>

Re: Current state of scrunch

Posted by John Jensen <je...@richrelevance.com>.

Thanks. I'm just a little seduced by the syntactical simplicity of writing in Scala, so I figured I'd take a look.

BTW (silly beginner question), do you have any pointers on how to run Scrunch from the Scala interpreter?

If I just try something like:
../scala-2.9.3/bin/scala -classpath `hadoop classpath`:0.5.0-incubating.jar:crunch-scrunch-0.5.0-incubating.jar:lib/guava-11.0.2.jar:`hadoop classpath`:lib/avro-1.7.0.jar:lib/avro-mapred-1.7.0.jar

scala> val pipeline = Pipeline.mapReduce

I get
java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2250)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2257)
…

I figure there must be a simpler way.
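The command above passes `hadoop classpath` twice. A minimal sketch, assuming a POSIX shell, of merging classpath fragments while dropping repeated entries; the function `dedup_classpath` is hypothetical, and tidying the classpath this way is not known to fix the `No FileSystem for scheme: file` error (the route suggested in this thread is the release tarball built with the `scrunch` Maven profile).

```shell
# Hypothetical helper (not part of Crunch or Hadoop): merge classpath
# fragments, dropping duplicate entries but keeping first-seen order.
# The body runs in a subshell so IFS and set -f do not leak to the caller.
dedup_classpath() (
  IFS=':'
  set -f                # keep wildcard entries such as lib/* literal
  joined="$*"           # join all arguments with ':'
  out=""
  for entry in $joined; do
    [ -n "$entry" ] || continue        # skip empty fields from '::'
    case ":$out:" in
      *":$entry:"*) ;;                 # already present, skip it
      *) out="${out:+$out:}$entry" ;;  # append with ':' separator
    esac
  done
  printf '%s\n' "$out"
)
```

A possible call is `scala -classpath "$(dedup_classpath "$(hadoop classpath)" lib/*.jar)"`, where the `lib/*.jar` glob is expanded by the calling shell before the function sees it.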



Re: Current state of scrunch

Posted by Josh Wills <jw...@cloudera.com>.

Hey John,

I think Scrunch has a good foundation right now, but yes, there is some
work to do to expose functionality that has been added to the Java APIs.
I'd like to spend some more time on Scrunch for the next release, so if
you come across something you need, please let me know and I'll add it
straightaway.

J
