Posted to hdfs-user@hadoop.apache.org by Simone Leo <si...@crs4.it> on 2012/11/13 14:11:17 UTC

Pydoop 0.7.0-rc1 released

Hello everyone,

we're happy to announce that we have just released Pydoop 0.7.0-rc1
(http://pydoop.sourceforge.net).

The main changes with respect to the previous version are:

  * support for CDH4 (MapReduce v1 only)
  * tested with the latest releases of other supported Hadoop versions
  * simpler build process
  * Pydoop scripts can now accept user-defined configuration parameters
  * new wrapper object makes it easier to interact with the JobConf
  * new hdfs.path functions: isdir, isfile, kind
  * HDFS: support for string description of permission modes in chmod
  * several bug fixes
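To illustrate the new string permission modes mentioned above, here is a minimal sketch of how a symbolic mode string such as "rwxr-xr-x" can map to the octal value traditionally passed to chmod. This is an illustration of the idea only, not Pydoop's actual implementation; the helper name is hypothetical.

```python
def mode_string_to_octal(mode):
    """Convert a 9-character symbolic mode (e.g. 'rwxr-xr-x') to an int.

    Hypothetical helper, not part of the Pydoop API: shown only to
    clarify what a string description of a permission mode means.
    """
    if len(mode) != 9:
        raise ValueError("expected a 9-character mode string")
    bits = 0
    # Each position is either the expected r/w/x flag (bit set) or '-'.
    for ch, flag in zip(mode, "rwxrwxrwx"):
        bits <<= 1
        if ch == flag:
            bits |= 1
        elif ch != "-":
            raise ValueError("unexpected character %r" % ch)
    return bits

print(oct(mode_string_to_octal("rwxr-xr-x")))  # 0o755
print(oct(mode_string_to_octal("rw-r--r--")))  # 0o644
```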

This is a release candidate.  We're working on binary packages for the 
final release.  As usual, we're happy to receive your feedback on the forum:

http://sourceforge.net/projects/pydoop/forums/forum/990018

Pydoop is a Python MapReduce and HDFS API for Hadoop, built upon the C++
Pipes and the C libhdfs APIs, that allows you to write full-fledged
MapReduce applications with HDFS access. Pydoop has been maturing nicely
and is currently in production use at CRS4, where several scientific
projects are based on it, including Seal
(http://biodoop-seal.sourceforge.net), Biodoop-BLAST
(http://biodoop.sourceforge.net/blast), and more yet to be released.

Links:

  * full release notes: http://pydoop.sourceforge.net/docs/news.html
  * download page on sf: http://sourceforge.net/projects/pydoop/files
  * download page on PyPI: http://pypi.python.org/pypi/pydoop/0.7.0-rc1
  * git repo: 
http://pydoop.git.sourceforge.net/git/gitweb.cgi?p=pydoop/pydoop;a=summary

Happy pydooping!


The Pydoop Team

-- 
Simone Leo
Data Fusion - Distributed Computing
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: simone.leo@crs4.it
http://www.crs4.it

Re: Pydoop 0.7.0-rc1 released

Posted by Luca Pireddu <pi...@crs4.it>.
On 11/16/2012 10:02 PM, Bart Verwilst wrote:
> Hi Simone,
>
> I was wondering, is it possible to write AVRO files to hadoop straight
> from your lib ( mixed with avro libs ofcourse )? I'm currently trying to
> come up with a way to read from mysql ( but more complicated than sqoop
> can handle ) and write it out to avro files on HDFS. Is something like
> this feasible with this? How do you see it?
>
> Thanks!
>
> Bart

Hello,

You could use a record writer based on the python-avro package
(http://pypi.python.org/pypi/avro/1.7.2), although I've seen a few
complaints about its speed.  For an example of a RecordWriter
implemented in Python, see wordcount-full in the Pydoop examples.

If that solution turns out to be too slow for you, consider
writing a Java record writer that uses the standard Avro implementation.

In either case, you'll have to get the data from your reducers to the
record writer.  Pydoop only supports emitting byte streams, so you'll
have to serialize your data as a string of some sort, pass it to Pydoop,
receive it in the RecordWriter, de-serialize it there, and then
pass it to the Avro library.
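The reducer-to-record-writer handoff described above can be sketched as follows. This is a self-contained illustration, not the actual Pydoop API: the class and function names are hypothetical, JSON stands in for whatever string serialization you choose, and in a real job the deserialized record would be appended to an Avro DataFileWriter rather than collected in a list.

```python
import json

def serialize_record(record):
    """Reducer side: turn a record (dict) into the byte string
    Pydoop will emit.  JSON is just one possible serialization."""
    return json.dumps(record)

class AvroBoundRecordWriter:
    """Record-writer side (hypothetical class): recover the record
    from the emitted string and hand it on.  In a real job the
    deserialized dict would go to an Avro writer; here we collect
    it so the flow is visible."""

    def __init__(self):
        self.records = []

    def emit(self, key, value):
        # De-serialize the string the reducer emitted.
        self.records.append(json.loads(value))

writer = AvroBoundRecordWriter()
writer.emit("k", serialize_record({"name": "bart", "rows": 3}))
print(writer.records)  # [{'name': 'bart', 'rows': 3}]
```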


-- 
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
09010 Pula (CA), Italy
Tel: +39 0709250452

Re: Pydoop 0.7.0-rc1 released

Posted by Bart Verwilst <li...@verwilst.be>.
Hi Simone,

I was wondering: is it possible to write Avro files to Hadoop straight
from your lib (mixed with the Avro libs, of course)? I'm currently trying to
come up with a way to read from MySQL (but in cases more complicated than Sqoop
can handle) and write the data out to Avro files on HDFS. Is something like
this feasible with Pydoop? How do you see it?

Thanks!

Bart

Simone Leo wrote on 13.11.2012 14:11:
> Hello everyone,
>
> we're happy to announce that we have just released Pydoop 0.7.0-rc1
> (http://pydoop.sourceforge.net).
>
> The main changes with respect to the previous version are:
>
>  * support for CDH4 (MapReduce v1 only)
>  * tested with the latest releases of other supported Hadoop versions
>  * simpler build process
>  * Pydoop scripts can now accept user-defined configuration 
> parameters
>  * new wrapper object makes it easier to interact with the JobConf
>  * new hdfs.path functions: isdir, isfile, kind
>  * HDFS: support for string description of permission modes in chmod
>  * several bug fixes
>
> This is a release candidate.  We're working on binary packages for
> the final release.  As usual, we're happy to receive your feedback on
> the forum:
>
> http://sourceforge.net/projects/pydoop/forums/forum/990018
>
> Pydoop is a Python MapReduce and HDFS API for Hadoop, built upon the 
> C++
> Pipes and the C libhdfs APIs, that allows to write full-fledged
> MapReduce applications with HDFS access. Pydoop has been maturing
> nicely and is currently in production use at CRS4 as we have a few
> scientific projects that are based on it, including Seal
> (http://biodoop-seal.sourceforge.net), Biodoop-BLAST
> (http://biodoop.sourceforge.net/blast), and more yet to be released.
>
> Links:
>
>  * full release notes: http://pydoop.sourceforge.net/docs/news.html
>  * download page on sf: http://sourceforge.net/projects/pydoop/files
>  * download page on PyPI: 
> http://pypi.python.org/pypi/pydoop/0.7.0-rc1
>  * git repo:
> 
> http://pydoop.git.sourceforge.net/git/gitweb.cgi?p=pydoop/pydoop;a=summary
>
> Happy pydooping!
>
>
> The Pydoop Team
