Posted to user@hadoop.apache.org by Julian Bui <ju...@gmail.com> on 2013/03/04 01:02:43 UTC

hadoop pipes or java with jni?

Hi hadoop users,

As a Hadoop newbie, I'm trying to figure out which interface would be best
and easiest for implementing my application: 1) Hadoop Pipes, 2) Java with
JNI, or 3) something else that I'm not aware of yet.

I will use Hadoop to take pictures as input and produce JPEG pictures as
output.  I do not think I need a reducer.

Requirements:

   - I want to use libjpeg.a (a native static library) in my Hadoop
   application.  If I use Hadoop Pipes, I should be able to statically link
   libjpeg.a into my application.  If I use the Java Hadoop interface with
   JNI, I think I have to ship the libjpeg.a library along with my Hadoop
   jobs - is that right?  Is that easy?
   - I need to be able to write uniquely named files into HDFS (i.e., I need
   to name the output files so that I know which inputs they were created
   from; a rough sketch of what I mean follows this list).  If I recall
   correctly, the Hadoop Streaming interface doesn't let you do this because
   it only deals with stdin/stdout - does Hadoop Pipes have a similar
   constraint, or will it allow me to write uniquely named files?
   - I need to be able to exploit the locality of the data: the application
   should be executed on the same machine as the input data (pictures).
   Does the Hadoop Pipes interface allow me to do this?
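
To make the naming requirement concrete, here is a rough, untested sketch
of what I have in mind on the Java side (the class, paths, and names are
all made up):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UniqueNameWriter {

    // Write one output JPEG whose name records which input it came from.
    public static void writeJpeg(Configuration conf, String inputName,
                                 byte[] jpegBytes) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        // Derive the output name from the input name so the mapping from
        // input picture to output picture stays recoverable.
        Path out = new Path("/output/" + inputName + ".jpg");
        FSDataOutputStream os = fs.create(out, false); // fail if it exists
        try {
            os.write(jpegBytes);
        } finally {
            os.close();
        }
    }
}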


Other questions:

   - When I tried to learn more about the Hadoop Pipes API, all I could
   find was this one Submitter class.  Is this really it, or is there more?
   http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html
   - I'm not really familiar with SWIG, which is apparently to be used with
   Pipes.  All I could really find was the same simple word-count example on
   every site.  Does SWIG get difficult to use for more complex projects?


Thanks,
-Julian

Re: hadoop pipes or java with jni?

Posted by Charles Earl <ch...@gmail.com>.
Julian,
It has been my experience that JNI can be a cause for concern if you use
multiple complex native libraries.  You might want to run with valgrind if
possible to verify that there are no memory issues; given that you are
using a single library, this should not be a problem.
Uniquely named files are possible in Pipes mappers and reducers: you may
want to examine the input split name returned by the
HadoopPipes::MapContext's getInputSplit() method and develop a naming
scheme based on that value.
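
A minimal Pipes mapper along those lines might look like the sketch below
(class names are hypothetical, and exactly what getInputSplit() returns
depends on your InputFormat, so treat the name-derivation as a placeholder):

#include <string>

#include "Pipes.hh"
#include "TemplateFactory.hh"

class JpegMapper : public HadoopPipes::Mapper {
public:
  explicit JpegMapper(HadoopPipes::MapContext& context) {
    // For file-based input formats the serialized split embeds the
    // input path; keep it to build unique output names later.
    splitName = context.getInputSplit();
  }

  void map(HadoopPipes::MapContext& context) {
    // Key each record by a name derived from the split so a custom
    // OutputFormat (or a follow-on step) can write uniquely named files.
    context.emit(splitName, context.getInputValue());
  }

private:
  std::string splitName;
};

class PassThroughReducer : public HadoopPipes::Reducer {
public:
  explicit PassThroughReducer(HadoopPipes::ReduceContext&) {}

  void reduce(HadoopPipes::ReduceContext& context) {
    while (context.nextValue()) {
      context.emit(context.getInputKey(), context.getInputValue());
    }
  }
};

int main() {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<JpegMapper, PassThroughReducer>());
}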



On Sun, Mar 3, 2013 at 9:04 PM, Michael Segel <mi...@hotmail.com> wrote:

> I'm partial to using Java and JNI and then use the distributed cache to
> push the native libraries out to each node if not already there.
>
> But that's just me... ;-)
>
> HTH
>
> -Mike


-- 
- Charles


Re: hadoop pipes or java with jni?

Posted by Michael Segel <mi...@hotmail.com>.
I'm partial to using Java and JNI, and then using the distributed cache to
push the native libraries out to each node if they're not already there.
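
Roughly, the driver side could look like the sketch below.  The library
name, HDFS path, and class names are made up for illustration, and it uses
the old (circa-2013) DistributedCache API:

import java.io.File;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class JniJpegDriver {

    public static class JpegMapper
            extends Mapper<Object, Object, Object, Object> {
        @Override
        protected void setup(Context context) {
            // Load the symlinked JNI wrapper (which statically links
            // libjpeg.a) from the task's working directory.
            System.load(new File("libjpegwrap.so").getAbsolutePath());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ship the wrapper to every node; the #fragment tells the
        // framework to create a symlink with that name in each task's
        // working directory.
        DistributedCache.addCacheFile(
            new URI("hdfs:///libs/libjpegwrap.so#libjpegwrap.so"), conf);
        DistributedCache.createSymlink(conf);

        Job job = new Job(conf, "jpeg-convert");
        job.setJarByClass(JniJpegDriver.class);
        job.setMapperClass(JpegMapper.class);
        job.setNumReduceTasks(0); // map-only job
        // ... set input/output formats and paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}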

But that's just me... ;-) 

HTH

-Mike

