Posted to user@pig.apache.org by Prashant Kommireddi <pr...@gmail.com> on 2013/01/24 01:48:56 UTC

Run a job async

Hey guys,

I am trying to do the following:

   1. Launch a pig job asynchronously via Java program
   2. Get a notification once the job is complete (something similar to
   Hadoop callback with a servlet)

I looked at PigServer.executeBatch() and it seems to be waiting until the job
completes. This is not what I would like my app to do.

Any ideas?

Thanks,
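
A minimal sketch of the separate-process approach that is suggested later in
this thread, assuming Java 9 or newer for Process.onExit(). The class name
AsyncProcessLauncher and the onComplete callback are hypothetical and not part
of Pig. The call returns as soon as the child has started, and the callback
receives the exit code once the child terminates:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntConsumer;

// Hypothetical helper, not a Pig API: runs a command line in a child process.
class AsyncProcessLauncher {

    /**
     * Starts the command and returns immediately. The callback receives the
     * exit code once the child process terminates.
     */
    static CompletableFuture<Integer> launch(List<String> command, IntConsumer onComplete) {
        try {
            Process process = new ProcessBuilder(command)
                    // Let the child write straight to our console so its pipes never fill up.
                    .inheritIO()
                    .start();
            // Process.onExit() (Java 9+) completes when the child terminates.
            return process.onExit().thenApply(finished -> {
                int exitCode = finished.exitValue();
                onComplete.accept(exitCode);
                return exitCode;
            });
        } catch (IOException e) {
            // The command could not be started at all (for example, not on the PATH).
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, launch(Arrays.asList("pig", "-x", "local", "my_pig_script.pig"),
code -> notifyApp(code)) would return right away; notifyApp stands in for
whatever the app does on completion. Each job still costs one JVM, so this
alone does not address thousands of concurrent jobs.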

Re: execute pig command in Java program

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi Dan,

1. Can't you print out the error messages by calling getErrorStream() on
the sub-process?
2. Is there any reason why you spawn a sub-process rather than use
the PigServer API?

Thanks,
Cheolsoo
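
Regarding point 1, a minimal sketch of capturing the child's error output,
assuming plain java.lang.ProcessBuilder. The class name SubprocessRunner is
hypothetical. It merges stderr into stdout and drains the stream before calling
waitFor(). The quoted code below calls waitFor() first and reads only stdout,
which may hide Pig's messages or stall if the child fills a pipe buffer:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not a Pig API.
class SubprocessRunner {

    /** Exit code plus everything the child wrote to stdout and stderr. */
    static final class Result {
        public final int exitCode;
        public final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            // Merge stderr into stdout so error messages are not lost.
            builder.redirectErrorStream(true);
            Process process = builder.start();

            StringBuilder output = new StringBuilder();
            // Drain the output BEFORE waitFor(); otherwise a chatty child can
            // fill the pipe buffer and both sides wait on each other.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            return new Result(process.waitFor(), output.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }
}
```

Passing the command as a list of arguments, for example
Arrays.asList("pig", "-x", "local", "my_pig_script.pig"), also avoids relying on
how Runtime.exec(String) splits a single command string.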


On Wed, Jan 23, 2013 at 10:07 PM, Dan Yi <dy...@medio.com> wrote:

> Hi,
>
> I used to write PHP code to execute the pig command, and it worked well.
> Now I have switched to Java, but it doesn't seem to work. Here is my code:
>
> String pigCommand = "pig -x local -p ouput=/tmp my_pig_script.pig";
>
>                 Runtime r = Runtime.getRuntime();
>                 Process p;
>
>                 int exitVal;
>
>                 try {
>                         p = r.exec(pigCommand);
>                         exitVal = p.waitFor();
>                         BufferedReader br = new BufferedReader(new
> InputStreamReader(p.getInputStream()));
>                         String line = null;
>
>                         while((line = br.readLine()) != null) {
>                                 System.out.println(line);
>                         }
>                         br.close();
>
>                         System.out.println("exitVal: " + exitVal);
>                         System.out.println("Done");
>
>
>
> If I run that pig command in the console directly, it works. If I replace
> that Pig command with another shell command, say 'ping www.yahoo.com', and
> run the Java program, it works too. So what might be the problem?
>
> Thanks.
>
>

Re: Run a job async

Posted by Cheolsoo Park <ch...@cloudera.com>.
Thank you for the suggestions. I will file a jira and add our discussion
there.


On Fri, Jan 25, 2013 at 4:23 PM, Rohini Palaniswamy <rohini.aditya@gmail.com> wrote:

> Jon,
>   Those are good areas to check. A few things I have seen regarding those:
>
>  1) JythonScriptEngine - PythonInterpreter is static and is not suitable for
> multiple runs if the script names are the same (hit this issue in PIG-2433 unit
> tests).
>  2) QueryParserDriver - There is a static cache with macro name to macro
> file mapping, so the same macro names with different file locations will cause
> problems.
>  3) FileLocalizer.relativeRoot - With a single cluster there are no issues. It
> just needs to be reinitialized if supporting multiple clusters.
>
> Regards,
> Rohini
>
>
> On Fri, Jan 25, 2013 at 9:37 AM, Jonathan Coveney <jcoveney@gmail.com> wrote:
>
> > user to bcc, +dev
> >
> > Cheolsoo,
> >
> > Can you make a JIRA for this? I can imagine a slightly heavier test suite,
> > but I like where you started. If it's not far off, then I think it'll be a
> > win to make it thread safe. But we need to make sure to test the most
> > advanced features... UDFs (esp. the same name but different UDF in different
> > invocations), scripting UDFs (same thing), and so on.
> >
> >
> > 2013/1/25 Cheolsoo Park <ch...@cloudera.com>
> >
> > > >> if you have multiple threads that run a query via PigServer, there is a
> > > >> great chance of the internals clashing because of the use of static
> > > >> variables within Pig.
> > >
> > > Recently, I spent some time on this, and what I found is that the Pig
> > > front-end is quite thread-safe. Here is how I tested it:
> > >
> > > 1) Wrote a PigUnit test that runs in MR mode.
> > > 2) Executed test cases concurrently in 4 threads using a JUnit extension
> > > called tempus-fugit:
> > > http://tempusfugitlibrary.org/documentation/junit/parallel/
> > >
> > > After fixing PIG-3096, I was able to successfully run Pig queries in
> > > parallel. It's important to note that only the front-end needs to be
> > > thread-safe since that's what is executed in parallel.
> > >
> > > I arbitrarily selected queries from e2e test cases, so they are probably
> > > not complex enough to mimic real-world examples. Nevertheless, my test
> > > program ran without a problem for a few days. I couldn't continue my
> > > experiment because I was pulled into something else. However, I think
> > > that making the front-end thread-safe is an achievable goal.
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > >
> > >
> > > On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam <nr...@gmail.com> wrote:
> > >
> > > > That clarifies it for me, thanks a lot.
> > > >
> > > > Regards,
> > > > Rama.
> > > >
> > > >
> > > > On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <jcoveney@gmail.com> wrote:
> > > >
> > > > > Well, when I say that Pig is not multi-threaded, what I mean is that if you
> > > > > have multiple threads that run a query via PigServer, there is a great
> > > > > chance of the internals clashing because of the use of static variables
> > > > > within Pig. Pig itself, when running a single query, is multi-threaded.
> > > > > It's just not "multi-threaded" in the sense that multiple instances can
> > > > > safely be run in the same JVM.
> > > > >
> > > > >
> > > > > 2013/1/24 Ramakrishna Nalam <nr...@gmail.com>
> > > > >
> > > > > > Hi Jonathan,
> > > > > >
> > > > > > Pardon me if it's a naive question, but it's interesting that you say Pig is
> > > > > > not multithreaded.
> > > > > > We're using Pig 0.10.0, and looking at the code, it seems to do the right
> > > > > > things to handle multi-threaded requests (ThreadLocal for ScriptState,
> > > > > > for example).
> > > > > >
> > > > > > It would be great if you could point out the kind of issues there could be.
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Rama.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <lefthandmagic@gmail.com> wrote:
> > > > > >
> > > > > > > Are there any plans to make PigServer multi-threaded?
> > > > > > >
> > > > > > > Since there is "PigProgressNotificationListener" to subscribe for async
> > > > > > > callbacks when the pig job completes, is there any real need to keep the
> > > > > > > pig job submitting thread waiting until the job completes?
> > > > > > >
> > > > > > > Is this just a shortcoming today, or are there more concrete reasons against
> > > > > > > providing a PigServer which can submit to the cluster in mapreduce
> > > > > > > mode asynchronously?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Praveen
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcoveney@gmail.com> wrote:
> > > > > > >
> > > > > > > > I think whatever way you slice it, handling thousands of pig jobs
> > > > > > > > asynchronously is going to be a bear. I mean, this is essentially what the
> > > > > > > > job tracker does, albeit with a lot less information.
> > > > > > > >
> > > > > > > > Either way, Pig is not multi-threaded so having more than one instance of
> > > > > > > > Pig in the same JVM is going to start causing problems (which is why, I
> > > > > > > > imagine, there is no async way to call Pig). So multiple processes is
> > > > > > > > really the only way around it that I know of.
> > > > > > > >
> > > > > > > > At Twitter we have a deployment of mesos, and our long term solution is
> > > > > > > > going to be running all of our pig jobs on mesos, in the short term by
> > > > > > > > deploying daemons that run pig jobs as local processes.
> > > > > > > >
> > > > > > > >
> > > > > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > > >
> > > > > > > > > Both. Think of it as an app server handling all of these requests.
> > > > > > > > >
> > > > > > > > > Sent from my iPhone
> > > > > > > > >
> > > > > > > > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jcoveney@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > > > > >
> > > > > > > > > >> Did not want to have several threads launched for this. We might have
> > > > > > > > > >> thousands of requests coming in, and the app is doing a lot more than
> > > > > > > > > >> only Pig.
> > > > > > > > > >>
> > > > > > > > > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcoveney@gmail.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >>> start a separate Process which runs Pig?
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > > > > >>>
> > > > > > > > > >>>> Hey guys,
> > > > > > > > > >>>>
> > > > > > > > > >>>> I am trying to do the following:
> > > > > > > > > >>>>
> > > > > > > > > >>>>   1. Launch a pig job asynchronously via Java program
> > > > > > > > > >>>>   2. Get a notification once the job is complete (something similar to
> > > > > > > > > >>>>   Hadoop callback with a servlet)
> > > > > > > > > >>>>
> > > > > > > > > >>>> I looked at PigServer.executeBatch() and it seems to be waiting until the
> > > > > > > > > >>>> job completes. This is not what I would like my app to do.
> > > > > > > > > >>>>
> > > > > > > > > >>>> Any ideas?
> > > > > > > > > >>>>
> > > > > > > > > >>>> Thanks,
> > > > > > > > > >>>>
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -Praveen
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Run a job async

Posted by Rohini Palaniswamy <ro...@gmail.com>.
Jon,
  Those are good areas to check. A few things I have seen regarding those:

 1) JythonScriptEngine - PythonInterpreter is static and is not suitable for
multiple runs if the script names are the same (hit this issue in PIG-2433 unit
tests).
 2) QueryParserDriver - There is a static cache with macro name to macro
file mapping, so the same macro names with different file locations will cause
problems.
 3) FileLocalizer.relativeRoot - With a single cluster there are no issues. It
just needs to be reinitialized if supporting multiple clusters.

Regards,
Rohini
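
To illustrate the kind of clash described in point 2, here is a small sketch.
It is not Pig's actual code: SharedCacheDriver, PerInstanceDriver, and the file
paths are hypothetical stand-ins. It only shows how a static name-to-file map
lets the first registration win for every later caller in the same JVM, while a
per-instance map does not:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only; these classes are hypothetical and not part of Pig.
class MacroCacheSketch {

    /** Stand-in for a driver that keeps one JVM-wide cache. */
    static class SharedCacheDriver {
        // static: a single map shared by every instance in the JVM.
        // (A plain HashMap is also not safe for concurrent writers.)
        private static final Map<String, String> MACRO_FILES = new HashMap<>();

        String resolve(String macroName, String file) {
            // First registration wins; later callers silently get the old file.
            return MACRO_FILES.computeIfAbsent(macroName, k -> file);
        }
    }

    /** Same logic, but the cache belongs to one driver instance. */
    static class PerInstanceDriver {
        private final Map<String, String> macroFiles = new HashMap<>();

        String resolve(String macroName, String file) {
            return macroFiles.computeIfAbsent(macroName, k -> file);
        }
    }
}
```

With the shared cache, a second job that registers my_macro from a different
file gets the first job's file back. With the per-instance cache, each driver
resolves to its own file.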


On Fri, Jan 25, 2013 at 9:37 AM, Jonathan Coveney <jc...@gmail.com> wrote:

> user to bcc, +dev
>
> Cheolsoo,
>
> Can you make a JIRA for this? I can imagine a slightly heavier test suite,
> but I like where you started. If it's not far off, then I think it'll be a
> win to make it thread safe. But we need to make sure to test the most
> advanced features... UDFs (esp. the same name but different UDF in different
> invocations), scripting UDFs (same thing), and so on.
>
>
> 2013/1/25 Cheolsoo Park <ch...@cloudera.com>
>
> > >> if you have multiple threads that run a query via PigServer, there is a
> > >> great chance of the internals clashing because of the use of static
> > >> variables within Pig.
> >
> > Recently, I spent some time on this, and what I found is that the Pig
> > front-end is quite thread-safe. Here is how I tested it:
> >
> > 1) Wrote a PigUnit test that runs in MR mode.
> > 2) Executed test cases concurrently in 4 threads using a JUnit extension
> > called tempus-fugit:
> > http://tempusfugitlibrary.org/documentation/junit/parallel/
> >
> > After fixing PIG-3096, I was able to successfully run Pig queries in
> > parallel. It's important to note that only the front-end needs to be
> > thread-safe since that's what is executed in parallel.
> >
> > I arbitrarily selected queries from e2e test cases, so they are probably
> > not complex enough to mimic real-world examples. Nevertheless, my test
> > program ran without a problem for a few days. I couldn't continue my
> > experiment because I was pulled into something else. However, I think
> > that making the front-end thread-safe is an achievable goal.
> >
> > Thanks,
> > Cheolsoo
> >
> >
> >
> > On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam <nr...@gmail.com> wrote:
> >
> > > That clarifies it for me, thanks a lot.
> > >
> > > Regards,
> > > Rama.
> > >
> > >
> > > On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <jcoveney@gmail.com> wrote:
> > >
> > > > Well, when I say that Pig is not multi-threaded, what I mean is that if you
> > > > have multiple threads that run a query via PigServer, there is a great
> > > > chance of the internals clashing because of the use of static variables
> > > > within Pig. Pig itself, when running a single query, is multi-threaded.
> > > > It's just not "multi-threaded" in the sense that multiple instances can
> > > > safely be run in the same JVM.
> > > >
> > > >
> > > > 2013/1/24 Ramakrishna Nalam <nr...@gmail.com>
> > > >
> > > > > Hi Jonathan,
> > > > >
> > > > > Pardon me if it's a naive question, but it's interesting that you say Pig is
> > > > > not multithreaded.
> > > > > We're using Pig 0.10.0, and looking at the code, it seems to do the right
> > > > > things to handle multi-threaded requests (ThreadLocal for ScriptState,
> > > > > for example).
> > > > >
> > > > > It would be great if you could point out the kind of issues there could be.
> > > > >
> > > > >
> > > > > Regards,
> > > > > Rama.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <lefthandmagic@gmail.com> wrote:
> > > > >
> > > > > > Are there any plans to make PigServer multi-threaded?
> > > > > >
> > > > > > Since there is "PigProgressNotificationListener" to subscribe for async
> > > > > > callbacks when the pig job completes, is there any real need to keep the
> > > > > > pig job submitting thread waiting until the job completes?
> > > > > >
> > > > > > Is this just a shortcoming today, or are there more concrete reasons against
> > > > > > providing a PigServer which can submit to the cluster in mapreduce
> > > > > > mode asynchronously?
> > > > > >
> > > > > > Thanks,
> > > > > > Praveen
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcoveney@gmail.com> wrote:
> > > > > >
> > > > > > > I think whatever way you slice it, handling thousands of pig jobs
> > > > > > > asynchronously is going to be a bear. I mean, this is essentially what the
> > > > > > > job tracker does, albeit with a lot less information.
> > > > > > >
> > > > > > > Either way, Pig is not multi-threaded so having more than one instance of
> > > > > > > Pig in the same JVM is going to start causing problems (which is why, I
> > > > > > > imagine, there is no async way to call Pig). So multiple processes is
> > > > > > > really the only way around it that I know of.
> > > > > > >
> > > > > > > At Twitter we have a deployment of mesos, and our long term solution is
> > > > > > > going to be running all of our pig jobs on mesos, in the short term by
> > > > > > > deploying daemons that run pig jobs as local processes.
> > > > > > >
> > > > > > >
> > > > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > >
> > > > > > > > Both. Think of it as an app server handling all of these requests.
> > > > > > > >
> > > > > > > > Sent from my iPhone
> > > > > > > >
> > > > > > > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jcoveney@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > > > >
> > > > > > > > >> Did not want to have several threads launched for this. We might have
> > > > > > > > >> thousands of requests coming in, and the app is doing a lot more than
> > > > > > > > >> only Pig.
> > > > > > > > >>
> > > > > > > > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcoveney@gmail.com> wrote:
> > > > > > > > >>
> > > > > > > > >>> start a separate Process which runs Pig?
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > > > >>>
> > > > > > > > >>>> Hey guys,
> > > > > > > > >>>>
> > > > > > > > >>>> I am trying to do the following:
> > > > > > > > >>>>
> > > > > > > > >>>>   1. Launch a pig job asynchronously via Java program
> > > > > > > > >>>>   2. Get a notification once the job is complete (something similar to
> > > > > > > > >>>>   Hadoop callback with a servlet)
> > > > > > > > >>>>
> > > > > > > > >>>> I looked at PigServer.executeBatch() and it seems to be waiting until the
> > > > > > > > >>>> job completes. This is not what I would like my app to do.
> > > > > > > > >>>>
> > > > > > > > >>>> Any ideas?
> > > > > > > > >>>>
> > > > > > > > >>>> Thanks,
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -Praveen
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Run a job async

Posted by Jonathan Coveney <jc...@gmail.com>.
user to bcc, +dev

Cheolsoo,

Can you make a JIRA for this? I can imagine a slightly heavier test suite,
but I like where you started. If it's not far off, then I think it'll be a
win to make it thread-safe. But we need to make sure to test the most
advanced features: UDFs (especially the same name bound to a different UDF in
different invocations), scripting UDFs (same thing), and so on.


2013/1/25 Cheolsoo Park <ch...@cloudera.com>

> >> if you have multiple threads that run a query via PigServer, there is a
> great chance of the internals clashing because of the use of static
> variables within Pig.
>
> Recently, I spent some time on this, and what I found is that the Pig
> front-end is quite thread-safe. Here is how I tested it:
>
> 1) Wrote a PigUnit test that runs in MR mode.
> 2) Executed test cases concurrently in 4 threads using a JUnit extension
> called tempus-fugit:
> http://tempusfugitlibrary.org/documentation/junit/parallel/
>
> After fixing PIG-3096, I was able to successfully run Pig queries in
> parallel. It's important to note that only the front-end needs to be
> thread-safe since that's what is executed in parallel.
>
> I arbitrarily selected queries from e2e test cases, so they are probably
> not complex enough to mimic real-world examples. Nevertheless, my test
> program ran without a problem for a few days. I couldn't continue my
> experiment because I was pulled away to something else. However, I think
> that making the front-end thread-safe is an achievable goal.
>
> Thanks,
> Cheolsoo
>
>
>
> On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam
> <nr...@gmail.com>wrote:
>
> > That clarifies it for me, thanks a lot.
> >
> > Regards,
> > Rama.
> >
> >
> > On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <jcoveney@gmail.com
> > >wrote:
> >
> > > Well, when I say that Pig is not multi-threaded, what I mean is that if
> > you
> > > have multiple threads that run a query via PigServer, there is a great
> > > chance of the internals clashing because of the use of static variables
> > > within Pig. Pig itself, when running a single query, is multi-threaded.
> > > It's just not "multi-threaded" in the sense that multiple instances can
> > > safely be run in the same JVM.
> > >
> > >
> > > 2013/1/24 Ramakrishna Nalam <nr...@gmail.com>
> > >
> > > > Hi Jonathan,
> > > >
> > > > Pardon if it's a naive question, but Interesting that you say Pig is
> > not
> > > > multithreaded.
> > > > We're using Pig 0.10.0, and looking at the code, it seems to do the
> > right
> > > > things to handle multi threaded requests (ThreadLocal for ScriptState
> > for
> > > > eg).
> > > >
> > > > Would be great if you can point out to the kind of issues there could
> > be.
> > > >
> > > >
> > > > Regards,
> > > > Rama.
> > > >
> > > >
> > > >
> > > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <le...@gmail.com>
> > > > wrote:
> > > >
> > > > > Are there any plans on making the pigserver multi-threaded?
> > > > >
> > > > > since there is "PigProcessNotificationListener" to subscribe for
> > async
> > > > > callbacks when the pig job completes, is there any real need to
> keep
> > > the
> > > > > pig job submitting thread waiting until the job completes?
> > > > >
> > > > > Is this just a shortcoming today or are there more concrete reasons
> > > > against
> > > > > providing with a pigserver which can submit to the cluster in
> > mapreduce
> > > > > mode async?
> > > > >
> > > > > Thanks,
> > > > > Praveen
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <
> > jcoveney@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > I think whatever way you slice it, handling thousands of pig jobs
> > > > > > asynchronously is going to be a bear. I mean, this is essentially
> > > what
> > > > > the
> > > > > > job tracker does, albeit with a lot less information.
> > > > > >
> > > > > > Either way, Pig is not multi-threaded so having more than one
> > > instance
> > > > of
> > > > > > Pig in the same JVM is going to start causing problems (which is
> > > why, I
> > > > > > imagine, there is no async way to call Pig). So multiple
> processes
> > is
> > > > > > really the only way around it that I know of.
> > > > > >
> > > > > > At Twitter we have a deployment of mesos, and our long term
> > solution
> > > is
> > > > > > going to be running all of our pig jobs on mesos, in the short
> term
> > > by
> > > > > > deploying daemons that run pig jobs as local processes.
> > > > > >
> > > > > >
> > > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > >
> > > > > > > Both. Think of it as an app server handling all of these
> > requests.
> > > > > > >
> > > > > > > Sent from my iPhone
> > > > > > >
> > > > > > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <
> > jcoveney@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > > > > > >
> > > > > > > >
> > > > > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > > >
> > > > > > > >> Did not want to have several threads launched for this. We
> > might
> > > > > have
> > > > > > > >> thousands of requests coming in, and the app is doing a lot
> > more
> > > > > than
> > > > > > > only
> > > > > > > >> Pig.
> > > > > > > >>
> > > > > > > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <
> > > > > jcoveney@gmail.com
> > > > > > > >>> wrote:
> > > > > > > >>
> > > > > > > >>> start a separate Process which runs Pig?
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > > >>>
> > > > > > > >>>> Hey guys,
> > > > > > > >>>>
> > > > > > > >>>> I am trying to do the following:
> > > > > > > >>>>
> > > > > > > >>>>   1. Launch a pig job asynchronously via Java program
> > > > > > > >>>>   2. Get a notification once the job is complete
> (something
> > > > > similar
> > > > > > to
> > > > > > > >>>>   Hadoop callback with a servlet)
> > > > > > > >>>>
> > > > > > > >>>> I looked at PigServer.executeBatch() and it seems to be
> > > waiting
> > > > > > until
> > > > > > > >> job
> > > > > > > >>>> completes.This is not what I would like my app to do.
> > > > > > > >>>>
> > > > > > > >>>> Any ideas?
> > > > > > > >>>>
> > > > > > > >>>> Thanks,
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -Praveen
> > > > >
> > > >
> > >
> >
>

Re: Run a job async

Posted by Cheolsoo Park <ch...@cloudera.com>.
>> if you have multiple threads that run a query via PigServer, there is a
great chance of the internals clashing because of the use of static
variables within Pig.

Recently, I spent some time on this, and what I found is that the Pig
front-end is quite thread-safe. Here is how I tested it:

1) Wrote a PigUnit test that runs in MR mode.
2) Executed test cases concurrently in 4 threads using a JUnit extension
called tempus-fugit:
http://tempusfugitlibrary.org/documentation/junit/parallel/

After fixing PIG-3096, I was able to successfully run Pig queries in
parallel. It's important to note that only the front-end needs to be
thread-safe since that's what is executed in parallel.
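
For anyone who wants to try something similar without the JUnit extension, here is a rough sketch of the same idea using only java.util.concurrent. It is not my actual test code. ParallelQueryHarness, runAll and fakeQuery are made-up names, and fakeQuery is only a placeholder for "run one Pig query and check its output".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelQueryHarness {

    /** Runs every task on a pool of `threads` workers and returns how many failed. */
    static int runAll(List<Callable<Boolean>> tasks, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int failures = 0;
        try {
            // invokeAll blocks until every task has completed or thrown.
            for (Future<Boolean> result : pool.invokeAll(tasks)) {
                try {
                    if (!result.get()) {
                        failures++; // the task ran but reported a wrong result
                    }
                } catch (ExecutionException e) {
                    failures++; // the task threw an exception
                }
            }
        } finally {
            pool.shutdown();
        }
        return failures;
    }

    /** Hypothetical placeholder for "run one query and verify its output". */
    static Callable<Boolean> fakeQuery(int id) {
        return () -> {
            Thread.sleep(10); // pretend to do some work
            return id >= 0;
        };
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (int i = 0; i < 16; i++) {
            tasks.add(fakeQuery(i));
        }
        System.out.println("failures: " + runAll(tasks, 4));
    }
}
```

In a real test each Callable would drive its own query and compare the output against an expected result, so a clash between threads shows up as a failure count above zero.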

I arbitrarily selected queries from e2e test cases, so they are probably
not complex enough to mimic real-world examples. Nevertheless, my test
program ran without a problem for few days. I couldn't continue my
experiment because I was pulled out into something else. However, I think
that making the front-end thread-safe is an achievable goal.

Thanks,
Cheolsoo



On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam
<nr...@gmail.com> wrote:

> That clarifies it for me, thanks a lot.
>
> Regards,
> Rama.
>
>
> On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <jcoveney@gmail.com
> >wrote:
>
> > Well, when I say that Pig is not multi-threaded, what I mean is that if
> you
> > have multiple threads that run a query via PigServer, there is a great
> > chance of the internals clashing because of the use of static variables
> > within Pig. Pig itself, when running a single query, is multi-threaded.
> > It's just not "multi-threaded" in the sense that multiple instances can
> > safely be run in the same JVM.
> >
> >
> > 2013/1/24 Ramakrishna Nalam <nr...@gmail.com>
> >
> > > Hi Jonathan,
> > >
> > > Pardon if it's a naive question, but Interesting that you say Pig is
> not
> > > multithreaded.
> > > We're using Pig 0.10.0, and looking at the code, it seems to do the
> right
> > > things to handle multi threaded requests (ThreadLocal for ScriptState
> for
> > > eg).
> > >
> > > Would be great if you can point out to the kind of issues there could
> be.
> > >
> > >
> > > Regards,
> > > Rama.
> > >
> > >
> > >
> > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <le...@gmail.com>
> > > wrote:
> > >
> > > > Are there any plans on making the pigserver multi-threaded?
> > > >
> > > > since there is "PigProcessNotificationListener" to subscribe for
> async
> > > > callbacks when the pig job completes, is there any real need to keep
> > the
> > > > pig job submitting thread waiting until the job completes?
> > > >
> > > > Is this just a shortcoming today or are there more concrete reasons
> > > against
> > > > providing with a pigserver which can submit to the cluster in
> mapreduce
> > > > mode async?
> > > >
> > > > Thanks,
> > > > Praveen
> > > >
> > > >
> > > >
> > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <
> jcoveney@gmail.com
> > > > >wrote:
> > > >
> > > > > I think whatever way you slice it, handling thousands of pig jobs
> > > > > asynchronously is going to be a bear. I mean, this is essentially
> > what
> > > > the
> > > > > job tracker does, albeit with a lot less information.
> > > > >
> > > > > Either way, Pig is not multi-threaded so having more than one
> > instance
> > > of
> > > > > Pig in the same JVM is going to start causing problems (which is
> > why, I
> > > > > imagine, there is no async way to call Pig). So multiple processes
> is
> > > > > really the only way around it that I know of.
> > > > >
> > > > > At Twitter we have a deployment of mesos, and our long term
> solution
> > is
> > > > > going to be running all of our pig jobs on mesos, in the short term
> > by
> > > > > deploying daemons that run pig jobs as local processes.
> > > > >
> > > > >
> > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > >
> > > > > > Both. Think of it as an app server handling all of these
> requests.
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <
> jcoveney@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > > > > >
> > > > > > >
> > > > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > >
> > > > > > >> Did not want to have several threads launched for this. We
> might
> > > > have
> > > > > > >> thousands of requests coming in, and the app is doing a lot
> more
> > > > than
> > > > > > only
> > > > > > >> Pig.
> > > > > > >>
> > > > > > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <
> > > > jcoveney@gmail.com
> > > > > > >>> wrote:
> > > > > > >>
> > > > > > >>> start a separate Process which runs Pig?
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > > >>>
> > > > > > >>>> Hey guys,
> > > > > > >>>>
> > > > > > >>>> I am trying to do the following:
> > > > > > >>>>
> > > > > > >>>>   1. Launch a pig job asynchronously via Java program
> > > > > > >>>>   2. Get a notification once the job is complete (something
> > > > similar
> > > > > to
> > > > > > >>>>   Hadoop callback with a servlet)
> > > > > > >>>>
> > > > > > >>>> I looked at PigServer.executeBatch() and it seems to be
> > waiting
> > > > > until
> > > > > > >> job
> > > > > > >>>> completes.This is not what I would like my app to do.
> > > > > > >>>>
> > > > > > >>>> Any ideas?
> > > > > > >>>>
> > > > > > >>>> Thanks,
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -Praveen
> > > >
> > >
> >
>

Re: Run a job async

Posted by Ramakrishna Nalam <nr...@gmail.com>.
That clarifies it for me, thanks a lot.

Regards,
Rama.


On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <jc...@gmail.com> wrote:

> Well, when I say that Pig is not multi-threaded, what I mean is that if you
> have multiple threads that run a query via PigServer, there is a great
> chance of the internals clashing because of the use of static variables
> within Pig. Pig itself, when running a single query, is multi-threaded.
> It's just not "multi-threaded" in the sense that multiple instances can
> safely be run in the same JVM.
>
>
> 2013/1/24 Ramakrishna Nalam <nr...@gmail.com>
>
> > Hi Jonathan,
> >
> > Pardon if it's a naive question, but Interesting that you say Pig is not
> > multithreaded.
> > We're using Pig 0.10.0, and looking at the code, it seems to do the right
> > things to handle multi threaded requests (ThreadLocal for ScriptState for
> > eg).
> >
> > Would be great if you can point out to the kind of issues there could be.
> >
> >
> > Regards,
> > Rama.
> >
> >
> >
> > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <le...@gmail.com>
> > wrote:
> >
> > > Are there any plans on making the pigserver multi-threaded?
> > >
> > > since there is "PigProcessNotificationListener" to subscribe for async
> > > callbacks when the pig job completes, is there any real need to keep
> the
> > > pig job submitting thread waiting until the job completes?
> > >
> > > Is this just a shortcoming today or are there more concrete reasons
> > against
> > > providing with a pigserver which can submit to the cluster in mapreduce
> > > mode async?
> > >
> > > Thanks,
> > > Praveen
> > >
> > >
> > >
> > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcoveney@gmail.com
> > > >wrote:
> > >
> > > > I think whatever way you slice it, handling thousands of pig jobs
> > > > asynchronously is going to be a bear. I mean, this is essentially
> what
> > > the
> > > > job tracker does, albeit with a lot less information.
> > > >
> > > > Either way, Pig is not multi-threaded so having more than one
> instance
> > of
> > > > Pig in the same JVM is going to start causing problems (which is
> why, I
> > > > imagine, there is no async way to call Pig). So multiple processes is
> > > > really the only way around it that I know of.
> > > >
> > > > At Twitter we have a deployment of mesos, and our long term solution
> is
> > > > going to be running all of our pig jobs on mesos, in the short term
> by
> > > > deploying daemons that run pig jobs as local processes.
> > > >
> > > >
> > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > >
> > > > > Both. Think of it as an app server handling all of these requests.
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > > > >
> > > > > >
> > > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > >
> > > > > >> Did not want to have several threads launched for this. We might
> > > have
> > > > > >> thousands of requests coming in, and the app is doing a lot more
> > > than
> > > > > only
> > > > > >> Pig.
> > > > > >>
> > > > > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <
> > > jcoveney@gmail.com
> > > > > >>> wrote:
> > > > > >>
> > > > > >>> start a separate Process which runs Pig?
> > > > > >>>
> > > > > >>>
> > > > > >>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > > >>>
> > > > > >>>> Hey guys,
> > > > > >>>>
> > > > > >>>> I am trying to do the following:
> > > > > >>>>
> > > > > >>>>   1. Launch a pig job asynchronously via Java program
> > > > > >>>>   2. Get a notification once the job is complete (something
> > > similar
> > > > to
> > > > > >>>>   Hadoop callback with a servlet)
> > > > > >>>>
> > > > > >>>> I looked at PigServer.executeBatch() and it seems to be
> waiting
> > > > until
> > > > > >> job
> > > > > >>>> completes.This is not what I would like my app to do.
> > > > > >>>>
> > > > > >>>> Any ideas?
> > > > > >>>>
> > > > > >>>> Thanks,
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -Praveen
> > >
> >
>

Re: Run a job async

Posted by Jonathan Coveney <jc...@gmail.com>.
Well, when I say that Pig is not multi-threaded, what I mean is that if you
have multiple threads that run a query via PigServer, there is a great
chance of the internals clashing because of the use of static variables
within Pig. Pig itself, when running a single query, is multi-threaded.
It's just not "multi-threaded" in the sense that multiple instances can
safely be run in the same JVM.
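
To make the kind of clash concrete, here is a toy sketch. It is not Pig code. StaticStateClash and currentScript are made-up names standing in for any static field that a library sets per query, and the latches only exist to force the unlucky interleaving every time.

```java
import java.util.concurrent.CountDownLatch;

class StaticStateClash {

    // One copy for the whole JVM, shared by every thread that runs a query.
    static String currentScript;

    /** Returns the script name thread A reads back after thread B has started its own query. */
    static String observedByA() throws InterruptedException {
        CountDownLatch aHasWritten = new CountDownLatch(1);
        CountDownLatch bHasWritten = new CountDownLatch(1);
        String[] seen = new String[1];

        Thread a = new Thread(() -> {
            currentScript = "a.pig";
            aHasWritten.countDown();
            try {
                bHasWritten.await(); // B overwrites the static field in the meantime
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            seen[0] = currentScript; // A now reads B's value
        });
        Thread b = new Thread(() -> {
            try {
                aHasWritten.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            currentScript = "b.pig";
            bHasWritten.countDown();
        });

        a.start();
        b.start();
        a.join();
        b.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("thread A sees: " + observedByA()); // prints "thread A sees: b.pig"
    }
}
```

Thread A set the field to a.pig and then reads back b.pig, which is the sort of surprise two queries in the same JVM can run into when they share static state.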


2013/1/24 Ramakrishna Nalam <nr...@gmail.com>

> Hi Jonathan,
>
> Pardon if it's a naive question, but Interesting that you say Pig is not
> multithreaded.
> We're using Pig 0.10.0, and looking at the code, it seems to do the right
> things to handle multi threaded requests (ThreadLocal for ScriptState for
> eg).
>
> Would be great if you can point out to the kind of issues there could be.
>
>
> Regards,
> Rama.
>
>
>
> On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <le...@gmail.com>
> wrote:
>
> > Are there any plans on making the pigserver multi-threaded?
> >
> > since there is "PigProcessNotificationListener" to subscribe for async
> > callbacks when the pig job completes, is there any real need to keep the
> > pig job submitting thread waiting until the job completes?
> >
> > Is this just a shortcoming today or are there more concrete reasons
> against
> > providing with a pigserver which can submit to the cluster in mapreduce
> > mode async?
> >
> > Thanks,
> > Praveen
> >
> >
> >
> > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcoveney@gmail.com
> > >wrote:
> >
> > > I think whatever way you slice it, handling thousands of pig jobs
> > > asynchronously is going to be a bear. I mean, this is essentially what
> > the
> > > job tracker does, albeit with a lot less information.
> > >
> > > Either way, Pig is not multi-threaded so having more than one instance
> of
> > > Pig in the same JVM is going to start causing problems (which is why, I
> > > imagine, there is no async way to call Pig). So multiple processes is
> > > really the only way around it that I know of.
> > >
> > > At Twitter we have a deployment of mesos, and our long term solution is
> > > going to be running all of our pig jobs on mesos, in the short term by
> > > deploying daemons that run pig jobs as local processes.
> > >
> > >
> > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > >
> > > > Both. Think of it as an app server handling all of these requests.
> > > >
> > > > Sent from my iPhone
> > > >
> > > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com>
> > > wrote:
> > > >
> > > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > > >
> > > > >
> > > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > >
> > > > >> Did not want to have several threads launched for this. We might
> > have
> > > > >> thousands of requests coming in, and the app is doing a lot more
> > than
> > > > only
> > > > >> Pig.
> > > > >>
> > > > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <
> > jcoveney@gmail.com
> > > > >>> wrote:
> > > > >>
> > > > >>> start a separate Process which runs Pig?
> > > > >>>
> > > > >>>
> > > > >>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > > >>>
> > > > >>>> Hey guys,
> > > > >>>>
> > > > >>>> I am trying to do the following:
> > > > >>>>
> > > > >>>>   1. Launch a pig job asynchronously via Java program
> > > > >>>>   2. Get a notification once the job is complete (something
> > similar
> > > to
> > > > >>>>   Hadoop callback with a servlet)
> > > > >>>>
> > > > >>>> I looked at PigServer.executeBatch() and it seems to be waiting
> > > until
> > > > >> job
> > > > >>>> completes.This is not what I would like my app to do.
> > > > >>>>
> > > > >>>> Any ideas?
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> >
> >
> > --
> > -Praveen
> >
>

Re: Run a job async

Posted by Ramakrishna Nalam <nr...@gmail.com>.
Hi Jonathan,

Pardon me if this is a naive question, but it's interesting that you say Pig
is not multi-threaded.
We're using Pig 0.10.0, and looking at the code, it seems to do the right
things to handle multi-threaded requests (a ThreadLocal for ScriptState, for
example).

It would be great if you could point out the kinds of issues there could be.
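
For reference, this is the general ThreadLocal pattern I have in mind, as a toy sketch rather than Pig's actual ScriptState code. ThreadLocalIsolation and currentScript are made-up names, and the latches just force both threads to write before either one reads.

```java
import java.util.concurrent.CountDownLatch;

class ThreadLocalIsolation {

    // Each thread gets its own independent slot; writes in one thread are invisible to others.
    static final ThreadLocal<String> currentScript = new ThreadLocal<>();

    /** Returns the script name thread A reads back after thread B has set its own value. */
    static String observedByA() throws InterruptedException {
        CountDownLatch aHasWritten = new CountDownLatch(1);
        CountDownLatch bHasWritten = new CountDownLatch(1);
        String[] seen = new String[1];

        Thread a = new Thread(() -> {
            currentScript.set("a.pig");
            aHasWritten.countDown();
            try {
                bHasWritten.await(); // wait until B has written its own value
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            seen[0] = currentScript.get(); // still A's own value
        });
        Thread b = new Thread(() -> {
            try {
                aHasWritten.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            currentScript.set("b.pig");
            bHasWritten.countDown();
        });

        a.start();
        b.start();
        a.join();
        b.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("thread A sees: " + observedByA()); // prints "thread A sees: a.pig"
    }
}
```

State held this way survives a second thread running its own query, but it only protects the fields that actually use it. Any plain static field elsewhere would still be shared.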


Regards,
Rama.



On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <le...@gmail.com> wrote:

> Are there any plans on making the pigserver multi-threaded?
>
> since there is "PigProcessNotificationListener" to subscribe for async
> callbacks when the pig job completes, is there any real need to keep the
> pig job submitting thread waiting until the job completes?
>
> Is this just a shortcoming today or are there more concrete reasons against
> providing with a pigserver which can submit to the cluster in mapreduce
> mode async?
>
> Thanks,
> Praveen
>
>
>
> On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jcoveney@gmail.com
> >wrote:
>
> > I think whatever way you slice it, handling thousands of pig jobs
> > asynchronously is going to be a bear. I mean, this is essentially what
> the
> > job tracker does, albeit with a lot less information.
> >
> > Either way, Pig is not multi-threaded so having more than one instance of
> > Pig in the same JVM is going to start causing problems (which is why, I
> > imagine, there is no async way to call Pig). So multiple processes is
> > really the only way around it that I know of.
> >
> > At Twitter we have a deployment of mesos, and our long term solution is
> > going to be running all of our pig jobs on mesos, in the short term by
> > deploying daemons that run pig jobs as local processes.
> >
> >
> > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> >
> > > Both. Think of it as an app server handling all of these requests.
> > >
> > > Sent from my iPhone
> > >
> > > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com>
> > wrote:
> > >
> > > > Thousands of requests, or thousands of Pig jobs? Or both?
> > > >
> > > >
> > > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > >
> > > >> Did not want to have several threads launched for this. We might
> have
> > > >> thousands of requests coming in, and the app is doing a lot more
> than
> > > only
> > > >> Pig.
> > > >>
> > > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <
> jcoveney@gmail.com
> > > >>> wrote:
> > > >>
> > > >>> start a separate Process which runs Pig?
> > > >>>
> > > >>>
> > > >>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > > >>>
> > > >>>> Hey guys,
> > > >>>>
> > > >>>> I am trying to do the following:
> > > >>>>
> > > >>>>   1. Launch a pig job asynchronously via Java program
> > > >>>>   2. Get a notification once the job is complete (something
> similar
> > to
> > > >>>>   Hadoop callback with a servlet)
> > > >>>>
> > > >>>> I looked at PigServer.executeBatch() and it seems to be waiting
> > until
> > > >> job
> > > >>>> completes.This is not what I would like my app to do.
> > > >>>>
> > > >>>> Any ideas?
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>
> > > >>
> > >
> >
>
>
>
> --
> -Praveen
>

Re: Run a job async

Posted by Praveen M <le...@gmail.com>.
Are there any plans to make PigServer multi-threaded?

Since there is a "PigProgressNotificationListener" to subscribe to for async
callbacks when the Pig job completes, is there any real need to keep the
job-submitting thread waiting until the job completes?

Is this just a shortcoming today, or are there more concrete reasons against
providing a PigServer that can submit to the cluster asynchronously in
MapReduce mode?

Thanks,
Praveen



On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <jc...@gmail.com> wrote:

> I think whatever way you slice it, handling thousands of pig jobs
> asynchronously is going to be a bear. I mean, this is essentially what the
> job tracker does, albeit with a lot less information.
>
> Either way, Pig is not multi-threaded so having more than one instance of
> Pig in the same JVM is going to start causing problems (which is why, I
> imagine, there is no async way to call Pig). So multiple processes is
> really the only way around it that I know of.
>
> At Twitter we have a deployment of mesos, and our long term solution is
> going to be running all of our pig jobs on mesos, in the short term by
> deploying daemons that run pig jobs as local processes.
>
>
> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>
> > Both. Think of it as an app server handling all of these requests.
> >
> > Sent from my iPhone
> >
> > On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com>
> wrote:
> >
> > > Thousands of requests, or thousands of Pig jobs? Or both?
> > >
> > >
> > > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > >
> > >> Did not want to have several threads launched for this. We might have
> > >> thousands of requests coming in, and the app is doing a lot more than
> > only
> > >> Pig.
> > >>
> > >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcoveney@gmail.com
> > >>> wrote:
> > >>
> > >>> start a separate Process which runs Pig?
> > >>>
> > >>>
> > >>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> > >>>
> > >>>> Hey guys,
> > >>>>
> > >>>> I am trying to do the following:
> > >>>>
> > >>>>   1. Launch a pig job asynchronously via Java program
> > >>>>   2. Get a notification once the job is complete (something similar
> to
> > >>>>   Hadoop callback with a servlet)
> > >>>>
> > >>>> I looked at PigServer.executeBatch() and it seems to be waiting
> until
> > >> job
> > >>>> completes. This is not what I would like my app to do.
> > >>>>
> > >>>> Any ideas?
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>
> > >>
> >
>



-- 
-Praveen

Re: Run a job async

Posted by Jonathan Coveney <jc...@gmail.com>.
I think whichever way you slice it, handling thousands of Pig jobs
asynchronously is going to be a bear. I mean, this is essentially what the
job tracker does, albeit with a lot less information.

Either way, Pig is not multi-threaded, so having more than one instance of
Pig in the same JVM is going to start causing problems (which is why, I
imagine, there is no async way to call Pig). So multiple processes are
really the only way around it that I know of.

At Twitter we have a deployment of Mesos, and our long-term solution is
going to be running all of our Pig jobs on Mesos, in the short term by
deploying daemons that run Pig jobs as local processes.
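
A minimal sketch of the "multiple processes" approach, using only the JDK
(Java 9 or later for Process.onExit()). The class name AsyncPigLauncher, the
launch() helper, and the callback shape are hypothetical and not part of Pig.
The command list is whatever you would type on the console, for example
"pig", "-x", "local", "my_pig_script.pig". Output goes to a log file so that
no thread has to sit and drain a pipe.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical helper: runs a command in its own OS process and
// reports the exit code through a callback instead of blocking.
class AsyncPigLauncher {

    /**
     * Starts the command and returns immediately.
     * onDone is invoked with (command, exit code) after the process exits.
     */
    static CompletableFuture<Integer> launch(List<String> command,
                                             File logFile,
                                             BiConsumer<List<String>, Integer> onDone)
            throws IOException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)   // merge stderr into stdout
                .redirectOutput(logFile)     // write to a file so the pipe cannot fill up
                .start();

        // onExit() completes on a JDK-managed thread once the process terminates.
        return process.onExit()
                .thenApply(Process::exitValue)
                .whenComplete((exitCode, error) -> {
                    if (error == null) {
                        onDone.accept(command, exitCode);
                    }
                });
    }
}
```

This does not limit how many processes run at once. With thousands of
requests you would probably want a queue or a semaphore in front of launch(),
which is the part that starts to look like a scheduler.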


2013/1/23 Prashant Kommireddi <pr...@gmail.com>

> Both. Think of it as an app server handling all of these requests.
>
> Sent from my iPhone
>
> On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com> wrote:
>
> > Thousands of requests, or thousands of Pig jobs? Or both?
> >
> >
> > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> >
> >> Did not want to have several threads launched for this. We might have
> >> thousands of requests coming in, and the app is doing a lot more than
> only
> >> Pig.
> >>
> >> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcoveney@gmail.com
> >>> wrote:
> >>
> >>> start a separate Process which runs Pig?
> >>>
> >>>
> >>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> >>>
> >>>> Hey guys,
> >>>>
> >>>> I am trying to do the following:
> >>>>
> >>>>   1. Launch a pig job asynchronously via Java program
> >>>>   2. Get a notification once the job is complete (something similar to
> >>>>   Hadoop callback with a servlet)
> >>>>
> >>>> I looked at PigServer.executeBatch() and it seems to be waiting until
> >> job
> >>>> completes. This is not what I would like my app to do.
> >>>>
> >>>> Any ideas?
> >>>>
> >>>> Thanks,
> >>>>
> >>>
> >>
>

execute pig command in Java program

Posted by Dan Yi <dy...@medio.com>.
Hi, 

I used to write PHP code to execute a pig command, and it worked well.
Now I have switched to Java, but it doesn't seem to work. Here is my code:

String pigCommand = "pig -x local -p ouput=/tmp my_pig_script.pig";
		
		Runtime r = Runtime.getRuntime();
		Process p;
		
		int exitVal;

		try {
			p = r.exec(pigCommand);
			exitVal = p.waitFor();
			BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()));
			String line = null;
					
			while((line = br.readLine()) != null) {
				System.out.println(line);
			}
			br.close();

			System.out.println("exitVal: " + exitVal);
			System.out.println("Done");



If I run that pig command in the console directly, it works. If I replace
the pig command with another shell command, say 'ping www.yahoo.com', and
run the Java program, it works too. So what might be the problem?

Thanks.
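
A hedged sketch of one way to make the snippet above easier to debug. It
assumes, without confirmation from this thread, that the pig launcher writes
its messages to stderr, which getInputStream() does not show. It also reads
the output before calling waitFor(), because a child that fills its output
pipe can otherwise block forever. CommandRunner and Result are hypothetical
names; only JDK classes are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs a command and captures stdout and stderr together.
class CommandRunner {

    static final class Result {
        final int exitCode;
        final List<String> lines;

        Result(int exitCode, List<String> lines) {
            this.exitCode = exitCode;
            this.lines = lines;
        }
    }

    static Result run(String... command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true);   // stderr is delivered through getInputStream()
        Process process = builder.start();

        List<String> lines = new ArrayList<>();
        // Drain the output first; readLine() returns null when the child closes its end.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }

        // Only now wait for the exit code.
        int exitCode = process.waitFor();
        return new Result(exitCode, lines);
    }
}
```

Passing the arguments separately, as in run("pig", "-x", "local",
"my_pig_script.pig"), also avoids relying on how Runtime.exec(String) splits
a single command string on whitespace.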


Re: Run a job async

Posted by Prashant Kommireddi <pr...@gmail.com>.
Thanks Alan, this was helpful.

On Thu, Jan 24, 2013 at 9:46 AM, Alan Gates <ga...@hortonworks.com> wrote:

> You might want to look at webhcat's code.  It produces a servlet that it
> embeds in a jetty server.  You may be able to copy paste this to get what
> you want.
>
> The code of interest is in the hcat repository under webhcat/svr.
>
> Alan.
>
> On Jan 24, 2013, at 9:42 AM, Prashant Kommireddi wrote:
>
> > Thanks Alan. We are trying to plug Pig into our existing app server.
> > We have already done this for Java MR. The difficulty we are facing is
> > with the fact that we can use JobClient.submitJob and jobtracker's job
> > end notification to run jobs async, whereas PigServer.executeBatch
> > blocks until pig job is complete.
> >
> > Sent from my iPhone
> >
> > On Jan 24, 2013, at 9:31 AM, Alan Gates <ga...@hortonworks.com> wrote:
> >
> >> If you're looking for an app server for Pig I'd take a look at a couple
> of other projects already out there that can do this:
> >>
> >> 1) webhcat (fka Templeton, now part of the HCatalog project).  It
> provides a REST API that launches Pig, Hive, or MR jobs and allows you to
> manage them, get results, etc.  It's in HCatalog 0.5, which is in the
> release candidate state.  You can go to
> http://people.apache.org/~travis/hcatalog-0.5.0-incubating-candidate-1/ and pick up the release candidate.
> >>
> >> 2) Oozie.  Oozie's a workflow engine for Hadoop, but it also supports
> submission of single Pig or MR jobs via REST.  It may be a little
> heavyweight for what you want but it works.
> >>
> >> Alan.
> >>
> >> On Jan 23, 2013, at 9:22 PM, Prashant Kommireddi wrote:
> >>
> >>> Both. Think of it as an app server handling all of these requests.
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com>
> wrote:
> >>>
> >>>> Thousands of requests, or thousands of Pig jobs? Or both?
> >>>>
> >>>>
> >>>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> >>>>
> >>>>> Did not want to have several threads launched for this. We might have
> >>>>> thousands of requests coming in, and the app is doing a lot more
> than only
> >>>>> Pig.
> >>>>>
> >>>>> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <
> jcoveney@gmail.com
> >>>>>> wrote:
> >>>>>
> >>>>>> start a separate Process which runs Pig?
> >>>>>>
> >>>>>>
> >>>>>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> >>>>>>
> >>>>>>> Hey guys,
> >>>>>>>
> >>>>>>> I am trying to do the following:
> >>>>>>>
> >>>>>>> 1. Launch a pig job asynchronously via Java program
> >>>>>>> 2. Get a notification once the job is complete (something similar
> to
> >>>>>>> Hadoop callback with a servlet)
> >>>>>>>
> >>>>>>> I looked at PigServer.executeBatch() and it seems to be waiting
> until
> >>>>> job
> >>>>>>> completes. This is not what I would like my app to do.
> >>>>>>>
> >>>>>>> Any ideas?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>
> >>>>>
> >>
>
>

Re: Run a job async

Posted by Alan Gates <ga...@hortonworks.com>.
You might want to look at webhcat's code.  It produces a servlet that it embeds in a jetty server.  You may be able to copy paste this to get what you want.

The code of interest is in the hcat repository under webhcat/svr.

Alan.

On Jan 24, 2013, at 9:42 AM, Prashant Kommireddi wrote:

> Thanks Alan. We are trying to plug Pig into our existing app server.
> We have already done this for Java MR. The difficulty we are facing is
> with the fact that we can use JobClient.submitJob and jobtracker's job
> end notification to run jobs async, whereas PigServer.executeBatch
> blocks until pig job is complete.
> 
> Sent from my iPhone
> 
> On Jan 24, 2013, at 9:31 AM, Alan Gates <ga...@hortonworks.com> wrote:
> 
>> If you're looking for an app server for Pig I'd take a look at a couple of other projects already out there that can do this:
>> 
>> 1) webhcat (fka Templeton, now part of the HCatalog project).  It provides a REST API that launches Pig, Hive, or MR jobs and allows you to manage them, get results, etc.  It's in HCatalog 0.5, which is in the release candidate state.  You can go to http://people.apache.org/~travis/hcatalog-0.5.0-incubating-candidate-1/ and pick up the release candidate.
>> 
>> 2) Oozie.  Oozie's a workflow engine for Hadoop, but it also supports submission of single Pig or MR jobs via REST.  It may be a little heavyweight for what you want but it works.
>> 
>> Alan.
>> 
>> On Jan 23, 2013, at 9:22 PM, Prashant Kommireddi wrote:
>> 
>>> Both. Think of it as an app server handling all of these requests.
>>> 
>>> Sent from my iPhone
>>> 
>>> On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com> wrote:
>>> 
>>>> Thousands of requests, or thousands of Pig jobs? Or both?
>>>> 
>>>> 
>>>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>>>> 
>>>>> Did not want to have several threads launched for this. We might have
>>>>> thousands of requests coming in, and the app is doing a lot more than only
>>>>> Pig.
>>>>> 
>>>>> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcoveney@gmail.com
>>>>>> wrote:
>>>>> 
>>>>>> start a separate Process which runs Pig?
>>>>>> 
>>>>>> 
>>>>>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>>>>>> 
>>>>>>> Hey guys,
>>>>>>> 
>>>>>>> I am trying to do the following:
>>>>>>> 
>>>>>>> 1. Launch a pig job asynchronously via Java program
>>>>>>> 2. Get a notification once the job is complete (something similar to
>>>>>>> Hadoop callback with a servlet)
>>>>>>> 
>>>>>>> I looked at PigServer.executeBatch() and it seems to be waiting until
>>>>> job
>>>>>>> completes. This is not what I would like my app to do.
>>>>>>> 
>>>>>>> Any ideas?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>> 
>>>>> 
>> 


Re: Run a job async

Posted by Prashant Kommireddi <pr...@gmail.com>.
Thanks Alan. We are trying to plug Pig into our existing app server.
We have already done this for Java MR. The difficulty we are facing is
that with Java MR we can use JobClient.submitJob and the jobtracker's job
end notification to run jobs asynchronously, whereas PigServer.executeBatch
blocks until the Pig job is complete.
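
For reference, a rough sketch of the receiving side of such a job end
notification, written against the JDK's built-in HTTP server rather than a
servlet container. It assumes the notification URL is configured in Hadoop
along the lines of http://apphost:port/notify?jobId=$jobId&status=$jobStatus
(the job.end.notification.url property in MR1). The class name, the /notify
path, and the parameter names jobId and status are illustrative choices.
Whether MapReduce jobs launched by Pig honor that property has not been
verified here, and a single Pig script may launch several MapReduce jobs,
each of which would notify separately.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical receiver: records the last reported status for each job id.
class JobEndNotificationReceiver {

    final Map<String, String> statusByJobId = new ConcurrentHashMap<>();
    private final HttpServer server;

    JobEndNotificationReceiver(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/notify", exchange -> {
            String jobId = null;
            String status = null;
            String query = exchange.getRequestURI().getQuery();
            if (query != null) {
                for (String pair : query.split("&")) {
                    String[] kv = pair.split("=", 2);
                    if (kv.length != 2) {
                        continue;
                    }
                    if (kv[0].equals("jobId")) {
                        jobId = kv[1];
                    } else if (kv[0].equals("status")) {
                        status = kv[1];
                    }
                }
            }
            boolean ok = jobId != null && status != null;
            if (ok) {
                statusByJobId.put(jobId, status);
            }
            // -1 means the response carries no body.
            exchange.sendResponseHeaders(ok ? 200 : 400, -1);
            exchange.close();
        });
    }

    void start() {
        server.start();
    }

    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }
}
```

The app server thread that submitted the job never waits. It can look up
statusByJobId later, or the handler can be extended to trigger whatever
should happen when the job ends.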

Sent from my iPhone

On Jan 24, 2013, at 9:31 AM, Alan Gates <ga...@hortonworks.com> wrote:

> If you're looking for an app server for Pig I'd take a look at a couple of other projects already out there that can do this:
>
> 1) webhcat (fka Templeton, now part of the HCatalog project).  It provides a REST API that launches Pig, Hive, or MR jobs and allows you to manage them, get results, etc.  It's in HCatalog 0.5, which is in the release candidate state.  You can go to http://people.apache.org/~travis/hcatalog-0.5.0-incubating-candidate-1/ and pick up the release candidate.
>
> 2) Oozie.  Oozie's a workflow engine for Hadoop, but it also supports submission of single Pig or MR jobs via REST.  It may be a little heavyweight for what you want but it works.
>
> Alan.
>
> On Jan 23, 2013, at 9:22 PM, Prashant Kommireddi wrote:
>
>> Both. Think of it as an app server handling all of these requests.
>>
>> Sent from my iPhone
>>
>> On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com> wrote:
>>
>>> Thousands of requests, or thousands of Pig jobs? Or both?
>>>
>>>
>>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>>>
>>>> Did not want to have several threads launched for this. We might have
>>>> thousands of requests coming in, and the app is doing a lot more than only
>>>> Pig.
>>>>
>>>> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcoveney@gmail.com
>>>>> wrote:
>>>>
>>>>> start a separate Process which runs Pig?
>>>>>
>>>>>
>>>>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>>>>>
>>>>>> Hey guys,
>>>>>>
>>>>>> I am trying to do the following:
>>>>>>
>>>>>> 1. Launch a pig job asynchronously via Java program
>>>>>> 2. Get a notification once the job is complete (something similar to
>>>>>> Hadoop callback with a servlet)
>>>>>>
>>>>>> I looked at PigServer.executeBatch() and it seems to be waiting until
>>>> job
>>>>>> completes. This is not what I would like my app to do.
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>
>>>>
>

Re: Run a job async

Posted by Alan Gates <ga...@hortonworks.com>.
If you're looking for an app server for Pig I'd take a look at a couple of other projects already out there that can do this:

1) webhcat (fka Templeton, now part of the HCatalog project).  It provides a REST API that launches Pig, Hive, or MR jobs and allows you to manage them, get results, etc.  It's in HCatalog 0.5, which is in the release candidate state.  You can go to http://people.apache.org/~travis/hcatalog-0.5.0-incubating-candidate-1/ and pick up the release candidate.

2) Oozie.  Oozie's a workflow engine for Hadoop, but it also supports submission of single Pig or MR jobs via REST.  It may be a little heavyweight for what you want but it works.

Alan.

On Jan 23, 2013, at 9:22 PM, Prashant Kommireddi wrote:

> Both. Think of it as an app server handling all of these requests.
> 
> Sent from my iPhone
> 
> On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com> wrote:
> 
>> Thousands of requests, or thousands of Pig jobs? Or both?
>> 
>> 
>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>> 
>>> Did not want to have several threads launched for this. We might have
>>> thousands of requests coming in, and the app is doing a lot more than only
>>> Pig.
>>> 
>>> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcoveney@gmail.com
>>>> wrote:
>>> 
>>>> start a separate Process which runs Pig?
>>>> 
>>>> 
>>>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>>>> 
>>>>> Hey guys,
>>>>> 
>>>>> I am trying to do the following:
>>>>> 
>>>>>  1. Launch a pig job asynchronously via Java program
>>>>>  2. Get a notification once the job is complete (something similar to
>>>>>  Hadoop callback with a servlet)
>>>>> 
>>>>> I looked at PigServer.executeBatch() and it seems to be waiting until
>>> job
>>>>> completes. This is not what I would like my app to do.
>>>>> 
>>>>> Any ideas?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>> 
>>> 


Re: Run a job async

Posted by Prashant Kommireddi <pr...@gmail.com>.
Both. Think of it as an app server handling all of these requests.

Sent from my iPhone

On Jan 23, 2013, at 9:09 PM, Jonathan Coveney <jc...@gmail.com> wrote:

> Thousands of requests, or thousands of Pig jobs? Or both?
>
>
> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>
>> Did not want to have several threads launched for this. We might have
>> thousands of requests coming in, and the app is doing a lot more than only
>> Pig.
>>
>> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcoveney@gmail.com
>>> wrote:
>>
>>> start a separate Process which runs Pig?
>>>
>>>
>>> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>>>
>>>> Hey guys,
>>>>
>>>> I am trying to do the following:
>>>>
>>>>   1. Launch a pig job asynchronously via Java program
>>>>   2. Get a notification once the job is complete (something similar to
>>>>   Hadoop callback with a servlet)
>>>>
>>>> I looked at PigServer.executeBatch() and it seems to be waiting until
>> job
>>>> completes. This is not what I would like my app to do.
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks,
>>>>
>>>
>>

Re: Run a job async

Posted by Jonathan Coveney <jc...@gmail.com>.
Thousands of requests, or thousands of Pig jobs? Or both?


2013/1/23 Prashant Kommireddi <pr...@gmail.com>

> Did not want to have several threads launched for this. We might have
> thousands of requests coming in, and the app is doing a lot more than only
> Pig.
>
> On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jcoveney@gmail.com
> >wrote:
>
> > start a separate Process which runs Pig?
> >
> >
> > 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
> >
> > > Hey guys,
> > >
> > > I am trying to do the following:
> > >
> > >    1. Launch a pig job asynchronously via Java program
> > >    2. Get a notification once the job is complete (something similar to
> > >    Hadoop callback with a servlet)
> > >
> > > I looked at PigServer.executeBatch() and it seems to be waiting until
> job
> > > completes. This is not what I would like my app to do.
> > >
> > > Any ideas?
> > >
> > > Thanks,
> > >
> >
>

Re: Run a job async

Posted by Prashant Kommireddi <pr...@gmail.com>.
Did not want to have several threads launched for this. We might have
thousands of requests coming in, and the app is doing a lot more than only
Pig.

On Wed, Jan 23, 2013 at 5:44 PM, Jonathan Coveney <jc...@gmail.com> wrote:

> start a separate Process which runs Pig?
>
>
> 2013/1/23 Prashant Kommireddi <pr...@gmail.com>
>
> > Hey guys,
> >
> > I am trying to do the following:
> >
> >    1. Launch a pig job asynchronously via Java program
> >    2. Get a notification once the job is complete (something similar to
> >    Hadoop callback with a servlet)
> >
> > I looked at PigServer.executeBatch() and it seems to be waiting until job
> > completes. This is not what I would like my app to do.
> >
> > Any ideas?
> >
> > Thanks,
> >
>

Re: Run a job async

Posted by Jonathan Coveney <jc...@gmail.com>.
start a separate Process which runs Pig?


2013/1/23 Prashant Kommireddi <pr...@gmail.com>

> Hey guys,
>
> I am trying to do the following:
>
>    1. Launch a pig job asynchronously via Java program
>    2. Get a notification once the job is complete (something similar to
>    Hadoop callback with a servlet)
>
> I looked at PigServer.executeBatch() and it seems to be waiting until job
> completes. This is not what I would like my app to do.
>
> Any ideas?
>
> Thanks,
>

Re: Run a job async

Posted by Bill Graham <bi...@gmail.com>.
You can create an instance of PigProgressNotificationListener that calls
back when the job finishes.


On Wed, Jan 23, 2013 at 4:48 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

> Hey guys,
>
> I am trying to do the following:
>
>    1. Launch a pig job asynchronously via Java program
>    2. Get a notification once the job is complete (something similar to
>    Hadoop callback with a servlet)
>
> I looked at PigServer.executeBatch() and it seems to be waiting until job
> completes. This is not what I would like my app to do.
>
> Any ideas?
>
> Thanks,
>



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgraham@gmail.com going forward.*