Posted to dev@pig.apache.org by 李运田 <cu...@163.com> on 2015/04/28 10:51:36 UTC

many users can use one pigserver?

I have a big web ETL system that uses Pig as its ETL tool. When I start Tomcat, I also start one PigServer, and all users share that single PigServer. When these users use the system, could this cause unknown errors?

RE: many users can use one pigserver?

Posted by re...@orange.com.
I use Pig with several PigServers inside a multi-threaded, long-lived daemon process. I allocate a new PigServer dedicated to each single-threaded job and shut it down after the job completes. The daemons spend their time processing jobs, so I think my setup is quite close to your use case of Pig.
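The one-PigServer-per-job pattern described above can be sketched roughly as follows. To keep the sketch runnable without Pig on the classpath, `JobEngine` is a hypothetical stand-in: in real code the factory would construct a `PigServer` and `close()` would call `PigServer.shutdown()`. `PerJobRunner` and its methods are illustrative names, not Pig APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for PigServer so the sketch runs without Pig on the
// classpath. In real code, run() would register and execute Pig Latin, and
// close() would call PigServer.shutdown().
interface JobEngine extends AutoCloseable {
    String run(String script);

    @Override
    void close(); // narrowed so try-with-resources needs no checked-exception handling
}

// Runs each job on a pooled worker thread with its own, freshly created engine.
final class PerJobRunner {
    private final ExecutorService pool;
    private final Supplier<JobEngine> engineFactory;

    PerJobRunner(int threads, Supplier<JobEngine> engineFactory) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.engineFactory = engineFactory;
    }

    // The engine never escapes the task, so no two threads share it, and
    // try-with-resources shuts it down even if the job throws.
    Future<String> submit(String script) {
        return pool.submit(() -> {
            try (JobEngine engine = engineFactory.get()) {
                return engine.run(script);
            }
        });
    }

    // Stops accepting jobs and waits (bounded) for running ones to finish.
    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The design choice is that engine lifetime equals job lifetime: nothing stateful outlives a job, at the cost of paying engine start-up once per job.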

There are still some thread-safety bugs due to static variables that are not protected with ThreadLocal. (I've patched SchemaTupleFrontend and SchemaTupleBackend to fix this; the bug appears even if you don't enable the schema code generation feature.)

SchemaTupleFrontend and SchemaTupleBackend also "leak" files: they create temporary files and folders with delete-on-exit set, but when Pig runs inside daemons that run for ages, those files are never deleted. I say "leak" because this approach is fine when Pig is used in batch mode (the JVM starts, Pig runs a few script jobs, the JVM shuts down). With long-lived daemons, the JVM never shuts down, so delete-on-exit is never triggered. Because of this I ended up with inode saturation in my /tmp folder. Again, I have a patch for it, but I haven't submitted it to the community yet: I've had a hard time setting up a proper Pig development environment, so I cannot be sure the patch doesn't break something elsewhere.
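`File.deleteOnExit()` only deletes files when the JVM terminates normally, so in a daemon that never exits the registered files pile up. One general alternative, sketched here assuming you control where a job's temporary files are created (inside Pig that means patching the classes, as the poster did; this is not the poster's patch or Pig's code), is to give each job its own scratch directory and delete it when the job ends. `JobScratchDir` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Per-job scratch directory that is removed when the job finishes, instead of
// relying on deleteOnExit(), which only fires at JVM shutdown.
final class JobScratchDir implements AutoCloseable {
    private final Path dir;

    JobScratchDir(String prefix) throws IOException {
        this.dir = Files.createTempDirectory(prefix);
    }

    Path path() {
        return dir;
    }

    // Deletes the whole tree; reverse path order visits children before parents.
    @Override
    public void close() throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // rethrow the original IOException
        }
    }
}
```

Usage would be `try (JobScratchDir scratch = new JobScratchDir("pigjob-")) { ... }` around each job, so the files are freed when the job ends rather than when the JVM exits.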

In conclusion, running many threads in long-lived daemons triggers two problems with the current Pig version: inode saturation and thread-safety issues. The inode saturation problem also exists in the JarManager class when dealing with Pig UDL auto-generated jar files (I have a patch for this one too). I will check JIRA to see whether these bugs have already been filed and propose my patches. Even if I cannot really test them with Pig's unit tests, they have been running in production on my side for months without a glitch; maybe that's enough testing to push them to the community :)
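The thread-safety fix mentioned above, replacing unprotected static variables with `ThreadLocal`, follows the general pattern below. `JobContextHolder` is a hypothetical class used only to show the technique; it is not `SchemaTupleFrontend` or any other Pig class. The `clear()` step matters in long-lived daemons because pooled threads are reused from one job to the next.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for per-job state; NOT Pig's SchemaTupleFrontend code.
// The unsafe variant would be a plain static field such as
//   private static List<String> generated = new ArrayList<>();
// which every thread in the JVM would read and modify concurrently.
final class JobContextHolder {
    // Each thread lazily gets its own list on first access.
    private static final ThreadLocal<List<String>> GENERATED =
            ThreadLocal.withInitial(ArrayList::new);

    private JobContextHolder() {}

    static void register(String className) {
        GENERATED.get().add(className);
    }

    // Returns a copy of what the current thread has registered.
    static List<String> registered() {
        return new ArrayList<>(GENERATED.get());
    }

    // Call at the end of every job: thread pools reuse threads, so without
    // remove() one job's state would leak into the next job on that thread.
    static void clear() {
        GENERATED.remove();
    }
}
```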

-----Original Message-----
From: Xuefu Zhang [mailto:xzhang@cloudera.com]
Sent: Tuesday, April 28, 2015 17:35
To: dev@pig.apache.org
Cc: user
Subject: Re: many users can use one pigserver?

PigServer has state, which isn't meant to be shared by multiple user sessions. On the other hand, PIG-1784 made PigServer thread-safe, so depending on your version, you may choose having multiple instances in your Tomcat, one for each user session.

On Tue, Apr 28, 2015 at 1:51 AM, 李运田 <cu...@163.com> wrote:

> I have a big web ETL system ,which use pig as ETL tool. when I start 
> tomcat , I also start one pigserver,and all users use the only 
> pigserver,perhaps when these users use this system ,there will be some 
> unknown errors ?


Re: many users can use one pigserver?

Posted by Xuefu Zhang <xz...@cloudera.com>.
PigServer has state, which isn't meant to be shared by multiple user
sessions. On the other hand, PIG-1784 made PigServer thread-safe, so
depending on your version, you may choose to have multiple instances in
your Tomcat, one for each user session.
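A minimal sketch of the one-instance-per-session approach, assuming a hypothetical `SessionEngine` stand-in for `PigServer` (with `close()` standing for `shutdown()`); `SessionEngineRegistry` is likewise an illustrative name. In Tomcat, `release` would typically be called from an `HttpSessionListener`'s `sessionDestroyed` callback.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for PigServer; close() stands for PigServer.shutdown().
interface SessionEngine extends AutoCloseable {
    @Override
    void close();
}

// Keeps one engine per user session instead of one engine shared by everyone.
final class SessionEngineRegistry {
    private final ConcurrentMap<String, SessionEngine> engines = new ConcurrentHashMap<>();
    private final Supplier<SessionEngine> factory;

    SessionEngineRegistry(Supplier<SessionEngine> factory) {
        this.factory = factory;
    }

    // Lazily creates the session's engine on first use. Concurrent requests from
    // the same session (e.g. two browser tabs) are serialized on that engine,
    // because it holds the session's state; different sessions run in parallel.
    <R> R withEngine(String sessionId, Function<SessionEngine, R> work) {
        SessionEngine engine = engines.computeIfAbsent(sessionId, id -> factory.get());
        synchronized (engine) {
            return work.apply(engine);
        }
    }

    // Call when the session ends. This sketch does not handle a late request
    // racing with session teardown.
    void release(String sessionId) {
        SessionEngine engine = engines.remove(sessionId);
        if (engine != null) {
            synchronized (engine) {
                engine.close();
            }
        }
    }

    int size() {
        return engines.size();
    }
}
```

Serializing per session is a conservative choice: whether a given Pig version's PigServer tolerates concurrent calls depends on the version, as noted above.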

On Tue, Apr 28, 2015 at 1:51 AM, 李运田 <cu...@163.com> wrote:

> I have a big web ETL system ,which use pig as ETL tool. when I start
> tomcat , I also start one pigserver,and all users use the only
> pigserver,perhaps when these users use this system ,there will be some
> unknown errors ?