Posted to common-user@hadoop.apache.org by Arindam Choudhury <ar...@gmail.com> on 2012/04/20 14:07:38 UTC

remote job submission

Hi,

Does Hadoop have a web service or other interface so that I can submit
jobs from a remote machine?

Thanks,
Arindam

Re: remote job submission

Posted by Harsh J <ha...@cloudera.com>.
By "previous files" I meant the job-related files mentioned earlier
(job jar, job xml, etc.). DataNodes are persistent members of HDFS:
removal of a DN results in a loss of blocks. Usually replication
handles DN failures flawlessly, but consider a cluster with a
replication factor of 1 - there, DN downtime cannot be tolerated.

Writes to HDFS are done by writing blocks directly to the DataNodes,
so a JobClient does need access to them in order to write its
job-related files to HDFS.

On Sat, Apr 21, 2012 at 8:33 PM, JAX <ja...@gmail.com> wrote:
> Thanks j harsh:
> I have another question , though ---
>
> You mentioned that :
>
> The client needs access to
> " the
> DataNodes (for actually writing the previous files to DFS for the
> JobTracker to pick up)"
>
> What do you mean by previous files? It seems like, if designing Hadoop from scratch , I wouldn't want to force the client to communicate with data nodes at all, since those can be added and removed during a job.
>
> Jay Vyas
> MMSB
> UCHC
>
> On Apr 21, 2012, at 1:14 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> the
>> DataNodes (for actually writing the previous files to DFS for the
>> JobTracker to pick up)



-- 
Harsh J

Re: remote job submission

Posted by JAX <ja...@gmail.com>.
Thanks, Harsh:
I have another question, though ---

You mentioned that:

The client needs access to
"the
DataNodes (for actually writing the previous files to DFS for the
JobTracker to pick up)"

What do you mean by previous files? It seems like, if designing Hadoop from scratch, I wouldn't want to force the client to communicate with DataNodes at all, since those can be added and removed during a job.

Jay Vyas 
MMSB
UCHC

On Apr 21, 2012, at 1:14 AM, Harsh J <ha...@cloudera.com> wrote:

> the
> DataNodes (for actually writing the previous files to DFS for the
> JobTracker to pick up)

Re: remote job submission

Posted by Harsh J <ha...@cloudera.com>.
Hi,

A JobClient is something that facilitates validating your job
configuration, shipping necessities to the cluster, and notifying the
JobTracker of the new job. Afterwards, its responsibility is merely to
monitor progress via reports from the
JobTracker (MR1) / ApplicationMaster (MR2).

A client need not concern itself with, nor be aware of, TaskTrackers
(or NodeManagers). These are non-permanent members of a cluster and do
not carry (critical) persistent state. The scheduling of a job and its
tasks is handled by the JobTracker in MR1 (or the MR application's
ApplicationMaster in MR2). The only things a user running a JobClient
needs to ensure are access to the NameNode (for creating staging
files - job jar, job xml, etc.), the DataNodes (for actually writing
those staging files to DFS for the JobTracker to pick up), and the
JobTracker/Scheduler (for the protocol communication required to
notify the cluster of a job whose resources are now ready to launch -
and also to monitor progress).
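The three access requirements above can be pre-checked from the
submitting machine before a job is even attempted. Below is a minimal
pre-flight connectivity sketch; the hostnames are placeholders, and the
ports (NameNode 8020, JobTracker 8021, DataNode 50010) are common
MR1-era defaults, not universal - check your cluster's configuration:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // DNS failure, refusal, or timeout all count as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder hosts and ports; substitute your cluster's addresses.
        System.out.println("namenode:   " + reachable("namenode.example.com", 8020, 2000));
        System.out.println("jobtracker: " + reachable("jobtracker.example.com", 8021, 2000));
        System.out.println("datanode:   " + reachable("datanode1.example.com", 50010, 2000));
    }
}
```

If any of the three prints false, remote submission from that machine
will fail at the corresponding step (staging, block writes, or job
notification).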

On Sat, Apr 21, 2012 at 5:36 AM, JAX <ja...@gmail.com> wrote:
> RE anirunds question on "how to submit a job remotely".
>
> Here are my follow up questions - hope this helps to guide the discussion:
>
> 1) Normally - what is the "job client"? Do you guys typically use the namenode as the client?
>
> 2) In the case where the client != name node ---- how does the client know how to start up the task trackers ?
>
> UCHC
>
> On Apr 20, 2012, at 11:19 AM, Amith D K <am...@huawei.com> wrote:
>
>> I dont know your use case if its for test and
>> ssh across the machine are disabled then u write a script that can do ssh run the jobs using cli for running your jobs. U can check ssh usage.
>>
>> Or else use Ooze
>> ________________________________________
>> From: Robert Evans [evans@yahoo-inc.com]
>> Sent: Friday, April 20, 2012 11:17 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: remote job submission
>>
>> You can use Oozie to do it.
>>
>>
>> On 4/20/12 8:45 AM, "Arindam Choudhury" <ar...@gmail.com> wrote:
>>
>> Sorry. But I can you give me a example.
>>
>> On Fri, Apr 20, 2012 at 3:08 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> Arindam,
>>>
>>> If your machine can access the clusters' NN/JT/DN ports, then you can
>>> simply run your job from the machine itself.
>>>
>>> On Fri, Apr 20, 2012 at 6:31 PM, Arindam Choudhury
>>> <ar...@gmail.com> wrote:
>>>> "If you are allowed a remote connection to the cluster's service ports,
>>>> then you can directly submit your jobs from your local CLI. Just make
>>>> sure your local configuration points to the right locations."
>>>>
>>>> Can you elaborate in details please?
>>>>
>>>> On Fri, Apr 20, 2012 at 2:20 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>>> If you are allowed a remote connection to the cluster's service ports,
>>>>> then you can directly submit your jobs from your local CLI. Just make
>>>>> sure your local configuration points to the right locations.
>>>>>
>>>>> Otherwise, perhaps you can choose to use Apache Oozie (Incubating)
>>>>> (http://incubator.apache.org/oozie/) It does provide a REST interface
>>>>> that launches jobs up for you over the supplied clusters, but its more
>>>>> oriented towards workflow management or perhaps HUE:
>>>>> https://github.com/cloudera/hue
>>>>>
>>>>> On Fri, Apr 20, 2012 at 5:37 PM, Arindam Choudhury
>>>>> <ar...@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Do hadoop have any web service or other interface so I can submit jobs
>>>>> from
>>>>>> remote machine?
>>>>>>
>>>>>> Thanks,
>>>>>> Arindam
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>



-- 
Harsh J

Re: remote job submission

Posted by JAX <ja...@gmail.com>.
RE Arindam's question on "how to submit a job remotely".

Here are my follow-up questions - hope this helps to guide the discussion:

1) Normally - what is the "job client"? Do you typically use the NameNode host as the client?

2) In the case where the client != NameNode ---- how does the client know how to start up the TaskTrackers?

UCHC

On Apr 20, 2012, at 11:19 AM, Amith D K <am...@huawei.com> wrote:

> I dont know your use case if its for test and
> ssh across the machine are disabled then u write a script that can do ssh run the jobs using cli for running your jobs. U can check ssh usage.
> 
> Or else use Ooze
> ________________________________________
> From: Robert Evans [evans@yahoo-inc.com]
> Sent: Friday, April 20, 2012 11:17 PM
> To: common-user@hadoop.apache.org
> Subject: Re: remote job submission
> 
> You can use Oozie to do it.
> 
> 
> On 4/20/12 8:45 AM, "Arindam Choudhury" <ar...@gmail.com> wrote:
> 
> Sorry. But I can you give me a example.
> 
> On Fri, Apr 20, 2012 at 3:08 PM, Harsh J <ha...@cloudera.com> wrote:
> 
>> Arindam,
>> 
>> If your machine can access the clusters' NN/JT/DN ports, then you can
>> simply run your job from the machine itself.
>> 
>> On Fri, Apr 20, 2012 at 6:31 PM, Arindam Choudhury
>> <ar...@gmail.com> wrote:
>>> "If you are allowed a remote connection to the cluster's service ports,
>>> then you can directly submit your jobs from your local CLI. Just make
>>> sure your local configuration points to the right locations."
>>> 
>>> Can you elaborate in details please?
>>> 
>>> On Fri, Apr 20, 2012 at 2:20 PM, Harsh J <ha...@cloudera.com> wrote:
>>> 
>>>> If you are allowed a remote connection to the cluster's service ports,
>>>> then you can directly submit your jobs from your local CLI. Just make
>>>> sure your local configuration points to the right locations.
>>>> 
>>>> Otherwise, perhaps you can choose to use Apache Oozie (Incubating)
>>>> (http://incubator.apache.org/oozie/) It does provide a REST interface
>>>> that launches jobs up for you over the supplied clusters, but its more
>>>> oriented towards workflow management or perhaps HUE:
>>>> https://github.com/cloudera/hue
>>>> 
>>>> On Fri, Apr 20, 2012 at 5:37 PM, Arindam Choudhury
>>>> <ar...@gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> Do hadoop have any web service or other interface so I can submit jobs
>>>> from
>>>>> remote machine?
>>>>> 
>>>>> Thanks,
>>>>> Arindam
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Harsh J
>>>> 
>> 
>> 
>> 
>> --
>> Harsh J
>> 
> 

RE: remote job submission

Posted by Amith D K <am...@huawei.com>.
I don't know your use case, but if it is for testing and SSH across
the machines is available, you can write a script that uses SSH to
run the jobs via the CLI. You can check the ssh usage notes.

Or else use Oozie.
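The SSH approach sketched concretely: run the normal CLI submission on
a cluster-side gateway host over SSH. The gateway host and jar path
below are hypothetical; the command is printed here rather than
executed:

```java
import java.util.Arrays;
import java.util.List;

public class SshSubmit {
    // Builds the ssh invocation that runs a normal CLI job submission
    // on a cluster-side gateway host.
    static List<String> buildCommand(String gateway, String jar,
                                     String mainClass, String in, String out) {
        return Arrays.asList("ssh", gateway,
                "hadoop jar " + jar + " " + mainClass + " " + in + " " + out);
    }

    public static void main(String[] args) {
        // Hypothetical gateway and jar path - adjust to your environment.
        List<String> cmd = buildCommand("user@hadoop-gateway",
                "/opt/jobs/wordcount.jar", "WordCount", "/input", "/output");
        // Printed rather than executed; to actually run it, use:
        //   new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println(String.join(" ", cmd));
    }
}
```

This requires key-based SSH auth to the gateway; the job then runs as
a normal local submission from a host that already has cluster access.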
________________________________________
From: Robert Evans [evans@yahoo-inc.com]
Sent: Friday, April 20, 2012 11:17 PM
To: common-user@hadoop.apache.org
Subject: Re: remote job submission

You can use Oozie to do it.


On 4/20/12 8:45 AM, "Arindam Choudhury" <ar...@gmail.com> wrote:

Sorry. But I can you give me a example.

On Fri, Apr 20, 2012 at 3:08 PM, Harsh J <ha...@cloudera.com> wrote:

> Arindam,
>
> If your machine can access the clusters' NN/JT/DN ports, then you can
> simply run your job from the machine itself.
>
> On Fri, Apr 20, 2012 at 6:31 PM, Arindam Choudhury
> <ar...@gmail.com> wrote:
> > "If you are allowed a remote connection to the cluster's service ports,
> > then you can directly submit your jobs from your local CLI. Just make
> > sure your local configuration points to the right locations."
> >
> > Can you elaborate in details please?
> >
> > On Fri, Apr 20, 2012 at 2:20 PM, Harsh J <ha...@cloudera.com> wrote:
> >
> >> If you are allowed a remote connection to the cluster's service ports,
> >> then you can directly submit your jobs from your local CLI. Just make
> >> sure your local configuration points to the right locations.
> >>
> >> Otherwise, perhaps you can choose to use Apache Oozie (Incubating)
> >> (http://incubator.apache.org/oozie/) It does provide a REST interface
> >> that launches jobs up for you over the supplied clusters, but its more
> >> oriented towards workflow management or perhaps HUE:
> >> https://github.com/cloudera/hue
> >>
> >> On Fri, Apr 20, 2012 at 5:37 PM, Arindam Choudhury
> >> <ar...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > Do hadoop have any web service or other interface so I can submit jobs
> >> from
> >> > remote machine?
> >> >
> >> > Thanks,
> >> > Arindam
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
>
>
>
> --
> Harsh J
>


Re: remote job submission

Posted by Robert Evans <ev...@yahoo-inc.com>.
You can use Oozie to do it.
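For context on what "use Oozie" looks like in practice: Oozie exposes
job submission over HTTP. A hedged sketch of building its v1
submission endpoint - the hostname is a placeholder, and 11000 is
Oozie's usual default port, not a guarantee:

```java
public class OozieSubmit {
    // Builds the Oozie v1 REST job-submission URL.
    // The "?action=start" query parameter submits and starts the job in one call.
    static String submitUrl(String host, int port) {
        return "http://" + host + ":" + port + "/oozie/v1/jobs?action=start";
    }

    public static void main(String[] args) {
        // A client POSTs an XML job configuration (Content-Type: application/xml)
        // naming the workflow to this URL, e.g. via curl or HttpURLConnection.
        System.out.println(submitUrl("oozie.example.com", 11000));
    }
}
```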


On 4/20/12 8:45 AM, "Arindam Choudhury" <ar...@gmail.com> wrote:

Sorry. But I can you give me a example.

On Fri, Apr 20, 2012 at 3:08 PM, Harsh J <ha...@cloudera.com> wrote:

> Arindam,
>
> If your machine can access the clusters' NN/JT/DN ports, then you can
> simply run your job from the machine itself.
>
> On Fri, Apr 20, 2012 at 6:31 PM, Arindam Choudhury
> <ar...@gmail.com> wrote:
> > "If you are allowed a remote connection to the cluster's service ports,
> > then you can directly submit your jobs from your local CLI. Just make
> > sure your local configuration points to the right locations."
> >
> > Can you elaborate in details please?
> >
> > On Fri, Apr 20, 2012 at 2:20 PM, Harsh J <ha...@cloudera.com> wrote:
> >
> >> If you are allowed a remote connection to the cluster's service ports,
> >> then you can directly submit your jobs from your local CLI. Just make
> >> sure your local configuration points to the right locations.
> >>
> >> Otherwise, perhaps you can choose to use Apache Oozie (Incubating)
> >> (http://incubator.apache.org/oozie/) It does provide a REST interface
> >> that launches jobs up for you over the supplied clusters, but its more
> >> oriented towards workflow management or perhaps HUE:
> >> https://github.com/cloudera/hue
> >>
> >> On Fri, Apr 20, 2012 at 5:37 PM, Arindam Choudhury
> >> <ar...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > Do hadoop have any web service or other interface so I can submit jobs
> >> from
> >> > remote machine?
> >> >
> >> > Thanks,
> >> > Arindam
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
>
>
>
> --
> Harsh J
>


Re: remote job submission

Posted by Arindam Choudhury <ar...@gmail.com>.
Sorry, but can you give me an example?

On Fri, Apr 20, 2012 at 3:08 PM, Harsh J <ha...@cloudera.com> wrote:

> Arindam,
>
> If your machine can access the clusters' NN/JT/DN ports, then you can
> simply run your job from the machine itself.
>
> On Fri, Apr 20, 2012 at 6:31 PM, Arindam Choudhury
> <ar...@gmail.com> wrote:
> > "If you are allowed a remote connection to the cluster's service ports,
> > then you can directly submit your jobs from your local CLI. Just make
> > sure your local configuration points to the right locations."
> >
> > Can you elaborate in details please?
> >
> > On Fri, Apr 20, 2012 at 2:20 PM, Harsh J <ha...@cloudera.com> wrote:
> >
> >> If you are allowed a remote connection to the cluster's service ports,
> >> then you can directly submit your jobs from your local CLI. Just make
> >> sure your local configuration points to the right locations.
> >>
> >> Otherwise, perhaps you can choose to use Apache Oozie (Incubating)
> >> (http://incubator.apache.org/oozie/) It does provide a REST interface
> >> that launches jobs up for you over the supplied clusters, but its more
> >> oriented towards workflow management or perhaps HUE:
> >> https://github.com/cloudera/hue
> >>
> >> On Fri, Apr 20, 2012 at 5:37 PM, Arindam Choudhury
> >> <ar...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > Do hadoop have any web service or other interface so I can submit jobs
> >> from
> >> > remote machine?
> >> >
> >> > Thanks,
> >> > Arindam
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
>
>
>
> --
> Harsh J
>

Re: remote job submission

Posted by Harsh J <ha...@cloudera.com>.
Arindam,

If your machine can access the cluster's NN/JT/DN ports, then you can
simply run your job from the machine itself.

On Fri, Apr 20, 2012 at 6:31 PM, Arindam Choudhury
<ar...@gmail.com> wrote:
> "If you are allowed a remote connection to the cluster's service ports,
> then you can directly submit your jobs from your local CLI. Just make
> sure your local configuration points to the right locations."
>
> Can you elaborate in details please?
>
> On Fri, Apr 20, 2012 at 2:20 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> If you are allowed a remote connection to the cluster's service ports,
>> then you can directly submit your jobs from your local CLI. Just make
>> sure your local configuration points to the right locations.
>>
>> Otherwise, perhaps you can choose to use Apache Oozie (Incubating)
>> (http://incubator.apache.org/oozie/) It does provide a REST interface
>> that launches jobs up for you over the supplied clusters, but its more
>> oriented towards workflow management or perhaps HUE:
>> https://github.com/cloudera/hue
>>
>> On Fri, Apr 20, 2012 at 5:37 PM, Arindam Choudhury
>> <ar...@gmail.com> wrote:
>> > Hi,
>> >
>> > Do hadoop have any web service or other interface so I can submit jobs
>> from
>> > remote machine?
>> >
>> > Thanks,
>> > Arindam
>>
>>
>>
>> --
>> Harsh J
>>



-- 
Harsh J

Re: remote job submission

Posted by Arindam Choudhury <ar...@gmail.com>.
"If you are allowed a remote connection to the cluster's service ports,
then you can directly submit your jobs from your local CLI. Just make
sure your local configuration points to the right locations."

Can you elaborate in detail, please?

On Fri, Apr 20, 2012 at 2:20 PM, Harsh J <ha...@cloudera.com> wrote:

> If you are allowed a remote connection to the cluster's service ports,
> then you can directly submit your jobs from your local CLI. Just make
> sure your local configuration points to the right locations.
>
> Otherwise, perhaps you can choose to use Apache Oozie (Incubating)
> (http://incubator.apache.org/oozie/) It does provide a REST interface
> that launches jobs up for you over the supplied clusters, but its more
> oriented towards workflow management or perhaps HUE:
> https://github.com/cloudera/hue
>
> On Fri, Apr 20, 2012 at 5:37 PM, Arindam Choudhury
> <ar...@gmail.com> wrote:
> > Hi,
> >
> > Do hadoop have any web service or other interface so I can submit jobs
> from
> > remote machine?
> >
> > Thanks,
> > Arindam
>
>
>
> --
> Harsh J
>

Re: remote job submission

Posted by Harsh J <ha...@cloudera.com>.
If you are allowed a remote connection to the cluster's service ports,
then you can directly submit your jobs from your local CLI. Just make
sure your local configuration points to the right locations.

Otherwise, perhaps you can choose to use Apache Oozie (Incubating)
(http://incubator.apache.org/oozie/). It provides a REST interface
that launches jobs for you on the supplied clusters, though it is more
oriented towards workflow management. Or perhaps HUE:
https://github.com/cloudera/hue
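Concretely, "local configuration points to the right locations" means
the Hadoop config files on the submitting machine name the cluster's
NameNode and JobTracker instead of localhost. A sketch with
placeholder hostnames (MR1-era property names; the ports are common
defaults, not universal):

```xml
<!-- core-site.xml on the submitting machine -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>

<!-- mapred-site.xml on the submitting machine -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>
```

With these in place, a plain "hadoop jar myjob.jar ..." on the local
machine stages files to the remote HDFS and notifies the remote
JobTracker, exactly as it would on a cluster node.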

On Fri, Apr 20, 2012 at 5:37 PM, Arindam Choudhury
<ar...@gmail.com> wrote:
> Hi,
>
> Do hadoop have any web service or other interface so I can submit jobs from
> remote machine?
>
> Thanks,
> Arindam



-- 
Harsh J