Posted to common-user@hadoop.apache.org by Trinh Tuan Cuong <tr...@luvina.net> on 2008/09/24 10:50:33 UTC

Question about Hadoop's Feature(s)

Hi,

 

We are developing a project and we intend to use Hadoop to handle the processing of a vast amount of data. But to convince our customers of the use of Hadoop in our project, we must show them the advantages (and perhaps the disadvantages) of deploying the project with Hadoop compared to the Oracle Database Platform.

 

So I would like to have a full feature list of Hadoop, describing what features are integrated in the latest version of Hadoop (0.18.1), especially features related to manipulating databases (support facilitating users' use of a database) and perhaps some features about security and platform support that are not related to database manipulation.

 

P.S.: I googled and searched Yahoo for a Hadoop feature list for several days, but found no clues, or only a small set of features for HDFS, MapReduce, or stand-alone Hadoop on Demand. What I really want is the complete feature list of the latest version. Thanks in advance for any help or links.

 

Best Regards,

 

Trịnh Tuấn Cường

 

Luvina Software Company

Website : www.luvina.net

 

Address : 1001 Hoang Quoc Viet Street

Email : trinhtuancuong@luvina.net,niha9088@yahoo.com

Mobile : 097 4574 457

 


Re: Question about Hadoop's Feature(s)

Posted by Jason Rutherglen <ja...@gmail.com>.
> However, HDFS uses HTTP to serve blocks up; that needs to be locked down
> too. Would the signing work there?

I am not familiar with HDFS over HTTP.  Could it simply sign the
stream and include the signature at the end of the HTTP message
returned?
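A sketch of what that could look like, assuming a plain OutputStream on the serving side. The wrapper class below is hypothetical, not part of HDFS, and uses only JDK classes: every block byte written also updates an HMAC, and the 20-byte HmacSHA1 tag is appended when the stream closes, so the client can strip the trailing bytes and verify them against the body.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical wrapper: feeds every written byte into an HMAC, then
 *  appends the 20-byte HmacSHA1 tag when the stream is closed. */
public class SigningOutputStream extends OutputStream {
    private final OutputStream out;
    private final Mac mac;

    public SigningOutputStream(OutputStream out, byte[] sharedSecret) {
        this.out = out;
        try {
            this.mac = Mac.getInstance("HmacSHA1");
            this.mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override public void write(int b) throws IOException {
        mac.update((byte) b);
        out.write(b);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        mac.update(b, off, len);
        out.write(b, off, len);
    }

    @Override public void close() throws IOException {
        out.write(mac.doFinal());  // trailing 20-byte signature
        out.close();
    }
}
```

A client holding the same secret would read the body, recompute the HMAC, and compare it to the last 20 bytes of the response.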

On Tue, Sep 30, 2008 at 8:56 AM, Steve Loughran <st...@apache.org> wrote:
> Jason Rutherglen wrote:
>>
>> I implemented an RMI protocol using Hadoop IPC and implemented basic
>> HMAC signing. It is, I believe, faster than public/private key signing
>> because it uses a secret key and does not require public key
>> provisioning as PKI would. Perhaps it would be a baseline way to
>> sign the data.
>
> That should work for authenticating messages between (trusted) nodes.
> Presumably the ipc.key value could be set in the Conf and all would be well.
>
> External job submitters shouldn't be given those keys; they'd need an
> HTTP(S) front end that could authenticate them however the organisation
> worked.
>
> Yes, that would be simpler. I am not enough of a security expert to say if
> it will work, but the keys should be easier to work with. As long as the
> configuration files are kept secure, your cluster will be locked down.
>
> However, HDFS uses HTTP to serve blocks up; that needs to be locked down
> too. Would the signing work there?
>
> -steve
>

Re: Question about Hadoop's Feature(s)

Posted by Steve Loughran <st...@apache.org>.
Jason Rutherglen wrote:
> I implemented an RMI protocol using Hadoop IPC and implemented basic
> HMAC signing. It is, I believe, faster than public/private key signing
> because it uses a secret key and does not require public key
> provisioning as PKI would. Perhaps it would be a baseline way to
> sign the data.

That should work for authenticating messages between (trusted) nodes. 
Presumably the ipc.key value could be set in the Conf and all would be well.

External job submitters shouldn't be given those keys; they'd need an 
HTTP(S) front end that could authenticate them however the organisation 
worked.

Yes, that would be simpler. I am not enough of a security expert to say 
if it will work, but the keys should be easier to work with. As long as 
the configuration files are kept secure, your cluster will be locked down.

However, HDFS uses HTTP to serve blocks up; that needs to be locked down 
too. Would the signing work there?

-steve

Re: Question about Hadoop's Feature(s)

Posted by Jason Rutherglen <ja...@gmail.com>.
I implemented an RMI protocol using Hadoop IPC and implemented basic
HMAC signing. It is, I believe, faster than public/private key signing
because it uses a secret key and does not require public key
provisioning as PKI would. Perhaps it would be a baseline way to
sign the data.
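A minimal sketch of that shared-secret approach, using only the JDK's javax.crypto. The class and key names here are illustrative, not part of Hadoop; both ends would hold the same secret (for example via the hypothetical ipc.key configuration entry mentioned in this thread), so no PKI-style key provisioning is needed.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.MessageDigest;

/** Illustrative HMAC signer for serialized IPC payloads. */
public class IpcHmacSigner {
    private final SecretKeySpec key;

    public IpcHmacSigner(byte[] sharedSecret) {
        this.key = new SecretKeySpec(sharedSecret, "HmacSHA1");
    }

    /** Compute the 20-byte HmacSHA1 tag over a serialized request. */
    public byte[] sign(byte[] payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(key);
            return mac.doFinal(payload);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Recompute and compare; MessageDigest.isEqual avoids timing leaks. */
    public boolean verify(byte[] payload, byte[] tag) {
        return MessageDigest.isEqual(sign(payload), tag);
    }
}
```

The receiver recomputes the tag over the payload it received and rejects the request on any mismatch.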

On Thu, Sep 25, 2008 at 7:47 AM, Steve Loughran <st...@apache.org> wrote:
> Owen O'Malley wrote:
>>
>> On Sep 24, 2008, at 1:50 AM, Trinh Tuan Cuong wrote:
>>
>>> We are developing a project and we intend to use Hadoop to handle the
>>> processing of a vast amount of data. But to convince our customers of the
>>> use of Hadoop in our project, we must show them the advantages (and perhaps
>>> the disadvantages) of deploying the project with Hadoop compared to the
>>> Oracle Database Platform.
>>
>> The primary advantage of Hadoop is scalability. On an equivalent hardware
>> budget, Hadoop can handle much much larger databases. We had a process that
>> was run once a week on Oracle that is now run once an hour on Hadoop.
>> Additionally, Hadoop scales out much much farther. We can store petabytes of
>> data in a single Hadoop cluster and have jobs that read and generate 100's
>> of terabytes.
>
> That said, what a database gives you -on the right hardware- is very fast
> responses, especially if the indices are set up right and the data
> denormalised when appropriate. There is also really good integration with
> tools and application servers, with things like Java EE designed to make
> running code against a database easy.
>
> Not using Oracle means you don't have to work with an Oracle DBA, which, in
> my experience, can only be a good thing. DBAs and developers never seem to
> see eye-to-eye.
>
>
>>
>>  Hadoop only has very primitive security at the moment, although I expect
>> that to change in the next 6 months.
>>
>
> Right now you need to trust everyone else on the network where you run
> hadoop to not be malicious; the filesystem and job tracker interfaces are
> insecure. The forthcoming 0.19 release will ask who you are, but the far end
> trusts you to be who you say you are. In that respect, it's as secure as NFS
> over UDP.
>
> To secure Hadoop you'd probably need to
>  - sign every IPC request, with a CPU time cost at both ends;
>  - require some form of authentication for the HTTP-exported parts of the
> system, such as digest authentication, or issue lots of HTTPS private keys
> and use those instead, giving everyone a key management problem as well as
> extra communications overhead.
>
> What would be easier is to lock down remote access to the filesystem/job
> submission so that only authenticated users would be able to upload jobs and
> data. The cluster would continue to trust everything else on its network,
> but the system wouldn't trust people to submit work unless they could prove
> who they were.
>
>

Re: Question about Hadoop's Feature(s)

Posted by Steve Loughran <st...@apache.org>.
Owen O'Malley wrote:
> On Sep 24, 2008, at 1:50 AM, Trinh Tuan Cuong wrote:
> 
>> We are developing a project and we intend to use Hadoop to handle 
>> the processing of a vast amount of data. But to convince our customers 
>> of the use of Hadoop in our project, we must show them the 
>> advantages (and perhaps the disadvantages) of deploying the project 
>> with Hadoop compared to the Oracle Database Platform.
> 
> The primary advantage of Hadoop is scalability. On an equivalent 
> hardware budget, Hadoop can handle much much larger databases. We had a 
> process that was run once a week on Oracle that is now run once an hour 
> on Hadoop. Additionally, Hadoop scales out much much farther. We can 
> store petabytes of data in a single Hadoop cluster and have jobs that 
> read and generate 100's of terabytes.

That said, what a database gives you -on the right hardware- is very 
fast responses, especially if the indices are set up right and the data 
denormalised when appropriate. There is also really good integration 
with tools and application servers, with things like Java EE designed to 
make running code against a database easy.

Not using Oracle means you don't have to work with an Oracle DBA, which, 
in my experience, can only be a good thing. DBAs and developers never 
seem to see eye-to-eye.


> 
>  Hadoop only has very primitive 
> security at the moment, although I expect that to change in the next 6 
> months.
> 

Right now you need to trust everyone else on the network where you run 
hadoop to not be malicious; the filesystem and job tracker interfaces 
are insecure. The forthcoming 0.19 release will ask who you are, but the 
far end trusts you to be who you say you are. In that respect, it's as 
secure as NFS over UDP.

To secure Hadoop you'd probably need to
  - sign every IPC request, with a CPU time cost at both ends;
  - require some form of authentication for the HTTP-exported parts of 
the system, such as digest authentication, or issue lots of HTTPS 
private keys and use those instead, giving everyone a key management 
problem as well as extra communications overhead.

What would be easier is to lock down remote access to the filesystem/job 
submission so that only authenticated users would be able to upload jobs 
and data. The cluster would continue to trust everything else on its 
network, but the system wouldn't trust people to submit work unless they 
could prove who they were.


[Hadoop NY User Group Meetup] HIVE: Data Warehousing using Hadoop 10/9

Posted by Alex Dorman <ad...@contextweb.com>.
Next NY Hadoop meetup will take place on Thursday, 10/9 at 6:30 pm.

Jeff Hammerbacher will present HIVE: Data Warehousing using Hadoop.

About HIVE:
- Data Organization into Tables with logical and hash partitioning 
- A Metastore to store metadata about Tables/Partitions etc 
- A SQL like query language over object data stored in Tables 
- DDL commands to define and load external data into tables

About the speaker:
Jeff Hammerbacher conceived, built, and led the Data team at Facebook.
The Data team was responsible for driving many of the applications of
statistics and machine learning at Facebook, as well as building out the
infrastructure to support these tasks for massive data sets. The team
produced two open source projects: Hive, a system for offline analysis
built above Hadoop, and Cassandra, a structured storage system on a P2P
network. Before joining Facebook, Jeff wore a suit on Wall Street and
studied Mathematics at Harvard.
Currently Jeff is an Entrepreneur in Residence at Accel Partners.

Location 
ContextWeb, 9th floor  
22 Cortlandt Street
New York, NY 10007 

If you are interested, RSVP here:
http://softwaredev.meetup.com/110/calendar/8881385/

-Alex

Re: Could not find any valid local directory for task_200809041356_0042_r_000000_2/intermediate.9

Posted by Miles Osborne <mi...@inf.ed.ac.uk>.
check that you are not getting disk full errors

Miles
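One quick way to act on Miles's suggestion from Java (the directory path below is a placeholder; substitute the entries from your mapred.local.dir setting):

```java
import java.io.File;

/** Print free space for each local directory the task tracker writes
 *  intermediate files to; a near-zero value would explain the
 *  DiskErrorException above. */
public class LocalDirSpace {
    /** Free megabytes on the filesystem holding the given path
     *  (0 if the path does not exist). */
    public static long freeMb(String path) {
        return new File(path).getUsableSpace() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        String[] localDirs = {"/tmp/hadoop/mapred/local"};  // placeholder
        for (String dir : localDirs) {
            System.out.println(dir + ": " + freeMb(dir) + " MB free");
        }
    }
}
```

The same check from the shell is simply `df -h` against each mapred.local.dir entry on every node.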

2008/9/29 Elia Mazzawi <el...@casalemedia.com>:
> In more detail, my program is happily chugging along until the reducer fails
> with that exception; then it looks like it retries and fails by itself.
> The same Hadoop program works fine on a subset of the data.
> I'm rerunning on all the subsets to see if there is anything in the data
> that is causing this.
>
> But can someone explain what this error means?
> I'm running Hadoop 0.17.0; maybe it's time to update.
>
>
> 08/09/27 05:59:01 INFO mapred.JobClient:  map 96% reduce 31%
> 08/09/27 06:02:17 INFO mapred.JobClient:  map 96% reduce 32%
> 08/09/27 06:24:46 INFO mapred.JobClient:  map 97% reduce 32%
> 08/09/27 06:49:38 INFO mapred.JobClient:  map 98% reduce 32%
> 08/09/27 07:14:12 INFO mapred.JobClient:  map 99% reduce 32%
> 08/09/27 07:17:09 INFO mapred.JobClient:  map 99% reduce 33%
> 08/09/27 07:37:50 INFO mapred.JobClient:  map 100% reduce 33%
> 08/09/27 07:56:11 INFO mapred.JobClient:  map 100% reduce 0%
> 08/09/27 07:56:11 INFO mapred.JobClient: Task Id :
> task_200809041356_0042_r_000000_2, Status : FAILED
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
> valid local directory for task_200809041356_0042_r_000000_2/intermediate.9
>       at
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
>       at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
>       at
> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2851)
>       at
> org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2586)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:352)
>       at
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>
> 08/09/27 07:57:10 INFO mapred.JobClient:  map 100% reduce 1%
> 08/09/27 07:57:55 INFO mapred.JobClient:  map 100% reduce 2%
> 08/09/27 07:58:46 INFO mapred.JobClient:  map 100% reduce 3%
> 08/09/27 07:59:36 INFO mapred.JobClient:  map 100% reduce 4%
> 08/09/27 08:00:26 INFO mapred.JobClient:  map 100% reduce 5%
> 08/09/27 08:01:16 INFO mapred.JobClient:  map 100% reduce 6%
> 08/09/27 08:02:06 INFO mapred.JobClient:  map 100% reduce 7%
> 08/09/27 08:02:55 INFO mapred.JobClient:  map 100% reduce 8%
> 08/09/27 08:03:45 INFO mapred.JobClient:  map 100% reduce 9%
> 08/09/27 08:04:36 INFO mapred.JobClient:  map 100% reduce 10%
> 08/09/27 08:05:26 INFO mapred.JobClient:  map 100% reduce 11%
> 08/09/27 08:06:18 INFO mapred.JobClient:  map 100% reduce 12%
> 08/09/27 08:07:09 INFO mapred.JobClient:  map 100% reduce 13%
> 08/09/27 08:08:00 INFO mapred.JobClient:  map 100% reduce 14%
> 08/09/27 08:08:50 INFO mapred.JobClient:  map 100% reduce 15%
> 08/09/27 08:09:45 INFO mapred.JobClient:  map 100% reduce 16%
> 08/09/27 08:10:31 INFO mapred.JobClient:  map 100% reduce 17%
> 08/09/27 08:11:26 INFO mapred.JobClient:  map 100% reduce 18%
> 08/09/27 08:12:16 INFO mapred.JobClient:  map 100% reduce 19%
> 08/09/27 08:13:08 INFO mapred.JobClient:  map 100% reduce 20%
> 08/09/27 08:14:02 INFO mapred.JobClient:  map 100% reduce 21%
> 08/09/27 08:14:50 INFO mapred.JobClient:  map 100% reduce 22%
> 08/09/27 08:15:41 INFO mapred.JobClient:  map 100% reduce 23%
> 08/09/27 08:16:36 INFO mapred.JobClient:  map 100% reduce 24%
> 08/09/27 08:17:26 INFO mapred.JobClient:  map 100% reduce 25%
> 08/09/27 08:18:14 INFO mapred.JobClient:  map 100% reduce 26%
> 08/09/27 08:19:02 INFO mapred.JobClient:  map 100% reduce 27%
> 08/09/27 08:19:55 INFO mapred.JobClient:  map 100% reduce 28%
> 08/09/27 08:21:12 INFO mapred.JobClient:  map 100% reduce 71%
> java.io.IOException: Job failed!
>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
>       at org.myorg.binAnalysis.main(binAnalysis.java:99)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:585)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>       at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>       at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
>
> You have new mail in /var/spool/mail/root
>
>
> Elia Mazzawi wrote:
>>
>> what does this exception mean?
>>
>> 08/09/27 07:56:11 INFO mapred.JobClient: Task Id :
>> task_200809041356_0042_r_000000_2, Status : FAILED
>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
>> valid local directory for task_200809041356_0042_r_000000_2/intermediate.9
>>       at
>> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
>>       at
>> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
>>       at
>> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2851)
>>       at
>> org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2586)
>>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:352)
>>       at
>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>
>
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Re: Could not find any valid local directory for task_200809041356_0042_r_000000_2/intermediate.9

Posted by Elia Mazzawi <el...@casalemedia.com>.
In more detail, my program is happily chugging along until the reducer 
fails with that exception; then it looks like it retries and fails by 
itself.
The same Hadoop program works fine on a subset of the data.
I'm rerunning on all the subsets to see if there is anything in the data 
that is causing this.

But can someone explain what this error means?
I'm running Hadoop 0.17.0; maybe it's time to update.


08/09/27 05:59:01 INFO mapred.JobClient:  map 96% reduce 31%
08/09/27 06:02:17 INFO mapred.JobClient:  map 96% reduce 32%
08/09/27 06:24:46 INFO mapred.JobClient:  map 97% reduce 32%
08/09/27 06:49:38 INFO mapred.JobClient:  map 98% reduce 32%
08/09/27 07:14:12 INFO mapred.JobClient:  map 99% reduce 32%
08/09/27 07:17:09 INFO mapred.JobClient:  map 99% reduce 33%
08/09/27 07:37:50 INFO mapred.JobClient:  map 100% reduce 33%
08/09/27 07:56:11 INFO mapred.JobClient:  map 100% reduce 0%
08/09/27 07:56:11 INFO mapred.JobClient: Task Id : 
task_200809041356_0042_r_000000_2, Status : FAILED
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
any valid local directory for 
task_200809041356_0042_r_000000_2/intermediate.9
        at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
        at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at 
org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2851)
        at 
org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2586)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:352)
        at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

08/09/27 07:57:10 INFO mapred.JobClient:  map 100% reduce 1%
08/09/27 07:57:55 INFO mapred.JobClient:  map 100% reduce 2%
08/09/27 07:58:46 INFO mapred.JobClient:  map 100% reduce 3%
08/09/27 07:59:36 INFO mapred.JobClient:  map 100% reduce 4%
08/09/27 08:00:26 INFO mapred.JobClient:  map 100% reduce 5%
08/09/27 08:01:16 INFO mapred.JobClient:  map 100% reduce 6%
08/09/27 08:02:06 INFO mapred.JobClient:  map 100% reduce 7%
08/09/27 08:02:55 INFO mapred.JobClient:  map 100% reduce 8%
08/09/27 08:03:45 INFO mapred.JobClient:  map 100% reduce 9%
08/09/27 08:04:36 INFO mapred.JobClient:  map 100% reduce 10%
08/09/27 08:05:26 INFO mapred.JobClient:  map 100% reduce 11%
08/09/27 08:06:18 INFO mapred.JobClient:  map 100% reduce 12%
08/09/27 08:07:09 INFO mapred.JobClient:  map 100% reduce 13%
08/09/27 08:08:00 INFO mapred.JobClient:  map 100% reduce 14%
08/09/27 08:08:50 INFO mapred.JobClient:  map 100% reduce 15%
08/09/27 08:09:45 INFO mapred.JobClient:  map 100% reduce 16%
08/09/27 08:10:31 INFO mapred.JobClient:  map 100% reduce 17%
08/09/27 08:11:26 INFO mapred.JobClient:  map 100% reduce 18%
08/09/27 08:12:16 INFO mapred.JobClient:  map 100% reduce 19%
08/09/27 08:13:08 INFO mapred.JobClient:  map 100% reduce 20%
08/09/27 08:14:02 INFO mapred.JobClient:  map 100% reduce 21%
08/09/27 08:14:50 INFO mapred.JobClient:  map 100% reduce 22%
08/09/27 08:15:41 INFO mapred.JobClient:  map 100% reduce 23%
08/09/27 08:16:36 INFO mapred.JobClient:  map 100% reduce 24%
08/09/27 08:17:26 INFO mapred.JobClient:  map 100% reduce 25%
08/09/27 08:18:14 INFO mapred.JobClient:  map 100% reduce 26%
08/09/27 08:19:02 INFO mapred.JobClient:  map 100% reduce 27%
08/09/27 08:19:55 INFO mapred.JobClient:  map 100% reduce 28%
08/09/27 08:21:12 INFO mapred.JobClient:  map 100% reduce 71%
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
        at org.myorg.binAnalysis.main(binAnalysis.java:99)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)

You have new mail in /var/spool/mail/root


Elia Mazzawi wrote:
> what does this exception mean?
>
> 08/09/27 07:56:11 INFO mapred.JobClient: Task Id : 
> task_200809041356_0042_r_000000_2, Status : FAILED
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> any valid local directory for 
> task_200809041356_0042_r_000000_2/intermediate.9
>        at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313) 
>
>        at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124) 
>
>        at 
> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2851) 
>
>        at 
> org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2586)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:352)
>        at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>


Could not find any valid local directory for task_200809041356_0042_r_000000_2/intermediate.9

Posted by Elia Mazzawi <el...@casalemedia.com>.
what does this exception mean?

08/09/27 07:56:11 INFO mapred.JobClient: Task Id : 
task_200809041356_0042_r_000000_2, Status : FAILED
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
any valid local directory for 
task_200809041356_0042_r_000000_2/intermediate.9
        at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
        at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at 
org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2851)
        at 
org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2586)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:352)
        at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)


Re: what does this error mean

Posted by Elia Mazzawi <el...@casalemedia.com>.
Mikhail Yakshin wrote:
> On Wed, Sep 24, 2008 at 9:24 PM, Elia Mazzawi wrote:
>   
>> I got these errors and I don't know what they mean; any help is appreciated.
>> I suspect either a hardware error or that the cluster is out of space to
>> store intermediate results, though there is still lots of free space left
>> on the cluster.
>>
>> 08/09/24 00:23:31 INFO mapred.JobClient:  map 79% reduce 24%
>> 08/09/24 00:24:53 INFO mapred.JobClient:  map 80% reduce 24%
>> 08/09/24 00:26:28 INFO mapred.JobClient:  map 80% reduce 0%
>> 08/09/24 00:26:28 INFO mapred.JobClient: Task Id :
>> task_200809041356_0037_r_000000_2, Status : FAILED
>> java.io.IOException: task_200809041356_0037_r_000000_2The reduce copier
>> failed
>>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
>>       at
>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>     
>
> Nope. In fact, it could be much more complicated than that %)
>
> Several possible causes of this error, as far as I know:
>
> * An error in the partitioner (for example, look out for negative keys
> with the default "key % numReducers" algorithm: Java's modulo operator
> (%) yields negative values for negative keys, effectively sending
> output to non-existent reducers).
> * An error in a group/key/value comparator.
>
>   

I only had one reducer running.
Do you think it is possibly a bug in my reducer code?

Re: what does this error mean

Posted by Mikhail Yakshin <gr...@gmail.com>.
On Wed, Sep 24, 2008 at 9:24 PM, Elia Mazzawi wrote:
> I got these errors and I don't know what they mean; any help is appreciated.
> I suspect either a hardware error or that the cluster is out of space to
> store intermediate results, though there is still lots of free space left
> on the cluster.
>
> 08/09/24 00:23:31 INFO mapred.JobClient:  map 79% reduce 24%
> 08/09/24 00:24:53 INFO mapred.JobClient:  map 80% reduce 24%
> 08/09/24 00:26:28 INFO mapred.JobClient:  map 80% reduce 0%
> 08/09/24 00:26:28 INFO mapred.JobClient: Task Id :
> task_200809041356_0037_r_000000_2, Status : FAILED
> java.io.IOException: task_200809041356_0037_r_000000_2The reduce copier
> failed
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
>       at
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

Nope. In fact, it could be much more complicated than that %)

Several possible causes of this error, as far as I know:

* An error in the partitioner (for example, look out for negative keys
with the default "key % numReducers" algorithm: Java's modulo operator
(%) yields negative values for negative keys, effectively sending
output to non-existent reducers).
* An error in a group/key/value comparator.
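Mikhail's first point is easy to demonstrate. The sign-bit mask below is the same trick Hadoop's own HashPartitioner uses to keep the partition index non-negative; the class itself is just an illustration:

```java
public class SafePartitioner {
    /** Naive partitioner: Java's % yields a negative result for a
     *  negative hash code, i.e. an invalid reducer index. */
    public static int naive(int hash, int numReducers) {
        return hash % numReducers;
    }

    /** What Hadoop's HashPartitioner does: clear the sign bit before
     *  taking the remainder, so the index is always in [0, numReducers). */
    public static int safe(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        System.out.println(naive(-7, 4)); // -3: not a valid partition
        System.out.println(safe(-7, 4));  // 1: valid partition index
    }
}
```

A custom partitioner that forgets the mask (or an equivalent Math.abs guard, which itself fails for Integer.MIN_VALUE) will route some keys to a non-existent reducer and make the reduce phase fail.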

-- 
WBR, Mikhail Yakshin

what does this error mean

Posted by Elia Mazzawi <el...@casalemedia.com>.
I got these errors and I don't know what they mean; any help is appreciated.
I suspect either a hardware error or that the cluster is out of space to 
store intermediate results, though there is still lots of free space left 
on the cluster.


08/09/24 00:23:31 INFO mapred.JobClient:  map 79% reduce 24%
08/09/24 00:24:53 INFO mapred.JobClient:  map 80% reduce 24%
08/09/24 00:26:28 INFO mapred.JobClient:  map 80% reduce 0%
08/09/24 00:26:28 INFO mapred.JobClient: Task Id : 
task_200809041356_0037_r_000000_2, Status : FAILED
java.io.IOException: task_200809041356_0037_r_000000_2The reduce copier 
failed
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
        at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

08/09/24 00:27:30 INFO mapred.JobClient:  map 80% reduce 1%
08/09/24 00:28:34 INFO mapred.JobClient:  map 80% reduce 2%
08/09/24 00:29:38 INFO mapred.JobClient:  map 80% reduce 3%
08/09/24 00:31:04 INFO mapred.JobClient:  map 80% reduce 4%
08/09/24 00:32:21 INFO mapred.JobClient:  map 80% reduce 5%
08/09/24 00:33:47 INFO mapred.JobClient:  map 80% reduce 6%
08/09/24 00:35:12 INFO mapred.JobClient:  map 80% reduce 7%
08/09/24 00:36:19 INFO mapred.JobClient:  map 80% reduce 8%
08/09/24 00:37:58 INFO mapred.JobClient:  map 80% reduce 9%
08/09/24 00:39:17 INFO mapred.JobClient:  map 80% reduce 10%
08/09/24 00:40:43 INFO mapred.JobClient:  map 80% reduce 11%
08/09/24 00:42:01 INFO mapred.JobClient:  map 80% reduce 12%
08/09/24 00:43:27 INFO mapred.JobClient:  map 80% reduce 13%
08/09/24 00:44:35 INFO mapred.JobClient:  map 80% reduce 14%
08/09/24 00:45:40 INFO mapred.JobClient:  map 80% reduce 15%
08/09/24 00:46:38 INFO mapred.JobClient:  map 80% reduce 16%
08/09/24 00:47:32 INFO mapred.JobClient:  map 80% reduce 17%
08/09/24 00:49:44 INFO mapred.JobClient:  map 100% reduce 82%
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
        at org.myorg.binAnalysis.main(binAnalysis.java:91)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)



Re: Question about Hadoop's Feature(s)

Posted by Mice <mi...@gmail.com>.
One of the major advantages of Hadoop over Oracle: it saves you a lot of $$$.

2008/9/25 Trinh Tuan Cuong <tr...@luvina.net>:
> Dear Mr/Mrs Owen O'Malley,
>
> First, I would like to thank you very much for your reply; it was exactly
> the answer I expected. As I read about the query languages of Hadoop,
> they are a combination of Pig (Pig Latin), Hive, HBase, Jaql, and
> more, and I could see that Hadoop has the advantage of an SQL-like query
> language. The thing I was most curious about is Hadoop's security level,
> which is hard to find in any of the documents I searched. Like many other
> organizations, we believe in the fast growth of Hadoop and intend
> to use it in our serious projects. Once again, thanks for the reply; now
> I can tell our clients clearly about Hadoop.
>
> Best Regards.
>
> Tuan Cuong, Trinh.
> trinhtuancuong@luvina.net
> Luvina Software Company.
> Website : www.luvina.net
>
> -----Original Message-----
> From: Owen O'Malley [mailto:omalley@apache.org]
> Sent: Wednesday, September 24, 2008 11:27 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Question about Hadoop's Feature(s)
>
> On Sep 24, 2008, at 1:50 AM, Trinh Tuan Cuong wrote:
>
>> We are developing a project and we intend to use Hadoop to
>> handle the processing of a vast amount of data. But to convince our
>> customers of the use of Hadoop in our project, we must show
>> them the advantages (and perhaps the disadvantages) of deploying the
>> project with Hadoop compared to the Oracle Database Platform.
>
> The primary advantage of Hadoop is scalability. On an equivalent
> hardware budget, Hadoop can handle much much larger databases. We had
> a process that was run once a week on Oracle that is now run once an
> hour on Hadoop. Additionally, Hadoop scales out much much farther. We
> can store petabytes of data in a single Hadoop cluster and have jobs
> that read and generate 100's of terabytes.
>
> The disadvantage of Hadoop is that it is still relatively young and
> growing fast, so there are growing pains. Hadoop has recently gotten
> higher level query languages like SQL (Pig, Hive, and Jaql), but still
> doesn't have any fancy report generators. Hadoop only has very
> primitive security at the moment, although I expect that to change in
> the next 6 months.
>
> -- Owen
>
>
>

RE: Question about Hadoop's Feature(s)

Posted by Trinh Tuan Cuong <tr...@luvina.net>.
Dear Mr/Mrs Owen O'Malley,

First, I would like to thank you very much for your reply; it was exactly
the answer I expected. As I read about the query languages of Hadoop,
they are a combination of Pig (Pig Latin), Hive, HBase, Jaql, and
more, and I could see that Hadoop has the advantage of an SQL-like query
language. The thing I was most curious about is Hadoop's security level,
which is hard to find in any of the documents I searched. Like many other
organizations, we believe in the fast growth of Hadoop and intend
to use it in our serious projects. Once again, thanks for the reply; now
I can tell our clients clearly about Hadoop.

Best Regards.

Tuan Cuong, Trinh.
trinhtuancuong@luvina.net
Luvina Software Company.
Website : www.luvina.net

-----Original Message-----
From: Owen O'Malley [mailto:omalley@apache.org] 
Sent: Wednesday, September 24, 2008 11:27 PM
To: core-user@hadoop.apache.org
Subject: Re: Question about Hadoop's Feature(s)

On Sep 24, 2008, at 1:50 AM, Trinh Tuan Cuong wrote:

> We are developing a project and we intend to use Hadoop to  
> handle the processing of a vast amount of data. But to convince our  
> customers of the use of Hadoop in our project, we must show  
> them the advantages (and perhaps the disadvantages) of deploying the  
> project with Hadoop compared to the Oracle Database Platform.

The primary advantage of Hadoop is scalability. On an equivalent  
hardware budget, Hadoop can handle much much larger databases. We had  
a process that was run once a week on Oracle that is now run once an  
hour on Hadoop. Additionally, Hadoop scales out much much farther. We  
can store petabytes of data in a single Hadoop cluster and have jobs  
that read and generate 100's of terabytes.

The disadvantage of Hadoop is that it is still relatively young and  
growing fast, so there are growing pains. Hadoop has recently gotten  
higher level query languages like SQL (Pig, Hive, and Jaql), but still  
doesn't have any fancy report generators. Hadoop only has very  
primitive security at the moment, although I expect that to change in  
the next 6 months.

-- Owen



Re: Question about Hadoop's Feature(s)

Posted by Owen O'Malley <om...@apache.org>.
On Sep 24, 2008, at 1:50 AM, Trinh Tuan Cuong wrote:

> We are developing a project and we intend to use Hadoop to  
> handle the processing of a vast amount of data. But to convince our  
> customers of the use of Hadoop in our project, we must show  
> them the advantages (and perhaps the disadvantages) of deploying the  
> project with Hadoop compared to the Oracle Database Platform.

The primary advantage of Hadoop is scalability. On an equivalent  
hardware budget, Hadoop can handle much much larger databases. We had  
a process that was run once a week on Oracle that is now run once an  
hour on Hadoop. Additionally, Hadoop scales out much much farther. We  
can store petabytes of data in a single Hadoop cluster and have jobs  
that read and generate 100's of terabytes.

The disadvantage of Hadoop is that it is still relatively young and  
growing fast, so there are growing pains. Hadoop has recently gotten  
higher level query languages like SQL (Pig, Hive, and Jaql), but still  
doesn't have any fancy report generators. Hadoop only has very  
primitive security at the moment, although I expect that to change in  
the next 6 months.

-- Owen