Posted to common-dev@hadoop.apache.org by "Jakob Homan (JIRA)" <ji...@apache.org> on 2009/11/26 02:33:39 UTC

[jira] Created: (HADOOP-6396) Provide a description in the exception when an error is encountered parsing umask

Provide a description in the exception when an error is encountered parsing umask
---------------------------------------------------------------------------------

                 Key: HADOOP-6396
                 URL: https://issues.apache.org/jira/browse/HADOOP-6396
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.21.0, 0.22.0
            Reporter: Jakob Homan
            Assignee: Jakob Homan
             Fix For: 0.21.0, 0.22.0


Currently, when there is a problem parsing a umask, the exception text is just the offending umask with no other clue, which can be quite confusing, as demonstrated in HDFS-763.  The message should include the nature of the problem and whether the parsing attempt used old-style or new-style values.
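
For illustration only (hypothetical names and wording, not the actual patch), the old-style parse failure could read something like:

    // Hypothetical sketch, not the committed change: report what failed and
    // which umask style was being parsed, instead of just echoing the bad value.
    public class UmaskParseExample {
        static int parseOldStyleUmask(String value) {
            try {
                return Integer.parseInt(value, 8);   // old-style octal umask, e.g. "022"
            } catch (NumberFormatException nfe) {
                throw new IllegalArgumentException("Unable to parse umask '" + value
                    + "' as an octal (old-style) value; check the configured umask property.", nfe);
            }
        }
    }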

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Pipes with the latest version of MapReduce

Posted by Upendra Dadi <ud...@gmu.edu>.
Hi,
  I am trying to use MapReduce Pipes with version 0.20.1 of Hadoop. If I
use org.apache.hadoop.mapreduce.lib.input.TextInputFormat as the input
format with the -InputFormat option, I get the following error:
    Exception in thread "main" java.lang.ClassCastException: class org.apache.hadoop.mapreduce.lib.input.TextInputFormat
     at java.lang.Class.asSubclass(Class.java:3018)
     at org.apache.hadoop.mapred.pipes.Submitter.getClass(Submitter.java:372)
     at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:421)
     at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494)

I was able to run Pipes without the -InputFormat option, but my requirement
is to extend TextInputFormat. I guess Pipes assumes the input format is of
type org.apache.hadoop.mapred.InputFormat, which is deprecated, so it seems
Pipes in 0.20.1 has not been fully ported to the new API. Am I right here?
Which version should I use? Thanks.
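
If that is the case, I suppose I would need to build on the old-API class
instead. Something along these lines is what I have in mind (just a sketch,
not tested; the class name is made up):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Extends the old-API TextInputFormat, which (judging by the stack trace)
    // is the hierarchy Submitter's asSubclass() check expects.
    public class MyTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            // Customize splitting / record reading here as needed.
            return super.isSplitable(fs, file);
        }
    }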

Upendra



----- Original Message ----- 
From: "Erez Katz" <er...@yahoo.com>
To: <co...@hadoop.apache.org>
Sent: Friday, November 27, 2009 6:59 PM
Subject: Re: mapreduce with non-text data


> you could always base64 encode your binary data...
>
>
>
> --- On Thu, 11/26/09, Upendra Dadi <ud...@gmu.edu> wrote:
>
>> From: Upendra Dadi <ud...@gmu.edu>
>> Subject: mapreduce with non-text data
>> To: common-dev@hadoop.apache.org
>> Date: Thursday, November 26, 2009, 5:07 PM
>> Hi,
>>  Are there any use cases or examples of using Hadoop
>> MapReduce for non-text data? The only examples I see on the
>> web are for text data. Any pointers in that direction are
>> greatly appreciated. Thanks.
>>
>> Regards,
>> Upendra
>>
>
>
> 


Re: mapreduce with non-text data

Posted by Erez Katz <er...@yahoo.com>.
You could always base64-encode your binary data...
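
Something along these lines, using Apache commons-codec (untested sketch):

    import org.apache.commons.codec.binary.Base64;

    public class Base64RoundTrip {
        public static void main(String[] args) {
            byte[] raw = {0x00, (byte) 0x9C, 0x7F, 0x10};               // some arbitrary binary value
            String encoded = new String(Base64.encodeBase64(raw));      // line-safe text representation
            byte[] decoded = Base64.decodeBase64(encoded.getBytes());   // recovers the original bytes
            System.out.println(encoded);
        }
    }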



--- On Thu, 11/26/09, Upendra Dadi <ud...@gmu.edu> wrote:

> From: Upendra Dadi <ud...@gmu.edu>
> Subject: mapreduce with non-text data
> To: common-dev@hadoop.apache.org
> Date: Thursday, November 26, 2009, 5:07 PM
> Hi,
>  Are there any use cases or examples of using Hadoop
> MapReduce for non-text data? The only examples I see on the
> web are for text data. Any pointers in that direction are
> greatly appreciated. Thanks.
> 
> Regards,
> Upendra 
> 


      

mapreduce with non-text data

Posted by Upendra Dadi <ud...@gmu.edu>.
Hi,
  Are there any use cases or examples of using Hadoop MapReduce for non-text
data? The only examples I see on the web are for text data. Any pointers in
that direction are greatly appreciated. Thanks.

Regards,
Upendra 


Re: how to set one map task for each input key-value pair

Posted by Jason Venner <ja...@gmail.com>.
I don't have direct experience with setMaxInputSplitSize. The process sounds
feasible.


On Fri, Nov 27, 2009 at 4:34 AM, Upendra Dadi <ud...@gmu.edu> wrote:

> Thank you, Jason! How about if I pad each record to the size of the largest
> record with dummy characters and then set setMaxInputSplitSize() and
> setMinInputSplitSize() of the FileInputFormat class to this value? The mapper
> would then extract the input after ignoring the dummy characters. Do you
> think this could work? Thanks.
>
> Regards,
> Upendra
>
>
> ----- Original Message ----- From: "Jason Venner" <ja...@gmail.com>
> To: <co...@hadoop.apache.org>
> Sent: Friday, November 27, 2009 12:06 AM
> Subject: Re: how to set one map task for each input key-value pair
>
>
>
>  The only thing that comes immediately to mind is to write your own custom
>> input format that knows how to tell where the boundaries are in your data
>> set, and uses those to specify the beginning and end of the input splits.
>>
>> You can also tell the framework not to split your individual input files
>> by
>> setting the minimum input split size (mapred.min.split.size) to
>> Long.MAX_VALUE
>>
>> On Thu, Nov 26, 2009 at 4:53 PM, Upendra Dadi <ud...@gmu.edu> wrote:
>>
>>  Hi,
>>>  I am trying to use MapReduce with some scientific data. I have key-value
>>> pairs where the size of the value can range from a few megabytes to
>>> several hundred megabytes. What happens when the size of the value
>>> exceeds the block size? How do I set it up so that each key-value pair is
>>> associated with a separate map? Could someone please help? Thanks.
>>>
>>> Regards,
>>> Upendra
>>>
>>>
>>
>>
>> --
>> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>> http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> www.prohadoopbook.com a community for Hadoop Professionals
>>
>>
>


-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Re: how to set one map task for each input key-value pair

Posted by Upendra Dadi <ud...@gmu.edu>.
Thank you, Jason! How about if I pad each record to the size of the largest
record with dummy characters and then set setMaxInputSplitSize() and
setMinInputSplitSize() of the FileInputFormat class to this value? The mapper
would then extract the input after ignoring the dummy characters. Do you
think this could work? Thanks.
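
Something like this is what I have in mind in the driver, assuming the new
(mapreduce) API (just a sketch; the 64 MB record size is a made-up
placeholder):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class OneRecordPerSplit {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "one padded record per split");
            // Placeholder: every record has been padded to exactly 64 MB.
            long paddedRecordSize = 64L * 1024 * 1024;
            // Pin both bounds so the computed split size equals the record size.
            FileInputFormat.setMinInputSplitSize(job, paddedRecordSize);
            FileInputFormat.setMaxInputSplitSize(job, paddedRecordSize);
        }
    }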

Regards,
Upendra


----- Original Message ----- 
From: "Jason Venner" <ja...@gmail.com>
To: <co...@hadoop.apache.org>
Sent: Friday, November 27, 2009 12:06 AM
Subject: Re: how to set one map task for each input key-value pair


> The only thing that comes immediately to mind is to write your own custom
> input format that knows how to tell where the boundaries are in your data
> set, and uses those to specify the beginning and end of the input splits.
>
> You can also tell the framework not to split your individual input files 
> by
> setting the minimum input split size (mapred.min.split.size) to
> Long.MAX_VALUE
>
> On Thu, Nov 26, 2009 at 4:53 PM, Upendra Dadi <ud...@gmu.edu> wrote:
>
>> Hi,
>>  I am trying to use MapReduce with some scientific data. I have key-value
>> pairs where the size of the value can range from a few megabytes to
>> several hundred megabytes. What happens when the size of the value
>> exceeds the block size? How do I set it up so that each key-value pair is
>> associated with a separate map? Could someone please help? Thanks.
>>
>> Regards,
>> Upendra
>>
>
>
>
> -- 
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
> 


Re: how to set one map task for each input key-value pair

Posted by Jason Venner <ja...@gmail.com>.
The only thing that comes immediately to mind is to write your own custom
input format that knows how to tell where the boundaries are in your data
set, and uses those to specify the beginning and end of the input splits.

You can also tell the framework not to split your individual input files by
setting the minimum input split size (mapred.min.split.size) to
Long.MAX_VALUE
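
Roughly like this for the custom-format route (untested sketch, old mapred
API; the class name is made up), with the configuration-only alternative
shown as a small helper:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Never split an individual file: each whole file becomes one split, one map.
    public class WholeFileTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }

        // The configuration-only alternative mentioned above, set in the driver:
        public static void disableSplitting(JobConf conf) {
            conf.setLong("mapred.min.split.size", Long.MAX_VALUE);
        }
    }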

On Thu, Nov 26, 2009 at 4:53 PM, Upendra Dadi <ud...@gmu.edu> wrote:

> Hi,
>  I am trying to use MapReduce with some scientific data. I have key-value
> pairs where the size of the value can range from a few megabytes to
> several hundred megabytes. What happens when the size of the value
> exceeds the block size? How do I set it up so that each key-value pair is
> associated with a separate map? Could someone please help? Thanks.
>
> Regards,
> Upendra
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

how to set one map task for each input key-value pair

Posted by Upendra Dadi <ud...@gmu.edu>.
Hi,
  I am trying to use MapReduce with some scientific data. I have key-value
pairs where the size of the value can range from a few megabytes to several
hundred megabytes. What happens when the size of the value exceeds the block
size? How do I set it up so that each key-value pair is associated with a
separate map? Could someone please help? Thanks.

Regards,
Upendra 


RE: Environment for Hadoop Prototyping - Amazon Web Services

Posted by "Sirota, Peter" <si...@amazon.com>.
You can use Amazon Elastic MapReduce. It is a hosted Hadoop service which charges by the hour.  Here is more information:
http://aws.amazon.com/elasticmapreduce/ 

There are also a bunch of sample applications and tutorials available here:
http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=265 

Disclaimer:  I work for Amazon Web Services.

Best Regards,
Peter-


-----Original Message-----
From: Jeff Zhang [mailto:zjffdu@gmail.com] 
Sent: Thursday, November 26, 2009 4:22 PM
To: common-dev@hadoop.apache.org
Subject: Re: Environment for Hadoop Prototyping - Amazon Web Services

Amazon EC2 charges by the hour, so I think it will fit your requirement.


Jeff Zhang


On Thu, Nov 26, 2009 at 1:42 PM, Palikala, Rajendra (CCL) <
RPalikala@carnival.com> wrote:

>
> Hi,
>
> I am planning to develop some prototypes on Hadoop for ETL into a data
> warehouse, but I don't have enough nodes (hardware/computers) to test the
> performance of Hadoop. I want to give a demo on performance. I have heard
> that Amazon Web Services provides services like this, but I am not sure.
> Does anyone in the community know of companies that rent out this hardware
> on a monthly basis? Please advise.
>
> Thks,
> Rajendra

Re: Environment for Hadoop Prototyping - Amazon Web Services

Posted by Jeff Zhang <zj...@gmail.com>.
Amazon EC2 charges by the hour, so I think it will fit your requirement.


Jeff Zhang


On Thu, Nov 26, 2009 at 1:42 PM, Palikala, Rajendra (CCL) <
RPalikala@carnival.com> wrote:

>
> Hi,
>
> I am planning to develop some prototypes on Hadoop for ETL into a data
> warehouse, but I don't have enough nodes (hardware/computers) to test the
> performance of Hadoop. I want to give a demo on performance. I have heard
> that Amazon Web Services provides services like this, but I am not sure.
> Does anyone in the community know of companies that rent out this hardware
> on a monthly basis? Please advise.
>
> Thks,
> Rajendra

Environment for Hadoop Prototyping - Amazon Web Services

Posted by "Palikala, Rajendra (CCL)" <RP...@carnival.com>.
Hi,

I am planning to develop some prototypes on Hadoop for ETL into a data warehouse, but I don't have enough nodes (hardware/computers) to test the performance of Hadoop. I want to give a demo on performance. I have heard that Amazon Web Services provides services like this, but I am not sure. Does anyone in the community know of companies that rent out this hardware on a monthly basis? Please advise.

Thks,
Rajendra