Posted to mapreduce-dev@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2012/04/03 17:01:38 UTC
Re: Reduce output is strange
If I want to compare 2 sequence files to see if they are the same, how do I
compare?
On 19 December 2011 14:43, Robert Evans <ev...@yahoo-inc.com> wrote:
> Oh, I forgot to say that some of the "random characters" really are
> random. Sequence files store a 16-byte random marker as a sync point,
> once in the header and then periodically within the file. This makes it
> easy to split the file, with little risk that the same byte sequence
> appears inside the data itself just by chance.
>
> --Bobby Evans
>
> On 12/19/11 7:51 AM, "Pedro Costa" <ps...@gmail.com> wrote:
>
> Hi,
>
> In Hadoop MapReduce, I ran the webdatascan example, and the reduce
> output is a SequenceFile. The result is shown here (
> http://paste.lisp.org/display/126572). What's the trash (random
> characters), like "u 265
> 0000100 330 320 252 " \n # ; 374 5 211 V ' 340 376" in the output? Is the
> output correct?
>
>
> 0000000 S E Q 006 031 o r g . a p a c h e .
> 0000020 h a d o o p . i o . T e x t 031 o
> 0000040 r g . a p a c h e . h a d o o p
> 0000060 . i o . T e x t \0 \0 \0 \0 \0 \0 u 265
> 0000100 330 320 252 " \n # ; 374 5 211 V ' 340 376 \0 \0
> 0000120 \0 X \0 \0 \0 037 a p p l e a p p
> 0000140 l e b a n a n a a p p l e
> 0000160 a p p l e 7 c a r r o t c a
> 0000200 r r o t c a r r o t c a r r
> 0000220 o t a p p l e b a n a n a
> 0000240 c a r r o t b a n a n a
> 0000256
>
>
> --
> Thanks,
>
>
--
Best regards,
Re: Reduce output is strange
Posted by Owen O'Malley <om...@apache.org>.
On Tue, Apr 3, 2012 at 8:25 AM, Pedro Costa <ps...@gmail.com> wrote:
> What I want to ask is:
>
> - how do I read the values from sequence files that are block-compressed,
> record-compressed, or uncompressed?
You use the SequenceFile.Reader class. It reads all three kinds the same
way and decompresses as needed.
> - how do I know whether a sequence file is block-compressed,
> record-compressed, or uncompressed?
You use the SequenceFile.Reader class. Its isCompressed() and
isBlockCompressed() methods report this.
>
> - how do I know whether a file is a sequence file or a text file?
SequenceFiles always have "SEQ" followed by the version in the first 4 bytes.
-- Owen
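
As a rough illustration of the header check described above, here is a minimal sketch in plain Java that does not use the Hadoop API. It reads the "SEQ" magic, the version byte, the key and value class names, and the two compression flags from a local copy of the file. The names SeqHeaderPeek, Header, and peek are hypothetical, and the layout is an assumption based on the version 6 dump quoted in this thread (format version 4 or later, class names shorter than 128 bytes). For real work, SequenceFile.Reader remains the right tool.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: peeks at a SequenceFile header.
class SeqHeaderPeek {

    // Plain holder for the header fields this sketch reads.
    static final class Header {
        final int version;
        final String keyClass;
        final String valueClass;
        final boolean compressed;
        final boolean blockCompressed;

        Header(int version, String keyClass, String valueClass,
               boolean compressed, boolean blockCompressed) {
            this.version = version;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
            this.compressed = compressed;
            this.blockCompressed = blockCompressed;
        }
    }

    // Returns null if the stream does not start with "SEQ".
    static Header peek(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        try {
            in.readFully(magic);
        } catch (EOFException e) {
            return null; // shorter than 4 bytes, so not a sequence file
        }
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            return null;
        }
        int version = magic[3];
        if (version < 4) {
            // Assumption: older layouts are out of scope for this sketch.
            throw new IOException("format version " + version + " not handled");
        }
        String keyClass = readShortString(in);
        String valueClass = readShortString(in);
        boolean compressed = in.readBoolean();
        boolean blockCompressed = in.readBoolean();
        return new Header(version, keyClass, valueClass, compressed, blockCompressed);
    }

    // Reads a string whose length fits in one byte (0..127).
    private static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("multi-byte length not handled in this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SeqHeaderPeek <local-file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            Header h = peek(in);
            if (h == null) {
                System.out.println("not a sequence file");
            } else {
                System.out.println("version=" + h.version
                        + " key=" + h.keyClass + " value=" + h.valueClass
                        + " compressed=" + h.compressed
                        + " blockCompressed=" + h.blockCompressed);
            }
        }
    }
}
```

In this layout, a file is record-compressed when compressed is true and blockCompressed is false. The sketch reads a local file, so a file stored in HDFS would first have to be copied out, for example with hadoop fs -get.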
Re: Reduce output is strange
Posted by Pedro Costa <ps...@gmail.com>.
What I want to ask is:
- How do I read the values from sequence files that are block-compressed,
record-compressed, or uncompressed?
- How do I know whether a sequence file is block-compressed,
record-compressed, or uncompressed?
- How do I know whether a file is a sequence file or a text file?
--
Best regards,
Re: Reduce output is strange
Posted by Owen O'Malley <om...@apache.org>.
On Tue, Apr 3, 2012 at 8:01 AM, Pedro Costa <ps...@gmail.com> wrote:
> If I want to compare 2 sequence files to see if they are the same, how do I
> compare?
From the command line, you can "textify" the files with:
hadoop fs -text myfile.seq
Of course, if you are using the API, you can iterate through the two
sequence files and compare them record by record.
-- Owen
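
One way to act on the command-line suggestion above is to redirect the output of hadoop fs -text for each file into a local text file and then compare the two dumps line by line. The sketch below does that comparison in plain Java. The class and method names (TextDumpCompare, firstDifference) are hypothetical, and it assumes both dumps list their records in the same order, one record per line. Comparing the raw sequence files byte for byte is unlikely to work even when the records match, because each file carries its own random sync marker.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: compares two text dumps made with "hadoop fs -text".
class TextDumpCompare {

    // Returns the 1-based number of the first differing line, or 0 if the
    // two inputs have the same lines in the same order.
    static long firstDifference(Reader a, Reader b) throws IOException {
        BufferedReader ra = new BufferedReader(a);
        BufferedReader rb = new BufferedReader(b);
        long line = 0;
        while (true) {
            String la = ra.readLine();
            String lb = rb.readLine();
            line++;
            if (la == null && lb == null) {
                return 0; // both ended together with no mismatch
            }
            if (la == null || !la.equals(lb)) {
                return line; // one input ended early, or the lines differ
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TextDumpCompare <dump-a.txt> <dump-b.txt>");
            return;
        }
        try (Reader a = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             Reader b = Files.newBufferedReader(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            long diff = firstDifference(a, b);
            System.out.println(diff == 0
                    ? "dumps are identical"
                    : "first difference at line " + diff);
        }
    }
}
```

If the two jobs may emit records in a different order, sorting both dumps before comparing them would be a reasonable extra step.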