Posted to user@hadoop.apache.org by Abhijit Sarkar <ab...@gmail.com> on 2013/08/10 22:16:02 UTC

How to compress MapFile programmatically

Hi,
I'm a Hadoop newbie. This is my first question to this mailing list, hoping
for a good start :)

MapFile is a directory so when I try to open an InputStream to it, it fails
with FileNotFoundException. How do I compress MapFile programmatically?

Code snippet:
final FileSystem fs = FileSystem.get(conf);
final InputStream inputStream = fs.open(new Path(uncompressedStr));

Exception:
java.io.FileNotFoundException: /some/directory (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at
org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
at
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
at name.abhijitsarkar.learning.hadoop.io.IOUtils.compress(IOUtils.java:104)

Regards,
Abhijit

Re: How to compress MapFile programmatically

Posted by Harsh J <ha...@cloudera.com>.
A MapFile.Reader will automatically detect and decompress the data without
needing to be told anything special. Generally, you needn't worry about
decompressing files yourself in Apache Hadoop; the framework handles it
transparently if you use the proper APIs.
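Harsh's point can be sketched end to end: write a block-compressed MapFile, then read it back with a plain MapFile.Reader, with no codec mentioned anywhere on the read path. This is illustration only, not code from the thread; the path, keys, and values are made up, and it assumes the Hadoop 1.x-era API that this thread links to:

```java
// Sketch: round-trip through a compressed MapFile. No decompression step
// appears anywhere -- the reader detects the codec from the file header.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;

public class MapFileRoundTrip {

    // Write a small block-compressed MapFile.
    // Note: MapFile keys must be appended in sorted order.
    static void writeSample(String dir) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        MapFile.Writer writer = new MapFile.Writer(conf, fs, dir,
                Text.class, Text.class,
                SequenceFile.CompressionType.BLOCK, new DefaultCodec(), null);
        try {
            writer.append(new Text("key1"), new Text("value1"));
            writer.append(new Text("key2"), new Text("value2"));
        } finally {
            writer.close();
        }
    }

    // Read a value back. No codec is passed here; the reader inspects the
    // underlying SequenceFile header and decompresses transparently.
    static String readValue(String dir, String key) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        try {
            Text value = new Text();
            return reader.get(new Text(key), value) == null ? null : value.toString();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        writeSample("/tmp/mapfile_demo");
        System.out.println(readValue("/tmp/mapfile_demo", "key1"));
    }
}
```

The same readValue call works on a DistributedCache-localized copy of the MapFile directory: point the reader at the local directory path and it handles the compressed data itself.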

On Sun, Aug 11, 2013 at 8:49 PM, Abhijit Sarkar
<ab...@gmail.com> wrote:
> Thanks Harsh. However, if I compress the MapFile using the MapFile.Writer
> Constructor option and then put it in a DistributedCache, how do I
> uncompress it in the Map/Reduce? There isn't any API method to do that
> apparently.
>
> Regards,
> Abhijit
>
>> From: harsh@cloudera.com
>> Date: Sun, 11 Aug 2013 12:56:43 +0530
>> Subject: Re: How to compress MapFile programmatically
>> To: user@hadoop.apache.org
>
>>
>> A MapFile isn't a directory. It is a directory _containing_ two files.
>> You cannot "open" a directory for reading.
>>
>> The MapFile API is documented at
>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/MapFile.html
>> and thats what you're to be using for reading/writing them.
>>
>> Compression is a simple option you need to provide when invoking the
>> writer:
>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/MapFile.Writer.html#MapFile.Writer(org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.fs.FileSystem,%20java.lang.String,%20org.apache.hadoop.io.WritableComparator,%20java.lang.Class,%20org.apache.hadoop.io.SequenceFile.CompressionType,%20org.apache.hadoop.io.compress.CompressionCodec,%20org.apache.hadoop.util.Progressable)
>>
>> On Sun, Aug 11, 2013 at 1:46 AM, Abhijit Sarkar
>> <ab...@gmail.com> wrote:
>> > Hi,
>> > I'm a Hadoop newbie. This is my first question to this mailing list,
>> > hoping
>> > for a good start :)
>> >
>> > MapFile is a directory so when I try to open an InputStream to it, it
>> > fails
>> > with FileNotFoundException. How do I compress MapFile programmatically?
>> >
>> > Code snippet:
>> > final FileSystem fs = FileSystem.get(conf);
>> > final InputStream inputStream = fs.open(new Path(uncompressedStr));
>> >
>> > Exception:
>> > java.io.FileNotFoundException: /some/directory (No such file or
>> > directory)
>> > at java.io.FileInputStream.open(Native Method)
>> > at java.io.FileInputStream.<init>(FileInputStream.java:120)
>> > at
>> >
>> > org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
>> > at
>> >
>> > org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
>> > at
>> > org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
>> > at
>> >
>> > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
>> > at
>> > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
>> > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>> > at
>> > name.abhijitsarkar.learning.hadoop.io.IOUtils.compress(IOUtils.java:104)
>> >
>> > Regards,
>> > Abhijit
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J

RE: How to compress MapFile programmatically

Posted by Abhijit Sarkar <ab...@gmail.com>.
Thanks Harsh. However, if I compress the MapFile using the MapFile.Writer
constructor option and then put it in a DistributedCache, how do I uncompress
it in the map/reduce task? There doesn't appear to be any API method for that.

Regards,
Abhijit

> From: harsh@cloudera.com
> Date: Sun, 11 Aug 2013 12:56:43 +0530
> Subject: Re: How to compress MapFile programmatically
> To: user@hadoop.apache.org
> 
> A MapFile isn't a directory. It is a directory _containing_ two files.
> You cannot "open" a directory for reading.
> 
> The MapFile API is documented at
> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/MapFile.html
> and thats what you're to be using for reading/writing them.
> 
> Compression is a simple option you need to provide when invoking the
> writer: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/MapFile.Writer.html#MapFile.Writer(org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.fs.FileSystem,%20java.lang.String,%20org.apache.hadoop.io.WritableComparator,%20java.lang.Class,%20org.apache.hadoop.io.SequenceFile.CompressionType,%20org.apache.hadoop.io.compress.CompressionCodec,%20org.apache.hadoop.util.Progressable)
> 
> On Sun, Aug 11, 2013 at 1:46 AM, Abhijit Sarkar
> <ab...@gmail.com> wrote:
> > Hi,
> > I'm a Hadoop newbie. This is my first question to this mailing list, hoping
> > for a good start :)
> >
> > MapFile is a directory so when I try to open an InputStream to it, it fails
> > with FileNotFoundException. How do I compress MapFile programmatically?
> >
> > Code snippet:
> > final FileSystem fs = FileSystem.get(conf);
> > final InputStream inputStream = fs.open(new Path(uncompressedStr));
> >
> > Exception:
> > java.io.FileNotFoundException: /some/directory (No such file or directory)
> > at java.io.FileInputStream.open(Native Method)
> > at java.io.FileInputStream.<init>(FileInputStream.java:120)
> > at
> > org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
> > at
> > org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
> > at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
> > at
> > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
> > at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
> > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
> > at name.abhijitsarkar.learning.hadoop.io.IOUtils.compress(IOUtils.java:104)
> >
> > Regards,
> > Abhijit
> 
> 
> 
> -- 
> Harsh J

Re: How to compress MapFile programmatically

Posted by Harsh J <ha...@cloudera.com>.
A MapFile isn't a plain file; it is a directory _containing_ two files (a
data file and its index). You cannot "open" a directory for reading.

The MapFile API is documented at
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/MapFile.html
and that's what you should use for reading and writing them.

Compression is a simple option you need to provide when invoking the
writer: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/MapFile.Writer.html#MapFile.Writer(org.apache.hadoop.conf.Configuration,%20org.apache.hadoop.fs.FileSystem,%20java.lang.String,%20org.apache.hadoop.io.WritableComparator,%20java.lang.Class,%20org.apache.hadoop.io.SequenceFile.CompressionType,%20org.apache.hadoop.io.compress.CompressionCodec,%20org.apache.hadoop.util.Progressable)
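A minimal sketch of that constructor (not from the thread; the path /tmp/compressed.map and the Text key/value classes are assumptions for illustration):

```java
// Sketch: creating a compressed MapFile via the Writer constructor
// (Hadoop 1.x-era API, as in the Javadoc linked above).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;

public class CompressedMapFileWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Compression is chosen here, at construction time; there is no
        // separate "compress the finished MapFile" call.
        MapFile.Writer writer = new MapFile.Writer(conf, fs, "/tmp/compressed.map",
                Text.class, Text.class,
                SequenceFile.CompressionType.BLOCK,  // or RECORD, or NONE
                new DefaultCodec(),                  // zlib; any available codec works
                null);                               // optional Progressable
        try {
            writer.append(new Text("a"), new Text("1"));  // keys in sorted order
        } finally {
            writer.close();
        }
    }
}
```

Reading the result back later needs no codec argument; MapFile.Reader picks the codec up from the file header.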

On Sun, Aug 11, 2013 at 1:46 AM, Abhijit Sarkar
<ab...@gmail.com> wrote:
> Hi,
> I'm a Hadoop newbie. This is my first question to this mailing list, hoping
> for a good start :)
>
> MapFile is a directory so when I try to open an InputStream to it, it fails
> with FileNotFoundException. How do I compress MapFile programmatically?
>
> Code snippet:
> final FileSystem fs = FileSystem.get(conf);
> final InputStream inputStream = fs.open(new Path(uncompressedStr));
>
> Exception:
> java.io.FileNotFoundException: /some/directory (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.<init>(FileInputStream.java:120)
> at
> org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
> at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
> at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
> at
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
> at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
> at name.abhijitsarkar.learning.hadoop.io.IOUtils.compress(IOUtils.java:104)
>
> Regards,
> Abhijit



-- 
Harsh J
