Posted to users@apex.apache.org by chiranjeevi vasupilli <ch...@gmail.com> on 2016/10/03 07:08:40 UTC

Reading compressed file using FileSplitter

Hi Team,

Can you please point me to a reader/operator that is capable of reading
compressed data in DataTorrent?

I have a requirement to read .snappy files that use Ctrl+A as the separator,
using FileSplitter. Can you please let me know how to do it?


-- 
thanks
chiru
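
For readers unfamiliar with the separator mentioned above: Ctrl+A is the
ASCII SOH control character (code point U+0001). Once a record has been
decompressed, splitting it into fields is ordinary string handling. The
snippet below is only a sketch; the class and method names are made up
for illustration and are not part of any Apex or DataTorrent API.

```java
import java.util.Arrays;
import java.util.List;

final class CtrlARecord {
    // Ctrl+A is the ASCII SOH control character, code point U+0001.
    static final char CTRL_A = '\u0001';

    /** Splits one record into fields, keeping empty trailing fields. */
    static List<String> fields(String record) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        return Arrays.asList(record.split(String.valueOf(CTRL_A), -1));
    }

    public static void main(String[] args) {
        String record = "42" + CTRL_A + "chiru" + CTRL_A;
        System.out.println(fields(record)); // [42, chiru, ]
    }
}
```

The negative limit matters when the last column of a record can be empty;
with the default limit, `split` would silently drop it.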

Re: Reading compressed file using FileSplitter

Posted by Vlad Rozov <v....@datatorrent.com>.
Another option is https://github.com/dain/snappy.

Thank you,

Vlad



Re: Reading compressed file using FileSplitter

Posted by Priyanka Gugale <pr...@apache.org>.
It looks like Apache Commons Compress has read-only support for Snappy, so
if your use case is just to decompress, you can try it out. There is an
example on this page:
http://commons.apache.org/proper/commons-compress/examples.html

If this doesn't fulfill your requirement, look at the Java wrapper written
by xerial: https://github.com/xerial/snappy-java

-Priyanka
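
As a rough illustration of the "just decompress" use case: snappy-java
offers a one-shot call, `Snappy.uncompress(byte[])`, that takes a complete
compressed buffer and returns the decompressed bytes. The sketch below
shows the same call shape using only the JDK, with `java.util.zip.Inflater`
(zlib) standing in for Snappy so that it runs without extra dependencies.
The class and method names are hypothetical, and the format is zlib, not
Snappy; with a real Snappy library on the classpath, its uncompress call
would take the place of `decompress`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class BlockDecompressSketch {
    /** Decompresses a complete zlib buffer; a stand-in for a Snappy uncompress call. */
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read means the input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // release native resources
        }
    }

    /** Helper used only to produce sample input for the demo. */
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] raw = "id\u0001name\n".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decompress(compress(raw));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8).equals("id\u0001name\n"));
    }
}
```

A one-shot call like this needs the whole compressed buffer in memory, so
it suits small files or self-contained compressed chunks; for large files
a streaming wrapper is usually the better fit.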


Re: Reading compressed file using FileSplitter

Posted by chiranjeevi vasupilli <ch...@gmail.com>.
Thank you Priyanka,

We are not using any Snappy library for decompression yet. Can you please
suggest a library and version so that we can try to implement it?



-- 
ur's
chiru

Re: Reading compressed file using FileSplitter

Posted by Priyanka Gugale <pr...@apache.org>.
Hi Chiranjeevi,

There is no direct support in the current operators for decompressing data
read from a file, but you can do it in one of the following ways:
1. Extend AbstractBlockReader to use the right stream type by implementing
the `setupStream` function to initialize the right stream reader class,
e.g. GZIPInputStream if your input is in gzip format or, in your case,
"SnappyInputStream".
2. Override `readBlock` from AbstractBlockReader, call decompress on the
input data using the Snappy Java API, and then emit the data.

I would suggest option one, but what is achievable depends on which
Snappy Java library you use. Can you tell us which library you are using?

-Priyanka
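
To make option 1 above more concrete, here is a minimal sketch of the
idea: the read loop stays the same and only the hook that sets up the
stream is overridden to wrap the raw input in a decompressing stream.
This is NOT the Apex AbstractBlockReader API; `PlainReader`, `GzipReader`
and `readRecords` are hypothetical stand-ins, the hook merely borrows the
`setupStream` name from the message, and `GZIPInputStream` from the JDK is
used in place of a Snappy stream class so the example runs without extra
dependencies.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class BlockReaderSketch {
    /** Hypothetical reader with an overridable stream hook. */
    static class PlainReader {
        /** Hook that subclasses override to choose the stream type. */
        protected InputStream setupStream(InputStream raw) throws IOException {
            return raw;
        }

        /** Reads every line from the (possibly wrapped) stream as one record. */
        List<String> readRecords(InputStream raw) throws IOException {
            List<String> records = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(setupStream(raw), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    records.add(line);
                }
            }
            return records;
        }
    }

    /** Only the stream setup changes; the read loop is inherited. */
    static class GzipReader extends PlainReader {
        @Override
        protected InputStream setupStream(InputStream raw) throws IOException {
            // Assumption: a Snappy stream class from the chosen library would go here.
            return new GZIPInputStream(raw);
        }
    }

    /** Helper used only to produce sample input for the demo. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = gzip("1\u0001alpha\n2\u0001beta\n");
        List<String> records = new GzipReader().readRecords(new ByteArrayInputStream(file));
        System.out.println(records.size()); // 2
    }
}
```

Keeping the decompression inside the stream hook means the record-reading
code never needs to know whether the source was compressed, which is
presumably why option 1 is the one suggested.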


Re: Reading compressed file using FileSplitter

Posted by chiranjeevi vasupilli <ch...@gmail.com>.
Hi Priyanka,

We are getting compressed files from the source, which we need to read and
decompress so that we can process the actual data.

Can you please point us to any reader/operator that is readily available
to decompress the data while reading it in DataTorrent?



-- 
ur's
chiru

Re: Reading compressed file using FileSplitter

Posted by Priyanka Gugale <pr...@apache.org>.
Hi,

Do you want to read the files in compressed form only, or do you want your
program to decompress and read them?
If you want to read them in compressed form, you can use FSInputModule
(which uses FileSplitter and a block reader) directly to read your files.
If you want to decompress while reading, there are other options you can
choose. I will explain in detail once you confirm that this is what you are
trying to achieve.

-Priyanka
