You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@sqoop.apache.org by su...@bt.com on 2015/08/28 12:40:51 UTC

Unzipping an SQL Server DB column data while using SQOOP

Hello Everyone,

I have the below requirement to do


1)      We have a DB in SQL Server.

2)      EVENTS related transactional data is getting stored in EVENTS table of DB. A part of event related information is being generated as an xml (one for each event) and is being stored in one of the columns (of type VARBINARY) of EVENTS table.

Note: Interesting part here is that all xmls are being zipped with gzip compression codec, and only then being stored in one of the columns of EVENTS table.

3)      Now that we have decided to SQOOP the data to HDFS from EVENTS table, to generate some reports on it, we are totally clueless on how to proceed on this requirement.

I have tried SQOOPing the data as it is into HDFS as a default text format. All the columns are in human readable format except that VARBINARY column (where the zipped xml is stored).

So, can someone help me in this?

Ø  How can I unzip that zipped xml data? Can I do it in a single SQOOP command?

Ø  I have not tried but will it work if I SQOOP to a sequence file?

Please help me...I am struggling to start on this requirement  from past couple of days.


Thanks
Suman


Re: Unzipping an SQL Server DB column data while using SQOOP

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey man,

Currently Sqoop doesn't have any ability to unzip a single column. This is
a feature that seems fit for Sqoop2 actually. Until then, I'd try using
Hive or Impala and write a custom UDF to help with column extraction.

-Abe

On Fri, Aug 28, 2015 at 3:40 AM <su...@bt.com> wrote:

> *Hello Everyone,*
>
>
>
> I have the below requirement to do
>
>
>
> 1)      We have a DB in SQL Server.
>
> 2)      *EVENTS* related transactional data is getting stored in *EVENTS*
> table of DB. A part of event related information is being generated as an
> *xml* (one for each event) and is being stored in one of the columns (of
> type *VARBINARY*) of *EVENTS* table.
>
> *Note:* Interesting part here is that all xmls are being zipped with *gzip
> *compression codec, and only then being stored in one of the columns of *EVENTS
> *table.
>
> 3)      Now that we have decided to *SQOOP* the data to *HDFS* from
> *EVENTS* table, to generate some reports on it, we are totally clueless
> on how to proceed on this requirement.
>
>
>
> I have tried SQOOPing the data as it is into HDFS as a default text
> format. All the columns are in human readable format except that VARBINARY
> column (where the zipped xml is stored).
>
>
>
> So, can someone help me in this?
>
> Ø  How can I unzip that zipped xml data? Can I do it in a single SQOOP
> command?
>
> Ø  I have not tried but will it work if I SQOOP to a *sequence* file?
>
>
>
> *Please help me…I am struggling to start on this requirement  from past
> couple of days.*
>
>
>
>
>
> *Thanks*
>
> *Suman*
>
>
>