You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Rahul Joshi <vi...@gmail.com> on 2013/11/06 19:09:35 UTC

Handling CLOBs in Sqoop - Hive Import

Hi,



We are trying to use Sqoop for importing data from Oracle. The table has
CLOB as one of its column type which contains newline characters at many
places. Tried using --hive-drop- import-delims option but somehow it’s not
working. The data still contains newlines, and so Hive table doesn’t read
them properly. Found that this works with SQL Server tables smoothly. The
table / commands / sqoop options are more or less similar (except
connection strings etc), not sure why it’s not working with Oracle. In case
of import from Oracle,  the delims are not getting droped where for SQL
Server, it modifies the data on HDFS.



Sqoop also has a way to treat CLOB as external file (setting --inline-lob-limit
to 0), wanted to know how this can be used along with Hive. Could import
the data using this option, but the import fails if this option is used
with –hive-import option. Is there any known way of dealing with such
external CLOB data in Hive?



Please let us know if anyone has any suggestions.



Regards,

--Rahul Joshi.

Re: Handling CLOBs in Sqoop - Hive Import

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Rahul,
sadly the parameter --hive-drop-import-delims is applicable only for String based types (such as CHAR, VARCHAR, NCHAR, ...), it's not applicable to CLOB. To workaround this, you can re-type the CLOB field into String using --map-column-java parameter, such as:

  sqoop import --map-column-java $columnName=String

Jarcec

On Wed, Nov 06, 2013 at 01:15:50PM -0500, Rahul Joshi wrote:
> Hi,
> 
> 
> 
> We are trying to use Sqoop for importing data from Oracle. The table has
> CLOB as one of its column type which contains newline characters at many
> places. Tried using --hive-drop- import-delims option but somehow it’s not
> working. The data still contains newlines, and so Hive table doesn’t read
> them properly. Found that this works with SQL Server tables smoothly. The
> table / commands / sqoop options are more or less similar (except
> connection strings etc), not sure why it’s not working with Oracle. In case
> of import from Oracle,  the delims are not getting droped where for SQL
> Server, it modifies the data on HDFS.
> 
> 
> 
> Sqoop also has a way to treat CLOB as external file (setting --inline-lob-limit
> to 0), wanted to know how this can be used along with Hive. Could import
> the data using this option, but the import fails if this option is used
> with –hive-import option. Is there any known way of dealing with such
> external CLOB data in Hive?
> 
> 
> 
> Please let us know if anyone has any suggestions.
> 
> 
> 
> Regards,
> 
> --Rahul Joshi.
> 
> 
> On 6 November 2013 13:09, Rahul Joshi <vi...@gmail.com> wrote:
> 
> > Hi,
> >
> >
> >
> > We are trying to use Sqoop for importing data from Oracle. The table has
> > CLOB as one of its column type which contains newline characters at many
> > places. Tried using --hive-drop- import-delims option but somehow it’s not
> > working. The data still contains newlines, and so Hive table doesn’t read
> > them properly. Found that this works with SQL Server tables smoothly. The
> > table / commands / sqoop options are more or less similar (except
> > connection strings etc), not sure why it’s not working with Oracle. In case
> > of import from Oracle,  the delims are not getting droped where for SQL
> > Server, it modifies the data on HDFS.
> >
> >
> >
> > Sqoop also has a way to treat CLOB as external file (setting --inline-lob-limit
> > to 0), wanted to know how this can be used along with Hive. Could import
> > the data using this option, but the import fails if this option is used
> > with –hive-import option. Is there any known way of dealing with such
> > external CLOB data in Hive?
> >
> >
> >
> > Please let us know if anyone has any suggestions.
> >
> >
> >
> > Regards,
> >
> > --Rahul Joshi.
> >
> 
> 
> 
> -- 
> Regards,
> Rahul R. Joshi.

Re: Handling CLOBs in Sqoop - Hive Import

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Rahul,
sadly the parameter --hive-drop-import-delims is applicable only for String based types (such as CHAR, VARCHAR, NCHAR, ...), it's not applicable to CLOB. To workaround this, you can re-type the CLOB field into String using --map-column-java parameter, such as:

  sqoop import --map-column-java $columnName=String

Jarcec

On Wed, Nov 06, 2013 at 01:15:50PM -0500, Rahul Joshi wrote:
> Hi,
> 
> 
> 
> We are trying to use Sqoop for importing data from Oracle. The table has
> CLOB as one of its column type which contains newline characters at many
> places. Tried using --hive-drop- import-delims option but somehow it’s not
> working. The data still contains newlines, and so Hive table doesn’t read
> them properly. Found that this works with SQL Server tables smoothly. The
> table / commands / sqoop options are more or less similar (except
> connection strings etc), not sure why it’s not working with Oracle. In case
> of import from Oracle,  the delims are not getting droped where for SQL
> Server, it modifies the data on HDFS.
> 
> 
> 
> Sqoop also has a way to treat CLOB as external file (setting --inline-lob-limit
> to 0), wanted to know how this can be used along with Hive. Could import
> the data using this option, but the import fails if this option is used
> with –hive-import option. Is there any known way of dealing with such
> external CLOB data in Hive?
> 
> 
> 
> Please let us know if anyone has any suggestions.
> 
> 
> 
> Regards,
> 
> --Rahul Joshi.
> 
> 
> On 6 November 2013 13:09, Rahul Joshi <vi...@gmail.com> wrote:
> 
> > Hi,
> >
> >
> >
> > We are trying to use Sqoop for importing data from Oracle. The table has
> > CLOB as one of its column type which contains newline characters at many
> > places. Tried using --hive-drop- import-delims option but somehow it’s not
> > working. The data still contains newlines, and so Hive table doesn’t read
> > them properly. Found that this works with SQL Server tables smoothly. The
> > table / commands / sqoop options are more or less similar (except
> > connection strings etc), not sure why it’s not working with Oracle. In case
> > of import from Oracle,  the delims are not getting droped where for SQL
> > Server, it modifies the data on HDFS.
> >
> >
> >
> > Sqoop also has a way to treat CLOB as external file (setting --inline-lob-limit
> > to 0), wanted to know how this can be used along with Hive. Could import
> > the data using this option, but the import fails if this option is used
> > with –hive-import option. Is there any known way of dealing with such
> > external CLOB data in Hive?
> >
> >
> >
> > Please let us know if anyone has any suggestions.
> >
> >
> >
> > Regards,
> >
> > --Rahul Joshi.
> >
> 
> 
> 
> -- 
> Regards,
> Rahul R. Joshi.

Re: Handling CLOBs in Sqoop - Hive Import

Posted by Rahul Joshi <vi...@gmail.com>.
Hi,



We are trying to use Sqoop for importing data from Oracle. The table has
CLOB as one of its column type which contains newline characters at many
places. Tried using --hive-drop- import-delims option but somehow it’s not
working. The data still contains newlines, and so Hive table doesn’t read
them properly. Found that this works with SQL Server tables smoothly. The
table / commands / sqoop options are more or less similar (except
connection strings etc), not sure why it’s not working with Oracle. In case
of import from Oracle,  the delims are not getting droped where for SQL
Server, it modifies the data on HDFS.



Sqoop also has a way to treat CLOB as external file (setting --inline-lob-limit
to 0), wanted to know how this can be used along with Hive. Could import
the data using this option, but the import fails if this option is used
with –hive-import option. Is there any known way of dealing with such
external CLOB data in Hive?



Please let us know if anyone has any suggestions.



Regards,

--Rahul Joshi.


On 6 November 2013 13:09, Rahul Joshi <vi...@gmail.com> wrote:

> Hi,
>
>
>
> We are trying to use Sqoop for importing data from Oracle. The table has
> CLOB as one of its column type which contains newline characters at many
> places. Tried using --hive-drop- import-delims option but somehow it’s not
> working. The data still contains newlines, and so Hive table doesn’t read
> them properly. Found that this works with SQL Server tables smoothly. The
> table / commands / sqoop options are more or less similar (except
> connection strings etc), not sure why it’s not working with Oracle. In case
> of import from Oracle,  the delims are not getting droped where for SQL
> Server, it modifies the data on HDFS.
>
>
>
> Sqoop also has a way to treat CLOB as external file (setting --inline-lob-limit
> to 0), wanted to know how this can be used along with Hive. Could import
> the data using this option, but the import fails if this option is used
> with –hive-import option. Is there any known way of dealing with such
> external CLOB data in Hive?
>
>
>
> Please let us know if anyone has any suggestions.
>
>
>
> Regards,
>
> --Rahul Joshi.
>



-- 
Regards,
Rahul R. Joshi.

Re: Handling CLOBs in Sqoop - Hive Import

Posted by Rahul Joshi <vi...@gmail.com>.
Hi,



We are trying to use Sqoop for importing data from Oracle. The table has
CLOB as one of its column type which contains newline characters at many
places. Tried using --hive-drop- import-delims option but somehow it’s not
working. The data still contains newlines, and so Hive table doesn’t read
them properly. Found that this works with SQL Server tables smoothly. The
table / commands / sqoop options are more or less similar (except
connection strings etc), not sure why it’s not working with Oracle. In case
of import from Oracle,  the delims are not getting droped where for SQL
Server, it modifies the data on HDFS.



Sqoop also has a way to treat CLOB as external file (setting --inline-lob-limit
to 0), wanted to know how this can be used along with Hive. Could import
the data using this option, but the import fails if this option is used
with –hive-import option. Is there any known way of dealing with such
external CLOB data in Hive?



Please let us know if anyone has any suggestions.



Regards,

--Rahul Joshi.


On 6 November 2013 13:09, Rahul Joshi <vi...@gmail.com> wrote:

> Hi,
>
>
>
> We are trying to use Sqoop for importing data from Oracle. The table has
> CLOB as one of its column type which contains newline characters at many
> places. Tried using --hive-drop- import-delims option but somehow it’s not
> working. The data still contains newlines, and so Hive table doesn’t read
> them properly. Found that this works with SQL Server tables smoothly. The
> table / commands / sqoop options are more or less similar (except
> connection strings etc), not sure why it’s not working with Oracle. In case
> of import from Oracle,  the delims are not getting droped where for SQL
> Server, it modifies the data on HDFS.
>
>
>
> Sqoop also has a way to treat CLOB as external file (setting --inline-lob-limit
> to 0), wanted to know how this can be used along with Hive. Could import
> the data using this option, but the import fails if this option is used
> with –hive-import option. Is there any known way of dealing with such
> external CLOB data in Hive?
>
>
>
> Please let us know if anyone has any suggestions.
>
>
>
> Regards,
>
> --Rahul Joshi.
>



-- 
Regards,
Rahul R. Joshi.