Posted to user@sqoop.apache.org by Marcus Truscello <ma...@gmail.com> on 2015/12/17 18:49:42 UTC

--hive-import with --fields-terminated-by value over 127

This isn't so much a bug report as a feature request.

With Sqoop, one can specify a --fields-terminated-by value greater than 127
using octal notation, and it works correctly: the resulting file has the
correct delimiter.

However, if you also include the --hive-import option, the same delimiter
causes an error during the import into Hive, even though the file itself
still has the correct delimiter.  This is the region of code responsible for
the error:
https://github.com/apache/sqoop/blob/f19e2a523579db8c28a96febfd3cf35a5d58adc6/src/java/org/apache/sqoop/hive/TableDefWriter.java#L278-L300

However, Hive does support delimiters with character codes between 128 and
255, just not in the octal escape form.  Instead, they must be specified as
negative values (two's complement, signed char).  For example, character 254
would normally be written in octal as FIELDS TERMINATED BY '\0376', which is
an error in Hive, but FIELDS TERMINATED BY '-2' works correctly.
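A minimal Java sketch of the arithmetic in the example above: octal 376 is decimal 254, and 254 reinterpreted as a signed 8-bit value (two's complement) is -2, i.e. 254 - 256. The class and method names (DelimiterArithmetic, fromOctal, toSignedChar) are hypothetical and exist only for illustration; they are not part of Sqoop or Hive.

```java
// Hypothetical illustration, not Sqoop code: shows why '\0376' and '-2'
// denote the same single-byte delimiter.
final class DelimiterArithmetic {

    // Parse octal digits such as "376" (without the leading backslash).
    static int fromOctal(String digits) {
        return Integer.parseInt(digits, 8);
    }

    // Reinterpret an unsigned byte value (0-255) as a signed char (-128..127).
    static int toSignedChar(int unsignedValue) {
        if (unsignedValue < 0 || unsignedValue > 255) {
            throw new IllegalArgumentException("Not a single-byte value: " + unsignedValue);
        }
        // The narrowing cast keeps the low 8 bits; values >= 128 become value - 256.
        return (byte) unsignedValue;
    }

    public static void main(String[] args) {
        int code = fromOctal("376");          // 3*64 + 7*8 + 6
        int signed = toSignedChar(code);
        System.out.println(code + " -> " + signed);             // prints "254 -> -2"
        // Going back: the signed value maps to the same unsigned byte.
        System.out.println(Byte.toUnsignedInt((byte) signed)); // prints "254"
    }
}
```

Values from 0 to 127 are unchanged by the cast, which is why only the 128-255 range needs the negative form.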

I believe that Sqoop's --hive-import option should convert the
--fields-terminated-by value into a form usable by Hive even if the value
is greater than 127.  Values greater than 255 should probably still be an
error.
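The proposed behavior could be sketched as a single formatting helper. This is a hypothetical sketch, not Sqoop's actual code: the names HiveDelimiterSketch, toHiveDelimiter and fieldsTerminatedBy are invented here, and keeping a three-digit octal escape for values 0-127 is an assumption about what Sqoop already emits for that range.

```java
// Hypothetical sketch of the requested --hive-import behavior (not Sqoop's API).
final class HiveDelimiterSketch {

    // Format a delimiter character code for a Hive FIELDS TERMINATED BY clause.
    static String toHiveDelimiter(int charNum) {
        if (charNum < 0 || charNum > 255) {
            // Values that do not fit in a single byte remain an error.
            throw new IllegalArgumentException(
                "Character " + charNum + " is an out-of-range delimiter");
        }
        if (charNum <= 127) {
            // Assumed existing form for 7-bit values: a three-digit octal escape, e.g. \001.
            return String.format("\\%03o", charNum);
        }
        // 128-255: the signed (two's complement) decimal value, e.g. 254 -> "-2".
        return Integer.toString((byte) charNum);
    }

    static String fieldsTerminatedBy(int charNum) {
        return "FIELDS TERMINATED BY '" + toHiveDelimiter(charNum) + "'";
    }

    public static void main(String[] args) {
        System.out.println(fieldsTerminatedBy(1));   // FIELDS TERMINATED BY '\001'
        System.out.println(fieldsTerminatedBy(254)); // FIELDS TERMINATED BY '-2'
    }
}
```

Branching on the value rather than always emitting the signed form leaves the output for 0-127 as it is today and changes only the range that currently fails.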


Thanks for your time and consideration.
-Marcus

Re: --hive-import with --fields-terminated-by value over 127

Posted by Marcus Truscello <ma...@gmail.com>.
I can absolutely try!  I was just hoping to get a read on whether this would
be considered a worthwhile change to pursue or whether it would be considered
"working as intended".
Regardless, I'll open an issue in JIRA and see where it goes from there.

On Fri, Dec 18, 2015 at 1:25 AM, Jarek Jarcec Cecho <ja...@apache.org>
wrote:

> Can you create a JIRA Marcus?
>
> Jarcec

Re: --hive-import with --fields-terminated-by value over 127

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Can you create a JIRA Marcus?

Jarcec
