You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@sqoop.apache.org by pratik khadloya <ti...@gmail.com> on 2014/09/16 20:19:58 UTC
Comparing sqoop's output to hdfs to data in mysql
Hello,
I am comparing the mysql data (by dumping into a file) to the textfile
imported by sqoop onto HDFS.
Am using the diff tool to do the same.
I observed the following differences:
mysql --> sqoop_text_output
\\n \n
\\t \n
\$ $
0 0.0
So, it seems like mysql auto escapes the output with a \. I got around that
by telling mysql not to do that so that i can compare properly. I had to
pass the --raw flag to mysql. Then the only difference i currently see is
that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
How can i make mysql also convert the 0 to a 0.0 when it dumps to a csv?
Maybe the answer lines in the guts of sqoop that i can use myself. Or is it
possible to tell sqoop not to convert 0 to 0.0 ?
All in all, i am trying to verify the work done by sqoop for my
satisfaction. Once i verify the text data is being exported fine, i will
verify the same for the parquet format.
Thanks,
~Pratik
Re: Comparing sqoop's output to hdfs to data in mysql
Posted by pratik khadloya <ti...@gmail.com>.
Thanks Abraham. I was hoping to not have to do that since am dealing with
70+ queries.
In the mean time i am writing a script which uses sed to massage the files
and then compares.
Regards,
~Pratik
On Wed, Sep 17, 2014 at 12:18 PM, Abraham Elmahrek <ab...@cloudera.com> wrote:
> Try making your query a free-form query and casting the float to a string.
> Something like the following worked well for me:
>
> SELECT id, text, fl, CAST(fl AS CHAR(64)) FROM fl WHERE $CONDITIONS
>
> On Tue, Sep 16, 2014 at 2:41 PM, pratik khadloya <ti...@gmail.com>
> wrote:
>
>> Sure. The column in question is
>>
>> Field Type Null Default
>> usage_fee_percent double YES 0
>>
>> If we run a mysql select query the value is 0
>> If we cat the file on hdfs which is exported by sqoop the value is 0.0
>>
>> ======= Mysql Command =======
>> mysql -u <user> -p<pwd> -h <host> <db> --raw -e "SELECT
>> a.usage_fee_percent FROM accounts a" >
>> /home/pkhadloya/sqoop_out/mysql_accounts
>>
>> ======= Sqoop Command =======
>> bin/sqoop import -jt <jobtracker> --connect jdbc:mysql://.../<db>
>> --username <user> --password <pwd> --target-dir
>> /user/pkhadloya/sqoop/accounts --delete-target-dir --query "SELECT
>> a.usage_fee_percent FROM accounts a WHERE \$CONDITIONS" --num-mappers 1
>> --mapreduce-job-name accounts_sqoop_import --fields-terminated-by "\t"
>> --as-textfile
>>
>> ======= Diff Command =======
>> bash -c "diff -U 0 <(tail -n +2 /home/pkhadloya/sqoop_out/mysql_accounts)
>> <(hadoop fs -cat /user/pkhadloya/sqoop/accounts/part-m-00*) >
>> /home/pkhadloya/sqoop_out/diff_accounts"
>>
>>
>> Thanks for looking into this.
>>
>> Regards,
>> Pratik
>>
>>
>> On Tue, Sep 16, 2014 at 2:11 PM, Abraham Elmahrek <ab...@cloudera.com>
>> wrote:
>>
>>> Hey there,
>>>
>>> Could you provide us with the table description (types) and the sqoop
>>> command you are running?
>>>
>>> -Abe
>>>
>>>
>>> On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <ti...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am comparing the mysql data (by dumping into a file) to the textfile
>>>> imported by sqoop onto HDFS.
>>>> Am using the diff tool to do the same.
>>>>
>>>> I observed the following differences:
>>>> mysql --> sqoop_text_output
>>>> \\n \n
>>>> \\t \n
>>>> \$ $
>>>> 0 0.0
>>>>
>>>> So, it seems like mysql auto escapes the output with a \. I got around
>>>> that by telling mysql not to do that so that i can compare properly. I had
>>>> to pass the --raw flag to mysql. Then the only difference i currently see
>>>> is that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
>>>>
>>>> How can i make mysql also convert the 0 to a 0.0 when it dumps to a
>>>> csv? Maybe the answer lines in the guts of sqoop that i can use myself. Or
>>>> is it possible to tell sqoop not to convert 0 to 0.0 ?
>>>>
>>>> All in all, i am trying to verify the work done by sqoop for my
>>>> satisfaction. Once i verify the text data is being exported fine, i will
>>>> verify the same for the parquet format.
>>>>
>>>> Thanks,
>>>> ~Pratik
>>>>
>>>
>>>
>>
>
Re: Comparing sqoop's output to hdfs to data in mysql
Posted by Abraham Elmahrek <ab...@cloudera.com>.
Try making your query a free-form query and casting the float to a string.
Something like the following worked well for me:
SELECT id, text, fl, CAST(fl AS CHAR(64)) FROM fl WHERE $CONDITIONS
On Tue, Sep 16, 2014 at 2:41 PM, pratik khadloya <ti...@gmail.com>
wrote:
> Sure. The column in question is
>
> Field Type Null Default
> usage_fee_percent double YES 0
>
> If we run a mysql select query the value is 0
> If we cat the file on hdfs which is exported by sqoop the value is 0.0
>
> ======= Mysql Command =======
> mysql -u <user> -p<pwd> -h <host> <db> --raw -e "SELECT
> a.usage_fee_percent FROM accounts a" >
> /home/pkhadloya/sqoop_out/mysql_accounts
>
> ======= Sqoop Command =======
> bin/sqoop import -jt <jobtracker> --connect jdbc:mysql://.../<db>
> --username <user> --password <pwd> --target-dir
> /user/pkhadloya/sqoop/accounts --delete-target-dir --query "SELECT
> a.usage_fee_percent FROM accounts a WHERE \$CONDITIONS" --num-mappers 1
> --mapreduce-job-name accounts_sqoop_import --fields-terminated-by "\t"
> --as-textfile
>
> ======= Diff Command =======
> bash -c "diff -U 0 <(tail -n +2 /home/pkhadloya/sqoop_out/mysql_accounts)
> <(hadoop fs -cat /user/pkhadloya/sqoop/accounts/part-m-00*) >
> /home/pkhadloya/sqoop_out/diff_accounts"
>
>
> Thanks for looking into this.
>
> Regards,
> Pratik
>
>
> On Tue, Sep 16, 2014 at 2:11 PM, Abraham Elmahrek <ab...@cloudera.com>
> wrote:
>
>> Hey there,
>>
>> Could you provide us with the table description (types) and the sqoop
>> command you are running?
>>
>> -Abe
>>
>>
>> On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <ti...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I am comparing the mysql data (by dumping into a file) to the textfile
>>> imported by sqoop onto HDFS.
>>> Am using the diff tool to do the same.
>>>
>>> I observed the following differences:
>>> mysql --> sqoop_text_output
>>> \\n \n
>>> \\t \n
>>> \$ $
>>> 0 0.0
>>>
>>> So, it seems like mysql auto escapes the output with a \. I got around
>>> that by telling mysql not to do that so that i can compare properly. I had
>>> to pass the --raw flag to mysql. Then the only difference i currently see
>>> is that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
>>>
>>> How can i make mysql also convert the 0 to a 0.0 when it dumps to a csv?
>>> Maybe the answer lines in the guts of sqoop that i can use myself. Or is it
>>> possible to tell sqoop not to convert 0 to 0.0 ?
>>>
>>> All in all, i am trying to verify the work done by sqoop for my
>>> satisfaction. Once i verify the text data is being exported fine, i will
>>> verify the same for the parquet format.
>>>
>>> Thanks,
>>> ~Pratik
>>>
>>
>>
>
Re: Comparing sqoop's output to hdfs to data in mysql
Posted by pratik khadloya <ti...@gmail.com>.
Sure. The column in question is
Field Type Null Default
usage_fee_percent double YES 0
If we run a mysql select query the value is 0
If we cat the file on hdfs which is exported by sqoop the value is 0.0
======= Mysql Command =======
mysql -u <user> -p<pwd> -h <host> <db> --raw -e "SELECT a.usage_fee_percent
FROM accounts a" > /home/pkhadloya/sqoop_out/mysql_accounts
======= Sqoop Command =======
bin/sqoop import -jt <jobtracker> --connect jdbc:mysql://.../<db>
--username <user> --password <pwd> --target-dir
/user/pkhadloya/sqoop/accounts --delete-target-dir --query "SELECT
a.usage_fee_percent FROM accounts a WHERE \$CONDITIONS" --num-mappers 1
--mapreduce-job-name accounts_sqoop_import --fields-terminated-by "\t"
--as-textfile
======= Diff Command =======
bash -c "diff -U 0 <(tail -n +2 /home/pkhadloya/sqoop_out/mysql_accounts)
<(hadoop fs -cat /user/pkhadloya/sqoop/accounts/part-m-00*) >
/home/pkhadloya/sqoop_out/diff_accounts"
Thanks for looking into this.
Regards,
Pratik
On Tue, Sep 16, 2014 at 2:11 PM, Abraham Elmahrek <ab...@cloudera.com> wrote:
> Hey there,
>
> Could you provide us with the table description (types) and the sqoop
> command you are running?
>
> -Abe
>
>
> On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <ti...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I am comparing the mysql data (by dumping into a file) to the textfile
>> imported by sqoop onto HDFS.
>> Am using the diff tool to do the same.
>>
>> I observed the following differences:
>> mysql --> sqoop_text_output
>> \\n \n
>> \\t \n
>> \$ $
>> 0 0.0
>>
>> So, it seems like mysql auto escapes the output with a \. I got around
>> that by telling mysql not to do that so that i can compare properly. I had
>> to pass the --raw flag to mysql. Then the only difference i currently see
>> is that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
>>
>> How can i make mysql also convert the 0 to a 0.0 when it dumps to a csv?
>> Maybe the answer lines in the guts of sqoop that i can use myself. Or is it
>> possible to tell sqoop not to convert 0 to 0.0 ?
>>
>> All in all, i am trying to verify the work done by sqoop for my
>> satisfaction. Once i verify the text data is being exported fine, i will
>> verify the same for the parquet format.
>>
>> Thanks,
>> ~Pratik
>>
>
>
Re: Comparing sqoop's output to hdfs to data in mysql
Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey there,
Could you provide us with the table description (types) and the sqoop
command you are running?
-Abe
On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <ti...@gmail.com>
wrote:
> Hello,
>
> I am comparing the mysql data (by dumping into a file) to the textfile
> imported by sqoop onto HDFS.
> Am using the diff tool to do the same.
>
> I observed the following differences:
> mysql --> sqoop_text_output
> \\n \n
> \\t \n
> \$ $
> 0 0.0
>
> So, it seems like mysql auto escapes the output with a \. I got around
> that by telling mysql not to do that so that i can compare properly. I had
> to pass the --raw flag to mysql. Then the only difference i currently see
> is that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
>
> How can i make mysql also convert the 0 to a 0.0 when it dumps to a csv?
> Maybe the answer lines in the guts of sqoop that i can use myself. Or is it
> possible to tell sqoop not to convert 0 to 0.0 ?
>
> All in all, i am trying to verify the work done by sqoop for my
> satisfaction. Once i verify the text data is being exported fine, i will
> verify the same for the parquet format.
>
> Thanks,
> ~Pratik
>