Posted to user@sqoop.apache.org by Anandha L Ranganathan <an...@gmail.com> on 2012/12/01 03:15:02 UTC

Sqoop import - column removed in the source DB and all the records are off by 1 column.

In my Sqoop import, one of the columns in the source table was deleted, and
that caused a data issue: the imported data are now off by one column.
The removed column was in the middle of the schema; if it had been the last
column, there would be no worries.

Data is imported from MySQL to Hive using Sqoop. I am using sqoop-1.3.0.

Here is the syntax.

sqoop import     --hive-import
                --options-file 'credential.txt'
                --table 'TABLENAME '
                --where 'created between 1353960000000 and 1353963600000'
                --hive-partition-key part
                --hive-partition-value 'PARTITION_VALUE'
                --hive-overwrite
                --hive-delims-replacement



Now the problem is that one of the columns in the source DB was removed.

I tried to work around it by including the --columns option:

 1) By hardcoding the third column with quotes:
        --columns "col1,col2,'col3' as col3,col4"

but this gives the error: Column name 'col3' not in table


2) Then I tried repeating col2 twice:
   --columns " col1,col2, col2 , col4"

It threw an error:
   Imported Failed: Duplicate Column identifier specified:


3) Then I tried aliasing col2 as col3:
   --columns " col1,col2, col2 as col3, col4"

ERROR tool.ImportTool: Imported Failed: Column name 'authid uuid' not in
table

Could anybody suggest a workaround for this?

Thanks
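[Editorial aside: the off-by-one effect described above can be reproduced with plain shell tools, independent of Sqoop. The field values below ("a", "b", "c", "d") are made up purely for illustration; only the mechanism matches the thread.]

```shell
# A 4-column Hive schema expects fields: col1,col2,col3,col4.
# Once col3 is dropped at the source, each exported row carries only
# 3 fields, so col4's value slides into col3's position.
row_before="a,b,c,d"   # a row exported while col3 still existed
row_after="a,b,d"      # the same row exported after col3 was dropped

# Reading field 3 of each row shows the shift:
echo "$row_before" | cut -d, -f3   # prints: c  (correct value for col3)
echo "$row_after"  | cut -d, -f3   # prints: d  (col4's value, off by one)
```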

Re: Sqoop import - column removed in the source DB and all the records are off by 1 column.

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
I see; thank you for your feedback, Anandha.

I believe that your Hive import into a specific partition should work.

Jarcec

On Sun, Dec 02, 2012 at 11:10:36AM -0800, Anandha L Ranganathan wrote:

Re: Sqoop import - column removed in the source DB and all the records are off by 1 column.

Posted by Anandha L Ranganathan <an...@gmail.com>.
There are a couple of options I had to specify explicitly to make it work
with the --query option: 1) --target-dir 2) --split-by (or -m).

We import the data into Hive in hourly partitions.


sqoop import --options-file "credential.txt"
             --query "select col1, col2, 'col3' as col3, col10 from TABLE
                      where created between 1353960000000 and 1353963600000
                      and \$CONDITIONS"
             --target-dir='/tmp/TABLENAME/12-01-2002-00-00-00'
             --hive-import
             --hive-partition-key part
             --hive-partition-value 'PARTITION_VALUE'
             --hive-overwrite
             --hive-delims-replacement
             --split-by created   (or)   -m 1


Previously we were using the --table option, so we didn't have to specify
column names, the target directory, or the split-by option.

With this approach the existing schema in Hive stays intact and no changes
are required.



On Sun, Dec 2, 2012 at 9:11 AM, Jarek Jarcec Cecho <ja...@apache.org> wrote:

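[Editorial aside: pulling the working approach together, the sketch below assembles the full command line as a single string without executing it, since running it needs a live MySQL and Hive. All table, column, and partition names are the thread's placeholders, and note that Sqoop free-form queries require the literal \$CONDITIONS token in the WHERE clause.]

```shell
# Build (but do not run) the query-based sqoop invocation discussed above.
# All identifiers are placeholders taken from the thread.
build_sqoop_cmd() {
  printf '%s' "sqoop import --options-file credential.txt \
--query \"select col1, col2, 'col3' as col3, col10 from TABLE \
where created between 1353960000000 and 1353963600000 and \$CONDITIONS\" \
--target-dir /tmp/TABLENAME/12-01-2002-00-00-00 \
--hive-import --hive-partition-key part \
--hive-partition-value PARTITION_VALUE \
--hive-overwrite --split-by created"
}
build_sqoop_cmd; echo
```

On a machine with Sqoop installed, pasting the assembled string into a terminal (or passing it through eval) would run the actual import.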

Re: Sqoop import - column removed in the source DB and all the records are off by 1 column.

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Anand,
would you mind providing details on your question about partition support in the query argument? I'm not sure what you mean or how it connects to something that is supported by the --table argument.

Jarcec

On Sat, Dec 01, 2012 at 08:13:34PM -0800, Anandha L Ranganathan wrote:

Re: Sqoop import - column removed in the source DB and all the records are off by 1 column.

Posted by Anandha L Ranganathan <an...@gmail.com>.
Hi Jarcec,

I went with second option.

Giving "select col1, col2, NULL as col3, col4" was throwing a
NullPointerException.

So I went with

"Select col1,col2,'col3' as col3, col4"

Thanks
Anand





The second option (--query) looks good, but does it support partitions?


On Sat, Dec 1, 2012 at 10:03 AM, Jarek Jarcec Cecho <ja...@apache.org> wrote:


Re: Sqoop import - column removed in the source DB and all the records are off by 1 column.

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Anandha,
I have a couple of suggestions that could help in your scenario:

1) Rename your Hive table to "TABLENAME_OLD" or something similar and let Sqoop create a new Hive table with the proper schema. This way you will not lose any data or the appropriate schema; however, you will end up with two tables.

2) Change your table-based import to a query-based import and select one extra column:

--query "select col1, col2, NULL as col3, col4, ..."

Jarcec
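[Editorial aside: the effect of suggestion 2 can be simulated with plain shell. Selecting a constant "as col3" in the query is equivalent to splicing a placeholder back in as field 3, which keeps the later columns aligned with the existing 4-column Hive table. The field values here are made up for illustration; note the thread later found that NULL itself triggered a NullPointerException, and a string literal was used instead.]

```shell
# Source rows now carry only 3 fields because col3 was dropped.
# Re-inserting a placeholder as field 3 restores the expected layout.
echo "a,b,d" | awk -F, -v OFS=, '{print $1, $2, "NULL", $3}'
# prints: a,b,NULL,d  -- col4's value ("d") is back in its own position
```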

On Fri, Nov 30, 2012 at 06:15:02PM -0800, Anandha L Ranganathan wrote: