Posted to user@sqoop.apache.org by "Venkat, Ankam" <An...@centurylink.com> on 2014/08/05 19:55:51 UTC

Import Partitions from Oracle to Hive Partitions

I am trying to import partitions from an Oracle table to Hive partitions.

Can somebody provide the syntax using the regular JDBC connector and the Oraoop connector?

Thanks in advance.

Regards,
Venkat


RE: Import Partitions from Oracle to Hive Partitions

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Venkat,

I ran with --verbose and found the error below in the log.

2014-08-20 09:16:47,887 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveStorageHandler
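That class lives in the Hive jars, so the usual cause is that the Hive/HCatalog libraries are not visible to the job. A hedged sketch of one way to point Sqoop at them before submitting (the paths below are illustrative, not from this thread):

export HIVE_HOME=/usr/lib/hive
export HCAT_HOME=/usr/lib/hive-hcatalog
# then re-run the same sqoop import command as before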

Regards,
Venkat


-----Original Message-----
From: Venkat Ranganathan [mailto:vranganathan@hortonworks.com] 
Sent: Tuesday, August 19, 2014 12:09 PM
To: user@sqoop.apache.org
Subject: Re: Import Partitions from Oracle to Hive Partitions

You can use oraoop with hcatalog imports and exports.  My only guess is that the launcher is failing because your path is not set up right.

You can precreate the hive table with the partition keys similar to the oracle table (not all Oracle DB partitioning modes will map cleanly, and composite partitioning may not map well in all cases).

Here is a simple example.  Say your partitioning key is country_code
and you have one partition for each country code.   Then you will
create the hive table with the same partitioning key and use hcatalog imports.

If you want hcatalog imports to create the table, you have to explicitly pass --create-hcatalog-table option.
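For illustration, a minimal sketch of that flow (the table, columns, and connection string here are invented for the example, not taken from this thread):

hive -e "CREATE TABLE customers_hcat (
  cust_id   STRING,
  cust_name STRING)
PARTITIONED BY (country_code STRING)
STORED AS RCFILE;"

sqoop import \
--connect jdbc:oracle:thin:@dbhost:1521/ORCL \
--username myid --password mypwd \
--table MYSCHEMA.CUSTOMERS \
--hcatalog-table customers_hcat

Because country_code is a column in the Oracle table but is declared only in PARTITIONED BY on the Hive side, the hcatalog import treats it as a dynamic partition key and routes each row to the matching partition.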

Can you post the full log with the --verbose option from sqoop as well, to get a better idea of what the issue is.

Thanks

Venkat

On Tue, Aug 19, 2014 at 11:02 AM, Venkat, Ankam <An...@centurylink.com> wrote:
> Thanks Venkat for the inputs.  I just tried HCat import and got errors...
>
> sqoop import \
> -Doraoop.disabled=true \
> --connect jdbc:oracle:thin:@XXXXXXX:1521/YYYYYY \
> --table ENS.CUSTOMER \
> --hcatalog-table CUSTOMER \
> --username myid \
> --password mypwd
>
> 14/08/19 11:57:33 INFO mapreduce.Job: Running job: job_1405966899667_0869
> 14/08/19 11:57:53 INFO mapreduce.Job: Job job_1405966899667_0869 running in uber mode : false
> 14/08/19 11:57:53 INFO mapreduce.Job:  map 0% reduce 0%
> 14/08/19 11:57:53 INFO mapreduce.Job: Job job_1405966899667_0869 failed with state FAILED due to: Application application_1405966899667_0869 failed 2 times due to AM Container for appattempt_1405966899667_0869_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
> org.apache.hadoop.util.Shell$ExitCodeException:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>         at org.apache.hadoop.util.Shell.run(Shell.java:418)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> Can you please send me the syntax for importing Oracle partitions to Hive partitions?  Also, can I use Oraoop?
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Venkat Ranganathan [mailto:vranganathan@hortonworks.com]
> Sent: Tuesday, August 19, 2014 10:45 AM
> To: user@sqoop.apache.org
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> What is the partitioning key in your Oracle table?   If you specify
> the same as partitioning keys on the hive table, then doing a simple hcatalog import from the oracle table to hive will automatically handle all the partitioning (even multiple levels).
>
> You are right that Sqoop supported only one level of partitioning key, but with 1.4.5 we allowed nesting (multiple static partition keys).
> Multiple dynamic partition keys were supported with hcatalog from 1.4.4.
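For reference, a sketch of supplying multiple static partition keys (--hcatalog-partition-keys and --hcatalog-partition-values are the 1.4.5 options; the key names and values below are invented):

sqoop import \
--connect jdbc:oracle:thin:@dbhost:1521/ORCL \
--username myid --password mypwd \
--table MYSCHEMA.ORDERS \
--hcatalog-table orders_hcat \
--hcatalog-partition-keys load_year,load_month \
--hcatalog-partition-values 2014,08

Any partition columns of the Hive table not named here are treated as dynamic partition keys and filled from the row data.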
>
> Thanks
>
> Venkat
>
> On Tue, Aug 19, 2014 at 8:30 AM, Venkat, Ankam <An...@centurylink.com> wrote:
>> David/Gwen,
>>
>> I have an issue with importing Oracle partitions to match Hadoop files.
>>
>> Sqoop command:
>>
>> sqoop import \
>> -Doraoop.disabled=false \
>> -Doraoop.import.partitions='CUSTOMER_P01,CUSTOMER_P02,CUSTOMER_P03,CUSTOMER_P04,CUSTOMER_P05,CUSTOMER_P06,CUSTOMER_P07,CUSTOMER_P08,CUSTOMER_P09,CUSTOMER_P10,CUSTOMER_P11,CUSTOMER_P12,CUSTOMER_P13,CUSTOMER_P14,CUSTOMER_P15,CUSTOMER_P16,CUSTOMER_P17,CUSTOMER_P18,CUSTOMER_P19,CUSTOMER_P20,CUSTOMER_P21,CUSTOMER_P22,CUSTOMER_P23,CUSTOMER_P24,CUSTOMER_P25,CUSTOMER_P26,CUSTOMER_P27,CUSTOMER_P28,CUSTOMER_P29,CUSTOMER_P30,CUSTOMER_P31,CUSTOMER_P32,CUSTOMER_P33,CUSTOMER_P34,CUSTOMER_P35,CUSTOMER_P36,CUSTOMER_P37,CUSTOMER_P38,CUSTOMER_P39,CUSTOMER_P40,CUSTOMER_P41,CUSTOMER_P42,CUSTOMER_P43,CUSTOMER_P44,CUSTOMER_P45,CUSTOMER_P46,CUSTOMER_P47,CUSTOMER_P48,CUSTOMER_P49,CUSTOMER_P50,CUSTOMER_P51,CUSTOMER_P52,CUSTOMER_P53,CUSTOMER_P54,CUSTOMER_P55,CUSTOMER_P56,CUSTOMER_P57,CUSTOMER_P58,CUSTOMER_P59,CUSTOMER_P60,CUSTOMER_P61,CUSTOMER_P62,CUSTOMER_P63,CUSTOMER_P64,CUSTOMER_P65,CUSTOMER_P66,CUSTOMER_P67,CUSTOMER_P68,CUSTOMER_P69,CUSTOMER_P70,CUSTOMER_P71,CUSTOMER_P72,CUSTOMER_P73,CUSTOMER_P74,CUSTOMER_P75' \
>> -Doraoop.chunk.method=PARTITION \
>> -Doraoop.import.consistent.read=true \
>> --connect jdbc:oracle:thin:@XXXXXXXXXX:1521/YYYYYY \
>> --table Schema.TableName \
>> --username MyID \
>> --password MyPWD \
>> -m 75
>>
>> Issues:
>> 1) There are 75 partitions on the Oracle table, but only 61 are being imported because 14 partitions are empty. I have no way to identify which partition each file belongs to.
>> 2) All the partitions are imported with names like PART-00001, PART-00002, and so on.  I thought the -Doraoop.chunk.method=PARTITION option would match the Hadoop files to the Oracle partitions, i.e. import them as HDFS files with the same names.
>>
>> Any workaround for this?
>>
>> Regards,
>> Venkat
>>
>>
>> -----Original Message-----
>> From: David Robson [mailto:David.Robson@software.dell.com]
>> Sent: Tuesday, August 05, 2014 6:08 PM
>> To: user@sqoop.apache.org
>> Subject: RE: Import Partitions from Oracle to Hive Partitions
>>
>> Yes, now that you mention Sqoop is limited to one partition in Hive, I do remember that! I would think we could modify Sqoop to create subfolders for each partition - instead of how it now creates a separate file for each partition? This would probably be limited to the direct (OraOop) connector, as it is aware of partitions (the existing connector doesn't read the data dictionary directly).
>>
>> In the meantime Venkat - you could look at the option I mentioned - then manually move the files into separate folders - at least you'll have each partition in a separate file rather than spread throughout all files. The other thing you could look at is the option below - you could run one Sqoop job per partition:
>>
>> Specify The Partitions To Import
>>
>> -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>>
>> Imports PartitionA and PartitionB of OracleTableName.
>>
>> Notes:
>> You can enclose an individual partition name in double quotes to retain the letter case or if the name has special characters:
>> -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>> If the partition name is not double quoted then its name will be automatically converted to upper case (PARTITIONB for the above).
>> When using double quotes, the entire list of partition names must be enclosed in single quotes.
>> If the last partition name in the list is double quoted then there must be a comma at the end of the list:
>> -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>>
>> Name each partition to be included. There is no facility to provide a range of partition names.
>>
>> There is no facility to define sub partitions. The entire partition is included/excluded as per the filter.
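As a sketch of that one-job-per-partition idea (partition names, paths, and connection details below are placeholders):

for p in CUSTOMER_P01 CUSTOMER_P02 CUSTOMER_P03; do
  sqoop import \
  -Doraoop.disabled=false \
  -Doraoop.import.partitions=$p \
  --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
  --username MyID --password MyPWD \
  --table MYSCHEMA.CUSTOMER \
  --target-dir /data/customer/oracle_part=$p
done

Each run then lands in its own directory, which can later be registered as a Hive partition with ALTER TABLE ... ADD PARTITION.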
>>
>>
>> -----Original Message-----
>> From: Gwen Shapira [mailto:gshapira@cloudera.com]
>> Sent: Wednesday, 6 August 2014 8:44 AM
>> To: user@sqoop.apache.org
>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>
>> Hive expects a directory for each partition, so getting data with OraOop will require some post-processing - copy files into properly named directories and adding the new partitions to a hive table.
>>
>> Sqoop has the --hive-partition-key and --hive-partition-value, but this assumes that all the data sqooped will fit into a single partition.
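For example, a single-partition import along those lines might look like this (the table, predicate, and names are illustrative only):

sqoop import \
--connect jdbc:oracle:thin:@dbhost:1521/ORCL \
--username myid --password mypwd \
--table MYSCHEMA.CUSTOMER \
--where "CUST_ID BETWEEN 1 AND 1000000" \
--hive-import \
--hive-table cust_hive \
--hive-partition-key cust_part \
--hive-partition-value P01

Every row from the run goes into the single partition cust_part=P01, which is why one such job per partition is needed.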
>>
>>
>> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <Da...@software.dell.com> wrote:
>>> Hi Venkat,
>>>
>>>
>>>
>>> I’m not sure what this will do with regard to Hive partitions – I’ll
>>> test it out when I get into the office and get back to you. But this
>>> option will make it so there is one file for each Oracle partition –
>>> which might be of interest to you.
>>>
>>>
>>>
>>> Match Hadoop Files to Oracle Table Partitions
>>>
>>>
>>>
>>> -Doraoop.chunk.method={ROWID|PARTITION}
>>>
>>>
>>>
>>> To import data from a partitioned table in such a way that the resulting HDFS folder structure in Hadoop will match the table’s partitions, set the chunk method to PARTITION. The alternative (default) chunk method is ROWID.
>>>
>>>
>>>
>>> Notes:
>>>
>>> - For the number of Hadoop files to match the number of Oracle partitions, set the number of mappers to be greater than or equal to the number of partitions.
>>> - If the table is not partitioned then the value PARTITION will lead to an error.
>>>
>>>
>>>
>>> David

Re: Import Partitions from Oracle to Hive Partitions

Posted by Venkat Ranganathan <vr...@hortonworks.com>.
Not sure what you are asking.  As you have seen, hive partitioning and
Oracle partitioning don't work the same way.

Oracle's data distribution function is not something that is available to
Sqoop/hcatalog.  If you want HCatalog partitions to mimic the Oracle
partitioning scheme, one option would be to use a computed column which
returns a partition id similar to the Oracle partitioning (say, using a CASE
expression) and use the computed column as the partitioning column on
hcatalog.

But if you use hash partitioning on Oracle, the hashing function is not
something you may know, so you can't mimic it.

An alternate approach may be to derive the partition name for each row in
the table and then use that partition name.  If you use composite
partitioning you may have more items to consider.
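A sketch of that computed-column idea with a free-form query (the CASE ranges and names below are invented; the real boundaries would come from the Oracle partition definitions):

sqoop import \
--connect jdbc:oracle:thin:@dbhost:1521/ORCL \
--username myid --password mypwd \
--query "SELECT c.*, CASE WHEN c.cust_id < '1000000' THEN 'P01' WHEN c.cust_id < '2000000' THEN 'P02' ELSE 'P75' END AS cust_part FROM ens.customer c WHERE \$CONDITIONS" \
--split-by c.cust_id \
--hcatalog-table cust_hcat

With cust_hcat precreated using cust_part as its partition column, each row is routed to the partition its CASE expression computes.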

Venkat




RE: Import Partitions from Oracle to Hive Partitions

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Venkat,

Can this be achieved with HCAT?

Regards,
Venkat


RE: Import Partitions from Oracle to Hive Partitions

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Venkat,

Sorry, I think I am doing something wrong.   Let me explain the scenario.

Source:
Oracle table name:   ens.customer
Key: cust_id + load_date + load_exp_dt
Data is partitioned on: cust_id column
Number of partitions: 75
Names of partitions: CUSTOMER_P01, CUSTOMER_P02, ..., CUSTOMER_P75.  Each partition has a range of cust_id column data.

Target:
Hive Table name:  cust_hcat
Partition Key:  cust_part
Partition Value: P01, P02, ..., P75
Number of partitions: 75

I would like to Sqoop import the 75 partitions of the Oracle table and organize them the same way in Hive.

/user/hive/warehouse/cust_hcat/cust_part=p01/data-files…
/user/hive/warehouse/cust_hcat/cust_part=p02/data-files…
...
/user/hive/warehouse/cust_hcat/cust_part=p75/data-files…


Regards,
Venkat


Re: Import Partitions from Oracle to Hive Partitions

Posted by Venkat Ranganathan <vr...@hortonworks.com>.
You can't have the column repeated in hcat in both places - it is not
valid. You specify it as the partitioning column alone in hive.

That said, are you sure you want to partition by cust_id on hcat? That
will create one hive partition per cust_id value - I am assuming cust_id
is the customer id.

A better option might be to use load_date as the partition key - that way
you will have one hcat partition per load date.
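
For example, dropping load_date from the column list and declaring it only
in PARTITIONED BY - a sketch, assuming load_date is the key you want:

hcat -e "create table cust_hcat (
cust_id                 string,
load_exp_date           string,
contact_telno           string,
contact_tn_extno        string,
current_flag            string,
cpni_ind                string,
cpni_email              string )
PARTITIONED BY (load_date string) stored as rcfile;"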

Not sure if that is what you are using.

Venkat


RE: Import Partitions from Oracle to Hive Partitions

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Venkat,

When I try to create a table with the command below

hcat -e "create table cust_hcat (
cust_id                 string,
load_date               string,
load_exp_date           string,
contact_telno           string,
contact_tn_extno        string,
current_flag            string,
cpni_ind                string,
cpni_email              string )
PARTITIONED BY (cust_id string) stored as rcfile;"

I am getting an error - FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns

The Oracle table is partitioned on cust_id, and the primary key is cust_id + load_date + load_exp_dt.

Regards,
Venkat


Re: Import Partitions from Oracle to Hive Partitions

Posted by Venkat Ranganathan <vr...@hortonworks.com>.
You can use oraoop with hcatalog imports and exports.  My only guess is
that the launcher is failing because your path is not set up right.

You can precreate the hive table with partition keys similar to the
oracle table's (not all Oracle DB partitioning modes will map well, and
composite partitioning in particular may not map cleanly in all
cases).

Here is a simple example: say your partitioning key is country_code
and you have one partition for each country code.   Then you
create the hive table with the same partitioning key and use hcatalog
imports.
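
A minimal sketch of that flow - the host, service name, schema, table and
credentials below are all placeholders:

hcat -e "create table country_sales (
sale_id      string,
sale_amount  string )
PARTITIONED BY (country_code string) stored as rcfile;"

sqoop import \
--connect jdbc:oracle:thin:@dbhost:1521/ORCL \
--username myid \
--password mypwd \
--table SALES.COUNTRY_SALES \
--hcatalog-table country_sales

With dynamic partitioning, each distinct country_code value in the
imported rows then lands in its own partition of the hive table.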

If you want hcatalog imports to create the table, you have to
explicitly pass the --create-hcatalog-table option.

Can you post the full log from sqoop with the --verbose option as well,
to get a better idea of what the issue is?

Thanks

Venkat


RE: Import Partitions from Oracle to Hive Partitions

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Thanks Venkat for the inputs.  I just tried HCat import and got errors...

sqoop import  \
-Doraoop.disabled=true  \
--connect jdbc:oracle:thin:@XXXXXXX:1521/YYYYYY \
--table ENS.CUSTOMER \
--hcatalog-table CUSTOMER \
--username myid \
--password mypwd

14/08/19 11:57:33 INFO mapreduce.Job: Running job: job_1405966899667_0869
14/08/19 11:57:53 INFO mapreduce.Job: Job job_1405966899667_0869 running in uber mode : false
14/08/19 11:57:53 INFO mapreduce.Job:  map 0% reduce 0%
14/08/19 11:57:53 INFO mapreduce.Job: Job job_1405966899667_0869 failed with state FAILED due to: Application application_1405966899667_0869 failed 2 times due to AM Container for appattempt_1405966899667_0869_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

Can you please send me the syntax for importing Oracle partitions to Hive partitions?  Also, can I use Oraoop?

Regards,
Venkat


Re: Import Partitions from Oracle to Hive Partitions

Posted by Venkat Ranganathan <vr...@hortonworks.com>.
What is the partitioning key in your Oracle table?   If you specify
the same columns as partitioning keys on the hive table, then doing a
simple hcatalog import from the oracle table to hive will automatically
handle all the partitioning (even multiple levels).

You are right that Sqoop supported only one level of partitioning key,
but with 1.4.5 we allowed nesting (multiple static partition keys).
Multiple dynamic partition keys have been supported with hcatalog since
1.4.4.
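
For the static case, a sketch using the 1.4.5 options - the connection
details, table name and key/value pairs below are all placeholders:

sqoop import \
--connect jdbc:oracle:thin:@dbhost:1521/ORCL \
--username myid \
--password mypwd \
--table SALES.ORDERS \
--hcatalog-table orders \
--hcatalog-partition-keys country_code,load_date \
--hcatalog-partition-values US,2014-08-19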

Thanks

Venkat


RE: Import Partitions from Oracle to Hive Partitions

Posted by "Venkat, Ankam" <An...@centurylink.com>.
David/Gwen,

I have an issue with importing Oracle partitions to match Hadoop Files. 

Sqoop command:

sqoop import  \
-Doraoop.disabled=false  \
-Doraoop.import.partitions='CUSTOMER_P01,CUSTOMER_P02,CUSTOMER_P03,CUSTOMER_P04,CUSTOMER_P05,CUSTOMER_P06,CUSTOMER_P07,CUSTOMER_P08,CUSTOMER_P09,CUSTOMER_P10,CUSTOMER_P11,CUSTOMER_P12,CUSTOMER_P13,CUSTOMER_P14,CUSTOMER_P15,CUSTOMER_P16,CUSTOMER_P17,CUSTOMER_P18,CUSTOMER_P19,CUSTOMER_P20,CUSTOMER_P21,CUSTOMER_P22,CUSTOMER_P23,CUSTOMER_P24,CUSTOMER_P25,CUSTOMER_P26,CUSTOMER_P27,CUSTOMER_P28,CUSTOMER_P29,CUSTOMER_P30,CUSTOMER_P31,CUSTOMER_P32,CUSTOMER_P33,CUSTOMER_P34,CUSTOMER_P35,CUSTOMER_P36,CUSTOMER_P37,CUSTOMER_P38,CUSTOMER_P39,CUSTOMER_P40,CUSTOMER_P41,CUSTOMER_P42,CUSTOMER_P43,CUSTOMER_P44,CUSTOMER_P45,CUSTOMER_P46,CUSTOMER_P47,CUSTOMER_P48,CUSTOMER_P49,CUSTOMER_P50,CUSTOMER_P51,CUSTOMER_P52,CUSTOMER_P53,CUSTOMER_P54,CUSTOMER_P55,CUSTOMER_P56,CUSTOMER_P57,CUSTOMER_P58,CUSTOMER_P59,CUSTOMER_P60,CUSTOMER_P61,CUSTOMER_P62,CUSTOMER_P63,CUSTOMER_P64,CUSTOMER_P65,CUSTOMER_P66,CUSTOMER_P67,CUSTOMER_P68,CUSTOMER_P69,CUSTOMER_P70,CUSTOMER_P71,CUSTOMER_P72,CUSTOMER_P73,CUSTOMER_P74,CUSTOMER_P75' \
-Doraoop.chunk.method=PARTITION \
-Doraoop.import.consistent.read=true \
--connect jdbc:oracle:thin:@XXXXXXXXXX:1521/YYYYYY \
--table Schema.TableName \
--username MyID \
--password MyPWD \
-m 75

Issues:
1) There are 75 partitions on the Oracle table, but only 61 files are imported because 14 partitions are empty. I have no way to identify which partition each file belongs to.
2) All the partitions are imported with names like PART-00001, PART-00002 and so on.  I thought the -Doraoop.chunk.method=PARTITION option would match the Hadoop files to the Oracle partitions - I mean, import them as HDFS files with the same names.

Any workaround for this?

Regards,
Venkat


-----Original Message-----
From: David Robson [mailto:David.Robson@software.dell.com] 
Sent: Tuesday, August 05, 2014 6:08 PM
To: user@sqoop.apache.org
Subject: RE: Import Partitions from Oracle to Hive Partitions

Yes, now that you mention Sqoop is limited to one partition in Hive, I do remember that! I would think we could modify Sqoop to create subfolders for each partition - instead of how it now creates a separate file for each partition? This would probably be limited to the direct (OraOop) connector, as it is aware of partitions (the existing connector doesn't read the data dictionary directly).

In the meantime Venkat - you could look at the option I mentioned - then manually move the files into separate folders - at least you'll have each partition in a separate file rather than spread throughout all files. The other thing you could look at is the option below - you could run one Sqoop job per partition:

Specify The Partitions To Import

-Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName

Imports PartitionA and PartitionB of OracleTableName.

Notes:
You can enclose an individual partition name in double quotes to retain the letter case or if the name has special characters.
-Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
If a partition name is not double quoted then it is automatically converted to upper case - PARTITIONB above.
When using double quotes, the entire list of partition names must be enclosed in single quotes.
If the last partition name in the list is double quoted then there must be a comma at the end of the list:
-Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName

Name each partition to be included. There is no facility to provide a range of partition names.

There is no facility to define sub partitions. The entire partition is included/excluded as per the filter.


-----Original Message-----
From: Gwen Shapira [mailto:gshapira@cloudera.com]
Sent: Wednesday, 6 August 2014 8:44 AM
To: user@sqoop.apache.org
Subject: Re: Import Partitions from Oracle to Hive Partitions

Hive expects a directory for each partition, so getting data with OraOop will require some post-processing - copying the files into properly named directories and adding the new partitions to a Hive table.

Sqoop has the --hive-partition-key and --hive-partition-value, but this assumes that all the data sqooped will fit into a single partition.


On Tue, Aug 5, 2014 at 3:40 PM, David Robson <Da...@software.dell.com> wrote:
> Hi Venkat,
>
>
>
> I’m not sure what this will do in regards to Hive partitions – I’ll 
> test it out when I get into the office and get back to you. But this 
> option will make it so there is one file for each Oracle partition – 
> which might be of interest to you.
>
>
>
> Match Hadoop Files to Oracle Table Partitions
>
>
>
> -Doraoop.chunk.method={ROWID|PARTITION}
>
>
>
> To import data from a partitioned table in such a way that the 
> resulting HDFS folder structure in
>
> Hadoop will match the table’s partitions, set the chunk method to PARTITION.
> The alternative
>
> (default) chunk method is ROWID.
>
>
>
> Notes:
>
> l For the number of Hadoop files to match the number of Oracle 
> partitions, set the number
>
> of mappers to be greater than or equal to the number of partitions.
>
> l If the table is not partitioned then value PARTITION will lead to an 
> error.
>
>
>
> David
>
>
>
>
>
> From: Venkat, Ankam [mailto:Ankam.Venkat@centurylink.com]
> Sent: Wednesday, 6 August 2014 3:56 AM
> To: 'user@sqoop.apache.org'
> Subject: Import Partitions from Oracle to Hive Partitions
>
>
>
> I am trying to import  partitions from Oracle table to Hive partitions.
>
>
>
> Can somebody provide the syntax using regular JDBC connector and 
> Oraoop connector?
>
>
>
> Thanks in advance.
>
>
>
> Regards,
>
> Venkat
>
>
>
>

Re: Import Partitions from Oracle to Hive Partitions

Posted by Gwen Shapira <gs...@cloudera.com>.
Go for it, Venkat!

If you have questions about writing a patch, feel free to ask on the
dev@sqoop.apache.org mailing list.



RE: Import Partitions from Oracle to Hive Partitions

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Gwen,

Created a jira at https://issues.apache.org/jira/browse/SQOOP-1415.

Yes, I would love to contribute a patch for this.  

Regards,
Venkat

Re: Import Partitions from Oracle to Hive Partitions

Posted by Gwen Shapira <gs...@cloudera.com>.
Venkat,

Running one sqoop job and moving files to different directories should
be faster than a sqoop job per partition (at least it was for my
customers).

If you are interested in a new OraOop feature, why not open a Jira in
issues.apache.org?
You can even contribute a patch if you are so inclined :)

Gwen

RE: Import Partitions from Oracle to Hive Partitions

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Thanks for the response.  

I was thinking of using OraOop to automatically import Oracle partitions into Hive partitions, but based on this thread I just learned it's not possible.

From an automation perspective, I think running one Sqoop job per partition and creating the same partition in Hive is the better option.

Gwen/David: Yes, importing Oracle partitions into Hive partitions would be a good feature to have. Any idea why there have been no commits to OraOop since 2012?

Regards,
Venkat
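
A minimal sketch of that per-partition approach (everything here is hypothetical: the partition names, the HDFS paths, and a Hive table named customer pre-created with PARTITIONED BY (part_name STRING) and a matching column layout):

for P in CUSTOMER_P01 CUSTOMER_P02 CUSTOMER_P03; do
  # Import one Oracle partition into its own HDFS directory.
  sqoop import \
  -Doraoop.import.partitions="$P" \
  --connect jdbc:oracle:thin:@XXXXXXXXXX:1521/YYYYYY \
  --username MyID \
  --password MyPWD \
  --table SCHEMA.CUSTOMER \
  --target-dir "/user/hive/warehouse/customer/part_name=$P" \
  -m 1
  # Register that directory as a Hive partition.
  hive -e "ALTER TABLE customer ADD PARTITION (part_name='$P') LOCATION '/user/hive/warehouse/customer/part_name=$P'"
done

Note that the -Doraoop.import.partitions option only takes effect while the OraOop connector is active, i.e. without -Doraoop.disabled=true.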

Re: Import Partitions from Oracle to Hive Partitions

Posted by Gwen Shapira <gs...@cloudera.com>.
Having OraOop automatically handle partitions in Hive will be a cool
feature. I agree that this will be limited to OraOop for now.

RE: Import Partitions from Oracle to Hive Partitions

Posted by David Robson <Da...@software.dell.com>.
Yes, now that you mention Sqoop is limited to one partition in Hive, I do remember that! I would think we could modify Sqoop to create subfolders for each partition, instead of how it now creates a separate file for each partition. This would probably be limited to the direct (OraOop) connector, as it is aware of partitions (the existing connector doesn't read the data dictionary directly).

In the meantime Venkat - you could look at the option I mentioned - then manually move the files into separate folders - at least you'll have each partition in a separate file rather than spread throughout all files. The other thing you could look at is the option below - you could run one Sqoop job per partition:

Specify The Partitions To Import

-Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName

Imports PartitionA and PartitionB of OracleTableName.

Notes:
- You can enclose an individual partition name in double quotes to retain the letter case or if the name has special characters:
  -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
  If the partition name is not double quoted, its name is automatically converted to upper case (PARTITIONB above).
- When using double quotes, the entire list of partition names must be enclosed in single quotes.
- If the last partition name in the list is double quoted, there must be a comma at the end of the list:
  -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName

Name each partition to be included. There is no facility to provide a range of partition names.

There is no facility to define sub partitions. The entire partition is included/excluded as per the filter.
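
For example, a command combining those rules (the mixed-case partition names are hypothetical; note the double quotes inside the single-quoted list and the trailing comma):

sqoop import \
-Doraoop.import.partitions='"Sales_2014_Q1","Sales_2014_Q2",' \
--connect jdbc:oracle:thin:@XXXXXXXXXX:1521/YYYYYY \
--username MyID \
--password MyPWD \
--table SCHEMA.SALES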


Re: Import Partitions from Oracle to Hive Partitions

Posted by Gwen Shapira <gs...@cloudera.com>.
Hive expects a directory for each partition, so getting data in with OraOop will require some post-processing: copying the files into properly named directories and adding the new partitions to a Hive table.
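
A rough sketch of that post-processing, assuming you have already worked out which part file holds which partition (which, per this thread, is the hard part); the paths, table name, and partition key are hypothetical:

hdfs dfs -mkdir -p /user/hive/warehouse/customer/part_name=CUSTOMER_P01
hdfs dfs -mv /user/venkat/customer/part-m-00000 /user/hive/warehouse/customer/part_name=CUSTOMER_P01/
hive -e "ALTER TABLE customer ADD PARTITION (part_name='CUSTOMER_P01')"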

Sqoop has the --hive-partition-key and --hive-partition-value options, but this assumes that all the data sqooped will fit into a single partition.
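
A minimal single-partition sketch with those options (the COUNTRY_CODE column and its values are hypothetical; the partition key is spelled differently so it does not clash with the imported column of the same name):

sqoop import \
--connect jdbc:oracle:thin:@XXXXXXXXXX:1521/YYYYYY \
--username MyID \
--password MyPWD \
--table SCHEMA.CUSTOMER \
--where "COUNTRY_CODE = 'US'" \
--hive-import \
--hive-table customer \
--hive-partition-key country_cd \
--hive-partition-value US \
-m 4

Loading the whole table this way means one run per country, each with its own --where filter and --hive-partition-value.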


RE: Import Partitions from Oracle to Hive Partitions

Posted by David Robson <Da...@software.dell.com>.
Hi Venkat,

I'm not sure what this will do in regards to Hive partitions - I'll test it out when I get into the office and get back to you. But this option will make it so there is one file for each Oracle partition - which might be of interest to you.

Match Hadoop Files to Oracle Table Partitions

-Doraoop.chunk.method={ROWID|PARTITION}

To import data from a partitioned table in such a way that the resulting HDFS folder structure in Hadoop will match the table's partitions, set the chunk method to PARTITION. The alternative (default) chunk method is ROWID.

Notes:
- For the number of Hadoop files to match the number of Oracle partitions, set the number of mappers to be greater than or equal to the number of partitions.
- If the table is not partitioned, the value PARTITION will lead to an error.

David
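
To size the mapper count per that note, one option is to read the partition count from the data dictionary first. A sketch, assuming an Oracle client with sqlplus on the path and placeholder credentials throughout:

NPARTS=$(sqlplus -S MyID/MyPWD@//XXXXXXXXXX:1521/YYYYYY <<'EOF' | tr -d '[:space:]'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT COUNT(*) FROM all_tab_partitions WHERE table_owner = 'SCHEMA' AND table_name = 'TABLENAME';
EOF
)
sqoop import \
-Doraoop.chunk.method=PARTITION \
--connect jdbc:oracle:thin:@XXXXXXXXXX:1521/YYYYYY \
--username MyID \
--password MyPWD \
--table SCHEMA.TABLENAME \
-m "$NPARTS"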

