Posted to user@sqoop.apache.org by Manikandan R <ma...@gmail.com> on 2015/07/02 07:27:08 UTC

Re: Unknown dataset URI issues in Sqoop hive import as parquet

Hello Abe,

Can you please give an update on this? Also, let me know if you need any more info.

Thanks,
Mani

On Tue, Jun 30, 2015 at 11:39 AM, Manikandan R <ma...@gmail.com> wrote:

> Impala 1.2.4. We are using an Amazon EMR cluster.
>
> Thanks,
> Mani
>
> On Sun, Jun 28, 2015 at 11:37 PM, Abraham Elmahrek <ab...@cloudera.com>
> wrote:
>
>> Oh, that makes more sense. It seems like a format mismatch. You might have to
>> upgrade Impala. Mind providing the version of Impala you're using?
>>
>> -Abe
>>
>> On Fri, Jun 26, 2015 at 12:52 AM, Manikandan R <ma...@gmail.com>
>> wrote:
>>
>>> The actual errors are:
>>>
>>> Query: select * from gwynniebee_bi.mi_test
>>> ERROR: AnalysisException: Failed to load metadata for table:
>>> gwynniebee_bi.mi_test
>>> CAUSED BY: TableLoadingException: Unrecognized table type for table:
>>> gwynniebee_bi.mi_test
>>>
>>> On Fri, Jun 26, 2015 at 1:21 PM, Manikandan R <ma...@gmail.com>
>>> wrote:
>>>
>>>> It should be the same, as I have created many tables in Hive before and read
>>>> them in Impala without any issues.
>>>>
>>>> I am running Oozie-based workflows in our production environment: the data is
>>>> taken from MySQL to HDFS in raw format (via Sqoop Hive imports), the same data
>>>> is then stored again in Parquet format using the Impala shell, and reports run
>>>> on top of it using Impala queries. This has been working for a few weeks
>>>> without any issues.
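>>>>
>>>> (Roughly, that two-step flow corresponds to the sketch below; the database and
>>>> table names are placeholders, and the exact Impala DDL may vary by version.)
>>>>
>>>> sqoop import --connect jdbc:mysql://<host>/<db> --table <t> \
>>>>     --hive-import --hive-table <t>_raw -m 1
>>>> impala-shell -q "INVALIDATE METADATA"
>>>> impala-shell -q "CREATE TABLE <t>_pq LIKE <t>_raw STORED AS PARQUETFILE"
>>>> impala-shell -q "INSERT OVERWRITE TABLE <t>_pq SELECT * FROM <t>_raw"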
>>>>
>>>> Now I am trying to see whether I can import the data from MySQL to Impala
>>>> (Parquet) directly, to avoid the intermediate step.
>>>>
>>>>
>>>>
>>>> On Fri, Jun 26, 2015 at 1:02 PM, Abraham Elmahrek <ab...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Check your config. Hive and Impala should use the same metastore.
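>>>>>
>>>>> For example, both Hive and the Impala daemons read hive.metastore.uris from
>>>>> their hive-site.xml; a quick way to compare them (the config paths below are
>>>>> common defaults and may differ on an EMR install) is:
>>>>>
>>>>> grep -A1 'hive.metastore.uris' /etc/hive/conf/hive-site.xml
>>>>> grep -A1 'hive.metastore.uris' /etc/impala/conf/hive-site.xml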
>>>>>
>>>>> On Fri, Jun 26, 2015 at 12:26 AM, Manikandan R <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes, it works. I set HCAT_HOME to $HIVE_HOME/hcatalog.
>>>>>>
>>>>>> I am able to read the data from Hive, but not from the Impala shell. Any
>>>>>> workaround?
>>>>>>
>>>>>> Thanks,
>>>>>> Mani
>>>>>>
>>>>>> On Thu, Jun 25, 2015 at 7:27 PM, Abraham Elmahrek <ab...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Make sure HIVE_HOME and HCAT_HOME are set.
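>>>>>>>
>>>>>>> Something along these lines in the shell that launches Sqoop usually does it;
>>>>>>> the paths are illustrative, so adjust them to your install:
>>>>>>>
>>>>>>> export HIVE_HOME=/usr/lib/hive             # wherever Hive is installed
>>>>>>> export HCAT_HOME=$HIVE_HOME/hcatalog       # HCatalog ships inside the Hive tree
>>>>>>> ls $HIVE_HOME/lib/hive-metastore-*.jar     # sanity check that the Hive jars are present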
>>>>>>>
>>>>>>> For the datetime/timestamp issue... this is because Parquet doesn't
>>>>>>> support timestamp types yet. Avro schemas apparently support them as of 1.8.0:
>>>>>>> https://issues.apache.org/jira/browse/AVRO-739. Try
>>>>>>> casting to a numeric or string value first?
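>>>>>>>
>>>>>>> A minimal sketch of that workaround on the Sqoop side (the column name here is
>>>>>>> just an example) is to force the Java type for the offending column, e.g.:
>>>>>>>
>>>>>>> sqoop import --connect jdbc:mysql://<host>/<db> --table <table> \
>>>>>>>     --as-parquetfile --map-column-java create_date=String -m 1
>>>>>>>
>>>>>>> or do the cast in a free-form query via --query instead of --table.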
>>>>>>>
>>>>>>> -Abe
>>>>>>>
>>>>>>> On Thu, Jun 25, 2015 at 6:49 AM, Manikandan R <ma...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I am running
>>>>>>>>
>>>>>>>> ./sqoop import --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
>>>>>>>>     --username root --password gwynniebee --table bats_active \
>>>>>>>>     --hive-import --hive-database gwynniebee_bi --hive-table test_pq_bats_active \
>>>>>>>>     --null-string '\\N' --null-non-string '\\N' --as-parquetfile -m1
>>>>>>>>
>>>>>>>> and getting the exception below. I came to know from various
>>>>>>>> sources that $HIVE_HOME has to be set properly to avoid this kind of
>>>>>>>> error. In my case, the corresponding home directory exists, but it is
>>>>>>>> still throwing the exception below.
>>>>>>>>
>>>>>>>> 15/06/25 13:24:19 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
>>>>>>>> 15/06/25 13:24:19 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
>>>>>>>> org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
>>>>>>>>   at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
>>>>>>>>   at org.kitesdk.data.Datasets.create(Datasets.java:228)
>>>>>>>>   at org.kitesdk.data.Datasets.create(Datasets.java:307)
>>>>>>>>   at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
>>>>>>>>   at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)
>>>>>>>>   at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
>>>>>>>>   at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>>>>>>>>   at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>>>>>>>>   at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
>>>>>>>>   at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>>>>>>>>   at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>>>>>>>>   at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>>>>>>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>>>>>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>>>>>>>>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>>>>>>>>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>>>>>>>>   at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>>>>>>>>
>>>>>>>> So I tried an alternative approach: creating a Parquet file first, without
>>>>>>>> any Hive-related options, and then creating a table in Impala that refers to
>>>>>>>> the same location. That worked, but querying it throws the error below (I
>>>>>>>> think because of the date-related columns).
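>>>>>>>>
>>>>>>>> The Impala side of that is essentially a CREATE EXTERNAL TABLE over the import
>>>>>>>> directory. A trimmed-down sketch (the table name, column list and types are
>>>>>>>> illustrative; only create_date matters for the error below):
>>>>>>>>
>>>>>>>> impala-shell -q "CREATE EXTERNAL TABLE gwynniebee_bi.test_pq_ext (
>>>>>>>>       id BIGINT,
>>>>>>>>       create_date STRING)
>>>>>>>>     STORED AS PARQUETFILE
>>>>>>>>     LOCATION '/data/gwynniebee_bi/test_pq_bats_active'"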
>>>>>>>>
>>>>>>>> ERROR: File hdfs://10.183.138.137:9000/data/gwynniebee_bi/test_pq_bats_active/a4a65639-ae38-417e-bbd9-56f4eb76c06b.parquet
>>>>>>>> has an incompatible type with the table schema for column create_date.
>>>>>>>> Expected type: BYTE_ARRAY.  Actual type: INT64
>>>>>>>>
>>>>>>>> Then I tried a table without the datetime columns, and it works fine in
>>>>>>>> that case.
>>>>>>>>
>>>>>>>> I am using Hive 0.13 and the sqoop-1.4.6.bin__hadoop-2.0.4-alpha binary.
>>>>>>>>
>>>>>>>> I would prefer the first approach for my requirements. Can anyone
>>>>>>>> please help me with this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mani
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Unknown dataset URI issues in Sqoop hive import as parquet

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey man,

Can you start a separate thread for this? I'd add details like:

   - version
   - command
   - --verbose output
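
For example, something along these lines (the connect string and options are
placeholders; the point is just to capture the full verbose log):

sqoop import --verbose --connect jdbc:mysql://<host>/<db> --table <table> ... 2>&1 | tee sqoop-import.log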

-Abe

On Fri, Jul 17, 2015 at 7:35 AM, Anupam sinha <ak...@gmail.com> wrote:

> I have faced a similar issue: Sqoop is not working on the access node, and I
> get the error "SQLServer test failed (1)".
>
> Do I need to change any settings?
>

Re: Unknown dataset URI issues in Sqoop hive import as parquet

Posted by Anupam sinha <ak...@gmail.com>.
I have faced a similar issue: Sqoop is not working on the access node, and I get
the error "SQLServer test failed (1)".

Do I need to change any settings?


Re: Unknown dataset URI issues in Sqoop hive import as parquet

Posted by Manikandan R <ma...@gmail.com>.
Ok, thanks


Re: Unknown dataset URI issues in Sqoop hive import as parquet

Posted by Abraham Elmahrek <ab...@cloudera.com>.
I'd check with the Impala user group! But I think 1.2.4 is an older
version. Upgrading might make your headaches go away in general.

-Abe


Re: Unknown dataset URI issues in Sqoop hive import as parquet

Posted by Manikandan R <ma...@gmail.com>.
OK, Abe. I will try that.

Also, for the past two days, impalad has been crashing on one particular node.
Because of this, the Oozie workflows are taking a huge amount of time to
complete; some do not finish even after 24 hours. When we restart the daemon it
works fine for a while, then it crashes again. It doesn't seem very stable.

I've attached the error report file. Please check.

Thanks,
Mani

Re: Unknown dataset URI issues in Sqoop hive import as parquet

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Could you try upgrading Impala?

-Abe
