Posted to user@hive.apache.org by Alexander Pivovarov <ap...@gmail.com> on 2015/06/24 22:33:10 UTC

join 2 tables located on different clusters

Hello Everyone

Can I define an external table on cluster_1 pointing to an HDFS location on
cluster_2?
I tried and got a strange exception in Hive:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:java.lang.reflect.InvocationTargetException)

I want to do a full outer join between table A, which exists on cluster_1, and
table A on cluster_2.

My idea was to create an external table A_2 (on cluster_1) that points to
cluster_2 and run the Hive query on cluster_1:

select a.*, a_2.*
from a
full outer join a_2 on (a.id = a_2.id)
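
A minimal sketch of that setup, assuming cluster_2 stores table A as plain
text files and its NameNode (or HA nameservice) is reachable from cluster_1;
the column list and the location URI below are illustrative, not from the
original post:

CREATE EXTERNAL TABLE a_2 (
  id STRING,
  val STRING
)
STORED AS TEXTFILE
LOCATION 'hdfs://cluster2-nameservice/data/a';   -- fully qualified URI on cluster_2

SELECT a.*, a_2.*
FROM a
FULL OUTER JOIN a_2 ON (a.id = a_2.id);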

Re: join 2 tables located on different clusters

Posted by Alexander Pivovarov <ap...@gmail.com>.
I tried to reproduce the "Wrong FS" issue in several Hive branches:

branch-0.14 - works
branch-1.0 - works
branch-1.1 - throws the exception

It looks like the error was introduced in 1.1.0 by the following change (the
stack trace suggests the encryption-zone check added there asks the session's
default filesystem, hdfs://localhost:8020, about a path that lives on another
filesystem):
https://issues.apache.org/jira/browse/HIVE-9264

I opened a new JIRA for the issue:
https://issues.apache.org/jira/browse/HIVE-11116



Re: join 2 tables located on different clusters

Posted by Alexander Pivovarov <ap...@gmail.com>.
I tried this on a local Hadoop/Hive instance (Hive is the latest from the
master branch).

mydev is an HA nameservice alias for the remote HA NameNode.

$ hadoop fs -ls hdfs://mydev/tmp/et1
Found 1 items
-rw-r--r--   3 myapp hadoop         16 2015-06-24 16:05
hdfs://mydev/tmp/et1/et1file

$ hive

hive> CREATE TABLE et1 (
  a string
) stored as textfile
LOCATION 'hdfs://mydev/tmp/et1';

hive> select * from et1;

15/06/24 16:01:08 [main]: ERROR parse.CalcitePlanner:
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to determine if
hdfs://mydev/tmp/et1 is encrypted: java.lang.IllegalArgumentException:
Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1870)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:1947)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1979)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1792)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1527)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10057)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10108)
    at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
    at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1124)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
    at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalArgumentException: Wrong FS:
hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
    at
org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
    at
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1210)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1866)
    ... 26 more

FAILED: SemanticException Unable to determine if hdfs://mydev/tmp/et1 is
encrypted: java.lang.IllegalArgumentException: Wrong FS:
hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
15/06/24 16:01:08 [main]: ERROR ql.Driver: FAILED: SemanticException Unable
to determine if hdfs://mydev/tmp/et1 is encrypted:
java.lang.IllegalArgumentException: Wrong FS: hdfs://mydev/tmp/et1,
expected: hdfs://localhost:8020
org.apache.hadoop.hive.ql.parse.SemanticException: Unable to determine if
hdfs://mydev/tmp/et1 is encrypted: java.lang.IllegalArgumentException:
Wrong FS: hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1850)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1527)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10057)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10108)
    at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
    at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1124)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
    at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
determine if hdfs://mydev/tmp/et1 is encrypted:
java.lang.IllegalArgumentException: Wrong FS: hdfs://mydev/tmp/et1,
expected: hdfs://localhost:8020
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1870)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:1947)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1979)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1792)
    ... 23 more
Caused by: java.lang.IllegalArgumentException: Wrong FS:
hdfs://mydev/tmp/et1, expected: hdfs://localhost:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
    at
org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
    at
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1210)
    at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1866)
    ... 26 more



Re: join 2 tables located on different clusters

Posted by Edward Capriolo <ed...@gmail.com>.
I do not know what your exact problem is. Turn your debug logging on. This can
be done, however, assuming both clusters have network access to each other.
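
As a rough illustration of that kind of cross-cluster setup, one way to check
from cluster_1 that the remote HDFS is reachable is to define a small probe
table against it; the URI and path below are hypothetical (and, as the rest of
the thread shows, on Hive 1.1+ this same pattern trips the "Wrong FS" bug):

CREATE EXTERNAL TABLE remote_probe (line STRING)
STORED AS TEXTFILE
LOCATION 'hdfs://cluster2-nameservice/tmp/probe';   -- any readable directory on cluster_2

-- a successful count means cluster_1 nodes can reach cluster_2's HDFS
SELECT COUNT(*) FROM remote_probe;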
