Posted to dev@knox.apache.org by "Maksim Kononenko (JIRA)" <ji...@apache.org> on 2013/07/19 11:20:49 UTC

[jira] [Commented] (KNOX-44) Support HUE interacting with a Hadoop cluster via the Gateway

    [ https://issues.apache.org/jira/browse/KNOX-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713482#comment-13713482 ] 

Maksim Kononenko commented on KNOX-44:
--------------------------------------

Here are the results of my research.

1. Job Browser.
a) Job Tracker
The Hue plugin is based on Thrift. The transport type is Socket and the content type is Binary. Port - 9290.
Another config property is jobtracker_port (8021), the port on which the JobTracker IPC listens. Oozie sends this value in messages for internal use; Hue doesn't use it to communicate with the Job Tracker.
b) YARN
Uses the Resource Manager, Node Manager, and History Server REST APIs.

The Hue config also contains the ResourceManager host/port for IPC, which is used for application submissions.
It is required by all clients that need to submit YARN applications, including Hive, Hive server, and Pig.
This address is sent in the message during job submission.
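
For reference, here is a minimal sketch of how the three YARN REST entry points mentioned above could be addressed. The ports are the stock Hadoop 2 web defaults and are an assumption, not something taken from the Hue config; the helper name yarn_rest_url is hypothetical.

```python
from urllib.parse import urlunsplit

# Assumed default web ports of a stock Hadoop 2 install; a real cluster
# may override them in yarn-site.xml / mapred-site.xml.
DEFAULT_PORTS = {
    "resourcemanager": 8088,
    "nodemanager": 8042,
    "historyserver": 19888,
}

# Root REST path of each service.
REST_ROOTS = {
    "resourcemanager": "/ws/v1/cluster",
    "nodemanager": "/ws/v1/node",
    "historyserver": "/ws/v1/history",
}


def yarn_rest_url(service, host, resource="", port=None):
    """Build the URL of a YARN REST resource (hypothetical helper)."""
    netloc = f"{host}:{port or DEFAULT_PORTS[service]}"
    path = REST_ROOTS[service]
    if resource:
        path += "/" + resource.strip("/")
    return urlunsplit(("http", netloc, path, "", ""))


if __name__ == "__main__":
    # List applications known to the Resource Manager.
    print(yarn_rest_url("resourcemanager", "dev03.hortonworks.com", "apps"))
    # List MapReduce jobs known to the History Server.
    print(yarn_rest_url("historyserver", "dev03.hortonworks.com", "mapreduce/jobs"))
```

Because these are plain HTTP endpoints, they are the part of the Job Browser that a gateway could in principle proxy; the IPC address is a separate matter.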

2. Job Designer
3. Oozie Editor
Uses the Oozie and WebHDFS REST APIs (the config file also contains a comment that Thrift could be used for communication with HDFS).

Here is an example of data being sent during workflow submission:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
  <name>oozie.use.system.libpath</name>
  <value>true</value>
</property>

<property>
  <name>hue-id-w</name>
  <value>6</value>
</property>

<property>
  <name>user.name</name>
  <value>admin</value>
</property>

<property>
  <name>nameNode</name>
  <value>hdfs://dev03.hortonworks.com:8020</value>
</property>

<property>
  <name>jobTracker</name>
  <value>dev03.hortonworks.com:8021</value>
</property>

<property>
  <name>oozie.wf.application.path</name>
  <value>hdfs://dev03.hortonworks.com:8020/user/hue/oozie/workspaces/_admin_-oozie-6-1371227049.01</value>
</property>

</configuration>

where dev03.hortonworks.com is the host with all Hadoop services installed.
So here is a problem: Hue has to be configured to know where nameNode and jobTracker are installed, or Knox has to be able to rewrite their addresses.
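
To illustrate the second option, here is a rough sketch of what rewriting the addresses in such a configuration document could look like. It is not Knox code; the function name rewrite_addresses and the target host gateway.example.com are hypothetical, and whether the gateway can meaningfully substitute these IPC addresses at all is exactly the open question.

```python
import xml.etree.ElementTree as ET


def rewrite_addresses(config_xml, address_map):
    """Replace cluster addresses inside every <value> of an Oozie config.

    address_map maps an address as Hue sends it to the address that
    should be forwarded instead (hypothetical values in the demo below).
    """
    root = ET.fromstring(config_xml)
    for value in root.iter("value"):
        if not value.text:
            continue
        text = value.text
        for old, new in address_map.items():
            text = text.replace(old, new)
        value.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<configuration>"
        "<property><name>nameNode</name>"
        "<value>hdfs://dev03.hortonworks.com:8020</value></property>"
        "<property><name>jobTracker</name>"
        "<value>dev03.hortonworks.com:8021</value></property>"
        "</configuration>"
    )
    # gateway.example.com is a placeholder, not a real deployment.
    print(rewrite_addresses(sample, {
        "hdfs://dev03.hortonworks.com:8020": "hdfs://gateway.example.com:8020",
        "dev03.hortonworks.com:8021": "gateway.example.com:8021",
    }))
```

Replacing by address rather than by property name also covers oozie.wf.application.path, which embeds the nameNode address as a prefix.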


4. Pig Editor
From looking at the code, I concluded that Hue Pig doesn't work with Hadoop Pig directly.
In a nutshell, Hue Pig is a wrapper around the Oozie/WebHDFS REST APIs.

5. File Browser
Uses the WebHDFS REST API. It also seems that it could communicate directly with the NameNode using port 8020 and the IPC protocol. I didn't investigate this protocol.
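
As a small illustration of the REST side, a WebHDFS request is an ordinary HTTP URL of the form /webhdfs/v1/<path>?op=<OPERATION>. The sketch below only builds such a URL; port 50070 is the usual NameNode HTTP default and is assumed here, and the helper name webhdfs_url is hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, user=None, port=50070, **params):
    """Build a WebHDFS REST URL (hypothetical helper).

    port defaults to 50070, the usual NameNode HTTP port; this is an
    assumption and may differ on a given cluster.
    """
    query = {"op": op}
    if user:
        # Simple (non-Kerberos) authentication passes the user as a parameter.
        query["user.name"] = user
    query.update(params)
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


if __name__ == "__main__":
    # Directory listing, as a file browser would request it.
    print(webhdfs_url("dev03.hortonworks.com", "/user/hue", "LISTSTATUS", user="admin"))
```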

6. Beeswax (Hive UI).
Cloudera developed an additional service called Beeswax. It is Hadoop-dependent: it requires Hadoop binaries to run.
Hue is able to work with Hive directly or through Beeswax.

It communicates with Hive using Thrift; the transport type is Socket, the content type is Binary, and the port is 10000.

I didn't investigate the Beeswax project code; it requires more time.

I tried to configure Hue to work directly with Hive and got the following exception on the Hive side:

13/06/20 05:57:10 ERROR server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
    at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    ... 4 more
   
I didn't manage to solve it.

7. Metastore Manager - the same as in point 6.

8. Impala

Impala is Cloudera's application.
Here is its URL
http://www.cloudera.com/content/cloudera/en/products/cdh/impala.html
and Impala Concepts and Architecture
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_concepts.html

It contains the statement:
"The core Impala component is a daemon process that runs on each node of the cluster, physically represented by the impalad process"

So I concluded that it should be installed behind Knox (Knox should proxy calls to Impala).

I also found this statement:

" You can connect and submit requests to the Impala daemons through:

    The impala-shell command.
    The Apache Hue web-based user interface.
    JDBC.
    ODBC.
"

So, if it is needed, I can try to investigate its API.

One more statement:
"The Cloudera Impala Query UI application enables you to perform queries on Apache Hadoop data stored in HDFS or HBase using Cloudera Impala."

Hue has two ways to communicate with Impala:
a) through Beeswax
b) directly
Here is the description of Impala's ports:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_ports.html
Both ways use the Thrift interface (Sockets + Binary Data Format).

I can spend more time investigating Impala.

9. Shell
It tries to execute the following commands on the local machine:

/usr/bin/pig -l /dev/null
/usr/bin/hbase shell
/usr/bin/sqoop2

So it wouldn't work with Knox.
                
> Support HUE interacting with a Hadoop cluster via the Gateway
> -------------------------------------------------------------
>
>                 Key: KNOX-44
>                 URL: https://issues.apache.org/jira/browse/KNOX-44
>             Project: Apache Knox
>          Issue Type: New Feature
>          Components: Server
>    Affects Versions: 0.2.0
>            Reporter: Kevin Minder
>            Assignee: Maksim Kononenko
>
> From BUG-4322
> Ultimately it should be possible to use all HUE features and have traffic pass through the gateway.  This will require understanding what HUE features are not supported via REST APIs and determining how HUE can be changed or configured to use gateway URLs.
