Posted to dev@tajo.apache.org by Daniel Einspanjer <de...@gmail.com> on 2013/12/03 20:10:33 UTC

Getting started with sandbox on mac

I originally started playing with Tajo by installing it in a Linux VM
running Hadoop, but I'm now making changes to the source, and I'd really
like to be able to test those changes on my dev machine, which is a Mac.

I'm trying to figure out how to get the dependencies sorted out, as well as
how to debug with IntelliJ IDEA.

I'm trying to use a local pseudo-distributed instance of CDH 4.4 for this
dev work.

I've cloned the Tajo git repo and run:

mvn package -DskipTests -Pdist

I then went into the tajo-dist/target/tajo-0.8.0-SNAPSHOT directory
and edited the tajo-env.sh file to point HADOOP_HOME at my
cdh44/hadoop-2.0.0-cdh4.4.0 directory.
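
For reference, the HADOOP_HOME edit described above might look like the
following line in tajo-env.sh. This is only a sketch: the /path/to prefix
is a placeholder, since the message gives only the trailing part of the
directory, and the existence check is an optional addition.

```shell
# Sketch of the tajo-env.sh edit; /path/to is a placeholder, not a real location.
export HADOOP_HOME="/path/to/cdh44/hadoop-2.0.0-cdh4.4.0"

# Optional sanity check (an addition, not part of the original setup):
# warn if the directory does not exist.
if [ ! -d "$HADOOP_HOME" ]; then
  echo "warning: HADOOP_HOME does not exist: $HADOOP_HOME" >&2
fi
```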

I was able to create an App config in IntelliJ for TajoMaster and set
it up as mentioned previously in this mailing list.

When I tried to create an external table, I started to run into problems.

First, I tried creating a table1/data.csv file in HDFS, but I wasn't
sure how to reference the host in the location clause.

I tried "location 'hdfs://localhost:8020/table1'", but that gave me
the following exception:

Failed on local exception: com.google.protobuf.InvalidProtocolBufferException:
 Message missing required fields: callId, status; Host Details :
 local host is: "den.local/192.168.1.88"; destination host is:
"localhost":8020;

I then tried just pointing at a local directory using "location
'file:/Users/deinspanjer/table1';" instead.

I was able to create the external table and describe it using \d, but
when I try to query it using a simple "select * from table1;", it
starts outputting lines similar to this and never stops:

Progress: 100%, response time: 0.751 sec
Progress: 100%, response time: 1.761 sec
Progress: 100%, response time: 2.788 sec
Progress: 100%, response time: 3.796 sec
...
Progress: 100%, response time: 90.325 sec
Progress: 100%, response time: 91.332 sec
Progress: 100%, response time: 92.337 sec

Any suggestions on how to resolve my issue with pointing at HDFS, or on
why the query against a local file doesn't complete, would be
appreciated.

-Daniel

Re: Getting started with sandbox on mac

Posted by Hyunsik Choi <hy...@apache.org>.
Hi Daniel,

I suspect that the HDFS problem is caused by two issues.

The first is that localhost may resolve to the wrong IP address. Your HDFS
address is right, but the HDFS client is trying to connect to
den.local/192.168.1.88 instead of 127.0.0.1. I think that your localhost
is 192.168.1.88.
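
One way to check this hypothesis is to see which addresses the hosts file
maps to localhost. The helper below is a minimal sketch: hosts_lookup is a
hypothetical name, and it only reads a hosts file (/etc/hosts by default),
so it will not reflect other resolver sources such as DNS.

```shell
# Print the addresses that a hosts file maps to a given name.
# Usage: hosts_lookup NAME [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
# hosts_lookup is a hypothetical helper, not part of Tajo or Hadoop.
hosts_lookup() {
  name="$1"
  file="${2:-/etc/hosts}"
  awk -v name="$name" '
    { sub(/#.*/, "") }   # strip trailing comments before matching
    { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
  ' "$file"
}
```

Running "hosts_lookup localhost" on a typical setup would be expected to
list a loopback address such as 127.0.0.1; an entry like 192.168.1.88
would support the explanation above.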

The second problem may be an HDFS protocol incompatibility. As you
know, the 'Message missing required fields' error suggests that the
protobuf messages used by the client and the server differ. We haven't
actually tested on CDH 4.4.0, and I'm not sure that Tajo works well
with this version.

I'm not sure of the cause of the infinite hang. One suspect is an RPC
bind problem, because the default service bind addresses are mostly
localhost and your localhost IP is somewhat strange. To avoid this,
you should specify the RPC bind addresses explicitly. Please see this
document
(http://tajo.incubator.apache.org/tajo-0.2.0-doc.html#DefaultPorts),
and then specify three bind addresses: (1)
tajo.master.umbilical-rpc.address, (2) tajo.master.client-rpc.address,
and (3) tajo.catalog.client-rpc.address. In addition, the TajoMaster log
in ${TAJO_HOME}/logs may contain the cause of the problem. If you
share your error messages, it will be easier to figure out the
problem.
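
As a rough illustration of the three settings named above, the sketch
below prints Hadoop-style property entries for them. Several things here
are assumptions rather than facts from this thread: the helper name
print_bind_properties is hypothetical, the port numbers (26001, 26002,
26005) should be checked against the linked document, and the entries are
assumed to belong inside the <configuration> element of tajo-site.xml.

```shell
# Emit <property> entries for the three RPC bind addresses.
# The host defaults to 127.0.0.1; the ports are ASSUMED values and should
# be verified against the DefaultPorts section of the Tajo documentation.
print_bind_properties() {
  host="${1:-127.0.0.1}"
  cat <<EOF
<property>
  <name>tajo.master.umbilical-rpc.address</name>
  <value>${host}:26001</value>
</property>
<property>
  <name>tajo.master.client-rpc.address</name>
  <value>${host}:26002</value>
</property>
<property>
  <name>tajo.catalog.client-rpc.address</name>
  <value>${host}:26005</value>
</property>
EOF
}
```

Calling "print_bind_properties 127.0.0.1" writes the entries to standard
output so they can be reviewed and merged by hand; the sketch deliberately
does not modify any existing configuration file.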

Cheers,
Hyunsik
