Posted to dev@tajo.apache.org by Tejas Patil <te...@gmail.com> on 2013/05/16 04:17:07 UTC

unable to create external table

Hi @tajo-dev,

I am following the wiki page [0] and tried to create a dataset, but that
didn't go well:

$TAJO_HOME/bin/tajo cli
2013-05-15 19:09:50,730 INFO  client.TajoClient
(TajoClient.java:connect(76)) - connected to tajo cluster (0.0.0.0:9004)

Trying to connect the tajo master (0.0.0.0:9004)
tajo> create external table table1 (id int, name string, score float, type
string) using csv with ('csvfile.delimiter'='|') location
'file:/home/tejas/table1'
ERROR: line 1:43
LINE: create external table table1 (id int, name string, score float, type
string) using csv with ('csvfile.delimiter'='|') location
'file:/home/tejas/table1'

Below is the output of jps indicating that Tajo and Hadoop processes are
running:

5366 ResourceManager
5176 NameNode
5758 JobHistoryServer
6028 TajoMaster
6476 Jps
5235 DataNode
6142 TajoCli
5517 NodeManager

I went through the tajo-master log but could not find anything meaningful.
Any pointers on how to figure out the problem?

[0] : http://wiki.apache.org/tajo/GettingStarted

Thanks,
Tejas Patil
http://www.linkedin.com/in/tejaspatil1

Re: unable to create external table

Posted by Hyunsik Choi <hy...@apache.org>.
Thank you for your feedback. I have left some inline comments on your
suggestions.


On Thu, May 16, 2013 at 1:18 PM, Tejas Patil <te...@gmail.com> wrote:

> Sure :)
>
> 1. I hit a problem while trying out the query on wiki page:
> tajo> select * from table1 where id > 2
> Internal Error
>
> After looking over the logs and then googling, I came across a recent
> thread [0] on tajo-dev that explains how to resolve the problem. Why not
> add a note about this to the wiki?
>

I'll add a note about this case and how to solve it to the wiki. Thank you
for your comment.


> 2. It looks like using a reserved keyword as a column name doesn't
> work.
>
> tajo> create external table table2 (cid int, *date* text, price int, name
> text) using csv with ('csvfile.delimiter'='|') location
> 'file:/home/tejas/Desktop/apache/incubator-tajo/snapshot/table2'
> ERROR: line 1:39
> LINE: create external table table2 (cid int, date text, price int, name
> text) using csv with ('csvfile.delimiter'='|') location
> 'file:/home/tejas/Desktop/apache/incubator-tajo/snapshot/table2'
>
> As soon as I changed 'date' to something else (say 'mdate'), it worked:
> tajo> create external table table2 (cid int, mdate text, price int, name
> text) using csv with ('csvfile.delimiter'='|') location
> 'file:/home/tejas/Desktop/apache/incubator-tajo/snapshot/table2'
> OK
>
> IMO it's not acceptable to tell people that they cannot have attributes
> with a certain name because it's a reserved keyword. This will lead to
> artificial schemas and data for the user based on the underlying system's
> implementation details. MySQL allows users to do this [1] using backticks
> (`).
>

That's a good idea. Like MySQL, we need to support some escape method.
Actually, we are working on a new parser that supports most of the SQL
standard, at tajo-frontend/tajo-frontend-sql. I'll incorporate your idea
into the new parser.
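
As a rough illustration of the escape idea (this is not Tajo's parser; the
RESERVED set, the token labels, and the classify function are all
hypothetical names chosen for this sketch), a tokenizer can treat anything
inside backticks as a plain identifier, even when the same word is reserved:

```python
import re

# Hypothetical reserved-word list, for illustration only.
RESERVED = {"date", "int", "text"}

# Either a backtick-quoted name or a bare word.
TOKEN = re.compile(r"`([^`]+)`|(\w+)")

def classify(fragment):
    """Label each word as KEYWORD or IDENT; backticks force IDENT."""
    tokens = []
    for match in TOKEN.finditer(fragment):
        quoted, bare = match.groups()
        if quoted is not None:
            # Escaped with backticks: always an identifier.
            tokens.append(("IDENT", quoted))
        elif bare.lower() in RESERVED:
            tokens.append(("KEYWORD", bare.lower()))
        else:
            tokens.append(("IDENT", bare))
    return tokens

print(classify("cid int, `date` text"))
```

With this approach a column definition such as `date` text would be accepted,
while a bare date would still be read as a keyword.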


> 3. The wiki page for the query language [2] shows keywords in capitals
> (e.g. SELECT, FROM, etc.). In reality, using capital letters doesn't work:
>

Thank you for your comment. The new parser handles keywords in a
case-insensitive way, so this problem will be solved by the new parser.


> tajo> SELECT name, sum(price) as total_price from table2 group by name
> having sum(price)<2000
> ERROR: line 1:11
> LINE: SELECT name, sum(price) as total_price from table2 group by name
> having sum(price)<2000
>
> tajo> select name, sum(price) as total_price from table2 group by name
> HAVING SUM(price)<2000
> ...
> ..
> name,  total_price
> -------------------------------
> Jensen,  2000
> Nilsen,  1700
> Hansen,  2000
>
> 4. For the query above, the expected output should have had only one row,
> but it returned 3 rows instead. After googling I saw that this is a known
> issue and there is an open JIRA issue [3] for it. I am not sure how to fix
> it but want to give it a shot. Can you give some pointers?


That problem is somewhat complex. It is caused by both incomplete logical
planning and problems in the aggregation operators. Currently, we implement
the 'having clause' by putting a selection operator after an aggregation
operator (SortAggregationExec or HashAggregationExec). However, this method
cannot deal with any aggregation function included in a having clause.

A straightforward solution may be to improve the planner so that it pushes
the 'having condition' into the aggregation operators. An aggregation
operator would then handle the expressions included in the having clause:
before the operator outputs the aggregated values, it should evaluate the
having condition against them.
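
The idea above can be sketched roughly as follows. This is not Tajo code:
hash_aggregate and its having parameter are hypothetical names, and the
input rows are made-up values chosen only so that the per-name totals match
the output quoted earlier (Jensen 2000, Nilsen 1700, Hansen 2000).

```python
from collections import defaultdict

def hash_aggregate(rows, having=None):
    """Sum price per name; apply the having predicate before emitting."""
    totals = defaultdict(int)
    for name, price in rows:
        totals[name] += price
    # The having condition is evaluated against the aggregated value,
    # inside the operator, rather than by a separate selection afterwards.
    return [(name, total) for name, total in totals.items()
            if having is None or having(total)]

# Hypothetical input rows (not taken from the original table2 file).
rows = [("Hansen", 1000), ("Nilsen", 1600), ("Hansen", 700),
        ("Hansen", 300), ("Jensen", 2000), ("Nilsen", 100)]

# Equivalent of: ... group by name having sum(price) < 2000
result = hash_aggregate(rows, having=lambda total: total < 2000)
print(result)  # [('Nilsen', 1700)]
```

Without the having predicate the same call returns all three groups, which
is the behaviour reported in item 4.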

P.S. I really appreciate your comments. All of these issues should be fixed
and improved, either by us or by you. For the next 2 weeks I need to spend
all of my time finishing my Ph.D. thesis, so I can rarely work on the Tajo
project. Some other core developers are in a similar situation because they
are tied to another project that will finish at the end of May. However, we
will make major changes starting in mid-June. Please keep up your interest
in the Tajo project.

Best regards,
Hyunsik



> [0] :
>
> http://mail-archives.apache.org/mod_mbox/tajo-dev/201305.mbox/%3CCALuGr6ZHyKmk+V7+QXYV03tu5eHhgMjrEn_JfsAX4RWbvCjBiA@mail.gmail.com%3E
> [1] : http://doc.ispirer.com/sqlways/Output/SQLWays-1-035.html
> [2] : http://wiki.apache.org/tajo/QueryLanguage
> [3] : https://issues.apache.org/jira/browse/TAJO-46

Re: unable to create external table

Posted by Tejas Patil <te...@gmail.com>.
Sure :)

1. I hit a problem while trying out the query on wiki page:
tajo> select * from table1 where id > 2
Internal Error

After looking over the logs and then googling, I came across a recent
thread [0] on tajo-dev that explains how to resolve the problem. Why not
add a note about this to the wiki?

2. It looks like using a reserved keyword as a column name doesn't
work.

tajo> create external table table2 (cid int, *date* text, price int, name
text) using csv with ('csvfile.delimiter'='|') location
'file:/home/tejas/Desktop/apache/incubator-tajo/snapshot/table2'
ERROR: line 1:39
LINE: create external table table2 (cid int, date text, price int, name
text) using csv with ('csvfile.delimiter'='|') location
'file:/home/tejas/Desktop/apache/incubator-tajo/snapshot/table2'

As soon as I changed 'date' to something else (say 'mdate'), it worked:
tajo> create external table table2 (cid int, mdate text, price int, name
text) using csv with ('csvfile.delimiter'='|') location
'file:/home/tejas/Desktop/apache/incubator-tajo/snapshot/table2'
OK

IMO it's not acceptable to tell people that they cannot have attributes
with a certain name because it's a reserved keyword. This will lead to
artificial schemas and data for the user based on the underlying system's
implementation details. MySQL allows users to do this [1] using backticks
(`).

3. The wiki page for the query language [2] shows keywords in capitals
(e.g. SELECT, FROM, etc.). In reality, using capital letters doesn't work:

tajo> SELECT name, sum(price) as total_price from table2 group by name
having sum(price)<2000
ERROR: line 1:11
LINE: SELECT name, sum(price) as total_price from table2 group by name
having sum(price)<2000

tajo> select name, sum(price) as total_price from table2 group by name
HAVING SUM(price)<2000
...
..
name,  total_price
-------------------------------
Jensen,  2000
Nilsen,  1700
Hansen,  2000

4. For the query above, the expected output should have had only one row,
but it returned 3 rows instead. After googling I saw that this is a known
issue and there is an open JIRA issue [3] for it. I am not sure how to fix
it but want to give it a shot. Can you give some pointers?

[0] :
http://mail-archives.apache.org/mod_mbox/tajo-dev/201305.mbox/%3CCALuGr6ZHyKmk+V7+QXYV03tu5eHhgMjrEn_JfsAX4RWbvCjBiA@mail.gmail.com%3E
[1] : http://doc.ispirer.com/sqlways/Output/SQLWays-1-035.html
[2] : http://wiki.apache.org/tajo/QueryLanguage
[3] : https://issues.apache.org/jira/browse/TAJO-46



Re: unable to create external table

Posted by Hyunsik Choi <hy...@apache.org>.
If you have any question, feel free to ask anything. Thanks!

-hyunsik




Re: unable to create external table

Posted by Tejas Patil <te...@gmail.com>.
That one worked :)



Re: unable to create external table

Posted by Hyunsik Choi <hy...@apache.org>.
Hi,

First of all, I'm sorry for not updating the document. Some data types were
recently renamed in order to follow the SQL standard, so string has been
replaced by text. I'll update the document right now.

Could you try to execute the following statement?

create external table table1 (id int, name text, score float, type text)
using csv with ('csvfile.delimiter'='|') location 'file:/home/tejas/table1'

Thanks,
Hyunsik

