Posted to dev@tajo.apache.org by Hyunsik Choi <hy...@apache.org> on 2013/12/06 12:49:59 UTC

Re: I have question with tajo query

Hi Jae Lee,

TAJO-304 (https://issues.apache.org/jira/browse/TAJO-304) has been
resolved. Now, 'DROP TABLE' does not remove data; it only removes the
table information from the catalog. 'DROP TABLE table_name PURGE'
removes the table information as well as the data.
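
For illustration, a minimal sketch of the two forms; the table name
departuredelay is only borrowed from the thread below:

-- removes only the catalog entry; the data files on HDFS are kept
DROP TABLE departuredelay;

-- removes the catalog entry and deletes the underlying data files
DROP TABLE departuredelay PURGE;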

- hyunsik

On Wed, Nov 13, 2013 at 11:02 PM, Jae Lee <ot...@gmail.com> wrote:
> Hi Hyunsik Choi,
>
> Thank you for your reply and the jira issues.
> I look forward to 0.8-incubating.
>
> Regards,
> Jae.
>
>
>
> 2013/11/13 Hyunsik Choi <hy...@apache.org>
>
>> Hi Jae Lee,
>>
>> Thank you for your interest in Tajo. Based on your questions, I've just
>> filed two jira issues and scheduled them for 0.8-incubating. They will
>> be resolved soon.
>>
>> TAJO-304: drop table command should not remove data files in default
>> (https://issues.apache.org/jira/browse/TAJO-304)
>> TAJO-305: Implement killQuery feature
>> (https://issues.apache.org/jira/browse/TAJO-305)
>>
>> - hyunsik
>>
>> On Tue, Nov 12, 2013 at 10:58 AM, Jinho Kim <jh...@apache.org> wrote:
>> > Hi Jae,
>> >
>> > Could you try the following? I suspect that some column values
>> > may contain Hive's default null string '\\N'.
>> >
>> > CREATE EXTERNAL TABLE tablename (A int, B text, C text)
>> > USING csv WITH ('csvfile.delimiter'='\001', 'csvfile.null'='\\N')
>> >
>> >
>> > Besides, in Hive, values that cannot be parsed are treated as NULL. But in
>> > the current Tajo, they cause errors.
>> >
>> >
>> > Thanks
>> > -Jinho
>> >
>> >
>> > 2013/11/12 Jae Lee <ot...@gmail.com>
>> >
>> >> Hi Jihoon,
>> >>
>> >> Thank you for your answer.
>> >> I have a follow-up question about Q1.
>> >> I have already been waiting a long time for the query result, and I think
>> >> that is not normal. The COUNT(*) query returned a result in only 300
>> >> seconds, but the DISTINCT, GROUP BY, and SUM queries have been executing
>> >> for a whole day.
>> >> I also found another error message; please see below.
>> >> The error is about the integer-type column named "Year".
>> >> The query was "select distinct year from departuredelay;"
>> >> I executed the same query on Hive and it had no error,
>> >> but the Year column has some null or blank data.
>> >> The table was created as an EXTERNAL table from several CSV files on HDFS.
>> >>
>> >> ---------------------------------------------------------------------------
>> >> 2013-11-11 18:34:01,436 ERROR worker.Task (Task.java:run(363)) -
>> >> java.lang.NumberFormatException: For input string: "Year"
>> >>   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>> >>   at java.lang.Integer.parseInt(Integer.java:492)
>> >>   at java.lang.Integer.valueOf(Integer.java:582)
>> >>   at org.apache.tajo.datum.DatumFactory.createInt4(DatumFactory.java:140)
>> >>   at org.apache.tajo.storage.LazyTuple.createByTextBytes(LazyTuple.java:313)
>> >>   at org.apache.tajo.storage.LazyTuple.get(LazyTuple.java:126)
>> >>   at org.apache.tajo.engine.eval.FieldEval.eval(FieldEval.java:58)
>> >>   at org.apache.tajo.engine.planner.Projector.eval(Projector.java:87)
>> >>   at org.apache.tajo.engine.planner.physical.SeqScanExec.next(SeqScanExec.java:111)
>> >>   at org.apache.tajo.engine.planner.physical.HashAggregateExec.compute(HashAggregateExec.java:57)
>> >>   at org.apache.tajo.engine.planner.physical.HashAggregateExec.next(HashAggregateExec.java:83)
>> >>   at org.apache.tajo.engine.planner.physical.PartitionedStoreExec.next(PartitionedStoreExec.java:121)
>> >>   at org.apache.tajo.worker.Task.run(Task.java:355)
>> >>   at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:376)
>> >>   at java.lang.Thread.run(Thread.java:744)
>> >> -----------------------------------------------------------------------------
>> >>
>> >>
>> >>
>> >> I have also attached my tajo-site.xml file.
>> >> Please check whether my config is correct.
>> >>
>> >> hostname | hadoop | tajo | DBMS
>> >> namenode | namenode | master | Maria (Metastore)
>> >> snamenode | snamenode+datanode1 | worker
>> >> datanode02 | datanode2 | worker
>> >> datanode03 | datanode3 | worker
>> >>
>> >>
>> >>
>> -----------------------------------------------------------------------------
>> >> <configuration>
>> >> <property>
>> >>     <name>tajo.rootdir</name>
>> >>     <value>hdfs://namenode:9000/tajo</value>
>> >> </property>
>> >> <property>
>> >>     <name>tajo.master.umbilical-rpc.address</name>
>> >>     <value>namenode:26001</value>
>> >> </property>
>> >> <property>
>> >>     <name>tajo.master.client-rpc.address</name>
>> >>     <value>namenode:26002</value>
>> >> </property>
>> >> <property>
>> >>     <name>tajo.master.info-http.address</name>
>> >>     <value>namenode:26080</value>
>> >> </property>
>> >> <property>
>> >>     <name>tajo.catalog.client-rpc.address</name>
>> >>     <value>namenode:26005</value>
>> >> </property>
>> >> </configuration>
>> >>
>> >>
>> >> Regards,
>> >> Jae
>> >>
>> >>
>> >> 2013/11/11 Jihoon Son <gh...@gmail.com>
>> >>
>> >> > Hi Jae Lee,
>> >> > thanks for your interest in Tajo.
>> >> >
>> >> > Here are my answers.
>> >> >
>> >> > 1. The timeout message looks like an error, but it does not mean that
>> >> > the query has failed. (We should change the message.)
>> >> > Could you please wait for some time after executing a query?
>> >> > If any other errors occur, please report them to us.
>> >> >
>> >> > 2. Tajo's SQL commands are designed to follow those of traditional
>> >> > relational databases.
>> >> > In those systems, the 'DROP table' command deletes data from disks.
>> >> > However, we are also considering the Hive-style 'DROP table', because
>> >> > tables are generally very large.
>> >> >
>> >> > 3. Tajo currently does not provide any commands to kill executing
>> >> > queries. Instead, you should kill the master and every worker using
>> >> > the unix 'kill' command.
>> >> >
>> >> > If you have any other questions,
>> >> > please feel free to ask us.
>> >> >
>> >> > Thanks,
>> >> > Jihoon
>> >> >
>> >> >
>> >> > 2013/11/11 Jae Lee <ot...@gmail.com>
>> >> >
>> >> > > Hello,
>> >> > >
>> >> > > :: I got the error message below and a hanging query.
>> >> > > It is from a clustered Tajo worker.
>> >> > > CentOS 6.2 + hadoop 2.0.5 + tajo 0.2.0
>> >> > > A plain count(*) query works, but queries that use distinct or group by
>> >> > > hang and produce these error messages.
>> >> > >
>> >> > > :: I have one more question.
>> >> > > Tajo deletes files on HDFS when I drop an EXTERNAL table. Is that normal?
>> >> > > Hive does not delete files when an external table is dropped.
>> >> > >
>> >> > > :: How can I kill Tajo jobs (queries)?
>> >> > >
>> >> > >
>> >> > > ---------------------------------------------------------------------
>> >> > > 2013-11-11 18:44:22,751 WARN  worker.TaskRunner (TaskRunner.java:run(339))
>> >> > > - Timeout GetTask:eb_1384155011466_0005_000001,container_1384155011466_0005_01_000013,
>> >> > > but retry
>> >> > > java.util.concurrent.TimeoutException
>> >> > >   at org.apache.tajo.rpc.CallFuture.get(CallFuture.java:81)
>> >> > >   at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:328)
>> >> > >   at java.lang.Thread.run(Thread.java:744)
>> >> > >
>> >> > >
>> >> > > Regards,
>> >> > > Jae
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Jihoon Son
>> >> >
>> >> > Database & Information Systems Group,
>> >> > Prof. Yon Dohn Chung Lab.
>> >> > Dept. of Computer Science & Engineering,
>> >> > Korea University
>> >> > 1, 5-ga, Anam-dong, Seongbuk-gu,
>> >> > Seoul, 136-713, Republic of Korea
>> >> >
>> >> > Tel : +82-2-3290-3580
>> >> > E-mail : jihoonson@korea.ac.kr
>> >> >
>> >>
>>

Re: I have question with tajo query

Posted by Jae Lee <ot...@gmail.com>.
Hi Hyunsik Choi,

I will try it. Thanks a lot. :)

