Posted to user@drill.apache.org by 王亮 <wa...@gmail.com> on 2018/08/05 08:55:12 UTC

Query local files in different machines?

Hi all,

I have Apache HTTP server logs on different machines and want to query
these log files.

So I installed Drill (distributed mode) on these machines, for example,
node1 and node2.

I use this command:
sqlline -u jdbc:drill:zk=node1,node2
or
sqlline -u jdbc:drill:drillbit=node1,node2

Then I run a query like: select count(*) from dfs.`/apache/logs/access_log`
but I only get the data from one machine.

Maybe I could upload all the log files to S3 or Hadoop.
But is there an easy way to query all the local files on different machines
with Drill?

If new features need to be developed to support this requirement, how much
work would it be? For example, would it only mean revising the physical plan
distribution code, or writing a completely new data source plugin?

I found these discussions, but there seems to be no clear answer.

https://stackoverflow.com/questions/29365320/apache-drill-in-distributed-mode

http://mail-archives.apache.org/mod_mbox/drill-user/201506.mbox/thread

https://stackoverflow.com/questions/33952568/how-to-configure-drill-to-use-all-the-nodes-for-a-query-by-creating-multiple-fr

Thanks,

Wang Liang

Re: Query local files in different machines?

Posted by Vitalii Diravka <vi...@gmail.com>.
If a different form of query is acceptable, you can use something similar to:
0: jdbc:drill:> select sum(`ROWS`) `TOTAL_NUMBER` from (select count(*) as
`ROWS` from cp.`tpch/nation.parquet` union all select count(*) as `ROWS`
from cp.`tpch/region.parquet`);
+---------------+
| TOTAL_NUMBER  |
+---------------+
| 30            |
+---------------+
1 row selected (0.324 seconds)

Kind regards
Vitalii
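
The UNION ALL pattern in the reply above can be generated for any list of files. The following is a minimal sketch in Python that only builds the SQL string. The helper name build_total_count_query and its parameters are hypothetical and not part of any Drill API. It assumes every path is visible through the same workspace, so on its own it does not make files on other machines reachable.

```python
def build_total_count_query(paths, workspace="cp"):
    """Return a Drill SQL string that sums count(*) over several files.

    `paths` and `workspace` are illustrative inputs chosen for this sketch;
    they are not defined by Drill itself.
    """
    if not paths:
        raise ValueError("at least one path is required")
    # One count(*) subquery per file, each exposing the same column alias.
    subqueries = [
        "select count(*) as `ROWS` from {}.`{}`".format(workspace, path)
        for path in paths
    ]
    # Combine the per-file counts and sum them in the outer query.
    return "select sum(`ROWS`) `TOTAL_NUMBER` from ({})".format(
        " union all ".join(subqueries)
    )


query = build_total_count_query(["tpch/nation.parquet", "tpch/region.parquet"])
print(query)
```

With the two sample files from the reply, the generated string matches the query shown there, apart from the trailing semicolon.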



Re: Query local files in different machines?

Posted by Padma Penumarthy <pe...@gmail.com>.
You need a DFS, i.e. Hadoop, with the global file system namespace provided by
the NameNode.
Planning is done by a single Drill node, which is the foreman for the query.
It looks for files through the file system API. A local file system knows
only about the files on that node.
So, I don't think what you want to do is possible.

Thanks
Padma
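
The explanation above can be illustrated with a toy model. This is a simplified sketch in Python, not Drill code, and it uses no Drill API. The node names come from the original question, while the row counts and the plan_count helper are made up for illustration. It shows only the idea that a single planning node resolves a path against its own local file listing.

```python
# Toy model only: each node's "local file system" is a dict of path -> row count.
# The row counts are invented values for illustration.
LOCAL_FILES = {
    "node1": {"/apache/logs/access_log": 100},
    "node2": {"/apache/logs/access_log": 250},
}


def plan_count(foreman, path):
    """Resolve `path` against the foreman's own local listing only.

    Files that live on other nodes are never consulted, which mirrors the
    "local file system knows only about files on that node" explanation.
    """
    visible = LOCAL_FILES[foreman]
    if path not in visible:
        raise FileNotFoundError(path)
    return visible[path]


count_from_node1 = plan_count("node1", "/apache/logs/access_log")
count_from_node2 = plan_count("node2", "/apache/logs/access_log")
print(count_from_node1, count_from_node2)
```

In this model the same query returns a different count depending on which node acts as foreman, and neither result covers both machines. A shared namespace, such as the one HDFS provides, is what would let one planner see all the files.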



