Posted to user@drill.apache.org by Malathi <ma...@gmail.com> on 2015/08/20 13:13:58 UTC

Need help in querying HDFS from drill

Hi,

I have Drill and ZooKeeper installed on my laptop. I started HDFS on my
laptop and can query the CSV and JSON files in HDFS. Now I want to
query files located on another laptop, so I started HDFS on the other
laptop. When I ran a select * query, it failed (though I can execute
a `show files` query without issues).

The error I am getting is in this Dropbox link:
https://www.dropbox.com/s/5bgyw4jetweczoj/drill.log?dl=0

Environment: both laptops run Ubuntu
Apache Drill version: 1.1.0

I have the following questions:
1) Is it possible to run Drill on a machine outside the Hadoop cluster and
query the HDFS files in the cluster?
2) If so, is any additional configuration needed?

Thanks,
Malathi

Re: Need help in querying HDFS from drill

Posted by Malathi <ma...@gmail.com>.
Hi Jason,

Thanks for the reply.

When I queried the same file via Drill from the local machine, it worked
fine. I also copied the file to the local file system using the Hadoop CLI
from another laptop, and that worked fine too. So I don't think there is a
problem with the HDFS file.

I also ran fsck, and the DataNode looks healthy. Can somebody please suggest
a way to figure out what's wrong with my setup?

Thanks,
Malathi

On Thu, Aug 20, 2015, 8:55 PM Jason Altekruse <al...@gmail.com>
wrote:

> If files are available through the HDFS API, which includes remote reads,
> Drill is able to read the files. A good use case for Drill is actually
> installing on a subset of your nodes to save the overhead of running the
> server everywhere while still being able to query all of your data. I have
> not seen this error before, but it looks like a low level HDFS error.
> Someone might have a better way to suggest testing this, but could you try
> to write a simple program (could be a map-reduce program, pig script etc.)
> to read the file and see if it is successful?

Re: Need help in querying HDFS from drill

Posted by Jason Altekruse <al...@gmail.com>.
If files are available through the HDFS API, which includes remote reads,
Drill is able to read them. A good use case for Drill is actually
installing it on a subset of your nodes to save the overhead of running the
server everywhere while still being able to query all of your data. I have
not seen this error before, but it looks like a low-level HDFS error.
Someone might suggest a better way to test this, but could you try writing
a simple program (a MapReduce program, a Pig script, etc.) to read the file
and see whether it succeeds?
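One quick way to try this suggestion without writing a MapReduce job or Pig script is to read the file over WebHDFS, the HTTP REST interface to HDFS. This is a swapped-in technique, not what the thread proposed. The sketch assumes Hadoop 2.x defaults (NameNode web port 50070) and that WebHDFS is enabled on the cluster; the host name and path in the comments are hypothetical. WebHDFS reaches the DataNode over HTTP rather than over the binary data-transfer port the regular HDFS client uses, so a success here is suggestive rather than conclusive.

```python
import urllib.parse
import urllib.request


def webhdfs_open_url(namenode_host: str, path: str, port: int = 50070) -> str:
    """Build a WebHDFS OPEN URL for an absolute HDFS path.

    Port 50070 is the assumed Hadoop 2.x NameNode HTTP default (9870 in Hadoop 3.x).
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # quote() keeps '/' intact and percent-encodes spaces and other unsafe characters.
    return f"http://{namenode_host}:{port}/webhdfs/v1{urllib.parse.quote(path)}?op=OPEN"


def read_first_bytes(url: str, limit: int = 1024, timeout: float = 10.0) -> bytes:
    """Read up to `limit` bytes of the file.

    The NameNode answers OPEN with a redirect to a DataNode, which urlopen follows,
    so a successful read touches both the NameNode and a DataNode.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(limit)


# Hypothetical usage against the remote laptop:
#   url = webhdfs_open_url("other-laptop", "/user/malathi/test.json")
#   print(read_first_bytes(url))
```

If the URL works from the Drill machine but Drill's own read still fails, that points away from the file itself and toward how the client reaches the DataNode.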

On Thu, Aug 20, 2015 at 4:13 AM, Malathi <ma...@gmail.com> wrote:

> Hi,
>
> I have drill and zookeeper installed in my laptop. I started HDFS in my
> laptop and see that I can query the csv and json files in HDFS. Now I
> wanted to query the files located in another laptop. Hence I started hdfs
> in the other laptop and when I gave the select * query, it failed(though I
> can execute `show files` query without issues).
>
> The error I am getting is there in the dropbox link:
> https://www.dropbox.com/s/5bgyw4jetweczoj/drill.log?dl=0
>
> Environment : Both the laptops running Ubuntu
> Apache drill version : 1.1.0
>
> I have the following questions:
> 1) Is it possible to run drill in a machine outside hadoop cluster and
> query the hdfs files in the cluster?
> 2) If yes, is there any need of additional configuration change?
>
> Thanks,
> Malathi
>

Re: Need help in querying HDFS from drill

Posted by USC <hs...@usc.edu>.
Hi Malathi,
You can configure the data storage by following the steps here:
https://drill.apache.org/docs/connect-a-data-source/

That said, although it is fine to run Drill outside the HDFS cluster, performance-wise it is better to run Drill on each node, which lets it take advantage of data locality.

> On Aug 20, 2015, at 10:21 AM, Ted Dunning <te...@gmail.com> wrote:
> 
> Some specific answers here.
> 
>> On Thu, Aug 20, 2015 at 4:13 AM, Malathi <ma...@gmail.com> wrote:
>> 
>> I have the following questions:
>> 1) Is it possible to run drill in a machine outside hadoop cluster and
>> query the hdfs files in the cluster?
> 
> Yes.  Absolutely.
> 
> 
>> 2) If yes, is there any need of additional configuration change?
> 
> Yes.
> 
> You have to set up a data source that points to the external data.

Re: Need help in querying HDFS from drill

Posted by Venki Korukanti <ve...@gmail.com>.
Also, can you check the logs on the HDFS DataNodes and NameNode? They may have
clues about why the client can't obtain the block.
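A plausible reading of this hint, offered as an assumption rather than a diagnosis from the thread: `show files` only needs metadata from the NameNode, while `select *` must also fetch blocks from a DataNode, so a client that can reach the NameNode but not the DataNode would see exactly this split. The sketch below probes both with plain TCP connects. The port numbers are assumed Hadoop 2.x defaults (8020 for NameNode RPC, 50010 for DataNode data transfer); confirm them against `fs.defaultFS` and `dfs.datanode.address` in the cluster's configuration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False


def check_hdfs_ports(namenode: str, datanodes: list[str],
                     nn_rpc_port: int = 8020, dn_xfer_port: int = 50010) -> dict[str, bool]:
    """Probe the NameNode RPC port and each DataNode's data-transfer port.

    The default port numbers are assumptions (common Hadoop 2.x values).
    """
    results = {f"{namenode}:{nn_rpc_port}": can_connect(namenode, nn_rpc_port)}
    for dn in datanodes:
        results[f"{dn}:{dn_xfer_port}"] = can_connect(dn, dn_xfer_port)
    return results


# Hypothetical usage from the Drill machine:
#   print(check_hdfs_ports("other-laptop", ["other-laptop"]))
```

Running it from the Drill machine, not from the HDFS laptop itself, is what matters, since the question is whether the remote client can reach each port.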

On Sun, Aug 23, 2015 at 11:58 PM, Venki Korukanti <venki.korukanti@gmail.com> wrote:

> Hi,
>
> On Sun, Aug 23, 2015 at 10:26 PM, Malathi <ma...@gmail.com> wrote:
>
>> Hi,
>>
>> The entire log of the sqlline is attached in the dropbox link:
>> https://www.dropbox.com/s/oijx9vjibk1md5x/sqlline.log?dl=0
>>
>> Error id is :
>>
>> *aa3410a8-8b11-412e-bb89-dec3e8bb8bb4*
>> P.S : And one more thing to note is that it worked when both drill and
>> apache HDFS run in centos in different machines. It doesn't work when HDFS
>> runs in ubuntu. Please let me know of there could be anything wrong with
>> the setup.
>>
> What version of HDFS are you using? Currently Drill comes with HDFS
> fileclient 2.4.1 version packaged with it. It could be due to client
> compatibility issue with the server. Not sure how HDFS handles
> compatibility across versions.
>
> Also is the version of the HDFS installed on CentOS different from Ubuntu?

Re: Need help in querying HDFS from drill

Posted by Venki Korukanti <ve...@gmail.com>.
Hi,

On Sun, Aug 23, 2015 at 10:26 PM, Malathi <ma...@gmail.com> wrote:

> Hi,
>
> The entire log of the sqlline is attached in the dropbox link:
> https://www.dropbox.com/s/oijx9vjibk1md5x/sqlline.log?dl=0
>
> Error id is :
>
> *aa3410a8-8b11-412e-bb89-dec3e8bb8bb4*
> P.S : And one more thing to note is that it worked when both drill and
> apache HDFS run in centos in different machines. It doesn't work when HDFS
> runs in ubuntu. Please let me know of there could be anything wrong with
> the setup.
>
What version of HDFS are you using? Drill currently ships with version 2.4.1
of the HDFS client. The failure could be a compatibility issue between that
client and the server; I'm not sure how HDFS handles compatibility across
versions.

Also, is the HDFS version installed on CentOS different from the one on Ubuntu?


Re: Need help in querying HDFS from drill

Posted by Malathi <ma...@gmail.com>.
Hi,

The entire sqlline log is at this Dropbox link:
https://www.dropbox.com/s/oijx9vjibk1md5x/sqlline.log?dl=0

The error ID is:

*aa3410a8-8b11-412e-bb89-dec3e8bb8bb4*

P.S.: It worked when Drill and Apache HDFS ran on CentOS on separate
machines. It doesn't work when HDFS runs on Ubuntu. Please let me know
if anything could be wrong with the setup.


Thanks,
Malathi

On Fri, 21 Aug 2015 at 21:16 Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> Malathi, I couldn't find the error message in the files you shared, I only
> see the error id (which would be useful if I had the full Drillbit.log
> file) and the stack trace (but it's on the client side, it doesn't actually
> tell us where the exception happened on the server side).
>
> Can you share the error message you saw in Sqlline ? you can also use the
> error Id you got with the error and search for it in the Drillbit.log, the
> first occurrence should give us the stack trace on the server side.
>
> Thanks!

Re: Need help in querying HDFS from drill

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Malathi, I couldn't find the error message in the files you shared. I only
see the error ID (which would be useful if I had the full drillbit.log
file) and the stack trace (but it's from the client side, so it doesn't
tell us where the exception happened on the server side).

Can you share the error message you saw in sqlline? You can also search
drillbit.log for the error ID you got; the first occurrence should give us
the server-side stack trace.
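This search can be scripted. A minimal sketch follows: it scans a log file for the error ID and returns the first matching line plus the lines after it, where the server-side stack trace usually follows. The log path and the amount of context are assumptions; the error ID in the usage comment is the one Malathi reported later in this thread.

```python
from pathlib import Path


def first_occurrence(log_path: str, error_id: str, context: int = 40) -> list[str]:
    """Return the first line containing `error_id` and up to `context` lines after it.

    Returns an empty list if the ID does not appear in the log.
    """
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if error_id in line:
            return lines[i : i + 1 + context]
    return []


# Hypothetical path; adjust to wherever drillbit.log lives on the Drillbit host:
#   for line in first_occurrence("/opt/drill/log/drillbit.log",
#                                "aa3410a8-8b11-412e-bb89-dec3e8bb8bb4"):
#       print(line)
```

Only the first hit is returned on purpose, since later lines mentioning the same ID tend to be follow-up messages rather than the original exception.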

Thanks!

On Fri, Aug 21, 2015 at 1:48 AM, Malathi <ma...@gmail.com> wrote:

> Select * from `test.json` was the query.

-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: Need help in querying HDFS from drill

Posted by Malathi <ma...@gmail.com>.
Select * from `test.json` was the query.

On Fri, 21 Aug 2015 at 11:12 Ted Dunning <te...@gmail.com> wrote:

> What was the query?

Re: Need help in querying HDFS from drill

Posted by Ted Dunning <te...@gmail.com>.
What was the query?



On Thu, Aug 20, 2015 at 10:36 PM, Malathi <ma...@gmail.com> wrote:

> Hi Ted,
>
> I have created the data source to point to my external cluster.
>
> Data source configuration:
> https://www.dropbox.com/s/g5peo43baf1bqgj/drill.config?dl=0
>
> Error log I am getting when issue a select * query:
> https://www.dropbox.com/s/5bgyw4jetweczoj/drill.log?dl=0
>
> Can you please let me know what could be the root cause of the issue?
>
> P.S : I can issue "show files" command and it works without issues.
>
> Thanks,
> Malathi

Re: Need help in querying HDFS from drill

Posted by Malathi <ma...@gmail.com>.
Hi Ted,

I have created the data source to point to my external cluster.

Data source configuration:
https://www.dropbox.com/s/g5peo43baf1bqgj/drill.config?dl=0

Error log I get when I issue a select * query:
https://www.dropbox.com/s/5bgyw4jetweczoj/drill.log?dl=0

Can you please let me know what could be the root cause of the issue?

P.S.: The "show files" command works without issues.

Thanks,
Malathi

On Thu, 20 Aug 2015 at 22:51 Ted Dunning <te...@gmail.com> wrote:

> Some specific answers here.
>
> On Thu, Aug 20, 2015 at 4:13 AM, Malathi <ma...@gmail.com> wrote:
>
> > I have the following questions:
> > 1) Is it possible to run drill in a machine outside hadoop cluster and
> > query the hdfs files in the cluster?
> >
>
> Yes.  Absolutely.
>
>
> > 2) If yes, is there any need of additional configuration change?
> >
>
> Yes.
>
> You have to set up a data source that points to the external data.
>

Re: Need help in querying HDFS from drill

Posted by Ted Dunning <te...@gmail.com>.
Some specific answers here.

On Thu, Aug 20, 2015 at 4:13 AM, Malathi <ma...@gmail.com> wrote:

> I have the following questions:
> 1) Is it possible to run drill in a machine outside hadoop cluster and
> query the hdfs files in the cluster?
>

Yes.  Absolutely.


> 2) If yes, is there any need of additional configuration change?
>

Yes.

You have to set up a data source that points to the external data.
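For reference, the "data source" in Drill is a storage plugin of type `file` whose `connection` points at the remote NameNode. The sketch below builds such a configuration as a Python dict and prints it as JSON that can be pasted into a new plugin on the Storage tab of the Drill Web UI (port 8047 by default). It is modelled on the HDFS example in Drill's storage-plugin documentation; the host name, the 8020 port, the `root` workspace and the format list are assumptions to adjust, and field names should be checked against the Drill version in use.

```python
import json


def hdfs_plugin_config(namenode_host: str, namenode_port: int = 8020,
                       workspace_path: str = "/") -> dict:
    """Build a Drill 'file' storage-plugin config pointing at a remote HDFS NameNode."""
    return {
        "type": "file",
        "enabled": True,
        # The connection URI tells Drill which file system (here, remote HDFS) to use.
        "connection": f"hdfs://{namenode_host}:{namenode_port}/",
        "workspaces": {
            "root": {
                "location": workspace_path,
                "writable": False,
                "defaultInputFormat": None,
            }
        },
        "formats": {
            "json": {"type": "json"},
            "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","},
        },
    }


# "other-laptop" is a hypothetical host name for the machine running the NameNode.
print(json.dumps(hdfs_plugin_config("other-laptop"), indent=2))
```

If the plugin were saved under the hypothetical name `hdfs`, the query from this thread would become something like select * from hdfs.root.`test.json`.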