You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@drill.apache.org by "Nikos R. Katsipoulakis" <ni...@gmail.com> on 2017/01/11 15:37:30 UTC

Additional information on JSON_SUB_SCAN operator and access to query profiles not from the Web UI

Hello all,

I am a new user of Apache Drill and I am in the process of better
understanding its internals. To that end, I have two questions, for which I
was unable to find more information online.

First, when I execute an EXPLAIN command for a query that gets its data
from JSON files, I see a physical operator named JSON_SUB_SCAN. What does
that operator exactly do? Is it only used for parsing (extracting) fields
from JSON data? Or does it perform additional processing? As far as I know,
Drill uses Jackson Streaming API for extracting JSON data. Is that still
true? Finally, what is the equivalent operator for CSV files?

Second, I need to access query profiles from a server that is behind a
firewall. Therefore, accessing the URL of that machine on port 8047 is a
headache (since I have to submit a ticket to IT Support). My question is
whether I can access the Query Profiles in any other way? Like from the
sqlline or through log/profile files created while executing queries.

Thank you and Kind Regards,

-- 
Nikos R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh

Re: Additional information on JSON_SUB_SCAN operator and access to query profiles not from the Web UI

Posted by Parth Chandra <pc...@mapr.com>.

The time in the query profile is the exact time. The console (sqlline) will include time it took to get data back from the server (and possibly time to format/display the data).

Query planning and optimization time is not explicitly reported but you can get an idea of it by looking at the first start time for all the fragments. The earliest start time will be the time the server took to plan and optimize.

________________________________
From: Nikos R. Katsipoulakis <ni...@gmail.com>
Sent: Thursday, January 19, 2017 4:10:59 PM
To: user@drill.apache.org
Subject: Re: Additional information on JSON_SUB_SCAN operator and access to query profiles not from the Web UI

Hello again,

Thank you Parth for your suggestions! I will try to follow your
instructions and do something like you suggested on the server.

In addition, I noticed something odd: When I execute a query on Drill's
console (terminal) I get an execution time (let's say) X. When I get the
execution profile from the profiler on the Web Console, I see that an
execution time Y is reported, which is always less than X. From what I
understand, the profiler does not include in its timer some additional
operations, which are included on the time reported on the Drill Console.
Why does the previous happen? Is there any chance that in the execution
times reported in Drill's console are included additional startup costs for
a query (like query parsing, evaluation, optimization etc.)? If yes, can I
get an exact breakdown of the time spent for a query?

Thank you,
Nikos

On Thu, Jan 19, 2017 at 5:48 PM, Parth Chandra <pc...@mapr.com> wrote:

> JSON_SUB_SAN is the Json reader. It uses Jackson to do actual parsing, and
> converts the data into Drill's internal value vector format. TEXT_SUB_SCAN
> is the corresponding operator for csv.
>
> If the Drill system has access to the /log/profile directory then you can,
> in fact, use Drill to query the json in the query profile. You might want
> to setup an nfs location for the query profiles,so that the directory is
> visible to all drillbits.  The simply create a new workspace pointing to
> the directory. You will be able to read the profiles like any other Json
> file.
>
> ________________________________
> From: Nikos R. Katsipoulakis <ni...@gmail.com>
> Sent: Wednesday, January 11, 2017 7:37:30 AM
> To: user@drill.apache.org
> Subject: Additional information on JSON_SUB_SCAN operator and access to
> query profiles not from the Web UI
>
> Hello all,
>
> I am a new user of Apache Drill and I am in the process of better
> understanding its internals. To that end, I have two questions, for which I
> was unable to find more information online.
>
> First, when I execute an EXPLAIN command for a query that gets its data
> from JSON files, I see a physical operator named JSON_SUB_SCAN. What does
> that operator exactly do? Is it only used for parsing (extracting) fields
> from JSON data? Or does it perform additional processing? As far as I know,
> Drill uses Jackson Streaming API for extracting JSON data. Is that still
> true? Finally, what is the equivalent operator for CSV files?
>
> Second, I need to access query profiles from a server that is behind a
> firewall. Therefore, accessing the URL of that machine on port 8047 is a
> headache (since I have to submit a ticket to IT Support). My question is
> whether I can access the Query Profiles in any other way? Like from the
> sqlline or through log/profile files created while executing queries.
>
> Thank you and Kind Regards,
>
> --
> Nikos R. Katsipoulakis,
> Department of Computer Science
> University of Pittsburgh
>

--
Nikos R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh

Re: Additional information on JSON_SUB_SCAN operator and access to query profiles not from the Web UI

Posted by "Nikos R. Katsipoulakis" <ni...@gmail.com>.

Hello again,

Thank you Parth for your suggestions! I will try to follow your
instructions and do something like you suggested on the server.

In addition, I noticed something odd: When I execute a query on Drill's
console (terminal) I get an execution time (let's say) X. When I get the
execution profile from the profiler on the Web Console, I see that an
execution time Y is reported, which is always less than X. From what I
understand, the profiler does not include in its timer some additional
operations, which are included on the time reported on the Drill Console.
Why does the previous happen? Is there any chance that in the execution
times reported in Drill's console are included additional startup costs for
a query (like query parsing, evaluation, optimization etc.)? If yes, can I
get an exact breakdown of the time spent for a query?

Thank you,
Nikos

On Thu, Jan 19, 2017 at 5:48 PM, Parth Chandra <pc...@mapr.com> wrote:

> JSON_SUB_SAN is the Json reader. It uses Jackson to do actual parsing, and
> converts the data into Drill's internal value vector format. TEXT_SUB_SCAN
> is the corresponding operator for csv.
>
> If the Drill system has access to the /log/profile directory then you can,
> in fact, use Drill to query the json in the query profile. You might want
> to setup an nfs location for the query profiles,so that the directory is
> visible to all drillbits.  The simply create a new workspace pointing to
> the directory. You will be able to read the profiles like any other Json
> file.
>
> ________________________________
> From: Nikos R. Katsipoulakis <ni...@gmail.com>
> Sent: Wednesday, January 11, 2017 7:37:30 AM
> To: user@drill.apache.org
> Subject: Additional information on JSON_SUB_SCAN operator and access to
> query profiles not from the Web UI
>
> Hello all,
>
> I am a new user of Apache Drill and I am in the process of better
> understanding its internals. To that end, I have two questions, for which I
> was unable to find more information online.
>
> First, when I execute an EXPLAIN command for a query that gets its data
> from JSON files, I see a physical operator named JSON_SUB_SCAN. What does
> that operator exactly do? Is it only used for parsing (extracting) fields
> from JSON data? Or does it perform additional processing? As far as I know,
> Drill uses Jackson Streaming API for extracting JSON data. Is that still
> true? Finally, what is the equivalent operator for CSV files?
>
> Second, I need to access query profiles from a server that is behind a
> firewall. Therefore, accessing the URL of that machine on port 8047 is a
> headache (since I have to submit a ticket to IT Support). My question is
> whether I can access the Query Profiles in any other way? Like from the
> sqlline or through log/profile files created while executing queries.
>
> Thank you and Kind Regards,
>
> --
> Nikos R. Katsipoulakis,
> Department of Computer Science
> University of Pittsburgh
>

-- 
Nikos R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh

Re: Additional information on JSON_SUB_SCAN operator and access to query profiles not from the Web UI

Posted by Parth Chandra <pc...@mapr.com>.

JSON_SUB_SAN is the Json reader. It uses Jackson to do actual parsing, and converts the data into Drill's internal value vector format. TEXT_SUB_SCAN is the corresponding operator for csv.

If the Drill system has access to the /log/profile directory then you can, in fact, use Drill to query the json in the query profile. You might want to setup an nfs location for the query profiles,so that the directory is visible to all drillbits.  The simply create a new workspace pointing to the directory. You will be able to read the profiles like any other Json file.

________________________________
From: Nikos R. Katsipoulakis <ni...@gmail.com>
Sent: Wednesday, January 11, 2017 7:37:30 AM
To: user@drill.apache.org
Subject: Additional information on JSON_SUB_SCAN operator and access to query profiles not from the Web UI

Hello all,

I am a new user of Apache Drill and I am in the process of better
understanding its internals. To that end, I have two questions, for which I
was unable to find more information online.

First, when I execute an EXPLAIN command for a query that gets its data
from JSON files, I see a physical operator named JSON_SUB_SCAN. What does
that operator exactly do? Is it only used for parsing (extracting) fields
from JSON data? Or does it perform additional processing? As far as I know,
Drill uses Jackson Streaming API for extracting JSON data. Is that still
true? Finally, what is the equivalent operator for CSV files?

Second, I need to access query profiles from a server that is behind a
firewall. Therefore, accessing the URL of that machine on port 8047 is a
headache (since I have to submit a ticket to IT Support). My question is
whether I can access the Query Profiles in any other way? Like from the
sqlline or through log/profile files created while executing queries.

Thank you and Kind Regards,

--
Nikos R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh