Posted to user@drill.apache.org by Paolo Spanevello <pa...@gmail.com> on 2016/01/10 13:23:11 UTC

JSON File, Total numbers Record: 1

Hi all,

I'm trying to query the file that you can find attached, with Apache Drill
1.4. The result of this query is always 1 record.

The query I'm running is:

SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t

If I run a similar query with the file donuts.json found at
https://drill.apache.org/docs/sample-data-donuts/ the query runs properly.

SELECT t.topping FROM donuts.json t

Thanks in advance.

Paolo

Re: JSON File, Total numbers Record: 1

Posted by Paolo Spanevello <pa...@gmail.com>.
Hi Jason,

You are right, it probably makes sense to introduce such a concept. We
would then no longer be tied to the dataset.

Could I support this, or submit a feature request somewhere online?

Best,
Paolo

2016-01-14 17:36 GMT+01:00 Jason Altekruse <al...@gmail.com>:

> You can use * to propagate all traits up. The only problem with this is
> that it will pull up a copy of the list you are flattening as well (copying
> the complete list next to each record in the flattened out version of the
> list). Databases have always frustratingly lacked a feature of 'select *
> (excluding the large field I don't need to read, that just strains the
> database because someone thought we should put binaries in the table)'.
> Considering the liberal types of data Drill can query, it probably makes
> sense to introduce such a concept.
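The copying behavior described above can be modeled in a few lines of plain Python (a toy sketch of the semantics, not Drill's implementation; the field names are simplified):

```python
# Toy model of FLATTEN with "select *": one output row per list element,
# and every other field -- including the original, complete list -- is
# (shallow-)copied into each of those rows.
records = [{"version": 115, "rides": [{"crc": 1}, {"crc": 2}]}]

def flatten_star(rows, field):
    out = []
    for row in rows:
        for element in row[field]:
            new_row = dict(row)              # shallow copy of ALL fields, list included
            new_row["flat_" + field] = element
            out.append(new_row)
    return out

flat = flatten_star(records, "rides")
# Two rows; each still carries a full copy of the original "rides" list.
```

This is exactly the duplication Jason warns about: the flattened column sits next to an untouched copy of the list it came from.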
>
> I'm going to create a few views to make this easier to process. Note that
> to create a view you need to be in a writable workspace. By default the
> dfs.tmp workspace is the only writable one; see this page for configuring
> a more permanent workspace to store your views in:
> https://drill.apache.org/docs/workspaces/
>
> Here is a simple view that can copy up all of the fields, although it will
> contain extra copies of the first list level that was flattened.
>
> create or replace view ride_db_view as select
> flatten(t.flat_rides.INTERVALS) as flat_intervals, t.* from (select
> version,flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t;
>
> Looking at the data is kind of hard with the size of the records, but I
> tried to write a few queries to illustrate this point.
>
> Here is a simple select out of the view above, selecting out of the list
> of flattened records in the outermost query; it returns 80 records:
> select t.flat_intervals as first_int from ride_db_view as t;
>
> You can also grab the first element out of the un-flattened version of the
> same list from the view. This, however, is confusing because it still
> returns 80 records: flat_rides has now been copied next to each flattened
> interval, so a bunch of these are repeats:
> select t.flat_rides.intervals[0] as first_int from ride_db_view as t;
>
> Here you can filter the results to find where the first element in the
> inner list matches the outermost flattened one; this gets you back to the
> size of the root-level list of rides. Drill currently doesn't implement
> map equality, so I used the combination of the start and stop times to
> compare the maps:
> select * from (select t.flat_rides.intervals[0] as first_int,
> t.flat_intervals from ride_db_view as t) as t2 where t2.first_int.`start` =
> t2.flat_intervals.`start` AND t2.first_int.stop = t2.flat_intervals.stop;
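The field-by-field comparison stands in for map equality; in plain Python terms (toy values, not the actual ride data), the WHERE clause does something like this:

```python
# Toy illustration of the map-equality workaround: whole-map comparison
# isn't available in the query, so a couple of identifying scalar fields
# are compared instead.
first_int = {"name": "interval A", "start": "0", "stop": "9920"}
flat_interval = {"name": "interval A", "start": "0", "stop": "9920"}

# Stand-in for: WHERE t2.first_int.`start` = t2.flat_intervals.`start`
#                 AND t2.first_int.stop    = t2.flat_intervals.stop
matches = (first_int["start"] == flat_interval["start"]
           and first_int["stop"] == flat_interval["stop"])
```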
>
> Thankfully for you, your dataset is well structured to avoid most of this
> headache, as a bunch of your fields are nested into maps, and Drill allows
> you to select the whole map by name to move it up to the next outer query.
>
> Here is a little bit longer query that selects all of the data without
> keeping around extra copies of the lists; you only have to enumerate the
> few fields at the top of each "RIDE" record:
> create or replace view ride_db_view as select
> flatten(t.flat_rides.INTERVALS) as flat_intervals, t.flat_rides.METRICS,
> t.flat_rides.TAGS,t.flat_rides.`filename`, t.flat_rides.`date`,
> t.flat_rides.`fingerprint`, t.flat_rides.`crc`, t.flat_rides.`metacrc`,
> t.flat_rides.`timestamp`, t.flat_rides.`dbversion`,
> t.flat_rides.`color`,t.flat_rides.`present`,t.flat_rides.`isRun`,t.flat_rides.`isSwim`,t.flat_rides.`weight`,t.flat_rides.`samples`
> from (select version,flatten(rides) as flat_rides from
> dfs.tmp.`rideDB.json`) as t;
>
> Here you can see that flat_rides is no longer available:
> >> select t.flat_rides.intervals[0] as first_int from ride_db_view as t;
> Jan 14, 2016 8:22:42 AM
> org.apache.calcite.sql.validate.SqlValidatorException <init>
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Column
> 'flat_rides' not found in table 't'
>
> Here is what the output data looks like, all of the long list of columns
> are nested under metrics, tags and flat_intervals.
>
> 0: jdbc:drill:zk=local> select * from ride_db_view limit 1;
>
> +----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
> | flat_intervals | metrics | tags | filename | date_col | fingerprint |
> crc | metacrc | timestamp_col | dbversion | color | present | isRun |
> isSwim | weight | samples |
>
> +----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
> | {"name":"Attivita' intera
> ","start":"0","stop":"9920","startKM":"0","stopKM":"85.9888","type":"2","color":"#000080","seq":"0","METRICS":{"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909","skiba_wprime_matches":"13.00000","skiba_wprime_maxmatch":"8.85200","skiba_wprime_tau":"490.00000","skiba_wprime_exp":"155.77200","skiba_wprime_watts":"15.70124","wtime_in_zone_L1":"4017.00000","wtime_in_zone_L2":"2762.00000","wtime_in_zone_L3":"2584.00000","wtime_in_zone_L4":"558.00000","skiba_cp_exp":"1627.87600","1s_peak_wpk":"12.33333","5s_peak_wpk":"9.75467","10s_peak_wpk":"7.67333","15s_peak_wpk":"6.22844","20s_peak_wpk":"6.06733","30s_peak_wpk":"5.69556","1m_peak_wpk":"4.68489","5m_peak_wpk":"3.93831","10m_peak_wpk":"3.74687","20m_peak_wpk":"3.65183","30m_peak_wpk":"3.30927","60m_peak_wpk":"2.91914","vo2max":"49.53376","average_wpk":"2.39713","trimp_points":"161.96102","trimp_100_points":"110.74976","trimp_zonal_points":"201.52667","time_in_zone_L1":"2693.00000","time_in_zone_L2":"1519.00000","time_in_zone_L3":"2306.00000","time_in_zone_L4":"1253.00000","time_in_zone_L5":"1025.00000","time_in_zone_L6":"855.00000","time_in_zone_L7":"270.00000","percent_in_zone_L1":"27.14444","percent_in_zone_L2":"15.31096","percent_in_zone_L3":"23.24362","percent_in_zone_L4":"12.62978","percent_in_zone_L5":"10.33162","percent_in_zone_L6":"8.61808","percent_in_zone_L7":"2.72150","triscore":"206.25656","peak_percent":"70.79363","power_zone":"2.85570","power_fatigue_index":"99.78378","power_pacing_index":"19.43623","1s_critical_power":"925.00000","5s_critical_power":"731.60000","10s_critical_power":"575.50000","15s_critical_power":"467.13333","20s_critical_power":"455.05000","30s_critical_power":"427.16667","1m_critical_power":"351.36667","2m_critical_power":"316.74167","3m_critical_power":"308.15556","5m_critical_power":"295.37333","8m_critical_power":"287.22500","10m_critical_power":"281.01500","20m_critical_power":"273.88750","30m_critical_power":"248.19556","60m_critical_power":"218.93528","90m_critical_pow
er":"207.03278","1m_critical_power_hr":"138.05000","5m_critical_power_hr":"153.56333","10m_critical_power_hr":"154.99167","20m_critical_power_hr":"154.51333","30m_critical_power_hr":"148.59833","60m_critical_power_hr":"141.42861","time_in_zone_H1":"1899.00000","time_in_zone_H2":"2719.00000","time_in_zone_H3":"4018.00000","time_in_zone_H4":"1285.00000","percent_in_zone_H1":"19.14122","percent_in_zone_H2":"27.40651","percent_in_zone_H3":"40.49995","percent_in_zone_H4":"12.95232","daniels_points":"154.36992","daniels_equivalent_power":"216.28054","a_coggan_np":"226.44079","a_coggan_if":"0.90576","a_coggan_tss":"226.09047","a_coggam_variability_index":"1.25951","a_friel_efficiency_factor":"1.73689","a_coggan_tssperhour":"82.04069","coggan_np":"222.89726","coggan_if":"0.89159","coggan_tss":"219.06974","coggam_variability_index":"1.23980","friel_efficiency_factor":"1.70971","coggan_tssperhour":"79.49310","a_skiba_xpower":"219.75992","a_skiba_relative_intensity":"0.87904","a_skiba_bike_score":"212.94620","a_skiba_variability_index":"1.22235","a_skiba_response_index":"1.68565","atiss_score":"216.58554","antiss_score":"8.66888","cp_setting":"250.00000","skiba_xpower":"216.28053","skiba_relative_intensity":"0.86512","skiba_bike_score":"206.25656","skiba_variability_index":"1.20299","tiss_delta":"96.15152","skiba_response_index":"1.65896","ride_count":"1.00000","workout_time":"9921.00000","time_riding":"9921.00000","time_carrying":"21.00000","total_distance":"85.99094","distance_swim":"85990.93889","climb_rating":"5.34573","athlete_weight":"75.00000","elevation_gain":"678.00000","elevation_loss":"678.00000","total_work":"1783.64800","average_speed":"31.20324","pace":"1.92288","pace_swim":"0.19229","average_power":"179.78510","average_apower":"182.33790","nonzero_power":"221.73645","average_hr":"130.37121","average_ct":"38.07369","heartbeats":"21552.53333","hrpw":"1.37902","wb":"384.42133","ap_percent_max":"17.97851","hrnp":"1.70971","average_cad":"80.57447","average_temp":"12.
14676","max_power":"925.00000","max_heartrate":"165.00000","max_ct":"38.63611","max_speed":"70.70000","max_cadence":"117.00000","max_temp":"18.00000","ninety_five_percent_hr":"158.00000","vam":"246.02359","eoa":"1.40003","gradient":"0.78846","meanpowervariance":"61.43749","maxpowervariance":"843.00000","total_kcalories":"1773.95946","aerobic_decoupling":"19.19399"}}
> |
> {"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909","skiba_wprime_matches":"13.00000","skiba_wprime_maxmatch":"8.85200","skiba_wprime_tau":"490.00000","skiba_wprime_exp":"155.77200","skiba_wprime_watts":"15.70124","wtime_in_zone_L1":"4017.00000","wtime_in_zone_L2":"2762.00000","wtime_in_zone_L3":"2584.00000","wtime_in_zone_L4":"558.00000","skiba_cp_exp":"1627.87600","1s_peak_wpk":"12.33333","5s_peak_wpk":"9.75467","10s_peak_wpk":"7.67333","15s_peak_wpk":"6.22844","20s_peak_wpk":"6.06733","30s_peak_wpk":"5.69556","1m_peak_wpk":"4.68489","5m_peak_wpk":"3.93831","10m_peak_wpk":"3.74687","20m_peak_wpk":"3.65183","30m_peak_wpk":"3.30927","60m_peak_wpk":"2.91914","vo2max":"49.53376","average_wpk":"2.39713","trimp_points":"161.96102","trimp_100_points":"110.74976","trimp_zonal_points":"201.52667","time_in_zone_L1":"2693.00000","time_in_zone_L2":"1519.00000","time_in_zone_L3":"2306.00000","time_in_zone_L4":"1253.00000","time_in_zone_L5":"1025.00000","time_in_zone_L6":"855.00000","time_in_zone_L7":"270.00000","percent_in_zone_L1":"27.14444","percent_in_zone_L2":"15.31096","percent_in_zone_L3":"23.24362","percent_in_zone_L4":"12.62978","percent_in_zone_L5":"10.33162","percent_in_zone_L6":"8.61808","percent_in_zone_L7":"2.72150","triscore":"206.25656","l3_sustain":"1298.00000","l4_sustain":"440.00000","l5_sustain":"1782.00000","l6_sustain":"298.00000","l7_sustain":"123.00000","peak_percent":"70.79363","power_zone":"2.85570","power_fatigue_index":"99.78378","power_pacing_index":"19.43623","1s_critical_power":"925.00000","5s_critical_power":"731.60000","10s_critical_power":"575.50000","15s_critical_power":"467.13333","20s_critical_power":"455.05000","30s_critical_power":"427.16667","1m_critical_power":"351.36667","2m_critical_power":"316.74167","3m_critical_power":"308.15556","5m_critical_power":"295.37333","8m_critical_power":"287.22500","10m_critical_power":"281.01500","20m_critical_power":"273.88750","30m_critical_power":"248.19556","60m_critical_power":"218.93528",
"90m_critical_power":"207.03278","1m_critical_power_hr":"138.05000","5m_critical_power_hr":"153.56333","10m_critical_power_hr":"154.99167","20m_critical_power_hr":"154.51333","30m_critical_power_hr":"148.59833","60m_critical_power_hr":"141.42861","time_in_zone_H1":"1899.00000","time_in_zone_H2":"2719.00000","time_in_zone_H3":"4018.00000","time_in_zone_H4":"1285.00000","percent_in_zone_H1":"19.14122","percent_in_zone_H2":"27.40651","percent_in_zone_H3":"40.49995","percent_in_zone_H4":"12.95232","daniels_points":"154.36992","daniels_equivalent_power":"216.28054","a_coggan_np":"226.44079","a_coggan_if":"0.90576","a_coggan_tss":"226.09047","a_coggam_variability_index":"1.25951","a_friel_efficiency_factor":"1.73689","a_coggan_tssperhour":"82.04069","coggan_np":"222.89726","coggan_if":"0.89159","coggan_tss":"219.06974","coggam_variability_index":"1.23980","friel_efficiency_factor":"1.70971","coggan_tssperhour":"79.49310","a_skiba_xpower":"219.75992","a_skiba_relative_intensity":"0.87904","a_skiba_bike_score":"212.94620","a_skiba_variability_index":"1.22235","a_skiba_response_index":"1.68565","atiss_score":"216.58554","antiss_score":"8.66888","cp_setting":"250.00000","skiba_xpower":"216.28053","skiba_relative_intensity":"0.86512","skiba_bike_score":"206.25656","skiba_variability_index":"1.20299","tiss_delta":"96.15152","skiba_response_index":"1.65896","ride_count":"1.00000","workout_time":"9921.00000","time_riding":"9921.00000","time_carrying":"21.00000","total_distance":"85.99094","distance_swim":"85990.93889","climb_rating":"5.34573","athlete_weight":"75.00000","elevation_gain":"678.00000","elevation_loss":"678.00000","total_work":"1783.64800","average_speed":"31.20324","pace":"1.92288","pace_swim":"0.19229","average_power":"179.78510","average_apower":"182.33790","nonzero_power":"221.73645","average_hr":"130.37121","average_ct":"38.07369","heartbeats":"21552.53333","hrpw":"1.37902","wb":"384.42133","ap_percent_max":"17.97851","hrnp":"1.70971","average_cad":"80.57447","a
verage_temp":"12.14676","max_power":"925.00000","max_heartrate":"165.00000","max_ct":"38.63611","max_speed":"70.70000","max_cadence":"117.00000","max_temp":"18.00000","ninety_five_percent_hr":"158.00000","vam":"246.02359","eoa":"1.40003","gradient":"0.78846","meanpowervariance":"61.43749","maxpowervariance":"843.00000","total_kcalories":"1773.95946","aerobic_decoupling":"19.19399"}
> | {"Athlete":"prova ","Calendar Text":"Bike\n ","Data":"TDSPHC-A-L-E---
> ","Device":"SRM ","Device Info":" ","File Format":"
> ","Filename":"2014_12_02_11_10_23.json ","Month":"dicembre ","Notes":"
> ","Source Filename":"2014_12_02_11_10_23_2014_12_02_11_10_23.json
> ","Sport":"Bike ","Weekday":"mar ","Year":"2014 "} |
> 2014_12_02_11_10_23.json | 2014/12/02 10:10:23 UTC | 104091 | 0 | 65172 |
> 1451321862 | 115 | #010101 | TDSPHC-A-L-E--- | 0 | 0 | 75 | 1 |
>
> +----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
>
>
>
> On Thu, Jan 14, 2016 at 3:22 AM, Paolo Spanevello <pa...@gmail.com>
> wrote:
>
>> Ciao to All,
>>
>> thanks for your answer, it works!
>>
>> Above the query and the result
>>
>> select tt.Athlete, tt.ride_date, tt.skiba_wprime_low from
>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as
>> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low from
>> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as
>> t) as tt where tt.flat_intervals.name = 'Attivita'' intera '
>>
>>   prova 2014/12/02 10:10:23 UTC
>> 2.42418
>>   prova 2014/12/03 08:31:21 UTC 1.95597
>>   prova 2014/12/05 10:11:35 UTC 1.90420
>>   prova 2014/12/17 08:27:33 UTC 1.72623
>>
>>
>> Now I'm writing to all of you to ask whether there is a way to drill down
>> into all the attributes without specifying each single one; otherwise I
>> have to write 145 attributes into the query. To make it even harder to
>> maintain, the drill-down is inside the subquery, which means I have to
>> write the list of attributes twice in the query, so 145*2.
>>
>> The query could look like this:
>>
>> select tt.Athlete, tt.ride_date,
>> tt.skiba_wprime_low,tt.skiba_wprime_max,.....**145 attributes*,
>> tt.aerobic_decoupling from
>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as
>> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low,
>> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_max as skiba_wprime_max,.....**145
>> attributes*, t.flat_Rides.INTERVALS.METRICS.aerobic_decoupling as
>> aerobic_decoupling from
>> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as
>> t) as tt where tt.flat_intervals.name = 'Attivita'' intera '
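A hypothetical way to avoid typing the 145 aliases by hand (my own sketch, not something suggested in the thread) is to generate the select list programmatically from the METRICS keys of a sample record:

```python
# Hypothetical helper: build the long alias list from a sample record's
# METRICS keys instead of writing each alias out twice by hand.
# (The sample structure and key names are abbreviated stand-ins.)
sample = {"rides": [{"INTERVALS": [{"METRICS": {
    "skiba_wprime_low": "2.42418",
    "skiba_wprime_max": "87.87909",
    "aerobic_decoupling": "19.19399"}}]}]}

metrics = sample["rides"][0]["INTERVALS"][0]["METRICS"]
select_list = ",\n".join(
    "t.flat_rides.INTERVALS.METRICS.`{0}` as {0}".format(k) for k in metrics)
# select_list can then be spliced into the subquery text.
```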
>>
>>
>> Thanks again for your support.
>>
>> Best regards,
>> Paolo
>>
>>
>> 2016-01-14 0:28 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>>
>>> The issue you have now is that you are passing an array into KVGEN, which
>>> only works on maps. You can just flatten it again instead.
>>>
>>> Looking more closely at your dataset, I don't think you have any maps
>>> that would require KVGEN to analyze them; they all have keys that appear
>>> in all records, and KVGEN is just for sparse maps. An example would be
>>> if you put user ID as the key, in which case you wouldn't want to create
>>> thousands of
>>> different SQL statements (using each user_id as a column name) to analyze
>>> different users.
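A rough model of KVGEN's output shape (my simplification of the behavior documented at https://drill.apache.org/docs/kvgen/): a map with data-dependent keys becomes a list of key/value records, which FLATTEN can then expand into rows.

```python
# Simplified model of KVGEN: the map's keys become data instead of schema,
# so "unknown" keys like user IDs can be queried as ordinary column values.
def kvgen(mapping):
    return [{"key": k, "value": v} for k, v in mapping.items()]

sparse_map = {"user_17": 3, "user_42": 9}
pairs = kvgen(sparse_map)
# pairs is a list of {"key": ..., "value": ...} records, one per map entry.
```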
>>>
>>> You should just use the dot notation to access nested members of one of
>>> the
>>> maps after you've flattened down to the correct level.
>>>
>>> Example (note the backticks around `start` as it is a SQL reserved word):
>>> select t2.flat_intervals.name, t2.flat_intervals.`start`,
>>> t2.flat_intervals.stop from (select flatten(t.flat_rides.INTERVALS) as
>>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low from (select
>>> flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t) as t2;
>>>
>>> I also noticed that Metrics appears at two different levels, so here are
>>> two examples for reaching both levels:
>>>
>>> select t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides)
>>> as
>>> flat_rides from dfs.tmp.`rideDB.json`) as t
>>> +------------+
>>> |   EXPR$0   |
>>> +------------+
>>> | 2.42418    |
>>> | -13.54943  |
>>> | 0.83858    |
>>> | null       |
>>> +------------+
>>>
>>> This query also shows how you can include a field next to the array when
>>> flattening and bring it up to the outer query if you still need this
>>> data associated with the flattened data (note `date` is also a reserved
>>> word).
>>>
>>> select t2.flat_intervals.METRICS.skiba_wprime_low as skiba_wprime_low,
>>> t2.ride_date from (select flatten(t.flat_rides.INTERVALS) as
>>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low,
>>> t.flat_rides.`date`
>>> as ride_date from (select flatten(rides) as flat_rides from
>>> dfs.tmp.`rideDB.json`) as t) as t2;
>>>
>>> +-------------------+--------------------------+
>>> | skiba_wprime_low  |        ride_date         |
>>> +-------------------+--------------------------+
>>> | 2.42418           | 2014/12/02 10:10:23 UTC  |
>>> | 1.95597           | 2014/12/02 10:10:23 UTC  |
>>> | 1.90420           | 2014/12/02 10:10:23 UTC  |
>>> | 1.72623           | 2014/12/02 10:10:23 UTC  |
>>> | 1.04275           | 2014/12/02 10:10:23 UTC  |
>>> | 1.55861           | 2014/12/02 10:10:23 UTC  |
>>> | 17.73633          | 2014/12/02 10:10:23 UTC  |
>>> | 17.31987          | 2014/12/02 10:10:23 UTC  |
>>> | 18.47823          | 2014/12/02 10:10:23 UTC  |
>>> ... data clipped
>>>
>>>
>>> On Wed, Jan 13, 2016 at 12:26 PM, Paolo Spanevello <paolospane@gmail.com
>>> >
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > thanks to all for your support.
>>> >
>>> > I'm trying, as you suggested, to apply the kvgen function. Below you
>>> > can find the query and the error.
>>> >
>>> > *select tt.flat_intervals from (select
>>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as
>>> tt*
>>> >
>>> > I have this error message.
>>> >
>>> > Best regards,
>>> > Paolo
>>> >
>>> > ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>>> > select tt.flat_intervals from (select
>>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as
>>> tt
>>> > [30027]Query execution error. Details:[
>>> > SYSTEM ERROR: DrillRuntimeException: kvgen function only supports
>>> Simple
>>> > maps as input
>>> >
>>> > Fragment 0:0
>>> >
>>> > [Error Id: 85f8e8ba-fb87-428a-ac2e-dea78498c222 on 10.1.0.74:31010]
>>> > ]
>>> >
>>> > 2016-01-13 20:28 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>>> >
>>> >> Wasn't thinking about that; I was seeing the images because I was
>>> >> included directly in the recipient list.
>>> >>
>>> >> +1 on text for searchability.
>>> >>
>>> >> Can you please rerun the queries and post the errors here as text?
>>> >>
>>> >> On Wed, Jan 13, 2016 at 11:20 AM, Christopher Matta <cm...@mapr.com>
>>> >> wrote:
>>> >>
>>> >> > The Apache mailing lists don't support image attachments; it would
>>> >> > be better to attach text anyway since it's searchable.
>>> >> >
>>> >> > Chris Matta
>>> >> > cmatta@mapr.com
>>> >> > 215-701-3146
>>> >> >
>>> >> > On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <
>>> paolospane@gmail.com
>>> >> >
>>> >> > wrote:
>>> >> >
>>> >> >> Hi,
>>> >> >>
>>> >> >> thanks again Jason for your support.
>>> >> >>
>>> >> >> I'm trying, as you suggested, to apply the kvgen function. Below
>>> >> >> you can find the query and the error.
>>> >> >>
>>> >> >> *select tt.flat_intervals from (select
>>> >> >> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from
>>> (select
>>> >> >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t )
>>> as
>>> >> tt*
>>> >> >>
>>> >> >> [image: Immagine incorporata 1]
>>> >> >>
>>> >> >>
>>> >> >> Could you please suggest how to fix this?
>>> >> >>
>>> >> >> Best regards,
>>> >> >> Paolo
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <
>>> altekrusejason@gmail.com>:
>>> >> >>
>>> >> >>> This is a poor error message that is produced when you try to
>>> >> >>> flatten a field that is not an array; for these fields you can
>>> >> >>> just use the dot notation to access their inner members (i.e.
>>> >> >>> flattened_array_of_maps.member_field_in_map). If you have a field
>>> >> >>> where the keys in a map are "unknown" or you want to do analysis
>>> >> >>> on the keys, please refer to the KVGEN docs [1]. I have assigned
>>> >> >>> the JIRA that reported this issue a while ago to myself and will
>>> >> >>> work to improve the message [2].
>>> >> >>>
>>> >> >>>
>>> >> >>> [1] - https://drill.apache.org/docs/kvgen/
>>> >> >>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>>> >> >>>
>>> >> >>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <
>>> >> paolospane@gmail.com>
>>> >> >>> wrote:
>>> >> >>>
>>> >> >>>> Hi  All,
>>> >> >>>>
>>> >> >>>> Jason,I used your suggests and it works, thanks a lot!
>>> >> >>>>
>>> >> >>>> As you wrote, I used a subquery to get the whole list of
>>> >> >>>> INTERVALS, as I show below.
>>> >> >>>>
>>> >> >>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from
>>> (select
>>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>>> >> >>>>
>>> >> >>>> [image: Immagine incorporata 1]
>>> >> >>>>
>>> >> >>>> The attribute "flat_intervals" is full of data that I would like
>>> >> >>>> to separate into several attributes, as shown at this link:
>>> >> >>>> https://drill.apache.org/docs/flatten/
>>> >> >>>> I used a new subquery to have them:
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>>> >> from
>>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from
>>> (select
>>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as
>>> t) as
>>> >> tt*
>>> >>
>>> >> >>>>
>>> >> >>>> but I have this error:
>>> >> >>>>
>>> >> >>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the
>>> query:
>>> >> >>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>>> from
>>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from
>>> (select
>>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as
>>> t) as
>>> >> tt
>>> >> >>>> [30027]Query execution error. Details:[
>>> >> >>>> SYSTEM ERROR: ClassCastException: Cannot cast
>>> >> >>>> org.apache.drill.exec.vector.complex.MapVector to
>>> >> >>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>>> >> >>>>
>>> >> >>>> Fragment 0:0
>>> >> >>>>
>>> >> >>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on
>>> >> 192.168.1.101:31010]
>>> >> >>>> ]
>>> >> >>>>
>>> >> >>>> [image: Immagine incorporata 2]
>>> >> >>>>
>>> >> >>>> Thanks in advance!
>>> >> >>>>
>>> >> >>>> Paolo
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <
>>> altekrusejason@gmail.com
>>> >> >:
>>> >> >>>>
>>> >> >>>>> Paolo,
>>> >> >>>>>
>>> >> >>>>> Drill currently reads single JSON objects as single records. If
>>> >> >>>>> you look at the top of your file you can see that the root of
>>> >> >>>>> your document is a single JSON object.
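A toy model (plain Python, not Drill's actual reader) of why a single root object yields a single record, no matter how large the arrays inside it are:

```python
import json

# If the root of the file is one object, the reader sees one record;
# the three entries inside "rides" stay nested within it.
doc = '{"version": 115, "rides": [{"crc": 1}, {"crc": 2}, {"crc": 3}]}'
root = json.loads(doc)
records = root if isinstance(root, list) else [root]
```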
>>> >> >>>>>
>>> >> >>>>> Drill accepts two formats for individual records:
>>> >> >>>>>
>>> >> >>>>> The Mongo import format: a series of JSON objects, one after
>>> >> >>>>> the other in a file. Whitespace is irrelevant; each one need
>>> >> >>>>> not be followed by a newline.
>>> >> >>>>>
>>> >> >>>>> {"a" : 1, "b" : "hello" }
>>> >> >>>>> {"a" : 5, "b" : "guten tag" }
>>> >> >>>>>
>>> >> >>>>> A JSON array of objects
>>> >> >>>>>
>>> >> >>>>> [
>>> >> >>>>>     {"a" : 1, "b" : "hello" },
>>> >> >>>>>     {"a" : 2, "b" : "guten tag"}
>>> >> >>>>> ]
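For comparison, here is a sketch (plain Python, my own illustration) of parsing both accepted layouts; note that Drill's reader is stream-based and does not actually require newlines between concatenated objects, one object per line is used here only to keep the sketch short.

```python
import json

# 1) "Mongo import" style: one JSON object after another.
mongo_style = '{"a": 1, "b": "hello"}\n{"a": 5, "b": "guten tag"}'
from_lines = [json.loads(line) for line in mongo_style.splitlines()]

# 2) A single JSON array of objects.
array_style = '[{"a": 1, "b": "hello"}, {"a": 5, "b": "guten tag"}]'
from_array = json.loads(array_style)

# Both layouts yield the same two records.
```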
>>> >> >>>>>
>>> >> >>>>> When you have a file like this you can read it using the FLATTEN
>>> >> >>>>> functionality of Drill to turn an array into a series of
>>> >> >>>>> records:
>>> >> >>>>> https://drill.apache.org/docs/flatten/
>>> >> >>>>>
>>> >> >>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>>> >> >>>>>
>>> >> >>>>> To work with the data further, you can put the flatten call in a
>>> >> >>>>> subquery.
>>> >> >>>>> Here is how you can select the first element from each record's
>>> >> >>>>> list of INTERVALS and select one of the nested fields inside of
>>> >> >>>>> METRICS once the data has been flattened.
>>> >> >>>>> To analyze the array, you could flatten again to get an exploded
>>> >> >>>>> dataset
>>> >> >>>>> with one record per interval across all records
>>> >> >>>>>
>>> >> >>>>> select t.flat_rides.INTERVALS[0],
>>> >> t.flat_rides.METRICS.skiba_wprime_low
>>> >> >>>>> from (select flatten(rides) as flat_rides from
>>> >> dfs.tmp.`rideDB.json`)
>>> >> >>>>> as t;
>>> >> >>>>>
>>> >> >>>>> Here you can see that individual columns can be selected next
>>> >> >>>>> to the flatten call; this will copy the data into each new
>>> >> >>>>> record:
>>> >> >>>>>
>>> >> >>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>>> >> >>>>> t.flat_rides.METRICS.skiba_wprime_low from (select
>>> flatten(rides) as
>>> >> >>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>>> >> >>>>>
>>> >> >>>>> Happy Drilling!
>>> >> >>>>>
>>> >> >>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <
>>> >> >>>>> paolospane@gmail.com>
>>> >> >>>>> wrote:
>>> >> >>>>>
>>> >> >>>>> > Hi all,
>>> >> >>>>> >
>>> >> >>>>> > I'm trying to query the file that you can find attached, with
>>> >> >>>>> > Apache Drill 1.4. The result of this query is always 1 record.
>>> >> >>>>> >
>>> >> >>>>> > The query I'm running is:
>>> >> >>>>> >
>>> >> >>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>>> >> >>>>> >
>>> >> >>>>> > If I run a similar query with the file donuts.json found at
>>> >> >>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query
>>> >> >>>>> > runs properly.
>>> >> >>>>> >
>>> >> >>>>> > SELECT t.topping FROM donuts.json t
>>> >> >>>>> >
>>> >> >>>>> > Thanks in advance.
>>> >> >>>>> >
>>> >> >>>>> > Paolo
>>> >> >>>>> >
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
>

Re: JSON File, Total numbers Record: 1

Posted by Jason Altekruse <al...@gmail.com>.
You can use * to propagate all traits up. The only problem with this is
that it will pull up a copy of the list you are flattening as well (copying
the complete list next to each record in the flattened out version of the
list). Databases have always frustratingly lacked a feature of 'select *
(excluding the large field I don't need to read, that just strains the
database because someone thought we should put binaries in the table)'.
Considering the liberal types of data Drill can query, it probably makes
sense to introduce such a concept.

I'm going to create a few views to make this easier to process. Note to
create view you need to be in a writable workspace. By default the dfs.tmp
workspace is the only writable one, see this page for configuring a more
permanent workspace to store your views in
https://drill.apache.org/docs/workspaces/

Here is a simple view that can copy up all of the fields, although it will
contain extra copies of the first list level that was flattened.

create or replace view ride_db_view as select
flatten(t.flat_rides.INTERVALS) as flat_intervals, t.* from (select
version,flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t;

Looking at the data is kind of hard with the size of the records, but I
tried to write a few queries to illustrate this point.

Here is a simple select out of the view above, selecting out of the list of
flattened records in the outermost query, it returns 80 records:
select t.flat_intervals as first_int from ride_db_view as t;

You can also grab the first element out of the un_flattened version of the
same list out of the view, this however is confusing because it still
returns 80 records, because flat_rides now has been copied next to each
flattened interval, so a bunch of these are repeats:
select t.flat_rides.intervals[0] as first_int from ride_db_view as t;

Here you can filter the results to find where the first element in the
inner list matches outermost flattened one, this gets you back to the size
of the root level list of rides, Drill currently doesn't implement map
equality so I used the combination of the start and stop time to compare
the maps:
select * from (select t.flat_rides.intervals[0] as first_int,
t.flat_intervals from ride_db_view as t) as t2 where t2.first_int.`start` =
t2.flat_intervals.`start` AND t2.first_int.stop = t2.flat_intervals.stop;

Thankfully for you, your dataset is well structured to avoid most of this
headache, as a bunch of your fields are nested into maps, and Drill allows
you to select the whole map by name to move it up to the next outer query.

Here is a little bit longer query that selects all of the data, without
keeping around extra copies of the lists, you only have to enumerate the
few fields at the top of each "RIDE" record.
create or replace view ride_db_view as select
flatten(t.flat_rides.INTERVALS) as flat_intervals, t.flat_rides.METRICS,
t.flat_rides.TAGS, t.flat_rides.`filename`, t.flat_rides.`date`,
t.flat_rides.`fingerprint`, t.flat_rides.`crc`, t.flat_rides.`metacrc`,
t.flat_rides.`timestamp`, t.flat_rides.`dbversion`, t.flat_rides.`color`,
t.flat_rides.`present`, t.flat_rides.`isRun`, t.flat_rides.`isSwim`,
t.flat_rides.`weight`, t.flat_rides.`samples`
from (select version, flatten(rides) as flat_rides from
dfs.tmp.`rideDB.json`) as t;

Here you can see that flat_rides is no longer available:
>> select t.flat_rides.intervals[0] as first_int from ride_db_view as t;
Jan 14, 2016 8:22:42 AM
org.apache.calcite.sql.validate.SqlValidatorException <init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Column
'flat_rides' not found in table 't'

Here is what the output data looks like; the long list of columns is
nested under metrics, tags and flat_intervals.

0: jdbc:drill:zk=local> select * from ride_db_view limit 1;
+----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
| flat_intervals | metrics | tags | filename | date_col | fingerprint | crc | metacrc | timestamp_col | dbversion | color | present | isRun | isSwim | weight | samples |
+----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
| {"name":"Attivita' intera ","start":"0","stop":"9920","startKM":"0","stopKM":"85.9888","type":"2","color":"#000080","seq":"0","METRICS":{"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909", ... data clipped ... "aerobic_decoupling":"19.19399"}} | {"skiba_wprime_low":"2.42418","skiba_wprime_max":"87.87909", ... data clipped ... "aerobic_decoupling":"19.19399"} | {"Athlete":"prova ","Calendar Text":"Bike\n ","Data":"TDSPHC-A-L-E--- ","Device":"SRM ", ... data clipped ... "Sport":"Bike ","Weekday":"mar ","Year":"2014 "} | 2014_12_02_11_10_23.json | 2014/12/02 10:10:23 UTC | 104091 | 0 | 65172 | 1451321862 | 115 | #010101 | TDSPHC-A-L-E--- | 0 | 0 | 75 | 1 |
+----------------+---------+------+----------+----------+-------------+-----+---------+---------------+-----------+-------+---------+-------+--------+--------+---------+
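
On the pain of enumerating a map with well over a hundred metric keys: the
SELECT list can be generated rather than typed by hand. Here is a hedged
Python sketch of that idea; the sample record and key layout below are
illustrative assumptions, not the exact rideDB.json schema:

```python
# Generate an aliased projection list for Drill from a sample record's keys.
# The sample dict is hypothetical; in practice you would load one record
# from your JSON file and point at the map you want to expand.
sample = {"METRICS": {"skiba_wprime_low": "2.42418",
                      "skiba_wprime_max": "87.87909",
                      "aerobic_decoupling": "19.19399"}}

# One projection per metric key. Backticks guard reserved words and names
# that start with a digit (e.g. 1s_peak_wpk).
cols = ",\n".join(
    f"t.flat_rides.METRICS.`{k}` as `{k}`" for k in sample["METRICS"]
)
query = (
    "select\n" + cols + "\n"
    "from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t"
)
print(query)
```

The same approach scales to 145 keys, and keeps the inner and outer column
lists of a nested query in sync from a single source of truth.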



On Thu, Jan 14, 2016 at 3:22 AM, Paolo Spanevello <pa...@gmail.com>
wrote:

> Ciao to All,
>
> thanks for your answer, it works!
>
> Below are the query and the result
>
> select tt.Athlete, tt.ride_date, tt.skiba_wprime_low from
> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as
> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals,
> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low from
> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as t)
> as tt where tt.flat_intervals.name = 'Attivita'' intera '
>
>  prova 2014/12/02 10:10:23 UTC  2.42418
>  prova 2014/12/03 08:31:21 UTC  1.95597
>  prova 2014/12/05 10:11:35 UTC  1.90420
>  prova 2014/12/17 08:27:33 UTC  1.72623
>
>
> Now I'm writing to all of you to ask whether there is a way to drill down
> into all the attributes without specifying each single attribute; otherwise I
> have to write 145 attributes into the query. To make it even harder to
> maintain, the drill-down is inside the subquery, which means I have to write
> the list of attributes twice in the query, so 145*2.
>
> The query would look like this:
>
> select tt.Athlete, tt.ride_date,
> tt.skiba_wprime_low,tt.skiba_wprime_max,.....**145 attributes*,
> tt.aerobic_decoupling from
> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.`date` as
> ride_date,flatten(t.flat_rides.INTERVALS) as flat_intervals,
> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_low as skiba_wprime_low,
> t.flat_Rides.INTERVALS.METRICS.skiba_wprime_max as skiba_wprime_max,.....**145
> attributes*, t.flat_Rides.INTERVALS.METRICS.aerobic_decoupling as
> aerobic_decoupling from
> (select flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider2`) as t)
> as tt where tt.flat_intervals.name = 'Attivita'' intera '
>
>
> Thanks again for your support.
>
> Best regards,
> Paolo
>
>
> 2016-01-14 0:28 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>
>> The issue you have now is that you are passing an array into KVGEN, which
>> only works on maps. You can just flatten it again instead.
>>
>> Looking more closely at your dataset, I don't think you have any maps that
>> would require KVGEN to analyze them; they all have keys that appear in all
>> records, and KVGEN is just for sparse maps. An example would be if you put
>> user ID as the key, in which case you wouldn't want to create thousands of
>> different SQL statements (using each user_id as a column name) to analyze
>> different users.
>>
>> You should just use the dot notation to access nested members of one of
>> the
>> maps after you've flattened down to the correct level.
>>
>> Example (note the backticks around `start` as it is a SQL reserved word):
>> select t2.flat_intervals.name, t2.flat_intervals.`start`,
>> t2.flat_intervals.stop from (select flatten(t.flat_rides.INTERVALS) as
>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low from (select
>> flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t) as t2;
>>
>> I also noticed that Metrics appears at two different levels, so here are
>> two examples for reaching both levels:
>>
>> select t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides)
>> as
>> flat_rides from dfs.tmp.`rideDB.json`) as t
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | 2.42418    |
>> | -13.54943  |
>> | 0.83858    |
>> | null       |
>> +------------+
>>
>> This query also shows how you can include a field next to the array when
>> flattening and bring it up to the outer query if you still need this data
>> associated with the flattened data (note `date` is also a reserved word).
>>
>> select t2.flat_intervals.METRICS.skiba_wprime_low as skiba_wprime_low,
>> t2.ride_date from (select flatten(t.flat_rides.INTERVALS) as
>> flat_intervals, t.flat_rides.METRICS.skiba_wprime_low, t.flat_rides.`date`
>> as ride_date from (select flatten(rides) as flat_rides from
>> dfs.tmp.`rideDB.json`) as t) as t2;
>>
>> +-------------------+--------------------------+
>> | skiba_wprime_low  |        ride_date         |
>> +-------------------+--------------------------+
>> | 2.42418           | 2014/12/02 10:10:23 UTC  |
>> | 1.95597           | 2014/12/02 10:10:23 UTC  |
>> | 1.90420           | 2014/12/02 10:10:23 UTC  |
>> | 1.72623           | 2014/12/02 10:10:23 UTC  |
>> | 1.04275           | 2014/12/02 10:10:23 UTC  |
>> | 1.55861           | 2014/12/02 10:10:23 UTC  |
>> | 17.73633          | 2014/12/02 10:10:23 UTC  |
>> | 17.31987          | 2014/12/02 10:10:23 UTC  |
>> | 18.47823          | 2014/12/02 10:10:23 UTC  |
>> ... data clipped
>>
>>
>> On Wed, Jan 13, 2016 at 12:26 PM, Paolo Spanevello <pa...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > thanks to all for your support.
>> >
>> > I'm trying, as you suggested, to apply the kvgen function. Below you can find
>> > the query and the error.
>> >
>> > *select tt.flat_intervals from (select
>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as
>> tt*
>> >
>> > I have this error message.
>> >
>> > Best regards,
>> > Paolo
>> >
>> > ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>> > select tt.flat_intervals from (select
>> > flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>> > flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt
>> > [30027]Query execution error. Details:[
>> > SYSTEM ERROR: DrillRuntimeException: kvgen function only supports Simple
>> > maps as input
>> >
>> > Fragment 0:0
>> >
>> > [Error Id: 85f8e8ba-fb87-428a-ac2e-dea78498c222 on 10.1.0.74:31010]
>> > ]
>> >
>> > 2016-01-13 20:28 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>> >
>> >> Wasn't thinking about that; I was seeing the images because I was
>> >> included directly in the recipients list.
>> >>
>> >> +1 on text for searchability.
>> >>
>> >> Can you please rerun the queries and post the errors here as text?
>> >>
>> >> On Wed, Jan 13, 2016 at 11:20 AM, Christopher Matta <cm...@mapr.com>
>> >> wrote:
>> >>
>> >> > The apache mailing lists don't support image attachments, it would be
>> >> > better to attach text anyway since it's searchable.
>> >> >
>> >> > Chris Matta
>> >> > cmatta@mapr.com
>> >> > 215-701-3146
>> >> >
>> >> > On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <
>> paolospane@gmail.com
>> >> >
>> >> > wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> thanks again Jason for your support.
>> >> >>
>> >> >> I'm trying as u suggest, to apply the kvgen function. Below you can
>> >> find
>> >> >> the query and the error.
>> >> >>
>> >> >> *select tt.flat_intervals from (select
>> >> >> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from
>> (select
>> >> >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t )
>> as
>> >> tt*
>> >> >>
>> >> >> [image: Immagine incorporata 1]
>> >> >>
>> >> >>
>> >> >> Could you please suggest how to fix this issue?
>> >> >>
>> >> >> Best regards,
>> >> >> Paolo
>> >> >>
>> >> >>
>> >> >>
>> >> >> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <
>> altekrusejason@gmail.com>:
>> >> >>
>> >> >>> This is a poor error message that is produced when you try to
>> >> >>> flatten a field that is not an array; for these fields you can just
>> >> >>> use the dot notation to access their inner members (i.e.
>> >> >>> flattened_array_of_maps.member_field_in_map). If you have a field
>> >> >>> where the keys in a map are "unknown" or you want to do analysis on
>> >> >>> the keys, please refer to the KVGEN docs [1]. I have assigned the
>> >> >>> JIRA that reported this issue a while ago to myself and will work to
>> >> >>> improve the message [2].
>> >> >>>
>> >> >>>
>> >> >>> [1] - https://drill.apache.org/docs/kvgen/
>> >> >>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>> >> >>>
>> >> >>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <
>> >> paolospane@gmail.com>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> Hi  All,
>> >> >>>>
>> >> >>>> Jason, I used your suggestions and it works, thanks a lot!
>> >> >>>>
>> >> >>>> As you wrote, I used a subquery to get the whole list of INTERVALS,
>> >> >>>> as I show below.
>> >> >>>>
>> >> >>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>> >> >>>>
>> >> >>>> [image: Immagine incorporata 1]
>> >> >>>>
>> >> >>>> The attribute "flat_intervals" is full of data that I would like
>> >> >>>> to separate into several attributes, as shown at the link:
>> >> >>>> https://drill.apache.org/docs/flatten/
>> >> >>>> I used a new subquery to have them:
>> >> >>>>
>> >> >>>>
>> >> >>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>> >> from
>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t)
>> as
>> >> tt*
>> >>
>> >> >>>>
>> >> >>>> but I have this error:
>> >> >>>>
>> >> >>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the
>> query:
>> >> >>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>> from
>> >> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t)
>> as
>> >> tt
>> >> >>>> [30027]Query execution error. Details:[
>> >> >>>> SYSTEM ERROR: ClassCastException: Cannot cast
>> >> >>>> org.apache.drill.exec.vector.complex.MapVector to
>> >> >>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>> >> >>>>
>> >> >>>> Fragment 0:0
>> >> >>>>
>> >> >>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on
>> >> 192.168.1.101:31010]
>> >> >>>> ]
>> >> >>>>
>> >> >>>> [image: Immagine incorporata 2]
>> >> >>>>
>> >> >>>> Thanks in advance!
>> >> >>>>
>> >> >>>> Paolo
>> >> >>>>
>> >> >>>>
>> >> >>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <
>> altekrusejason@gmail.com
>> >> >:
>> >> >>>>
>> >> >>>>> Paolo,
>> >> >>>>>
>> >> >>>>> Drill currently reads single JSON objects as single records. If
>> you
>> >> >>>>> look at
>> >> >>>>> the top of your file you can see that the root of your document
>> is a
>> >> >>>>> single
>> >> >>>>> JSON object.
>> >> >>>>>
>> >> >>>>> Drill accepts two formats for individual records:
>> >> >>>>>
>> >> >>>>> The Mongo import format: a series of JSON objects one after the
>> >> >>>>> other in a file; whitespace is irrelevant, and each one need not
>> >> >>>>> be followed by a newline:
>> >> >>>>>
>> >> >>>>> {"a" : 1, "b" : "hello" }
>> >> >>>>> {"a" : 5, "b" : "guten tag" }
>> >> >>>>>
>> >> >>>>> A JSON array of objects
>> >> >>>>>
>> >> >>>>> [
>> >> >>>>>     {"a" : 1, "b" : "hello" },
>> >> >>>>>     {"a" : 2, "b" : "guten tag"}
>> >> >>>>> ]
>> >> >>>>>
>> >> >>>>> When you have a file like this you can read it using the FLATTEN
>> >> >>>>> functionality of Drill to turn an array into a series of records.
>> >> >>>>> https://drill.apache.org/docs/flatten/
>> >> >>>>>
>> >> >>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>> >> >>>>>
>> >> >>>>> To work with the data further, you can put the flatten call in a
>> >> >>>>> subquery.
>> >> >>>>> Here is how you can select the first element from each record's
>> >> >>>>> list of
>> >> >>>>> INTERVALS and select one of the nested fields inside of METRICS
>> once
>> >> >>>>> the
>> >> >>>>> data has been flattened.
>> >> >>>>> To analyze the array, you could flatten again to get an exploded
>> >> >>>>> dataset
>> >> >>>>> with one record per interval across all records
>> >> >>>>>
>> >> >>>>> select t.flat_rides.INTERVALS[0],
>> >> t.flat_rides.METRICS.skiba_wprime_low
>> >> >>>>> from (select flatten(rides) as flat_rides from
>> >> dfs.tmp.`rideDB.json`)
>> >> >>>>> as t;
>> >> >>>>>
>> >> >>>>> Here you can see that individual columns can be selected next to
>> the
>> >> >>>>> flatten call, this will copy the data into each new record:
>> >> >>>>>
>> >> >>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> >> >>>>> t.flat_rides.METRICS.skiba_wprime_low from (select
>> flatten(rides) as
>> >> >>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>> >> >>>>>
>> >> >>>>> Happy Drilling!
>> >> >>>>>
>> >> >>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <
>> >> >>>>> paolospane@gmail.com>
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>> > Hi all,
>> >> >>>>> >
>> >> >>>>> > I'm trying to query the file that you can find attached, with
>> >> >>>>> > Apache Drill 1.4. The result of this query is always 1 record.
>> >> >>>>> >
>> >> >>>>> > The query that I'm running is:
>> >> >>>>> >
>> >> >>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>> >> >>>>> >
>> >> >>>>> > If I run a similar query with the file donuts.json found at
>> >> >>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>> >> >>>>> > properly.
>> >> >>>>> >
>> >> >>>>> > SELECT t.topping FROM donuts.json t
>> >> >>>>> >
>> >> >>>>> > Thanks in advance.
>> >> >>>>> >
>> >> >>>>> > Paolo
>> >> >>>>> >
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >
>> >
>>
>
>

> >> >>>>> Drill accepts two formats for individual records:
> >> >>>>>
> >> >>>>> The Mongo import format, a series of JSON object one after the
> other
> >> >>>>> in a
> >> >>>>> file, whitespace is irrelevant, each one need not be followed by a
> >> >>>>> newline
> >> >>>>>
> >> >>>>> {"a" : 1, "b" : "hello" }
> >> >>>>> {"a" : 5, "b" : "guten tag" }
> >> >>>>>
> >> >>>>> A JSON array of objects
> >> >>>>>
> >> >>>>> [
> >> >>>>>     {"a" : 1, "b" : "hello" },
> >> >>>>>     {"a" : 2, "b" : "guten tag"}
> >> >>>>> ]
> >> >>>>>
> >> >>>>> When you have a file like this you can read it using the FLATTEN
> >> >>>>> functionality of Drill to turn an array into a series of records.
> >> >>>>> https://drill.apache.org/docs/flatten/
> >> >>>>>
> >> >>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
> >> >>>>>
> >> >>>>> To work with the data further, you can put the flatten call in a
> >> >>>>> subquery.
> >> >>>>> Here is how you can select the first element from each records
> list
> >> of
> >> >>>>> INTERVALS and select one of the nested fields inside of METRICS
> once
> >> >>>>> the
> >> >>>>> data has been flattened.
> >> >>>>> To analyze the array, you could flatten again to get an exploded
> >> >>>>> dataset
> >> >>>>> with one record per interval across all records
> >> >>>>>
> >> >>>>> select t.flat_rides.INTERVALS[0],
> >> t.flat_rides.METRICS.skiba_wprime_low
> >> >>>>> from (select flatten(rides) as flat_rides from
> >> dfs.tmp.`rideDB.json`)
> >> >>>>> as t;
> >> >>>>>
> >> >>>>> Here you can see that individual columns can be selected next to
> the
> >> >>>>> flatten call, this will copy the data into each new record:
> >> >>>>>
> >> >>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
> >> >>>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides)
> as
> >> >>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
> >> >>>>>
> >> >>>>> Happy Drilling!
> >> >>>>>
> >> >>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <
> >> >>>>> paolospane@gmail.com>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>> > Hi all,
> >> >>>>> >
> >> >>>>> > i'm trying to query the file that you can find in attach with
> >> drill
> >> >>>>> apache
> >> >>>>> > 1.4 . The result of this qurey is always 1 record.
> >> >>>>> >
> >> >>>>> > The query that i'm running is :
> >> >>>>> >
> >> >>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
> >> >>>>> >
> >> >>>>> > If i run the similar query with the file donuts.json found on
> >> >>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query
> runs
> >> >>>>> properly.
> >> >>>>> >
> >> >>>>> > SELECT t.topping FROM donuts.json t
> >> >>>>> >
> >> >>>>> > Thanks in advance.
> >> >>>>> >
> >> >>>>> > Paolo
> >> >>>>> >
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
> >
> >
>

Re: JSON File, Total numbers Record: 1

Posted by Jason Altekruse <al...@gmail.com>.
The issue you have now is that you are passing an array into KVGEN, which
only works on maps. You can just flatten it again instead.

Looking more closely at your dataset, I don't think you have any maps that
would require KVGEN: they all have keys that appear in every record, and
KVGEN is just for sparse maps. An example would be a map that uses a user
ID as the key, where you wouldn't want to create thousands of different
SQL statements (one per user_id used as a column name) to analyze
different users.
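If it helps to picture what KVGEN does, here is a rough Python analogue.
This is only an illustration with made-up data (it is not from rideDB.json
and not Drill's actual implementation):

```python
# Rough Python analogue of Drill's KVGEN: it turns a map whose keys vary
# from record to record into a list of {key, value} pairs, so the keys
# become queryable data instead of column names.
def kvgen(sparse_map):
    return [{"key": k, "value": v} for k, v in sparse_map.items()]

# A sparse map keyed by user ID -- the case where KVGEN helps, because
# you would not want one SQL column per user ID. Data is made up.
totals_by_user = {"user_17": 42, "user_99": 7}
print(kvgen(totals_by_user))
```

After KVGEN, a single query over the fixed `key`/`value` columns replaces
one query per distinct map key.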

You should just use the dot notation to access nested members of one of the
maps after you've flattened down to the correct level.

Example (note the backticks around `start` as it is a SQL reserved word):
select t2.flat_intervals.name, t2.flat_intervals.`start`,
t2.flat_intervals.stop from (select flatten(t.flat_rides.INTERVALS) as
flat_intervals, t.flat_rides.METRICS.skiba_wprime_low from (select
flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t) as t2;
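If it helps to picture what the two nested flattens in that query are
doing, here is a rough Python sketch. The field names (rides, INTERVALS,
name, start, stop) come from this thread, but the values and exact shape
are invented for illustration:

```python
# Rough sketch of the nesting the query above walks; data is invented.
ride_db = {
    "rides": [
        {"crc": "abc123",
         "INTERVALS": [
             {"name": "warmup", "start": 0, "stop": 300},
             {"name": "effort", "start": 300, "stop": 900},
         ]},
    ]
}

# First flatten: one record per ride.
flat_rides = list(ride_db["rides"])
# Second flatten: one record per interval, across all rides.
flat_intervals = [iv for ride in flat_rides for iv in ride["INTERVALS"]]
# Dot notation in the SQL corresponds to plain key access here.
names = [iv["name"] for iv in flat_intervals]
print(names)  # -> ['warmup', 'effort']
```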

I also noticed that Metrics appears at two different levels, so here are
two examples for reaching both levels:

select t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
flat_rides from dfs.tmp.`rideDB.json`) as t
+------------+
|   EXPR$0   |
+------------+
| 2.42418    |
| -13.54943  |
| 0.83858    |
| null       |
+------------+

This query also shows how you can include a field next to the array when
flattening and bring it up to the outer query if you still need that data
associated with the flattened records (note `date` is also a reserved word).

select t2.flat_intervals.METRICS.skiba_wprime_low as skiba_wprime_low,
t2.ride_date from (select flatten(t.flat_rides.INTERVALS) as
flat_intervals, t.flat_rides.METRICS.skiba_wprime_low, t.flat_rides.`date`
as ride_date from (select flatten(rides) as flat_rides from
dfs.tmp.`rideDB.json`) as t) as t2;

+-------------------+--------------------------+
| skiba_wprime_low  |        ride_date         |
+-------------------+--------------------------+
| 2.42418           | 2014/12/02 10:10:23 UTC  |
| 1.95597           | 2014/12/02 10:10:23 UTC  |
| 1.90420           | 2014/12/02 10:10:23 UTC  |
| 1.72623           | 2014/12/02 10:10:23 UTC  |
| 1.04275           | 2014/12/02 10:10:23 UTC  |
| 1.55861           | 2014/12/02 10:10:23 UTC  |
| 17.73633          | 2014/12/02 10:10:23 UTC  |
| 17.31987          | 2014/12/02 10:10:23 UTC  |
| 18.47823          | 2014/12/02 10:10:23 UTC  |
... data clipped
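Since the root cause in this thread is that the whole file is one JSON
object, another option is to pre-split the file into the one-object-per-line
("Mongo import") format that Drill reads as one record per object. A rough
Python sketch; the file paths and helper name are hypothetical:

```python
import json

# Rough sketch: rewrite a single-root JSON file (one big object holding a
# "rides" array, as in this thread) into newline-delimited JSON, which
# Drill reads as one record per object. Paths are hypothetical.
def to_ndjson(src_path, dst_path, array_field="rides"):
    with open(src_path) as src:
        root = json.load(src)
    with open(dst_path, "w") as dst:
        for obj in root[array_field]:
            dst.write(json.dumps(obj) + "\n")

# Example (hypothetical paths):
# to_ndjson("rideDB.json", "rideDB_lines.json")
```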


On Wed, Jan 13, 2016 at 12:26 PM, Paolo Spanevello <pa...@gmail.com>
wrote:

> Hi,
>
> thanks to all for your support.
>
> I'm trying as u suggest, to apply the kvgen function. Below you can find
> the query and the error.
>
> *select tt.flat_intervals from (select
> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt*
>
> I have this error message.
>
> Best regards,
> Paolo
>
> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> select tt.flat_intervals from (select
> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt
> [30027]Query execution error. Details:[
> SYSTEM ERROR: DrillRuntimeException: kvgen function only supports Simple
> maps as input
>
> Fragment 0:0
>
> [Error Id: 85f8e8ba-fb87-428a-ac2e-dea78498c222 on 10.1.0.74:31010]
> ]
>
> 2016-01-13 20:28 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>
>> Wasn't thinking about that, I was seeing the images because I was included
>> directly in the recipients lists.
>>
>> +1 on text for searchability.
>>
>> Can you please rerun the queries and post the errors here as text?
>>
>> On Wed, Jan 13, 2016 at 11:20 AM, Christopher Matta <cm...@mapr.com>
>> wrote:
>>
>> > The apache mailing lists don't support image attachments, it would be
>> > better to attach text anyway since it's searchable.
>> >
>> > Chris Matta
>> > cmatta@mapr.com
>> > 215-701-3146
>> >
>> > On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <paolospane@gmail.com
>> >
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> thanks again Jason for your support.
>> >>
>> >> I'm trying as u suggest, to apply the kvgen function. Below you can
>> find
>> >> the query and the error.
>> >>
>> >> *select tt.flat_intervals from (select
>> >> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>> >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as
>> tt*
>> >>
>> >> [image: Immagine incorporata 1]
>> >>
>> >>
>> >> Could u pls suggest how to fix this drill?
>> >>
>> >> Best regards,
>> >> Paolo
>> >>
>> >>
>> >>
>> >> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>> >>
>> >>> This is a poor error message that is produced when you try to
>> flatten a
>> >>> field that is not an array, for these fields you can just use the dot
>> >>> notation to access their inner members (i.e.
>> >>> flattened_array_of_maps.member_field_in_map). If you have a field
>> where the
>> >>> keys in a map are "unknown" or you want to do analysis on the keys,
>> please
>> >>> refer to the KVGEN docs [1]. I have assigned the JIRA that reported
>> this
>> >>> issue a while ago to myself and will work to improve the message [2].
>> >>>
>> >>>
>> >>> [1] - https://drill.apache.org/docs/kvgen/
>> >>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>> >>>
>> >>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <
>> paolospane@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Hi  All,
>> >>>>
>> >>>> Jason,I used your suggests and it works, thanks a lot!
>> >>>>
>> >>>> As u wrote i used a subquery to have the all list of INTERVALS as I
>> >>>> show below.
>> >>>>
>> >>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>> >>>>
>> >>>> [image: Immagine incorporata 1]
>> >>>>
>> >>>> The attribute "flat_intervals" is full of data that I would like to
>> >>>> have separate them in several attributes as it is showed in the link:
>> >>>> https://drill.apache.org/docs/flatten/
>> >>>> I used a new subquery to have them:
>> >>>>
>> >>>>
>> >>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat
>> from
>> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as
>> tt*
>>
>> >>>>
>> >>>> but I have this error:
>> >>>>
>> >>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>> >>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as
>> tt
>> >>>> [30027]Query execution error. Details:[
>> >>>> SYSTEM ERROR: ClassCastException: Cannot cast
>> >>>> org.apache.drill.exec.vector.complex.MapVector to
>> >>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>> >>>>
>> >>>> Fragment 0:0
>> >>>>
>> >>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on
>> 192.168.1.101:31010]
>> >>>> ]
>> >>>>
>> >>>> [image: Immagine incorporata 2]
>> >>>>
>> >>>> Thanks in advance!
>> >>>>
>> >>>> Paolo
>> >>>>
>> >>>>
>> >>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <altekrusejason@gmail.com
>> >:
>> >>>>
>> >>>>> Paolo,
>> >>>>>
>> >>>>> Drill currently reads single JSON objects as single records. If you
>> >>>>> look at
>> >>>>> the top of your file you can see that the root of your document is a
>> >>>>> single
>> >>>>> JSON object.
>> >>>>>
>> >>>>> Drill accepts two formats for individual records:
>> >>>>>
>> >>>>> The Mongo import format, a series of JSON object one after the other
>> >>>>> in a
>> >>>>> file, whitespace is irrelevant, each one need not be followed by a
>> >>>>> newline
>> >>>>>
>> >>>>> {"a" : 1, "b" : "hello" }
>> >>>>> {"a" : 5, "b" : "guten tag" }
>> >>>>>
>> >>>>> A JSON array of objects
>> >>>>>
>> >>>>> [
>> >>>>>     {"a" : 1, "b" : "hello" },
>> >>>>>     {"a" : 2, "b" : "guten tag"}
>> >>>>> ]
>> >>>>>
>> >>>>> When you have a file like this you can read it using the FLATTEN
>> >>>>> functionality of Drill to turn an array into a series of records.
>> >>>>> https://drill.apache.org/docs/flatten/
>> >>>>>
>> >>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>> >>>>>
>> >>>>> To work with the data further, you can put the flatten call in a
>> >>>>> subquery.
>> >>>>> Here is how you can select the first element from each records list
>> of
>> >>>>> INTERVALS and select one of the nested fields inside of METRICS once
>> >>>>> the
>> >>>>> data has been flattened.
>> >>>>> To analyze the array, you could flatten again to get an exploded
>> >>>>> dataset
>> >>>>> with one record per interval across all records
>> >>>>>
>> >>>>> select t.flat_rides.INTERVALS[0],
>> t.flat_rides.METRICS.skiba_wprime_low
>> >>>>> from (select flatten(rides) as flat_rides from
>> dfs.tmp.`rideDB.json`)
>> >>>>> as t;
>> >>>>>
>> >>>>> Here you can see that individual columns can be selected next to the
>> >>>>> flatten call, this will copy the data into each new record:
>> >>>>>
>> >>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> >>>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
>> >>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>> >>>>>
>> >>>>> Happy Drilling!
>> >>>>>
>> >>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <
>> >>>>> paolospane@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> > Hi all,
>> >>>>> >
>> >>>>> > i'm trying to query the file that you can find in attach with
>> drill
>> >>>>> apache
>> >>>>> > 1.4 . The result of this qurey is always 1 record.
>> >>>>> >
>> >>>>> > The query that i'm running is :
>> >>>>> >
>> >>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>> >>>>> >
>> >>>>> > If i run the similar query with the file donuts.json found on
>> >>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>> >>>>> properly.
>> >>>>> >
>> >>>>> > SELECT t.topping FROM donuts.json t
>> >>>>> >
>> >>>>> > Thanks in advance.
>> >>>>> >
>> >>>>> > Paolo
>> >>>>> >
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>
>

Re: JSON File, Total numbers Record: 1

Posted by Paolo Spanevello <pa...@gmail.com>.
Hi,

thanks to all for your support.

I'm trying, as you suggested, to apply the kvgen function. Below you can find
the query and the error.

*select tt.flat_intervals from (select
flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt*

I have this error message.

Best regards,
Paolo

ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
select tt.flat_intervals from (select
flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt
[30027]Query execution error. Details:[
SYSTEM ERROR: DrillRuntimeException: kvgen function only supports Simple
maps as input

Fragment 0:0

[Error Id: 85f8e8ba-fb87-428a-ac2e-dea78498c222 on 10.1.0.74:31010]
]

2016-01-13 20:28 GMT+01:00 Jason Altekruse <al...@gmail.com>:

> Wasn't thinking about that, I was seeing the images because I was included
> directly in the recipients lists.
>
> +1 on text for searchability.
>
> Can you please rerun the queries and post the errors here as text?
>
> On Wed, Jan 13, 2016 at 11:20 AM, Christopher Matta <cm...@mapr.com>
> wrote:
>
> > The apache mailing lists don't support image attachments, it would be
> > better to attach text anyway since it's searchable.
> >
> > Chris Matta
> > cmatta@mapr.com
> > 215-701-3146
> >
> > On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <pa...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> thanks again Jason for your support.
> >>
> >> I'm trying as u suggest, to apply the kvgen function. Below you can find
> >> the query and the error.
> >>
> >> *select tt.flat_intervals from (select
> >> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
> >> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as
> tt*
> >>
> >> [image: Immagine incorporata 1]
> >>
> >>
> >> Could u pls suggest how to fix this drill?
> >>
> >> Best regards,
> >> Paolo
> >>
> >>
> >>
> >> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <al...@gmail.com>:
> >>
> >>> This is a poor error message that is produced when you try to flatten
> a
> >>> field that is not an array, for these fields you can just use the dot
> >>> notation to access their inner members (i.e.
> >>> flattened_array_of_maps.member_field_in_map). If you have a field
> where the
> >>> keys in a map are "unknown" or you want to do analysis on the keys,
> please
> >>> refer to the KVGEN docs [1]. I have assigned the JIRA that reported
> this
> >>> issue a while ago to myself and will work to improve the message [2].
> >>>
> >>>
> >>> [1] - https://drill.apache.org/docs/kvgen/
> >>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
> >>>
> >>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <
> paolospane@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi  All,
> >>>>
> >>>> Jason,I used your suggests and it works, thanks a lot!
> >>>>
> >>>> As u wrote i used a subquery to have the all list of INTERVALS as I
> >>>> show below.
> >>>>
> >>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
> >>>>
> >>>> [image: Immagine incorporata 1]
> >>>>
> >>>> The attribute "flat_intervals" is full of data that I would like to
> >>>> have separate them in several attributes as it is showed in the link:
> >>>> https://drill.apache.org/docs/flatten/
> >>>> I used a new subquery to have them:
> >>>>
> >>>>
> >>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as
> tt*
> >>>>
> >>>> but I have this error:
> >>>>
> >>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> >>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
> >>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
> >>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
> >>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as
> tt
> >>>> [30027]Query execution error. Details:[
> >>>> SYSTEM ERROR: ClassCastException: Cannot cast
> >>>> org.apache.drill.exec.vector.complex.MapVector to
> >>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
> >>>>
> >>>> Fragment 0:0
> >>>>
> >>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on
> 192.168.1.101:31010]
> >>>> ]
> >>>>
> >>>> [image: Immagine incorporata 2]
> >>>>
> >>>> Thanks in advance!
> >>>>
> >>>> Paolo
> >>>>
> >>>>
> >>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <altekrusejason@gmail.com
> >:
> >>>>
> >>>>> Paolo,
> >>>>>
> >>>>> Drill currently reads single JSON objects as single records. If you
> >>>>> look at
> >>>>> the top of your file you can see that the root of your document is a
> >>>>> single
> >>>>> JSON object.
> >>>>>
> >>>>> Drill accepts two formats for individual records:
> >>>>>
> >>>>> The Mongo import format, a series of JSON object one after the other
> >>>>> in a
> >>>>> file, whitespace is irrelevant, each one need not be followed by a
> >>>>> newline
> >>>>>
> >>>>> {"a" : 1, "b" : "hello" }
> >>>>> {"a" : 5, "b" : "guten tag" }
> >>>>>
> >>>>> A JSON array of objects
> >>>>>
> >>>>> [
> >>>>>     {"a" : 1, "b" : "hello" },
> >>>>>     {"a" : 2, "b" : "guten tag"}
> >>>>> ]
> >>>>>
> >>>>> When you have a file like this you can read it using the FLATTEN
> >>>>> functionality of Drill to turn an array into a series of records.
> >>>>> https://drill.apache.org/docs/flatten/
> >>>>>
> >>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
> >>>>>
> >>>>> To work with the data further, you can put the flatten call in a
> >>>>> subquery.
> >>>>> Here is how you can select the first element from each records list
> of
> >>>>> INTERVALS and select one of the nested fields inside of METRICS once
> >>>>> the
> >>>>> data has been flattened.
> >>>>> To analyze the array, you could flatten again to get an exploded
> >>>>> dataset
> >>>>> with one record per interval across all records
> >>>>>
> >>>>> select t.flat_rides.INTERVALS[0],
> t.flat_rides.METRICS.skiba_wprime_low
> >>>>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`)
> >>>>> as t;
> >>>>>
> >>>>> Here you can see that individual columns can be selected next to the
> >>>>> flatten call, this will copy the data into each new record:
> >>>>>
> >>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
> >>>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
> >>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
> >>>>>
> >>>>> Happy Drilling!
> >>>>>
> >>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <
> >>>>> paolospane@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> > Hi all,
> >>>>> >
> >>>>> > i'm trying to query the file that you can find in attach with drill
> >>>>> apache
> >>>>> > 1.4 . The result of this qurey is always 1 record.
> >>>>> >
> >>>>> > The query that i'm running is :
> >>>>> >
> >>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
> >>>>> >
> >>>>> > If i run the similar query with the file donuts.json found on
> >>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
> >>>>> properly.
> >>>>> >
> >>>>> > SELECT t.topping FROM donuts.json t
> >>>>> >
> >>>>> > Thanks in advance.
> >>>>> >
> >>>>> > Paolo
> >>>>> >
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>

Re: JSON File, Total numbers Record: 1

Posted by Jason Altekruse <al...@gmail.com>.
Wasn't thinking about that; I was seeing the images because I was included
directly in the recipient list.

+1 on text for searchability.

Can you please rerun the queries and post the errors here as text?

On Wed, Jan 13, 2016 at 11:20 AM, Christopher Matta <cm...@mapr.com> wrote:

> The apache mailing lists don't support image attachments, it would be
> better to attach text anyway since it's searchable.
>
> Chris Matta
> cmatta@mapr.com
> 215-701-3146
>
> On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <pa...@gmail.com>
> wrote:
>
>> Hi,
>>
>> thanks again Jason for your support.
>>
>> I'm trying as u suggest, to apply the kvgen function. Below you can find
>> the query and the error.
>>
>> *select tt.flat_intervals from (select
>> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt*
>>
>> [image: Immagine incorporata 1]
>>
>>
>> Could u pls suggest how to fix this drill?
>>
>> Best regards,
>> Paolo
>>
>>
>>
>> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>>
>>> This is a poor error message that is produced when you try to flatten a
>>> field that is not an array, for these fields you can just use the dot
>>> notation to access their inner members (i.e.
>>> flattened_array_of_maps.member_field_in_map). If you have a field where the
>>> keys in a map are "unknown" or you want to do analysis on the keys, please
>>> refer to the KVGEN docs [1]. I have assigned the JIRA that reported this
>>> issue a while ago to myself and will work to improve the message [2].
>>>
>>>
>>> [1] - https://drill.apache.org/docs/kvgen/
>>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>>>
>>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <pa...@gmail.com>
>>> wrote:
>>>
>>>> Hi  All,
>>>>
>>>> Jason,I used your suggests and it works, thanks a lot!
>>>>
>>>> As u wrote i used a subquery to have the all list of INTERVALS as I
>>>> show below.
>>>>
>>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>>>>
>>>> [image: Immagine incorporata 1]
>>>>
>>>> The attribute "flat_intervals" is full of data that I would like to
>>>> have separate them in several attributes as it is showed in the link:
>>>> https://drill.apache.org/docs/flatten/
>>>> I used a new subquery to have them:
>>>>
>>>>
>>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt*
>>>>
>>>> but I have this error:
>>>>
>>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt
>>>> [30027]Query execution error. Details:[
>>>> SYSTEM ERROR: ClassCastException: Cannot cast
>>>> org.apache.drill.exec.vector.complex.MapVector to
>>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>>>>
>>>> Fragment 0:0
>>>>
>>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010]
>>>> ]
>>>>
>>>> [image: Immagine incorporata 2]
>>>>
>>>> Thanks in advance!
>>>>
>>>> Paolo
>>>>
>>>>
>>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>>>>
>>>>> Paolo,
>>>>>
>>>>> Drill currently reads single JSON objects as single records. If you
>>>>> look at
>>>>> the top of your file you can see that the root of your document is a
>>>>> single
>>>>> JSON object.
>>>>>
>>>>> Drill accepts two formats for individual records:
>>>>>
>>>>> The Mongo import format, a series of JSON object one after the other
>>>>> in a
>>>>> file, whitespace is irrelevant, each one need not be followed by a
>>>>> newline
>>>>>
>>>>> {"a" : 1, "b" : "hello" }
>>>>> {"a" : 5, "b" : "guten tag" }
>>>>>
>>>>> A JSON array of objects
>>>>>
>>>>> [
>>>>>     {"a" : 1, "b" : "hello" },
>>>>>     {"a" : 2, "b" : "guten tag"}
>>>>> ]
>>>>>
>>>>> When you have a file like this you can read it using the FLATTEN
>>>>> functionality of Drill to turn an array into a series of records.
>>>>> https://drill.apache.org/docs/flatten/
>>>>>
>>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>>>>>
>>>>> To work with the data further, you can put the flatten call in a
>>>>> subquery.
>>>>> Here is how you can select the first element from each records list of
>>>>> INTERVALS and select one of the nested fields inside of METRICS once
>>>>> the
>>>>> data has been flattened.
>>>>> To analyze the array, you could flatten again to get an exploded
>>>>> dataset
>>>>> with one record per interval across all records
>>>>>
>>>>> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low
>>>>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`)
>>>>> as t;
>>>>>
>>>>> Here you can see that individual columns can be selected next to the
>>>>> flatten call, this will copy the data into each new record:
>>>>>
>>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>>>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
>>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>>>>>
>>>>> Happy Drilling!
>>>>>
>>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <
>>>>> paolospane@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Hi all,
>>>>> >
>>>>> > i'm trying to query the file that you can find in attach with drill
>>>>> apache
>>>>> > 1.4 . The result of this qurey is always 1 record.
>>>>> >
>>>>> > The query that i'm running is :
>>>>> >
>>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>>>>> >
>>>>> > If i run the similar query with the file donuts.json found on
>>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>>>>> properly.
>>>>> >
>>>>> > SELECT t.topping FROM donuts.json t
>>>>> >
>>>>> > Thanks in advance.
>>>>> >
>>>>> > Paolo
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>

Re: JSON File, Total numbers Record: 1

Posted by Christopher Matta <cm...@mapr.com>.
The Apache mailing lists don't support image attachments; it would be
better to attach text anyway since it's searchable.

Chris Matta
cmatta@mapr.com
215-701-3146

On Wed, Jan 13, 2016 at 9:55 AM, Paolo Spanevello <pa...@gmail.com>
wrote:

> Hi,
>
> thanks again Jason for your support.
>
> I'm trying as u suggest, to apply the kvgen function. Below you can find
> the query and the error.
>
> *select tt.flat_intervals from (select
> flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt*
>
> [image: Immagine incorporata 1]
>
>
> Could u pls suggest how to fix this drill?
>
> Best regards,
> Paolo
>
>
>
> 2016-01-12 16:31 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>
>> This is a poor error messages that is produced when you try to flatten a
>> field that is not an array, for these fields you can just use the dot
>> notation to access their inner members (i.e.
>> flattened_array_of_maps.member_field_in_map). If you have a field where the
>> keys in a map are "unknown" or you want to do analysis on the keys, please
>> refer to the KVGEN docs [1]. I have assigned the JIRA that reported this
>> issue a while ago to myself and will work to improve the message [2].
>>
>>
>> [1] - https://drill.apache.org/docs/kvgen/
>> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>>
>> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <pa...@gmail.com>
>> wrote:
>>
>>> Hi  All,
>>>
>>> Jason,I used your suggests and it works, thanks a lot!
>>>
>>> As u wrote i used a subquery to have the all list of INTERVALS as I show
>>> below.
>>>
>>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>>>
>>> [image: Immagine incorporata 1]
>>>
>>> The attribute "flat_intervals" is full of data that I would like to have
>>> separate them in several attributes as it is showed in the link:
>>> https://drill.apache.org/docs/flatten/
>>> I used a new subquery to have them:
>>>
>>>
>>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt*
>>>
>>> but I have this error:
>>>
>>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt
>>> [30027]Query execution error. Details:[
>>> SYSTEM ERROR: ClassCastException: Cannot cast
>>> org.apache.drill.exec.vector.complex.MapVector to
>>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>>>
>>> Fragment 0:0
>>>
>>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010]
>>> ]
>>>
>>> [image: Inline image 2]
>>>
>>> Thanks in advance!
>>>
>>> Paolo
>>>
>>>
>>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>>>
>>>> Paolo,
>>>>
>>>> Drill currently reads single JSON objects as single records. If you
>>>> look at
>>>> the top of your file you can see that the root of your document is a
>>>> single
>>>> JSON object.
>>>>
>>>> Drill accepts two formats for individual records:
>>>>
>>>> The Mongo import format, a series of JSON objects one after the other in
>>>> a
>>>> file, whitespace is irrelevant, each one need not be followed by a
>>>> newline
>>>>
>>>> {"a" : 1, "b" : "hello" }
{"a" : 5, "b" : "guten tag" }
>>>>
>>>> A JSON array of objects
>>>>
>>>> [
>>>>     {"a" : 1, "b" : "hello" },
    {"a" : 2, "b" : "guten tag"}
>>>> ]
>>>>
>>>> When you have a file like this you can read it using the FLATTEN
>>>> functionality of Drill to turn an array into a series of records.
>>>> https://drill.apache.org/docs/flatten/
>>>>
>>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>>>>
>>>> To work with the data further, you can put the flatten call in a
>>>> subquery.
>>>> Here is how you can select the first element from each record's list of
>>>> INTERVALS and select one of the nested fields inside of METRICS once the
>>>> data has been flattened.
>>>> To analyze the array, you could flatten again to get an exploded dataset
>>>> with one record per interval across all records
>>>>
>>>> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low
>>>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`)
>>>> as t;
>>>>
>>>> Here you can see that individual columns can be selected next to the
>>>> flatten call, this will copy the data into each new record:
>>>>
>>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
>>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>>>>
>>>> Happy Drilling!
>>>>
>>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <paolospane@gmail.com
>>>> >
>>>> wrote:
>>>>
>>>> > Hi all,
>>>> >
>>>> > i'm trying to query the file that you can find in attach with drill
>>>> apache
>>>> > 1.4 . The result of this query is always 1 record.
>>>> >
>>>> > The query that i'm running is :
>>>> >
>>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>>>> >
>>>> > If i run the similar query with the file donuts.json found on
>>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>>>> properly.
>>>> >
>>>> > SELECT t.topping FROM donuts.json t
>>>> >
>>>> > Thanks in advance.
>>>> >
>>>> > Paolo
>>>> >
>>>>
>>>
>>>
>>
>

Re: JSON File, Total numbers Record: 1

Posted by Paolo Spanevello <pa...@gmail.com>.
Hi,

thanks again Jason for your support.

I'm trying, as you suggested, to apply the kvgen function. Below you can find
the query and the error.

*select tt.flat_intervals from (select
flatten(kvgen(t.flat_rides.INTERVALS)) as flat_intervals from (select
flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t ) as tt*

[image: Inline image 1]


Could you please suggest how to fix this in Drill?

Best regards,
Paolo



2016-01-12 16:31 GMT+01:00 Jason Altekruse <al...@gmail.com>:

> This is a poor error message that is produced when you try to flatten a
> field that is not an array, for these fields you can just use the dot
> notation to access their inner members (i.e.
> flattened_array_of_maps.member_field_in_map). If you have a field where the
> keys in a map are "unknown" or you want to do analysis on the keys, please
> refer to the KVGEN docs [1]. I have assigned the JIRA that reported this
> issue a while ago to myself and will work to improve the message [2].
>
>
> [1] - https://drill.apache.org/docs/kvgen/
> [2] - https://issues.apache.org/jira/browse/DRILL-2182
>
> On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <pa...@gmail.com>
> wrote:
>
>> Hi  All,
>>
>> Jason,I used your suggests and it works, thanks a lot!
>>
>> As u wrote i used a subquery to have the all list of INTERVALS as I show
>> below.
>>
>> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>>
>> [image: Inline image 1]
>>
>> The attribute "flat_intervals" is full of data that I would like to have
>> separate them in several attributes as it is showed in the link:
>> https://drill.apache.org/docs/flatten/
>> I used a new subquery to have them:
>>
>>
>> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt*
>>
>> but I have this error:
>>
>> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
>> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
>> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
>> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
>> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt
>> [30027]Query execution error. Details:[
>> SYSTEM ERROR: ClassCastException: Cannot cast
>> org.apache.drill.exec.vector.complex.MapVector to
>> org.apache.drill.exec.vector.complex.RepeatedValueVector
>>
>> Fragment 0:0
>>
>> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010]
>> ]
>>
>> [image: Inline image 2]
>>
>> Thanks in advance!
>>
>> Paolo
>>
>>
>> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>>
>>> Paolo,
>>>
>>> Drill currently reads single JSON objects as single records. If you look
>>> at
>>> the top of your file you can see that the root of your document is a
>>> single
>>> JSON object.
>>>
>>> Drill accepts two formats for individual records:
>>>
>>> The Mongo import format, a series of JSON objects one after the other in a
>>> file, whitespace is irrelevant, each one need not be followed by a
>>> newline
>>>
>>> {"a" : 1, "b" : "hello" }
>>> {"a" : 5, "b" : "guten tag" }
>>>
>>> A JSON array of objects
>>>
>>> [
>>>     {"a" : 1, "b" : "hello" },
>>>     {"a" : 2, "b" : "guten tag"}
>>> ]
>>>
>>> When you have a file like this you can read it using the FLATTEN
>>> functionality of Drill to turn an array into a series of records.
>>> https://drill.apache.org/docs/flatten/
>>>
>>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>>>
>>> To work with the data further, you can put the flatten call in a
>>> subquery.
>>> Here is how you can select the first element from each record's list of
>>> INTERVALS and select one of the nested fields inside of METRICS once the
>>> data has been flattened.
>>> To analyze the array, you could flatten again to get an exploded dataset
>>> with one record per interval across all records
>>>
>>> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low
>>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as
>>> t;
>>>
>>> Here you can see that individual columns can be selected next to the
>>> flatten call, this will copy the data into each new record:
>>>
>>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
>>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>>>
>>> Happy Drilling!
>>>
>>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <pa...@gmail.com>
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > i'm trying to query the file that you can find in attach with drill
>>> apache
>>> > 1.4 . The result of this query is always 1 record.
>>> >
>>> > The query that i'm running is :
>>> >
>>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>>> >
>>> > If i run the similar query with the file donuts.json found on
>>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>>> properly.
>>> >
>>> > SELECT t.topping FROM donuts.json t
>>> >
>>> > Thanks in advance.
>>> >
>>> > Paolo
>>> >
>>>
>>
>>
>

Re: JSON File, Total numbers Record: 1

Posted by Jason Altekruse <al...@gmail.com>.
This is a poor error message that is produced when you try to flatten a
field that is not an array, for these fields you can just use the dot
notation to access their inner members (i.e.
flattened_array_of_maps.member_field_in_map). If you have a field where the
keys in a map are "unknown" or you want to do analysis on the keys, please
refer to the KVGEN docs [1]. I have assigned the JIRA that reported this
issue a while ago to myself and will work to improve the message [2].


[1] - https://drill.apache.org/docs/kvgen/
[2] - https://issues.apache.org/jira/browse/DRILL-2182
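To make the KVGEN behavior concrete: it converts a map into an array of {"key": ..., "value": ...} records, which FLATTEN can then explode into one record per key. A rough Python sketch of the same transformation (this is an analogy, not Drill's implementation, and the metric names below are made up):

```python
def kvgen(m):
    """Mimic Drill's KVGEN: turn a map into a list of key/value records."""
    return [{"key": k, "value": v} for k, v in m.items()]

# A METRICS-style map whose keys are "unknown" ahead of time (names made up)
metrics = {"skiba_wprime_low": 4.2, "average_power": 180}
for row in kvgen(metrics):
    print(row["key"], "->", row["value"])
```

Flattening `kvgen(metrics)` in Drill then gives one row per metric, with the key available as an ordinary column.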

On Tue, Jan 12, 2016 at 1:20 AM, Paolo Spanevello <pa...@gmail.com>
wrote:

> Hi  All,
>
> Jason,I used your suggests and it works, thanks a lot!
>
> As u wrote i used a subquery to have the all list of INTERVALS as I show
> below.
>
> *select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*
>
> [image: Inline image 1]
>
> The attribute "flat_intervals" is full of data that I would like to have
> separate them in several attributes as it is showed in the link:
> https://drill.apache.org/docs/flatten/
> I used a new subquery to have them:
>
>
> *select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt*
>
> but I have this error:
>
> ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
> (select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
> crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
> flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt
> [30027]Query execution error. Details:[
> SYSTEM ERROR: ClassCastException: Cannot cast
> org.apache.drill.exec.vector.complex.MapVector to
> org.apache.drill.exec.vector.complex.RepeatedValueVector
>
> Fragment 0:0
>
> [Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010]
> ]
>
> [image: Inline image 2]
>
> Thanks in advance!
>
> Paolo
>
>
> 2016-01-11 17:39 GMT+01:00 Jason Altekruse <al...@gmail.com>:
>
>> Paolo,
>>
>> Drill currently reads single JSON objects as single records. If you look
>> at
>> the top of your file you can see that the root of your document is a
>> single
>> JSON object.
>>
>> Drill accepts two formats for individual records:
>>
>> The Mongo import format, a series of JSON objects one after the other in a
>> file, whitespace is irrelevant, each one need not be followed by a newline
>>
>> {"a" : 1, "b" : "hello" }
>> {"a" : 5, "b" : "guten tag" }
>>
>> A JSON array of objects
>>
>> [
>>     {"a" : 1, "b" : "hello" },
>>     {"a" : 2, "b" : "guten tag"}
>> ]
>>
>> When you have a file like this you can read it using the FLATTEN
>> functionality of Drill to turn an array into a series of records.
>> https://drill.apache.org/docs/flatten/
>>
>> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>>
>> To work with the data further, you can put the flatten call in a subquery.
>> Here is how you can select the first element from each record's list of
>> INTERVALS and select one of the nested fields inside of METRICS once the
>> data has been flattened.
>> To analyze the array, you could flatten again to get an exploded dataset
>> with one record per interval across all records
>>
>> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low
>> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as
>> t;
>>
>> Here you can see that individual columns can be selected next to the
>> flatten call, this will copy the data into each new record:
>>
>> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
>> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
>> flat_rides from dfs.tmp.`rideDB.json`) as t;
>>
>> Happy Drilling!
>>
>> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <pa...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > i'm trying to query the file that you can find in attach with drill
>> apache
>> > 1.4 . The result of this query is always 1 record.
>> >
>> > The query that i'm running is :
>> >
>> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>> >
>> > If i run the similar query with the file donuts.json found on
>> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
>> properly.
>> >
>> > SELECT t.topping FROM donuts.json t
>> >
>> > Thanks in advance.
>> >
>> > Paolo
>> >
>>
>
>

Re: JSON File, Total numbers Record: 1

Posted by Paolo Spanevello <pa...@gmail.com>.
Hi  All,

Jason, I used your suggestions and it works, thanks a lot!

As you wrote, I used a subquery to get the whole list of INTERVALS, as I show
below.

*select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t*

[image: Inline image 1]

The attribute "flat_intervals" is full of data that I would like to
separate into several attributes, as shown in the link:
https://drill.apache.org/docs/flatten/
I used a new subquery to get them:


*select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
(select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt*

but I have this error:

ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
select tt.Athlete, tt.crc, flatten(tt.flat_intervals) as newflat from
(select t.flat_rides.tags.Athlete as Athlete,t.flat_rides.crc as
crc,flatten(t.flat_rides.INTERVALS) as flat_intervals from (select
flatten(rides) as flat_rides from `dfs`.`tmp`.`provaRider`) as t) as tt
[30027]Query execution error. Details:[
SYSTEM ERROR: ClassCastException: Cannot cast
org.apache.drill.exec.vector.complex.MapVector to
org.apache.drill.exec.vector.complex.RepeatedValueVector

Fragment 0:0

[Error Id: a22fe80f-43e4-43cb-bb98-5541ecb92d4c on 192.168.1.101:31010]
]

[image: Inline image 2]

Thanks in advance!

Paolo


2016-01-11 17:39 GMT+01:00 Jason Altekruse <al...@gmail.com>:

> Paolo,
>
> Drill currently reads single JSON objects as single records. If you look at
> the top of your file you can see that the root of your document is a single
> JSON object.
>
> Drill accepts two formats for individual records:
>
> The Mongo import format, a series of JSON objects one after the other in a
> file, whitespace is irrelevant, each one need not be followed by a newline
>
> {"a" : 1, "b" : "hello" }
> {"a" : 5, "b" : "guten tag" }
>
> A JSON array of objects
>
> [
>     {"a" : 1, "b" : "hello" },
>     {"a" : 2, "b" : "guten tag"}
> ]
>
> When you have a file like this you can read it using the FLATTEN
> functionality of Drill to turn an array into a series of records.
> https://drill.apache.org/docs/flatten/
>
> select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
>
> To work with the data further, you can put the flatten call in a subquery.
> Here is how you can select the first element from each record's list of
> INTERVALS and select one of the nested fields inside of METRICS once the
> data has been flattened.
> To analyze the array, you could flatten again to get an exploded dataset
> with one record per interval across all records
>
> select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low
> from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t;
>
> Here you can see that individual columns can be selected next to the
> flatten call, this will copy the data into each new record:
>
> select flatten(t.flat_rides.INTERVALS) as flat_intervals,
> t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
> flat_rides from dfs.tmp.`rideDB.json`) as t;
>
> Happy Drilling!
>
> On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <pa...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > i'm trying to query the file that you can find in attach with drill
> apache
> > 1.4 . The result of this query is always 1 record.
> >
> > The query that i'm running is :
> >
> > SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
> >
> > If i run the similar query with the file donuts.json found on
> > https://drill.apache.org/docs/sample-data-donuts/ the query runs
> properly.
> >
> > SELECT t.topping FROM donuts.json t
> >
> > Thanks in advance.
> >
> > Paolo
> >
>

Re: JSON File, Total numbers Record: 1

Posted by Jason Altekruse <al...@gmail.com>.
Paolo,

Drill currently reads single JSON objects as single records. If you look at
the top of your file you can see that the root of your document is a single
JSON object.

Drill accepts two formats for individual records:

The Mongo import format, a series of JSON objects one after the other in a
file, whitespace is irrelevant, each one need not be followed by a newline

{"a" : 1, "b" : "hello" }
{"a" : 5, "b" : "guten tag" }

A JSON array of objects

[
    {"a" : 1, "b" : "hello" },
    {"a" : 2, "b" : "guten tag"}
]

When you have a file like this you can read it using the FLATTEN
functionality of Drill to turn an array into a series of records.
https://drill.apache.org/docs/flatten/

select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`;
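As an alternative handled outside of Drill, a file whose root is a single JSON object can be pre-converted into the Mongo import format shown above, so each record becomes its own line. A minimal Python sketch, assuming the records sit in an array under one top-level key (the `rides` key below is just an illustration):

```python
import json

def to_ndjson(doc, key):
    """Pull the array stored under `key` out of a single root object and
    emit one JSON object per line (the Mongo import format Drill accepts)."""
    return "\n".join(json.dumps(record) for record in doc[key])

doc = {"rides": [{"a": 1, "b": "hello"}, {"a": 2, "b": "guten tag"}]}
print(to_ndjson(doc, "rides"))
```

After conversion the file can be queried record-by-record without FLATTEN at the top level.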

To work with the data further, you can put the flatten call in a subquery.
Here is how you can select the first element from each record's list of
INTERVALS and select one of the nested fields inside of METRICS once the
data has been flattened.
To analyze the array, you could flatten again to get an exploded dataset
with one record per interval across all records

select t.flat_rides.INTERVALS[0], t.flat_rides.METRICS.skiba_wprime_low
from (select flatten(rides) as flat_rides from dfs.tmp.`rideDB.json`) as t;

Here you can see that individual columns can be selected next to the
flatten call, this will copy the data into each new record:

select flatten(t.flat_rides.INTERVALS) as flat_intervals,
t.flat_rides.METRICS.skiba_wprime_low from (select flatten(rides) as
flat_rides from dfs.tmp.`rideDB.json`) as t;
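One caveat worth noting: FLATTEN only applies to array-valued fields; a map-valued field should be accessed with dot notation instead. A small Python analogy of the distinction (hypothetical field names, not Drill internals):

```python
def flatten(rows, field):
    """Explode an array-valued field: one output row per array element.
    Raises TypeError for a non-array field, loosely mirroring Drill's
    refusal to flatten a map."""
    out = []
    for row in rows:
        value = row[field]
        if not isinstance(value, list):
            raise TypeError(f"cannot flatten non-array field {field!r}")
        for item in value:
            new_row = dict(row)   # copy scalar columns into each new row
            new_row[field] = item
            out.append(new_row)
    return out

rows = [{"id": 1, "INTERVALS": [{"t": 0}, {"t": 60}], "METRICS": {"np": 200}}]
print(flatten(rows, "INTERVALS"))   # one output row per interval
# flatten(rows, "METRICS") raises TypeError: METRICS is a map, not an array
```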

Happy Drilling!

On Sun, Jan 10, 2016 at 4:23 AM, Paolo Spanevello <pa...@gmail.com>
wrote:

> Hi all,
>
> i'm trying to query the file that you can find in attach with drill apache
> 1.4 . The result of this query is always 1 record.
>
> The query that i'm running is :
>
> SELECT t.rides.INTERVALS.METRICS FROM rideDB.json t
>
> If i run the similar query with the file donuts.json found on
> https://drill.apache.org/docs/sample-data-donuts/ the query runs properly.
>
> SELECT t.topping FROM donuts.json t
>
> Thanks in advance.
>
> Paolo
>