You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lens.apache.org by Rajat Khandelwal <pr...@apache.org> on 2015/08/05 11:20:20 UTC

[DISCUSS] Time Ranges in Lens

Currently the time_range_in clause is inclusive of the start date and
exclusive of the end date. i.e. time_range_in(time_dimension, date1, date2)
translates to date1<=time_dimension<date2

Recently there have been multiple discussions regarding this approach in
our internal forums at InMobi. There have been arguments that as an
analytics system, it should accept end date inclusive. Starting this thread
to know opinions of other people regarding both approaches. I'm defining
and briefly discussing both approaches below:

Approach 1: End date exclusive.
Disclaimer: I personally prefer this approach and would shamelessly argue
in its favor. I'll give links to some of the articles that I very much
agree with (no need to reinvent the wheel of argument)

   1. Views of Anders Kaseorg, MIT PhD student in CS; Cofounder of Ksplice,
   Inc
   <https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg>
   2. Views of Edgar Dijkstra, Creator of Dijkstra's algorithm
   <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
   3. Both arguments are for generic ranges. I believe time range is also a
   range and don't see why we need to handle that as a special case.


Approach 2: End date inclusive.

   1. Lens query language is a bit like sql and the BETWEEN clause in sql
   is inclusive of both start and end date.
   2. Most analytic systems --- including google analytics --- follow this
   convention


I'm posting the question on a bigger forum
<https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed>
as well in hopes of getting more opinions. I request everyone to post their
views here regarding both approaches.

Regards

Re: [DISCUSS] Time Ranges in Lens

Posted by Yash Sharma <ya...@gmail.com>.
Sorry for participating late.
I am personally in favor of exclusive end date, mainly because it puts hard
limit on end date and we don't have to provide extra date granularity, and
partly because I have got used to this. This however may not be case with
the analysts who would would find it counter intuitive.

One solution is mentioned in the link shared by Rajat. We can have default
behavior as exclusive of end date (similar to mysql), and explicit syntax
for inclusive/exclusive dates in case someone is interested:

expr1 BETWEEN expr2 [ INCL[USIVE] | EXCL[USIVE] ] AND expr3 [ INCL[USIVE] |
EXCL[USIVE] ]

Between with dates is a well known and well criticized problem and it can
be cool to have solved such a long living problem. We do need to evaluate
what is the effort for supporting this feature.

Other interesting posts:
Issues with between dates -
http://stackoverflow.com/a/749663/1562474
Motivating not to use between for dates -
http://sqlblog.com/blogs/aaron_bertrand/archive/2011/10/19/what-do-between-and-the-devil-have-in-common.aspx


On Sun, Aug 9, 2015 at 5:55 PM, Rajat Khandelwal <pr...@apache.org> wrote:

> Just to add to that, a quick google search for "sql between time range
> inclusive" gives the following links:
>
> http://stackoverflow.com/questions/16347649/sql-between-not-inclusive
>
> http://stackoverflow.com/questions/749615/does-ms-sql-servers-between-include-the-range-boundaries
>
> The following things were noticable:
>
>
>    1.
>
>    SELECT * FROM Cases WHERE created_at BETWEEN '2013-05-01' AND
> '2013-05-01'
>
>    doesn't work, Use inequalities instead of between. Notice that then
>    you'll have to add one day to the second value: ... where (created_at >=
>    '2013-05-01' and created_at < '2013-05-02')
>    2.
>
>    select * from person where dob between '2011-01-01' and '2011-01-31'
>
>    returns results without results of 31st day.
>
> One answer <http://programmers.stackexchange.com/a/262025> even goes on to
> say:
>
> RECOMMENDATION: Unless/until the SQL standard is amended, don't use BETWEEN
> for dates/times. Instead get into the habit of coding DATE range
> comparisons as independent conditions on the start and end boundaries of
> your BETWEEN range. This is a bit verbose, but will leave you writing
> conditions that are intuitive (thus less likely to be buggy) and clear to
> the database optimizers, allowing for optimal execution plans to be
> determined and indexes to be used.
>
>
>
> On Sat, Aug 8, 2015 at 9:34 PM Srikanth Sundarrajan <sr...@hotmail.com>
> wrote:
>
> > What you have done there is to extract year and have done numerical range
> > query. To my knowledge such a scheme isn't in the ANSI SQL standard
> >
> > Sent from my iPhone
> >
> > On 08-Aug-2015, at 6:22 pm, "Himanshu Gahlaut" <
> > himanshu.gahlaut@inmobi.com> wrote:
> >
> > >>>>> The reason for me calling it unique is that the grain supported in
> > the
> > > range can be different from the time grain of the data itself (for ex,
> > one
> > > can query data by providing the year while the actual data in the facts
> > may
> > > be in seconds granularity)
> > >
> > > [Himanshu]: Support for this behaviour already exists in other
> databases.
> > > E.g. PostgresSQL allows to query data by providing year while the
> actual
> > > data in the facts is in seconds granularity. See below.
> > >
> > > himanshu.gahlaut=# select * from test_table;
> > >
> > >        time         | userid | requests
> > >
> > > ---------------------+--------+----------
> > >
> > > 2005-10-19 10:23:*54* | user1  |     1000
> > >
> > > 2007-04-15 09:01:*05* | user1  |     2000
> > >
> > > (2 rows)
> > >
> > >
> > > himanshu.gahlaut=# select sum(requests) as total_requests from
> test_table
> > > where extract(year from time) BETWEEN '*2005*' and '*2007*';
> > >
> > > total_requests
> > >
> > > ----------------
> > >
> > >           3000
> > >
> > > (1 row)
> > >
> > >
> > > *BETWEEN clause gave data for 2007 as well. From and To are both
> > inclusive
> > > here.*
> > >
> > >
> > >>>>> If there is an inclination to include inclusive time ranges, the
> way
> > > forward to me seems like doing away with supporting arbitrary time
> grain
> > in
> > > the query and always accept time in query to the precise second.
> Clearly
> > > this would be withdrawing a powerful feature from lens.
> > >
> > > [Himanshu]: When PostgresSQL can support an arbitrary time grain in the
> > > query (as shown above) with data stored in facts at seconds
> granularity,
> > > I'm not sure why Lens will not be able to handle it in implementation.
> I
> > > already gave a solution in this email thread by which time ranges  can
> be
> > > made inclusive in Lens without removing the support to give an
> arbitrary
> > > time grain in the query. Do you see the solution breaking somewhere ?
> > >
> > >
> > > Regards,
> > >
> > > Himanshu
> > >
> > > On Thu, Aug 6, 2015 at 9:46 AM, Srikanth Sundarrajan <
> > sriksun@hotmail.com>
> > > wrote:
> > >
> > >> Here is my 2 cents on this.
> > >>
> > >> As long as there is an unambiguous articulation of what is
> > >> included/excluded in the time range element of the Cube QL, there is
> no
> > >> forcing function to change this, unless this breaks any well
> understood
> > >> convention and is counter intuitive. Given the unique nature of the
> > >> time_range parameter supported in Cube QL and what values it can
> assume,
> > >> there doesn't seem to be risk of breaking convention here. The reason
> > for
> > >> me calling it unique is that the grain supported in the range can be
> > >> different from the time grain of the data itself (for ex, one can
> query
> > >> data by providing the year while the actual data in the facts may be
> in
> > >> seconds granularity). The exclusive end date seems to emanate from
> > having
> > >> to support this mismatched time grains between data and user query and
> > the
> > >> system is having to expand the time grain provided by the user to
> match
> > >> that of the data in the facts.
> > >>
> > >> If there is an inclination to include inclusive time ranges, the way
> > >> forward to me seems like doing away with supporting arbitrary time
> > grain in
> > >> the query and always accept time in query to the precise second.
> Clearly
> > >> this would be withdrawing a powerful feature from lens, and should be
> > >> considered only if the notion of exclusive end time seems very
> > unintuitive
> > >> for the users.
> > >>
> > >> Regards
> > >> Srikanth Sundarrajan
> > >>
> > >>> Date: Wed, 5 Aug 2015 20:28:46 +0530
> > >>> Subject: Re: [DISCUSS] Time Ranges in Lens
> > >>> From: nitin.gupta@inmobi.com
> > >>> To: dev@lens.incubator.apache.org
> > >>>
> > >>> I believe Users seem to think of a time range as half open, a meeting
> > >>> appointment of 1:00 pm - 2:00 pm means a one hour appointment. But
> with
> > >>> date ranges, the expectation is to include the end date. A date range
> > >>> specification of Jan 1 - Jan 31 would mean inclusion of Jan 31.
> > >>> To be able to make a decision here, I believe we follow what existing
> > >>> Analytics and Data Query systems do. That is to include the end time
> > >>> specified as available with Google Analytics or Support for BETWEEN
> in
> > >> SQL.
> > >>> That may mean more complexity on the implementation but that's a hit
> we
> > >>> will have to take.
> > >>>
> > >>> Thanks,
> > >>> Nitin
> > >>>
> > >>>
> > >>> On Wed, Aug 5, 2015 at 7:06 PM, Himanshu Gahlaut <
> > >>> himanshu.gahlaut@inmobi.com> wrote:
> > >>>
> > >>>> Have given a real world use case of banking system where making to
> > >>>> field exclusive makes the communication harder for user.
> > >>>>
> > >>>> Please give a real world use case which makes things harder for user
> > by
> > >>>> making to field inclusive OR a real world use case which is
> impossible
> > >> to
> > >>>> implement by making to field inclusive. So far the points shared
> below
> > >> for
> > >>>> making to date exclusive are hard to understand and it is difficult
> to
> > >>>> comprehend the problem which is coming in way of making to date
> > >> inclusive.
> > >>>>
> > >>>> On Wednesday, August 5, 2015, Rajat Khandelwal <
> > rajatgupta59@gmail.com
> > >>>
> > >>>> wrote:
> > >>>>
> > >>>>> Time range consists of a start time and an end time. If start time
> > >> and
> > >>>> end
> > >>>>> time are same, then we would say the ranges are equal.
> > >>>>>
> > >>>>> The three ranges in my example are same by this definition of
> > >> equality.
> > >>>>> While approach 2 hinders that.
> > >>>>>
> > >>>>>
> > >>>>> On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <
> > >>>>> himanshu.gahlaut@inmobi.com <javascript:;>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Why is this a problem ?
> > >>>>>>
> > >>>>>> On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <
> > >>>> rajatgupta59@gmail.com
> > >>>>> <javascript:;>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Problem with that is, that the meaning of range changes with
> > >>>>> granularity.
> > >>>>>>> While with approach 1, it's not the case.
> > >>>>>>>
> > >>>>>>> A Range of 2015-01-01 to 2015-01-02 would mean 2 days,
> > >> 2015-01-01-00
> > >>>> to
> > >>>>>>> 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
> > >>>>>>> 2015-01-02-00-00 would mean 2 days and 1 minute, and
> > >>>>> 2015-01-01-00-00-00
> > >>>>>> to
> > >>>>>>> 2015-01-02-00-00-00 would mean 2 days and 1 second.
> > >>>>>>>
> > >>>>>>> While approach 1 has no such problem. A given range doesn't
> > >> change
> > >>>>> with a
> > >>>>>>> change in granularity. Every one of these examples would mean 1
> > >> day.
> > >>>>>>>
> > >>>>>>> Here I've begun the argument with day granularity and went
> > >> deeper.
> > >>>> The
> > >>>>>> same
> > >>>>>>> argument can be started from a Yearly granularity and there the
> > >>>>> different
> > >>>>>>> would be more prominent.
> > >>>>>>>
> > >>>>>>> Regards
> > >>>>>>>
> > >>>>>>> On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
> > >>>>>>> himanshu.gahlaut@inmobi.com <javascript:;>>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Say granularity is an ordered list of nodes where every node in
> > >>>> list
> > >>>>> is
> > >>>>>>>> parent of its next node.
> > >>>>>>>>
> > >>>>>>>> For a System X: all granularity could be (Year => Month =>
> > >> Day).
> > >>>>>>>> For System Y: all granularity could be (Year => Month => Day =>
> > >>>> Hour
> > >>>>> =>
> > >>>>>>>> Minutes => Seconds => Milliseconds => Microseconds =>
> > >> Picosecond =>
> > >>>>>>>> Femtosecond => Attosecond)
> > >>>>>>>>
> > >>>>>>>> As long as every node has a logic defined to compute the lower
> > >>>> limit
> > >>>>>> and
> > >>>>>>>> upper limit value, implementation should be able to fetch all
> > >> the
> > >>>>> data
> > >>>>>>>> under its parent node.
> > >>>>>>>>
> > >>>>>>>> E.G:
> > >>>>>>>>
> > >>>>>>>> To get all the data for 2015-10-31, implementation needs to
> > >> convert
> > >>>>>> this
> > >>>>>>>> into 2015-10-31-LowerLimitHourNode to
> > >> 2015-10-31-UpperLimitHourNode
> > >>>>> to
> > >>>>>>>> fetch data from hourly partitions, where lower limit hour = 00
> > >> and
> > >>>>>> upper
> > >>>>>>>> limit hour = 23. Similarly any granularity can be handled at
> > >>>>>>> implementation
> > >>>>>>>> level.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> I guess a banking system in real world would be worried about
> > >> very
> > >>>>> fine
> > >>>>>>>> granularities of time.
> > >>>>>>>>
> > >>>>>>>> Even if we assume banking system has day as the finest
> > >>>> granularity. A
> > >>>>>>> user
> > >>>>>>>> can still ask for statement from the month of April, 2015 to
> > >>>> October,
> > >>>>>>> 2015
> > >>>>>>>> without specifying the specific date fields. In this case bank
> > >> will
> > >>>>>> have
> > >>>>>>> to
> > >>>>>>>> execute the logic to fetch all transactions from April to
> > >> October
> > >>>>> with
> > >>>>>>> both
> > >>>>>>>> inclusive and also handling its finest granularity which is
> > >> assumed
> > >>>>> as
> > >>>>>>> day.
> > >>>>>>>> In this case a banking system would convert 2015-04 to 2015-10
> > >> into
> > >>>>>>>> 2015-04-01 to 2015-10-31 to fetch all data from daily
> > >> partitions,
> > >>>>> where
> > >>>>>>> 01
> > >>>>>>>> is the lower limit for day node in month of April and 31 is the
> > >>>>> higher
> > >>>>>>>> limit for day node in month of October.
> > >>>>>>>>
> > >>>>>>>> If data is not available in daily partitions and bank's
> > >> underlying
> > >>>>>> system
> > >>>>>>>> supports further fine time granularities, it will convert
> > >> 2015-04
> > >>>> to
> > >>>>>>>> 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from
> > >>>> hourly
> > >>>>>>>> partitions.
> > >>>>>>>>
> > >>>>>>>> Similarly if data is not available at hourly partitions, it
> > >> will go
> > >>>>> to
> > >>>>>>> the
> > >>>>>>>> next level. As long as an upper limit and lower limit is
> > >> defined
> > >>>> for
> > >>>>>> each
> > >>>>>>>> node, implementation can handle any time granularity.
> > >>>>>>>>
> > >>>>>>>> Meta point is that in real world, when a user will communicate
> > >> to
> > >>>> get
> > >>>>>>>> statement of April to October. His intentions will never be to
> > >>>>> exclude
> > >>>>>>>> October. Exclusion of October will be a surprise for him.
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>> Himanshu
> > >>>>>>>>
> > >>>>>>>> On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <
> > >>>>> amareshwari@gmail.com <javascript:;>
> > >>>>>>>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> When a system has to understand time at all granularities,
> > >> and
> > >>>> user
> > >>>>>> can
> > >>>>>>>>> give time at all granularities, shouldn't we chose a
> > >>>> specification
> > >>>>>>> which
> > >>>>>>>> is
> > >>>>>>>>> easy and error prone.
> > >>>>>>>>>
> > >>>>>>>>> The example of banking system is not worried about finer
> > >>>>>> granularities
> > >>>>>>> of
> > >>>>>>>>> time.
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> > >>>>>>>>> himanshu.gahlaut@inmobi.com <javascript:;>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> This has two parts:
> > >>>>>>>>>>
> > >>>>>>>>>> (1) Interfacing with user shall be cognitive. Something
> > >> which
> > >>>>> user
> > >>>>>>> can
> > >>>>>>>>>> relate to his natural thought process.
> > >>>>>>>>>>
> > >>>>>>>>>> E.g.: In a banking system, if a user goes to a banker to
> > >> get
> > >>>> his
> > >>>>>> bank
> > >>>>>>>>>> statement. He will say "give me statement from 11th October
> > >>>> 2015
> > >>>>> to
> > >>>>>>>> 23rd
> > >>>>>>>>>> October 2015". This clearly communicates in simplified
> > >> language
> > >>>>>> that
> > >>>>>>>> user
> > >>>>>>>>>> is interested in bank statement from 11th October to 23rd
> > >>>>> October.
> > >>>>>> If
> > >>>>>>>>>> banker now does not return data of 23rd October, it will
> > >> be a
> > >>>>>>> surprise
> > >>>>>>>>> for
> > >>>>>>>>>> user.
> > >>>>>>>>>>
> > >>>>>>>>>> Banker can inform every user that To date is exclusive, so
> > >> if
> > >>>> you
> > >>>>>>> need
> > >>>>>>>>>> statement till 23rd October you should come and tell me
> > >> "give
> > >>>> me
> > >>>>>>>>> statement
> > >>>>>>>>>> from 11th October to 24th October". I believe a user will
> > >>>>>> immediately
> > >>>>>>>>> reply
> > >>>>>>>>>> => If I need statement only till 23rd October, why shall I
> > >>>>>>> communicate
> > >>>>>>>>> 11th
> > >>>>>>>>>> October to 24th October. If you still insist, next time the
> > >>>> user
> > >>>>>> will
> > >>>>>>>>> come
> > >>>>>>>>>> and say "give me statement from 11th October to 24th
> > >> October
> > >>>> with
> > >>>>>>> 24th
> > >>>>>>>>>> exclusive". System just made it harder for user to
> > >> communicate
> > >>>>> with
> > >>>>>>>>> banker.
> > >>>>>>>>>>
> > >>>>>>>>>> (2) Implementation to achieve cognitive behaviour
> > >>>>>>>>>>
> > >>>>>>>>>> With To date inclusive, banker needs to make sure that it
> > >> gives
> > >>>>> all
> > >>>>>>>>>> transaction records for 23rd October 2015 (To date) at the
> > >>>> finest
> > >>>>>>>>>> granularity available with the bank. This will satisfy
> > >> users
> > >>>>>>>> requirement
> > >>>>>>>>>> without any element of surprise.
> > >>>>>>>>>>
> > >>>>>>>>>> If the finest granularity is millisecond. It will have a
> > >> higher
> > >>>>>> limit
> > >>>>>>>> to
> > >>>>>>>>> it
> > >>>>>>>>>> which is 999 in most of the systems. Banker will fetch all
> > >>>>>> partitions
> > >>>>>>>>> until
> > >>>>>>>>>> higher limit of finest granularity for 23rd October and
> > >> return
> > >>>>> the
> > >>>>>>>>>> statement to user.
> > >>>>>>>>>>
> > >>>>>>>>>> If there was a leap second/millisecond, it is upto the
> > >> banker
> > >>>> to
> > >>>>>>> handle
> > >>>>>>>>> and
> > >>>>>>>>>> give data for it. When a user wants statement from 11th
> > >> October
> > >>>>> to
> > >>>>>>> 23rd
> > >>>>>>>>>> October, user should not worry about the leap
> > >>>> second/millisecond
> > >>>>>>> which
> > >>>>>>>>>> happened on any of the days from 11th October to 23rd
> > >> October
> > >>>>>>>>> (inclusive).
> > >>>>>>>>>>
> > >>>>>>>>>> Regards,
> > >>>>>>>>>> Himanshu
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
> > >>>>>>> amareshwari@gmail.com <javascript:;>
> > >>>>>>>>>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> I prefer time-range end date to be exclusive of the end
> > >> date,
> > >>>>>>> because
> > >>>>>>>>> of
> > >>>>>>>>>>> the following reasons:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1. Giving a month range end as start of next month is
> > >> easy,
> > >>>> as
> > >>>>>> you
> > >>>>>>>> need
> > >>>>>>>>>> not
> > >>>>>>>>>>> remember and get confused with number of days in month :
> > >> 30,
> > >>>>> 31,
> > >>>>>>>> 28/29
> > >>>>>>>>>> (in
> > >>>>>>>>>>> february leap year) - wont miss ending days because of
> > >>>>> mistakes.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2. When you are giving a any range, you need not worry
> > >> about
> > >>>>> what
> > >>>>>>>>>>> granularity should the time be specified.
> > >>>>>>>>>>> For ex, for including time upto 2015-08-01 05th.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Should user specify the following with inclusive ranges?
> > >>>>>>>>>>>> 2015-08-01-04 (would be correct, assuming whole of 4 th
> > >>>> hour
> > >>>>>> will
> > >>>>>>>> be
> > >>>>>>>>>>> included)
> > >>>>>>>>>>>> 2015-08-01-04-59 (would be correct, but what if there
> > >> is
> > >>>> leap
> > >>>>>>>> second
> > >>>>>>>>>> :p )
> > >>>>>>>>>>>> 2015-08-01-04-59-000 (wrong, would include only 000th
> > >> milli
> > >>>>>>> second
> > >>>>>>>>> and
> > >>>>>>>>>>> not others)
> > >>>>>>>>>>>> 2015-08-01-04-59-999 (not sure if we will have leap
> > >> milli
> > >>>>>>> seconds)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Looking forward to see other arguments on making
> > >> timerange
> > >>>> end
> > >>>>>> date
> > >>>>>>>>>>> inclusive.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks
> > >>>>>>>>>>> Amareshwari
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
> > >>>>>>> prongs@apache.org <javascript:;>>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Currently the time_range_in clause is inclusive of the
> > >>>> start
> > >>>>>> date
> > >>>>>>>> and
> > >>>>>>>>>>>> exclusive of the end date. i.e.
> > >>>> time_range_in(time_dimension,
> > >>>>>>>> date1,
> > >>>>>>>>>>> date2)
> > >>>>>>>>>>>> translates to date1<=time_dimension<date2
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Recently there have been multiple discussions regarding
> > >>>> this
> > >>>>>>>> approach
> > >>>>>>>>>> in
> > >>>>>>>>>>>> our internal forums at InMobi. There have been
> > >> arguments
> > >>>> that
> > >>>>>> as
> > >>>>>>> an
> > >>>>>>>>>>>> analytics system, it should accept end date inclusive.
> > >>>>> Starting
> > >>>>>>>> this
> > >>>>>>>>>>> thread
> > >>>>>>>>>>>> to know opinions of other people regarding both
> > >> approaches.
> > >>>>> I'm
> > >>>>>>>>>> defining
> > >>>>>>>>>>>> and briefly discussing both approaches below:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Approach 1: End date exclusive.
> > >>>>>>>>>>>> Disclaimer: I personally prefer this approach and would
> > >>>>>>> shamelessly
> > >>>>>>>>>> argue
> > >>>>>>>>>>>> in its favor. I'll give links to some of the articles
> > >> that
> > >>>> I
> > >>>>>> very
> > >>>>>>>>> much
> > >>>>>>>>>>>> agree with (no need to reinvent the wheel of argument)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>   1. Views of Anders Kaseorg, MIT PhD student in CS;
> > >>>>> Cofounder
> > >>>>>>> of
> > >>>>>>>>>>> Ksplice,
> > >>>>>>>>>>>>   Inc
> > >>>>>>>>>>>>   <
> > >>
> >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > >>>>>>>>>>>>   2. Views of Edgar Dijkstra, Creator of Dijkstra's
> > >>>>> algorithm
> > >>>>>>>>>>>>   <
> > >> http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > >>>>>>>>>>>>   3. Both arguments are for generic ranges. I believe
> > >> time
> > >>>>>> range
> > >>>>>>>> is
> > >>>>>>>>>>> also a
> > >>>>>>>>>>>>   range and don't see why we need to handle that as a
> > >>>>> special
> > >>>>>>>> case.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Approach 2: End date inclusive.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>   1. Lens query language is a bit like sql and the
> > >> BETWEEN
> > >>>>>>> clause
> > >>>>>>>> in
> > >>>>>>>>>> sql
> > >>>>>>>>>>>>   is inclusive of both start and end date.
> > >>>>>>>>>>>>   2. Most analytic systems --- including google
> > >> analytics
> > >>>>> ---
> > >>>>>>>> follow
> > >>>>>>>>>>> this
> > >>>>>>>>>>>>   convention
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'm posting the question on a bigger forum
> > >>>>>>>>>>>> <
> > >>
> >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > >>>>>>>>>>>> as well in hopes of getting more opinions. I request
> > >>>> everyone
> > >>>>>> to
> > >>>>>>>> post
> > >>>>>>>>>>> their
> > >>>>>>>>>>>> views here regarding both approaches.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Regards
> > >>>>>>>>>>
> > >>>>>>>>>> --
> > >> _____________________________________________________________
> > >>>>>>>>>> The information contained in this communication is intended
> > >>>>> solely
> > >>>>>>> for
> > >>>>>>>>> the
> > >>>>>>>>>> use of the individual or entity to whom it is addressed and
> > >>>>> others
> > >>>>>>>>>> authorized to receive it. It may contain confidential or
> > >>>> legally
> > >>>>>>>>> privileged
> > >>>>>>>>>> information. If you are not the intended recipient you are
> > >>>> hereby
> > >>>>>>>>> notified
> > >>>>>>>>>> that any disclosure, copying, distribution or taking any
> > >> action
> > >>>>> in
> > >>>>>>>>> reliance
> > >>>>>>>>>> on the contents of this information is strictly prohibited
> > >> and
> > >>>>> may
> > >>>>>> be
> > >>>>>>>>>> unlawful. If you have received this communication in error,
> > >>>>> please
> > >>>>>>>> notify
> > >>>>>>>>>> us immediately by responding to this email and then delete
> > >> it
> > >>>>> from
> > >>>>>>> your
> > >>>>>>>>>> system. The firm is neither liable for the proper and
> > >> complete
> > >>>>>>>>> transmission
> > >>>>>>>>>> of the information contained in this communication nor for
> > >> any
> > >>>>>> delay
> > >>>>>>> in
> > >>>>>>>>> its
> > >>>>>>>>>> receipt.
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> _____________________________________________________________
> > >>>>>>>> The information contained in this communication is intended
> > >> solely
> > >>>>> for
> > >>>>>>> the
> > >>>>>>>> use of the individual or entity to whom it is addressed and
> > >> others
> > >>>>>>>> authorized to receive it. It may contain confidential or
> > >> legally
> > >>>>>>> privileged
> > >>>>>>>> information. If you are not the intended recipient you are
> > >> hereby
> > >>>>>>> notified
> > >>>>>>>> that any disclosure, copying, distribution or taking any
> > >> action in
> > >>>>>>> reliance
> > >>>>>>>> on the contents of this information is strictly prohibited and
> > >> may
> > >>>> be
> > >>>>>>>> unlawful. If you have received this communication in error,
> > >> please
> > >>>>>> notify
> > >>>>>>>> us immediately by responding to this email and then delete it
> > >> from
> > >>>>> your
> > >>>>>>>> system. The firm is neither liable for the proper and complete
> > >>>>>>> transmission
> > >>>>>>>> of the information contained in this communication nor for any
> > >>>> delay
> > >>>>> in
> > >>>>>>> its
> > >>>>>>>> receipt.
> > >>>>>>
> > >>>>>> --
> > >>>>>> _____________________________________________________________
> > >>>>>> The information contained in this communication is intended solely
> > >> for
> > >>>>> the
> > >>>>>> use of the individual or entity to whom it is addressed and others
> > >>>>>> authorized to receive it. It may contain confidential or legally
> > >>>>> privileged
> > >>>>>> information. If you are not the intended recipient you are hereby
> > >>>>> notified
> > >>>>>> that any disclosure, copying, distribution or taking any action in
> > >>>>> reliance
> > >>>>>> on the contents of this information is strictly prohibited and may
> > >> be
> > >>>>>> unlawful. If you have received this communication in error, please
> > >>>> notify
> > >>>>>> us immediately by responding to this email and then delete it from
> > >> your
> > >>>>>> system. The firm is neither liable for the proper and complete
> > >>>>> transmission
> > >>>>>> of the information contained in this communication nor for any
> > >> delay in
> > >>>>> its
> > >>>>>> receipt.
> > >>>>
> > >>>> --
> > >>>> _____________________________________________________________
> > >>>> The information contained in this communication is intended solely
> for
> > >> the
> > >>>> use of the individual or entity to whom it is addressed and others
> > >>>> authorized to receive it. It may contain confidential or legally
> > >> privileged
> > >>>> information. If you are not the intended recipient you are hereby
> > >> notified
> > >>>> that any disclosure, copying, distribution or taking any action in
> > >> reliance
> > >>>> on the contents of this information is strictly prohibited and may
> be
> > >>>> unlawful. If you have received this communication in error, please
> > >> notify
> > >>>> us immediately by responding to this email and then delete it from
> > your
> > >>>> system. The firm is neither liable for the proper and complete
> > >> transmission
> > >>>> of the information contained in this communication nor for any delay
> > >> in its
> > >>>> receipt.
> > >>>
> > >>> --
> > >>> _____________________________________________________________
> > >>> The information contained in this communication is intended solely
> for
> > >> the
> > >>> use of the individual or entity to whom it is addressed and others
> > >>> authorized to receive it. It may contain confidential or legally
> > >> privileged
> > >>> information. If you are not the intended recipient you are hereby
> > >> notified
> > >>> that any disclosure, copying, distribution or taking any action in
> > >> reliance
> > >>> on the contents of this information is strictly prohibited and may be
> > >>> unlawful. If you have received this communication in error, please
> > notify
> > >>> us immediately by responding to this email and then delete it from
> your
> > >>> system. The firm is neither liable for the proper and complete
> > >> transmission
> > >>> of the information contained in this communication nor for any delay
> in
> > >> its
> > >>> receipt.
> > >
> > > --
> > > _____________________________________________________________
> > > The information contained in this communication is intended solely for
> > the
> > > use of the individual or entity to whom it is addressed and others
> > > authorized to receive it. It may contain confidential or legally
> > privileged
> > > information. If you are not the intended recipient you are hereby
> > notified
> > > that any disclosure, copying, distribution or taking any action in
> > reliance
> > > on the contents of this information is strictly prohibited and may be
> > > unlawful. If you have received this communication in error, please
> notify
> > > us immediately by responding to this email and then delete it from your
> > > system. The firm is neither liable for the proper and complete
> > transmission
> > > of the information contained in this communication nor for any delay in
> > its
> > > receipt.
> >
>

Re: [DISCUSS] Time Ranges in Lens

Posted by Rajat Khandelwal <pr...@apache.org>.
Just to add to that, a quick google search for "sql between time range
inclusive" gives the following links:

http://stackoverflow.com/questions/16347649/sql-between-not-inclusive
http://stackoverflow.com/questions/749615/does-ms-sql-servers-between-include-the-range-boundaries

The following things were noticable:


   1.

   SELECT * FROM Cases WHERE created_at BETWEEN '2013-05-01' AND '2013-05-01'

   doesn't work, Use inequalities instead of between. Notice that then
   you'll have to add one day to the second value: ... where (created_at >=
   '2013-05-01' and created_at < '2013-05-02')
   2.

   select * from person where dob between '2011-01-01' and '2011-01-31'

   returns results without results of 31st day.

One answer <http://programmers.stackexchange.com/a/262025> even goes on to
say:

RECOMMENDATION: Unless/until the SQL standard is amended, don't use BETWEEN
for dates/times. Instead get into the habit of coding DATE range
comparisons as independent conditions on the start and end boundaries of
your BETWEEN range. This is a bit verbose, but will leave you writing
conditions that are intuitive (thus less likely to be buggy) and clear to
the database optimizers, allowing for optimal execution plans to be
determined and indexes to be used.



On Sat, Aug 8, 2015 at 9:34 PM Srikanth Sundarrajan <sr...@hotmail.com>
wrote:

> What you have done there is to extract year and have done numerical range
> query. To my knowledge such a scheme isn't in the ANSI SQL standard
>
> Sent from my iPhone
>
> On 08-Aug-2015, at 6:22 pm, "Himanshu Gahlaut" <
> himanshu.gahlaut@inmobi.com> wrote:
>
> >>>>> The reason for me calling it unique is that the grain supported in
> the
> > range can be different from the time grain of the data itself (for ex,
> one
> > can query data by providing the year while the actual data in the facts
> may
> > be in seconds granularity)
> >
> > [Himanshu]: Support for this behaviour already exists in other databases.
> > E.g. PostgresSQL allows to query data by providing year while the actual
> > data in the facts is in seconds granularity. See below.
> >
> > himanshu.gahlaut=# select * from test_table;
> >
> >        time         | userid | requests
> >
> > ---------------------+--------+----------
> >
> > 2005-10-19 10:23:*54* | user1  |     1000
> >
> > 2007-04-15 09:01:*05* | user1  |     2000
> >
> > (2 rows)
> >
> >
> > himanshu.gahlaut=# select sum(requests) as total_requests from test_table
> > where extract(year from time) BETWEEN '*2005*' and '*2007*';
> >
> > total_requests
> >
> > ----------------
> >
> >           3000
> >
> > (1 row)
> >
> >
> > *BETWEEN clause gave data for 2007 as well. From and To are both
> inclusive
> > here.*
> >
> >
> >>>>> If there is an inclination to include inclusive time ranges, the way
> > forward to me seems like doing away with supporting arbitrary time grain
> in
> > the query and always accept time in query to the precise second. Clearly
> > this would be withdrawing a powerful feature from lens.
> >
> > [Himanshu]: When PostgresSQL can support an arbitrary time grain in the
> > query (as shown above) with data stored in facts at seconds granularity,
> > I'm not sure why Lens will not be able to handle it in implementation. I
> > already gave a solution in this email thread by which time ranges  can be
> > made inclusive in Lens without removing the support to give an arbitrary
> > time grain in the query. Do you see the solution breaking somewhere ?
> >
> >
> > Regards,
> >
> > Himanshu
> >
> > On Thu, Aug 6, 2015 at 9:46 AM, Srikanth Sundarrajan <
> sriksun@hotmail.com>
> > wrote:
> >
> >> Here is my 2 cents on this.
> >>
> >> As long as there is an unambiguous articulation of what is
> >> included/excluded in the time range element of the Cube QL, there is no
> >> forcing function to change this, unless this breaks any well understood
> >> convention and is counter intuitive. Given the unique nature of the
> >> time_range parameter supported in Cube QL and what values it can assume,
> >> there doesn't seem to be risk of breaking convention here. The reason
> for
> >> me calling it unique is that the grain supported in the range can be
> >> different from the time grain of the data itself (for ex, one can query
> >> data by providing the year while the actual data in the facts may be in
> >> seconds granularity). The exclusive end date seems to emanate from
> having
> >> to support this mismatched time grains between data and user query and
> the
> >> system is having to expand the time grain provided by the user to match
> >> that of the data in the facts.
> >>
> >> If there is an inclination to include inclusive time ranges, the way
> >> forward to me seems like doing away with supporting arbitrary time
> grain in
> >> the query and always accept time in query to the precise second. Clearly
> >> this would be withdrawing a powerful feature from lens, and should be
> >> considered only if the notion of exclusive end time seems very
> unintuitive
> >> for the users.
> >>
> >> Regards
> >> Srikanth Sundarrajan
> >>
> >>> Date: Wed, 5 Aug 2015 20:28:46 +0530
> >>> Subject: Re: [DISCUSS] Time Ranges in Lens
> >>> From: nitin.gupta@inmobi.com
> >>> To: dev@lens.incubator.apache.org
> >>>
> >>> I believe Users seem to think of a time range as half open, a meeting
> >>> appointment of 1:00 pm - 2:00 pm means a one hour appointment. But with
> >>> date ranges, the expectation is to include the end date. A date range
> >>> specification of Jan 1 - Jan 31 would mean inclusion of Jan 31.
> >>> To be able to make a decision here, I believe we follow what existing
> >>> Analytics and Data Query systems do. That is to include the end time
> >>> specified as available with Google Analytics or Support for BETWEEN in
> >> SQL.
> >>> That may mean more complexity on the implementation but that's a hit we
> >>> will have to take.
> >>>
> >>> Thanks,
> >>> Nitin
> >>>
> >>>
> >>> On Wed, Aug 5, 2015 at 7:06 PM, Himanshu Gahlaut <
> >>> himanshu.gahlaut@inmobi.com> wrote:
> >>>
> >>>> Have given a real world use case of banking system where making to
> >>>> field exclusive makes the communication harder for user.
> >>>>
> >>>> Please give a real world use case which makes things harder for user
> by
> >>>> making to field inclusive OR a real world use case which is impossible
> >> to
> >>>> implement by making to field inclusive. So far the points shared below
> >> for
> >>>> making to date exclusive are hard to understand and it is difficult to
> >>>> comprehend the problem which is coming in way of making to date
> >> inclusive.
> >>>>
> >>>> On Wednesday, August 5, 2015, Rajat Khandelwal <
> rajatgupta59@gmail.com
> >>>
> >>>> wrote:
> >>>>
> >>>>> Time range consists of a start time and an end time. If start time
> >> and
> >>>> end
> >>>>> time are same, then we would say the ranges are equal.
> >>>>>
> >>>>> The three ranges in my example are same by this definition of
> >> equality.
> >>>>> While approach 2 hinders that.
> >>>>>
> >>>>>
> >>>>> On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <
> >>>>> himanshu.gahlaut@inmobi.com <javascript:;>>
> >>>>> wrote:
> >>>>>
> >>>>>> Why is this a problem ?
> >>>>>>
> >>>>>> On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <
> >>>> rajatgupta59@gmail.com
> >>>>> <javascript:;>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Problem with that is, that the meaning of range changes with
> >>>>> granularity.
> >>>>>>> While with approach 1, it's not the case.
> >>>>>>>
> >>>>>>> A Range of 2015-01-01 to 2015-01-02 would mean 2 days,
> >> 2015-01-01-00
> >>>> to
> >>>>>>> 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
> >>>>>>> 2015-01-02-00-00 would mean 2 days and 1 minute, and
> >>>>> 2015-01-01-00-00-00
> >>>>>> to
> >>>>>>> 2015-01-02-00-00-00 would mean 2 days and 1 second.
> >>>>>>>
> >>>>>>> While approach 1 has no such problem. A given range doesn't
> >> change
> >>>>> with a
> >>>>>>> change in granularity. Every one of these examples would mean 1
> >> day.
> >>>>>>>
> >>>>>>> Here I've begun the argument with day granularity and went
> >> deeper.
> >>>> The
> >>>>>> same
> >>>>>>> argument can be started from a Yearly granularity and there the
> >>>>> different
> >>>>>>> would be more prominent.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
> >>>>>>> himanshu.gahlaut@inmobi.com <javascript:;>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Say granularity is an ordered list of nodes where every node in
> >>>> list
> >>>>> is
> >>>>>>>> parent of its next node.
> >>>>>>>>
> >>>>>>>> For a System X: all granularity could be (Year => Month =>
> >> Day).
> >>>>>>>> For System Y: all granularity could be (Year => Month => Day =>
> >>>> Hour
> >>>>> =>
> >>>>>>>> Minutes => Seconds => Milliseconds => Microseconds =>
> >> Picosecond =>
> >>>>>>>> Femtosecond => Attosecond)
> >>>>>>>>
> >>>>>>>> As long as every node has a logic defined to compute the lower
> >>>> limit
> >>>>>> and
> >>>>>>>> upper limit value, implementation should be able to fetch all
> >> the
> >>>>> data
> >>>>>>>> under its parent node.
> >>>>>>>>
> >>>>>>>> E.G:
> >>>>>>>>
> >>>>>>>> To get all the data for 2015-10-31, implementation needs to
> >> convert
> >>>>>> this
> >>>>>>>> into 2015-10-31-LowerLimitHourNode to
> >> 2015-10-31-UpperLimitHourNode
> >>>>> to
> >>>>>>>> fetch data from hourly partitions, where lower limit hour = 00
> >> and
> >>>>>> upper
> >>>>>>>> limit hour = 23. Similarly any granularity can be handled at
> >>>>>>> implementation
> >>>>>>>> level.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I guess a banking system in real world would be worried about
> >> very
> >>>>> fine
> >>>>>>>> granularities of time.
> >>>>>>>>
> >>>>>>>> Even if we assume banking system has day as the finest
> >>>> granularity. A
> >>>>>>> user
> >>>>>>>> can still ask for statement from the month of April, 2015 to
> >>>> October,
> >>>>>>> 2015
> >>>>>>>> without specifying the specific date fields. In this case bank
> >> will
> >>>>>> have
> >>>>>>> to
> >>>>>>>> execute the logic to fetch all transactions from April to
> >> October
> >>>>> with
> >>>>>>> both
> >>>>>>>> inclusive and also handling its finest granularity which is
> >> assumed
> >>>>> as
> >>>>>>> day.
> >>>>>>>> In this case a banking system would convert 2015-04 to 2015-10
> >> into
> >>>>>>>> 2015-04-01 to 2015-10-31 to fetch all data from daily
> >> partitions,
> >>>>> where
> >>>>>>> 01
> >>>>>>>> is the lower limit for day node in month of April and 31 is the
> >>>>> higher
> >>>>>>>> limit for day node in month of October.
> >>>>>>>>
> >>>>>>>> If data is not available in daily partitions and bank's
> >> underlying
> >>>>>> system
> >>>>>>>> supports further fine time granularities, it will convert
> >> 2015-04
> >>>> to
> >>>>>>>> 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from
> >>>> hourly
> >>>>>>>> partitions.
> >>>>>>>>
> >>>>>>>> Similarly if data is not available at hourly partitions, it
> >> will go
> >>>>> to
> >>>>>>> the
> >>>>>>>> next level. As long as an upper limit and lower limit is
> >> defined
> >>>> for
> >>>>>> each
> >>>>>>>> node, implementation can handle any time granularity.
> >>>>>>>>
> >>>>>>>> Meta point is that in real world, when a user will communicate
> >> to
> >>>> get
> >>>>>>>> statement of April to October. His intentions will never be to
> >>>>> exclude
> >>>>>>>> October. Exclusion of October will be a surprise for him.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Himanshu
> >>>>>>>>
> >>>>>>>> On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <
> >>>>> amareshwari@gmail.com <javascript:;>
> >>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> When a system has to understand time at all granularities,
> >> and
> >>>> user
> >>>>>> can
> >>>>>>>>> give time at all granularities, shouldn't we chose a
> >>>> specification
> >>>>>>> which
> >>>>>>>> is
> >>>>>>>>> easy and error prone.
> >>>>>>>>>
> >>>>>>>>> The example of banking system is not worried about finer
> >>>>>> granularities
> >>>>>>> of
> >>>>>>>>> time.
> >>>>>>>>>
> >>>>>>>>> On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> >>>>>>>>> himanshu.gahlaut@inmobi.com <javascript:;>> wrote:
> >>>>>>>>>
> >>>>>>>>>> This has two parts:
> >>>>>>>>>>
> >>>>>>>>>> (1) Interfacing with user shall be cognitive. Something
> >> which
> >>>>> user
> >>>>>>> can
> >>>>>>>>>> relate to his natural thought process.
> >>>>>>>>>>
> >>>>>>>>>> E.g.: In a banking system, if a user goes to a banker to
> >> get
> >>>> his
> >>>>>> bank
> >>>>>>>>>> statement. He will say "give me statement from 11th October
> >>>> 2015
> >>>>> to
> >>>>>>>> 23rd
> >>>>>>>>>> October 2015". This clearly communicates in simplified
> >> language
> >>>>>> that
> >>>>>>>> user
> >>>>>>>>>> is interested in bank statement from 11th October to 23rd
> >>>>> October.
> >>>>>> If
> >>>>>>>>>> banker now does not return data of 23rd October, it will
> >> be a
> >>>>>>> surprise
> >>>>>>>>> for
> >>>>>>>>>> user.
> >>>>>>>>>>
> >>>>>>>>>> Banker can inform every user that To date is exclusive, so
> >> if
> >>>> you
> >>>>>>> need
> >>>>>>>>>> statement till 23rd October you should come and tell me
> >> "give
> >>>> me
> >>>>>>>>> statement
> >>>>>>>>>> from 11th October to 24th October". I believe a user will
> >>>>>> immediately
> >>>>>>>>> reply
> >>>>>>>>>> => If I need statement only till 23rd October, why shall I
> >>>>>>> communicate
> >>>>>>>>> 11th
> >>>>>>>>>> October to 24th October. If you still insist, next time the
> >>>> user
> >>>>>> will
> >>>>>>>>> come
> >>>>>>>>>> and say "give me statement from 11th October to 24th
> >> October
> >>>> with
> >>>>>>> 24th
> >>>>>>>>>> exclusive". System just made it harder for user to
> >> communicate
> >>>>> with
> >>>>>>>>> banker.
> >>>>>>>>>>
> >>>>>>>>>> (2) Implementation to achieve cognitive behaviour
> >>>>>>>>>>
> >>>>>>>>>> With To date inclusive, banker needs to make sure that it
> >> gives
> >>>>> all
> >>>>>>>>>> transaction records for 23rd October 2015 (To date) at the
> >>>> finest
> >>>>>>>>>> granularity available with the bank. This will satisfy
> >> users
> >>>>>>>> requirement
> >>>>>>>>>> without any element of surprise.
> >>>>>>>>>>
> >>>>>>>>>> If the finest granularity is millisecond. It will have a
> >> higher
> >>>>>> limit
> >>>>>>>> to
> >>>>>>>>> it
> >>>>>>>>>> which is 999 in most of the systems. Banker will fetch all
> >>>>>> partitions
> >>>>>>>>> until
> >>>>>>>>>> higher limit of finest granularity for 23rd October and
> >> return
> >>>>> the
> >>>>>>>>>> statement to user.
> >>>>>>>>>>
> >>>>>>>>>> If there was a leap second/millisecond, it is upto the
> >> banker
> >>>> to
> >>>>>>> handle
> >>>>>>>>> and
> >>>>>>>>>> give data for it. When a user wants statement from 11th
> >> October
> >>>>> to
> >>>>>>> 23rd
> >>>>>>>>>> October, user should not worry about the leap
> >>>> second/millisecond
> >>>>>>> which
> >>>>>>>>>> happened on any of the days from 11th October to 23rd
> >> October
> >>>>>>>>> (inclusive).
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Himanshu
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
> >>>>>>> amareshwari@gmail.com <javascript:;>
> >>>>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I prefer time-range end date to be exclusive of the end
> >> date,
> >>>>>>> because
> >>>>>>>>> of
> >>>>>>>>>>> the following reasons:
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Giving a month range end as start of next month is
> >> easy,
> >>>> as
> >>>>>> you
> >>>>>>>> need
> >>>>>>>>>> not
> >>>>>>>>>>> remember and get confused with number of days in month :
> >> 30,
> >>>>> 31,
> >>>>>>>> 28/29
> >>>>>>>>>> (in
> >>>>>>>>>>> february leap year) - wont miss ending days because of
> >>>>> mistakes.
> >>>>>>>>>>>
> >>>>>>>>>>> 2. When you are giving a any range, you need not worry
> >> about
> >>>>> what
> >>>>>>>>>>> granularity should the time be specified.
> >>>>>>>>>>> For ex, for including time upto 2015-08-01 05th.
> >>>>>>>>>>>
> >>>>>>>>>>> Should user specify the following with inclusive ranges?
> >>>>>>>>>>>> 2015-08-01-04 (would be correct, assuming whole of 4 th
> >>>> hour
> >>>>>> will
> >>>>>>>> be
> >>>>>>>>>>> included)
> >>>>>>>>>>>> 2015-08-01-04-59 (would be correct, but what if there
> >> is
> >>>> leap
> >>>>>>>> second
> >>>>>>>>>> :p )
> >>>>>>>>>>>> 2015-08-01-04-59-000 (wrong, would include only 000th
> >> milli
> >>>>>>> second
> >>>>>>>>> and
> >>>>>>>>>>> not others)
> >>>>>>>>>>>> 2015-08-01-04-59-999 (not sure if we will have leap
> >> milli
> >>>>>>> seconds)
> >>>>>>>>>>>
> >>>>>>>>>>> Looking forward to see other arguments on making
> >> timerange
> >>>> end
> >>>>>> date
> >>>>>>>>>>> inclusive.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> Amareshwari
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
> >>>>>>> prongs@apache.org <javascript:;>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Currently the time_range_in clause is inclusive of the
> >>>> start
> >>>>>> date
> >>>>>>>> and
> >>>>>>>>>>>> exclusive of the end date. i.e.
> >>>> time_range_in(time_dimension,
> >>>>>>>> date1,
> >>>>>>>>>>> date2)
> >>>>>>>>>>>> translates to date1<=time_dimension<date2
> >>>>>>>>>>>>
> >>>>>>>>>>>> Recently there have been multiple discussions regarding
> >>>> this
> >>>>>>>> approach
> >>>>>>>>>> in
> >>>>>>>>>>>> our internal forums at InMobi. There have been
> >> arguments
> >>>> that
> >>>>>> as
> >>>>>>> an
> >>>>>>>>>>>> analytics system, it should accept end date inclusive.
> >>>>> Starting
> >>>>>>>> this
> >>>>>>>>>>> thread
> >>>>>>>>>>>> to know opinions of other people regarding both
> >> approaches.
> >>>>> I'm
> >>>>>>>>>> defining
> >>>>>>>>>>>> and briefly discussing both approaches below:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Approach 1: End date exclusive.
> >>>>>>>>>>>> Disclaimer: I personally prefer this approach and would
> >>>>>>> shamelessly
> >>>>>>>>>> argue
> >>>>>>>>>>>> in its favor. I'll give links to some of the articles
> >> that
> >>>> I
> >>>>>> very
> >>>>>>>>> much
> >>>>>>>>>>>> agree with (no need to reinvent the wheel of argument)
> >>>>>>>>>>>>
> >>>>>>>>>>>>   1. Views of Anders Kaseorg, MIT PhD student in CS;
> >>>>> Cofounder
> >>>>>>> of
> >>>>>>>>>>> Ksplice,
> >>>>>>>>>>>>   Inc
> >>>>>>>>>>>>   <
> >>
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> >>>>>>>>>>>>   2. Views of Edgar Dijkstra, Creator of Dijkstra's
> >>>>> algorithm
> >>>>>>>>>>>>   <
> >> http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> >>>>>>>>>>>>   3. Both arguments are for generic ranges. I believe
> >> time
> >>>>>> range
> >>>>>>>> is
> >>>>>>>>>>> also a
> >>>>>>>>>>>>   range and don't see why we need to handle that as a
> >>>>> special
> >>>>>>>> case.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Approach 2: End date inclusive.
> >>>>>>>>>>>>
> >>>>>>>>>>>>   1. Lens query language is a bit like sql and the
> >> BETWEEN
> >>>>>>> clause
> >>>>>>>> in
> >>>>>>>>>> sql
> >>>>>>>>>>>>   is inclusive of both start and end date.
> >>>>>>>>>>>>   2. Most analytic systems --- including google
> >> analytics
> >>>>> ---
> >>>>>>>> follow
> >>>>>>>>>>> this
> >>>>>>>>>>>>   convention
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm posting the question on a bigger forum
> >>>>>>>>>>>> <
> >>
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> >>>>>>>>>>>> as well in hopes of getting more opinions. I request
> >>>> everyone
> >>>>>> to
> >>>>>>>> post
> >>>>>>>>>>> their
> >>>>>>>>>>>> views here regarding both approaches.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards
> >>>>>>>>>>
> >>>>>>>>>> --
> >> _____________________________________________________________
> >>>>>>>>>> The information contained in this communication is intended
> >>>>> solely
> >>>>>>> for
> >>>>>>>>> the
> >>>>>>>>>> use of the individual or entity to whom it is addressed and
> >>>>> others
> >>>>>>>>>> authorized to receive it. It may contain confidential or
> >>>> legally
> >>>>>>>>> privileged
> >>>>>>>>>> information. If you are not the intended recipient you are
> >>>> hereby
> >>>>>>>>> notified
> >>>>>>>>>> that any disclosure, copying, distribution or taking any
> >> action
> >>>>> in
> >>>>>>>>> reliance
> >>>>>>>>>> on the contents of this information is strictly prohibited
> >> and
> >>>>> may
> >>>>>> be
> >>>>>>>>>> unlawful. If you have received this communication in error,
> >>>>> please
> >>>>>>>> notify
> >>>>>>>>>> us immediately by responding to this email and then delete
> >> it
> >>>>> from
> >>>>>>> your
> >>>>>>>>>> system. The firm is neither liable for the proper and
> >> complete
> >>>>>>>>> transmission
> >>>>>>>>>> of the information contained in this communication nor for
> >> any
> >>>>>> delay
> >>>>>>> in
> >>>>>>>>> its
> >>>>>>>>>> receipt.
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> _____________________________________________________________
> >>>>>>>> The information contained in this communication is intended
> >> solely
> >>>>> for
> >>>>>>> the
> >>>>>>>> use of the individual or entity to whom it is addressed and
> >> others
> >>>>>>>> authorized to receive it. It may contain confidential or
> >> legally
> >>>>>>> privileged
> >>>>>>>> information. If you are not the intended recipient you are
> >> hereby
> >>>>>>> notified
> >>>>>>>> that any disclosure, copying, distribution or taking any
> >> action in
> >>>>>>> reliance
> >>>>>>>> on the contents of this information is strictly prohibited and
> >> may
> >>>> be
> >>>>>>>> unlawful. If you have received this communication in error,
> >> please
> >>>>>> notify
> >>>>>>>> us immediately by responding to this email and then delete it
> >> from
> >>>>> your
> >>>>>>>> system. The firm is neither liable for the proper and complete
> >>>>>>> transmission
> >>>>>>>> of the information contained in this communication nor for any
> >>>> delay
> >>>>> in
> >>>>>>> its
> >>>>>>>> receipt.
> >>>>>>
> >>>>>> --
> >>>>>> _____________________________________________________________
> >>>>>> The information contained in this communication is intended solely
> >> for
> >>>>> the
> >>>>>> use of the individual or entity to whom it is addressed and others
> >>>>>> authorized to receive it. It may contain confidential or legally
> >>>>> privileged
> >>>>>> information. If you are not the intended recipient you are hereby
> >>>>> notified
> >>>>>> that any disclosure, copying, distribution or taking any action in
> >>>>> reliance
> >>>>>> on the contents of this information is strictly prohibited and may
> >> be
> >>>>>> unlawful. If you have received this communication in error, please
> >>>> notify
> >>>>>> us immediately by responding to this email and then delete it from
> >> your
> >>>>>> system. The firm is neither liable for the proper and complete
> >>>>> transmission
> >>>>>> of the information contained in this communication nor for any
> >> delay in
> >>>>> its
> >>>>>> receipt.
> >>>>
> >>>> --
> >>>> _____________________________________________________________
> >>>> The information contained in this communication is intended solely for
> >> the
> >>>> use of the individual or entity to whom it is addressed and others
> >>>> authorized to receive it. It may contain confidential or legally
> >> privileged
> >>>> information. If you are not the intended recipient you are hereby
> >> notified
> >>>> that any disclosure, copying, distribution or taking any action in
> >> reliance
> >>>> on the contents of this information is strictly prohibited and may be
> >>>> unlawful. If you have received this communication in error, please
> >> notify
> >>>> us immediately by responding to this email and then delete it from
> your
> >>>> system. The firm is neither liable for the proper and complete
> >> transmission
> >>>> of the information contained in this communication nor for any delay
> >> in its
> >>>> receipt.
> >>>
> >>> --
> >>> _____________________________________________________________
> >>> The information contained in this communication is intended solely for
> >> the
> >>> use of the individual or entity to whom it is addressed and others
> >>> authorized to receive it. It may contain confidential or legally
> >> privileged
> >>> information. If you are not the intended recipient you are hereby
> >> notified
> >>> that any disclosure, copying, distribution or taking any action in
> >> reliance
> >>> on the contents of this information is strictly prohibited and may be
> >>> unlawful. If you have received this communication in error, please
> notify
> >>> us immediately by responding to this email and then delete it from your
> >>> system. The firm is neither liable for the proper and complete
> >> transmission
> >>> of the information contained in this communication nor for any delay in
> >> its
> >>> receipt.
> >
> > --
> > _____________________________________________________________
> > The information contained in this communication is intended solely for
> the
> > use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally
> privileged
> > information. If you are not the intended recipient you are hereby
> notified
> > that any disclosure, copying, distribution or taking any action in
> reliance
> > on the contents of this information is strictly prohibited and may be
> > unlawful. If you have received this communication in error, please notify
> > us immediately by responding to this email and then delete it from your
> > system. The firm is neither liable for the proper and complete
> transmission
> > of the information contained in this communication nor for any delay in
> its
> > receipt.
>

Re: [DISCUSS] Time Ranges in Lens

Posted by Srikanth Sundarrajan <sr...@hotmail.com>.
>> To my knowledge such a scheme isn't in the ANSI SQL standard

Just to clarify, meant a scheme to query using a higher grain when data in facts are of lower grain

Sent from my iPhone

> On 08-Aug-2015, at 9:34 pm, "Srikanth Sundarrajan" <sr...@hotmail.com> wrote:
> 
> What you have done there is to extract year and have done numerical range query. To my knowledge such a scheme isn't in the ANSI SQL standard
> 
> Sent from my iPhone
> 
> On 08-Aug-2015, at 6:22 pm, "Himanshu Gahlaut" <hi...@inmobi.com> wrote:
> 
>>>>>> The reason for me calling it unique is that the grain supported in the
>> range can be different from the time grain of the data itself (for ex, one
>> can query data by providing the year while the actual data in the facts may
>> be in seconds granularity)
>> 
>> [Himanshu]: Support for this behaviour already exists in other databases.
>> E.g. PostgresSQL allows to query data by providing year while the actual
>> data in the facts is in seconds granularity. See below.
>> 
>> himanshu.gahlaut=# select * from test_table;
>> 
>>       time         | userid | requests
>> 
>> ---------------------+--------+----------
>> 
>> 2005-10-19 10:23:*54* | user1  |     1000
>> 
>> 2007-04-15 09:01:*05* | user1  |     2000
>> 
>> (2 rows)
>> 
>> 
>> himanshu.gahlaut=# select sum(requests) as total_requests from test_table
>> where extract(year from time) BETWEEN '*2005*' and '*2007*';
>> 
>> total_requests
>> 
>> ----------------
>> 
>>          3000
>> 
>> (1 row)
>> 
>> 
>> *BETWEEN clause gave data for 2007 as well. From and To are both inclusive
>> here.*
>> 
>> 
>>>>>> If there is an inclination to include inclusive time ranges, the way
>> forward to me seems like doing away with supporting arbitrary time grain in
>> the query and always accept time in query to the precise second. Clearly
>> this would be withdrawing a powerful feature from lens.
>> 
>> [Himanshu]: When PostgresSQL can support an arbitrary time grain in the
>> query (as shown above) with data stored in facts at seconds granularity,
>> I'm not sure why Lens will not be able to handle it in implementation. I
>> already gave a solution in this email thread by which time ranges  can be
>> made inclusive in Lens without removing the support to give an arbitrary
>> time grain in the query. Do you see the solution breaking somewhere ?
>> 
>> 
>> Regards,
>> 
>> Himanshu
>> 
>> On Thu, Aug 6, 2015 at 9:46 AM, Srikanth Sundarrajan <sr...@hotmail.com>
>> wrote:
>> 
>>> Here is my 2 cents on this.
>>> 
>>> As long as there is an unambiguous articulation of what is
>>> included/excluded in the time range element of the Cube QL, there is no
>>> forcing function to change this, unless this breaks any well understood
>>> convention and is counter intuitive. Given the unique nature of the
>>> time_range parameter supported in Cube QL and what values it can assume,
>>> there doesn't seem to be risk of breaking convention here. The reason for
>>> me calling it unique is that the grain supported in the range can be
>>> different from the time grain of the data itself (for ex, one can query
>>> data by providing the year while the actual data in the facts may be in
>>> seconds granularity). The exclusive end date seems to emanate from having
>>> to support this mismatched time grains between data and user query and the
>>> system is having to expand the time grain provided by the user to match
>>> that of the data in the facts.
>>> 
>>> If there is an inclination to include inclusive time ranges, the way
>>> forward to me seems like doing away with supporting arbitrary time grain in
>>> the query and always accept time in query to the precise second. Clearly
>>> this would be withdrawing a powerful feature from lens, and should be
>>> considered only if the notion of exclusive end time seems very unintuitive
>>> for the users.
>>> 
>>> Regards
>>> Srikanth Sundarrajan
>>> 
>>>> Date: Wed, 5 Aug 2015 20:28:46 +0530
>>>> Subject: Re: [DISCUSS] Time Ranges in Lens
>>>> From: nitin.gupta@inmobi.com
>>>> To: dev@lens.incubator.apache.org
>>>> 
>>>> I believe Users seem to think of a time range as half open, a meeting
>>>> appointment of 1:00 pm - 2:00 pm means a one hour appointment. But with
>>>> date ranges, the expectation is to include the end date. A date range
>>>> specification of Jan 1 - Jan 31 would mean inclusion of Jan 31.
>>>> To be able to make a decision here, I believe we follow what existing
>>>> Analytics and Data Query systems do. That is to include the end time
>>>> specified as available with Google Analytics or Support for BETWEEN in
>>> SQL.
>>>> That may mean more complexity on the implementation but that's a hit we
>>>> will have to take.
>>>> 
>>>> Thanks,
>>>> Nitin
>>>> 
>>>> 
>>>> On Wed, Aug 5, 2015 at 7:06 PM, Himanshu Gahlaut <
>>>> himanshu.gahlaut@inmobi.com> wrote:
>>>> 
>>>>> Have given a real world use case of banking system where making to
>>>>> field exclusive makes the communication harder for user.
>>>>> 
>>>>> Please give a real world use case which makes things harder for user by
>>>>> making to field inclusive OR a real world use case which is impossible
>>> to
>>>>> implement by making to field inclusive. So far the points shared below
>>> for
>>>>> making to date exclusive are hard to understand and it is difficult to
>>>>> comprehend the problem which is coming in way of making to date
>>> inclusive.
>>>>> 
>>>>> On Wednesday, August 5, 2015, Rajat Khandelwal <rajatgupta59@gmail.com
>>>> 
>>>>> wrote:
>>>>> 
>>>>>> Time range consists of a start time and an end time. If start time
>>> and
>>>>> end
>>>>>> time are same, then we would say the ranges are equal.
>>>>>> 
>>>>>> The three ranges in my example are same by this definition of
>>> equality.
>>>>>> While approach 2 hinders that.
>>>>>> 
>>>>>> 
>>>>>> On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <
>>>>>> himanshu.gahlaut@inmobi.com <javascript:;>>
>>>>>> wrote:
>>>>>> 
>>>>>>> Why is this a problem ?
>>>>>>> 
>>>>>>> On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <
>>>>> rajatgupta59@gmail.com
>>>>>> <javascript:;>>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Problem with that is, that the meaning of range changes with
>>>>>> granularity.
>>>>>>>> While with approach 1, it's not the case.
>>>>>>>> 
>>>>>>>> A Range of 2015-01-01 to 2015-01-02 would mean 2 days,
>>> 2015-01-01-00
>>>>> to
>>>>>>>> 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
>>>>>>>> 2015-01-02-00-00 would mean 2 days and 1 minute, and
>>>>>> 2015-01-01-00-00-00
>>>>>>> to
>>>>>>>> 2015-01-02-00-00-00 would mean 2 days and 1 second.
>>>>>>>> 
>>>>>>>> While approach 1 has no such problem. A given range doesn't
>>> change
>>>>>> with a
>>>>>>>> change in granularity. Every one of these examples would mean 1
>>> day.
>>>>>>>> 
>>>>>>>> Here I've begun the argument with day granularity and went
>>> deeper.
>>>>> The
>>>>>>> same
>>>>>>>> argument can be started from a Yearly granularity and there the
>>>>>> different
>>>>>>>> would be more prominent.
>>>>>>>> 
>>>>>>>> Regards
>>>>>>>> 
>>>>>>>> On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
>>>>>>>> himanshu.gahlaut@inmobi.com <javascript:;>>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Say granularity is an ordered list of nodes where every node in
>>>>> list
>>>>>> is
>>>>>>>>> parent of its next node.
>>>>>>>>> 
>>>>>>>>> For a System X: all granularity could be (Year => Month =>
>>> Day).
>>>>>>>>> For System Y: all granularity could be (Year => Month => Day =>
>>>>> Hour
>>>>>> =>
>>>>>>>>> Minutes => Seconds => Milliseconds => Microseconds =>
>>> Picosecond =>
>>>>>>>>> Femtosecond => Attosecond)
>>>>>>>>> 
>>>>>>>>> As long as every node has a logic defined to compute the lower
>>>>> limit
>>>>>>> and
>>>>>>>>> upper limit value, implementation should be able to fetch all
>>> the
>>>>>> data
>>>>>>>>> under its parent node.
>>>>>>>>> 
>>>>>>>>> E.G:
>>>>>>>>> 
>>>>>>>>> To get all the data for 2015-10-31, implementation needs to
>>> convert
>>>>>>> this
>>>>>>>>> into 2015-10-31-LowerLimitHourNode to
>>> 2015-10-31-UpperLimitHourNode
>>>>>> to
>>>>>>>>> fetch data from hourly partitions, where lower limit hour = 00
>>> and
>>>>>>> upper
>>>>>>>>> limit hour = 23. Similarly any granularity can be handled at
>>>>>>>> implementation
>>>>>>>>> level.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I guess a banking system in real world would be worried about
>>> very
>>>>>> fine
>>>>>>>>> granularities of time.
>>>>>>>>> 
>>>>>>>>> Even if we assume banking system has day as the finest
>>>>> granularity. A
>>>>>>>> user
>>>>>>>>> can still ask for statement from the month of April, 2015 to
>>>>> October,
>>>>>>>> 2015
>>>>>>>>> without specifying the specific date fields. In this case bank
>>> will
>>>>>>> have
>>>>>>>> to
>>>>>>>>> execute the logic to fetch all transactions from April to
>>> October
>>>>>> with
>>>>>>>> both
>>>>>>>>> inclusive and also handling its finest granularity which is
>>> assumed
>>>>>> as
>>>>>>>> day.
>>>>>>>>> In this case a banking system would convert 2015-04 to 2015-10
>>> into
>>>>>>>>> 2015-04-01 to 2015-10-31 to fetch all data from daily
>>> partitions,
>>>>>> where
>>>>>>>> 01
>>>>>>>>> is the lower limit for day node in month of April and 31 is the
>>>>>> higher
>>>>>>>>> limit for day node in month of October.
>>>>>>>>> 
>>>>>>>>> If data is not available in daily partitions and bank's
>>> underlying
>>>>>>> system
>>>>>>>>> supports further fine time granularities, it will convert
>>> 2015-04
>>>>> to
>>>>>>>>> 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from
>>>>> hourly
>>>>>>>>> partitions.
>>>>>>>>> 
>>>>>>>>> Similarly if data is not available at hourly partitions, it
>>> will go
>>>>>> to
>>>>>>>> the
>>>>>>>>> next level. As long as an upper limit and lower limit is
>>> defined
>>>>> for
>>>>>>> each
>>>>>>>>> node, implementation can handle any time granularity.
>>>>>>>>> 
>>>>>>>>> Meta point is that in real world, when a user will communicate
>>> to
>>>>> get
>>>>>>>>> statement of April to October. His intentions will never be to
>>>>>> exclude
>>>>>>>>> October. Exclusion of October will be a surprise for him.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Himanshu
>>>>>>>>> 
>>>>>>>>> On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <
>>>>>> amareshwari@gmail.com <javascript:;>
>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> When a system has to understand time at all granularities,
>>> and
>>>>> user
>>>>>>> can
>>>>>>>>>> give time at all granularities, shouldn't we chose a
>>>>> specification
>>>>>>>> which
>>>>>>>>> is
>>>>>>>>>> easy and error prone.
>>>>>>>>>> 
>>>>>>>>>> The example of banking system is not worried about finer
>>>>>>> granularities
>>>>>>>> of
>>>>>>>>>> time.
>>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
>>>>>>>>>> himanshu.gahlaut@inmobi.com <javascript:;>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> This has two parts:
>>>>>>>>>>> 
>>>>>>>>>>> (1) Interfacing with user shall be cognitive. Something
>>> which
>>>>>> user
>>>>>>>> can
>>>>>>>>>>> relate to his natural thought process.
>>>>>>>>>>> 
>>>>>>>>>>> E.g.: In a banking system, if a user goes to a banker to
>>> get
>>>>> his
>>>>>>> bank
>>>>>>>>>>> statement. He will say "give me statement from 11th October
>>>>> 2015
>>>>>> to
>>>>>>>>> 23rd
>>>>>>>>>>> October 2015". This clearly communicates in simplified
>>> language
>>>>>>> that
>>>>>>>>> user
>>>>>>>>>>> is interested in bank statement from 11th October to 23rd
>>>>>> October.
>>>>>>> If
>>>>>>>>>>> banker now does not return data of 23rd October, it will
>>> be a
>>>>>>>> surprise
>>>>>>>>>> for
>>>>>>>>>>> user.
>>>>>>>>>>> 
>>>>>>>>>>> Banker can inform every user that To date is exclusive, so
>>> if
>>>>> you
>>>>>>>> need
>>>>>>>>>>> statement till 23rd October you should come and tell me
>>> "give
>>>>> me
>>>>>>>>>> statement
>>>>>>>>>>> from 11th October to 24th October". I believe a user will
>>>>>>> immediately
>>>>>>>>>> reply
>>>>>>>>>>> => If I need statement only till 23rd October, why shall I
>>>>>>>> communicate
>>>>>>>>>> 11th
>>>>>>>>>>> October to 24th October. If you still insist, next time the
>>>>> user
>>>>>>> will
>>>>>>>>>> come
>>>>>>>>>>> and say "give me statement from 11th October to 24th
>>> October
>>>>> with
>>>>>>>> 24th
>>>>>>>>>>> exclusive". System just made it harder for user to
>>> communicate
>>>>>> with
>>>>>>>>>> banker.
>>>>>>>>>>> 
>>>>>>>>>>> (2) Implementation to achieve cognitive behaviour
>>>>>>>>>>> 
>>>>>>>>>>> With To date inclusive, banker needs to make sure that it
>>> gives
>>>>>> all
>>>>>>>>>>> transaction records for 23rd October 2015 (To date) at the
>>>>> finest
>>>>>>>>>>> granularity available with the bank. This will satisfy
>>> users
>>>>>>>>> requirement
>>>>>>>>>>> without any element of surprise.
>>>>>>>>>>> 
>>>>>>>>>>> If the finest granularity is millisecond. It will have a
>>> higher
>>>>>>> limit
>>>>>>>>> to
>>>>>>>>>> it
>>>>>>>>>>> which is 999 in most of the systems. Banker will fetch all
>>>>>>> partitions
>>>>>>>>>> until
>>>>>>>>>>> higher limit of finest granularity for 23rd October and
>>> return
>>>>>> the
>>>>>>>>>>> statement to user.
>>>>>>>>>>> 
>>>>>>>>>>> If there was a leap second/millisecond, it is upto the
>>> banker
>>>>> to
>>>>>>>> handle
>>>>>>>>>> and
>>>>>>>>>>> give data for it. When a user wants statement from 11th
>>> October
>>>>>> to
>>>>>>>> 23rd
>>>>>>>>>>> October, user should not worry about the leap
>>>>> second/millisecond
>>>>>>>> which
>>>>>>>>>>> happened on any of the days from 11th October to 23rd
>>> October
>>>>>>>>>> (inclusive).
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Himanshu
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
>>>>>>>> amareshwari@gmail.com <javascript:;>
>>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I prefer time-range end date to be exclusive of the end
>>> date,
>>>>>>>> because
>>>>>>>>>> of
>>>>>>>>>>>> the following reasons:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Giving a month range end as start of next month is
>>> easy,
>>>>> as
>>>>>>> you
>>>>>>>>> need
>>>>>>>>>>> not
>>>>>>>>>>>> remember and get confused with number of days in month :
>>> 30,
>>>>>> 31,
>>>>>>>>> 28/29
>>>>>>>>>>> (in
>>>>>>>>>>>> february leap year) - wont miss ending days because of
>>>>>> mistakes.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. When you are giving a any range, you need not worry
>>> about
>>>>>> what
>>>>>>>>>>>> granularity should the time be specified.
>>>>>>>>>>>> For ex, for including time upto 2015-08-01 05th.
>>>>>>>>>>>> 
>>>>>>>>>>>> Should user specify the following with inclusive ranges?
>>>>>>>>>>>>> 2015-08-01-04 (would be correct, assuming whole of 4 th
>>>>> hour
>>>>>>> will
>>>>>>>>> be
>>>>>>>>>>>> included)
>>>>>>>>>>>>> 2015-08-01-04-59 (would be correct, but what if there
>>> is
>>>>> leap
>>>>>>>>> second
>>>>>>>>>>> :p )
>>>>>>>>>>>>> 2015-08-01-04-59-000 (wrong, would include only 000th
>>> milli
>>>>>>>> second
>>>>>>>>>> and
>>>>>>>>>>>> not others)
>>>>>>>>>>>>> 2015-08-01-04-59-999 (not sure if we will have leap
>>> milli
>>>>>>>> seconds)
>>>>>>>>>>>> 
>>>>>>>>>>>> Looking forward to see other arguments on making
>>> timerange
>>>>> end
>>>>>>> date
>>>>>>>>>>>> inclusive.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Amareshwari
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
>>>>>>>> prongs@apache.org <javascript:;>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Currently the time_range_in clause is inclusive of the
>>>>> start
>>>>>>> date
>>>>>>>>> and
>>>>>>>>>>>>> exclusive of the end date. i.e.
>>>>> time_range_in(time_dimension,
>>>>>>>>> date1,
>>>>>>>>>>>> date2)
>>>>>>>>>>>>> translates to date1<=time_dimension<date2
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Recently there have been multiple discussions regarding
>>>>> this
>>>>>>>>> approach
>>>>>>>>>>> in
>>>>>>>>>>>>> our internal forums at InMobi. There have been
>>> arguments
>>>>> that
>>>>>>> as
>>>>>>>> an
>>>>>>>>>>>>> analytics system, it should accept end date inclusive.
>>>>>> Starting
>>>>>>>>> this
>>>>>>>>>>>> thread
>>>>>>>>>>>>> to know opinions of other people regarding both
>>> approaches.
>>>>>> I'm
>>>>>>>>>>> defining
>>>>>>>>>>>>> and briefly discussing both approaches below:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Approach 1: End date exclusive.
>>>>>>>>>>>>> Disclaimer: I personally prefer this approach and would
>>>>>>>> shamelessly
>>>>>>>>>>> argue
>>>>>>>>>>>>> in its favor. I'll give links to some of the articles
>>> that
>>>>> I
>>>>>>> very
>>>>>>>>>> much
>>>>>>>>>>>>> agree with (no need to reinvent the wheel of argument)
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  1. Views of Anders Kaseorg, MIT PhD student in CS;
>>>>>> Cofounder
>>>>>>>> of
>>>>>>>>>>>> Ksplice,
>>>>>>>>>>>>>  Inc
>>>>>>>>>>>>>  <
>>> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
>>>>>>>>>>>>>  2. Views of Edgar Dijkstra, Creator of Dijkstra's
>>>>>> algorithm
>>>>>>>>>>>>>  <
>>> http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
>>>>>>>>>>>>>  3. Both arguments are for generic ranges. I believe
>>> time
>>>>>>> range
>>>>>>>>> is
>>>>>>>>>>>> also a
>>>>>>>>>>>>>  range and don't see why we need to handle that as a
>>>>>> special
>>>>>>>>> case.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Approach 2: End date inclusive.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  1. Lens query language is a bit like sql and the
>>> BETWEEN
>>>>>>>> clause
>>>>>>>>> in
>>>>>>>>>>> sql
>>>>>>>>>>>>>  is inclusive of both start and end date.
>>>>>>>>>>>>>  2. Most analytic systems --- including google
>>> analytics
>>>>>> ---
>>>>>>>>> follow
>>>>>>>>>>>> this
>>>>>>>>>>>>>  convention
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm posting the question on a bigger forum
>>>>>>>>>>>>> <
>>> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
>>>>>>>>>>>>> as well in hopes of getting more opinions. I request
>>>>> everyone
>>>>>>> to
>>>>>>>>> post
>>>>>>>>>>>> their
>>>>>>>>>>>>> views here regarding both approaches.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards
>>>>>>>>>>> 
>>>>>>>>>>> --
>>> _____________________________________________________________
>>>>>>>>>>> The information contained in this communication is intended
>>>>>> solely
>>>>>>>> for
>>>>>>>>>> the
>>>>>>>>>>> use of the individual or entity to whom it is addressed and
>>>>>> others
>>>>>>>>>>> authorized to receive it. It may contain confidential or
>>>>> legally
>>>>>>>>>> privileged
>>>>>>>>>>> information. If you are not the intended recipient you are
>>>>> hereby
>>>>>>>>>> notified
>>>>>>>>>>> that any disclosure, copying, distribution or taking any
>>> action
>>>>>> in
>>>>>>>>>> reliance
>>>>>>>>>>> on the contents of this information is strictly prohibited
>>> and
>>>>>> may
>>>>>>> be
>>>>>>>>>>> unlawful. If you have received this communication in error,
>>>>>> please
>>>>>>>>> notify
>>>>>>>>>>> us immediately by responding to this email and then delete
>>> it
>>>>>> from
>>>>>>>> your
>>>>>>>>>>> system. The firm is neither liable for the proper and
>>> complete
>>>>>>>>>> transmission
>>>>>>>>>>> of the information contained in this communication nor for
>>> any
>>>>>>> delay
>>>>>>>> in
>>>>>>>>>> its
>>>>>>>>>>> receipt.
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> _____________________________________________________________
>>>>>>>>> The information contained in this communication is intended
>>> solely
>>>>>> for
>>>>>>>> the
>>>>>>>>> use of the individual or entity to whom it is addressed and
>>> others
>>>>>>>>> authorized to receive it. It may contain confidential or
>>> legally
>>>>>>>> privileged
>>>>>>>>> information. If you are not the intended recipient you are
>>> hereby
>>>>>>>> notified
>>>>>>>>> that any disclosure, copying, distribution or taking any
>>> action in
>>>>>>>> reliance
>>>>>>>>> on the contents of this information is strictly prohibited and
>>> may
>>>>> be
>>>>>>>>> unlawful. If you have received this communication in error,
>>> please
>>>>>>> notify
>>>>>>>>> us immediately by responding to this email and then delete it
>>> from
>>>>>> your
>>>>>>>>> system. The firm is neither liable for the proper and complete
>>>>>>>> transmission
>>>>>>>>> of the information contained in this communication nor for any
>>>>> delay
>>>>>> in
>>>>>>>> its
>>>>>>>>> receipt.
>>>>>>> 
>>>>>>> --
>>>>>>> _____________________________________________________________
>>>>>>> The information contained in this communication is intended solely
>>> for
>>>>>> the
>>>>>>> use of the individual or entity to whom it is addressed and others
>>>>>>> authorized to receive it. It may contain confidential or legally
>>>>>> privileged
>>>>>>> information. If you are not the intended recipient you are hereby
>>>>>> notified
>>>>>>> that any disclosure, copying, distribution or taking any action in
>>>>>> reliance
>>>>>>> on the contents of this information is strictly prohibited and may
>>> be
>>>>>>> unlawful. If you have received this communication in error, please
>>>>> notify
>>>>>>> us immediately by responding to this email and then delete it from
>>> your
>>>>>>> system. The firm is neither liable for the proper and complete
>>>>>> transmission
>>>>>>> of the information contained in this communication nor for any
>>> delay in
>>>>>> its
>>>>>>> receipt.
>>>>> 
>>>>> --
>>>>> _____________________________________________________________
>>>>> The information contained in this communication is intended solely for
>>> the
>>>>> use of the individual or entity to whom it is addressed and others
>>>>> authorized to receive it. It may contain confidential or legally
>>> privileged
>>>>> information. If you are not the intended recipient you are hereby
>>> notified
>>>>> that any disclosure, copying, distribution or taking any action in
>>> reliance
>>>>> on the contents of this information is strictly prohibited and may be
>>>>> unlawful. If you have received this communication in error, please
>>> notify
>>>>> us immediately by responding to this email and then delete it from your
>>>>> system. The firm is neither liable for the proper and complete
>>> transmission
>>>>> of the information contained in this communication nor for any delay
>>> in its
>>>>> receipt.
>>>> 
>>>> --
>>>> _____________________________________________________________
>>>> The information contained in this communication is intended solely for
>>> the
>>>> use of the individual or entity to whom it is addressed and others
>>>> authorized to receive it. It may contain confidential or legally
>>> privileged
>>>> information. If you are not the intended recipient you are hereby
>>> notified
>>>> that any disclosure, copying, distribution or taking any action in
>>> reliance
>>>> on the contents of this information is strictly prohibited and may be
>>>> unlawful. If you have received this communication in error, please notify
>>>> us immediately by responding to this email and then delete it from your
>>>> system. The firm is neither liable for the proper and complete
>>> transmission
>>>> of the information contained in this communication nor for any delay in
>>> its
>>>> receipt.
>> 
>> -- 
>> _____________________________________________________________
>> The information contained in this communication is intended solely for the 
>> use of the individual or entity to whom it is addressed and others 
>> authorized to receive it. It may contain confidential or legally privileged 
>> information. If you are not the intended recipient you are hereby notified 
>> that any disclosure, copying, distribution or taking any action in reliance 
>> on the contents of this information is strictly prohibited and may be 
>> unlawful. If you have received this communication in error, please notify 
>> us immediately by responding to this email and then delete it from your 
>> system. The firm is neither liable for the proper and complete transmission 
>> of the information contained in this communication nor for any delay in its 
>> receipt.

Re: [DISCUSS] Time Ranges in Lens

Posted by Srikanth Sundarrajan <sr...@hotmail.com>.
What you have done there is to extract year and have done numerical range query. To my knowledge such a scheme isn't in the ANSI SQL standard

Sent from my iPhone

On 08-Aug-2015, at 6:22 pm, "Himanshu Gahlaut" <hi...@inmobi.com> wrote:

>>>>> The reason for me calling it unique is that the grain supported in the
> range can be different from the time grain of the data itself (for ex, one
> can query data by providing the year while the actual data in the facts may
> be in seconds granularity)
> 
> [Himanshu]: Support for this behaviour already exists in other databases.
> E.g. PostgresSQL allows to query data by providing year while the actual
> data in the facts is in seconds granularity. See below.
> 
> himanshu.gahlaut=# select * from test_table;
> 
>        time         | userid | requests
> 
> ---------------------+--------+----------
> 
> 2005-10-19 10:23:*54* | user1  |     1000
> 
> 2007-04-15 09:01:*05* | user1  |     2000
> 
> (2 rows)
> 
> 
> himanshu.gahlaut=# select sum(requests) as total_requests from test_table
> where extract(year from time) BETWEEN '*2005*' and '*2007*';
> 
> total_requests
> 
> ----------------
> 
>           3000
> 
> (1 row)
> 
> 
> *BETWEEN clause gave data for 2007 as well. From and To are both inclusive
> here.*
> 
> 
>>>>> If there is an inclination to include inclusive time ranges, the way
> forward to me seems like doing away with supporting arbitrary time grain in
> the query and always accept time in query to the precise second. Clearly
> this would be withdrawing a powerful feature from lens.
> 
> [Himanshu]: When PostgresSQL can support an arbitrary time grain in the
> query (as shown above) with data stored in facts at seconds granularity,
> I'm not sure why Lens will not be able to handle it in implementation. I
> already gave a solution in this email thread by which time ranges  can be
> made inclusive in Lens without removing the support to give an arbitrary
> time grain in the query. Do you see the solution breaking somewhere ?
> 
> 
> Regards,
> 
> Himanshu
> 
> On Thu, Aug 6, 2015 at 9:46 AM, Srikanth Sundarrajan <sr...@hotmail.com>
> wrote:
> 
>> Here is my 2 cents on this.
>> 
>> As long as there is an unambiguous articulation of what is
>> included/excluded in the time range element of the Cube QL, there is no
>> forcing function to change this, unless this breaks any well understood
>> convention and is counter intuitive. Given the unique nature of the
>> time_range parameter supported in Cube QL and what values it can assume,
>> there doesn't seem to be risk of breaking convention here. The reason for
>> me calling it unique is that the grain supported in the range can be
>> different from the time grain of the data itself (for ex, one can query
>> data by providing the year while the actual data in the facts may be in
>> seconds granularity). The exclusive end date seems to emanate from having
>> to support this mismatched time grains between data and user query and the
>> system is having to expand the time grain provided by the user to match
>> that of the data in the facts.
>> 
>> If there is an inclination to include inclusive time ranges, the way
>> forward to me seems like doing away with supporting arbitrary time grain in
>> the query and always accept time in query to the precise second. Clearly
>> this would be withdrawing a powerful feature from lens, and should be
>> considered only if the notion of exclusive end time seems very unintuitive
>> for the users.
>> 
>> Regards
>> Srikanth Sundarrajan
>> 
>>> Date: Wed, 5 Aug 2015 20:28:46 +0530
>>> Subject: Re: [DISCUSS] Time Ranges in Lens
>>> From: nitin.gupta@inmobi.com
>>> To: dev@lens.incubator.apache.org
>>> 
>>> I believe Users seem to think of a time range as half open, a meeting
>>> appointment of 1:00 pm - 2:00 pm means a one hour appointment. But with
>>> date ranges, the expectation is to include the end date. A date range
>>> specification of Jan 1 - Jan 31 would mean inclusion of Jan 31.
>>> To be able to make a decision here, I believe we follow what existing
>>> Analytics and Data Query systems do. That is to include the end time
>>> specified as available with Google Analytics or Support for BETWEEN in
>> SQL.
>>> That may mean more complexity on the implementation but that's a hit we
>>> will have to take.
>>> 
>>> Thanks,
>>> Nitin
>>> 
>>> 
>>> On Wed, Aug 5, 2015 at 7:06 PM, Himanshu Gahlaut <
>>> himanshu.gahlaut@inmobi.com> wrote:
>>> 
>>>> Have given a real world use case of banking system where making to
>>>> field exclusive makes the communication harder for user.
>>>> 
>>>> Please give a real world use case which makes things harder for user by
>>>> making to field inclusive OR a real world use case which is impossible
>> to
>>>> implement by making to field inclusive. So far the points shared below
>> for
>>>> making to date exclusive are hard to understand and it is difficult to
>>>> comprehend the problem which is coming in way of making to date
>> inclusive.
>>>> 
>>>> On Wednesday, August 5, 2015, Rajat Khandelwal <rajatgupta59@gmail.com
>>> 
>>>> wrote:
>>>> 
>>>>> Time range consists of a start time and an end time. If start time
>> and
>>>> end
>>>>> time are same, then we would say the ranges are equal.
>>>>> 
>>>>> The three ranges in my example are same by this definition of
>> equality.
>>>>> While approach 2 hinders that.
>>>>> 
>>>>> 
>>>>> On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <
>>>>> himanshu.gahlaut@inmobi.com <javascript:;>>
>>>>> wrote:
>>>>> 
>>>>>> Why is this a problem ?
>>>>>> 
>>>>>> On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <
>>>> rajatgupta59@gmail.com
>>>>> <javascript:;>>
>>>>>> wrote:
>>>>>> 
>>>>>>> Problem with that is, that the meaning of range changes with
>>>>> granularity.
>>>>>>> While with approach 1, it's not the case.
>>>>>>> 
>>>>>>> A Range of 2015-01-01 to 2015-01-02 would mean 2 days,
>> 2015-01-01-00
>>>> to
>>>>>>> 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
>>>>>>> 2015-01-02-00-00 would mean 2 days and 1 minute, and
>>>>> 2015-01-01-00-00-00
>>>>>> to
>>>>>>> 2015-01-02-00-00-00 would mean 2 days and 1 second.
>>>>>>> 
>>>>>>> While approach 1 has no such problem. A given range doesn't
>> change
>>>>> with a
>>>>>>> change in granularity. Every one of these examples would mean 1
>> day.
>>>>>>> 
>>>>>>> Here I've begun the argument with day granularity and went
>> deeper.
>>>> The
>>>>>> same
>>>>>>> argument can be started from a Yearly granularity and there the
>>>>> different
>>>>>>> would be more prominent.
>>>>>>> 
>>>>>>> Regards
>>>>>>> 
>>>>>>> On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
>>>>>>> himanshu.gahlaut@inmobi.com <javascript:;>>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Say granularity is an ordered list of nodes where every node in
>>>> list
>>>>> is
>>>>>>>> parent of its next node.
>>>>>>>> 
>>>>>>>> For a System X: all granularity could be (Year => Month =>
>> Day).
>>>>>>>> For System Y: all granularity could be (Year => Month => Day =>
>>>> Hour
>>>>> =>
>>>>>>>> Minutes => Seconds => Milliseconds => Microseconds =>
>> Picosecond =>
>>>>>>>> Femtosecond => Attosecond)
>>>>>>>> 
>>>>>>>> As long as every node has a logic defined to compute the lower
>>>> limit
>>>>>> and
>>>>>>>> upper limit value, implementation should be able to fetch all
>> the
>>>>> data
>>>>>>>> under its parent node.
>>>>>>>> 
>>>>>>>> E.G:
>>>>>>>> 
>>>>>>>> To get all the data for 2015-10-31, implementation needs to
>> convert
>>>>>> this
>>>>>>>> into 2015-10-31-LowerLimitHourNode to
>> 2015-10-31-UpperLimitHourNode
>>>>> to
>>>>>>>> fetch data from hourly partitions, where lower limit hour = 00
>> and
>>>>>> upper
>>>>>>>> limit hour = 23. Similarly any granularity can be handled at
>>>>>>> implementation
>>>>>>>> level.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I guess a banking system in real world would be worried about
>> very
>>>>> fine
>>>>>>>> granularities of time.
>>>>>>>> 
>>>>>>>> Even if we assume banking system has day as the finest
>>>> granularity. A
>>>>>>> user
>>>>>>>> can still ask for statement from the month of April, 2015 to
>>>> October,
>>>>>>> 2015
>>>>>>>> without specifying the specific date fields. In this case bank
>> will
>>>>>> have
>>>>>>> to
>>>>>>>> execute the logic to fetch all transactions from April to
>> October
>>>>> with
>>>>>>> both
>>>>>>>> inclusive and also handling its finest granularity which is
>> assumed
>>>>> as
>>>>>>> day.
>>>>>>>> In this case a banking system would convert 2015-04 to 2015-10
>> into
>>>>>>>> 2015-04-01 to 2015-10-31 to fetch all data from daily
>> partitions,
>>>>> where
>>>>>>> 01
>>>>>>>> is the lower limit for day node in month of April and 31 is the
>>>>> higher
>>>>>>>> limit for day node in month of October.
>>>>>>>> 
>>>>>>>> If data is not available in daily partitions and bank's
>> underlying
>>>>>> system
>>>>>>>> supports further fine time granularities, it will convert
>> 2015-04
>>>> to
>>>>>>>> 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from
>>>> hourly
>>>>>>>> partitions.
>>>>>>>> 
>>>>>>>> Similarly if data is not available at hourly partitions, it
>> will go
>>>>> to
>>>>>>> the
>>>>>>>> next level. As long as an upper limit and lower limit is
>> defined
>>>> for
>>>>>> each
>>>>>>>> node, implementation can handle any time granularity.
>>>>>>>> 
>>>>>>>> Meta point is that in real world, when a user will communicate
>> to
>>>> get
>>>>>>>> statement of April to October. His intentions will never be to
>>>>> exclude
>>>>>>>> October. Exclusion of October will be a surprise for him.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Himanshu
>>>>>>>> 
>>>>>>>> On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <
>>>>> amareshwari@gmail.com <javascript:;>
>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> When a system has to understand time at all granularities,
>> and
>>>> user
>>>>>> can
>>>>>>>>> give time at all granularities, shouldn't we chose a
>>>> specification
>>>>>>> which
>>>>>>>> is
>>>>>>>>> easy and error prone.
>>>>>>>>> 
>>>>>>>>> The example of banking system is not worried about finer
>>>>>> granularities
>>>>>>> of
>>>>>>>>> time.
>>>>>>>>> 
>>>>>>>>> On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
>>>>>>>>> himanshu.gahlaut@inmobi.com <javascript:;>> wrote:
>>>>>>>>> 
>>>>>>>>>> This has two parts:
>>>>>>>>>> 
>>>>>>>>>> (1) Interfacing with user shall be cognitive. Something
>> which
>>>>> user
>>>>>>> can
>>>>>>>>>> relate to his natural thought process.
>>>>>>>>>> 
>>>>>>>>>> E.g.: In a banking system, if a user goes to a banker to
>> get
>>>> his
>>>>>> bank
>>>>>>>>>> statement. He will say "give me statement from 11th October
>>>> 2015
>>>>> to
>>>>>>>> 23rd
>>>>>>>>>> October 2015". This clearly communicates in simplified
>> language
>>>>>> that
>>>>>>>> user
>>>>>>>>>> is interested in bank statement from 11th October to 23rd
>>>>> October.
>>>>>> If
>>>>>>>>>> banker now does not return data of 23rd October, it will
>> be a
>>>>>>> surprise
>>>>>>>>> for
>>>>>>>>>> user.
>>>>>>>>>> 
>>>>>>>>>> Banker can inform every user that To date is exclusive, so
>> if
>>>> you
>>>>>>> need
>>>>>>>>>> statement till 23rd October you should come and tell me
>> "give
>>>> me
>>>>>>>>> statement
>>>>>>>>>> from 11th October to 24th October". I believe a user will
>>>>>> immediately
>>>>>>>>> reply
>>>>>>>>>> => If I need statement only till 23rd October, why shall I
>>>>>>> communicate
>>>>>>>>> 11th
>>>>>>>>>> October to 24th October. If you still insist, next time the
>>>> user
>>>>>> will
>>>>>>>>> come
>>>>>>>>>> and say "give me statement from 11th October to 24th
>> October
>>>> with
>>>>>>> 24th
>>>>>>>>>> exclusive". System just made it harder for user to
>> communicate
>>>>> with
>>>>>>>>> banker.
>>>>>>>>>> 
>>>>>>>>>> (2) Implementation to achieve cognitive behaviour
>>>>>>>>>> 
>>>>>>>>>> With To date inclusive, banker needs to make sure that it
>> gives
>>>>> all
>>>>>>>>>> transaction records for 23rd October 2015 (To date) at the
>>>> finest
>>>>>>>>>> granularity available with the bank. This will satisfy
>> users
>>>>>>>> requirement
>>>>>>>>>> without any element of surprise.
>>>>>>>>>> 
>>>>>>>>>> If the finest granularity is millisecond. It will have a
>> higher
>>>>>> limit
>>>>>>>> to
>>>>>>>>> it
>>>>>>>>>> which is 999 in most of the systems. Banker will fetch all
>>>>>> partitions
>>>>>>>>> until
>>>>>>>>>> higher limit of finest granularity for 23rd October and
>> return
>>>>> the
>>>>>>>>>> statement to user.
>>>>>>>>>> 
>>>>>>>>>> If there was a leap second/millisecond, it is upto the
>> banker
>>>> to
>>>>>>> handle
>>>>>>>>> and
>>>>>>>>>> give data for it. When a user wants statement from 11th
>> October
>>>>> to
>>>>>>> 23rd
>>>>>>>>>> October, user should not worry about the leap
>>>> second/millisecond
>>>>>>> which
>>>>>>>>>> happened on any of the days from 11th October to 23rd
>> October
>>>>>>>>> (inclusive).
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Himanshu
>>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
>>>>>>> amareshwari@gmail.com <javascript:;>
>>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I prefer time-range end date to be exclusive of the end
>> date,
>>>>>>> because
>>>>>>>>> of
>>>>>>>>>>> the following reasons:
>>>>>>>>>>> 
>>>>>>>>>>> 1. Giving a month range end as start of next month is
>> easy,
>>>> as
>>>>>> you
>>>>>>>> need
>>>>>>>>>> not
>>>>>>>>>>> remember and get confused with number of days in month :
>> 30,
>>>>> 31,
>>>>>>>> 28/29
>>>>>>>>>> (in
>>>>>>>>>>> february leap year) - wont miss ending days because of
>>>>> mistakes.
>>>>>>>>>>> 
>>>>>>>>>>> 2. When you are giving a any range, you need not worry
>> about
>>>>> what
>>>>>>>>>>> granularity should the time be specified.
>>>>>>>>>>> For ex, for including time upto 2015-08-01 05th.
>>>>>>>>>>> 
>>>>>>>>>>> Should user specify the following with inclusive ranges?
>>>>>>>>>>>> 2015-08-01-04 (would be correct, assuming whole of 4 th
>>>> hour
>>>>>> will
>>>>>>>> be
>>>>>>>>>>> included)
>>>>>>>>>>>> 2015-08-01-04-59 (would be correct, but what if there
>> is
>>>> leap
>>>>>>>> second
>>>>>>>>>> :p )
>>>>>>>>>>>> 2015-08-01-04-59-000 (wrong, would include only 000th
>> milli
>>>>>>> second
>>>>>>>>> and
>>>>>>>>>>> not others)
>>>>>>>>>>>> 2015-08-01-04-59-999 (not sure if we will have leap
>> milli
>>>>>>> seconds)
>>>>>>>>>>> 
>>>>>>>>>>> Looking forward to see other arguments on making
>> timerange
>>>> end
>>>>>> date
>>>>>>>>>>> inclusive.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> Amareshwari
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
>>>>>>> prongs@apache.org <javascript:;>>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Currently the time_range_in clause is inclusive of the
>>>> start
>>>>>> date
>>>>>>>> and
>>>>>>>>>>>> exclusive of the end date. i.e.
>>>> time_range_in(time_dimension,
>>>>>>>> date1,
>>>>>>>>>>> date2)
>>>>>>>>>>>> translates to date1<=time_dimension<date2
>>>>>>>>>>>> 
>>>>>>>>>>>> Recently there have been multiple discussions regarding
>>>> this
>>>>>>>> approach
>>>>>>>>>> in
>>>>>>>>>>>> our internal forums at InMobi. There have been
>> arguments
>>>> that
>>>>>> as
>>>>>>> an
>>>>>>>>>>>> analytics system, it should accept end date inclusive.
>>>>> Starting
>>>>>>>> this
>>>>>>>>>>> thread
>>>>>>>>>>>> to know opinions of other people regarding both
>> approaches.
>>>>> I'm
>>>>>>>>>> defining
>>>>>>>>>>>> and briefly discussing both approaches below:
>>>>>>>>>>>> 
>>>>>>>>>>>> Approach 1: End date exclusive.
>>>>>>>>>>>> Disclaimer: I personally prefer this approach and would
>>>>>>> shamelessly
>>>>>>>>>> argue
>>>>>>>>>>>> in its favor. I'll give links to some of the articles
>> that
>>>> I
>>>>>> very
>>>>>>>>> much
>>>>>>>>>>>> agree with (no need to reinvent the wheel of argument)
>>>>>>>>>>>> 
>>>>>>>>>>>>   1. Views of Anders Kaseorg, MIT PhD student in CS;
>>>>> Cofounder
>>>>>>> of
>>>>>>>>>>> Ksplice,
>>>>>>>>>>>>   Inc
>>>>>>>>>>>>   <
>> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
>>>>>>>>>>>>   2. Views of Edgar Dijkstra, Creator of Dijkstra's
>>>>> algorithm
>>>>>>>>>>>>   <
>> http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
>>>>>>>>>>>>   3. Both arguments are for generic ranges. I believe
>> time
>>>>>> range
>>>>>>>> is
>>>>>>>>>>> also a
>>>>>>>>>>>>   range and don't see why we need to handle that as a
>>>>> special
>>>>>>>> case.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Approach 2: End date inclusive.
>>>>>>>>>>>> 
>>>>>>>>>>>>   1. Lens query language is a bit like sql and the
>> BETWEEN
>>>>>>> clause
>>>>>>>> in
>>>>>>>>>> sql
>>>>>>>>>>>>   is inclusive of both start and end date.
>>>>>>>>>>>>   2. Most analytic systems --- including google
>> analytics
>>>>> ---
>>>>>>>> follow
>>>>>>>>>>> this
>>>>>>>>>>>>   convention
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm posting the question on a bigger forum
>>>>>>>>>>>> <
>> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
>>>>>>>>>>>> as well in hopes of getting more opinions. I request
>>>> everyone
>>>>>> to
>>>>>>>> post
>>>>>>>>>>> their
>>>>>>>>>>>> views here regarding both approaches.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards
>>>>>>>>>> 
>>>>>>>>>> --
>> _____________________________________________________________
>>>>>>>>>> The information contained in this communication is intended
>>>>> solely
>>>>>>> for
>>>>>>>>> the
>>>>>>>>>> use of the individual or entity to whom it is addressed and
>>>>> others
>>>>>>>>>> authorized to receive it. It may contain confidential or
>>>> legally
>>>>>>>>> privileged
>>>>>>>>>> information. If you are not the intended recipient you are
>>>> hereby
>>>>>>>>> notified
>>>>>>>>>> that any disclosure, copying, distribution or taking any
>> action
>>>>> in
>>>>>>>>> reliance
>>>>>>>>>> on the contents of this information is strictly prohibited
>> and
>>>>> may
>>>>>> be
>>>>>>>>>> unlawful. If you have received this communication in error,
>>>>> please
>>>>>>>> notify
>>>>>>>>>> us immediately by responding to this email and then delete
>> it
>>>>> from
>>>>>>> your
>>>>>>>>>> system. The firm is neither liable for the proper and
>> complete
>>>>>>>>> transmission
>>>>>>>>>> of the information contained in this communication nor for
>> any
>>>>>> delay
>>>>>>> in
>>>>>>>>> its
>>>>>>>>>> receipt.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> _____________________________________________________________
>>>>>>>> The information contained in this communication is intended
>> solely
>>>>> for
>>>>>>> the
>>>>>>>> use of the individual or entity to whom it is addressed and
>> others
>>>>>>>> authorized to receive it. It may contain confidential or
>> legally
>>>>>>> privileged
>>>>>>>> information. If you are not the intended recipient you are
>> hereby
>>>>>>> notified
>>>>>>>> that any disclosure, copying, distribution or taking any
>> action in
>>>>>>> reliance
>>>>>>>> on the contents of this information is strictly prohibited and
>> may
>>>> be
>>>>>>>> unlawful. If you have received this communication in error,
>> please
>>>>>> notify
>>>>>>>> us immediately by responding to this email and then delete it
>> from
>>>>> your
>>>>>>>> system. The firm is neither liable for the proper and complete
>>>>>>> transmission
>>>>>>>> of the information contained in this communication nor for any
>>>> delay
>>>>> in
>>>>>>> its
>>>>>>>> receipt.
>>>>>> 
>>>>>> --
>>>>>> _____________________________________________________________
>>>>>> The information contained in this communication is intended solely
>> for
>>>>> the
>>>>>> use of the individual or entity to whom it is addressed and others
>>>>>> authorized to receive it. It may contain confidential or legally
>>>>> privileged
>>>>>> information. If you are not the intended recipient you are hereby
>>>>> notified
>>>>>> that any disclosure, copying, distribution or taking any action in
>>>>> reliance
>>>>>> on the contents of this information is strictly prohibited and may
>> be
>>>>>> unlawful. If you have received this communication in error, please
>>>> notify
>>>>>> us immediately by responding to this email and then delete it from
>> your
>>>>>> system. The firm is neither liable for the proper and complete
>>>>> transmission
>>>>>> of the information contained in this communication nor for any
>> delay in
>>>>> its
>>>>>> receipt.
>>>> 
>>>> --
>>>> _____________________________________________________________
>>>> The information contained in this communication is intended solely for
>> the
>>>> use of the individual or entity to whom it is addressed and others
>>>> authorized to receive it. It may contain confidential or legally
>> privileged
>>>> information. If you are not the intended recipient you are hereby
>> notified
>>>> that any disclosure, copying, distribution or taking any action in
>> reliance
>>>> on the contents of this information is strictly prohibited and may be
>>>> unlawful. If you have received this communication in error, please
>> notify
>>>> us immediately by responding to this email and then delete it from your
>>>> system. The firm is neither liable for the proper and complete
>> transmission
>>>> of the information contained in this communication nor for any delay
>> in its
>>>> receipt.
>>> 
>>> --
>>> _____________________________________________________________
>>> The information contained in this communication is intended solely for
>> the
>>> use of the individual or entity to whom it is addressed and others
>>> authorized to receive it. It may contain confidential or legally
>> privileged
>>> information. If you are not the intended recipient you are hereby
>> notified
>>> that any disclosure, copying, distribution or taking any action in
>> reliance
>>> on the contents of this information is strictly prohibited and may be
>>> unlawful. If you have received this communication in error, please notify
>>> us immediately by responding to this email and then delete it from your
>>> system. The firm is neither liable for the proper and complete
>> transmission
>>> of the information contained in this communication nor for any delay in
>> its
>>> receipt.
> 
> -- 
> _____________________________________________________________
> The information contained in this communication is intended solely for the 
> use of the individual or entity to whom it is addressed and others 
> authorized to receive it. It may contain confidential or legally privileged 
> information. If you are not the intended recipient you are hereby notified 
> that any disclosure, copying, distribution or taking any action in reliance 
> on the contents of this information is strictly prohibited and may be 
> unlawful. If you have received this communication in error, please notify 
> us immediately by responding to this email and then delete it from your 
> system. The firm is neither liable for the proper and complete transmission 
> of the information contained in this communication nor for any delay in its 
> receipt.

Re: [DISCUSS] Time Ranges in Lens

Posted by Himanshu Gahlaut <hi...@inmobi.com>.
>>>>The reason for me calling it unique is that the grain supported in the
range can be different from the time grain of the data itself (for ex, one
can query data by providing the year while the actual data in the facts may
be in seconds granularity)

[Himanshu]: Support for this behaviour already exists in other databases.
E.g. PostgresSQL allows to query data by providing year while the actual
data in the facts is in seconds granularity. See below.

himanshu.gahlaut=# select * from test_table;

        time         | userid | requests

---------------------+--------+----------

 2005-10-19 10:23:*54* | user1  |     1000

 2007-04-15 09:01:*05* | user1  |     2000

(2 rows)


himanshu.gahlaut=# select sum(requests) as total_requests from test_table
where extract(year from time) BETWEEN '*2005*' and '*2007*';

 total_requests

----------------

           3000

(1 row)


*BETWEEN clause gave data for 2007 as well. From and To are both inclusive
here.*


>>>>If there is an inclination to include inclusive time ranges, the way
forward to me seems like doing away with supporting arbitrary time grain in
the query and always accept time in query to the precise second. Clearly
this would be withdrawing a powerful feature from lens.

[Himanshu]: When PostgresSQL can support an arbitrary time grain in the
query (as shown above) with data stored in facts at seconds granularity,
I'm not sure why Lens will not be able to handle it in implementation. I
already gave a solution in this email thread by which time ranges  can be
made inclusive in Lens without removing the support to give an arbitrary
time grain in the query. Do you see the solution breaking somewhere ?


Regards,

Himanshu

On Thu, Aug 6, 2015 at 9:46 AM, Srikanth Sundarrajan <sr...@hotmail.com>
wrote:

> Here is my 2 cents on this.
>
> As long as there is an unambiguous articulation of what is
> included/excluded in the time range element of the Cube QL, there is no
> forcing function to change this, unless this breaks any well understood
> convention and is counter intuitive. Given the unique nature of the
> time_range parameter supported in Cube QL and what values it can assume,
> there doesn't seem to be risk of breaking convention here. The reason for
> me calling it unique is that the grain supported in the range can be
> different from the time grain of the data itself (for ex, one can query
> data by providing the year while the actual data in the facts may be in
> seconds granularity). The exclusive end date seems to emanate from having
> to support this mismatched time grains between data and user query and the
> system is having to expand the time grain provided by the user to match
> that of the data in the facts.
>
> If there is an inclination to include inclusive time ranges, the way
> forward to me seems like doing away with supporting arbitrary time grain in
> the query and always accept time in query to the precise second. Clearly
> this would be withdrawing a powerful feature from lens, and should be
> considered only if the notion of exclusive end time seems very unintuitive
> for the users.
>
> Regards
> Srikanth Sundarrajan
>
> > Date: Wed, 5 Aug 2015 20:28:46 +0530
> > Subject: Re: [DISCUSS] Time Ranges in Lens
> > From: nitin.gupta@inmobi.com
> > To: dev@lens.incubator.apache.org
> >
> > I believe Users seem to think of a time range as half open, a meeting
> > appointment of 1:00 pm - 2:00 pm means a one hour appointment. But with
> > date ranges, the expectation is to include the end date. A date range
> > specification of Jan 1 - Jan 31 would mean inclusion of Jan 31.
> > To be able to make a decision here, I believe we follow what existing
> > Analytics and Data Query systems do. That is to include the end time
> > specified as available with Google Analytics or Support for BETWEEN in
> SQL.
> > That may mean more complexity on the implementation but that's a hit we
> > will have to take.
> >
> > Thanks,
> > Nitin
> >
> >
> > On Wed, Aug 5, 2015 at 7:06 PM, Himanshu Gahlaut <
> > himanshu.gahlaut@inmobi.com> wrote:
> >
> > > Have given a real world use case of banking system where making to
> > > field exclusive makes the communication harder for user.
> > >
> > > Please give a real world use case which makes things harder for user by
> > > making to field inclusive OR a real world use case which is impossible
> to
> > > implement by making to field inclusive. So far the points shared below
> for
> > > making to date exclusive are hard to understand and it is difficult to
> > > comprehend the problem which is coming in way of making to date
> inclusive.
> > >
> > > On Wednesday, August 5, 2015, Rajat Khandelwal <rajatgupta59@gmail.com
> >
> > > wrote:
> > >
> > > > Time range consists of a start time and an end time. If start time
> and
> > > end
> > > > time are same, then we would say the ranges are equal.
> > > >
> > > > The three ranges in my example are same by this definition of
> equality.
> > > > While approach 2 hinders that.
> > > >
> > > >
> > > > On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <
> > > > himanshu.gahlaut@inmobi.com <javascript:;>>
> > > > wrote:
> > > >
> > > > > Why is this a problem ?
> > > > >
> > > > > On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <
> > > rajatgupta59@gmail.com
> > > > <javascript:;>>
> > > > > wrote:
> > > > >
> > > > > > Problem with that is, that the meaning of range changes with
> > > > granularity.
> > > > > > While with approach 1, it's not the case.
> > > > > >
> > > > > > A Range of 2015-01-01 to 2015-01-02 would mean 2 days,
> 2015-01-01-00
> > > to
> > > > > > 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
> > > > > > 2015-01-02-00-00 would mean 2 days and 1 minute, and
> > > > 2015-01-01-00-00-00
> > > > > to
> > > > > > 2015-01-02-00-00-00 would mean 2 days and 1 second.
> > > > > >
> > > > > > While approach 1 has no such problem. A given range doesn't
> change
> > > > with a
> > > > > > change in granularity. Every one of these examples would mean 1
> day.
> > > > > >
> > > > > > Here I've begun the argument with day granularity and went
> deeper.
> > > The
> > > > > same
> > > > > > argument can be started from a Yearly granularity and there the
> > > > different
> > > > > > would be more prominent.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
> > > > > > himanshu.gahlaut@inmobi.com <javascript:;>>
> > > > > > wrote:
> > > > > >
> > > > > > > Say granularity is an ordered list of nodes where every node in
> > > list
> > > > is
> > > > > > > parent of its next node.
> > > > > > >
> > > > > > > For a System X: all granularity could be (Year => Month =>
> Day).
> > > > > > > For System Y: all granularity could be (Year => Month => Day =>
> > > Hour
> > > > =>
> > > > > > > Minutes => Seconds => Milliseconds => Microseconds =>
> Picosecond =>
> > > > > > > Femtosecond => Attosecond)
> > > > > > >
> > > > > > > As long as every node has a logic defined to compute the lower
> > > limit
> > > > > and
> > > > > > > upper limit value, implementation should be able to fetch all
> the
> > > > data
> > > > > > > under its parent node.
> > > > > > >
> > > > > > > E.G:
> > > > > > >
> > > > > > > To get all the data for 2015-10-31, implementation needs to
> convert
> > > > > this
> > > > > > > into 2015-10-31-LowerLimitHourNode to
> 2015-10-31-UpperLimitHourNode
> > > > to
> > > > > > > fetch data from hourly partitions, where lower limit hour = 00
> and
> > > > > upper
> > > > > > > limit hour = 23. Similarly any granularity can be handled at
> > > > > > implementation
> > > > > > > level.
> > > > > > >
> > > > > > >
> > > > > > > I guess a banking system in real world would be worried about
> very
> > > > fine
> > > > > > > granularities of time.
> > > > > > >
> > > > > > > Even if we assume banking system has day as the finest
> > > granularity. A
> > > > > > user
> > > > > > > can still ask for statement from the month of April, 2015 to
> > > October,
> > > > > > 2015
> > > > > > > without specifying the specific date fields. In this case bank
> will
> > > > > have
> > > > > > to
> > > > > > > execute the logic to fetch all transactions from April to
> October
> > > > with
> > > > > > both
> > > > > > > inclusive and also handling its finest granularity which is
> assumed
> > > > as
> > > > > > day.
> > > > > > > In this case a banking system would convert 2015-04 to 2015-10
> into
> > > > > > > 2015-04-01 to 2015-10-31 to fetch all data from daily
> partitions,
> > > > where
> > > > > > 01
> > > > > > > is the lower limit for day node in month of April and 31 is the
> > > > higher
> > > > > > > limit for day node in month of October.
> > > > > > >
> > > > > > > If data is not available in daily partitions and bank's
> underlying
> > > > > system
> > > > > > > supports further fine time granularities, it will convert
> 2015-04
> > > to
> > > > > > > 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from
> > > hourly
> > > > > > > partitions.
> > > > > > >
> > > > > > > Similarly if data is not available at hourly partitions, it
> will go
> > > > to
> > > > > > the
> > > > > > > next level. As long as an upper limit and lower limit is
> defined
> > > for
> > > > > each
> > > > > > > node, implementation can handle any time granularity.
> > > > > > >
> > > > > > > Meta point is that in real world, when a user will communicate
> to
> > > get
> > > > > > > statement of April to October. His intentions will never be to
> > > > exclude
> > > > > > > October. Exclusion of October will be a surprise for him.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Himanshu
> > > > > > >
> > > > > > > On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <
> > > > amareshwari@gmail.com <javascript:;>
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > When a system has to understand time at all granularities,
> and
> > > user
> > > > > can
> > > > > > > > give time at all granularities, shouldn't we chose a
> > > specification
> > > > > > which
> > > > > > > is
> > > > > > > > easy and error prone.
> > > > > > > >
> > > > > > > > The example of banking system is not worried about finer
> > > > > granularities
> > > > > > of
> > > > > > > > time.
> > > > > > > >
> > > > > > > > On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> > > > > > > > himanshu.gahlaut@inmobi.com <javascript:;>> wrote:
> > > > > > > >
> > > > > > > > > This has two parts:
> > > > > > > > >
> > > > > > > > > (1) Interfacing with user shall be cognitive. Something
> which
> > > > user
> > > > > > can
> > > > > > > > > relate to his natural thought process.
> > > > > > > > >
> > > > > > > > > E.g.: In a banking system, if a user goes to a banker to
> get
> > > his
> > > > > bank
> > > > > > > > > statement. He will say "give me statement from 11th October
> > > 2015
> > > > to
> > > > > > > 23rd
> > > > > > > > > October 2015". This clearly communicates in simplified
> language
> > > > > that
> > > > > > > user
> > > > > > > > > is interested in bank statement from 11th October to 23rd
> > > > October.
> > > > > If
> > > > > > > > > banker now does not return data of 23rd October, it will
> be a
> > > > > > surprise
> > > > > > > > for
> > > > > > > > > user.
> > > > > > > > >
> > > > > > > > > Banker can inform every user that To date is exclusive, so
> if
> > > you
> > > > > > need
> > > > > > > > > statement till 23rd October you should come and tell me
> "give
> > > me
> > > > > > > > statement
> > > > > > > > > from 11th October to 24th October". I believe a user will
> > > > > immediately
> > > > > > > > reply
> > > > > > > > > => If I need statement only till 23rd October, why shall I
> > > > > > communicate
> > > > > > > > 11th
> > > > > > > > > October to 24th October. If you still insist, next time the
> > > user
> > > > > will
> > > > > > > > come
> > > > > > > > > and say "give me statement from 11th October to 24th
> October
> > > with
> > > > > > 24th
> > > > > > > > > exclusive". System just made it harder for user to
> communicate
> > > > with
> > > > > > > > banker.
> > > > > > > > >
> > > > > > > > > (2) Implementation to achieve cognitive behaviour
> > > > > > > > >
> > > > > > > > > With To date inclusive, banker needs to make sure that it
> gives
> > > > all
> > > > > > > > > transaction records for 23rd October 2015 (To date) at the
> > > finest
> > > > > > > > > granularity available with the bank. This will satisfy
> users
> > > > > > > requirement
> > > > > > > > > without any element of surprise.
> > > > > > > > >
> > > > > > > > > If the finest granularity is millisecond. It will have a
> higher
> > > > > limit
> > > > > > > to
> > > > > > > > it
> > > > > > > > > which is 999 in most of the systems. Banker will fetch all
> > > > > partitions
> > > > > > > > until
> > > > > > > > > higher limit of finest granularity for 23rd October and
> return
> > > > the
> > > > > > > > > statement to user.
> > > > > > > > >
> > > > > > > > > If there was a leap second/millisecond, it is upto the
> banker
> > > to
> > > > > > handle
> > > > > > > > and
> > > > > > > > > give data for it. When a user wants statement from 11th
> October
> > > > to
> > > > > > 23rd
> > > > > > > > > October, user should not worry about the leap
> > > second/millisecond
> > > > > > which
> > > > > > > > > happened on any of the days from 11th October to 23rd
> October
> > > > > > > > (inclusive).
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Himanshu
> > > > > > > > >
> > > > > > > > > On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
> > > > > > amareshwari@gmail.com <javascript:;>
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I prefer time-range end date to be exclusive of the end
> date,
> > > > > > because
> > > > > > > > of
> > > > > > > > > > the following reasons:
> > > > > > > > > >
> > > > > > > > > > 1. Giving a month range end as start of next month is
> easy,
> > > as
> > > > > you
> > > > > > > need
> > > > > > > > > not
> > > > > > > > > > remember and get confused with number of days in month :
> 30,
> > > > 31,
> > > > > > > 28/29
> > > > > > > > > (in
> > > > > > > > > > february leap year) - wont miss ending days because of
> > > > mistakes.
> > > > > > > > > >
> > > > > > > > > > 2. When you are giving a any range, you need not worry
> about
> > > > what
> > > > > > > > > > granularity should the time be specified.
> > > > > > > > > > For ex, for including time upto 2015-08-01 05th.
> > > > > > > > > >
> > > > > > > > > > Should user specify the following with inclusive ranges?
> > > > > > > > > > > 2015-08-01-04 (would be correct, assuming whole of 4 th
> > > hour
> > > > > will
> > > > > > > be
> > > > > > > > > > included)
> > > > > > > > > > > 2015-08-01-04-59 (would be correct, but what if there
> is
> > > leap
> > > > > > > second
> > > > > > > > > :p )
> > > > > > > > > > > 2015-08-01-04-59-000 (wrong, would include only 000th
> milli
> > > > > > second
> > > > > > > > and
> > > > > > > > > > not others)
> > > > > > > > > > > 2015-08-01-04-59-999 (not sure if we will have leap
> milli
> > > > > > seconds)
> > > > > > > > > >
> > > > > > > > > > Looking forward to see other arguments on making
> timerange
> > > end
> > > > > date
> > > > > > > > > > inclusive.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Amareshwari
> > > > > > > > > >
> > > > > > > > > > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
> > > > > > prongs@apache.org <javascript:;>>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Currently the time_range_in clause is inclusive of the
> > > start
> > > > > date
> > > > > > > and
> > > > > > > > > > > exclusive of the end date. i.e.
> > > time_range_in(time_dimension,
> > > > > > > date1,
> > > > > > > > > > date2)
> > > > > > > > > > > translates to date1<=time_dimension<date2
> > > > > > > > > > >
> > > > > > > > > > > Recently there have been multiple discussions regarding
> > > this
> > > > > > > approach
> > > > > > > > > in
> > > > > > > > > > > our internal forums at InMobi. There have been
> arguments
> > > that
> > > > > as
> > > > > > an
> > > > > > > > > > > analytics system, it should accept end date inclusive.
> > > > Starting
> > > > > > > this
> > > > > > > > > > thread
> > > > > > > > > > > to know opinions of other people regarding both
> approaches.
> > > > I'm
> > > > > > > > > defining
> > > > > > > > > > > and briefly discussing both approaches below:
> > > > > > > > > > >
> > > > > > > > > > > Approach 1: End date exclusive.
> > > > > > > > > > > Disclaimer: I personally prefer this approach and would
> > > > > > shamelessly
> > > > > > > > > argue
> > > > > > > > > > > in its favor. I'll give links to some of the articles
> that
> > > I
> > > > > very
> > > > > > > > much
> > > > > > > > > > > agree with (no need to reinvent the wheel of argument)
> > > > > > > > > > >
> > > > > > > > > > >    1. Views of Anders Kaseorg, MIT PhD student in CS;
> > > > Cofounder
> > > > > > of
> > > > > > > > > > Ksplice,
> > > > > > > > > > >    Inc
> > > > > > > > > > >    <
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > > > > > > > > > >
> > > > > > > > > > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's
> > > > algorithm
> > > > > > > > > > >    <
> http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > > > > > > > > > >    3. Both arguments are for generic ranges. I believe
> time
> > > > > range
> > > > > > > is
> > > > > > > > > > also a
> > > > > > > > > > >    range and don't see why we need to handle that as a
> > > > special
> > > > > > > case.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Approach 2: End date inclusive.
> > > > > > > > > > >
> > > > > > > > > > >    1. Lens query language is a bit like sql and the
> BETWEEN
> > > > > > clause
> > > > > > > in
> > > > > > > > > sql
> > > > > > > > > > >    is inclusive of both start and end date.
> > > > > > > > > > >    2. Most analytic systems --- including google
> analytics
> > > > ---
> > > > > > > follow
> > > > > > > > > > this
> > > > > > > > > > >    convention
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I'm posting the question on a bigger forum
> > > > > > > > > > > <
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > > > > > > > > > >
> > > > > > > > > > > as well in hopes of getting more opinions. I request
> > > everyone
> > > > > to
> > > > > > > post
> > > > > > > > > > their
> > > > > > > > > > > views here regarding both approaches.
> > > > > > > > > > >
> > > > > > > > > > > Regards
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> _____________________________________________________________
> > > > > > > > > The information contained in this communication is intended
> > > > solely
> > > > > > for
> > > > > > > > the
> > > > > > > > > use of the individual or entity to whom it is addressed and
> > > > others
> > > > > > > > > authorized to receive it. It may contain confidential or
> > > legally
> > > > > > > > privileged
> > > > > > > > > information. If you are not the intended recipient you are
> > > hereby
> > > > > > > > notified
> > > > > > > > > that any disclosure, copying, distribution or taking any
> action
> > > > in
> > > > > > > > reliance
> > > > > > > > > on the contents of this information is strictly prohibited
> and
> > > > may
> > > > > be
> > > > > > > > > unlawful. If you have received this communication in error,
> > > > please
> > > > > > > notify
> > > > > > > > > us immediately by responding to this email and then delete
> it
> > > > from
> > > > > > your
> > > > > > > > > system. The firm is neither liable for the proper and
> complete
> > > > > > > > transmission
> > > > > > > > > of the information contained in this communication nor for
> any
> > > > > delay
> > > > > > in
> > > > > > > > its
> > > > > > > > > receipt.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > _____________________________________________________________
> > > > > > > The information contained in this communication is intended
> solely
> > > > for
> > > > > > the
> > > > > > > use of the individual or entity to whom it is addressed and
> others
> > > > > > > authorized to receive it. It may contain confidential or
> legally
> > > > > > privileged
> > > > > > > information. If you are not the intended recipient you are
> hereby
> > > > > > notified
> > > > > > > that any disclosure, copying, distribution or taking any
> action in
> > > > > > reliance
> > > > > > > on the contents of this information is strictly prohibited and
> may
> > > be
> > > > > > > unlawful. If you have received this communication in error,
> please
> > > > > notify
> > > > > > > us immediately by responding to this email and then delete it
> from
> > > > your
> > > > > > > system. The firm is neither liable for the proper and complete
> > > > > > transmission
> > > > > > > of the information contained in this communication nor for any
> > > delay
> > > > in
> > > > > > its
> > > > > > > receipt.
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > _____________________________________________________________
> > > > > The information contained in this communication is intended solely
> for
> > > > the
> > > > > use of the individual or entity to whom it is addressed and others
> > > > > authorized to receive it. It may contain confidential or legally
> > > > privileged
> > > > > information. If you are not the intended recipient you are hereby
> > > > notified
> > > > > that any disclosure, copying, distribution or taking any action in
> > > > reliance
> > > > > on the contents of this information is strictly prohibited and may
> be
> > > > > unlawful. If you have received this communication in error, please
> > > notify
> > > > > us immediately by responding to this email and then delete it from
> your
> > > > > system. The firm is neither liable for the proper and complete
> > > > transmission
> > > > > of the information contained in this communication nor for any
> delay in
> > > > its
> > > > > receipt.
> > > > >
> > > >
> > >
> > > --
> > > _____________________________________________________________
> > > The information contained in this communication is intended solely for
> the
> > > use of the individual or entity to whom it is addressed and others
> > > authorized to receive it. It may contain confidential or legally
> privileged
> > > information. If you are not the intended recipient you are hereby
> notified
> > > that any disclosure, copying, distribution or taking any action in
> reliance
> > > on the contents of this information is strictly prohibited and may be
> > > unlawful. If you have received this communication in error, please
> notify
> > > us immediately by responding to this email and then delete it from your
> > > system. The firm is neither liable for the proper and complete
> transmission
> > > of the information contained in this communication nor for any delay
> in its
> > > receipt.
> > >
> >
> > --
> > _____________________________________________________________
> > The information contained in this communication is intended solely for
> the
> > use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally
> privileged
> > information. If you are not the intended recipient you are hereby
> notified
> > that any disclosure, copying, distribution or taking any action in
> reliance
> > on the contents of this information is strictly prohibited and may be
> > unlawful. If you have received this communication in error, please notify
> > us immediately by responding to this email and then delete it from your
> > system. The firm is neither liable for the proper and complete
> transmission
> > of the information contained in this communication nor for any delay in
> its
> > receipt.
>
>

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: [DISCUSS] Time Ranges in Lens

Posted by Sharad Agarwal <sh...@apache.org>.
On Thu, Aug 6, 2015 at 9:46 AM, Srikanth Sundarrajan <sr...@hotmail.com>
wrote:

> Here is my 2 cents on this.
>
> As long as there is an unambiguous articulation of what is
> included/excluded in the time range element of the Cube QL, there is no
> forcing function to change this, unless this breaks any well understood
> convention and is counter intuitive. Given the unique nature of the
> time_range parameter supported in Cube QL and what values it can assume,
> there doesn't seem to be risk of breaking convention here. The reason for
> me calling it unique is that the grain supported in the range can be
> different from the time grain of the data itself (for ex, one can query
> data by providing the year while the actual data in the facts may be in
> seconds granularity). The exclusive end date seems to emanate from having
> to support this mismatched time grains between data and user query and the
> system is having to expand the time grain provided by the user to match
> that of the data in the facts.
>


+1 As data and query time ranges need not be aligned. Having exclusive to
condition gives easier query expressivity.


>
> > Date: Wed, 5 Aug 2015 20:28:46 +0530
> > Subject: Re: [DISCUSS] Time Ranges in Lens
> > From: nitin.gupta@inmobi.com
> > To: dev@lens.incubator.apache.org
> >
> > I believe Users seem to think of a time range as half open, a meeting
> > appointment of 1:00 pm - 2:00 pm means a one hour appointment. But with
> > date ranges, the expectation is to include the end date. A date range
> > specification of Jan 1 - Jan 31 would mean inclusion of Jan 31.
> > To be able to make a decision here, I believe we follow what existing
> > Analytics and Data Query systems do. That is to include the end time
> > specified as available with Google Analytics or Support for BETWEEN in
> SQL.
> > That may mean more complexity on the implementation but that's a hit we
> > will have to take.
> >
> > Thanks,
> > Nitin
> >
> >
> > On Wed, Aug 5, 2015 at 7:06 PM, Himanshu Gahlaut <
> > himanshu.gahlaut@inmobi.com> wrote:
> >
> > > Have given a real world use case of banking system where making to
> > > field exclusive makes the communication harder for user.
> > >
> > > Please give a real world use case which makes things harder for user by
> > > making to field inclusive OR a real world use case which is impossible
> to
> > > implement by making to field inclusive. So far the points shared below
> for
> > > making to date exclusive are hard to understand and it is difficult to
> > > comprehend the problem which is coming in way of making to date
> inclusive.
> > >
> > > On Wednesday, August 5, 2015, Rajat Khandelwal <rajatgupta59@gmail.com
> >
> > > wrote:
> > >
> > > > Time range consists of a start time and an end time. If start time
> and
> > > end
> > > > time are same, then we would say the ranges are equal.
> > > >
> > > > The three ranges in my example are same by this definition of
> equality.
> > > > While approach 2 hinders that.
> > > >
> > > >
> > > > On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <
> > > > himanshu.gahlaut@inmobi.com <javascript:;>>
> > > > wrote:
> > > >
> > > > > Why is this a problem ?
> > > > >
> > > > > On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <
> > > rajatgupta59@gmail.com
> > > > <javascript:;>>
> > > > > wrote:
> > > > >
> > > > > > Problem with that is, that the meaning of range changes with
> > > > granularity.
> > > > > > While with approach 1, it's not the case.
> > > > > >
> > > > > > A Range of 2015-01-01 to 2015-01-02 would mean 2 days,
> 2015-01-01-00
> > > to
> > > > > > 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
> > > > > > 2015-01-02-00-00 would mean 2 days and 1 minute, and
> > > > 2015-01-01-00-00-00
> > > > > to
> > > > > > 2015-01-02-00-00-00 would mean 2 days and 1 second.
> > > > > >
> > > > > > While approach 1 has no such problem. A given range doesn't
> change
> > > > with a
> > > > > > change in granularity. Every one of these examples would mean 1
> day.
> > > > > >
> > > > > > Here I've begun the argument with day granularity and went
> deeper.
> > > The
> > > > > same
> > > > > > argument can be started from a Yearly granularity and there the
> > > > different
> > > > > > would be more prominent.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
> > > > > > himanshu.gahlaut@inmobi.com <javascript:;>>
> > > > > > wrote:
> > > > > >
> > > > > > > Say granularity is an ordered list of nodes where every node in
> > > list
> > > > is
> > > > > > > parent of its next node.
> > > > > > >
> > > > > > > For a System X: all granularity could be (Year => Month =>
> Day).
> > > > > > > For System Y: all granularity could be (Year => Month => Day =>
> > > Hour
> > > > =>
> > > > > > > Minutes => Seconds => Milliseconds => Microseconds =>
> Picosecond =>
> > > > > > > Femtosecond => Attosecond)
> > > > > > >
> > > > > > > As long as every node has a logic defined to compute the lower
> > > limit
> > > > > and
> > > > > > > upper limit value, implementation should be able to fetch all
> the
> > > > data
> > > > > > > under its parent node.
> > > > > > >
> > > > > > > E.G:
> > > > > > >
> > > > > > > To get all the data for 2015-10-31, implementation needs to
> convert
> > > > > this
> > > > > > > into 2015-10-31-LowerLimitHourNode to
> 2015-10-31-UpperLimitHourNode
> > > > to
> > > > > > > fetch data from hourly partitions, where lower limit hour = 00
> and
> > > > > upper
> > > > > > > limit hour = 23. Similarly any granularity can be handled at
> > > > > > implementation
> > > > > > > level.
> > > > > > >
> > > > > > >
> > > > > > > I guess a banking system in real world would be worried about
> very
> > > > fine
> > > > > > > granularities of time.
> > > > > > >
> > > > > > > Even if we assume banking system has day as the finest
> > > granularity. A
> > > > > > user
> > > > > > > can still ask for statement from the month of April, 2015 to
> > > October,
> > > > > > 2015
> > > > > > > without specifying the specific date fields. In this case bank
> will
> > > > > have
> > > > > > to
> > > > > > > execute the logic to fetch all transactions from April to
> October
> > > > with
> > > > > > both
> > > > > > > inclusive and also handling its finest granularity which is
> assumed
> > > > as
> > > > > > day.
> > > > > > > In this case a banking system would convert 2015-04 to 2015-10
> into
> > > > > > > 2015-04-01 to 2015-10-31 to fetch all data from daily
> partitions,
> > > > where
> > > > > > 01
> > > > > > > is the lower limit for day node in month of April and 31 is the
> > > > higher
> > > > > > > limit for day node in month of October.
> > > > > > >
> > > > > > > If data is not available in daily partitions and bank's
> underlying
> > > > > system
> > > > > > > supports further fine time granularities, it will convert
> 2015-04
> > > to
> > > > > > > 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from
> > > hourly
> > > > > > > partitions.
> > > > > > >
> > > > > > > Similarly if data is not available at hourly partitions, it
> will go
> > > > to
> > > > > > the
> > > > > > > next level. As long as an upper limit and lower limit is
> defined
> > > for
> > > > > each
> > > > > > > node, implementation can handle any time granularity.
> > > > > > >
> > > > > > > Meta point is that in real world, when a user will communicate
> to
> > > get
> > > > > > > statement of April to October. His intentions will never be to
> > > > exclude
> > > > > > > October. Exclusion of October will be a surprise for him.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Himanshu
> > > > > > >
> > > > > > > On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <
> > > > amareshwari@gmail.com <javascript:;>
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > When a system has to understand time at all granularities,
> and
> > > user
> > > > > can
> > > > > > > > give time at all granularities, shouldn't we chose a
> > > specification
> > > > > > which
> > > > > > > is
> > > > > > > > easy and error prone.
> > > > > > > >
> > > > > > > > The example of banking system is not worried about finer
> > > > > granularities
> > > > > > of
> > > > > > > > time.
> > > > > > > >
> > > > > > > > On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> > > > > > > > himanshu.gahlaut@inmobi.com <javascript:;>> wrote:
> > > > > > > >
> > > > > > > > > This has two parts:
> > > > > > > > >
> > > > > > > > > (1) Interfacing with user shall be cognitive. Something
> which
> > > > user
> > > > > > can
> > > > > > > > > relate to his natural thought process.
> > > > > > > > >
> > > > > > > > > E.g.: In a banking system, if a user goes to a banker to
> get
> > > his
> > > > > bank
> > > > > > > > > statement. He will say "give me statement from 11th October
> > > 2015
> > > > to
> > > > > > > 23rd
> > > > > > > > > October 2015". This clearly communicates in simplified
> language
> > > > > that
> > > > > > > user
> > > > > > > > > is interested in bank statement from 11th October to 23rd
> > > > October.
> > > > > If
> > > > > > > > > banker now does not return data of 23rd October, it will
> be a
> > > > > > surprise
> > > > > > > > for
> > > > > > > > > user.
> > > > > > > > >
> > > > > > > > > Banker can inform every user that To date is exclusive, so
> if
> > > you
> > > > > > need
> > > > > > > > > statement till 23rd October you should come and tell me
> "give
> > > me
> > > > > > > > statement
> > > > > > > > > from 11th October to 24th October". I believe a user will
> > > > > immediately
> > > > > > > > reply
> > > > > > > > > => If I need statement only till 23rd October, why shall I
> > > > > > communicate
> > > > > > > > 11th
> > > > > > > > > October to 24th October. If you still insist, next time the
> > > user
> > > > > will
> > > > > > > > come
> > > > > > > > > and say "give me statement from 11th October to 24th
> October
> > > with
> > > > > > 24th
> > > > > > > > > exclusive". System just made it harder for user to
> communicate
> > > > with
> > > > > > > > banker.
> > > > > > > > >
> > > > > > > > > (2) Implementation to achieve cognitive behaviour
> > > > > > > > >
> > > > > > > > > With To date inclusive, banker needs to make sure that it
> gives
> > > > all
> > > > > > > > > transaction records for 23rd October 2015 (To date) at the
> > > finest
> > > > > > > > > granularity available with the bank. This will satisfy
> users
> > > > > > > requirement
> > > > > > > > > without any element of surprise.
> > > > > > > > >
> > > > > > > > > If the finest granularity is millisecond. It will have a
> higher
> > > > > limit
> > > > > > > to
> > > > > > > > it
> > > > > > > > > which is 999 in most of the systems. Banker will fetch all
> > > > > partitions
> > > > > > > > until
> > > > > > > > > higher limit of finest granularity for 23rd October and
> return
> > > > the
> > > > > > > > > statement to user.
> > > > > > > > >
> > > > > > > > > If there was a leap second/millisecond, it is upto the
> banker
> > > to
> > > > > > handle
> > > > > > > > and
> > > > > > > > > give data for it. When a user wants statement from 11th
> October
> > > > to
> > > > > > 23rd
> > > > > > > > > October, user should not worry about the leap
> > > second/millisecond
> > > > > > which
> > > > > > > > > happened on any of the days from 11th October to 23rd
> October
> > > > > > > > (inclusive).
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Himanshu
> > > > > > > > >
> > > > > > > > > On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
> > > > > > amareshwari@gmail.com <javascript:;>
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I prefer time-range end date to be exclusive of the end
> date,
> > > > > > because
> > > > > > > > of
> > > > > > > > > > the following reasons:
> > > > > > > > > >
> > > > > > > > > > 1. Giving a month range end as start of next month is
> easy,
> > > as
> > > > > you
> > > > > > > need
> > > > > > > > > not
> > > > > > > > > > remember and get confused with number of days in month :
> 30,
> > > > 31,
> > > > > > > 28/29
> > > > > > > > > (in
> > > > > > > > > > february leap year) - wont miss ending days because of
> > > > mistakes.
> > > > > > > > > >
> > > > > > > > > > 2. When you are giving a any range, you need not worry
> about
> > > > what
> > > > > > > > > > granularity should the time be specified.
> > > > > > > > > > For ex, for including time upto 2015-08-01 05th.
> > > > > > > > > >
> > > > > > > > > > Should user specify the following with inclusive ranges?
> > > > > > > > > > > 2015-08-01-04 (would be correct, assuming whole of 4 th
> > > hour
> > > > > will
> > > > > > > be
> > > > > > > > > > included)
> > > > > > > > > > > 2015-08-01-04-59 (would be correct, but what if there
> is
> > > leap
> > > > > > > second
> > > > > > > > > :p )
> > > > > > > > > > > 2015-08-01-04-59-000 (wrong, would include only 000th
> milli
> > > > > > second
> > > > > > > > and
> > > > > > > > > > not others)
> > > > > > > > > > > 2015-08-01-04-59-999 (not sure if we will have leap
> milli
> > > > > > seconds)
> > > > > > > > > >
> > > > > > > > > > Looking forward to see other arguments on making
> timerange
> > > end
> > > > > date
> > > > > > > > > > inclusive.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Amareshwari
> > > > > > > > > >
> > > > > > > > > > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
> > > > > > prongs@apache.org <javascript:;>>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Currently the time_range_in clause is inclusive of the
> > > start
> > > > > date
> > > > > > > and
> > > > > > > > > > > exclusive of the end date. i.e.
> > > time_range_in(time_dimension,
> > > > > > > date1,
> > > > > > > > > > date2)
> > > > > > > > > > > translates to date1<=time_dimension<date2
> > > > > > > > > > >
> > > > > > > > > > > Recently there have been multiple discussions regarding
> > > this
> > > > > > > approach
> > > > > > > > > in
> > > > > > > > > > > our internal forums at InMobi. There have been
> arguments
> > > that
> > > > > as
> > > > > > an
> > > > > > > > > > > analytics system, it should accept end date inclusive.
> > > > Starting
> > > > > > > this
> > > > > > > > > > thread
> > > > > > > > > > > to know opinions of other people regarding both
> approaches.
> > > > I'm
> > > > > > > > > defining
> > > > > > > > > > > and briefly discussing both approaches below:
> > > > > > > > > > >
> > > > > > > > > > > Approach 1: End date exclusive.
> > > > > > > > > > > Disclaimer: I personally prefer this approach and would
> > > > > > shamelessly
> > > > > > > > > argue
> > > > > > > > > > > in its favor. I'll give links to some of the articles
> that
> > > I
> > > > > very
> > > > > > > > much
> > > > > > > > > > > agree with (no need to reinvent the wheel of argument)
> > > > > > > > > > >
> > > > > > > > > > >    1. Views of Anders Kaseorg, MIT PhD student in CS;
> > > > Cofounder
> > > > > > of
> > > > > > > > > > Ksplice,
> > > > > > > > > > >    Inc
> > > > > > > > > > >    <
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > > > > > > > > > >
> > > > > > > > > > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's
> > > > algorithm
> > > > > > > > > > >    <
> http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > > > > > > > > > >    3. Both arguments are for generic ranges. I believe
> time
> > > > > range
> > > > > > > is
> > > > > > > > > > also a
> > > > > > > > > > >    range and don't see why we need to handle that as a
> > > > special
> > > > > > > case.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Approach 2: End date inclusive.
> > > > > > > > > > >
> > > > > > > > > > >    1. Lens query language is a bit like sql and the
> BETWEEN
> > > > > > clause
> > > > > > > in
> > > > > > > > > sql
> > > > > > > > > > >    is inclusive of both start and end date.
> > > > > > > > > > >    2. Most analytic systems --- including google
> analytics
> > > > ---
> > > > > > > follow
> > > > > > > > > > this
> > > > > > > > > > >    convention
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I'm posting the question on a bigger forum
> > > > > > > > > > > <
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > > > > > > > > > >
> > > > > > > > > > > as well in hopes of getting more opinions. I request
> > > everyone
> > > > > to
> > > > > > > post
> > > > > > > > > > their
> > > > > > > > > > > views here regarding both approaches.
> > > > > > > > > > >
> > > > > > > > > > > Regards
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> _____________________________________________________________
> > > > > > > > > The information contained in this communication is intended
> > > > solely
> > > > > > for
> > > > > > > > the
> > > > > > > > > use of the individual or entity to whom it is addressed and
> > > > others
> > > > > > > > > authorized to receive it. It may contain confidential or
> > > legally
> > > > > > > > privileged
> > > > > > > > > information. If you are not the intended recipient you are
> > > hereby
> > > > > > > > notified
> > > > > > > > > that any disclosure, copying, distribution or taking any
> action
> > > > in
> > > > > > > > reliance
> > > > > > > > > on the contents of this information is strictly prohibited
> and
> > > > may
> > > > > be
> > > > > > > > > unlawful. If you have received this communication in error,
> > > > please
> > > > > > > notify
> > > > > > > > > us immediately by responding to this email and then delete
> it
> > > > from
> > > > > > your
> > > > > > > > > system. The firm is neither liable for the proper and
> complete
> > > > > > > > transmission
> > > > > > > > > of the information contained in this communication nor for
> any
> > > > > delay
> > > > > > in
> > > > > > > > its
> > > > > > > > > receipt.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > _____________________________________________________________
> > > > > > > The information contained in this communication is intended
> solely
> > > > for
> > > > > > the
> > > > > > > use of the individual or entity to whom it is addressed and
> others
> > > > > > > authorized to receive it. It may contain confidential or
> legally
> > > > > > privileged
> > > > > > > information. If you are not the intended recipient you are
> hereby
> > > > > > notified
> > > > > > > that any disclosure, copying, distribution or taking any
> action in
> > > > > > reliance
> > > > > > > on the contents of this information is strictly prohibited and
> may
> > > be
> > > > > > > unlawful. If you have received this communication in error,
> please
> > > > > notify
> > > > > > > us immediately by responding to this email and then delete it
> from
> > > > your
> > > > > > > system. The firm is neither liable for the proper and complete
> > > > > > transmission
> > > > > > > of the information contained in this communication nor for any
> > > delay
> > > > in
> > > > > > its
> > > > > > > receipt.
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > _____________________________________________________________
> > > > > The information contained in this communication is intended solely
> for
> > > > the
> > > > > use of the individual or entity to whom it is addressed and others
> > > > > authorized to receive it. It may contain confidential or legally
> > > > privileged
> > > > > information. If you are not the intended recipient you are hereby
> > > > notified
> > > > > that any disclosure, copying, distribution or taking any action in
> > > > reliance
> > > > > on the contents of this information is strictly prohibited and may
> be
> > > > > unlawful. If you have received this communication in error, please
> > > notify
> > > > > us immediately by responding to this email and then delete it from
> your
> > > > > system. The firm is neither liable for the proper and complete
> > > > transmission
> > > > > of the information contained in this communication nor for any
> delay in
> > > > its
> > > > > receipt.
> > > > >
> > > >
> > >
> > > --
> > > _____________________________________________________________
> > > The information contained in this communication is intended solely for
> the
> > > use of the individual or entity to whom it is addressed and others
> > > authorized to receive it. It may contain confidential or legally
> privileged
> > > information. If you are not the intended recipient you are hereby
> notified
> > > that any disclosure, copying, distribution or taking any action in
> reliance
> > > on the contents of this information is strictly prohibited and may be
> > > unlawful. If you have received this communication in error, please
> notify
> > > us immediately by responding to this email and then delete it from your
> > > system. The firm is neither liable for the proper and complete
> transmission
> > > of the information contained in this communication nor for any delay
> in its
> > > receipt.
> > >
> >
> > --
> > _____________________________________________________________
> > The information contained in this communication is intended solely for
> the
> > use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally
> privileged
> > information. If you are not the intended recipient you are hereby
> notified
> > that any disclosure, copying, distribution or taking any action in
> reliance
> > on the contents of this information is strictly prohibited and may be
> > unlawful. If you have received this communication in error, please notify
> > us immediately by responding to this email and then delete it from your
> > system. The firm is neither liable for the proper and complete
> transmission
> > of the information contained in this communication nor for any delay in
> its
> > receipt.
>
>

RE: [DISCUSS] Time Ranges in Lens

Posted by Srikanth Sundarrajan <sr...@hotmail.com>.
Here is my 2 cents on this.

As long as there is an unambiguous articulation of what is included/excluded in the time range element of the Cube QL, there is no forcing function to change this, unless this breaks any well understood convention and is counter intuitive. Given the unique nature of the time_range parameter supported in Cube QL and what values it can assume, there doesn't seem to be risk of breaking convention here. The reason for me calling it unique is that the grain supported in the range can be different from the time grain of the data itself (for ex, one can query data by providing the year while the actual data in the facts may be in seconds granularity). The exclusive end date seems to emanate from having to support this mismatched time grains between data and user query and the system is having to expand the time grain provided by the user to match that of the data in the facts.

If there is an inclination to include inclusive time ranges, the way forward to me seems like doing away with supporting arbitrary time grain in the query and always accept time in query to the precise second. Clearly this would be withdrawing a powerful feature from lens, and should be considered only if the notion of exclusive end time seems very unintuitive for the users.

Regards
Srikanth Sundarrajan

> Date: Wed, 5 Aug 2015 20:28:46 +0530
> Subject: Re: [DISCUSS] Time Ranges in Lens
> From: nitin.gupta@inmobi.com
> To: dev@lens.incubator.apache.org
> 
> I believe Users seem to think of a time range as half open, a meeting
> appointment of 1:00 pm - 2:00 pm means a one hour appointment. But with
> date ranges, the expectation is to include the end date. A date range
> specification of Jan 1 - Jan 31 would mean inclusion of Jan 31.
> To be able to make a decision here, I believe we follow what existing
> Analytics and Data Query systems do. That is to include the end time
> specified as available with Google Analytics or Support for BETWEEN in SQL.
> That may mean more complexity on the implementation but that's a hit we
> will have to take.
> 
> Thanks,
> Nitin
> 
> 
> On Wed, Aug 5, 2015 at 7:06 PM, Himanshu Gahlaut <
> himanshu.gahlaut@inmobi.com> wrote:
> 
> > Have given a real world use case of banking system where making to
> > field exclusive makes the communication harder for user.
> >
> > Please give a real world use case which makes things harder for user by
> > making to field inclusive OR a real world use case which is impossible to
> > implement by making to field inclusive. So far the points shared below for
> > making to date exclusive are hard to understand and it is difficult to
> > comprehend the problem which is coming in way of making to date inclusive.
> >
> > On Wednesday, August 5, 2015, Rajat Khandelwal <ra...@gmail.com>
> > wrote:
> >
> > > Time range consists of a start time and an end time. If start time and
> > end
> > > time are same, then we would say the ranges are equal.
> > >
> > > The three ranges in my example are same by this definition of equality.
> > > While approach 2 hinders that.
> > >
> > >
> > > On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <
> > > himanshu.gahlaut@inmobi.com <javascript:;>>
> > > wrote:
> > >
> > > > Why is this a problem ?
> > > >
> > > > On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <
> > rajatgupta59@gmail.com
> > > <javascript:;>>
> > > > wrote:
> > > >
> > > > > Problem with that is, that the meaning of range changes with
> > > granularity.
> > > > > While with approach 1, it's not the case.
> > > > >
> > > > > A Range of 2015-01-01 to 2015-01-02 would mean 2 days, 2015-01-01-00
> > to
> > > > > 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
> > > > > 2015-01-02-00-00 would mean 2 days and 1 minute, and
> > > 2015-01-01-00-00-00
> > > > to
> > > > > 2015-01-02-00-00-00 would mean 2 days and 1 second.
> > > > >
> > > > > While approach 1 has no such problem. A given range doesn't change
> > > with a
> > > > > change in granularity. Every one of these examples would mean 1 day.
> > > > >
> > > > > Here I've begun the argument with day granularity and went deeper.
> > The
> > > > same
> > > > > argument can be started from a Yearly granularity and there the
> > > different
> > > > > would be more prominent.
> > > > >
> > > > > Regards
> > > > >
> > > > > On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
> > > > > himanshu.gahlaut@inmobi.com <javascript:;>>
> > > > > wrote:
> > > > >
> > > > > > Say granularity is an ordered list of nodes where every node in
> > list
> > > is
> > > > > > parent of its next node.
> > > > > >
> > > > > > For a System X: all granularity could be (Year => Month => Day).
> > > > > > For System Y: all granularity could be (Year => Month => Day =>
> > Hour
> > > =>
> > > > > > Minutes => Seconds => Milliseconds => Microseconds => Picosecond =>
> > > > > > Femtosecond => Attosecond)
> > > > > >
> > > > > > As long as every node has a logic defined to compute the lower
> > limit
> > > > and
> > > > > > upper limit value, implementation should be able to fetch all the
> > > data
> > > > > > under its parent node.
> > > > > >
> > > > > > E.G:
> > > > > >
> > > > > > To get all the data for 2015-10-31, implementation needs to convert
> > > > this
> > > > > > into 2015-10-31-LowerLimitHourNode to 2015-10-31-UpperLimitHourNode
> > > to
> > > > > > fetch data from hourly partitions, where lower limit hour = 00 and
> > > > upper
> > > > > > limit hour = 23. Similarly any granularity can be handled at
> > > > > implementation
> > > > > > level.
> > > > > >
> > > > > >
> > > > > > I guess a banking system in real world would be worried about very
> > > fine
> > > > > > granularities of time.
> > > > > >
> > > > > > Even if we assume banking system has day as the finest
> > granularity. A
> > > > > user
> > > > > > can still ask for statement from the month of April, 2015 to
> > October,
> > > > > 2015
> > > > > > without specifying the specific date fields. In this case bank will
> > > > have
> > > > > to
> > > > > > execute the logic to fetch all transactions from April to October
> > > with
> > > > > both
> > > > > > inclusive and also handling its finest granularity which is assumed
> > > as
> > > > > day.
> > > > > > In this case a banking system would convert 2015-04 to 2015-10 into
> > > > > > 2015-04-01 to 2015-10-31 to fetch all data from daily partitions,
> > > where
> > > > > 01
> > > > > > is the lower limit for day node in month of April and 31 is the
> > > higher
> > > > > > limit for day node in month of October.
> > > > > >
> > > > > > If data is not available in daily partitions and bank's underlying
> > > > system
> > > > > > supports further fine time granularities, it will convert 2015-04
> > to
> > > > > > 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from
> > hourly
> > > > > > partitions.
> > > > > >
> > > > > > Similarly if data is not available at hourly partitions, it will go
> > > to
> > > > > the
> > > > > > next level. As long as an upper limit and lower limit is defined
> > for
> > > > each
> > > > > > node, implementation can handle any time granularity.
> > > > > >
> > > > > > Meta point is that in real world, when a user will communicate to
> > get
> > > > > > statement of April to October. His intentions will never be to
> > > exclude
> > > > > > October. Exclusion of October will be a surprise for him.
> > > > > >
> > > > > > Regards,
> > > > > > Himanshu
> > > > > >
> > > > > > On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <
> > > amareshwari@gmail.com <javascript:;>
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > When a system has to understand time at all granularities, and
> > user
> > > > can
> > > > > > > give time at all granularities, shouldn't we chose a
> > specification
> > > > > which
> > > > > > is
> > > > > > > easy and error prone.
> > > > > > >
> > > > > > > The example of banking system is not worried about finer
> > > > granularities
> > > > > of
> > > > > > > time.
> > > > > > >
> > > > > > > On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> > > > > > > himanshu.gahlaut@inmobi.com <javascript:;>> wrote:
> > > > > > >
> > > > > > > > This has two parts:
> > > > > > > >
> > > > > > > > (1) Interfacing with user shall be cognitive. Something which
> > > user
> > > > > can
> > > > > > > > relate to his natural thought process.
> > > > > > > >
> > > > > > > > E.g.: In a banking system, if a user goes to a banker to get
> > his
> > > > bank
> > > > > > > > statement. He will say "give me statement from 11th October
> > 2015
> > > to
> > > > > > 23rd
> > > > > > > > October 2015". This clearly communicates in simplified language
> > > > that
> > > > > > user
> > > > > > > > is interested in bank statement from 11th October to 23rd
> > > October.
> > > > If
> > > > > > > > banker now does not return data of 23rd October, it will be a
> > > > > surprise
> > > > > > > for
> > > > > > > > user.
> > > > > > > >
> > > > > > > > Banker can inform every user that To date is exclusive, so if
> > you
> > > > > need
> > > > > > > > statement till 23rd October you should come and tell me "give
> > me
> > > > > > > statement
> > > > > > > > from 11th October to 24th October". I believe a user will
> > > > immediately
> > > > > > > reply
> > > > > > > > => If I need statement only till 23rd October, why shall I
> > > > > communicate
> > > > > > > 11th
> > > > > > > > October to 24th October. If you still insist, next time the
> > user
> > > > will
> > > > > > > come
> > > > > > > > and say "give me statement from 11th October to 24th October
> > with
> > > > > 24th
> > > > > > > > exclusive". System just made it harder for user to communicate
> > > with
> > > > > > > banker.
> > > > > > > >
> > > > > > > > (2) Implementation to achieve cognitive behaviour
> > > > > > > >
> > > > > > > > With To date inclusive, banker needs to make sure that it gives
> > > all
> > > > > > > > transaction records for 23rd October 2015 (To date) at the
> > finest
> > > > > > > > granularity available with the bank. This will satisfy users
> > > > > > requirement
> > > > > > > > without any element of surprise.
> > > > > > > >
> > > > > > > > If the finest granularity is millisecond. It will have a higher
> > > > limit
> > > > > > to
> > > > > > > it
> > > > > > > > which is 999 in most of the systems. Banker will fetch all
> > > > partitions
> > > > > > > until
> > > > > > > > higher limit of finest granularity for 23rd October and return
> > > the
> > > > > > > > statement to user.
> > > > > > > >
> > > > > > > > If there was a leap second/millisecond, it is upto the banker
> > to
> > > > > handle
> > > > > > > and
> > > > > > > > give data for it. When a user wants statement from 11th October
> > > to
> > > > > 23rd
> > > > > > > > October, user should not worry about the leap
> > second/millisecond
> > > > > which
> > > > > > > > happened on any of the days from 11th October to 23rd October
> > > > > > > (inclusive).
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Himanshu
> > > > > > > >
> > > > > > > > On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
> > > > > amareshwari@gmail.com <javascript:;>
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I prefer time-range end date to be exclusive of the end date,
> > > > > because
> > > > > > > of
> > > > > > > > > the following reasons:
> > > > > > > > >
> > > > > > > > > 1. Giving a month range end as start of next month is easy,
> > as
> > > > you
> > > > > > need
> > > > > > > > not
> > > > > > > > > remember and get confused with number of days in month : 30,
> > > 31,
> > > > > > 28/29
> > > > > > > > (in
> > > > > > > > > february leap year) - wont miss ending days because of
> > > mistakes.
> > > > > > > > >
> > > > > > > > > 2. When you are giving a any range, you need not worry about
> > > what
> > > > > > > > > granularity should the time be specified.
> > > > > > > > > For ex, for including time upto 2015-08-01 05th.
> > > > > > > > >
> > > > > > > > > Should user specify the following with inclusive ranges?
> > > > > > > > > > 2015-08-01-04 (would be correct, assuming whole of 4 th
> > hour
> > > > will
> > > > > > be
> > > > > > > > > included)
> > > > > > > > > > 2015-08-01-04-59 (would be correct, but what if there is
> > leap
> > > > > > second
> > > > > > > > :p )
> > > > > > > > > > 2015-08-01-04-59-000 (wrong, would include only 000th milli
> > > > > second
> > > > > > > and
> > > > > > > > > not others)
> > > > > > > > > > 2015-08-01-04-59-999 (not sure if we will have leap milli
> > > > > seconds)
> > > > > > > > >
> > > > > > > > > Looking forward to see other arguments on making timerange
> > end
> > > > date
> > > > > > > > > inclusive.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Amareshwari
> > > > > > > > >
> > > > > > > > > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
> > > > > prongs@apache.org <javascript:;>>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Currently the time_range_in clause is inclusive of the
> > start
> > > > date
> > > > > > and
> > > > > > > > > > exclusive of the end date. i.e.
> > time_range_in(time_dimension,
> > > > > > date1,
> > > > > > > > > date2)
> > > > > > > > > > translates to date1<=time_dimension<date2
> > > > > > > > > >
> > > > > > > > > > Recently there have been multiple discussions regarding
> > this
> > > > > > approach
> > > > > > > > in
> > > > > > > > > > our internal forums at InMobi. There have been arguments
> > that
> > > > as
> > > > > an
> > > > > > > > > > analytics system, it should accept end date inclusive.
> > > Starting
> > > > > > this
> > > > > > > > > thread
> > > > > > > > > > to know opinions of other people regarding both approaches.
> > > I'm
> > > > > > > > defining
> > > > > > > > > > and briefly discussing both approaches below:
> > > > > > > > > >
> > > > > > > > > > Approach 1: End date exclusive.
> > > > > > > > > > Disclaimer: I personally prefer this approach and would
> > > > > shamelessly
> > > > > > > > argue
> > > > > > > > > > in its favor. I'll give links to some of the articles that
> > I
> > > > very
> > > > > > > much
> > > > > > > > > > agree with (no need to reinvent the wheel of argument)
> > > > > > > > > >
> > > > > > > > > >    1. Views of Anders Kaseorg, MIT PhD student in CS;
> > > Cofounder
> > > > > of
> > > > > > > > > Ksplice,
> > > > > > > > > >    Inc
> > > > > > > > > >    <
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > > > > > > > > >
> > > > > > > > > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's
> > > algorithm
> > > > > > > > > >    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > > > > > > > > >    3. Both arguments are for generic ranges. I believe time
> > > > range
> > > > > > is
> > > > > > > > > also a
> > > > > > > > > >    range and don't see why we need to handle that as a
> > > special
> > > > > > case.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Approach 2: End date inclusive.
> > > > > > > > > >
> > > > > > > > > >    1. Lens query language is a bit like sql and the BETWEEN
> > > > > clause
> > > > > > in
> > > > > > > > sql
> > > > > > > > > >    is inclusive of both start and end date.
> > > > > > > > > >    2. Most analytic systems --- including google analytics
> > > ---
> > > > > > follow
> > > > > > > > > this
> > > > > > > > > >    convention
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I'm posting the question on a bigger forum
> > > > > > > > > > <
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > > > > > > > > >
> > > > > > > > > > as well in hopes of getting more opinions. I request
> > everyone
> > > > to
> > > > > > post
> > > > > > > > > their
> > > > > > > > > > views here regarding both approaches.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > _____________________________________________________________
> > > > > > > > The information contained in this communication is intended
> > > solely
> > > > > for
> > > > > > > the
> > > > > > > > use of the individual or entity to whom it is addressed and
> > > others
> > > > > > > > authorized to receive it. It may contain confidential or
> > legally
> > > > > > > privileged
> > > > > > > > information. If you are not the intended recipient you are
> > hereby
> > > > > > > notified
> > > > > > > > that any disclosure, copying, distribution or taking any action
> > > in
> > > > > > > reliance
> > > > > > > > on the contents of this information is strictly prohibited and
> > > may
> > > > be
> > > > > > > > unlawful. If you have received this communication in error,
> > > please
> > > > > > notify
> > > > > > > > us immediately by responding to this email and then delete it
> > > from
> > > > > your
> > > > > > > > system. The firm is neither liable for the proper and complete
> > > > > > > transmission
> > > > > > > > of the information contained in this communication nor for any
> > > > delay
> > > > > in
> > > > > > > its
> > > > > > > > receipt.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > _____________________________________________________________
> > > > > > The information contained in this communication is intended solely
> > > for
> > > > > the
> > > > > > use of the individual or entity to whom it is addressed and others
> > > > > > authorized to receive it. It may contain confidential or legally
> > > > > privileged
> > > > > > information. If you are not the intended recipient you are hereby
> > > > > notified
> > > > > > that any disclosure, copying, distribution or taking any action in
> > > > > reliance
> > > > > > on the contents of this information is strictly prohibited and may
> > be
> > > > > > unlawful. If you have received this communication in error, please
> > > > notify
> > > > > > us immediately by responding to this email and then delete it from
> > > your
> > > > > > system. The firm is neither liable for the proper and complete
> > > > > transmission
> > > > > > of the information contained in this communication nor for any
> > delay
> > > in
> > > > > its
> > > > > > receipt.
> > > > > >
> > > > >
> > > >
> > > > --
> > > > _____________________________________________________________
> > > > The information contained in this communication is intended solely for
> > > the
> > > > use of the individual or entity to whom it is addressed and others
> > > > authorized to receive it. It may contain confidential or legally
> > > privileged
> > > > information. If you are not the intended recipient you are hereby
> > > notified
> > > > that any disclosure, copying, distribution or taking any action in
> > > reliance
> > > > on the contents of this information is strictly prohibited and may be
> > > > unlawful. If you have received this communication in error, please
> > notify
> > > > us immediately by responding to this email and then delete it from your
> > > > system. The firm is neither liable for the proper and complete
> > > transmission
> > > > of the information contained in this communication nor for any delay in
> > > its
> > > > receipt.
> > > >
> > >
> >
> > --
> > _____________________________________________________________
> > The information contained in this communication is intended solely for the
> > use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally privileged
> > information. If you are not the intended recipient you are hereby notified
> > that any disclosure, copying, distribution or taking any action in reliance
> > on the contents of this information is strictly prohibited and may be
> > unlawful. If you have received this communication in error, please notify
> > us immediately by responding to this email and then delete it from your
> > system. The firm is neither liable for the proper and complete transmission
> > of the information contained in this communication nor for any delay in its
> > receipt.
> >
> 
> -- 
> _____________________________________________________________
> The information contained in this communication is intended solely for the 
> use of the individual or entity to whom it is addressed and others 
> authorized to receive it. It may contain confidential or legally privileged 
> information. If you are not the intended recipient you are hereby notified 
> that any disclosure, copying, distribution or taking any action in reliance 
> on the contents of this information is strictly prohibited and may be 
> unlawful. If you have received this communication in error, please notify 
> us immediately by responding to this email and then delete it from your 
> system. The firm is neither liable for the proper and complete transmission 
> of the information contained in this communication nor for any delay in its 
> receipt.
 		 	   		  

Re: [DISCUSS] Time Ranges in Lens

Posted by Nitin Gupta <ni...@inmobi.com>.
I believe Users seem to think of a time range as half open, a meeting
appointment of 1:00 pm - 2:00 pm means a one hour appointment. But with
date ranges, the expectation is to include the end date. A date range
specification of Jan 1 - Jan 31 would mean inclusion of Jan 31.
To be able to make a decision here, I believe we follow what existing
Analytics and Data Query systems do. That is to include the end time
specified as available with Google Analytics or Support for BETWEEN in SQL.
That may mean more complexity on the implementation but that's a hit we
will have to take.

Thanks,
Nitin


On Wed, Aug 5, 2015 at 7:06 PM, Himanshu Gahlaut <
himanshu.gahlaut@inmobi.com> wrote:

> Have given a real world use case of banking system where making to
> field exclusive makes the communication harder for user.
>
> Please give a real world use case which makes things harder for user by
> making to field inclusive OR a real world use case which is impossible to
> implement by making to field inclusive. So far the points shared below for
> making to date exclusive are hard to understand and it is difficult to
> comprehend the problem which is coming in way of making to date inclusive.
>
> On Wednesday, August 5, 2015, Rajat Khandelwal <ra...@gmail.com>
> wrote:
>
> > Time range consists of a start time and an end time. If start time and
> end
> > time are same, then we would say the ranges are equal.
> >
> > The three ranges in my example are same by this definition of equality.
> > While approach 2 hinders that.
> >
> >
> > On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <
> > himanshu.gahlaut@inmobi.com <javascript:;>>
> > wrote:
> >
> > > Why is this a problem ?
> > >
> > > On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <
> rajatgupta59@gmail.com
> > <javascript:;>>
> > > wrote:
> > >
> > > > Problem with that is, that the meaning of range changes with
> > granularity.
> > > > While with approach 1, it's not the case.
> > > >
> > > > A Range of 2015-01-01 to 2015-01-02 would mean 2 days, 2015-01-01-00
> to
> > > > 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
> > > > 2015-01-02-00-00 would mean 2 days and 1 minute, and
> > 2015-01-01-00-00-00
> > > to
> > > > 2015-01-02-00-00-00 would mean 2 days and 1 second.
> > > >
> > > > While approach 1 has no such problem. A given range doesn't change
> > with a
> > > > change in granularity. Every one of these examples would mean 1 day.
> > > >
> > > > Here I've begun the argument with day granularity and went deeper.
> The
> > > same
> > > > argument can be started from a Yearly granularity and there the
> > different
> > > > would be more prominent.
> > > >
> > > > Regards
> > > >
> > > > On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
> > > > himanshu.gahlaut@inmobi.com <javascript:;>>
> > > > wrote:
> > > >
> > > > > Say granularity is an ordered list of nodes where every node in
> list
> > is
> > > > > parent of its next node.
> > > > >
> > > > > For a System X: all granularity could be (Year => Month => Day).
> > > > > For System Y: all granularity could be (Year => Month => Day =>
> Hour
> > =>
> > > > > Minutes => Seconds => Milliseconds => Microseconds => Picosecond =>
> > > > > Femtosecond => Attosecond)
> > > > >
> > > > > As long as every node has a logic defined to compute the lower
> limit
> > > and
> > > > > upper limit value, implementation should be able to fetch all the
> > data
> > > > > under its parent node.
> > > > >
> > > > > E.G:
> > > > >
> > > > > To get all the data for 2015-10-31, implementation needs to convert
> > > this
> > > > > into 2015-10-31-LowerLimitHourNode to 2015-10-31-UpperLimitHourNode
> > to
> > > > > fetch data from hourly partitions, where lower limit hour = 00 and
> > > upper
> > > > > limit hour = 23. Similarly any granularity can be handled at
> > > > implementation
> > > > > level.
> > > > >
> > > > >
> > > > > I guess a banking system in real world would be worried about very
> > fine
> > > > > granularities of time.
> > > > >
> > > > > Even if we assume banking system has day as the finest
> granularity. A
> > > > user
> > > > > can still ask for statement from the month of April, 2015 to
> October,
> > > > 2015
> > > > > without specifying the specific date fields. In this case bank will
> > > have
> > > > to
> > > > > execute the logic to fetch all transactions from April to October
> > with
> > > > both
> > > > > inclusive and also handling its finest granularity which is assumed
> > as
> > > > day.
> > > > > In this case a banking system would convert 2015-04 to 2015-10 into
> > > > > 2015-04-01 to 2015-10-31 to fetch all data from daily partitions,
> > where
> > > > 01
> > > > > is the lower limit for day node in month of April and 31 is the
> > higher
> > > > > limit for day node in month of October.
> > > > >
> > > > > If data is not available in daily partitions and bank's underlying
> > > system
> > > > > supports further fine time granularities, it will convert 2015-04
> to
> > > > > 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from
> hourly
> > > > > partitions.
> > > > >
> > > > > Similarly if data is not available at hourly partitions, it will go
> > to
> > > > the
> > > > > next level. As long as an upper limit and lower limit is defined
> for
> > > each
> > > > > node, implementation can handle any time granularity.
> > > > >
> > > > > Meta point is that in real world, when a user will communicate to
> get
> > > > > statement of April to October. His intentions will never be to
> > exclude
> > > > > October. Exclusion of October will be a surprise for him.
> > > > >
> > > > > Regards,
> > > > > Himanshu
> > > > >
> > > > > On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <
> > amareshwari@gmail.com <javascript:;>
> > > >
> > > > > wrote:
> > > > >
> > > > > > When a system has to understand time at all granularities, and
> user
> > > can
> > > > > > give time at all granularities, shouldn't we chose a
> specification
> > > > which
> > > > > is
> > > > > > easy and error prone.
> > > > > >
> > > > > > The example of banking system is not worried about finer
> > > granularities
> > > > of
> > > > > > time.
> > > > > >
> > > > > > On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> > > > > > himanshu.gahlaut@inmobi.com <javascript:;>> wrote:
> > > > > >
> > > > > > > This has two parts:
> > > > > > >
> > > > > > > (1) Interfacing with user shall be cognitive. Something which
> > user
> > > > can
> > > > > > > relate to his natural thought process.
> > > > > > >
> > > > > > > E.g.: In a banking system, if a user goes to a banker to get
> his
> > > bank
> > > > > > > statement. He will say "give me statement from 11th October
> 2015
> > to
> > > > > 23rd
> > > > > > > October 2015". This clearly communicates in simplified language
> > > that
> > > > > user
> > > > > > > is interested in bank statement from 11th October to 23rd
> > October.
> > > If
> > > > > > > banker now does not return data of 23rd October, it will be a
> > > > surprise
> > > > > > for
> > > > > > > user.
> > > > > > >
> > > > > > > Banker can inform every user that To date is exclusive, so if
> you
> > > > need
> > > > > > > statement till 23rd October you should come and tell me "give
> me
> > > > > > statement
> > > > > > > from 11th October to 24th October". I believe a user will
> > > immediately
> > > > > > reply
> > > > > > > => If I need statement only till 23rd October, why shall I
> > > > communicate
> > > > > > 11th
> > > > > > > October to 24th October. If you still insist, next time the
> user
> > > will
> > > > > > come
> > > > > > > and say "give me statement from 11th October to 24th October
> with
> > > > 24th
> > > > > > > exclusive". System just made it harder for user to communicate
> > with
> > > > > > banker.
> > > > > > >
> > > > > > > (2) Implementation to achieve cognitive behaviour
> > > > > > >
> > > > > > > With To date inclusive, banker needs to make sure that it gives
> > all
> > > > > > > transaction records for 23rd October 2015 (To date) at the
> finest
> > > > > > > granularity available with the bank. This will satisfy users
> > > > > requirement
> > > > > > > without any element of surprise.
> > > > > > >
> > > > > > > If the finest granularity is millisecond. It will have a higher
> > > limit
> > > > > to
> > > > > > it
> > > > > > > which is 999 in most of the systems. Banker will fetch all
> > > partitions
> > > > > > until
> > > > > > > higher limit of finest granularity for 23rd October and return
> > the
> > > > > > > statement to user.
> > > > > > >
> > > > > > > If there was a leap second/millisecond, it is upto the banker
> to
> > > > handle
> > > > > > and
> > > > > > > give data for it. When a user wants statement from 11th October
> > to
> > > > 23rd
> > > > > > > October, user should not worry about the leap
> second/millisecond
> > > > which
> > > > > > > happened on any of the days from 11th October to 23rd October
> > > > > > (inclusive).
> > > > > > >
> > > > > > > Regards,
> > > > > > > Himanshu
> > > > > > >
> > > > > > > On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
> > > > amareshwari@gmail.com <javascript:;>
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I prefer time-range end date to be exclusive of the end date,
> > > > because
> > > > > > of
> > > > > > > > the following reasons:
> > > > > > > >
> > > > > > > > 1. Giving a month range end as start of next month is easy,
> as
> > > you
> > > > > need
> > > > > > > not
> > > > > > > > remember and get confused with number of days in month : 30,
> > 31,
> > > > > 28/29
> > > > > > > (in
> > > > > > > > february leap year) - wont miss ending days because of
> > mistakes.
> > > > > > > >
> > > > > > > > 2. When you are giving a any range, you need not worry about
> > what
> > > > > > > > granularity should the time be specified.
> > > > > > > > For ex, for including time upto 2015-08-01 05th.
> > > > > > > >
> > > > > > > > Should user specify the following with inclusive ranges?
> > > > > > > > > 2015-08-01-04 (would be correct, assuming whole of 4 th
> hour
> > > will
> > > > > be
> > > > > > > > included)
> > > > > > > > > 2015-08-01-04-59 (would be correct, but what if there is
> leap
> > > > > second
> > > > > > > :p )
> > > > > > > > > 2015-08-01-04-59-000 (wrong, would include only 000th milli
> > > > second
> > > > > > and
> > > > > > > > not others)
> > > > > > > > > 2015-08-01-04-59-999 (not sure if we will have leap milli
> > > > seconds)
> > > > > > > >
> > > > > > > > Looking forward to see other arguments on making timerange
> end
> > > date
> > > > > > > > inclusive.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Amareshwari
> > > > > > > >
> > > > > > > > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
> > > > prongs@apache.org <javascript:;>>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Currently the time_range_in clause is inclusive of the
> start
> > > date
> > > > > and
> > > > > > > > > exclusive of the end date. i.e.
> time_range_in(time_dimension,
> > > > > date1,
> > > > > > > > date2)
> > > > > > > > > translates to date1<=time_dimension<date2
> > > > > > > > >
> > > > > > > > > Recently there have been multiple discussions regarding
> this
> > > > > approach
> > > > > > > in
> > > > > > > > > our internal forums at InMobi. There have been arguments
> that
> > > as
> > > > an
> > > > > > > > > analytics system, it should accept end date inclusive.
> > Starting
> > > > > this
> > > > > > > > thread
> > > > > > > > > to know opinions of other people regarding both approaches.
> > I'm
> > > > > > > defining
> > > > > > > > > and briefly discussing both approaches below:
> > > > > > > > >
> > > > > > > > > Approach 1: End date exclusive.
> > > > > > > > > Disclaimer: I personally prefer this approach and would
> > > > shamelessly
> > > > > > > argue
> > > > > > > > > in its favor. I'll give links to some of the articles that
> I
> > > very
> > > > > > much
> > > > > > > > > agree with (no need to reinvent the wheel of argument)
> > > > > > > > >
> > > > > > > > >    1. Views of Anders Kaseorg, MIT PhD student in CS;
> > Cofounder
> > > > of
> > > > > > > > Ksplice,
> > > > > > > > >    Inc
> > > > > > > > >    <
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > > > > > > > >
> > > > > > > > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's
> > algorithm
> > > > > > > > >    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > > > > > > > >    3. Both arguments are for generic ranges. I believe time
> > > range
> > > > > is
> > > > > > > > also a
> > > > > > > > >    range and don't see why we need to handle that as a
> > special
> > > > > case.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Approach 2: End date inclusive.
> > > > > > > > >
> > > > > > > > >    1. Lens query language is a bit like sql and the BETWEEN
> > > > clause
> > > > > in
> > > > > > > sql
> > > > > > > > >    is inclusive of both start and end date.
> > > > > > > > >    2. Most analytic systems --- including google analytics
> > ---
> > > > > follow
> > > > > > > > this
> > > > > > > > >    convention
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm posting the question on a bigger forum
> > > > > > > > > <
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > > > > > > > >
> > > > > > > > > as well in hopes of getting more opinions. I request
> everyone
> > > to
> > > > > post
> > > > > > > > their
> > > > > > > > > views here regarding both approaches.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > _____________________________________________________________
> > > > > > > The information contained in this communication is intended
> > solely
> > > > for
> > > > > > the
> > > > > > > use of the individual or entity to whom it is addressed and
> > others
> > > > > > > authorized to receive it. It may contain confidential or
> legally
> > > > > > privileged
> > > > > > > information. If you are not the intended recipient you are
> hereby
> > > > > > notified
> > > > > > > that any disclosure, copying, distribution or taking any action
> > in
> > > > > > reliance
> > > > > > > on the contents of this information is strictly prohibited and
> > may
> > > be
> > > > > > > unlawful. If you have received this communication in error,
> > please
> > > > > notify
> > > > > > > us immediately by responding to this email and then delete it
> > from
> > > > your
> > > > > > > system. The firm is neither liable for the proper and complete
> > > > > > transmission
> > > > > > > of the information contained in this communication nor for any
> > > delay
> > > > in
> > > > > > its
> > > > > > > receipt.
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > _____________________________________________________________
> > > > > The information contained in this communication is intended solely
> > for
> > > > the
> > > > > use of the individual or entity to whom it is addressed and others
> > > > > authorized to receive it. It may contain confidential or legally
> > > > privileged
> > > > > information. If you are not the intended recipient you are hereby
> > > > notified
> > > > > that any disclosure, copying, distribution or taking any action in
> > > > reliance
> > > > > on the contents of this information is strictly prohibited and may
> be
> > > > > unlawful. If you have received this communication in error, please
> > > notify
> > > > > us immediately by responding to this email and then delete it from
> > your
> > > > > system. The firm is neither liable for the proper and complete
> > > > transmission
> > > > > of the information contained in this communication nor for any
> delay
> > in
> > > > its
> > > > > receipt.
> > > > >
> > > >
> > >
> > > --
> > > _____________________________________________________________
> > > The information contained in this communication is intended solely for
> > the
> > > use of the individual or entity to whom it is addressed and others
> > > authorized to receive it. It may contain confidential or legally
> > privileged
> > > information. If you are not the intended recipient you are hereby
> > notified
> > > that any disclosure, copying, distribution or taking any action in
> > reliance
> > > on the contents of this information is strictly prohibited and may be
> > > unlawful. If you have received this communication in error, please
> notify
> > > us immediately by responding to this email and then delete it from your
> > > system. The firm is neither liable for the proper and complete
> > transmission
> > > of the information contained in this communication nor for any delay in
> > its
> > > receipt.
> > >
> >
>
> --
> _____________________________________________________________
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify
> us immediately by responding to this email and then delete it from your
> system. The firm is neither liable for the proper and complete transmission
> of the information contained in this communication nor for any delay in its
> receipt.
>

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: [DISCUSS] Time Ranges in Lens

Posted by Himanshu Gahlaut <hi...@inmobi.com>.
Have given a real world use case of banking system where making to
field exclusive makes the communication harder for user.

Please give a real world use case which makes things harder for user by
making to field inclusive OR a real world use case which is impossible to
implement by making to field inclusive. So far the points shared below for
making to date exclusive are hard to understand and it is difficult to
comprehend the problem which is coming in way of making to date inclusive.

On Wednesday, August 5, 2015, Rajat Khandelwal <ra...@gmail.com>
wrote:

> Time range consists of a start time and an end time. If start time and end
> time are same, then we would say the ranges are equal.
>
> The three ranges in my example are same by this definition of equality.
> While approach 2 hinders that.
>
>
> On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <
> himanshu.gahlaut@inmobi.com <javascript:;>>
> wrote:
>
> > Why is this a problem ?
> >
> > On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <rajatgupta59@gmail.com
> <javascript:;>>
> > wrote:
> >
> > > Problem with that is, that the meaning of range changes with
> granularity.
> > > While with approach 1, it's not the case.
> > >
> > > A Range of 2015-01-01 to 2015-01-02 would mean 2 days, 2015-01-01-00 to
> > > 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
> > > 2015-01-02-00-00 would mean 2 days and 1 minute, and
> 2015-01-01-00-00-00
> > to
> > > 2015-01-02-00-00-00 would mean 2 days and 1 second.
> > >
> > > While approach 1 has no such problem. A given range doesn't change
> with a
> > > change in granularity. Every one of these examples would mean 1 day.
> > >
> > > Here I've begun the argument with day granularity and went deeper. The
> > same
> > > argument can be started from a Yearly granularity and there the
> different
> > > would be more prominent.
> > >
> > > Regards
> > >
> > > On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
> > > himanshu.gahlaut@inmobi.com <javascript:;>>
> > > wrote:
> > >
> > > > Say granularity is an ordered list of nodes where every node in list
> is
> > > > parent of its next node.
> > > >
> > > > For a System X: all granularity could be (Year => Month => Day).
> > > > For System Y: all granularity could be (Year => Month => Day => Hour
> =>
> > > > Minutes => Seconds => Milliseconds => Microseconds => Picosecond =>
> > > > Femtosecond => Attosecond)
> > > >
> > > > As long as every node has a logic defined to compute the lower limit
> > and
> > > > upper limit value, implementation should be able to fetch all the
> data
> > > > under its parent node.
> > > >
> > > > E.G:
> > > >
> > > > To get all the data for 2015-10-31, implementation needs to convert
> > this
> > > > into 2015-10-31-LowerLimitHourNode to 2015-10-31-UpperLimitHourNode
> to
> > > > fetch data from hourly partitions, where lower limit hour = 00 and
> > upper
> > > > limit hour = 23. Similarly any granularity can be handled at
> > > implementation
> > > > level.
> > > >
> > > >
> > > > I guess a banking system in real world would be worried about very
> fine
> > > > granularities of time.
> > > >
> > > > Even if we assume banking system has day as the finest granularity. A
> > > user
> > > > can still ask for statement from the month of April, 2015 to October,
> > > 2015
> > > > without specifying the specific date fields. In this case bank will
> > have
> > > to
> > > > execute the logic to fetch all transactions from April to October
> with
> > > both
> > > > inclusive and also handling its finest granularity which is assumed
> as
> > > day.
> > > > In this case a banking system would convert 2015-04 to 2015-10 into
> > > > 2015-04-01 to 2015-10-31 to fetch all data from daily partitions,
> where
> > > 01
> > > > is the lower limit for day node in month of April and 31 is the
> higher
> > > > limit for day node in month of October.
> > > >
> > > > If data is not available in daily partitions and bank's underlying
> > system
> > > > supports further fine time granularities, it will convert 2015-04 to
> > > > 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from hourly
> > > > partitions.
> > > >
> > > > Similarly if data is not available at hourly partitions, it will go
> to
> > > the
> > > > next level. As long as an upper limit and lower limit is defined for
> > each
> > > > node, implementation can handle any time granularity.
> > > >
> > > > Meta point is that in real world, when a user will communicate to get
> > > > statement of April to October. His intentions will never be to
> exclude
> > > > October. Exclusion of October will be a surprise for him.
> > > >
> > > > Regards,
> > > > Himanshu
> > > >
> > > > On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <
> amareshwari@gmail.com <javascript:;>
> > >
> > > > wrote:
> > > >
> > > > > When a system has to understand time at all granularities, and user
> > can
> > > > > give time at all granularities, shouldn't we chose a specification
> > > which
> > > > is
> > > > > easy and error prone.
> > > > >
> > > > > The example of banking system is not worried about finer
> > granularities
> > > of
> > > > > time.
> > > > >
> > > > > On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> > > > > himanshu.gahlaut@inmobi.com <javascript:;>> wrote:
> > > > >
> > > > > > This has two parts:
> > > > > >
> > > > > > (1) Interfacing with user shall be cognitive. Something which
> user
> > > can
> > > > > > relate to his natural thought process.
> > > > > >
> > > > > > E.g.: In a banking system, if a user goes to a banker to get his
> > bank
> > > > > > statement. He will say "give me statement from 11th October 2015
> to
> > > > 23rd
> > > > > > October 2015". This clearly communicates in simplified language
> > that
> > > > user
> > > > > > is interested in bank statement from 11th October to 23rd
> October.
> > If
> > > > > > banker now does not return data of 23rd October, it will be a
> > > surprise
> > > > > for
> > > > > > user.
> > > > > >
> > > > > > Banker can inform every user that To date is exclusive, so if you
> > > need
> > > > > > statement till 23rd October you should come and tell me "give me
> > > > > statement
> > > > > > from 11th October to 24th October". I believe a user will
> > immediately
> > > > > reply
> > > > > > => If I need statement only till 23rd October, why shall I
> > > communicate
> > > > > 11th
> > > > > > October to 24th October. If you still insist, next time the user
> > will
> > > > > come
> > > > > > and say "give me statement from 11th October to 24th October with
> > > 24th
> > > > > > exclusive". System just made it harder for user to communicate
> with
> > > > > banker.
> > > > > >
> > > > > > (2) Implementation to achieve cognitive behaviour
> > > > > >
> > > > > > With To date inclusive, banker needs to make sure that it gives
> all
> > > > > > transaction records for 23rd October 2015 (To date) at the finest
> > > > > > granularity available with the bank. This will satisfy users
> > > > requirement
> > > > > > without any element of surprise.
> > > > > >
> > > > > > If the finest granularity is millisecond. It will have a higher
> > limit
> > > > to
> > > > > it
> > > > > > which is 999 in most of the systems. Banker will fetch all
> > partitions
> > > > > until
> > > > > > higher limit of finest granularity for 23rd October and return
> the
> > > > > > statement to user.
> > > > > >
> > > > > > If there was a leap second/millisecond, it is upto the banker to
> > > handle
> > > > > and
> > > > > > give data for it. When a user wants statement from 11th October
> to
> > > 23rd
> > > > > > October, user should not worry about the leap second/millisecond
> > > which
> > > > > > happened on any of the days from 11th October to 23rd October
> > > > > (inclusive).
> > > > > >
> > > > > > Regards,
> > > > > > Himanshu
> > > > > >
> > > > > > On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
> > > amareshwari@gmail.com <javascript:;>
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I prefer time-range end date to be exclusive of the end date,
> > > because
> > > > > of
> > > > > > > the following reasons:
> > > > > > >
> > > > > > > 1. Giving a month range end as start of next month is easy, as
> > you
> > > > need
> > > > > > not
> > > > > > > remember and get confused with number of days in month : 30,
> 31,
> > > > 28/29
> > > > > > (in
> > > > > > > february leap year) - wont miss ending days because of
> mistakes.
> > > > > > >
> > > > > > > 2. When you are giving a any range, you need not worry about
> what
> > > > > > > granularity should the time be specified.
> > > > > > > For ex, for including time upto 2015-08-01 05th.
> > > > > > >
> > > > > > > Should user specify the following with inclusive ranges?
> > > > > > > > 2015-08-01-04 (would be correct, assuming whole of 4 th hour
> > will
> > > > be
> > > > > > > included)
> > > > > > > > 2015-08-01-04-59 (would be correct, but what if there is leap
> > > > second
> > > > > > :p )
> > > > > > > > 2015-08-01-04-59-000 (wrong, would include only 000th milli
> > > second
> > > > > and
> > > > > > > not others)
> > > > > > > > 2015-08-01-04-59-999 (not sure if we will have leap milli
> > > seconds)
> > > > > > >
> > > > > > > Looking forward to see other arguments on making timerange end
> > date
> > > > > > > inclusive.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Amareshwari
> > > > > > >
> > > > > > > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
> > > prongs@apache.org <javascript:;>>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Currently the time_range_in clause is inclusive of the start
> > date
> > > > and
> > > > > > > > exclusive of the end date. i.e. time_range_in(time_dimension,
> > > > date1,
> > > > > > > date2)
> > > > > > > > translates to date1<=time_dimension<date2
> > > > > > > >
> > > > > > > > Recently there have been multiple discussions regarding this
> > > > approach
> > > > > > in
> > > > > > > > our internal forums at InMobi. There have been arguments that
> > as
> > > an
> > > > > > > > analytics system, it should accept end date inclusive.
> Starting
> > > > this
> > > > > > > thread
> > > > > > > > to know opinions of other people regarding both approaches.
> I'm
> > > > > > defining
> > > > > > > > and briefly discussing both approaches below:
> > > > > > > >
> > > > > > > > Approach 1: End date exclusive.
> > > > > > > > Disclaimer: I personally prefer this approach and would
> > > shamelessly
> > > > > > argue
> > > > > > > > in its favor. I'll give links to some of the articles that I
> > very
> > > > > much
> > > > > > > > agree with (no need to reinvent the wheel of argument)
> > > > > > > >
> > > > > > > >    1. Views of Anders Kaseorg, MIT PhD student in CS;
> Cofounder
> > > of
> > > > > > > Ksplice,
> > > > > > > >    Inc
> > > > > > > >    <
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > > > > > > >
> > > > > > > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's
> algorithm
> > > > > > > >    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > > > > > > >    3. Both arguments are for generic ranges. I believe time
> > range
> > > > is
> > > > > > > also a
> > > > > > > >    range and don't see why we need to handle that as a
> special
> > > > case.
> > > > > > > >
> > > > > > > >
> > > > > > > > Approach 2: End date inclusive.
> > > > > > > >
> > > > > > > >    1. Lens query language is a bit like sql and the BETWEEN
> > > clause
> > > > in
> > > > > > sql
> > > > > > > >    is inclusive of both start and end date.
> > > > > > > >    2. Most analytic systems --- including google analytics
> ---
> > > > follow
> > > > > > > this
> > > > > > > >    convention
> > > > > > > >
> > > > > > > >
> > > > > > > > I'm posting the question on a bigger forum
> > > > > > > > <
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > > > > > > >
> > > > > > > > as well in hopes of getting more opinions. I request everyone
> > to
> > > > post
> > > > > > > their
> > > > > > > > views here regarding both approaches.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > _____________________________________________________________
> > > > > > The information contained in this communication is intended
> solely
> > > for
> > > > > the
> > > > > > use of the individual or entity to whom it is addressed and
> others
> > > > > > authorized to receive it. It may contain confidential or legally
> > > > > privileged
> > > > > > information. If you are not the intended recipient you are hereby
> > > > > notified
> > > > > > that any disclosure, copying, distribution or taking any action
> in
> > > > > reliance
> > > > > > on the contents of this information is strictly prohibited and
> may
> > be
> > > > > > unlawful. If you have received this communication in error,
> please
> > > > notify
> > > > > > us immediately by responding to this email and then delete it
> from
> > > your
> > > > > > system. The firm is neither liable for the proper and complete
> > > > > transmission
> > > > > > of the information contained in this communication nor for any
> > delay
> > > in
> > > > > its
> > > > > > receipt.
> > > > > >
> > > > >
> > > >
> > > > --
> > > > _____________________________________________________________
> > > > The information contained in this communication is intended solely
> for
> > > the
> > > > use of the individual or entity to whom it is addressed and others
> > > > authorized to receive it. It may contain confidential or legally
> > > privileged
> > > > information. If you are not the intended recipient you are hereby
> > > notified
> > > > that any disclosure, copying, distribution or taking any action in
> > > reliance
> > > > on the contents of this information is strictly prohibited and may be
> > > > unlawful. If you have received this communication in error, please
> > notify
> > > > us immediately by responding to this email and then delete it from
> your
> > > > system. The firm is neither liable for the proper and complete
> > > transmission
> > > > of the information contained in this communication nor for any delay
> in
> > > its
> > > > receipt.
> > > >
> > >
> >
> > --
> > _____________________________________________________________
> > The information contained in this communication is intended solely for
> the
> > use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally
> privileged
> > information. If you are not the intended recipient you are hereby
> notified
> > that any disclosure, copying, distribution or taking any action in
> reliance
> > on the contents of this information is strictly prohibited and may be
> > unlawful. If you have received this communication in error, please notify
> > us immediately by responding to this email and then delete it from your
> > system. The firm is neither liable for the proper and complete
> transmission
> > of the information contained in this communication nor for any delay in
> its
> > receipt.
> >
>

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: [DISCUSS] Time Ranges in Lens

Posted by Rajat Khandelwal <ra...@gmail.com>.
Time range consists of a start time and an end time. If start time and end
time are same, then we would say the ranges are equal.

The three ranges in my example are same by this definition of equality.
While approach 2 hinders that.


On Wed, Aug 5, 2015 at 6:09 PM Himanshu Gahlaut <hi...@inmobi.com>
wrote:

> Why is this a problem ?
>
> On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <ra...@gmail.com>
> wrote:
>
> > Problem with that is, that the meaning of range changes with granularity.
> > While with approach 1, it's not the case.
> >
> > A Range of 2015-01-01 to 2015-01-02 would mean 2 days, 2015-01-01-00 to
> > 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
> > 2015-01-02-00-00 would mean 2 days and 1 minute, and 2015-01-01-00-00-00
> to
> > 2015-01-02-00-00-00 would mean 2 days and 1 second.
> >
> > While approach 1 has no such problem. A given range doesn't change with a
> > change in granularity. Every one of these examples would mean 1 day.
> >
> > Here I've begun the argument with day granularity and went deeper. The
> same
> > argument can be started from a Yearly granularity and there the different
> > would be more prominent.
> >
> > Regards
> >
> > On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
> > himanshu.gahlaut@inmobi.com>
> > wrote:
> >
> > > Say granularity is an ordered list of nodes where every node in list is
> > > parent of its next node.
> > >
> > > For a System X: all granularity could be (Year => Month => Day).
> > > For System Y: all granularity could be (Year => Month => Day => Hour =>
> > > Minutes => Seconds => Milliseconds => Microseconds => Picosecond =>
> > > Femtosecond => Attosecond)
> > >
> > > As long as every node has a logic defined to compute the lower limit
> and
> > > upper limit value, implementation should be able to fetch all the data
> > > under its parent node.
> > >
> > > E.G:
> > >
> > > To get all the data for 2015-10-31, implementation needs to convert
> this
> > > into 2015-10-31-LowerLimitHourNode to 2015-10-31-UpperLimitHourNode to
> > > fetch data from hourly partitions, where lower limit hour = 00 and
> upper
> > > limit hour = 23. Similarly any granularity can be handled at
> > implementation
> > > level.
> > >
> > >
> > > I guess a banking system in real world would be worried about very fine
> > > granularities of time.
> > >
> > > Even if we assume banking system has day as the finest granularity. A
> > user
> > > can still ask for statement from the month of April, 2015 to October,
> > 2015
> > > without specifying the specific date fields. In this case bank will
> have
> > to
> > > execute the logic to fetch all transactions from April to October with
> > both
> > > inclusive and also handling its finest granularity which is assumed as
> > day.
> > > In this case a banking system would convert 2015-04 to 2015-10 into
> > > 2015-04-01 to 2015-10-31 to fetch all data from daily partitions, where
> > 01
> > > is the lower limit for day node in month of April and 31 is the higher
> > > limit for day node in month of October.
> > >
> > > If data is not available in daily partitions and bank's underlying
> system
> > > supports further fine time granularities, it will convert 2015-04 to
> > > 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from hourly
> > > partitions.
> > >
> > > Similarly if data is not available at hourly partitions, it will go to
> > the
> > > next level. As long as an upper limit and lower limit is defined for
> each
> > > node, implementation can handle any time granularity.
> > >
> > > Meta point is that in real world, when a user will communicate to get
> > > statement of April to October. His intentions will never be to exclude
> > > October. Exclusion of October will be a surprise for him.
> > >
> > > Regards,
> > > Himanshu
> > >
> > > On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <amareshwari@gmail.com
> >
> > > wrote:
> > >
> > > > When a system has to understand time at all granularities, and user
> can
> > > > give time at all granularities, shouldn't we chose a specification
> > which
> > > is
> > > > easy and error prone.
> > > >
> > > > The example of banking system is not worried about finer
> granularities
> > of
> > > > time.
> > > >
> > > > On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> > > > himanshu.gahlaut@inmobi.com> wrote:
> > > >
> > > > > This has two parts:
> > > > >
> > > > > (1) Interfacing with user shall be cognitive. Something which user
> > can
> > > > > relate to his natural thought process.
> > > > >
> > > > > E.g.: In a banking system, if a user goes to a banker to get his
> bank
> > > > > statement. He will say "give me statement from 11th October 2015 to
> > > 23rd
> > > > > October 2015". This clearly communicates in simplified language
> that
> > > user
> > > > > is interested in bank statement from 11th October to 23rd October.
> If
> > > > > banker now does not return data of 23rd October, it will be a
> > surprise
> > > > for
> > > > > user.
> > > > >
> > > > > Banker can inform every user that To date is exclusive, so if you
> > need
> > > > > statement till 23rd October you should come and tell me "give me
> > > > statement
> > > > > from 11th October to 24th October". I believe a user will
> immediately
> > > > reply
> > > > > => If I need statement only till 23rd October, why shall I
> > communicate
> > > > 11th
> > > > > October to 24th October. If you still insist, next time the user
> will
> > > > come
> > > > > and say "give me statement from 11th October to 24th October with
> > 24th
> > > > > exclusive". System just made it harder for user to communicate with
> > > > banker.
> > > > >
> > > > > (2) Implementation to achieve cognitive behaviour
> > > > >
> > > > > With To date inclusive, banker needs to make sure that it gives all
> > > > > transaction records for 23rd October 2015 (To date) at the finest
> > > > > granularity available with the bank. This will satisfy users
> > > requirement
> > > > > without any element of surprise.
> > > > >
> > > > > If the finest granularity is millisecond. It will have a higher
> limit
> > > to
> > > > it
> > > > > which is 999 in most of the systems. Banker will fetch all
> partitions
> > > > until
> > > > > higher limit of finest granularity for 23rd October and return the
> > > > > statement to user.
> > > > >
> > > > > If there was a leap second/millisecond, it is upto the banker to
> > handle
> > > > and
> > > > > give data for it. When a user wants statement from 11th October to
> > 23rd
> > > > > October, user should not worry about the leap second/millisecond
> > which
> > > > > happened on any of the days from 11th October to 23rd October
> > > > (inclusive).
> > > > >
> > > > > Regards,
> > > > > Himanshu
> > > > >
> > > > > On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
> > amareshwari@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > I prefer time-range end date to be exclusive of the end date,
> > because
> > > > of
> > > > > > the following reasons:
> > > > > >
> > > > > > 1. Giving a month range end as start of next month is easy, as
> you
> > > need
> > > > > not
> > > > > > remember and get confused with number of days in month : 30, 31,
> > > 28/29
> > > > > (in
> > > > > > february leap year) - wont miss ending days because of mistakes.
> > > > > >
> > > > > > 2. When you are giving a any range, you need not worry about what
> > > > > > granularity should the time be specified.
> > > > > > For ex, for including time upto 2015-08-01 05th.
> > > > > >
> > > > > > Should user specify the following with inclusive ranges?
> > > > > > > 2015-08-01-04 (would be correct, assuming whole of 4 th hour
> will
> > > be
> > > > > > included)
> > > > > > > 2015-08-01-04-59 (would be correct, but what if there is leap
> > > second
> > > > > :p )
> > > > > > > 2015-08-01-04-59-000 (wrong, would include only 000th milli
> > second
> > > > and
> > > > > > not others)
> > > > > > > 2015-08-01-04-59-999 (not sure if we will have leap milli
> > seconds)
> > > > > >
> > > > > > Looking forward to see other arguments on making timerange end
> date
> > > > > > inclusive.
> > > > > >
> > > > > > Thanks
> > > > > > Amareshwari
> > > > > >
> > > > > > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
> > prongs@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Currently the time_range_in clause is inclusive of the start
> date
> > > and
> > > > > > > exclusive of the end date. i.e. time_range_in(time_dimension,
> > > date1,
> > > > > > date2)
> > > > > > > translates to date1<=time_dimension<date2
> > > > > > >
> > > > > > > Recently there have been multiple discussions regarding this
> > > approach
> > > > > in
> > > > > > > our internal forums at InMobi. There have been arguments that
> as
> > an
> > > > > > > analytics system, it should accept end date inclusive. Starting
> > > this
> > > > > > thread
> > > > > > > to know opinions of other people regarding both approaches. I'm
> > > > > defining
> > > > > > > and briefly discussing both approaches below:
> > > > > > >
> > > > > > > Approach 1: End date exclusive.
> > > > > > > Disclaimer: I personally prefer this approach and would
> > shamelessly
> > > > > argue
> > > > > > > in its favor. I'll give links to some of the articles that I
> very
> > > > much
> > > > > > > agree with (no need to reinvent the wheel of argument)
> > > > > > >
> > > > > > >    1. Views of Anders Kaseorg, MIT PhD student in CS; Cofounder
> > of
> > > > > > Ksplice,
> > > > > > >    Inc
> > > > > > >    <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > > > > > >
> > > > > > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's algorithm
> > > > > > >    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > > > > > >    3. Both arguments are for generic ranges. I believe time
> range
> > > is
> > > > > > also a
> > > > > > >    range and don't see why we need to handle that as a special
> > > case.
> > > > > > >
> > > > > > >
> > > > > > > Approach 2: End date inclusive.
> > > > > > >
> > > > > > >    1. Lens query language is a bit like sql and the BETWEEN
> > clause
> > > in
> > > > > sql
> > > > > > >    is inclusive of both start and end date.
> > > > > > >    2. Most analytic systems --- including google analytics ---
> > > follow
> > > > > > this
> > > > > > >    convention
> > > > > > >
> > > > > > >
> > > > > > > I'm posting the question on a bigger forum
> > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > > > > > >
> > > > > > > as well in hopes of getting more opinions. I request everyone
> to
> > > post
> > > > > > their
> > > > > > > views here regarding both approaches.
> > > > > > >
> > > > > > > Regards
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > _____________________________________________________________
> > > > > The information contained in this communication is intended solely
> > for
> > > > the
> > > > > use of the individual or entity to whom it is addressed and others
> > > > > authorized to receive it. It may contain confidential or legally
> > > > privileged
> > > > > information. If you are not the intended recipient you are hereby
> > > > notified
> > > > > that any disclosure, copying, distribution or taking any action in
> > > > reliance
> > > > > on the contents of this information is strictly prohibited and may
> be
> > > > > unlawful. If you have received this communication in error, please
> > > notify
> > > > > us immediately by responding to this email and then delete it from
> > your
> > > > > system. The firm is neither liable for the proper and complete
> > > > transmission
> > > > > of the information contained in this communication nor for any
> delay
> > in
> > > > its
> > > > > receipt.
> > > > >
> > > >
> > >
> > > --
> > > _____________________________________________________________
> > > The information contained in this communication is intended solely for
> > the
> > > use of the individual or entity to whom it is addressed and others
> > > authorized to receive it. It may contain confidential or legally
> > privileged
> > > information. If you are not the intended recipient you are hereby
> > notified
> > > that any disclosure, copying, distribution or taking any action in
> > reliance
> > > on the contents of this information is strictly prohibited and may be
> > > unlawful. If you have received this communication in error, please
> notify
> > > us immediately by responding to this email and then delete it from your
> > > system. The firm is neither liable for the proper and complete
> > transmission
> > > of the information contained in this communication nor for any delay in
> > its
> > > receipt.
> > >
> >
>
> --
> _____________________________________________________________
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify
> us immediately by responding to this email and then delete it from your
> system. The firm is neither liable for the proper and complete transmission
> of the information contained in this communication nor for any delay in its
> receipt.
>

Re: [DISCUSS] Time Ranges in Lens

Posted by Himanshu Gahlaut <hi...@inmobi.com>.
Why is this a problem ?

On Wed, Aug 5, 2015 at 6:04 PM, Rajat Khandelwal <ra...@gmail.com>
wrote:

> Problem with that is, that the meaning of range changes with granularity.
> While with approach 1, it's not the case.
>
> A Range of 2015-01-01 to 2015-01-02 would mean 2 days, 2015-01-01-00 to
> 2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
> 2015-01-02-00-00 would mean 2 days and 1 minute, and 2015-01-01-00-00-00 to
> 2015-01-02-00-00-00 would mean 2 days and 1 second.
>
> While approach 1 has no such problem. A given range doesn't change with a
> change in granularity. Every one of these examples would mean 1 day.
>
> Here I've begun the argument with day granularity and went deeper. The same
> argument can be started from a Yearly granularity and there the different
> would be more prominent.
>
> Regards
>
> On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <
> himanshu.gahlaut@inmobi.com>
> wrote:
>
> > Say granularity is an ordered list of nodes where every node in list is
> > parent of its next node.
> >
> > For a System X: all granularity could be (Year => Month => Day).
> > For System Y: all granularity could be (Year => Month => Day => Hour =>
> > Minutes => Seconds => Milliseconds => Microseconds => Picosecond =>
> > Femtosecond => Attosecond)
> >
> > As long as every node has a logic defined to compute the lower limit and
> > upper limit value, implementation should be able to fetch all the data
> > under its parent node.
> >
> > E.G:
> >
> > To get all the data for 2015-10-31, implementation needs to convert this
> > into 2015-10-31-LowerLimitHourNode to 2015-10-31-UpperLimitHourNode to
> > fetch data from hourly partitions, where lower limit hour = 00 and upper
> > limit hour = 23. Similarly any granularity can be handled at
> implementation
> > level.
> >
> >
> > I guess a banking system in real world would be worried about very fine
> > granularities of time.
> >
> > Even if we assume banking system has day as the finest granularity. A
> user
> > can still ask for statement from the month of April, 2015 to October,
> 2015
> > without specifying the specific date fields. In this case bank will have
> to
> > execute the logic to fetch all transactions from April to October with
> both
> > inclusive and also handling its finest granularity which is assumed as
> day.
> > In this case a banking system would convert 2015-04 to 2015-10 into
> > 2015-04-01 to 2015-10-31 to fetch all data from daily partitions, where
> 01
> > is the lower limit for day node in month of April and 31 is the higher
> > limit for day node in month of October.
> >
> > If data is not available in daily partitions and bank's underlying system
> > supports further fine time granularities, it will convert 2015-04 to
> > 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from hourly
> > partitions.
> >
> > Similarly if data is not available at hourly partitions, it will go to
> the
> > next level. As long as an upper limit and lower limit is defined for each
> > node, implementation can handle any time granularity.
> >
> > Meta point is that in real world, when a user will communicate to get
> > statement of April to October. His intentions will never be to exclude
> > October. Exclusion of October will be a surprise for him.
> >
> > Regards,
> > Himanshu
> >
> > On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <am...@gmail.com>
> > wrote:
> >
> > > When a system has to understand time at all granularities, and user can
> > > give time at all granularities, shouldn't we chose a specification
> which
> > is
> > > easy and error prone.
> > >
> > > The example of banking system is not worried about finer granularities
> of
> > > time.
> > >
> > > On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> > > himanshu.gahlaut@inmobi.com> wrote:
> > >
> > > > This has two parts:
> > > >
> > > > (1) Interfacing with user shall be cognitive. Something which user
> can
> > > > relate to his natural thought process.
> > > >
> > > > E.g.: In a banking system, if a user goes to a banker to get his bank
> > > > statement. He will say "give me statement from 11th October 2015 to
> > 23rd
> > > > October 2015". This clearly communicates in simplified language that
> > user
> > > > is interested in bank statement from 11th October to 23rd October. If
> > > > banker now does not return data of 23rd October, it will be a
> surprise
> > > for
> > > > user.
> > > >
> > > > Banker can inform every user that To date is exclusive, so if you
> need
> > > > statement till 23rd October you should come and tell me "give me
> > > statement
> > > > from 11th October to 24th October". I believe a user will immediately
> > > reply
> > > > => If I need statement only till 23rd October, why shall I
> communicate
> > > 11th
> > > > October to 24th October. If you still insist, next time the user will
> > > come
> > > > and say "give me statement from 11th October to 24th October with
> 24th
> > > > exclusive". System just made it harder for user to communicate with
> > > banker.
> > > >
> > > > (2) Implementation to achieve cognitive behaviour
> > > >
> > > > With To date inclusive, banker needs to make sure that it gives all
> > > > transaction records for 23rd October 2015 (To date) at the finest
> > > > granularity available with the bank. This will satisfy users
> > requirement
> > > > without any element of surprise.
> > > >
> > > > If the finest granularity is millisecond. It will have a higher limit
> > to
> > > it
> > > > which is 999 in most of the systems. Banker will fetch all partitions
> > > until
> > > > higher limit of finest granularity for 23rd October and return the
> > > > statement to user.
> > > >
> > > > If there was a leap second/millisecond, it is upto the banker to
> handle
> > > and
> > > > give data for it. When a user wants statement from 11th October to
> 23rd
> > > > October, user should not worry about the leap second/millisecond
> which
> > > > happened on any of the days from 11th October to 23rd October
> > > (inclusive).
> > > >
> > > > Regards,
> > > > Himanshu
> > > >
> > > > On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <
> amareshwari@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > I prefer time-range end date to be exclusive of the end date,
> because
> > > of
> > > > > the following reasons:
> > > > >
> > > > > 1. Giving a month range end as start of next month is easy, as you
> > need
> > > > not
> > > > > remember and get confused with number of days in month : 30, 31,
> > 28/29
> > > > (in
> > > > > february leap year) - wont miss ending days because of mistakes.
> > > > >
> > > > > 2. When you are giving a any range, you need not worry about what
> > > > > granularity should the time be specified.
> > > > > For ex, for including time upto 2015-08-01 05th.
> > > > >
> > > > > Should user specify the following with inclusive ranges?
> > > > > > 2015-08-01-04 (would be correct, assuming whole of 4 th hour will
> > be
> > > > > included)
> > > > > > 2015-08-01-04-59 (would be correct, but what if there is leap
> > second
> > > > :p )
> > > > > > 2015-08-01-04-59-000 (wrong, would include only 000th milli
> second
> > > and
> > > > > not others)
> > > > > > 2015-08-01-04-59-999 (not sure if we will have leap milli
> seconds)
> > > > >
> > > > > Looking forward to see other arguments on making timerange end date
> > > > > inclusive.
> > > > >
> > > > > Thanks
> > > > > Amareshwari
> > > > >
> > > > > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <
> prongs@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Currently the time_range_in clause is inclusive of the start date
> > and
> > > > > > exclusive of the end date. i.e. time_range_in(time_dimension,
> > date1,
> > > > > date2)
> > > > > > translates to date1<=time_dimension<date2
> > > > > >
> > > > > > Recently there have been multiple discussions regarding this
> > approach
> > > > in
> > > > > > our internal forums at InMobi. There have been arguments that as
> an
> > > > > > analytics system, it should accept end date inclusive. Starting
> > this
> > > > > thread
> > > > > > to know opinions of other people regarding both approaches. I'm
> > > > defining
> > > > > > and briefly discussing both approaches below:
> > > > > >
> > > > > > Approach 1: End date exclusive.
> > > > > > Disclaimer: I personally prefer this approach and would
> shamelessly
> > > > argue
> > > > > > in its favor. I'll give links to some of the articles that I very
> > > much
> > > > > > agree with (no need to reinvent the wheel of argument)
> > > > > >
> > > > > >    1. Views of Anders Kaseorg, MIT PhD student in CS; Cofounder
> of
> > > > > Ksplice,
> > > > > >    Inc
> > > > > >    <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > > > > >
> > > > > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's algorithm
> > > > > >    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > > > > >    3. Both arguments are for generic ranges. I believe time range
> > is
> > > > > also a
> > > > > >    range and don't see why we need to handle that as a special
> > case.
> > > > > >
> > > > > >
> > > > > > Approach 2: End date inclusive.
> > > > > >
> > > > > >    1. Lens query language is a bit like sql and the BETWEEN
> clause
> > in
> > > > sql
> > > > > >    is inclusive of both start and end date.
> > > > > >    2. Most analytic systems --- including google analytics ---
> > follow
> > > > > this
> > > > > >    convention
> > > > > >
> > > > > >
> > > > > > I'm posting the question on a bigger forum
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > > > > >
> > > > > > as well in hopes of getting more opinions. I request everyone to
> > post
> > > > > their
> > > > > > views here regarding both approaches.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > >
> > > >
> > > > --
> > > > _____________________________________________________________
> > > > The information contained in this communication is intended solely
> for
> > > the
> > > > use of the individual or entity to whom it is addressed and others
> > > > authorized to receive it. It may contain confidential or legally
> > > privileged
> > > > information. If you are not the intended recipient you are hereby
> > > notified
> > > > that any disclosure, copying, distribution or taking any action in
> > > reliance
> > > > on the contents of this information is strictly prohibited and may be
> > > > unlawful. If you have received this communication in error, please
> > notify
> > > > us immediately by responding to this email and then delete it from
> your
> > > > system. The firm is neither liable for the proper and complete
> > > transmission
> > > > of the information contained in this communication nor for any delay
> in
> > > its
> > > > receipt.
> > > >
> > >
> >
> > --
> > _____________________________________________________________
> > The information contained in this communication is intended solely for
> the
> > use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally
> privileged
> > information. If you are not the intended recipient you are hereby
> notified
> > that any disclosure, copying, distribution or taking any action in
> reliance
> > on the contents of this information is strictly prohibited and may be
> > unlawful. If you have received this communication in error, please notify
> > us immediately by responding to this email and then delete it from your
> > system. The firm is neither liable for the proper and complete
> transmission
> > of the information contained in this communication nor for any delay in
> its
> > receipt.
> >
>

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: [DISCUSS] Time Ranges in Lens

Posted by Rajat Khandelwal <ra...@gmail.com>.
Problem with that is, that the meaning of range changes with granularity.
While with approach 1, it's not the case.

A Range of 2015-01-01 to 2015-01-02 would mean 2 days, 2015-01-01-00 to
2015-01-02-00 would mean 2 days and 1 hour, 2015-01-01-00-00 to
2015-01-02-00-00 would mean 2 days and 1 minute, and 2015-01-01-00-00-00 to
2015-01-02-00-00-00 would mean 2 days and 1 second.

While approach 1 has no such problem. A given range doesn't change with a
change in granularity. Every one of these examples would mean 1 day.

Here I've begun the argument with day granularity and went deeper. The same
argument can be started from a Yearly granularity and there the different
would be more prominent.

Regards

On Wed, Aug 5, 2015 at 5:36 PM Himanshu Gahlaut <hi...@inmobi.com>
wrote:

> Say granularity is an ordered list of nodes where every node in list is
> parent of its next node.
>
> For a System X: all granularity could be (Year => Month => Day).
> For System Y: all granularity could be (Year => Month => Day => Hour =>
> Minutes => Seconds => Milliseconds => Microseconds => Picosecond =>
> Femtosecond => Attosecond)
>
> As long as every node has a logic defined to compute the lower limit and
> upper limit value, implementation should be able to fetch all the data
> under its parent node.
>
> E.G:
>
> To get all the data for 2015-10-31, implementation needs to convert this
> into 2015-10-31-LowerLimitHourNode to 2015-10-31-UpperLimitHourNode to
> fetch data from hourly partitions, where lower limit hour = 00 and upper
> limit hour = 23. Similarly any granularity can be handled at implementation
> level.
>
>
> I guess a banking system in real world would be worried about very fine
> granularities of time.
>
> Even if we assume banking system has day as the finest granularity. A user
> can still ask for statement from the month of April, 2015 to October, 2015
> without specifying the specific date fields. In this case bank will have to
> execute the logic to fetch all transactions from April to October with both
> inclusive and also handling its finest granularity which is assumed as day.
> In this case a banking system would convert 2015-04 to 2015-10 into
> 2015-04-01 to 2015-10-31 to fetch all data from daily partitions, where 01
> is the lower limit for day node in month of April and 31 is the higher
> limit for day node in month of October.
>
> If data is not available in daily partitions and bank's underlying system
> supports further fine time granularities, it will convert 2015-04 to
> 2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from hourly
> partitions.
>
> Similarly if data is not available at hourly partitions, it will go to the
> next level. As long as an upper limit and lower limit is defined for each
> node, implementation can handle any time granularity.
>
> Meta point is that in real world, when a user will communicate to get
> statement of April to October. His intentions will never be to exclude
> October. Exclusion of October will be a surprise for him.
>
> Regards,
> Himanshu
>
> On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <am...@gmail.com>
> wrote:
>
> > When a system has to understand time at all granularities, and user can
> > give time at all granularities, shouldn't we chose a specification which
> is
> > easy and error prone.
> >
> > The example of banking system is not worried about finer granularities of
> > time.
> >
> > On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> > himanshu.gahlaut@inmobi.com> wrote:
> >
> > > This has two parts:
> > >
> > > (1) Interfacing with user shall be cognitive. Something which user can
> > > relate to his natural thought process.
> > >
> > > E.g.: In a banking system, if a user goes to a banker to get his bank
> > > statement. He will say "give me statement from 11th October 2015 to
> 23rd
> > > October 2015". This clearly communicates in simplified language that
> user
> > > is interested in bank statement from 11th October to 23rd October. If
> > > banker now does not return data of 23rd October, it will be a surprise
> > for
> > > user.
> > >
> > > Banker can inform every user that To date is exclusive, so if you need
> > > statement till 23rd October you should come and tell me "give me
> > statement
> > > from 11th October to 24th October". I believe a user will immediately
> > reply
> > > => If I need statement only till 23rd October, why shall I communicate
> > 11th
> > > October to 24th October. If you still insist, next time the user will
> > come
> > > and say "give me statement from 11th October to 24th October with 24th
> > > exclusive". System just made it harder for user to communicate with
> > banker.
> > >
> > > (2) Implementation to achieve cognitive behaviour
> > >
> > > With To date inclusive, banker needs to make sure that it gives all
> > > transaction records for 23rd October 2015 (To date) at the finest
> > > granularity available with the bank. This will satisfy users
> requirement
> > > without any element of surprise.
> > >
> > > If the finest granularity is millisecond. It will have a higher limit
> to
> > it
> > > which is 999 in most of the systems. Banker will fetch all partitions
> > until
> > > higher limit of finest granularity for 23rd October and return the
> > > statement to user.
> > >
> > > If there was a leap second/millisecond, it is upto the banker to handle
> > and
> > > give data for it. When a user wants statement from 11th October to 23rd
> > > October, user should not worry about the leap second/millisecond which
> > > happened on any of the days from 11th October to 23rd October
> > (inclusive).
> > >
> > > Regards,
> > > Himanshu
> > >
> > > On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <amareshwari@gmail.com
> >
> > > wrote:
> > >
> > > > I prefer time-range end date to be exclusive of the end date, because
> > of
> > > > the following reasons:
> > > >
> > > > 1. Giving a month range end as start of next month is easy, as you
> need
> > > not
> > > > remember and get confused with number of days in month : 30, 31,
> 28/29
> > > (in
> > > > february leap year) - wont miss ending days because of mistakes.
> > > >
> > > > 2. When you are giving a any range, you need not worry about what
> > > > granularity should the time be specified.
> > > > For ex, for including time upto 2015-08-01 05th.
> > > >
> > > > Should user specify the following with inclusive ranges?
> > > > > 2015-08-01-04 (would be correct, assuming whole of 4 th hour will
> be
> > > > included)
> > > > > 2015-08-01-04-59 (would be correct, but what if there is leap
> second
> > > :p )
> > > > > 2015-08-01-04-59-000 (wrong, would include only 000th milli second
> > and
> > > > not others)
> > > > > 2015-08-01-04-59-999 (not sure if we will have leap milli seconds)
> > > >
> > > > Looking forward to see other arguments on making timerange end date
> > > > inclusive.
> > > >
> > > > Thanks
> > > > Amareshwari
> > > >
> > > > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <pr...@apache.org>
> > > > wrote:
> > > >
> > > > > Currently the time_range_in clause is inclusive of the start date
> and
> > > > > exclusive of the end date. i.e. time_range_in(time_dimension,
> date1,
> > > > date2)
> > > > > translates to date1<=time_dimension<date2
> > > > >
> > > > > Recently there have been multiple discussions regarding this
> approach
> > > in
> > > > > our internal forums at InMobi. There have been arguments that as an
> > > > > analytics system, it should accept end date inclusive. Starting
> this
> > > > thread
> > > > > to know opinions of other people regarding both approaches. I'm
> > > defining
> > > > > and briefly discussing both approaches below:
> > > > >
> > > > > Approach 1: End date exclusive.
> > > > > Disclaimer: I personally prefer this approach and would shamelessly
> > > argue
> > > > > in its favor. I'll give links to some of the articles that I very
> > much
> > > > > agree with (no need to reinvent the wheel of argument)
> > > > >
> > > > >    1. Views of Anders Kaseorg, MIT PhD student in CS; Cofounder of
> > > > Ksplice,
> > > > >    Inc
> > > > >    <
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > > > >
> > > > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's algorithm
> > > > >    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > > > >    3. Both arguments are for generic ranges. I believe time range
> is
> > > > also a
> > > > >    range and don't see why we need to handle that as a special
> case.
> > > > >
> > > > >
> > > > > Approach 2: End date inclusive.
> > > > >
> > > > >    1. Lens query language is a bit like sql and the BETWEEN clause
> in
> > > sql
> > > > >    is inclusive of both start and end date.
> > > > >    2. Most analytic systems --- including google analytics ---
> follow
> > > > this
> > > > >    convention
> > > > >
> > > > >
> > > > > I'm posting the question on a bigger forum
> > > > > <
> > > > >
> > > >
> > >
> >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > > > >
> > > > > as well in hopes of getting more opinions. I request everyone to
> post
> > > > their
> > > > > views here regarding both approaches.
> > > > >
> > > > > Regards
> > > > >
> > > >
> > >
> > > --
> > > _____________________________________________________________
> > > The information contained in this communication is intended solely for
> > the
> > > use of the individual or entity to whom it is addressed and others
> > > authorized to receive it. It may contain confidential or legally
> > privileged
> > > information. If you are not the intended recipient you are hereby
> > notified
> > > that any disclosure, copying, distribution or taking any action in
> > reliance
> > > on the contents of this information is strictly prohibited and may be
> > > unlawful. If you have received this communication in error, please
> notify
> > > us immediately by responding to this email and then delete it from your
> > > system. The firm is neither liable for the proper and complete
> > transmission
> > > of the information contained in this communication nor for any delay in
> > its
> > > receipt.
> > >
> >
>
> --
> _____________________________________________________________
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify
> us immediately by responding to this email and then delete it from your
> system. The firm is neither liable for the proper and complete transmission
> of the information contained in this communication nor for any delay in its
> receipt.
>

Re: [DISCUSS] Time Ranges in Lens

Posted by Himanshu Gahlaut <hi...@inmobi.com>.
Say granularity is an ordered list of nodes where every node in list is
parent of its next node.

For a System X: all granularity could be (Year => Month => Day).
For System Y: all granularity could be (Year => Month => Day => Hour =>
Minutes => Seconds => Milliseconds => Microseconds => Picosecond =>
Femtosecond => Attosecond)

As long as every node has a logic defined to compute the lower limit and
upper limit value, implementation should be able to fetch all the data
under its parent node.

E.G:

To get all the data for 2015-10-31, implementation needs to convert this
into 2015-10-31-LowerLimitHourNode to 2015-10-31-UpperLimitHourNode to
fetch data from hourly partitions, where lower limit hour = 00 and upper
limit hour = 23. Similarly any granularity can be handled at implementation
level.


I guess a banking system in real world would be worried about very fine
granularities of time.

Even if we assume banking system has day as the finest granularity. A user
can still ask for statement from the month of April, 2015 to October, 2015
without specifying the specific date fields. In this case bank will have to
execute the logic to fetch all transactions from April to October with both
inclusive and also handling its finest granularity which is assumed as day.
In this case a banking system would convert 2015-04 to 2015-10 into
2015-04-01 to 2015-10-31 to fetch all data from daily partitions, where 01
is the lower limit for day node in month of April and 31 is the higher
limit for day node in month of October.

If data is not available in daily partitions and bank's underlying system
supports further fine time granularities, it will convert 2015-04 to
2015-10 into 2015-04-01-00 to 2015-10-31-23 to fetch data from hourly
partitions.

Similarly if data is not available at hourly partitions, it will go to the
next level. As long as an upper limit and lower limit is defined for each
node, implementation can handle any time granularity.

Meta point is that in real world, when a user will communicate to get
statement of April to October. His intentions will never be to exclude
October. Exclusion of October will be a surprise for him.

Regards,
Himanshu

On Wed, Aug 5, 2015 at 4:56 PM, amareshwarisr . <am...@gmail.com>
wrote:

> When a system has to understand time at all granularities, and user can
> give time at all granularities, shouldn't we chose a specification which is
> easy and error prone.
>
> The example of banking system is not worried about finer granularities of
> time.
>
> On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
> himanshu.gahlaut@inmobi.com> wrote:
>
> > This has two parts:
> >
> > (1) Interfacing with user shall be cognitive. Something which user can
> > relate to his natural thought process.
> >
> > E.g.: In a banking system, if a user goes to a banker to get his bank
> > statement. He will say "give me statement from 11th October 2015 to 23rd
> > October 2015". This clearly communicates in simplified language that user
> > is interested in bank statement from 11th October to 23rd October. If
> > banker now does not return data of 23rd October, it will be a surprise
> for
> > user.
> >
> > Banker can inform every user that To date is exclusive, so if you need
> > statement till 23rd October you should come and tell me "give me
> statement
> > from 11th October to 24th October". I believe a user will immediately
> reply
> > => If I need statement only till 23rd October, why shall I communicate
> 11th
> > October to 24th October. If you still insist, next time the user will
> come
> > and say "give me statement from 11th October to 24th October with 24th
> > exclusive". System just made it harder for user to communicate with
> banker.
> >
> > (2) Implementation to achieve cognitive behaviour
> >
> > With To date inclusive, banker needs to make sure that it gives all
> > transaction records for 23rd October 2015 (To date) at the finest
> > granularity available with the bank. This will satisfy users requirement
> > without any element of surprise.
> >
> > If the finest granularity is millisecond. It will have a higher limit to
> it
> > which is 999 in most of the systems. Banker will fetch all partitions
> until
> > higher limit of finest granularity for 23rd October and return the
> > statement to user.
> >
> > If there was a leap second/millisecond, it is upto the banker to handle
> and
> > give data for it. When a user wants statement from 11th October to 23rd
> > October, user should not worry about the leap second/millisecond which
> > happened on any of the days from 11th October to 23rd October
> (inclusive).
> >
> > Regards,
> > Himanshu
> >
> > On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <am...@gmail.com>
> > wrote:
> >
> > > I prefer time-range end date to be exclusive of the end date, because
> of
> > > the following reasons:
> > >
> > > 1. Giving a month range end as start of next month is easy, as you need
> > not
> > > remember and get confused with number of days in month : 30, 31, 28/29
> > (in
> > > february leap year) - wont miss ending days because of mistakes.
> > >
> > > 2. When you are giving a any range, you need not worry about what
> > > granularity should the time be specified.
> > > For ex, for including time upto 2015-08-01 05th.
> > >
> > > Should user specify the following with inclusive ranges?
> > > > 2015-08-01-04 (would be correct, assuming whole of 4 th hour will be
> > > included)
> > > > 2015-08-01-04-59 (would be correct, but what if there is leap second
> > :p )
> > > > 2015-08-01-04-59-000 (wrong, would include only 000th milli second
> and
> > > not others)
> > > > 2015-08-01-04-59-999 (not sure if we will have leap milli seconds)
> > >
> > > Looking forward to see other arguments on making timerange end date
> > > inclusive.
> > >
> > > Thanks
> > > Amareshwari
> > >
> > > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <pr...@apache.org>
> > > wrote:
> > >
> > > > Currently the time_range_in clause is inclusive of the start date and
> > > > exclusive of the end date. i.e. time_range_in(time_dimension, date1,
> > > date2)
> > > > translates to date1<=time_dimension<date2
> > > >
> > > > Recently there have been multiple discussions regarding this approach
> > in
> > > > our internal forums at InMobi. There have been arguments that as an
> > > > analytics system, it should accept end date inclusive. Starting this
> > > thread
> > > > to know opinions of other people regarding both approaches. I'm
> > defining
> > > > and briefly discussing both approaches below:
> > > >
> > > > Approach 1: End date exclusive.
> > > > Disclaimer: I personally prefer this approach and would shamelessly
> > argue
> > > > in its favor. I'll give links to some of the articles that I very
> much
> > > > agree with (no need to reinvent the wheel of argument)
> > > >
> > > >    1. Views of Anders Kaseorg, MIT PhD student in CS; Cofounder of
> > > Ksplice,
> > > >    Inc
> > > >    <
> > > >
> > >
> >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > > >
> > > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's algorithm
> > > >    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > > >    3. Both arguments are for generic ranges. I believe time range is
> > > also a
> > > >    range and don't see why we need to handle that as a special case.
> > > >
> > > >
> > > > Approach 2: End date inclusive.
> > > >
> > > >    1. Lens query language is a bit like sql and the BETWEEN clause in
> > sql
> > > >    is inclusive of both start and end date.
> > > >    2. Most analytic systems --- including google analytics --- follow
> > > this
> > > >    convention
> > > >
> > > >
> > > > I'm posting the question on a bigger forum
> > > > <
> > > >
> > >
> >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > > >
> > > > as well in hopes of getting more opinions. I request everyone to post
> > > their
> > > > views here regarding both approaches.
> > > >
> > > > Regards
> > > >
> > >
> >
> > --
> > _____________________________________________________________
> > The information contained in this communication is intended solely for
> the
> > use of the individual or entity to whom it is addressed and others
> > authorized to receive it. It may contain confidential or legally
> privileged
> > information. If you are not the intended recipient you are hereby
> notified
> > that any disclosure, copying, distribution or taking any action in
> reliance
> > on the contents of this information is strictly prohibited and may be
> > unlawful. If you have received this communication in error, please notify
> > us immediately by responding to this email and then delete it from your
> > system. The firm is neither liable for the proper and complete
> transmission
> > of the information contained in this communication nor for any delay in
> its
> > receipt.
> >
>

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: [DISCUSS] Time Ranges in Lens

Posted by "amareshwarisr ." <am...@gmail.com>.
When a system has to understand time at all granularities, and user can
give time at all granularities, shouldn't we chose a specification which is
easy and error prone.

The example of banking system is not worried about finer granularities of
time.

On Wed, Aug 5, 2015 at 4:33 PM, Himanshu Gahlaut <
himanshu.gahlaut@inmobi.com> wrote:

> This has two parts:
>
> (1) Interfacing with user shall be cognitive. Something which user can
> relate to his natural thought process.
>
> E.g.: In a banking system, if a user goes to a banker to get his bank
> statement. He will say "give me statement from 11th October 2015 to 23rd
> October 2015". This clearly communicates in simplified language that user
> is interested in bank statement from 11th October to 23rd October. If
> banker now does not return data of 23rd October, it will be a surprise for
> user.
>
> Banker can inform every user that To date is exclusive, so if you need
> statement till 23rd October you should come and tell me "give me statement
> from 11th October to 24th October". I believe a user will immediately reply
> => If I need statement only till 23rd October, why shall I communicate 11th
> October to 24th October. If you still insist, next time the user will come
> and say "give me statement from 11th October to 24th October with 24th
> exclusive". System just made it harder for user to communicate with banker.
>
> (2) Implementation to achieve cognitive behaviour
>
> With To date inclusive, banker needs to make sure that it gives all
> transaction records for 23rd October 2015 (To date) at the finest
> granularity available with the bank. This will satisfy users requirement
> without any element of surprise.
>
> If the finest granularity is millisecond. It will have a higher limit to it
> which is 999 in most of the systems. Banker will fetch all partitions until
> higher limit of finest granularity for 23rd October and return the
> statement to user.
>
> If there was a leap second/millisecond, it is upto the banker to handle and
> give data for it. When a user wants statement from 11th October to 23rd
> October, user should not worry about the leap second/millisecond which
> happened on any of the days from 11th October to 23rd October (inclusive).
>
> Regards,
> Himanshu
>
> On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <am...@gmail.com>
> wrote:
>
> > I prefer time-range end date to be exclusive of the end date, because of
> > the following reasons:
> >
> > 1. Giving a month range end as start of next month is easy, as you need
> not
> > remember and get confused with number of days in month : 30, 31, 28/29
> (in
> > february leap year) - wont miss ending days because of mistakes.
> >
> > 2. When you are giving a any range, you need not worry about what
> > granularity should the time be specified.
> > For ex, for including time upto 2015-08-01 05th.
> >
> > Should user specify the following with inclusive ranges?
> > > 2015-08-01-04 (would be correct, assuming whole of 4 th hour will be
> > included)
> > > 2015-08-01-04-59 (would be correct, but what if there is leap second
> :p )
> > > 2015-08-01-04-59-000 (wrong, would include only 000th milli second and
> > not others)
> > > 2015-08-01-04-59-999 (not sure if we will have leap milli seconds)
> >
> > Looking forward to see other arguments on making timerange end date
> > inclusive.
> >
> > Thanks
> > Amareshwari
> >
> > On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <pr...@apache.org>
> > wrote:
> >
> > > Currently the time_range_in clause is inclusive of the start date and
> > > exclusive of the end date. i.e. time_range_in(time_dimension, date1,
> > date2)
> > > translates to date1<=time_dimension<date2
> > >
> > > Recently there have been multiple discussions regarding this approach
> in
> > > our internal forums at InMobi. There have been arguments that as an
> > > analytics system, it should accept end date inclusive. Starting this
> > thread
> > > to know opinions of other people regarding both approaches. I'm
> defining
> > > and briefly discussing both approaches below:
> > >
> > > Approach 1: End date exclusive.
> > > Disclaimer: I personally prefer this approach and would shamelessly
> argue
> > > in its favor. I'll give links to some of the articles that I very much
> > > agree with (no need to reinvent the wheel of argument)
> > >
> > >    1. Views of Anders Kaseorg, MIT PhD student in CS; Cofounder of
> > Ksplice,
> > >    Inc
> > >    <
> > >
> >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > > >
> > >    2. Views of Edgar Dijkstra, Creator of Dijkstra's algorithm
> > >    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> > >    3. Both arguments are for generic ranges. I believe time range is
> > also a
> > >    range and don't see why we need to handle that as a special case.
> > >
> > >
> > > Approach 2: End date inclusive.
> > >
> > >    1. Lens query language is a bit like sql and the BETWEEN clause in
> sql
> > >    is inclusive of both start and end date.
> > >    2. Most analytic systems --- including google analytics --- follow
> > this
> > >    convention
> > >
> > >
> > > I'm posting the question on a bigger forum
> > > <
> > >
> >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > > >
> > > as well in hopes of getting more opinions. I request everyone to post
> > their
> > > views here regarding both approaches.
> > >
> > > Regards
> > >
> >
>
> --
> _____________________________________________________________
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify
> us immediately by responding to this email and then delete it from your
> system. The firm is neither liable for the proper and complete transmission
> of the information contained in this communication nor for any delay in its
> receipt.
>

Re: [DISCUSS] Time Ranges in Lens

Posted by Himanshu Gahlaut <hi...@inmobi.com>.
This has two parts:

(1) Interfacing with user shall be cognitive. Something which user can
relate to his natural thought process.

E.g.: In a banking system, if a user goes to a banker to get his bank
statement. He will say "give me statement from 11th October 2015 to 23rd
October 2015". This clearly communicates in simplified language that user
is interested in bank statement from 11th October to 23rd October. If
banker now does not return data of 23rd October, it will be a surprise for
user.

Banker can inform every user that To date is exclusive, so if you need
statement till 23rd October you should come and tell me "give me statement
from 11th October to 24th October". I believe a user will immediately reply
=> If I need statement only till 23rd October, why shall I communicate 11th
October to 24th October. If you still insist, next time the user will come
and say "give me statement from 11th October to 24th October with 24th
exclusive". System just made it harder for user to communicate with banker.

(2) Implementation to achieve cognitive behaviour

With To date inclusive, banker needs to make sure that it gives all
transaction records for 23rd October 2015 (To date) at the finest
granularity available with the bank. This will satisfy users requirement
without any element of surprise.

If the finest granularity is millisecond. It will have a higher limit to it
which is 999 in most of the systems. Banker will fetch all partitions until
higher limit of finest granularity for 23rd October and return the
statement to user.

If there was a leap second/millisecond, it is upto the banker to handle and
give data for it. When a user wants statement from 11th October to 23rd
October, user should not worry about the leap second/millisecond which
happened on any of the days from 11th October to 23rd October (inclusive).

Regards,
Himanshu

On Wed, Aug 5, 2015 at 3:16 PM, amareshwarisr . <am...@gmail.com>
wrote:

> I prefer time-range end date to be exclusive of the end date, because of
> the following reasons:
>
> 1. Giving a month range end as start of next month is easy, as you need not
> remember and get confused with number of days in month : 30, 31, 28/29 (in
> february leap year) - wont miss ending days because of mistakes.
>
> 2. When you are giving a any range, you need not worry about what
> granularity should the time be specified.
> For ex, for including time upto 2015-08-01 05th.
>
> Should user specify the following with inclusive ranges?
> > 2015-08-01-04 (would be correct, assuming whole of 4 th hour will be
> included)
> > 2015-08-01-04-59 (would be correct, but what if there is leap second :p )
> > 2015-08-01-04-59-000 (wrong, would include only 000th milli second and
> not others)
> > 2015-08-01-04-59-999 (not sure if we will have leap milli seconds)
>
> Looking forward to see other arguments on making timerange end date
> inclusive.
>
> Thanks
> Amareshwari
>
> On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <pr...@apache.org>
> wrote:
>
> > Currently the time_range_in clause is inclusive of the start date and
> > exclusive of the end date. i.e. time_range_in(time_dimension, date1,
> date2)
> > translates to date1<=time_dimension<date2
> >
> > Recently there have been multiple discussions regarding this approach in
> > our internal forums at InMobi. There have been arguments that as an
> > analytics system, it should accept end date inclusive. Starting this
> thread
> > to know opinions of other people regarding both approaches. I'm defining
> > and briefly discussing both approaches below:
> >
> > Approach 1: End date exclusive.
> > Disclaimer: I personally prefer this approach and would shamelessly argue
> > in its favor. I'll give links to some of the articles that I very much
> > agree with (no need to reinvent the wheel of argument)
> >
> >    1. Views of Anders Kaseorg, MIT PhD student in CS; Cofounder of
> Ksplice,
> >    Inc
> >    <
> >
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> > >
> >    2. Views of Edgar Dijkstra, Creator of Dijkstra's algorithm
> >    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
> >    3. Both arguments are for generic ranges. I believe time range is
> also a
> >    range and don't see why we need to handle that as a special case.
> >
> >
> > Approach 2: End date inclusive.
> >
> >    1. Lens query language is a bit like sql and the BETWEEN clause in sql
> >    is inclusive of both start and end date.
> >    2. Most analytic systems --- including google analytics --- follow
> this
> >    convention
> >
> >
> > I'm posting the question on a bigger forum
> > <
> >
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> > >
> > as well in hopes of getting more opinions. I request everyone to post
> their
> > views here regarding both approaches.
> >
> > Regards
> >
>

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: [DISCUSS] Time Ranges in Lens

Posted by "amareshwarisr ." <am...@gmail.com>.
I prefer time-range end date to be exclusive of the end date, because of
the following reasons:

1. Giving a month range end as start of next month is easy, as you need not
remember and get confused with number of days in month : 30, 31, 28/29 (in
february leap year) - wont miss ending days because of mistakes.

2. When you are giving a any range, you need not worry about what
granularity should the time be specified.
For ex, for including time upto 2015-08-01 05th.

Should user specify the following with inclusive ranges?
> 2015-08-01-04 (would be correct, assuming whole of 4 th hour will be
included)
> 2015-08-01-04-59 (would be correct, but what if there is leap second :p )
> 2015-08-01-04-59-000 (wrong, would include only 000th milli second and
not others)
> 2015-08-01-04-59-999 (not sure if we will have leap milli seconds)

Looking forward to see other arguments on making timerange end date
inclusive.

Thanks
Amareshwari

On Wed, Aug 5, 2015 at 2:50 PM, Rajat Khandelwal <pr...@apache.org> wrote:

> Currently the time_range_in clause is inclusive of the start date and
> exclusive of the end date. i.e. time_range_in(time_dimension, date1, date2)
> translates to date1<=time_dimension<date2
>
> Recently there have been multiple discussions regarding this approach in
> our internal forums at InMobi. There have been arguments that as an
> analytics system, it should accept end date inclusive. Starting this thread
> to know opinions of other people regarding both approaches. I'm defining
> and briefly discussing both approaches below:
>
> Approach 1: End date exclusive.
> Disclaimer: I personally prefer this approach and would shamelessly argue
> in its favor. I'll give links to some of the articles that I very much
> agree with (no need to reinvent the wheel of argument)
>
>    1. Views of Anders Kaseorg, MIT PhD student in CS; Cofounder of Ksplice,
>    Inc
>    <
> https://www.quora.com/Why-are-Python-ranges-half-open-exclusive-instead-of-closed-inclusive/answer/Anders-Kaseorg
> >
>    2. Views of Edgar Dijkstra, Creator of Dijkstra's algorithm
>    <http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF>
>    3. Both arguments are for generic ranges. I believe time range is also a
>    range and don't see why we need to handle that as a special case.
>
>
> Approach 2: End date inclusive.
>
>    1. Lens query language is a bit like sql and the BETWEEN clause in sql
>    is inclusive of both start and end date.
>    2. Most analytic systems --- including google analytics --- follow this
>    convention
>
>
> I'm posting the question on a bigger forum
> <
> https://www.quora.com/Should-Time-ranges-in-an-analytical-system-be-half-open-or-closed
> >
> as well in hopes of getting more opinions. I request everyone to post their
> views here regarding both approaches.
>
> Regards
>