Posted to dev@rya.apache.org by "Liu, Eric" <Er...@capitalone.com> on 2017/02/23 00:38:08 UTC

Timestamps and Cardinality in Queries

Hi,

Continuing from our talk earlier today I was wondering if you could provide more information about how timestamps could be queried in Rya.
Also, we are trying to support a type of query that would essentially be limiting on cardinality (different from the normal SPARQL limit because it’s for node cardinality rather than total results). I saw in one of Caleb’s talks that Rya’s query optimization involves checking cardinality first. I was wondering if there would be some way to tap into this feature for usage in queries?

Thanks,
Eric Liu
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Timestamps and Cardinality in Queries

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
I'm not exactly sure what you mean by "limit per node".  I think you can
handle the limit per node in SPARQL, e.g.:

SELECT * WHERE {
  { SELECT * WHERE { ?s a <t:person> } LIMIT 100 }
  { SELECT * WHERE { ?s <p:talksTo> ?o  . ?o a <t:person>} LIMIT 10 }
}

So this would, at most, return 1000 results (at most 100 "people" who "talk
to" at most 10 other people).  Is this what you mean?

Note: I haven't tested this out yet on an RDF repo, and I'd have to
double check how gracefully Rya handles this join.

--Aaron
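
For reference, the per-node limit described above can be sketched outside SPARQL. This is a minimal plain-Python illustration over an in-memory list of triples, not Rya or OpenRDF code; the function name limit_per_node, the prefixed-name strings, and the sample data are all hypothetical. Under the SPARQL 1.1 specification, subqueries are evaluated on their own before their results are joined, so an engine may apply LIMIT 10 to the second subquery as a whole rather than per person. The sketch shows the intended per-person behaviour so it can be compared against whatever a given engine returns.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # stands in for the SPARQL keyword `a` (assumption)


def limit_per_node(triples, max_people=100, max_links=10):
    """Return (person, other) pairs with a cap per person, not on the total."""
    # Everyone typed as a person; sorted only to make the output deterministic.
    people = sorted({s for s, p, o in triples if p == RDF_TYPE and o == "t:person"})
    person_set = set(people)

    # Group "talksTo" edges by subject, keeping only edges that point at a person.
    talks_to = defaultdict(list)
    for s, p, o in triples:
        if p == "p:talksTo" and o in person_set:
            talks_to[s].append(o)

    results = []
    for person in people[:max_people]:  # at most max_people people
        for other in talks_to.get(person, [])[:max_links]:  # at most max_links each
            results.append((person, other))
    return results


# Hypothetical sample data, for illustration only.
sample = [
    ("alice", RDF_TYPE, "t:person"),
    ("bob", RDF_TYPE, "t:person"),
    ("carol", RDF_TYPE, "t:person"),
    ("alice", "p:talksTo", "bob"),
    ("alice", "p:talksTo", "carol"),
    ("bob", "p:talksTo", "alice"),
    ("carol", "p:talksTo", "printer"),  # ignored: "printer" is not a person
]

print(limit_per_node(sample, max_people=2, max_links=1))
```

With the defaults (100 and 10) the result can never exceed 1000 pairs, which matches the bound stated in the message.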



Re: Timestamps and Cardinality in Queries

Posted by "Liu, Eric" <Er...@capitalone.com>.
Interesting…
I think I should give a little more background on what we’re trying to do. When trying to produce data lineage, we’d like to provide as many upstreams and downstreams as possible (hopefully several layers in both directions). This would mean getting a graph that is “deep” rather than “wide”. Is there any way you would suggest doing this? Our current approach is a bit hacky, filtering over predicates and going in both directions until it hits a total result limit or has nowhere to go.
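
A rough sketch of the traversal described above, written as plain Python over an in-memory list of triples rather than against Rya. The function name lineage, the parameters max_results and max_degree, and the sample data are hypothetical. The degree cutoff is an assumption motivated by the cardinality question that opened this thread: a node with too many edges is still reported but is not expanded, so the result budget is spent on longer chains instead of one very wide node.

```python
from collections import defaultdict, deque


def lineage(triples, start, max_results=50, max_degree=None):
    """Walk upstream and downstream from `start`, collecting triples."""
    # Index every triple under both its subject and its object so that the
    # walk can follow edges in either direction.
    adjacent = defaultdict(list)
    for triple in triples:
        s, _, o = triple
        adjacent[s].append((triple, o))  # downstream neighbour
        adjacent[o].append((triple, s))  # upstream neighbour

    seen_nodes = {start}
    seen_triples = set()
    results = []
    queue = deque([start])

    while queue and len(results) < max_results:
        node = queue.popleft()
        edges = adjacent.get(node, [])
        # Skip expansion of high-cardinality nodes (never skip the start node).
        if max_degree is not None and node != start and len(edges) > max_degree:
            continue
        for triple, other in edges:
            if triple not in seen_triples:
                seen_triples.add(triple)
                results.append(triple)
                if len(results) >= max_results:
                    break  # total result limit reached
            if other not in seen_nodes:
                seen_nodes.add(other)
                queue.append(other)
    return results


# Hypothetical lineage data: userX owns many datasets and would make the
# result "wide" if it were expanded.
data = [
    ("jobA", "produces", "ds1"),
    ("ds1", "feeds", "ds2"),
    ("ds2", "feeds", "ds3"),
    ("userX", "owns", "ds1"),
    ("userX", "owns", "other1"),
    ("userX", "owns", "other2"),
    ("userX", "owns", "other3"),
]

for triple in lineage(data, "ds1", max_degree=3):
    print(triple)
```

In this sample the edge from userX to ds1 is returned, but the other datasets owned by userX are not, because userX has more than three edges. The walk is breadth-first; a depth-first variant would swap the queue for a stack.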

On 2/23/17, 4:08 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:

    I put up a "limit per node" example here [1] and Rya behaves as I
    expected.  Strangely, OpenRDF does not :/
    
    Unfortunately, the join between the two inner "SELECT" statements is not
    the more efficient MultipleBindingSetsIterator.  Instead, the join is the
    less efficient OpenRDF Join.
    
    --Aaron
    
    [1]
    https://github.com/amihalik/sesame-debugging/blob/6a26045d89ebee1c72cc0452e2b01dc29544b85c/src/main/java/com/github/amihalik/sesame/debugging/NodeLimitExample.java
    
    On Wed, Feb 22, 2017 at 9:24 PM Meier, Caleb <Ca...@parsons.com>
    wrote:
    
    > Hey Eric,
    >
    > Currently timestamps can't be queried in Rya.  Do you need to be able to
    > query by timestamp, or simply discover the timestamp for a given node?  Rya
    > does have a temporal index, but that requires you to use a temporal
    > ontology to model the temporal properties of your graph nodes.
    


Re: Timestamps and Cardinality in Queries

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
I put up a "limit per node" example here [1] and Rya behaves as I
expected.  Strangely, OpenRDF does not :/

Unfortunately, the join between the two inner "SELECT" statements is not
the more efficient MultipleBindingSetsIterator.  Instead, the join is the
less efficient OpenRDF Join.

--Aaron

[1]
https://github.com/amihalik/sesame-debugging/blob/6a26045d89ebee1c72cc0452e2b01dc29544b85c/src/main/java/com/github/amihalik/sesame/debugging/NodeLimitExample.java

On Wed, Feb 22, 2017 at 9:24 PM Meier, Caleb <Ca...@parsons.com>
wrote:

> Hey Eric,
>
> Currently timestamps can't be queried in Rya.  Do you need to be able to
> query by timestamp, or simply discover the timestamp for a given node?  Rya
> does have a temporal index, but that requires you to use a temporal
> ontology to model the temporal properties of your graph nodes.

Re: Timestamps and Cardinality in Queries

Posted by "Liu, Eric" <Er...@capitalone.com>.
Turns out all of these issues were related to our firewall and internal maven settings haha. Building on my personal computer worked out fine. Thanks for the help though!

On 3/1/17, 5:35 PM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:

    The mongo tests download and run a specially packaged version of Mongo.
    It seems like it's having difficulty downloading.  Can you hit the URL
    for mongo?
    
    Could not open inputStream for
    http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-3.2.1.tgz
    
    
    On Wed, Mar 1, 2017 at 6:04 PM Liu, Eric <Er...@capitalone.com> wrote:
    
    > Hm, maven runs now, but it’s getting this error in the Mongo tests:
    > http://pastebin.com/Mt928ane
    >
    > On 3/1/17, 12:30 PM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
    >
    >     That's really strange.  Can you hit the maven central repo [1] from
    >     your machine?
    >
    >     I guess delete the locationtech <repository> definition from your pom?
    >
    >
    >     [1] http://repo1.maven.org/maven2/org/apache/apache/17/
    >
    >     On Wed, Mar 1, 2017 at 2:31 PM Liu, Eric <Er...@capitalone.com> wrote:
    >
    >     > Hmmm, deleting the files in .m2 doesn’t stop it from searching in
    >     > locationtech, and using the other mvn command gives me no log output.
    >     >
    >     > On 3/1/17, 10:55 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
    >     >
    >     >     traversing: gotcha.  I completely understand now.  And now I
    >     >     understand how the prospector table would help with sniping out
    >     >     those nodes.
    >     >
    >     >     maven: yep, that's the right git repo.  Locationtech is required
    >     >     when you build with the 'geoindexing' profile.  Regardless, it's
    >     >     strange that maven tried to get the apache pom from locationtech.
    >     >     Deleting the org/apache/apache directory should force maven to
    >     >     download the apache pom from maven central.
    >     >
    >     >     --Aaron
    >     >
    >     >     On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <Eric.Liu@capitalone.com> wrote:
    >     >
    >     >     > Oh, that’s not an issue, that’s what we would like to do when
    >     >     > traversing through the data. If a node has a high cardinality we
    >     >     > don’t want to further traverse through its children.
    >     >     >
    >     >     > As for installation, did I clone the right repo for Rya? The one
    >     >     > I’m using has locationtech repos for SNAPSHOT and RELEASE:
    >     >     > https://github.com/apache/incubator-rya/blob/master/pom.xml
    >     >     >
    >     >     > On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <aaron.mihalik@gmail.com> wrote:
    >     >     >
    >     >     >     Repos: The locationtech repo is up [1].  The issue is that your
    >     >     >     local .m2 repo is in a bad state.  Maven is trying to get the
    >     >     >     apache pom from locationtech.  Locationtech does not host that
    >     >     >     pom; instead it's on maven central [2].
    >     >     >
    >     >     >     Two ways to fix this issue (you should do (1) and that'll fix
    >     >     >     it... (2) is just another option for reference).
    >     >     >
    >     >     >     1. Delete your apache pom directory from your local maven repo
    >     >     >     (e.g. rm -rf ~/.m2/repository/org/apache/apache/)
    >     >     >
    >     >     >     2. Tell maven to ignore remote repository metadata with the
    >     >     >     -llr flag (e.g. mvn clean install -llr -Pgeoindexing)
    >     >     >
    >     >     >     Let me know if you have any other issues.
    >     >     >
    >     >     >     deep/wide: okay, I don't understand this statement: "if the
    >     >     >     cardinality of a node is too high (for example, a user that
    >     >     >     owns a large number of datasets), the neighbors of that node
    >     >     >     will not be found."  Is this a property of your current
    >     >     >     datastore, or is this an issue with Rya?
    >     >     >
    >     >     >     --Aaron
    >     >     >
    >     >     >     [1] https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
    >     >     >     [2] http://repo1.maven.org/maven2/org/apache/apache/17/
    >     >     >
    >     >     >     On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <pujav65@gmail.com> wrote:
    >     >     >
    >     >     >     > Hey Eric,
    >     >     >     > Regarding the repos-- sometimes the location tech repos go down;
    >     >     >     > your best bet is to wait a little bit and try again.  You can
    >     >     >     > also download the latest artifacts off of the apache build server.
    >     >     >     > Since location tech is only used for the geo profile we may want
    >     >     >     > to move where that repo is declared (or put it in the geo profile).
    >     >     >     > For your use case, you could look to use the cardinality in the
    >     >     >     > prospector services for individual nodes.  Though the prospector
    >     >     >     > services could be run once and then used to be representative
    >     >     >     > (that wouldn't work for your use case), you could run them
    >     >     >     > regularly to keep track of counts for your use case.  Are you
    >     >     >     > using the count keyword or just manually counting edges?
    >     >     >     > The count keyword is pretty inefficient currently.  We could add
    >     >     >     > that to our list of priorities maybe.
    >     >     >     >
    >     >     >     > Sent from my iPhone
    >     >     >     >
    >     >     >     > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <Eric.Liu@capitalone.com> wrote:
    >     >     >     > >
    >     >     >     > > Hey Aaron,
    >     >     >     > >
    >     >     >     > > I’m currently setting up Rya to test these queries with some of
    >     >     >     > > our data. I run into an error when I run ‘mvn clean install’. I
    >     >     >     > > attached the logs, but it seems like I can’t connect to the
    >     >     >     > > snapshots repo you’re using.
    >     >     >     > >
    >     >     >     > > As for “deep/wide”, it would be something like starting at a
    >     >     >     > > dataset, then fanning out looking for relations where it is
    >     >     >     > > either the subject or object, such as the user who created it,
    >     >     >     > > the job it came from, where it’s stored, etc. It would recurse on
    >     >     >     > > these neighboring nodes until a total number of results is
    >     >     >     > > reached. However, if the cardinality of a node is too high (for
    >     >     >     > > example, a user that owns a large number of datasets), the
    >     >     >     > > neighbors of that node will not be found. Really, the goal is to
    >     >     >     > > find the most distant relevant relationships possible, and this
    >     >     >     > > is our current naïve way of doing so.
    >     >     >     > >
    >     >     >     > > Do you want to have a short call about this? I think it’d be
    >     >     >     > > easier to explain/answer questions over the phone. I’m free
    >     >     >     > > pretty much any time 1pm-5pm PST tomorrow (3/1).
    >     >     >     > >
    >     >     >     > > Thanks,
    >     >     >     > > Eric
    >     >     >     > >
    >     >     >     > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aaron.mihalik@gmail.com> wrote:
    >     >     >     > >
    >     >     >     > >    deep vs wide: I played around with the property paths SPARQL
    >     >     >     > >    operator and put up an example here [1].  This is a slightly
    >     >     >     > >    different query than the one I sent out before.  It would be
    >     >     >     > >    worth it for us to look at how this is actually executed by
    >     >     >     > >    OpenRDF.
    >     >     >     > >
    >     >     >     > >    Eric: Could you clarify what you mean by "deep vs wide"?  I
    >     >     >     > >    think I understand your queries, but I don't have a good
    >     >     >     > >    intuition about those terms and how cardinality might figure
    >     >     >     > >    into a query.  It would probably be a bit more helpful if you
    >     >     >     > >    provided a model or general description that is (somewhat)
    >     >     >     > >    representative of your data.
    >     >     >     > >
    >     >     >     > >    --Aaron
    >     >     >     > >
    >     >     >     > >    [1] https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
    >     >     >     > >
    >     >     >     > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <adina@usna.edu> wrote:
    >     >     >     > >>
    >     >     >     > >> Hi Eric,
    >     >     >     > >>
    >     >     >     > >> If you want to query by the Accumulo timestamp, something like
    >     >     >     > >> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I did
    >     >     >     > >> not try it lately, but timeRange() was in Rya originally. Not sure
    >     >     >     > >> if it was removed in later iterations or whether it would be
    >     >     >     > >> useful for your use case. The first Rya paper
    >     >     >     > >> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
    >     >     >     > >> discusses time ranges (Section 5.3 at the link above).
    >     >     >     > >>
    >     >     >     > >> Adina
    >     >     >     > >>
    >     >     >     > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pujav65@gmail.com> wrote:
    >     >     >     > >>>
    >     >     >     > >>> Hey John,
    >     >     >     > >>> I'm pretty sure your pull request was merged-- it was pulled in
    >     >     >     > >>> through another pull request.  If not, sorry-- I thought it had
    >     >     >     > >>> been merged and then just not closed.  I was going to spend some
    >     >     >     > >>> time doing merges tomorrow so I can get it tomorrow.
    >     >     >     > >>>
    >     >     >     > >>> Sent from my iPhone
    >     >     >     > >>>
    >     >     >     > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <johns0806@gmail.com> wrote:
    >     >     >     > >>>>
    >     >     >     > >>>> I have a pull request that fixes that problem.. it has been stuck
    >     >     >     > >>>> in limbo for months..
    >     >     >     > >>>> https://github.com/apache/incubator-rya-site/pull/1  Can someone
    >     >     >     > >>>> merge it into master?
    >     >     >     > >>>>
    >     >     >     > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <Eric.Liu@capitalone.com> wrote:
    >     >     >     > >>>>>
    >     >     >     > >>>>> Cool, thanks for the help.
    >     >     >     > >>>>> By the way, the link to the Rya Manual is outdated on the
    >     >     >     > >>>>> rya.apache.org site. Should be pointing at
    >     >     >     > >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
    >     >     >     > >>>>>
    >     >     >     > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aaron.mihalik@gmail.com> wrote:
    >     >     >     > >>>>>
    >     >     >     > >>>>>   deep vs wide:
    >     >     >     > >>>>>
    >     >     >     > >>>>>   A property path query is probably your best bet.  Something like:
    >     >     >     > >>>>>
    >     >     >     > >>>>>   for the following data:
    >     >     >     > >>>>>
    >     >     >     > >>>>>   s:EventA p:causes s:EventB
    >     >     >     > >>>>>   s:EventB p:causes s:EventC
    >     >     >     > >>>>>   s:EventC p:causes s:EventD
    >     >     >     > >>>>>
    >     >     >     > >>>>>
    >     >     >     > >>>>>   This query would start at EventB and work its way up and down
    >     >     >     > >>>>>   the chain:
    >     >     >     > >>>>>
    >     >     >     > >>>>>   SELECT * WHERE {
    >     >     >     > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
    >     >     >     > >>>>>   }
    >     >     >     > >>>>>
    >     >     >     > >>>>>
    >     >     >     > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <Caleb.Meier@parsons.com> wrote:
    >     >     >     > >>>>>
    >     >     >     > >>>>>> Yes, that's a good place to start.  If you have external
    >     >     >     > >>>>>> timestamps that are built into your graph using the time
    >     >     >     > >>>>>> ontology in OWL (e.g., you have triples of the form (event123,
    >     >     >     > >>>>>> time:inDateTime, 2017-02-23T14:29)), the temporal index is
    >     >     >     > >>>>>> exactly what you want.  If you are hoping to query based on the
    >     >     >     > >>>>>> internal timestamps that Accumulo assigns to your triples, then
    >     >     >     > >>>>>> there are some slight tweaks that can be done to facilitate
    >     >     >     > >>>>>> this, but it won't be nearly as efficient (this will require
    >     >     >     > >>>>>> some sort of client side filtering).
    >     >     >     > >>>>>>
    >     >     >     > >>>>>> Caleb A. Meier, Ph.D.
    >     >     >     > >>>>>> Software Engineer II ♦ Analyst
    >     >     >     > >>>>>> Parsons Corporation
    >     >     >     > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
    >     >     >     > >>>>>> Office:  (703)797-3066
    >     >     >     > >>>>>> Caleb.Meier@Parsons.com ♦ www.parsons.com
    >     >     >     > >>>>>>
    >     >     >     > >>>>>> -----Original Message-----
    >     >     >     > >>>>>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
    >     >     >     > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
    >     >     >     > >>>>>> To: dev@rya.incubator.apache.org
    >     >     >     > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
    >     >     >     > >>>>>>
    >     >     >     > >>>>>> We’d like to be able to query by timestamp; specifically, we
    >     >     >     > >>>>>> want to be able to find all statements that were made within a
    >     >     >     > >>>>>> given time range. Is this what I should be looking at?
    >     >     >     > >>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_download_attachments_63407907_Rya-2520Temporal-2520Indexing.pdf-3Fversion-3D1-26modificationDate-3D1464789502000-26api-3Dv2&d=CwIGaQ&c=Nwf-pp4xtYRe0sCRVM8_LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHCgeo_4WXTD0qo8&m=BBheKpKX7A1Ijs8q_TDEUVtdfu-r015XHZjmcw6veAw&s=vLayAkLG0IKGE-0NbwRQKfpcfId05fXE5TX8oMJaa7Q&e=
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <
    >     > Caleb.Meier@parsons.com>
    >     >     >     > wrote:
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   Hey Eric,
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   Currently timestamps can't be queried in Rya.
    > Do you
    >     > need
    >     >     > to be
    >     >     >     > >>>>> able
    >     >     >     > >>>>>> to query by timestamp, or simply discover the
    > timestamp
    >     > for a
    >     >     > given
    >     >     >     > >>>>> node?
    >     >     >     > >>>>>> Rya does have a temporal index, but that requires
    > you
    >     > to use a
    >     >     >     > >>>>> temporal
    >     >     >     > >>>>>> ontology to model the temporal properties of your
    > graph
    >     > nodes.
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   ________________________________________
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   From: Liu, Eric <Er...@capitalone.com>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   To: dev@rya.incubator.apache.org
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   Subject: Timestamps and Cardinality in Queries
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   Hi,
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   Continuing from our talk earlier today I was
    >     > wondering if
    >     >     > you
    >     >     >     > >>>>> could
    >     >     >     > >>>>>> provide more information about how timestamps
    > could be
    >     >     > queried in
    >     >     >     > >>>>> Rya.
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   Also, we are trying to support a type of query
    > that
    >     > would
    >     >     >     > >>>>> essentially
    >     >     >     > >>>>>> be limiting on cardinality (different from the
    > normal
    >     > SPARQL
    >     >     > limit
    >     >     >     > >>>>> because
    >     >     >     > >>>>>> it’s for node cardinality rather than total
    > results). I
    >     > saw
    >     >     > in one
    >     >     >     > of
    >     >     >     > >>>>>> Caleb’s talks that Rya’s query optimization
    > involves
    >     > checking
    >     >     >     > >>>>> cardinality
    >     >     >     > >>>>>> first. I was wondering if there would be some way
    > to
    >     > tap into
    >     >     > this
    >     >     >     > >>>>> feature
    >     >     >     > >>>>>> for usage in queries?
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   Thanks,
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>   Eric Liu
    >     >     >     > >>>>>>
    >     >     >     > >>>>>>
    >     >     >     > >>>>>
    >     >     >     > >>>
    >     >     >     > >>
    >     >     >     > >>
    >     >     >     > >>
    >     >     >     > >> --
    >     >     >     > >> Dr. Adina Crainiceanu
    >     >     >     > >> Associate Professor, Computer Science Department
    >     >     >     > >> United States Naval Academy
    >     >     >     > >> 410-293-6822 <(410)%20293-6822> <(410)%20293-6822>
    >     > <(410)%20293-6822>
    >     >     > <(410)%20293-6822>
    >     >     >     > >> adina@usna.edu
    >     >     >     > >> http://www.usna.edu/Users/cs/adina/
    >     >     >     > >>
    >     >     >     > >
    >     >     >     > >
    >     >     >     > >
    >     >     >     > > <log.txt>
    >     >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >
    >     >
    >     >
    >
    >
    >
    >
    >
    


Re: Timestamps and Cardinality in Queries

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
The mongo tests download and run a specially packaged version of Mongo.
It looks like the download is failing.  Can you reach the Mongo URL below
from your machine?

Could not open inputStream for
http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-3.2.1.tgz
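
On the traversal question from earlier in the thread (fan out from a dataset, but don't walk through the children of a high-cardinality node), here is a minimal sketch of the idea in plain Python. It assumes an in-memory list of triples rather than Rya itself, and the names bounded_traverse, max_degree and max_results are hypothetical, made up only for this illustration. In Rya the neighbor counts would presumably come from the prospector table instead of being computed on the fly.

```python
from collections import deque


def bounded_traverse(edges, start, max_degree, max_results):
    """Breadth-first walk over (subject, predicate, object) triples.

    Triples are followed in both directions. A node with more than
    max_degree neighbors is still returned, but it is not expanded.
    """
    # Build an adjacency map that treats every triple as a two-way link.
    neighbors = {}
    for s, _p, o in edges:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)

    seen = {start}
    order = []
    queue = deque([start])
    while queue and len(order) < max_results:
        node = queue.popleft()
        order.append(node)
        adjacent = neighbors.get(node, set())
        if len(adjacent) > max_degree:
            continue  # high-cardinality node: keep it, skip its neighbors
        for nxt in sorted(adjacent):  # sorted only to make the order repeatable
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Made-up sample data: userA owns many datasets, so it is not expanded.
triples = [
    ("dataset1", "createdBy", "userA"),
    ("dataset1", "producedBy", "job7"),
    ("userA", "owns", "dataset2"),
    ("userA", "owns", "dataset3"),
    ("userA", "owns", "dataset4"),
    ("job7", "ranOn", "cluster1"),
]
print(bounded_traverse(triples, "dataset1", max_degree=3, max_results=10))
```

With max_degree=3, userA (four neighbors) shows up in the result, but dataset2 through dataset4 are never visited.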


On Wed, Mar 1, 2017 at 6:04 PM Liu, Eric <Er...@capitalone.com> wrote:

> Hm, maven runs now, but it’s getting this error in the Mongo tests:
> http://pastebin.com/Mt928ane
>
> On 3/1/17, 12:30 PM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
>
>     That's really strange.  Can you hit the maven central repo [1] from
> your
>     machine?
>
>     I guess delete the locationtech <repository> definition from your pom?
>
>
>     [1] http://repo1.maven.org/maven2/org/apache/apache/17/
>
>     On Wed, Mar 1, 2017 at 2:31 PM Liu, Eric <Er...@capitalone.com>
> wrote:
>
>     > Hmmm, deleting the files in .m2 doesn’t stop it from searching in
>     > locationtech, and using the other mvn command gives me no log output.
>     >
>     > On 3/1/17, 10:55 AM, "Aaron D. Mihalik" <aa...@gmail.com>
> wrote:
>     >
>     >     traversing: gotcha.  I completely understand now.  And now I
>     > understand
>     >     how the prospector table would help with sniping out those nodes.
>     >
>     >     maven: yep, that's the right git repo.  Locationtech is required
> when
>     > you
>     >     build with the 'geoindexing' profile.  Regardless, it's strange
> that
>     > maven
>     >     tried to get the apache pom from locationtech.  Deleting the
>     >     org/apache/apache directory should force maven to download the
> apache
>     > pom
>     >     from maven central.
>     >
>     >     --Aaron
>     >
>     >     On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <
> Eric.Liu@capitalone.com>
>     > wrote:
>     >
>     >     > Oh, that’s not an issue, that’s what we would like to do when
>     > traversing
>     >     > through the data. If a node has a high cardinality we don’t
> want to
>     > further
>     >     > traverse through its children.
>     >     >
>     >     > As for installation, did I clone the right repo for Rya? The
> one I’m
>     > using
>     >     > has locationtech repos for SNAPSHOT and RELEASE:
>     >     > https://github.com/apache/incubator-rya/blob/master/pom.xml
>     >     >
>     >     > On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <
> aaron.mihalik@gmail.com>
>     > wrote:
>     >     >
>     >     >     Repos: The locationtech repo is up [1].  The issue is that
> your
>     > local
>     >     > .m2
>     >     >     repo is in a bad state.  Maven is trying to get the apache
> pom
>     > from
>     >     >     locationtech.  Locationtech does not host that pom,
> instead it's
>     > on
>     >     > maven
>     >     >     central [2].
>     >     >
>     >     >     Two ways to fix this issue (you should do (1) and that'll
> fix
>     > it...
>     >     > (2) is
>     >     >     just another option for reference).
>     >     >
>     >     >     1. Delete your apache pom directory from your local maven
> repo
>     > (e.g.
>     >     > rm -rf
>     >     >     ~/.m2/repository/org/apache/apache/)
>     >     >
>     >     >     2. Tell maven to ignore remote repository metadata with
> the -llr
>     > flag
>     >     > (e.g.
>     >     >     mvn clean install -llr -Pgeoindexing)
>     >     >
>     >     >     Let me know if you have any other issues.
>     >     >
>     >     >     deep/wide: okay, I don't understand this statement: "if the
>     >     > cardinality of
>     >     >     a node is too high (for example, a user that owns a large
> number
>     > of
>     >     >     datasets), the neighbors of that node will not be found."
> Is
>     > this a
>     >     >     property of your current datastore, or is this an issue
> with Rya?
>     >     >
>     >     >     --Aaron
>     >     >
>     >     >     [1]
>     >     >
>     >     >
>     >
> https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
>     >     >     [2] http://repo1.maven.org/maven2/org/apache/apache/17/
>     >     >
>     >     >     On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <
> pujav65@gmail.com>
>     > wrote:
>     >     >
>     >     >     > Hey Eric,
>     >     >     > Regarding the repos-- sometimes the location tech repos
> go
>     > down,
>     >     > your best
>     >     >     > bet is to wait a little bit and try again.  You can also
>     > download the
>     >     >     > latest artifacts off of the apache build server.
>     >     >     > Since location tech is only used for the geo profile we
> may
>     > want to
>     >     > move
>     >     >     > where that repo is declared (or put it in the geo
> profile).
>     >     >     > For your use case, you could look to use the cardinality
> in the
>     >     > prospector
>     >     >     > services for individual nodes.  Though the prospector
> services
>     > could
>     >     > be run
>     >     >     > once and then used to be representative (that wouldn't
> work
>     > for your
>     >     > use
>     >     >     > case), you could run them regularly to keep track of
> counts
>     > for your
>     >     > use
>     >     >     > case.  Are you using the count keyword or just manually
>     > counting
>     >     > edges?
>     >     >     > The count keyword is pretty inefficient currently.  We
> could
>     > add
>     >     > that to
>     >     >     > our list of priorities maybe.
>     >     >     >
>     >     >     > Sent from my iPhone
>     >     >     >
>     >     >     > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <
>     > Eric.Liu@capitalone.com>
>     >     > wrote:
>     >     >     > >
>     >     >     > > Hey Aaron,
>     >     >     > >
>     >     >     > > I’m currently setting up Rya to test these queries
> with some
>     > of our
>     >     >     > data. I run into an error when I run ‘mvn clean
> install’, I
>     > attached
>     >     > the
>     >     >     > logs but it seems like I can’t connect to the snapshots
> repo
>     > you’re
>     >     > using.
>     >     >     > >
>     >     >     > > As for “deep/wide”, it would be something like
> starting at a
>     >     > dataset,
>     >     >     > then fanning out looking for relations where it is
> either the
>     >     > subject or
>     >     >     > object, such as the user who created it, the job it came
> from,
>     > where
>     >     > it’s
>     >     >     > stored, etc. It would recurse on these neighboring nodes
> until
>     > a
>     >     > total
>     >     >     > number of results is reached. However, if the
> cardinality of a
>     > node
>     >     > is too
>     >     >     > high (for example, a user that owns a large number of
>     > datasets), the
>     >     >     > neighbors of that node will not be found. Really, the
> goal is
>     > to
>     >     > find the
>     >     >     > most distance relevant relationships possible, and this
> is our
>     >     > current
>     >     >     > naïve way of doing so.
>     >     >     > >
>     >     >     > > Do you want to have a short call about this? I think
> it’d be
>     >     > easier to
>     >     >     > explain/answer questions over the phone. I’m free pretty
> much
>     > any
>     >     > time
>     >     >     > 1pm-5pm PST tomorrow (3/1).
>     >     >     > >
>     >     >     > > Thanks,
>     >     >     > > Eric
>     >     >     > >
>     >     >     > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <
>     > aaron.mihalik@gmail.com>
>     >     > wrote:
>     >     >     > >
>     >     >     > >    deep vs wide: I played around with the property
> paths
>     > sparql
>     >     > operator
>     >     >     > and
>     >     >     > >    put up an example here [1].  This is a slightly
> different
>     > query
>     >     > than
>     >     >     > the
>     >     >     > >    one I sent out before.  It would be worth it for us
> to
>     > look at
>     >     > how
>     >     >     > this is
>     >     >     > >    actually executed by OpenRDF.
>     >     >     > >
>     >     >     > >    Eric: Could you clarify by "deep vs wide"?  I think
> I
>     >     > understand your
>     >     >     > >    queries, but I don't have a good intuition about
> those
>     > terms
>     >     > and how
>     >     >     > >    cardinality might figure into a query.  It would
> probably
>     > be a
>     >     > bit
>     >     >     > more
>     >     >     > >    helpful if you provided a model or general
> description
>     > that is
>     >     >     > (somewhat)
>     >     >     > >    representative of your data.
>     >     >     > >
>     >     >     > >    --Aaron
>     >     >     > >
>     >     >     > >    [1]
>     >     >     > >
>     >     >     >
>     >     >
>     >
> https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
>     >     >     > >
>     >     >     > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <
>     >     > adina@usna.edu>
>     >     >     > wrote:
>     >     >     > >>
>     >     >     > >> Hi Eric,
>     >     >     > >>
>     >     >     > >> If you want to query by the Accumulo timestamp,
> something
>     > like
>     >     >     > >> timeRange(?ts, 13141201490, 13249201490) should work
> in
>     > Rya. I
>     >     > did not
>     >     >     > try
>     >     >     > >> it lately, but timeRange() was in Rya originally. Not
> sure
>     > if it
>     >     > was
>     >     >     > >> removed in later iterations or whether it would be
> useful
>     > for
>     >     > your use
>     >     >     > >> case. First Rya paper
>     >     >     > >>
>     > https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
>     >     >     > discusses
>     >     >     > >> time ranges (Section 5.3 at the link above)
>     >     >     > >>
>     >     >     > >> Adina
>     >     >     > >>
>     >     >     > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <
>     > pujav65@gmail.com
>     >     > >
>     >     >     > wrote:
>     >     >     > >>>
>     >     >     > >>> Hey John,
>     >     >     > >>> I'm pretty sure your pull request was merged-- it was
>     > pulled in
>     >     > through
>     >     >     > >>> another pull request.  If not, sorry-- I thought it
> had
>     > been
>     >     > merged and
>     >     >     > >>> then just not closed.  I was going to spend some
> time doing
>     >     > merges
>     >     >     > >> tomorrow
>     >     >     > >>> so I can get it tomorrow.
>     >     >     > >>>
>     >     >     > >>> Sent from my iPhone
>     >     >     > >>>
>     >     >     > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <
>     > johns0806@gmail.com>
>     >     > wrote:
>     >     >     > >>>>
>     >     >     > >>>> I have a pull request that fixes that problem.. it
> has
>     > been
>     >     > stuck in
>     >     >     > >>> limbo
>     >     >     > >>>> for months..
>     >     > https://github.com/apache/incubator-rya-site/pull/1  Can
>     >     >     > >>>> someone merge it into master?
>     >     >     > >>>>
>     >     >     > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <
>     >     > Eric.Liu@capitalone.com>
>     >     >     > >>> wrote:
>     >     >     > >>>>>
>     >     >     > >>>>> Cool, thanks for the help.
>     >     >     > >>>>> By the way, the link to the Rya Manual is outdated
> on the
>     >     >     > >>> rya.apache.org
>     >     >     > >>>>> site. Should be pointing at
> https://github.com/apache/
>     >     >     > >>>>>
>     > incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_
>     >     >     > >> index.md
>     >     >     > >>>>>
>     >     >     > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <
>     >     > aaron.mihalik@gmail.com>
>     >     >     > >>> wrote:
>     >     >     > >>>>>
>     >     >     > >>>>>   deep vs wide:
>     >     >     > >>>>>
>     >     >     > >>>>>   A property path query is probably your best bet.
>     > Something
>     >     > like:
>     >     >     > >>>>>
>     >     >     > >>>>>   for the following data:
>     >     >     > >>>>>
>     >     >     > >>>>>   s:EventA p:causes s:EventB
>     >     >     > >>>>>   s:EventB p:causes s:EventC
>     >     >     > >>>>>   s:EventC p:causes s:EventD
>     >     >     > >>>>>
>     >     >     > >>>>>
>     >     >     > >>>>>   This query would start at EventB and work its
> way up
>     > and
>     >     > down the
>     >     >     > >>>>> chain:
>     >     >     > >>>>>
>     >     >     > >>>>>   SELECT * WHERE {
>     >     >     > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s
> ?p ?o
>     >     >     > >>>>>   }
>     >     >     > >>>>>
>     >     >     > >>>>>
>     >     >     > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
>     >     >     > >>> Caleb.Meier@parsons.com>
>     >     >     > >>>>>   wrote:
>     >     >     > >>>>>
>     >     >     > >>>>>> Yes, that's a good place to start.  If you have
> external
>     >     > timestamps
>     >     >     > >>>>> that
>     >     >     > >>>>>> are built into your graph using the time ontology
> in
>     > owl (e.g
>     >     > you
>     >     >     > >>>>> have
>     >     >     > >>>>>> triples of the form (event123, time:inDateTime,
>     >     > 2017-02-23T14:29)),
>     >     >     > >>>>> the
>     >     >     > >>>>>> temporal index is exactly what you want.  If you
> are
>     > hoping
>     >     > to query
>     >     >     > >>>>> based
>     >     >     > >>>>>> on the internal timestamps that Accumulo assigns
> to your
>     >     > triples,
>     >     >     > >>>>> then
>     >     >     > >>>>>> there are some slight tweaks that can be done to
>     > facilitate
>     >     > this,
>     >     >     > >>>>> but it
>     >     >     > >>>>>> won't be nearly as efficient (this will require
> some
>     > sort of
>     >     > client
>     >     >     > >>>>> side
>     >     >     > >>>>>> filtering).
>     >     >     > >>>>>>
>     >     >     > >>>>>> Caleb A. Meier, Ph.D.
>     >     >     > >>>>>> Software Engineer II ♦ Analyst
>     >     >     > >>>>>> Parsons Corporation
>     >     >     > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington,
> VA 22209
>     >     >     > >>>>>> Office:  (703)797-3066
>     >     >     > >>>>>> Caleb.Meier@Parsons.com ♦ www.parsons.com
>     >     >     > >>>>>>
>     >     >     > >>>>>> -----Original Message-----
>     >     >     > >>>>>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
>     >     >     > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
>     >     >     > >>>>>> To: dev@rya.incubator.apache.org
>     >     >     > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
>     >     >     > >>>>>>
>     >     >     > >>>>>> We’d like to be able to query by timestamp;
>     > specifically, we
>     >     > want to
>     >     >     > >>>>> be
>     >     >     > >>>>>> able to find all statements that were made within
> a
>     > given time
>     >     >     > >>>>> range. Is
>     >     >     > >>>>>> this what I should be looking at?
>     >     >     > >>>>>>
>     > https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
>     >     >     > >>>>>
> apache.org_confluence_download_attachments_63407907_
>     >     >     > >>>>>
>     >     >
> Rya-2520Temporal-2520Indexing.pdf-3Fversion-3D1-26modificationDate-
>     >     >     > >>>>>
>     > 3D1464789502000-26api-3Dv2&d=CwIGaQ&c=Nwf-pp4xtYRe0sCRVM8_
>     >     >     > >>>>>
> LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHC
>     >     >     > >>> geo_4WXTD0qo8&m=
>     >     >     > >>>>>
>     > BBheKpKX7A1Ijs8q_TDEUVtdfu-r015XHZjmcw6veAw&s=vLayAkLG0IKGE-
>     >     >     > >>>>> 0NbwRQKfpcfId05fXE5TX8oMJaa7Q&e=
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <
>     > Caleb.Meier@parsons.com>
>     >     >     > wrote:
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>   Hey Eric,
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>   Currently timestamps can't be queried in Rya.
> Do you
>     > need
>     >     > to be
>     >     >     > >>>>> able
>     >     >     > >>>>>> to query by timestamp, or simply discover the
> timestamp
>     > for a
>     >     > given
>     >     >     > >>>>> node?
>     >     >     > >>>>>> Rya does have a temporal index, but that requires
> you
>     > to use a
>     >     >     > >>>>> temporal
>     >     >     > >>>>>> ontology to model the temporal properties of your
> graph
>     > nodes.
>     >     >     > >>>>>>
>     >     >     > >>>>>>   ________________________________________
>     >     >     > >>>>>>
>     >     >     > >>>>>>   From: Liu, Eric <Er...@capitalone.com>
>     >     >     > >>>>>>
>     >     >     > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
>     >     >     > >>>>>>
>     >     >     > >>>>>>   To: dev@rya.incubator.apache.org
>     >     >     > >>>>>>
>     >     >     > >>>>>>   Subject: Timestamps and Cardinality in Queries
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>   Hi,
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>   Continuing from our talk earlier today I was
>     > wondering if
>     >     > you
>     >     >     > >>>>> could
>     >     >     > >>>>>> provide more information about how timestamps
> could be
>     >     > queried in
>     >     >     > >>>>> Rya.
>     >     >     > >>>>>>
>     >     >     > >>>>>>   Also, we are trying to support a type of query
> that
>     > would
>     >     >     > >>>>> essentially
>     >     >     > >>>>>> be limiting on cardinality (different from the
> normal
>     > SPARQL
>     >     > limit
>     >     >     > >>>>> because
>     >     >     > >>>>>> it’s for node cardinality rather than total
> results). I
>     > saw
>     >     > in one
>     >     >     > of
>     >     >     > >>>>>> Caleb’s talks that Rya’s query optimization
> involves
>     > checking
>     >     >     > >>>>> cardinality
>     >     >     > >>>>>> first. I was wondering if there would be some way
> to
>     > tap into
>     >     > this
>     >     >     > >>>>> feature
>     >     >     > >>>>>> for usage in queries?
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>>   Thanks,
>     >     >     > >>>>>>
>     >     >     > >>>>>>   Eric Liu
>     >     >     > >>>>>>
>     >     >     > >>>>>>
>     >     >     > >>>>>
>     >     >     > >>>
>     >     >     > >>
>     >     >     > >>
>     >     >     > >>
>     >     >     > >> --
>     >     >     > >> Dr. Adina Crainiceanu
>     >     >     > >> Associate Professor, Computer Science Department
>     >     >     > >> United States Naval Academy
>     >     >     > >> 410-293-6822
>     >     >     > >> adina@usna.edu
>     >     >     > >> http://www.usna.edu/Users/cs/adina/
>     >     >     > > <log.txt>

Re: Timestamps and Cardinality in Queries

Posted by "Liu, Eric" <Er...@capitalone.com>.
Hm, maven runs now, but it’s getting this error in the Mongo tests:
http://pastebin.com/Mt928ane

On 3/1/17, 12:30 PM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:

    That's really strange.  Can you hit the maven central repo [1] from your
    machine?
    
    I guess delete the locationtech <repository> definition from your pom?
    
    
    [1] http://repo1.maven.org/maven2/org/apache/apache/17/
    
    On Wed, Mar 1, 2017 at 2:31 PM Liu, Eric <Er...@capitalone.com> wrote:
    
    > Hmmm, deleting the files in .m2 doesn’t stop it from searching in
    > locationtech, and using the other mvn command gives me no log output.
    >
    > On 3/1/17, 10:55 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
    >
    >     traversing: gotcha.  I completely understand now.  And now I understand
    >     how the prospector table would help with sniping out those nodes.
    >
    >     maven: yep, that's the right git repo.  Locationtech is required when you
    >     build with the 'geoindexing' profile.  Regardless, it's strange that maven
    >     tried to get the apache pom from locationtech.  Deleting the
    >     org/apache/apache directory should force maven to download the apache pom
    >     from maven central.
    >
    >     --Aaron
    >
    >     On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <Er...@capitalone.com> wrote:
    >
    >     > Oh, that’s not an issue, that’s what we would like to do when traversing
    >     > through the data. If a node has a high cardinality we don’t want to further
    >     > traverse through its children.
    >     >
    >     > As for installation, did I clone the right repo for Rya? The one I’m using
    >     > has locationtech repos for SNAPSHOT and RELEASE:
    >     > https://github.com/apache/incubator-rya/blob/master/pom.xml
    >     >
    >     > On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
    >     >
    >     >     Repos: The locationtech repo is up [1].  The issue is that your local .m2
    >     >     repo is in a bad state.  Maven is trying to get the apache pom from
    >     >     locationtech.  Locationtech does not host that pom; instead it's on maven
    >     >     central [2].
    >     >
    >     >     Two ways to fix this issue (you should do (1) and that'll fix it... (2) is
    >     >     just another option for reference).
    >     >
    >     >     1. Delete your apache pom directory from your local maven repo (e.g.
    >     >     rm -rf ~/.m2/repository/org/apache/apache/)
    >     >
    >     >     2. Tell maven to ignore remote repository metadata with the -llr flag (e.g.
    >     >     mvn clean install -llr -Pgeoindexing)
    >     >
    >     >     Let me know if you have any other issues.
    >     >
    >     >     deep/wide: okay, I don't understand this statement: "if the cardinality of
    >     >     a node is too high (for example, a user that owns a large number of
    >     >     datasets), the neighbors of that node will not be found."  Is this a
    >     >     property of your current datastore, or is this an issue with Rya?
    >     >
    >     >     --Aaron
    >     >
    >     >     [1] https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
    >     >     [2] http://repo1.maven.org/maven2/org/apache/apache/17/
    >     >
    >     >     On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <pu...@gmail.com> wrote:
    >     >
    >     >     > Hey Eric,
    >     >     > Regarding the repos-- sometimes the location tech repos go down; your best
    >     >     > bet is to wait a little bit and try again.  You can also download the
    >     >     > latest artifacts off of the apache build server.
    >     >     > Since location tech is only used for the geo profile we may want to move
    >     >     > where that repo is declared (or put it in the geo profile).
    >     >     > For your use case, you could look to use the cardinality in the prospector
    >     >     > services for individual nodes.  Though the prospector services could be run
    >     >     > once and then used to be representative (that wouldn't work for your use
    >     >     > case), you could run them regularly to keep track of counts for your use
    >     >     > case.  Are you using the count keyword or just manually counting edges?
    >     >     > The count keyword is pretty inefficient currently.  We could add that to
    >     >     > our list of priorities maybe.
    >     >     >
    >     >     > Sent from my iPhone
    >     >     >
    >     >     > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <Eric.Liu@capitalone.com> wrote:
    >     >     > >
    >     >     > > Hey Aaron,
    >     >     > >
    >     >     > > I’m currently setting up Rya to test these queries with some of our
    >     >     > > data. I run into an error when I run ‘mvn clean install’; I attached the
    >     >     > > logs, but it seems like I can’t connect to the snapshots repo you’re using.
    >     >     > >
    >     >     > > As for “deep/wide”, it would be something like starting at a dataset,
    >     >     > > then fanning out looking for relations where it is either the subject or
    >     >     > > object, such as the user who created it, the job it came from, where it’s
    >     >     > > stored, etc. It would recurse on these neighboring nodes until a total
    >     >     > > number of results is reached. However, if the cardinality of a node is too
    >     >     > > high (for example, a user that owns a large number of datasets), the
    >     >     > > neighbors of that node will not be found. Really, the goal is to find the
    >     >     > > most distant relevant relationships possible, and this is our current
    >     >     > > naïve way of doing so.
    >     >     > >
    >     >     > > Do you want to have a short call about this? I think it’d be easier to
    >     >     > > explain/answer questions over the phone. I’m free pretty much any time
    >     >     > > 1pm-5pm PST tomorrow (3/1).
    >     >     > >
    >     >     > > Thanks,
    >     >     > > Eric
    >     >     > >
    >     >     > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aaron.mihalik@gmail.com> wrote:
    >     >     > >
    >     >     > >    deep vs wide: I played around with the property paths sparql operator and
    >     >     > >    put up an example here [1].  This is a slightly different query than the
    >     >     > >    one I sent out before.  It would be worth it for us to look at how this is
    >     >     > >    actually executed by OpenRDF.
    >     >     > >
    >     >     > >    Eric: Could you clarify what you mean by "deep vs wide"?  I think I
    >     >     > >    understand your queries, but I don't have a good intuition about those terms
    >     >     > >    and how cardinality might figure into a query.  It would probably be a bit
    >     >     > >    more helpful if you provided a model or general description that is
    >     >     > >    (somewhat) representative of your data.
    >     >     > >
    >     >     > >    --Aaron
    >     >     > >
    >     >     > >    [1] https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
    >     >     > >
    >     >     > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <adina@usna.edu> wrote:
    >     >     > >>
    >     >     > >> Hi Eric,
    >     >     > >>
    >     >     > >> If you want to query by the Accumulo timestamp, something like
    >     >     > >> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I did not try
    >     >     > >> it lately, but timeRange() was in Rya originally. Not sure if it was
    >     >     > >> removed in later iterations or whether it would be useful for your use
    >     >     > >> case. The first Rya paper,
    >     >     > >> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf,
    >     >     > >> discusses time ranges (Section 5.3).
    >     >     > >>
    >     >     > >> Adina
    >     >     > >>
    >     >     > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pujav65@gmail.com> wrote:
    >     >     > >>>
    >     >     > >>> Hey John,
    >     >     > >>> I'm pretty sure your pull request was merged-- it was pulled in through
    >     >     > >>> another pull request.  If not, sorry-- I thought it had been merged and
    >     >     > >>> then just not closed.  I was going to spend some time doing merges tomorrow
    >     >     > >>> so I can get it tomorrow.
    >     >     > >>>
    >     >     > >>> Sent from my iPhone
    >     >     > >>>
    >     >     > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <johns0806@gmail.com> wrote:
    >     >     > >>>>
    >     >     > >>>> I have a pull request that fixes that problem.. it has been stuck in limbo
    >     >     > >>>> for months..  https://github.com/apache/incubator-rya-site/pull/1  Can
    >     >     > >>>> someone merge it into master?
    >     >     > >>>>
    >     >     > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <Eric.Liu@capitalone.com> wrote:
    >     >     > >>>>>
    >     >     > >>>>> Cool, thanks for the help.
    >     >     > >>>>> By the way, the link to the Rya Manual is outdated on the rya.apache.org
    >     >     > >>>>> site. Should be pointing at
    >     >     > >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
    >     >     > >>>>>
    >     >     > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aaron.mihalik@gmail.com> wrote:
    >     >     > >>>>>
    >     >     > >>>>>   deep vs wide:
    >     >     > >>>>>
    >     >     > >>>>>   A property path query is probably your best bet.  Something like:
    >     >     > >>>>>
    >     >     > >>>>>   for the following data:
    >     >     > >>>>>
    >     >     > >>>>>   s:EventA p:causes s:EventB
    >     >     > >>>>>   s:EventB p:causes s:EventC
    >     >     > >>>>>   s:EventC p:causes s:EventD
    >     >     > >>>>>
    >     >     > >>>>>
    >     >     > >>>>>   This query would start at EventB and work its way up and down the chain:
    >     >     > >>>>>
    >     >     > >>>>>   SELECT * WHERE {
    >     >     > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
    >     >     > >>>>>   }
    >     >     > >>>>>
    >     >     > >>>>>
    >     >     > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <Caleb.Meier@parsons.com> wrote:
    >     >     > >>>>>
    >     >     > >>>>>> Yes, that's a good place to start.  If you have external timestamps that
    >     >     > >>>>>> are built into your graph using the time ontology in OWL (e.g., you have
    >     >     > >>>>>> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the
    >     >     > >>>>>> temporal index is exactly what you want.  If you are hoping to query based
    >     >     > >>>>>> on the internal timestamps that Accumulo assigns to your triples, then
    >     >     > >>>>>> there are some slight tweaks that can be done to facilitate this, but it
    >     >     > >>>>>> won't be nearly as efficient (this will require some sort of client-side
    >     >     > >>>>>> filtering).
    >     >     > >>>>>>
    >     >     > >>>>>> Caleb A. Meier, Ph.D.
    >     >     > >>>>>> Software Engineer II ♦ Analyst
    >     >     > >>>>>> Parsons Corporation
    >     >     > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
    >     >     > >>>>>> Office:  (703)797-3066
    >     >     > >>>>>> Caleb.Meier@Parsons.com ♦ www.parsons.com
    >     >     > >>>>>>
    >     >     > >>>>>> -----Original Message-----
    >     >     > >>>>>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
    >     >     > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
    >     >     > >>>>>> To: dev@rya.incubator.apache.org
    >     >     > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
    >     >     > >>>>>>
    >     >     > >>>>>> We’d like to be able to query by timestamp; specifically, we want to be
    >     >     > >>>>>> able to find all statements that were made within a given time range. Is
    >     >     > >>>>>> this what I should be looking at?
    >     >     > >>>>>> https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
    >     >     > >>>>>>
    >     >     > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <Caleb.Meier@parsons.com> wrote:
    >     >     > >>>>>>
    >     >     > >>>>>>
    >     >     > >>>>>>
    >     >     > >>>>>>   Hey Eric,
    >     >     > >>>>>>
    >     >     > >>>>>>   Currently timestamps can't be queried in Rya.  Do you need to be able
    >     >     > >>>>>>   to query by timestamp, or simply discover the timestamp for a given node?
    >     >     > >>>>>>   Rya does have a temporal index, but that requires you to use a temporal
    >     >     > >>>>>>   ontology to model the temporal properties of your graph nodes.
    >     >     > >>>>>>
    >     >     > >>>>>>   ________________________________________
    >     >     > >>>>>>   From: Liu, Eric <Er...@capitalone.com>
    >     >     > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
    >     >     > >>>>>>   To: dev@rya.incubator.apache.org
    >     >     > >>>>>>   Subject: Timestamps and Cardinality in Queries
    >     >     > >>>>>>
    >     >     > >>>>>>   Hi,
    >     >     > >>>>>>
    >     >     > >>>>>>   Continuing from our talk earlier today I was wondering if you could
    >     >     > >>>>>>   provide more information about how timestamps could be queried in Rya.
    >     >     > >>>>>>
    >     >     > >>>>>>   Also, we are trying to support a type of query that would essentially
    >     >     > >>>>>>   be limiting on cardinality (different from the normal SPARQL limit because
    >     >     > >>>>>>   it’s for node cardinality rather than total results). I saw in one of
    >     >     > >>>>>>   Caleb’s talks that Rya’s query optimization involves checking cardinality
    >     >     > >>>>>>   first. I was wondering if there would be some way to tap into this feature
    >     >     > >>>>>>   for usage in queries?
    >     >     > >>>>>>
    >     >     > >>>>>>   Thanks,
    >     >     > >>>>>>   Eric Liu
    >     >     > >>>>>>
    >     >     > >> --
    >     >     > >> Dr. Adina Crainiceanu
    >     >     > >> Associate Professor, Computer Science Department
    >     >     > >> United States Naval Academy
    >     >     > >> 410-293-6822
    >     >     > >> adina@usna.edu
    >     >     > >> http://www.usna.edu/Users/cs/adina/
    >     >     > > <log.txt>
    




Re: Timestamps and Cardinality in Queries

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
That's really strange.  Can you hit the maven central repo [1] from your
machine?

I guess delete the locationtech <repository> definition from your pom?


[1] http://repo1.maven.org/maven2/org/apache/apache/17/
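
A rough sketch of the "delete the locationtech <repository> definition" idea, assuming a standard Maven POM that uses the usual POM 4.0.0 namespace. The helper name `drop_repositories` and the sample repository ids are made up for illustration and are not part of Rya or Maven. It works on a string, so it is best tried on a copy of the pom first.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by Maven POM files.
POM_NS = "http://maven.apache.org/POM/4.0.0"


def drop_repositories(pom_xml: str, needle: str = "locationtech") -> str:
    """Return pom_xml without any <repository> whose <url> contains needle.

    Hypothetical helper for illustration only.
    """
    # Keep the POM namespace as the default namespace when serializing.
    ET.register_namespace("", POM_NS)
    root = ET.fromstring(pom_xml)
    ns = {"m": POM_NS}
    # Snapshot the matches first so the tree is not modified while iterating.
    for repos in list(root.iter(f"{{{POM_NS}}}repositories")):
        for repo in list(repos.findall("m:repository", ns)):
            url = repo.findtext("m:url", default="", namespaces=ns)
            if needle in url:
                repos.remove(repo)
    return ET.tostring(root, encoding="unicode")
```

This only touches <repositories> blocks; a build with the 'geoindexing' profile would still need the locationtech repository to resolve its artifacts.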

On Wed, Mar 1, 2017 at 2:31 PM Liu, Eric <Er...@capitalone.com> wrote:

> Hmmm, deleting the files in .m2 doesn’t stop it from searching in
> locationtech, and using the other mvn command gives me no log output.
>
> On 3/1/17, 10:55 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
>
>     traversing: gotcha.  I completely understand now.  And now I understand
>     how the prospector table would help with sniping out those nodes.
>
>     maven: yep, that's the right git repo.  Locationtech is required when you
>     build with the 'geoindexing' profile.  Regardless, it's strange that maven
>     tried to get the apache pom from locationtech.  Deleting the
>     org/apache/apache directory should force maven to download the apache pom
>     from maven central.
>
>     --Aaron
>
>     On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <Er...@capitalone.com> wrote:
>
>     > Oh, that’s not an issue, that’s what we would like to do when traversing
>     > through the data. If a node has a high cardinality we don’t want to further
>     > traverse through its children.
>     >
>     > As for installation, did I clone the right repo for Rya? The one I’m using
>     > has locationtech repos for SNAPSHOT and RELEASE:
>     > https://github.com/apache/incubator-rya/blob/master/pom.xml
>     >
>     > On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
>     >
>     >     Repos: The locationtech repo is up [1].  The issue is that your local .m2
>     >     repo is in a bad state.  Maven is trying to get the apache pom from
>     >     locationtech.  Locationtech does not host that pom; it's on maven
>     >     central [2].
>     >
>     >     Two ways to fix this issue (you should do (1) and that'll fix it... (2) is
>     >     just another option for reference).
>     >
>     >     1. Delete your apache pom directory from your local maven repo (e.g. rm -rf
>     >     ~/.m2/repository/org/apache/apache/)
>     >
>     >     2. Tell maven to ignore remote repository metadata with the -llr flag (e.g.
>     >     mvn clean install -llr -Pgeoindexing)
>     >
>     >     Let me know if you have any other issues.
>     >
>     >     deep/wide: okay, I don't understand this statement: "if the cardinality of
>     >     a node is too high (for example, a user that owns a large number of
>     >     datasets), the neighbors of that node will not be found."  Is this a
>     >     property of your current datastore, or is this an issue with Rya?
>     >
>     >     --Aaron
>     >
>     >     [1] https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
>     >     [2] http://repo1.maven.org/maven2/org/apache/apache/17/
>     >
>     >     On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <pu...@gmail.com> wrote:
>     >
>     >     > Hey Eric,
>     >     > Regarding the repos-- sometimes the location tech repos go down, your best
>     >     > bet is to wait a little bit and try again.  You can also download the
>     >     > latest artifacts off of the apache build server.
>     >     > Since location tech is only used for the geo profile we may want to move
>     >     > where that repo is declared (or put it in the geo profile).
>     >     > For your use case, you could look to use the cardinality in the prospector
>     >     > services for individual nodes.  Though the prospector services could be run
>     >     > once and then used to be representative (that wouldn't work for your use
>     >     > case), you could run them regularly to keep track of counts for your use
>     >     > case.  Are you using the count keyword or just manually counting edges?
>     >     > The count keyword is pretty inefficient currently.  We could add that to
>     >     > our list of priorities maybe.
>     >     >
>     >     > Sent from my iPhone
>     >     >
>     >     > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <Eric.Liu@capitalone.com> wrote:
>     >     > >
>     >     > > Hey Aaron,
>     >     > >
>     >     > > I’m currently setting up Rya to test these queries with some
> of our
>     >     > data. I run into an error when I run ‘mvn clean install’, I
> attached
>     > the
>     >     > logs but it seems like I can’t connect to the snapshots repo
> you’re
>     > using.
>     >     > >
>     >     > > As for “deep/wide”, it would be something like starting at a
>     > dataset,
>     >     > then fanning out looking for relations where it is either the
>     > subject or
>     >     > object, such as the user who created it, the job it came from,
> where
>     > it’s
>     >     > stored, etc. It would recurse on these neighboring nodes until
> a
>     > total
>     >     > number of results is reached. However, if the cardinality of a
> node
>     > is too
>     >     > high (for example, a user that owns a large number of
> datasets), the
>     >     > neighbors of that node will not be found. Really, the goal is
> to
>     > find the
>     >     > most distance relevant relationships possible, and this is our
>     > current
>     >     > naïve way of doing so.
>     >     > >
>     >     > > Do you want to have a short call about this? I think it’d be
>     > easier to
>     >     > explain/answer questions over the phone. I’m free pretty much
> any
>     > time
>     >     > 1pm-5pm PST tomorrow (3/1).
>     >     > >
>     >     > > Thanks,
>     >     > > Eric
>     >     > >
>     >     > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <
> aaron.mihalik@gmail.com>
>     > wrote:
>     >     > >
>     >     > >    deep vs wide: I played around with the property paths
> sparql
>     > operator
>     >     > and
>     >     > >    put up an example here [1].  This is a slightly different
> query
>     > than
>     >     > the
>     >     > >    one I sent out before.  It would be worth it for us to
> look at
>     > how
>     >     > this is
>     >     > >    actually executed by OpenRDF.
>     >     > >
>     >     > >    Eric: Could you clarify by "deep vs wide"?  I think I
>     > understand your
>     >     > >    queries, but I don't have a good intuition about those
> terms
>     > and how
>     >     > >    cardinality might figure into a query.  It would probably
> be a
>     > bit
>     >     > more
>     >     > >    helpful if you provided a model or general description
> that is
>     >     > (somewhat)
>     >     > >    representative of your data.
>     >     > >
>     >     > >    --Aaron
>     >     > >
>     >     > >    [1]
>     >     > >
>     >     >
>     >
> https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
>     >     > >
>     >     > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <
>     > adina@usna.edu>
>     >     > wrote:
>     >     > >>
>     >     > >> Hi Eric,
>     >     > >>
>     >     > >> If you want to query by the Accumulo timestamp, something
> like
>     >     > >> timeRange(?ts, 13141201490, 13249201490) should work in
> Rya. I
>     > did not
>     >     > try
>     >     > >> it lately, but timeRange() was in Rya originally. Not sure
> if it
>     > was
>     >     > >> removed in later iterations or whether it would be useful
> for
>     > your use
>     >     > >> case. First Rya paper
>     >     > >>
> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
>     >     > discusses
>     >     > >> time ranges (Section 5.3 at the link above)
>     >     > >>
>     >     > >> Adina
>     >     > >>
>     >     > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <
> pujav65@gmail.com
>     > >
>     >     > wrote:
>     >     > >>>
>     >     > >>> Hey John,
>     >     > >>> I'm pretty sure your pull request was merged-- it was
> pulled in
>     > through
>     >     > >>> another pull request.  If not, sorry-- I thought it had
> been
>     > merged and
>     >     > >>> then just not closed.  I was going to spend some time doing
>     > merges
>     >     > >> tomorrow
>     >     > >>> so I can get it tomorrow.
>     >     > >>>
>     >     > >>> Sent from my iPhone
>     >     > >>>
>     >     > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <
> johns0806@gmail.com>
>     > wrote:
>     >     > >>>>
>     >     > >>>> I have a pull request that fixes that problem.. it has
> been
>     > stuck in
>     >     > >>> limbo
>     >     > >>>> for months..
>     > https://github.com/apache/incubator-rya-site/pull/1  Can
>     >     > >>>> someone merge it into master?
>     >     > >>>>
>     >     > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <
>     > Eric.Liu@capitalone.com>
>     >     > >>> wrote:
>     >     > >>>>>
>     >     > >>>>> Cool, thanks for the help.
>     >     > >>>>> By the way, the link to the Rya Manual is outdated on the rya.apache.org
>     >     > >>>>> site. Should be pointing at
>     >     > >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
>     >     > >>>>>
>     >     > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <
>     > aaron.mihalik@gmail.com>
>     >     > >>> wrote:
>     >     > >>>>>
>     >     > >>>>>   deep vs wide:
>     >     > >>>>>
>     >     > >>>>>   A property path query is probably your best bet.
> Something
>     > like:
>     >     > >>>>>
>     >     > >>>>>   for the following data:
>     >     > >>>>>
>     >     > >>>>>   s:EventA p:causes s:EventB
>     >     > >>>>>   s:EventB p:causes s:EventC
>     >     > >>>>>   s:EventC p:causes s:EventD
>     >     > >>>>>
>     >     > >>>>>
>     >     > >>>>>   This query would start at EventB and work its way up and down the
>     >     > >>>>>   chain:
>     >     > >>>>>
>     >     > >>>>>   SELECT * WHERE {
>     >     > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
>     >     > >>>>>   }
>     >     > >>>>>
>     >     > >>>>>
>     >     > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
>     >     > >>> Caleb.Meier@parsons.com>
>     >     > >>>>>   wrote:
>     >     > >>>>>
>     >     > >>>>>> Yes, that's a good place to start.  If you have external
>     > timestamps
>     >     > >>>>> that
>     >     > >>>>>> are built into your graph using the time ontology in
> owl (e.g
>     > you
>     >     > >>>>> have
>     >     > >>>>>> triples of the form (event123, time:inDateTime,
>     > 2017-02-23T14:29)),
>     >     > >>>>> the
>     >     > >>>>>> temporal index is exactly what you want.  If you are
> hoping
>     > to query
>     >     > >>>>> based
>     >     > >>>>>> on the internal timestamps that Accumulo assigns to your
>     > triples,
>     >     > >>>>> then
>     >     > >>>>>> there are some slight tweaks that can be done to
> facilitate
>     > this,
>     >     > >>>>> but it
>     >     > >>>>>> won't be nearly as efficient (this will require some
> sort of
>     > client
>     >     > >>>>> side
>     >     > >>>>>> filtering).
>     >     > >>>>>>
>     >     > >>>>>> Caleb A. Meier, Ph.D.
>     >     > >>>>>> Software Engineer II ♦ Analyst
>     >     > >>>>>> Parsons Corporation
>     >     > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
>     >     > >>>>>> Office:  (703)797-3066
>     >     > >>>>>> Caleb.Meier@Parsons.com ♦ www.parsons.com
>     >     > >>>>>>
>     >     > >>>>>> -----Original Message-----
>     >     > >>>>>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
>     >     > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
>     >     > >>>>>> To: dev@rya.incubator.apache.org
>     >     > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
>     >     > >>>>>>
>     >     > >>>>>> We’d like to be able to query by timestamp;
> specifically, we
>     > want to
>     >     > >>>>> be
>     >     > >>>>>> able to find all statements that were made within a
> given time
>     >     > >>>>> range. Is
>     >     > >>>>>> this what I should be looking at?
>     >     > >>>>>>
>     >     > >>>>>> https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <
> Caleb.Meier@parsons.com>
>     >     > wrote:
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Hey Eric,
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Currently timestamps can't be queried in Rya.  Do you
> need
>     > to be
>     >     > >>>>> able
>     >     > >>>>>> to query by timestamp, or simply discover the timestamp
> for a
>     > given
>     >     > >>>>> node?
>     >     > >>>>>> Rya does have a temporal index, but that requires you
> to use a
>     >     > >>>>> temporal
>     >     > >>>>>> ontology to model the temporal properties of your graph
> nodes.
>     >     > >>>>>>
>     >     > >>>>>>   ________________________________________
>     >     > >>>>>>
>     >     > >>>>>>   From: Liu, Eric <Er...@capitalone.com>
>     >     > >>>>>>
>     >     > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
>     >     > >>>>>>
>     >     > >>>>>>   To: dev@rya.incubator.apache.org
>     >     > >>>>>>
>     >     > >>>>>>   Subject: Timestamps and Cardinality in Queries
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Hi,
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Continuing from our talk earlier today I was
> wondering if
>     > you
>     >     > >>>>> could
>     >     > >>>>>> provide more information about how timestamps could be
>     > queried in
>     >     > >>>>> Rya.
>     >     > >>>>>>
>     >     > >>>>>>   Also, we are trying to support a type of query that
> would
>     >     > >>>>> essentially
>     >     > >>>>>> be limiting on cardinality (different from the normal
> SPARQL
>     > limit
>     >     > >>>>> because
>     >     > >>>>>> it’s for node cardinality rather than total results). I
> saw
>     > in one
>     >     > of
>     >     > >>>>>> Caleb’s talks that Rya’s query optimization involves
> checking
>     >     > >>>>> cardinality
>     >     > >>>>>> first. I was wondering if there would be some way to
> tap into
>     > this
>     >     > >>>>> feature
>     >     > >>>>>> for usage in queries?
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Thanks,
>     >     > >>>>>>
>     >     > >>>>>>   Eric Liu
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>
>     >     > >>>>>
>     >     > >>>>>
>     >     > >>>
>     >     > >>
>     >     > >>
>     >     > >>
>     >     > >> --
>     >     > >> Dr. Adina Crainiceanu
>     >     > >> Associate Professor, Computer Science Department
>     >     > >> United States Naval Academy
>     >     > >> 410-293-6822
>     >     > >> adina@usna.edu
>     >     > >> http://www.usna.edu/Users/cs/adina/
>     >     > >>
>     >     > >
>     >     > >
>     >     > > <log.txt>
>     >     >
>     >
>     >
>     >
>
>
>

Re: Timestamps and Cardinality in Queries

Posted by "Liu, Eric" <Er...@capitalone.com>.
Hmmm, deleting the files in .m2 doesn’t stop it from searching in locationtech, and using the other mvn command gives me no log output.

On 3/1/17, 10:55 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:

    traversing: gotcha.  I completely understand now.  And now I understand
    how the prospector table would help with sniping out those nodes.
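
The pruning idea discussed in this thread (fan out from a node, but do not continue through nodes whose cardinality is too high) can be sketched roughly as below. This is only an illustration under stated assumptions: the function name `bounded_neighbors`, its parameters, and the in-memory edge list are hypothetical, and the degree is computed from the edge list itself, standing in for counts that would otherwise come from something like the prospector table. It is not how Rya executes queries.

```python
from collections import deque


def bounded_neighbors(edges, start, max_degree, max_results):
    """Breadth-first walk that reports high-cardinality nodes but does not expand them.

    edges is an iterable of (subject, predicate, object) triples.
    Hypothetical helper for illustration only.
    """
    # A node is a neighbor whether it appears as the subject or the object.
    adjacency = {}
    for s, _p, o in edges:
        adjacency.setdefault(s, set()).add(o)
        adjacency.setdefault(o, set()).add(s)

    seen = {start}
    order = []
    queue = deque([start])
    while queue and len(order) < max_results:
        node = queue.popleft()
        neighbors = adjacency.get(node, set())
        # Assumption: degree stands in for the cardinality count of a node.
        if node != start and len(neighbors) > max_degree:
            continue  # keep the node in the results, skip its neighbors
        for nxt in sorted(neighbors):  # sorted only to make the order deterministic
            if nxt not in seen and len(order) < max_results:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

With a dataset linked to a job and to a user who owns many other datasets, a low `max_degree` returns the user itself but none of the user's other datasets, which matches the behavior described for "deep/wide" traversal.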
    
    maven: yep, that's the right git repo.  Locationtech is required when you
    build with the 'geoindexing' profile.  Regardless, it's strange that maven
    tried to get the apache pom from locationtech.  Deleting the
    org/apache/apache directory should force maven to download the apache pom
    from maven central.
    
    --Aaron
    
    On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <Er...@capitalone.com> wrote:
    
    > Oh, that’s not an issue, that’s what we would like to do when traversing
    > through the data. If a node has a high cardinality we don’t want to further
    > traverse through its children.
    >
    > As for installation, did I clone the right repo for Rya? The one I’m using
    > has locationtech repos for SNAPSHOT and RELEASE:
    > https://github.com/apache/incubator-rya/blob/master/pom.xml
    >
    > On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
    >
    >     Repos: The locationtech repo is up [1].  The issue is that your local .m2
    >     repo is in a bad state.  Maven is trying to get the apache pom from
    >     locationtech.  Locationtech does not host that pom; it's on maven
    >     central [2].
    >
    >     Two ways to fix this issue (you should do (1) and that'll fix it... (2) is
    >     just another option for reference).
    >
    >     1. Delete your apache pom directory from your local maven repo (e.g. rm -rf
    >     ~/.m2/repository/org/apache/apache/)
    >
    >     2. Tell maven to ignore remote repository metadata with the -llr flag (e.g.
    >     mvn clean install -llr -Pgeoindexing)
    >
    >     Let me know if you have any other issues.
    >
    >     deep/wide: okay, I don't understand this statement: "if the cardinality of
    >     a node is too high (for example, a user that owns a large number of
    >     datasets), the neighbors of that node will not be found."  Is this a
    >     property of your current datastore, or is this an issue with Rya?
    >
    >     --Aaron
    >
    >     [1] https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
    >     [2] http://repo1.maven.org/maven2/org/apache/apache/17/
    >
    >     On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <pu...@gmail.com> wrote:
    >
    >     > Hey Eric,
    >     > Regarding the repos-- sometimes the location tech repos go down, your best
    >     > bet is to wait a little bit and try again.  You can also download the
    >     > latest artifacts off of the apache build server.
    >     > Since location tech is only used for the geo profile we may want to move
    >     > where that repo is declared (or put it in the geo profile).
    >     > For your use case, you could look to use the cardinality in the prospector
    >     > services for individual nodes.  Though the prospector services could be run
    >     > once and then used to be representative (that wouldn't work for your use
    >     > case), you could run them regularly to keep track of counts for your use
    >     > case.  Are you using the count keyword or just manually counting edges?
    >     > The count keyword is pretty inefficient currently.  We could add that to
    >     > our list of priorities maybe.
    >     >
    >     > Sent from my iPhone
    >     >
    >     > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <Er...@capitalone.com> wrote:
    >     > >
    >     > > Hey Aaron,
    >     > >
    >     > > I’m currently setting up Rya to test these queries with some of our
    >     > data. I run into an error when I run ‘mvn clean install’, I attached
    > the
    >     > logs but it seems like I can’t connect to the snapshots repo you’re
    > using.
    >     > >
    >     > > As for “deep/wide”, it would be something like starting at a
    > dataset,
    >     > then fanning out looking for relations where it is either the
    > subject or
    >     > object, such as the user who created it, the job it came from, where
    > it’s
    >     > stored, etc. It would recurse on these neighboring nodes until a
    > total
    >     > number of results is reached. However, if the cardinality of a node
    > is too
    >     > high (for example, a user that owns a large number of datasets), the
    >     > neighbors of that node will not be found. Really, the goal is to
    > find the
    >     > most distance relevant relationships possible, and this is our
    > current
    >     > naïve way of doing so.
    >     > >
    >     > > Do you want to have a short call about this? I think it’d be
    > easier to
    >     > explain/answer questions over the phone. I’m free pretty much any
    > time
    >     > 1pm-5pm PST tomorrow (3/1).
    >     > >
    >     > > Thanks,
    >     > > Eric
    >     > >
    >     > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aa...@gmail.com>
    > wrote:
    >     > >
    >     > >    deep vs wide: I played around with the property paths sparql
    > operator
    >     > and
    >     > >    put up an example here [1].  This is a slightly different query
    > than
    >     > the
    >     > >    one I sent out before.  It would be worth it for us to look at
    > how
    >     > this is
    >     > >    actually executed by OpenRDF.
    >     > >
    >     > >    Eric: Could you clarify by "deep vs wide"?  I think I
    > understand your
    >     > >    queries, but I don't have a good intuition about those terms
    > and how
    >     > >    cardinality might figure into a query.  It would probably be a
    > bit
    >     > more
    >     > >    helpful if you provided a model or general description that is
    >     > (somewhat)
    >     > >    representative of your data.
    >     > >
    >     > >    --Aaron
    >     > >
    >     > >    [1]
    >     > >
    >     >
    > https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
    >     > >
    >     > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <
    > adina@usna.edu>
    >     > wrote:
    >     > >>
    >     > >> Hi Eric,
    >     > >>
    >     > >> If you want to query by the Accumulo timestamp, something like
    >     > >> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I
    > did not
    >     > try
    >     > >> it lately, but timeRange() was in Rya originally. Not sure if it
    > was
    >     > >> removed in later iterations or whether it would be useful for
    > your use
    >     > >> case. First Rya paper
    >     > >> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
    >     > discusses
    >     > >> time ranges (Section 5.3 at the link above)
    >     > >>
    >     > >> Adina
    >     > >>
    >     > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pujav65@gmail.com
    > >
    >     > wrote:
    >     > >>>
    >     > >>> Hey John,
    >     > >>> I'm pretty sure your pull request was merged-- it was pulled in
    > through
    >     > >>> another pull request.  If not, sorry-- I thought it had been
    > merged and
    >     > >>> then just not closed.  I was going to spend some time doing
    > merges
    >     > >> tomorrow
    >     > >>> so I can get it tomorrow.
    >     > >>>
    >     > >>> Sent from my iPhone
    >     > >>>
    >     > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <jo...@gmail.com>
    > wrote:
    >     > >>>>
    >     > >>>> I have a pull request that fixes that problem.. it has been
    > stuck in
    >     > >>> limbo
    >     > >>>> for months..
    > https://github.com/apache/incubator-rya-site/pull/1  Can
    >     > >>>> someone merge it into master?
    >     > >>>>
    >     > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <
    > Eric.Liu@capitalone.com>
    >     > >>> wrote:
    >     > >>>>>
    >     > >>>>> Cool, thanks for the help.
    >     > >>>>> By the way, the link to the Rya Manual is outdated on the rya.apache.org
    >     > >>>>> site. Should be pointing at
    >     > >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
    >     > >>>>>
    >     > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <
    > aaron.mihalik@gmail.com>
    >     > >>> wrote:
    >     > >>>>>
    >     > >>>>>   deep vs wide:
    >     > >>>>>
    >     > >>>>>   A property path query is probably your best bet.  Something
    > like:
    >     > >>>>>
    >     > >>>>>   for the following data:
    >     > >>>>>
    >     > >>>>>   s:EventA p:causes s:EventB
    >     > >>>>>   s:EventB p:causes s:EventC
    >     > >>>>>   s:EventC p:causes s:EventD
    >     > >>>>>
    >     > >>>>>
    >     > >>>>>   This query would start at EventB and work its way up and down the
    >     > >>>>>   chain:
    >     > >>>>>
    >     > >>>>>   SELECT * WHERE {
    >     > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
    >     > >>>>>   }
    >     > >>>>>
    >     > >>>>>
    >     > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
    >     > >>> Caleb.Meier@parsons.com>
    >     > >>>>>   wrote:
    >     > >>>>>
    >     > >>>>>> Yes, that's a good place to start.  If you have external timestamps that
    >     > >>>>>> are built into your graph using the time ontology in OWL (e.g., you have
    >     > >>>>>> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the
    >     > >>>>>> temporal index is exactly what you want.  If you are hoping to query based
    >     > >>>>>> on the internal timestamps that Accumulo assigns to your triples, then
    >     > >>>>>> there are some slight tweaks that can be done to facilitate this, but it
    >     > >>>>>> won't be nearly as efficient (this will require some sort of client-side
    >     > >>>>>> filtering).
    >     > >>>>>>
    >     > >>>>>> Caleb A. Meier, Ph.D.
    >     > >>>>>> Software Engineer II ♦ Analyst
    >     > >>>>>> Parsons Corporation
    >     > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
    >     > >>>>>> Office:  (703)797-3066
    >     > >>>>>> Caleb.Meier@Parsons.com ♦ www.parsons.com
    >     > >>>>>>
    >     > >>>>>> -----Original Message-----
    >     > >>>>>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
    >     > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
    >     > >>>>>> To: dev@rya.incubator.apache.org
    >     > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
    >     > >>>>>>
    >     > >>>>>> We’d like to be able to query by timestamp; specifically, we
    > want to
    >     > >>>>> be
    >     > >>>>>> able to find all statements that were made within a given time
    >     > >>>>> range. Is
    >     > >>>>>> this what I should be looking at?
    >     > >>>>>> https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <Ca...@parsons.com>
    >     > wrote:
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Hey Eric,
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Currently timestamps can't be queried in Rya.  Do you need
    > to be
    >     > >>>>> able
    >     > >>>>>> to query by timestamp, or simply discover the timestamp for a
    > given
    >     > >>>>> node?
    >     > >>>>>> Rya does have a temporal index, but that requires you to use a
    >     > >>>>> temporal
    >     > >>>>>> ontology to model the temporal properties of your graph nodes.
    >     > >>>>>>
    >     > >>>>>>   ________________________________________
    >     > >>>>>>
    >     > >>>>>>   From: Liu, Eric <Er...@capitalone.com>
    >     > >>>>>>
    >     > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
    >     > >>>>>>
    >     > >>>>>>   To: dev@rya.incubator.apache.org
    >     > >>>>>>
    >     > >>>>>>   Subject: Timestamps and Cardinality in Queries
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Hi,
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Continuing from our talk earlier today I was wondering if
    > you
    >     > >>>>> could
    >     > >>>>>> provide more information about how timestamps could be
    > queried in
    >     > >>>>> Rya.
    >     > >>>>>>
    >     > >>>>>>   Also, we are trying to support a type of query that would
    >     > >>>>> essentially
    >     > >>>>>> be limiting on cardinality (different from the normal SPARQL
    > limit
    >     > >>>>> because
    >     > >>>>>> it’s for node cardinality rather than total results). I saw
    > in one
    >     > of
    >     > >>>>>> Caleb’s talks that Rya’s query optimization involves checking
    >     > >>>>> cardinality
    >     > >>>>>> first. I was wondering if there would be some way to tap into
    > this
    >     > >>>>> feature
    >     > >>>>>> for usage in queries?
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Thanks,
    >     > >>>>>>
    >     > >>>>>>   Eric Liu
    >     > >>>>>>
    >     > >>>>>
    >     > >>>
    >     > >>
    >     > >>
    >     > >>
    >     > >> --
    >     > >> Dr. Adina Crainiceanu
    >     > >> Associate Professor, Computer Science Department
    >     > >> United States Naval Academy
    >     > >> 410-293-6822
    >     > >> adina@usna.edu
    >     > >> http://www.usna.edu/Users/cs/adina/
    >     > >>
    >     > >
    >     > >
    >     > > <log.txt>
    >     >
    >
    >
    >
    


Re: Timestamps and Cardinality in Queries

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
traversing: gotcha.  I completely understand now.  And now I understand
how the prospector table would help with sniping out those nodes.

maven: yep, that's the right git repo.  Locationtech is required when you
build with the 'geoindexing' profile.  Regardless, it's strange that maven
tried to get the apache pom from locationtech.  Deleting the
~/.m2/repository/org/apache/apache directory from your local repo should
force maven to download the apache pom from maven central.

--Aaron

On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <Er...@capitalone.com> wrote:

> Oh, that’s not an issue, that’s what we would like to do when traversing
> through the data. If a node has a high cardinality we don’t want to further
> traverse through its children.
>
> As for installation, did I clone the right repo for Rya? The one I’m using
> has locationtech repos for SNAPSHOT and RELEASE:
> https://github.com/apache/incubator-rya/blob/master/pom.xml
>
> On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
>
>     Repos: The locationtech repo is up [1].  The issue is that your local .m2
>     repo is in a bad state.  Maven is trying to get the apache pom from
>     locationtech.  Locationtech does not host that pom, instead it's on maven
>     central [2].
>
>     Two ways to fix this issue (you should do (1) and that'll fix it... (2) is
>     just another option for reference).
>
>     1. Delete your apache pom directory from your local maven repo (e.g. rm -rf
>     ~/.m2/repository/org/apache/apache/)
>
>     2. Tell maven to ignore remote repository metadata with the -llr flag (e.g.
>     mvn clean install -llr -Pgeoindexing)
>
>     Let me know if you have any other issues.
>
>     deep/wide: okay, I don't understand this statement: "if the cardinality of
>     a node is too high (for example, a user that owns a large number of
>     datasets), the neighbors of that node will not be found."  Is this a
>     property of your current datastore, or is this an issue with Rya?
>
>     --Aaron
>
>     [1]
>
> https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
>     [2] http://repo1.maven.org/maven2/org/apache/apache/17/
>
>     On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <pu...@gmail.com> wrote:
>
>     > Hey Eric,
>     > Regarding the repos-- sometimes the location tech repos go down,
> your best
>     > bet is to wait a little bit and try again.  You can also download the
>     > latest artifacts off of the apache build server.
>     > Since location tech is only used for the geo profile we may want to
> move
>     > where that repo is declared (or put it in the geo profile).
>     > For your use case, you could look to use the cardinality in the
> prospector
>     > services for individual nodes.  Though the prospector services could
> be run
>     > once and then used to be representative (that wouldn't work for your
> use
>     > case), you could run them regularly to keep track of counts for your
> use
>     > case.  Are you using the count keyword or just manually counting
> edges?
>     > The count keyword is pretty inefficient currently.  We could add
> that to
>     > our list of priorities maybe.
>     >
>     > Sent from my iPhone
>     >
>     > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <Er...@capitalone.com>
> wrote:
>     > >
>     > > Hey Aaron,
>     > >
>     > > I’m currently setting up Rya to test these queries with some of our
>     > data. I run into an error when I run ‘mvn clean install’, I attached
> the
>     > logs but it seems like I can’t connect to the snapshots repo you’re
> using.
>     > >
>     > > As for “deep/wide”, it would be something like starting at a
> dataset,
>     > then fanning out looking for relations where it is either the
> subject or
>     > object, such as the user who created it, the job it came from, where
> it’s
>     > stored, etc. It would recurse on these neighboring nodes until a
> total
>     > number of results is reached. However, if the cardinality of a node
> is too
>     > high (for example, a user that owns a large number of datasets), the
>     > neighbors of that node will not be found. Really, the goal is to find the
>     > most distant relevant relationships possible, and this is our current
>     > naïve way of doing so.
>     > >
>     > > Do you want to have a short call about this? I think it’d be
> easier to
>     > explain/answer questions over the phone. I’m free pretty much any
> time
>     > 1pm-5pm PST tomorrow (3/1).
>     > >
>     > > Thanks,
>     > > Eric
>     > >
>     > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aa...@gmail.com>
> wrote:
>     > >
>     > >    deep vs wide: I played around with the property paths sparql
> operator
>     > and
>     > >    put up an example here [1].  This is a slightly different query
> than
>     > the
>     > >    one I sent out before.  It would be worth it for us to look at
> how
>     > this is
>     > >    actually executed by OpenRDF.
>     > >
>     > >    Eric: Could you clarify what you mean by "deep vs wide"?  I think I understand your
>     > >    queries, but I don't have a good intuition about those terms
> and how
>     > >    cardinality might figure into a query.  It would probably be a
> bit
>     > more
>     > >    helpful if you provided a model or general description that is
>     > (somewhat)
>     > >    representative of your data.
>     > >
>     > >    --Aaron
>     > >
>     > >    [1]
>     > >
>     >
> https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
>     > >
>     > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <
> adina@usna.edu>
>     > wrote:
>     > >>
>     > >> Hi Eric,
>     > >>
>     > >> If you want to query by the Accumulo timestamp, something like
>     > >> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I
> did not
>     > try
>     > >> it lately, but timeRange() was in Rya originally. Not sure if it
> was
>     > >> removed in later iterations or whether it would be useful for
> your use
>     > >> case. First Rya paper
>     > >> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
>     > discusses
>     > >> time ranges (Section 5.3 at the link above)
>     > >>
>     > >> Adina
>     > >>
>     > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pujav65@gmail.com
> >
>     > wrote:
>     > >>>
>     > >>> Hey John,
>     > >>> I'm pretty sure your pull request was merged-- it was pulled in
> through
>     > >>> another pull request.  If not, sorry-- I thought it had been
> merged and
>     > >>> then just not closed.  I was going to spend some time doing
> merges
>     > >> tomorrow
>     > >>> so I can get it tomorrow.
>     > >>>
>     > >>> Sent from my iPhone
>     > >>>
>     > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <jo...@gmail.com>
> wrote:
>     > >>>>
>     > >>>> I have a pull request that fixes that problem.. it has been
> stuck in
>     > >>> limbo
>     > >>>> for months..
> https://github.com/apache/incubator-rya-site/pull/1  Can
>     > >>>> someone merge it into master?
>     > >>>>
>     > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <
> Eric.Liu@capitalone.com>
>     > >>> wrote:
>     > >>>>>
>     > >>>>> Cool, thanks for the help.
>     > >>>>> By the way, the link to the Rya Manual is outdated on the rya.apache.org
>     > >>>>> site. It should point at
>     > >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
>     > >>>>>
>     > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <
> aaron.mihalik@gmail.com>
>     > >>> wrote:
>     > >>>>>
>     > >>>>>   deep vs wide:
>     > >>>>>
>     > >>>>>   A property path query is probably your best bet.  Something
> like:
>     > >>>>>
>     > >>>>>   for the following data:
>     > >>>>>
>     > >>>>>   s:EventA p:causes s:EventB
>     > >>>>>   s:EventB p:causes s:EventC
>     > >>>>>   s:EventC p:causes s:EventD
>     > >>>>>
>     > >>>>>
>     > >>>>>   This query would start at EventB and work its way up and down the chain:
>     > >>>>>
>     > >>>>>   SELECT * WHERE {
>     > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
>     > >>>>>   }
>     > >>>>>
>     > >>>>>
>     > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
>     > >>> Caleb.Meier@parsons.com>
>     > >>>>>   wrote:
>     > >>>>>
>     > >>>>>> Yes, that's a good place to start.  If you have external timestamps that
>     > >>>>>> are built into your graph using the time ontology in OWL (e.g., you have
>     > >>>>>> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the
>     > >>>>>> temporal index is exactly what you want.  If you are hoping to query based
>     > >>>>>> on the internal timestamps that Accumulo assigns to your triples, then
>     > >>>>>> there are some slight tweaks that can be done to facilitate this, but it
>     > >>>>>> won't be nearly as efficient (this will require some sort of client-side
>     > >>>>>> filtering).
>     > >>>>>>
>     > >>>>>> Caleb A. Meier, Ph.D.
>     > >>>>>> Software Engineer II ♦ Analyst
>     > >>>>>> Parsons Corporation
>     > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
>     > >>>>>> Office:  (703)797-3066
>     > >>>>>> Caleb.Meier@Parsons.com ♦ www.parsons.com
>     > >>>>>>
>     > >>>>>> -----Original Message-----
>     > >>>>>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
>     > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
>     > >>>>>> To: dev@rya.incubator.apache.org
>     > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
>     > >>>>>>
>     > >>>>>> We’d like to be able to query by timestamp; specifically, we
> want to
>     > >>>>> be
>     > >>>>>> able to find all statements that were made within a given time
>     > >>>>> range. Is
>     > >>>>>> this what I should be looking at?
>     > >>>>>> https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <Ca...@parsons.com>
>     > wrote:
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Hey Eric,
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Currently timestamps can't be queried in Rya.  Do you need
> to be
>     > >>>>> able
>     > >>>>>> to query by timestamp, or simply discover the timestamp for a
> given
>     > >>>>> node?
>     > >>>>>> Rya does have a temporal index, but that requires you to use a
>     > >>>>> temporal
>     > >>>>>> ontology to model the temporal properties of your graph nodes.
>     > >>>>>>
>     > >>>>>>   ________________________________________
>     > >>>>>>
>     > >>>>>>   From: Liu, Eric <Er...@capitalone.com>
>     > >>>>>>
>     > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
>     > >>>>>>
>     > >>>>>>   To: dev@rya.incubator.apache.org
>     > >>>>>>
>     > >>>>>>   Subject: Timestamps and Cardinality in Queries
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Hi,
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Continuing from our talk earlier today I was wondering if
> you
>     > >>>>> could
>     > >>>>>> provide more information about how timestamps could be
> queried in
>     > >>>>> Rya.
>     > >>>>>>
>     > >>>>>>   Also, we are trying to support a type of query that would
>     > >>>>> essentially
>     > >>>>>> be limiting on cardinality (different from the normal SPARQL
> limit
>     > >>>>> because
>     > >>>>>> it’s for node cardinality rather than total results). I saw
> in one
>     > of
>     > >>>>>> Caleb’s talks that Rya’s query optimization involves checking
>     > >>>>> cardinality
>     > >>>>>> first. I was wondering if there would be some way to tap into
> this
>     > >>>>> feature
>     > >>>>>> for usage in queries?
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>
>     > >>>>>>   Thanks,
>     > >>>>>>
>     > >>>>>>   Eric Liu
>     > >>>>>>
>     > >>>>>
>     > >>>
>     > >>
>     > >>
>     > >>
>     > >> --
>     > >> Dr. Adina Crainiceanu
>     > >> Associate Professor, Computer Science Department
>     > >> United States Naval Academy
>     > >> 410-293-6822
>     > >> adina@usna.edu
>     > >> http://www.usna.edu/Users/cs/adina/
>     > >>
>     > >
>     > >
>     > > <log.txt>
>     >
>
>
>

Re: Timestamps and Cardinality in Queries

Posted by "Liu, Eric" <Er...@capitalone.com>.
Oh, that’s not an issue; it’s what we want to happen when traversing the data. If a node has a high cardinality, we don’t want to traverse further through its children.
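
To make that rule concrete, here is a minimal sketch in Python. It assumes a small in-memory list of triples rather than Rya itself, and the function name and the max_degree and max_results parameters are hypothetical, chosen only for illustration. A node with more than max_degree neighbors is still returned, but the walk does not continue through it.

```python
from collections import defaultdict, deque


def bounded_traversal(triples, start, max_degree, max_results):
    """Breadth-first walk that does not expand high-cardinality nodes."""
    # Index each node's neighbours, whether it appears as subject or object.
    neighbours = defaultdict(set)
    for s, _p, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)

    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_results:
        node = queue.popleft()
        # A node with too many edges is reported but not expanded.
        if len(neighbours[node]) > max_degree:
            continue
        for nxt in sorted(neighbours[node]):
            if nxt not in seen:
                seen.add(nxt)
                visited.append(nxt)
                queue.append(nxt)
                if len(visited) >= max_results:
                    break
    return visited


# Hypothetical sample data: userA owns many datasets, so it is not expanded.
triples = [
    ("dataset1", "createdBy", "userA"),
    ("dataset1", "producedBy", "job1"),
    ("job1", "reads", "dataset5"),
    ("userA", "owns", "dataset2"),
    ("userA", "owns", "dataset3"),
    ("userA", "owns", "dataset4"),
]
print(bounded_traversal(triples, "dataset1", max_degree=3, max_results=10))
```

In a real deployment the per-node counts would presumably come from a precomputed source such as the prospector table instead of being counted from the triples on every call.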

As for installation, did I clone the right repo for Rya? The one I’m using has locationtech repos for SNAPSHOT and RELEASE: https://github.com/apache/incubator-rya/blob/master/pom.xml

On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:

    Repos: The locationtech repo is up [1].  The issue is that your local .m2
    repo is in a bad state.  Maven is trying to get the apache pom from
    locationtech.  Locationtech does not host that pom, instead it's on maven
    central [2].
    
    Two ways to fix this issue (you should do (1) and that'll fix it... (2) is
    just another option for reference).
    
    1. Delete your apache pom directory from your local maven repo (e.g. rm -rf
    ~/.m2/repository/org/apache/apache/)
    
    2. Tell maven to ignore remote repository metadata with the -llr flag (e.g.
    mvn clean install -llr -Pgeoindexing)
    
    Let me know if you have any other issues.
    
    deep/wide: okay, I don't understand this statement: "if the cardinality of
    a node is too high (for example, a user that owns a large number of
    datasets), the neighbors of that node will not be found."  Is this a
    property of your current datastore, or is this an issue with Rya?
    
    --Aaron
    
    [1]
    https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
    [2] http://repo1.maven.org/maven2/org/apache/apache/17/
    
    On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <pu...@gmail.com> wrote:
    
    > Hey Eric,
    > Regarding the repos-- sometimes the location tech repos go down, your best
    > bet is to wait a little bit and try again.  You can also download the
    > latest artifacts off of the apache build server.
    > Since location tech is only used for the geo profile we may want to move
    > where that repo is declared (or put it in the geo profile).
    > For your use case, you could look to use the cardinality in the prospector
    > services for individual nodes.  Though the prospector services could be run
    > once and then used to be representative (that wouldn't work for your use
    > case), you could run them regularly to keep track of counts for your use
    > case.  Are you using the count keyword or just manually counting edges?
    > The count keyword is pretty inefficient currently.  We could add that to
    > our list of priorities maybe.
    >
    > Sent from my iPhone
    >
    > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <Er...@capitalone.com> wrote:
    > >
    > > Hey Aaron,
    > >
    > > I’m currently setting up Rya to test these queries with some of our
    > data. I run into an error when I run ‘mvn clean install’, I attached the
    > logs but it seems like I can’t connect to the snapshots repo you’re using.
    > >
    > > As for “deep/wide”, it would be something like starting at a dataset,
    > then fanning out looking for relations where it is either the subject or
    > object, such as the user who created it, the job it came from, where it’s
    > stored, etc. It would recurse on these neighboring nodes until a total
    > number of results is reached. However, if the cardinality of a node is too
    > high (for example, a user that owns a large number of datasets), the
    > neighbors of that node will not be found. Really, the goal is to find the
    > most distance relevant relationships possible, and this is our current
    > naïve way of doing so.
    > >
    > > Do you want to have a short call about this? I think it’d be easier to
    > explain/answer questions over the phone. I’m free pretty much any time
    > 1pm-5pm PST tomorrow (3/1).
    > >
    > > Thanks,
    > > Eric
    > >
    > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
    > >
    > >    deep vs wide: I played around with the property paths sparql operator
    > and
    > >    put up an example here [1].  This is a slightly different query than
    > the
    > >    one I sent out before.  It would be worth it for us to look at how
    > this is
    > >    actually executed by OpenRDF.
    > >
    > >    Eric: Could you clarify by "deep vs wide"?  I think I understand your
    > >    queries, but I don't have a good intuition about those terms and how
    > >    cardinality might figure into a query.  It would probably be a bit
    > more
    > >    helpful if you provided a model or general description that is
    > (somewhat)
    > >    representative of your data.
    > >
    > >    --Aaron
    > >
    > >    [1]
    > >
    > https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
    > >
    > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <ad...@usna.edu>
    > wrote:
    > >>
    > >> Hi Eric,
    > >>
    > >> If you want to query by the Accumulo timestamp, something like
    > >> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I did not
    > try
    > >> it lately, but timeRange() was in Rya originally. Not sure if it was
    > >> removed in later iterations or whether it would be useful for your use
    > >> case. First Rya paper
    > >> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
    > discusses
    > >> time ranges (Section 5.3 at the link above)
    > >>
    > >> Adina
    > >>
    > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pu...@gmail.com>
    > wrote:
    > >>>
    > >>> Hey John,
    > >>> I'm pretty sure your pull request was merged-- it was pulled in through
    > >>> another pull request.  If not, sorry-- I thought it had been merged and
    > >>> then just not closed.  I was going to spend some time doing merges
    > >> tomorrow
    > >>> so I can get it tomorrow.
    > >>>
    > >>> Sent from my iPhone
    > >>>
    > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <jo...@gmail.com> wrote:
    > >>>>
    > >>>> I have a pull request that fixes that problem.. it has been stuck in
    > >>> limbo
    > >>>> for months.. https://github.com/apache/incubator-rya-site/pull/1  Can
    > >>>> someone merge it into master?
    > >>>>
    > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <Er...@capitalone.com>
    > >>> wrote:
    > >>>>>
    > >>>>> Cool, thanks for the help.
    > >>>>> By the way, the link to the Rya Manual is outdated on the
    > >>> rya.apache.org
    > >>>>> site. Should be pointing at https://github.com/apache/
    > >>>>> incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_
    > >> index.md
    > >>>>>
    > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aa...@gmail.com>
    > >>> wrote:
    > >>>>>
    > >>>>>   deep vs wide:
    > >>>>>
    > >>>>>   A property path query is probably your best bet.  Something like:
    > >>>>>
    > >>>>>   for the following data:
    > >>>>>
    > >>>>>   s:EventA p:causes s:EventB
    > >>>>>   s:EventB p:causes s:EventC
    > >>>>>   s:EventC p:causes s:EventD
    > >>>>>
    > >>>>>
    > >>>>>   This query would start at EventB and work it's way up and down the
    > >>>>> chain:
    > >>>>>
    > >>>>>   SELECT * WHERE {
    > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
    > >>>>>   }
    > >>>>>
    > >>>>>
    > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
    > >>> Caleb.Meier@parsons.com>
    > >>>>>   wrote:
    > >>>>>
    > >>>>>> Yes, that's a good place to start.  If you have external timestamps
    > >>>>> that
    > >>>>>> are built into your graph using the time ontology in owl (e.g you
    > >>>>> have
    > >>>>>> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)),
    > >>>>> the
    > >>>>>> temporal index is exactly what you want.  If you are hoping to query
    > >>>>> based
    > >>>>>> on the internal timestamps that Accumulo assigns to your triples,
    > >>>>> then
    > >>>>>> there are some slight tweaks that can be done to facilitate this,
    > >>>>> but it
    > >>>>>> won't be nearly as efficient (this will require some sort of client
    > >>>>> side
    > >>>>>> filtering).
    > >>>>>>
    > >>>>>> Caleb A. Meier, Ph.D.
    > >>>>>> Software Engineer II ♦ Analyst
    > >>>>>> Parsons Corporation
    > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
    > >>>>>> Office:  (703)797-3066 <(703)%20797-3066> <(703)%20797-3066>
    > <(703)%20797-3066>
    > >>>>>> Caleb.Meier@Parsons.com ♦ www.parsons.com
    > >>>>>>
    > >>>>>> -----Original Message-----
    > >>>>>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
    > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
    > >>>>>> To: dev@rya.incubator.apache.org
    > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
    > >>>>>>
    > >>>>>> We’d like to be able to query by timestamp; specifically, we want to
    > >>>>> be
    > >>>>>> able to find all statements that were made within a given time
    > >>>>> range. Is
    > >>>>>> this what I should be looking at?
    > >>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
    > >>>>> apache.org_confluence_download_attachments_63407907_
    > >>>>> Rya-2520Temporal-2520Indexing.pdf-3Fversion-3D1-26modificationDate-
    > >>>>> 3D1464789502000-26api-3Dv2&d=CwIGaQ&c=Nwf-pp4xtYRe0sCRVM8_
    > >>>>> LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHC
    > >>> geo_4WXTD0qo8&m=
    > >>>>> BBheKpKX7A1Ijs8q_TDEUVtdfu-r015XHZjmcw6veAw&s=vLayAkLG0IKGE-
    > >>>>> 0NbwRQKfpcfId05fXE5TX8oMJaa7Q&e=
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <Ca...@parsons.com>
    > wrote:
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>   Hey Eric,
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>   Currently timestamps can't be queried in Rya.  Do you need to be
    > >>>>> able
    > >>>>>> to query by timestamp, or simply discover the timestamp for a given
    > >>>>> node?
    > >>>>>> Rya does have a temporal index, but that requires you to use a
    > >>>>> temporal
    > >>>>>> ontology to model the temporal properties of your graph nodes.
    > >>>>>>
    > >>>>>>   ________________________________________
    > >>>>>>
    > >>>>>>   From: Liu, Eric <Er...@capitalone.com>
    > >>>>>>
    > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
    > >>>>>>
    > >>>>>>   To: dev@rya.incubator.apache.org
    > >>>>>>
    > >>>>>>   Subject: Timestamps and Cardinality in Queries
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>   Hi,
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>   Continuing from our talk earlier today I was wondering if you
    > >>>>> could
    > >>>>>> provide more information about how timestamps could be queried in
    > >>>>> Rya.
    > >>>>>>
    > >>>>>>   Also, we are trying to support a type of query that would
    > >>>>> essentially
    > >>>>>> be limiting on cardinality (different from the normal SPARQL limit
    > >>>>> because
    > >>>>>> it’s for node cardinality rather than total results). I saw in one
    > of
    > >>>>>> Caleb’s talks that Rya’s query optimization involves checking
    > >>>>> cardinality
    > >>>>>> first. I was wondering if there would be some way to tap into this
    > >>>>> feature
    > >>>>>> for usage in queries?
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>   Thanks,
    > >>>>>>
    > >>>>>>   Eric Liu
    > >>>>>>
    > >>>>>>   ________________________________________________________
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>   The information contained in this e-mail is confidential and/or
    > >>>>>> proprietary to Capital One and/or its affiliates and may only be
    > used
    > >>>>>> solely in performance of work or services for Capital One. The
    > >>>>> information
    > >>>>>> transmitted herewith is intended only for use by the individual or
    > >>>>> entity
    > >>>>>> to which it is addressed. If the reader of this message is not the
    > >>>>> intended
    > >>>>>> recipient, you are hereby notified that any review, retransmission,
    > >>>>>> dissemination, distribution, copying or other use of, or taking of
    > >>>>> any
    > >>>>>> action in reliance upon this information is strictly prohibited. If
    > >>>>> you
    > >>>>>> have received this communication in error, please contact the sender
    > >>>>> and
    > >>>>>> delete the material from your computer.
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>> ________________________________________________________
    > >>>>>>
    > >>>>>>
    > >>>>>>
    > >>>>>> The information contained in this e-mail is confidential and/or
    > >>>>>> proprietary to Capital One and/or its affiliates and may only be
    > used
    > >>>>>> solely in performance of work or services for Capital One. The
    > >>>>> information
    > >>>>>> transmitted herewith is intended only for use by the individual or
    > >>>>> entity
    > >>>>>> to which it is addressed. If the reader of this message is not the
    > >>>>> intended
    > >>>>>> recipient, you are hereby notified that any review, retransmission,
    > >>>>>> dissemination, distribution, copying or other use of, or taking of
    > >>>>> any
    > >>>>>> action in reliance upon this information is strictly prohibited. If
    > >>>>> you
    > >>>>>> have received this communication in error, please contact the sender
    > >>>>> and
    > >>>>>> delete the material from your computer.
    > >>>>>>
    > >>>>>
    > >>>>>
    > >>>>> ________________________________________________________
    > >>>>>
    > >>>>> The information contained in this e-mail is confidential and/or
    > >>>>> proprietary to Capital One and/or its affiliates and may only be used
    > >>>>> solely in performance of work or services for Capital One. The
    > >>> information
    > >>>>> transmitted herewith is intended only for use by the individual or
    > >>> entity
    > >>>>> to which it is addressed. If the reader of this message is not the
    > >>> intended
    > >>>>> recipient, you are hereby notified that any review, retransmission,
    > >>>>> dissemination, distribution, copying or other use of, or taking of
    > any
    > >>>>> action in reliance upon this information is strictly prohibited. If
    > >> you
    > >>>>> have received this communication in error, please contact the sender
    > >> and
    > >>>>> delete the material from your computer.
    > >>>>>
    > >>>
    > >>
    > >>
    > >>
    > >> --
    > >> Dr. Adina Crainiceanu
    > >> Associate Professor, Computer Science Department
    > >> United States Naval Academy
    > >> 410-293-6822 <(410)%20293-6822> <(410)%20293-6822>
    > >> adina@usna.edu
    > >> http://www.usna.edu/Users/cs/adina/
    > >>
    > >
    > >
    > > ________________________________________________________
    > >
    > > The information contained in this e-mail is confidential and/or
    > proprietary to Capital One and/or its affiliates and may only be used
    > solely in performance of work or services for Capital One. The information
    > transmitted herewith is intended only for use by the individual or entity
    > to which it is addressed. If the reader of this message is not the intended
    > recipient, you are hereby notified that any review, retransmission,
    > dissemination, distribution, copying or other use of, or taking of any
    > action in reliance upon this information is strictly prohibited. If you
    > have received this communication in error, please contact the sender and
    > delete the material from your computer.
    > > <log.txt>
    >
    


Re: Timestamps and Cardinality in Queries

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
Repos: The locationtech repo is up [1].  The issue is that your local .m2
repo is in a bad state: Maven is trying to fetch the Apache parent POM from
locationtech, which does not host it.  That POM lives on Maven Central [2].

Two ways to fix this issue (you should do (1) and that'll fix it... (2) is
just another option for reference).

1. Delete your apache pom directory from your local maven repo (e.g. rm -rf
~/.m2/repository/org/apache/apache/)

2. Tell Maven to ignore the _remote.repositories marker files in your local
repo with the -llr (--legacy-local-repository) flag (e.g.
mvn clean install -llr -Pgeoindexing)

Let me know if you have any other issues.

deep/wide: okay, I don't understand this statement: "if the cardinality of
a node is too high (for example, a user that owns a large number of
datasets), the neighbors of that node will not be found."  Is this a
property of your current datastore, or is this an issue with Rya?
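
For reference, here is one way the quoted statement could be read: a
breadth-first expansion that stops at any node with too many edges, so that
node's neighbors never show up in the results.  The Python below is only a
sketch of that reading.  The function name capped_expand, the max_degree and
max_results parameters, and the sample triples are all made up for
illustration; none of this is Rya code or a description of Eric's datastore.

```python
from collections import deque


def capped_expand(edges, start, max_degree, max_results):
    """Breadth-first expansion that does not fan out from high-cardinality nodes.

    Hypothetical helper: edges is an iterable of (subject, predicate, object).
    """
    # Index every triple in both directions so a node is followed whether it
    # appears as the subject or as the object.
    neighbors = {}
    for s, p, o in edges:
        neighbors.setdefault(s, []).append((p, o))
        neighbors.setdefault(o, []).append((p, s))

    seen = {start}
    results = []
    queue = deque([start])
    while queue and len(results) < max_results:
        node = queue.popleft()
        adjacent = neighbors.get(node, [])
        if len(adjacent) > max_degree:
            # Too many edges: the node itself was found, but it is not expanded.
            continue
        for predicate, other in adjacent:
            if other in seen:
                continue
            seen.add(other)
            results.append((node, predicate, other))
            queue.append(other)
            if len(results) >= max_results:
                break
    return results


# Made-up sample data: alice owns four datasets, dataset1 also came from job7.
edges = [
    ("dataset1", "createdBy", "alice"),
    ("dataset1", "producedBy", "job7"),
    ("dataset2", "createdBy", "alice"),
    ("dataset3", "createdBy", "alice"),
    ("dataset4", "createdBy", "alice"),
]
found = capped_expand(edges, "dataset1", max_degree=3, max_results=10)
found_uncapped = capped_expand(edges, "dataset1", max_degree=10, max_results=10)
```

With max_degree=3 the sample call returns only the two edges out of dataset1:
alice is reached, but because she has four edges, dataset2 through dataset4
are never visited.  Raising the cap to 10 returns all five edges.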

--Aaron

[1]
https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
[2] http://repo1.maven.org/maven2/org/apache/apache/17/

On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <pu...@gmail.com> wrote:

> Hey Eric,
> Regarding the repos-- sometimes the location tech repos go down, your best
> bet is to wait a little bit and try again.  You can also download the
> latest artifacts off of the apache build server.
> Since location tech is only used for the geo profile we may want to move
> where that repo is declared (or put it in the geo profile).
> For your use case, you could look to use the cardinality in the prospector
> services for individual nodes.  Though the prospector services could be run
> once and then used to be representative (that wouldn't work for your use
> case), you could run them regularly to keep track of counts for your use
> case.  Are you using the count keyword or just manually counting edges?
> The count keyword is pretty inefficient currently.  We could add that to
> our list of priorities maybe.
>
> Sent from my iPhone
>
> > On Mar 1, 2017, at 3:00 AM, Liu, Eric <Er...@capitalone.com> wrote:
> >
> > Hey Aaron,
> >
> > I’m currently setting up Rya to test these queries with some of our
> > data. I run into an error when I run ‘mvn clean install’, I attached the
> > logs but it seems like I can’t connect to the snapshots repo you’re using.
> >
> > As for “deep/wide”, it would be something like starting at a dataset,
> > then fanning out looking for relations where it is either the subject or
> > object, such as the user who created it, the job it came from, where it’s
> > stored, etc. It would recurse on these neighboring nodes until a total
> > number of results is reached. However, if the cardinality of a node is too
> > high (for example, a user that owns a large number of datasets), the
> > neighbors of that node will not be found. Really, the goal is to find the
> > most distance relevant relationships possible, and this is our current
> > naïve way of doing so.
> >
> > Do you want to have a short call about this? I think it’d be easier to
> > explain/answer questions over the phone. I’m free pretty much any time
> > 1pm-5pm PST tomorrow (3/1).
> >
> > Thanks,
> > Eric
> >
> > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
> >
> >    deep vs wide: I played around with the property paths sparql operator and
> >    put up an example here [1].  This is a slightly different query than the
> >    one I sent out before.  It would be worth it for us to look at how this is
> >    actually executed by OpenRDF.
> >
> >    Eric: Could you clarify by "deep vs wide"?  I think I understand your
> >    queries, but I don't have a good intuition about those terms and how
> >    cardinality might figure into a query.  It would probably be a bit more
> >    helpful if you provided a model or general description that is (somewhat)
> >    representative of your data.
> >
> >    --Aaron
> >
> >    [1]
> >    https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
> >
> >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <ad...@usna.edu> wrote:
> >>
> >> Hi Eric,
> >>
> >> If you want to query by the Accumulo timestamp, something like
> >> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I did not try
> >> it lately, but timeRange() was in Rya originally. Not sure if it was
> >> removed in later iterations or whether it would be useful for your use
> >> case. First Rya paper
> >> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf discusses
> >> time ranges (Section 5.3 at the link above)
> >>
> >> Adina
> >>
> >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pu...@gmail.com> wrote:
> >>>
> >>> Hey John,
> >>> I'm pretty sure your pull request was merged-- it was pulled in through
> >>> another pull request.  If not, sorry-- I thought it had been merged and
> >>> then just not closed.  I was going to spend some time doing merges tomorrow
> >>> so I can get it tomorrow.
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On Feb 23, 2017, at 8:13 PM, John Smith <jo...@gmail.com> wrote:
> >>>>
> >>>> I have a pull request that fixes that problem.. it has been stuck in limbo
> >>>> for months.. https://github.com/apache/incubator-rya-site/pull/1  Can
> >>>> someone merge it into master?
> >>>>
> >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <Er...@capitalone.com> wrote:
> >>>>>
> >>>>> Cool, thanks for the help.
> >>>>> By the way, the link to the Rya Manual is outdated on the rya.apache.org
> >>>>> site. Should be pointing at
> >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
> >>>>>
> >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
> >>>>>
> >>>>>   deep vs wide:
> >>>>>
> >>>>>   A property path query is probably your best bet.  Something like:
> >>>>>
> >>>>>   for the following data:
> >>>>>
> >>>>>   s:EventA p:causes s:EventB
> >>>>>   s:EventB p:causes s:EventC
> >>>>>   s:EventC p:causes s:EventD
> >>>>>
> >>>>>
> >>>>>   This query would start at EventB and work its way up and down the
> >>>>>   chain:
> >>>>>
> >>>>>   SELECT * WHERE {
> >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
> >>>>>   }
> >>>>>
> >>>>>
> >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <Caleb.Meier@parsons.com> wrote:
> >>>>>
> >>>>>> Yes, that's a good place to start.  If you have external timestamps that
> >>>>>> are built into your graph using the time ontology in owl (e.g you have
> >>>>>> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the
> >>>>>> temporal index is exactly what you want.  If you are hoping to query based
> >>>>>> on the internal timestamps that Accumulo assigns to your triples, then
> >>>>>> there are some slight tweaks that can be done to facilitate this, but it
> >>>>>> won't be nearly as efficient (this will require some sort of client side
> >>>>>> filtering).
> >>>>>>
> >>>>>> Caleb A. Meier, Ph.D.
> >>>>>> Software Engineer II ♦ Analyst
> >>>>>> Parsons Corporation
> >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
> >>>>>> Office:  (703)797-3066
> >>>>>> Caleb.Meier@Parsons.com ♦ www.parsons.com
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
> >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
> >>>>>> To: dev@rya.incubator.apache.org
> >>>>>> Subject: Re: Timestamps and Cardinality in Queries
> >>>>>>
> >>>>>> We’d like to be able to query by timestamp; specifically, we want to be
> >>>>>> able to find all statements that were made within a given time range. Is
> >>>>>> this what I should be looking at?
> >>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_download_attachments_63407907_Rya-2520Temporal-2520Indexing.pdf-3Fversion-3D1-26modificationDate-3D1464789502000-26api-3Dv2&d=CwIGaQ&c=Nwf-pp4xtYRe0sCRVM8_LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHCgeo_4WXTD0qo8&m=BBheKpKX7A1Ijs8q_TDEUVtdfu-r015XHZjmcw6veAw&s=vLayAkLG0IKGE-0NbwRQKfpcfId05fXE5TX8oMJaa7Q&e=
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <Ca...@parsons.com> wrote:
> >>>>>>
> >>>>>>   Hey Eric,
> >>>>>>
> >>>>>>   Currently timestamps can't be queried in Rya.  Do you need to be able
> >>>>>>   to query by timestamp, or simply discover the timestamp for a given node?
> >>>>>>   Rya does have a temporal index, but that requires you to use a temporal
> >>>>>>   ontology to model the temporal properties of your graph nodes.
> >>>>>>
> >>>>>>   ________________________________________
> >>>>>>
> >>>>>>   From: Liu, Eric <Er...@capitalone.com>
> >>>>>>
> >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
> >>>>>>
> >>>>>>   To: dev@rya.incubator.apache.org
> >>>>>>
> >>>>>>   Subject: Timestamps and Cardinality in Queries
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>   Hi,
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>   Continuing from our talk earlier today I was wondering if you could
> >>>>>>   provide more information about how timestamps could be queried in Rya.
> >>>>>>
> >>>>>>   Also, we are trying to support a type of query that would essentially
> >>>>>>   be limiting on cardinality (different from the normal SPARQL limit because
> >>>>>>   it’s for node cardinality rather than total results). I saw in one of
> >>>>>>   Caleb’s talks that Rya’s query optimization involves checking cardinality
> >>>>>>   first. I was wondering if there would be some way to tap into this feature
> >>>>>>   for usage in queries?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>   Thanks,
> >>>>>>
> >>>>>>   Eric Liu
> >>>>>>
> >>>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Dr. Adina Crainiceanu
> >> Associate Professor, Computer Science Department
> >> United States Naval Academy
> >> 410-293-6822
> >> adina@usna.edu
> >> http://www.usna.edu/Users/cs/adina/
> >>
> >
> >
> > <log.txt>
>

Re: Timestamps and Cardinality in Queries

Posted by Puja Valiyil <pu...@gmail.com>.
Hey Eric,
Regarding the repos: the locationtech repos sometimes go down, so your best bet is to wait a little bit and try again.  You can also download the latest artifacts off of the Apache build server.
Since locationtech is only used for the geo profile we may want to move where that repo is declared (or put it in the geo profile).
For your use case, you could look to use the cardinality in the prospector services for individual nodes.  Although the prospector services can be run once and their counts then treated as representative, that wouldn't work for your use case; you could instead run them regularly to keep the counts current.  Are you using the count keyword or just manually counting edges?  The count keyword is pretty inefficient currently.  We could add that to our list of priorities maybe.
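
To make the "keep track of counts" idea concrete, here is a rough Python sketch of a per-node count table that a periodic job could rebuild and a query layer could consult before expanding a node.  It is only an illustration of the idea: node_cardinalities, is_expandable, max_cardinality and the sample triples are hypothetical names and data, and this is not the prospector services API.

```python
from collections import Counter


def node_cardinalities(triples):
    """Count how many triples each node appears in, as subject or as object.

    Hypothetical stand-in for counts that a periodic job would produce.
    """
    counts = Counter()
    for subject, _predicate, obj in triples:
        counts[subject] += 1
        if obj != subject:
            # Avoid double-counting a triple whose subject and object coincide.
            counts[obj] += 1
    return counts


def is_expandable(counts, node, max_cardinality):
    """Return True if the node's recorded count is within the cap."""
    # Counter returns 0 for nodes it has never seen.
    return counts[node] <= max_cardinality


# Made-up sample data, rebuilt on each run of the (assumed) periodic job.
triples = [
    ("dataset1", "createdBy", "alice"),
    ("dataset2", "createdBy", "alice"),
    ("dataset3", "createdBy", "alice"),
    ("dataset1", "producedBy", "job7"),
]
counts = node_cardinalities(triples)
```

Here counts["alice"] is 3 and counts["dataset1"] is 2, so with max_cardinality=2 a traversal would expand dataset1 but not alice.  How stale the table is allowed to get depends on how often the counting job runs.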

Sent from my iPhone

> On Mar 1, 2017, at 3:00 AM, Liu, Eric <Er...@capitalone.com> wrote:
> 
> Hey Aaron,
> 
> I’m currently setting up Rya to test these queries with some of our data. I run into an error when I run ‘mvn clean install’, I attached the logs but it seems like I can’t connect to the snapshots repo you’re using.
> 
> As for “deep/wide”, it would be something like starting at a dataset, then fanning out looking for relations where it is either the subject or object, such as the user who created it, the job it came from, where it’s stored, etc. It would recurse on these neighboring nodes until a total number of results is reached. However, if the cardinality of a node is too high (for example, a user that owns a large number of datasets), the neighbors of that node will not be found. Really, the goal is to find the most distance relevant relationships possible, and this is our current naïve way of doing so.
> 
> Do you want to have a short call about this? I think it’d be easier to explain/answer questions over the phone. I’m free pretty much any time 1pm-5pm PST tomorrow (3/1).
> 
> Thanks,
> Eric
> 
> On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
> 
>    deep vs wide: I played around with the property paths sparql operator and
>    put up an example here [1].  This is a slightly different query than the
>    one I sent out before.  It would be worth it for us to look at how this is
>    actually executed by OpenRDF.
> 
>    Eric: Could you clarify by "deep vs wide"?  I think I understand your
>    queries, but I don't have a good intuition about those terms and how
>    cardinality might figure into a query.  It would probably be a bit more
>    helpful if you provided a model or general description that is (somewhat)
>    representative of your data.
> 
>    --Aaron
> 
>    [1]
>    https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
> 
>>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <ad...@usna.edu> wrote:
>> 
>> Hi Eric,
>> 
>> If you want to query by the Accumulo timestamp, something like
>> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I did not try
>> it lately, but timeRange() was in Rya originally. Not sure if it was
>> removed in later iterations or whether it would be useful for your use
>> case. First Rya paper
>> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf discusses
>> time ranges (Section 5.3 at the link above)
>> 
>> Adina
>> 
>>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pu...@gmail.com> wrote:
>>> 
>>> Hey John,
>>> I'm pretty sure your pull request was merged-- it was pulled in through
>>> another pull request.  If not, sorry-- I thought it had been merged and
>>> then just not closed.  I was going to spend some time doing merges tomorrow
>>> so I can get it tomorrow.
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Feb 23, 2017, at 8:13 PM, John Smith <jo...@gmail.com> wrote:
>>>> 
>>>> I have a pull request that fixes that problem.. it has been stuck in limbo
>>>> for months.. https://github.com/apache/incubator-rya-site/pull/1  Can
>>>> someone merge it into master?
>>>> 
>>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <Er...@capitalone.com> wrote:
>>>>> 
>>>>> Cool, thanks for the help.
>>>>> By the way, the link to the Rya Manual is outdated on the rya.apache.org
>>>>> site. Should be pointing at
>>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
>>>>> 
>>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:
>>>>> 
>>>>>   deep vs wide:
>>>>> 
>>>>>   A property path query is probably your best bet.  Something like:
>>>>> 
>>>>>   for the following data:
>>>>> 
>>>>>   s:EventA p:causes s:EventB
>>>>>   s:EventB p:causes s:EventC
>>>>>   s:EventC p:causes s:EventD
>>>>> 
>>>>> 
>>>>>   This query would start at EventB and work it's way up and down the
>>>>> chain:
>>>>> 
>>>>>   SELECT * WHERE {
>>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
>>>>>   }
>>>>> 
>>>>> 
>>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
>>> Caleb.Meier@parsons.com>
>>>>>   wrote:
>>>>> 
>>>>>> Yes, that's a good place to start.  If you have external timestamps
>>>>> that
>>>>>> are built into your graph using the time ontology in owl (e.g you
>>>>> have
>>>>>> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)),
>>>>> the
>>>>>> temporal index is exactly what you want.  If you are hoping to query
>>>>> based
>>>>>> on the internal timestamps that Accumulo assigns to your triples,
>>>>> then
>>>>>> there are some slight tweaks that can be done to facilitate this,
>>>>> but it
>>>>>> won't be nearly as efficient (this will require some sort of client
>>>>> side
>>>>>> filtering).
>>>>>> 
>>>>>> Caleb A. Meier, Ph.D.
>>>>>> Software Engineer II ♦ Analyst
>>>>>> Parsons Corporation
>>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
>>>>>> Office:  (703)797-3066 <(703)%20797-3066> <(703)%20797-3066>
>>>>>> Caleb.Meier@Parsons.com ♦ www.parsons.com
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
>>>>>> Sent: Thursday, February 23, 2017 2:27 PM
>>>>>> To: dev@rya.incubator.apache.org
>>>>>> Subject: Re: Timestamps and Cardinality in Queries
>>>>>> 
>>>>>> We’d like to be able to query by timestamp; specifically, we want to
>>>>> be
>>>>>> able to find all statements that were made within a given time
>>>>> range. Is
>>>>>> this what I should be looking at?
>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
>>>>> apache.org_confluence_download_attachments_63407907_
>>>>> Rya-2520Temporal-2520Indexing.pdf-3Fversion-3D1-26modificationDate-
>>>>> 3D1464789502000-26api-3Dv2&d=CwIGaQ&c=Nwf-pp4xtYRe0sCRVM8_
>>>>> LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHC
>>> geo_4WXTD0qo8&m=
>>>>> BBheKpKX7A1Ijs8q_TDEUVtdfu-r015XHZjmcw6veAw&s=vLayAkLG0IKGE-
>>>>> 0NbwRQKfpcfId05fXE5TX8oMJaa7Q&e=
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <Ca...@parsons.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>   Hey Eric,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>   Currently timestamps can't be queried in Rya.  Do you need to be
>>>>> able
>>>>>> to query by timestamp, or simply discover the timestamp for a given
>>>>> node?
>>>>>> Rya does have a temporal index, but that requires you to use a
>>>>> temporal
>>>>>> ontology to model the temporal properties of your graph nodes.
>>>>>> 
>>>>>>   ________________________________________
>>>>>> 
>>>>>>   From: Liu, Eric <Er...@capitalone.com>
>>>>>> 
>>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
>>>>>> 
>>>>>>   To: dev@rya.incubator.apache.org
>>>>>> 
>>>>>>   Subject: Timestamps and Cardinality in Queries
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>   Hi,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>   Continuing from our talk earlier today I was wondering if you
>>>>> could
>>>>>> provide more information about how timestamps could be queried in
>>>>> Rya.
>>>>>> 
>>>>>>   Also, we are trying to support a type of query that would
>>>>> essentially
>>>>>> be limiting on cardinality (different from the normal SPARQL limit
>>>>> because
>>>>>> it’s for node cardinality rather than total results). I saw in one of
>>>>>> Caleb’s talks that Rya’s query optimization involves checking
>>>>> cardinality
>>>>>> first. I was wondering if there would be some way to tap into this
>>>>> feature
>>>>>> for usage in queries?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>   Thanks,
>>>>>> 
>>>>>>   Eric Liu
>>>>>> 
>>>>>>   ________________________________________________________
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>   The information contained in this e-mail is confidential and/or
>>>>>> proprietary to Capital One and/or its affiliates and may only be used
>>>>>> solely in performance of work or services for Capital One. The
>>>>> information
>>>>>> transmitted herewith is intended only for use by the individual or
>>>>> entity
>>>>>> to which it is addressed. If the reader of this message is not the
>>>>> intended
>>>>>> recipient, you are hereby notified that any review, retransmission,
>>>>>> dissemination, distribution, copying or other use of, or taking of
>>>>> any
>>>>>> action in reliance upon this information is strictly prohibited. If
>>>>> you
>>>>>> have received this communication in error, please contact the sender
>>>>> and
>>>>>> delete the material from your computer.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ________________________________________________________
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> The information contained in this e-mail is confidential and/or
>>>>>> proprietary to Capital One and/or its affiliates and may only be used
>>>>>> solely in performance of work or services for Capital One. The
>>>>> information
>>>>>> transmitted herewith is intended only for use by the individual or
>>>>> entity
>>>>>> to which it is addressed. If the reader of this message is not the
>>>>> intended
>>>>>> recipient, you are hereby notified that any review, retransmission,
>>>>>> dissemination, distribution, copying or other use of, or taking of
>>>>> any
>>>>>> action in reliance upon this information is strictly prohibited. If
>>>>> you
>>>>>> have received this communication in error, please contact the sender
>>>>> and
>>>>>> delete the material from your computer.
>>>>>> 
>>>>> 
>>>>> 
>>>>> ________________________________________________________
>>>>> 
>>>>> The information contained in this e-mail is confidential and/or
>>>>> proprietary to Capital One and/or its affiliates and may only be used
>>>>> solely in performance of work or services for Capital One. The
>>> information
>>>>> transmitted herewith is intended only for use by the individual or
>>> entity
>>>>> to which it is addressed. If the reader of this message is not the
>>> intended
>>>>> recipient, you are hereby notified that any review, retransmission,
>>>>> dissemination, distribution, copying or other use of, or taking of any
>>>>> action in reliance upon this information is strictly prohibited. If
>> you
>>>>> have received this communication in error, please contact the sender
>> and
>>>>> delete the material from your computer.
>>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Dr. Adina Crainiceanu
>> Associate Professor, Computer Science Department
>> United States Naval Academy
>> 410-293-6822 <(410)%20293-6822>
>> adina@usna.edu
>> http://www.usna.edu/Users/cs/adina/
>> 
> 
> 
> ________________________________________________________
> 
> The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
> <log.txt>

Re: Timestamps and Cardinality in Queries

Posted by "Liu, Eric" <Er...@capitalone.com>.
Hey Aaron,

I’m currently setting up Rya to test these queries with some of our data. I run into an error when I run ‘mvn clean install’. I attached the logs, but it seems like I can’t connect to the snapshots repo you’re using.

As for “deep/wide”, it would be something like starting at a dataset, then fanning out looking for relations where it is either the subject or object, such as the user who created it, the job it came from, where it’s stored, etc. It would recurse on these neighboring nodes until a total number of results is reached. However, if the cardinality of a node is too high (for example, a user that owns a large number of datasets), the neighbors of that node will not be explored. Really, the goal is to find the most distant relevant relationships possible, and this is our current naïve way of doing so.
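
To make the traversal above concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not Rya or OpenRDF code: the function name expand, the parameters max_results and max_degree, and the sample identifiers (ds:1, user:alice, and so on) are all hypothetical. It assumes "cardinality" means the number of triples in which a node appears as subject or object.

```python
from collections import deque


def expand(triples, start, max_results, max_degree):
    """Breadth-first fan-out from `start` over (s, p, o) triples.

    A node is expanded only if it appears in at most `max_degree` triples;
    the walk stops once `max_results` triples have been collected.
    """
    # Index every triple under both its subject and its object.
    touching = {}
    for t in triples:
        s, _, o = t
        touching.setdefault(s, []).append(t)
        if o != s:
            touching.setdefault(o, []).append(t)

    results, seen = [], set()
    visited, queue = {start}, deque([start])
    while queue and len(results) < max_results:
        node = queue.popleft()
        edges = touching.get(node, [])
        if len(edges) > max_degree:
            continue  # high-cardinality node: do not fan out from it
        for t in edges:
            if t in seen:
                continue
            seen.add(t)
            results.append(t)
            if len(results) >= max_results:
                break
            s, _, o = t
            neighbor = o if s == node else s
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return results


# Hypothetical sample data: user:alice owns many datasets.
triples = [
    ("ds:1", "p:createdBy", "user:alice"),
    ("ds:1", "p:producedBy", "job:7"),
    ("ds:1", "p:storedIn", "loc:s3"),
    ("ds:2", "p:createdBy", "user:alice"),
    ("ds:3", "p:createdBy", "user:alice"),
    ("ds:4", "p:createdBy", "user:alice"),
    ("job:7", "p:runBy", "user:bob"),
]

for triple in expand(triples, "ds:1", max_results=10, max_degree=3):
    print(triple)
```

With max_degree=3, user:alice (four triples) is reported as a neighbor of ds:1 but is not expanded, so ds:2 through ds:4 never show up, while the walk still continues through job:7 to user:bob.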

Do you want to have a short call about this? I think it’d be easier to explain/answer questions over the phone. I’m free pretty much any time 1pm-5pm PST tomorrow (3/1).

Thanks,
Eric

On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:

    deep vs wide: I played around with the property paths SPARQL operator and
    put up an example here [1].  This is a slightly different query than the
    one I sent out before.  It would be worth it for us to look at how this is
    actually executed by OpenRDF.
    
    Eric: Could you clarify what you mean by "deep vs wide"?  I think I understand your
    queries, but I don't have a good intuition about those terms and how
    cardinality might figure into a query.  It would probably be a bit more
    helpful if you provided a model or general description that is (somewhat)
    representative of your data.
    
    --Aaron
    
    [1]
    https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
    
    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <ad...@usna.edu> wrote:
    
    > Hi Eric,
    >
    > If you want to query by the Accumulo timestamp, something like
    > timeRange(?ts, 13141201490, 13249201490) should work in Rya. I have not tried
    > it lately, but timeRange() was in Rya originally. I'm not sure whether it was
    > removed in later iterations or whether it would be useful for your use
    > case. The first Rya paper,
    > https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf, discusses
    > time ranges (Section 5.3).
    >
    > Adina
    >
    > On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pu...@gmail.com> wrote:
    >
    > > Hey John,
    > > I'm pretty sure your pull request was merged-- it was pulled in through
    > > another pull request.  If not, sorry-- I thought it had been merged and
    > > then just not closed.  I was going to spend some time doing merges
    > tomorrow
    > > so I can get it tomorrow.
    > >
    > > Sent from my iPhone
    > >
    > > > On Feb 23, 2017, at 8:13 PM, John Smith <jo...@gmail.com> wrote:
    > > >
    > > > I have a pull request that fixes that problem.. it has been stuck in
    > > limbo
    > > > for months.. https://github.com/apache/incubator-rya-site/pull/1  Can
    > > > someone merge it into master?
    > > >
    > > >> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <Er...@capitalone.com>
    > > wrote:
    > > >>
    > > >> Cool, thanks for the help.
    > > >> By the way, the link to the Rya Manual is outdated on the rya.apache.org
    > > >> site. It should be pointing at
    > > >> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
    > > >>
    > > >> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aa...@gmail.com>
    > > wrote:
    > > >>
    > > >>    deep vs wide:
    > > >>
    > > >>    A property path query is probably your best bet.  Something like:
    > > >>
    > > >>    for the following data:
    > > >>
    > > >>    s:EventA p:causes s:EventB
    > > >>    s:EventB p:causes s:EventC
    > > >>    s:EventC p:causes s:EventD
    > > >>
    > > >>
    > > >>    This query would start at EventB and work its way up and down the chain:
    > > >>
    > > >>    SELECT * WHERE {
    > > >>       <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
    > > >>    }
    > > >>
    > > >>
    > > >>    On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
    > > Caleb.Meier@parsons.com>
    > > >>    wrote:
    > > >>
    > > >>> Yes, that's a good place to start.  If you have external timestamps that
    > > >>> are built into your graph using the time ontology in OWL (e.g., you have
    > > >>> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the
    > > >>> temporal index is exactly what you want.  If you are hoping to query based
    > > >>> on the internal timestamps that Accumulo assigns to your triples, then
    > > >>> there are some slight tweaks that can be done to facilitate this, but it
    > > >>> won't be nearly as efficient (this will require some sort of client-side
    > > >>> filtering).
    > > >>>
    > > >>> Caleb A. Meier, Ph.D.
    > > >>> Software Engineer II ♦ Analyst
    > > >>> Parsons Corporation
    > > >>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
    > > >>> Office:  (703)797-3066
    > > >>> Caleb.Meier@Parsons.com ♦ www.parsons.com
    > > >>>
    > > >>> -----Original Message-----
    > > >>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
    > > >>> Sent: Thursday, February 23, 2017 2:27 PM
    > > >>> To: dev@rya.incubator.apache.org
    > > >>> Subject: Re: Timestamps and Cardinality in Queries
    > > >>>
    > > >>> We’d like to be able to query by timestamp; specifically, we want to be
    > > >>> able to find all statements that were made within a given time range. Is
    > > >>> this what I should be looking at?
    > > >>> https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
    > > >>>
    > > >>>
    > > >>>
    > > >>> On 2/22/17, 6:21 PM, "Meier, Caleb" <Ca...@parsons.com> wrote:
    > > >>>
    > > >>>
    > > >>>
    > > >>>    Hey Eric,
    > > >>>
    > > >>>
    > > >>>
    > > >>>    Currently timestamps can't be queried in Rya.  Do you need to be
    > > >> able
    > > >>> to query by timestamp, or simply discover the timestamp for a given
    > > >> node?
    > > >>> Rya does have a temporal index, but that requires you to use a
    > > >> temporal
    > > >>> ontology to model the temporal properties of your graph nodes.
    > > >>>
    > > >>>    ________________________________________
    > > >>>
    > > >>>    From: Liu, Eric <Er...@capitalone.com>
    > > >>>
    > > >>>    Sent: Wednesday, February 22, 2017 6:38 PM
    > > >>>
    > > >>>    To: dev@rya.incubator.apache.org
    > > >>>
    > > >>>    Subject: Timestamps and Cardinality in Queries
    > > >>>
    > > >>>
    > > >>>
    > > >>>    Hi,
    > > >>>
    > > >>>
    > > >>>
    > > >>>    Continuing from our talk earlier today I was wondering if you
    > > >> could
    > > >>> provide more information about how timestamps could be queried in
    > > >> Rya.
    > > >>>
    > > >>>    Also, we are trying to support a type of query that would
    > > >> essentially
    > > >>> be limiting on cardinality (different from the normal SPARQL limit
    > > >> because
    > > >>> it’s for node cardinality rather than total results). I saw in one of
    > > >>> Caleb’s talks that Rya’s query optimization involves checking
    > > >> cardinality
    > > >>> first. I was wondering if there would be some way to tap into this
    > > >> feature
    > > >>> for usage in queries?
    > > >>>
    > > >>>
    > > >>>
    > > >>>    Thanks,
    > > >>>
    > > >>>    Eric Liu
    > > >>>
    > >
    >
    >
    >
    > --
    > Dr. Adina Crainiceanu
    > Associate Professor, Computer Science Department
    > United States Naval Academy
    > 410-293-6822
    > adina@usna.edu
    > http://www.usna.edu/Users/cs/adina/
    >
    


Re: Timestamps and Cardinality in Queries

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
deep vs wide: I played around with the property paths SPARQL operator and
put up an example here [1].  This is a slightly different query than the
one I sent out before.  It would be worth it for us to look at how this is
actually executed by OpenRDF.
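
For readers who want to see what the property path from earlier in the thread, (<p:causes>|^<p:causes>)*, is expected to match, here is a small plain-Python sketch. It only emulates the semantics on the three-triple EventA..EventD example; it is not how OpenRDF or Rya evaluates the query, and the function names reachable and path_query are made up for this illustration.

```python
def reachable(triples, start, predicate):
    """Nodes linked to `start` by zero or more `predicate` hops, in either direction."""
    reached, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if p != predicate:
                continue
            # Follow the edge forward (s -> o) or backward (o -> s).
            if s == node:
                nxt = o
            elif o == node:
                nxt = s
            else:
                continue
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached


def path_query(triples, start, predicate="p:causes"):
    """Emulates: <start> (<p>|^<p>)* ?s . ?s ?p ?o"""
    nodes = reachable(triples, start, predicate)
    return sorted(t for t in triples if t[0] in nodes)


# The sample data from the earlier message in this thread.
data = [
    ("s:EventA", "p:causes", "s:EventB"),
    ("s:EventB", "p:causes", "s:EventC"),
    ("s:EventC", "p:causes", "s:EventD"),
]

rows = path_query(data, "s:EventB")
for row in rows:
    print(row)
```

Starting at s:EventB, the closure reaches all four events, and the trailing ?s ?p ?o pattern then returns every triple whose subject is one of them. s:EventD has no outgoing triples, so three rows come back.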

Eric: Could you clarify what you mean by "deep vs wide"?  I think I understand your
queries, but I don't have a good intuition about those terms and how
cardinality might figure into a query.  It would probably be a bit more
helpful if you provided a model or general description that is (somewhat)
representative of your data.

--Aaron

[1]
https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java


Re: Timestamps and Cardinality in Queries

Posted by Adina Crainiceanu <ad...@usna.edu>.
Hi Eric,

If you want to query by the Accumulo timestamp, something like
timeRange(?ts, 13141201490, 13249201490) should work in Rya. I have not tried
it lately, but timeRange() was in Rya originally. I'm not sure whether it was
removed in later iterations or whether it would be useful for your use
case. The first Rya paper,
https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf, discusses
time ranges (Section 5.3).
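
If timeRange() turns out not to be available, the client-side filtering mentioned elsewhere in this thread could look roughly like the sketch below. This is plain Python, not a Rya or Accumulo API: the function in_time_range and the sample entries are hypothetical, the bounds are assumed to be inclusive, and the timestamps are assumed to be plain integers in whatever unit the store uses.

```python
def in_time_range(entries, start, end):
    """Keep (triple, timestamp) pairs whose timestamp lies in [start, end]."""
    return [(triple, ts) for triple, ts in entries if start <= ts <= end]


# Hypothetical statements paired with their insertion timestamps.
entries = [
    (("s:EventA", "p:causes", "s:EventB"), 13141201489),  # just before the range
    (("s:EventB", "p:causes", "s:EventC"), 13200000000),  # inside the range
    (("s:EventC", "p:causes", "s:EventD"), 13249201491),  # just after the range
]

matches = in_time_range(entries, 13141201490, 13249201490)
for triple, ts in matches:
    print(ts, triple)
```

The obvious cost is that every candidate statement has to be fetched before it can be discarded, which is why a server-side filter or an index is preferable when one exists.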

Adina

On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <pu...@gmail.com> wrote:

> Hey John,
> I'm pretty sure your pull request was merged-- it was pulled in through
> another pull request.  If not, sorry-- I thought it had been merged and
> then just not closed.  I was going to spend some time doing merges tomorrow
> so I can get it tomorrow.
>
> Sent from my iPhone
>
> > On Feb 23, 2017, at 8:13 PM, John Smith <jo...@gmail.com> wrote:
> >
> > I have a pull request that fixes that problem.. it has been stuck in
> limbo
> > for months.. https://github.com/apache/incubator-rya-site/pull/1  Can
> > someone merge it into master?
> >
> >> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <Er...@capitalone.com>
> wrote:
> >>
> >> Cool, thanks for the help.
> >> By the way, the link to the Rya Manual is outdated on the rya.apache.org
> >> site. It should be pointing at
> >> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
> >>
> >> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aa...@gmail.com>
> wrote:
> >>
> >>    deep vs wide:
> >>
> >>    A property path query is probably your best bet.  Something like:
> >>
> >>    for the following data:
> >>
> >>    s:EventA p:causes s:EventB
> >>    s:EventB p:causes s:EventC
> >>    s:EventC p:causes s:EventD
> >>
> >>
> >>    This query would start at EventB and work its way up and down the chain:
> >>
> >>    SELECT * WHERE {
> >>       <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
> >>    }
> >>
> >>
> >>    On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
> Caleb.Meier@parsons.com>
> >>    wrote:
> >>
> >>> Yes, that's a good place to start.  If you have external timestamps that
> >>> are built into your graph using the time ontology in OWL (e.g., you have
> >>> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the
> >>> temporal index is exactly what you want.  If you are hoping to query based
> >>> on the internal timestamps that Accumulo assigns to your triples, then
> >>> there are some slight tweaks that can be done to facilitate this, but it
> >>> won't be nearly as efficient (this will require some sort of client-side
> >>> filtering).
> >>>
> >>> Caleb A. Meier, Ph.D.
> >>> Software Engineer II ♦ Analyst
> >>> Parsons Corporation
> >>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
> >>> Office:  (703)797-3066 <(703)%20797-3066>
> >>> Caleb.Meier@Parsons.com ♦ www.parsons.com
> >>>
> >>> -----Original Message-----
> >>> From: Liu, Eric [mailto:Eric.Liu@capitalone.com]
> >>> Sent: Thursday, February 23, 2017 2:27 PM
> >>> To: dev@rya.incubator.apache.org
> >>> Subject: Re: Timestamps and Cardinality in Queries
> >>>
> >>> We’d like to be able to query by timestamp; specifically, we want to
> >> be
> >>> able to find all statements that were made within a given time
> >> range. Is
> >>> this what I should be looking at?
> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.
> >> apache.org_confluence_download_attachments_63407907_
> >> Rya-2520Temporal-2520Indexing.pdf-3Fversion-3D1-26modificationDate-
> >> 3D1464789502000-26api-3Dv2&d=CwIGaQ&c=Nwf-pp4xtYRe0sCRVM8_
> >> LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHC
> geo_4WXTD0qo8&m=
> >> BBheKpKX7A1Ijs8q_TDEUVtdfu-r015XHZjmcw6veAw&s=vLayAkLG0IKGE-
> >> 0NbwRQKfpcfId05fXE5TX8oMJaa7Q&e=
> >>>
> >>>
> >>>
> >>> On 2/22/17, 6:21 PM, "Meier, Caleb" <Ca...@parsons.com> wrote:
> >>>
> >>>
> >>>
> >>>    Hey Eric,
> >>>
> >>>
> >>>
> >>>    Currently timestamps can't be queried in Rya.  Do you need to be
> >> able
> >>> to query by timestamp, or simply discover the timestamp for a given
> >> node?
> >>> Rya does have a temporal index, but that requires you to use a
> >> temporal
> >>> ontology to model the temporal properties of your graph nodes.
> >>>
> >>>    ________________________________________
> >>>
> >>>    From: Liu, Eric <Er...@capitalone.com>
> >>>
> >>>    Sent: Wednesday, February 22, 2017 6:38 PM
> >>>
> >>>    To: dev@rya.incubator.apache.org
> >>>
> >>>    Subject: Timestamps and Cardinality in Queries
> >>>
> >>>
> >>>
> >>>    Hi,
> >>>
> >>>
> >>>
> >>>    Continuing from our talk earlier today I was wondering if you
> >> could
> >>> provide more information about how timestamps could be queried in
> >> Rya.
> >>>
> >>>    Also, we are trying to support a type of query that would
> >> essentially
> >>> be limiting on cardinality (different from the normal SPARQL limit
> >> because
> >>> it’s for node cardinality rather than total results). I saw in one of
> >>> Caleb’s talks that Rya’s query optimization involves checking
> >> cardinality
> >>> first. I was wondering if there would be some way to tap into this
> >> feature
> >>> for usage in queries?
> >>>
> >>>
> >>>
> >>>    Thanks,
> >>>
> >>>    Eric Liu
> >>>
> >>>    ________________________________________________________
> >>>
> >>>
> >>>
> >>>    The information contained in this e-mail is confidential and/or
> >>> proprietary to Capital One and/or its affiliates and may only be used
> >>> solely in performance of work or services for Capital One. The
> >> information
> >>> transmitted herewith is intended only for use by the individual or
> >> entity
> >>> to which it is addressed. If the reader of this message is not the
> >> intended
> >>> recipient, you are hereby notified that any review, retransmission,
> >>> dissemination, distribution, copying or other use of, or taking of
> >> any
> >>> action in reliance upon this information is strictly prohibited. If
> >> you
> >>> have received this communication in error, please contact the sender
> >> and
> >>> delete the material from your computer.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> ________________________________________________________
> >>>
> >>>
> >>>
> >>> The information contained in this e-mail is confidential and/or
> >>> proprietary to Capital One and/or its affiliates and may only be used
> >>> solely in performance of work or services for Capital One. The
> >> information
> >>> transmitted herewith is intended only for use by the individual or
> >> entity
> >>> to which it is addressed. If the reader of this message is not the
> >> intended
> >>> recipient, you are hereby notified that any review, retransmission,
> >>> dissemination, distribution, copying or other use of, or taking of
> >> any
> >>> action in reliance upon this information is strictly prohibited. If
> >> you
> >>> have received this communication in error, please contact the sender
> >> and
> >>> delete the material from your computer.
> >>>
> >>
> >>
> >> ________________________________________________________
> >>
> >> The information contained in this e-mail is confidential and/or
> >> proprietary to Capital One and/or its affiliates and may only be used
> >> solely in performance of work or services for Capital One. The
> information
> >> transmitted herewith is intended only for use by the individual or
> entity
> >> to which it is addressed. If the reader of this message is not the
> intended
> >> recipient, you are hereby notified that any review, retransmission,
> >> dissemination, distribution, copying or other use of, or taking of any
> >> action in reliance upon this information is strictly prohibited. If you
> >> have received this communication in error, please contact the sender and
> >> delete the material from your computer.
> >>
>



-- 
Dr. Adina Crainiceanu
Associate Professor, Computer Science Department
United States Naval Academy
410-293-6822
adina@usna.edu
http://www.usna.edu/Users/cs/adina/

Re: Timestamps and Cardinality in Queries

Posted by Puja Valiyil <pu...@gmail.com>.
Hey John,
I'm pretty sure your pull request was merged-- it was pulled in through another pull request.  If not, sorry-- I thought it had been merged and then just not closed.  I was going to spend some time doing merges tomorrow so I can get it tomorrow.  

Sent from my iPhone

> On Feb 23, 2017, at 8:13 PM, John Smith <jo...@gmail.com> wrote:
> 
> I have a pull request that fixes that problem.. it has been stuck in limbo
> for months.. https://github.com/apache/incubator-rya-site/pull/1  Can
> someone merge it into master?

Re: Timestamps and Cardinality in Queries

Posted by John Smith <jo...@gmail.com>.
I have a pull request that fixes that problem.. it has been stuck in limbo
for months.. https://github.com/apache/incubator-rya-site/pull/1  Can
someone merge it into master?

On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <Er...@capitalone.com> wrote:

> Cool, thanks for the help.
> By the way, the link to the Rya Manual is outdated on the rya.apache.org
> site. Should be pointing at
> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md

Re: Timestamps and Cardinality in Queries

Posted by "Liu, Eric" <Er...@capitalone.com>.
Cool, thanks for the help.
By the way, the link to the Rya Manual is outdated on the rya.apache.org site. Should be pointing at https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md

On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <aa...@gmail.com> wrote:

    deep vs wide:
    
    A property path query is probably your best bet.  Given the following data:
    
    s:EventA p:causes s:EventB
    s:EventB p:causes s:EventC
    s:EventC p:causes s:EventD
    
    
    This query would start at EventB and work its way up and down the chain:
    
    SELECT * WHERE {
       <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
    }
    
    

Re: Timestamps and Cardinality in Queries

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
deep vs wide:

A property path query is probably your best bet.  Given the following data:

s:EventA p:causes s:EventB
s:EventB p:causes s:EventC
s:EventC p:causes s:EventD


This query would start at EventB and work its way up and down the chain:

SELECT * WHERE {
   <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
}
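
To make the path expression concrete, here is a minimal Python sketch of what (<p:causes>|^<p:causes>)* matches on the three triples above. It is not Rya code and uses no Rya or SPARQL library; the names TRIPLES, reachable, and query are made up for illustration. The traversal follows p:causes edges in both directions, zero or more times, and then joins the result with ?s ?p ?o.

```python
from collections import deque

# The three sample triples from the message above.
TRIPLES = [
    ("s:EventA", "p:causes", "s:EventB"),
    ("s:EventB", "p:causes", "s:EventC"),
    ("s:EventC", "p:causes", "s:EventD"),
]


def reachable(triples, start, predicate):
    """Nodes reachable from `start` by following `predicate` forwards or
    backwards zero or more times, i.e. (<p>|^<p>)* in property-path terms."""
    seen = {start}  # the zero-length path matches the start node itself
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if p != predicate:
                continue
            if s == node and o not in seen:  # forward step: <p>
                seen.add(o)
                queue.append(o)
            if o == node and s not in seen:  # inverse step: ^<p>
                seen.add(s)
                queue.append(s)
    return seen


def query(triples, start, predicate):
    """Mimic: <start> (<p>|^<p>)* ?s . ?s ?p ?o"""
    nodes = reachable(triples, start, predicate)
    # Join step: keep every triple whose subject is one of the reached nodes.
    return sorted(t for t in triples if t[0] in nodes)


if __name__ == "__main__":
    for row in query(TRIPLES, "s:EventB", "p:causes"):
        print(row)
```

On this data, reachable finds all four events, but query returns only three triples, because s:EventD never appears as a subject and so contributes no ?s ?p ?o rows.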


On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <Ca...@parsons.com>
wrote:

> Yes, that's a good place to start.  If you have external timestamps that
> are built into your graph using the Time Ontology in OWL (e.g., you have
> triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the
> temporal index is exactly what you want.  If you are hoping to query based
> on the internal timestamps that Accumulo assigns to your triples, then
> there are some slight tweaks that can be done to facilitate this, but it
> won't be nearly as efficient, because it will require some sort of
> client-side filtering.

RE: Timestamps and Cardinality in Queries

Posted by "Meier, Caleb" <Ca...@parsons.com>.
Yes, that's a good place to start.  If you have external timestamps that are built into your graph using the Time Ontology in OWL (e.g., you have triples of the form (event123, time:inDateTime, 2017-02-23T14:29)), the temporal index is exactly what you want.  If you are hoping to query based on the internal timestamps that Accumulo assigns to your triples, then there are some slight tweaks that can be done to facilitate this, but it won't be nearly as efficient, because it will require some sort of client-side filtering.
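
The client-side filtering mentioned above can be pictured with a short Python sketch. Everything in it is an assumption for illustration: the row layout (subject, predicate, object, insert_time_ms), the sample rows, and the helper names are hypothetical and are not Rya or Accumulo APIs. The point is only that the client has to inspect every scanned row, which is why this route is slower than an index lookup.

```python
from datetime import datetime, timezone


def to_millis(iso_text):
    """Parse an ISO-8601 timestamp (assumed to be UTC) into epoch milliseconds."""
    dt = datetime.fromisoformat(iso_text).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


# Hypothetical scan results: (subject, predicate, object, insert_time_ms).
ROWS = [
    ("event121", "p:causes", "event122", to_millis("2017-02-22T09:00")),
    ("event122", "p:causes", "event123", to_millis("2017-02-23T14:29")),
    ("event123", "p:causes", "event124", to_millis("2017-02-24T08:15")),
]


def statements_in_range(rows, start_ms, end_ms):
    """Keep statements whose insert time lies in [start_ms, end_ms], inclusive.

    Every row is examined on the client; nothing is pruned by the store.
    """
    return [row[:3] for row in rows if start_ms <= row[3] <= end_ms]


if __name__ == "__main__":
    hits = statements_in_range(
        ROWS, to_millis("2017-02-23T00:00"), to_millis("2017-02-23T23:59")
    )
    for statement in hits:
        print(statement)
```

With the sample rows, only the statement inserted on 2017-02-23 falls inside the one-day range.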

Caleb A. Meier, Ph.D.
Software Engineer II ♦ Analyst
Parsons Corporation
1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
Office:  (703)797-3066
Caleb.Meier@Parsons.com ♦ www.parsons.com

-----Original Message-----
From: Liu, Eric [mailto:Eric.Liu@capitalone.com] 
Sent: Thursday, February 23, 2017 2:27 PM
To: dev@rya.incubator.apache.org
Subject: Re: Timestamps and Cardinality in Queries

We’d like to be able to query by timestamp; specifically, we want to be able to find all statements that were made within a given time range. Is this what I should be looking at? https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2

On 2/22/17, 6:21 PM, "Meier, Caleb" <Ca...@parsons.com> wrote:

    Hey Eric,

    Currently timestamps can't be queried in Rya.  Do you need to be able to query by timestamp, or simply discover the timestamp for a given node?  Rya does have a temporal index, but that requires you to use a temporal ontology to model the temporal properties of your graph nodes.
    ________________________________________
    From: Liu, Eric <Er...@capitalone.com>
    Sent: Wednesday, February 22, 2017 6:38 PM
    To: dev@rya.incubator.apache.org
    Subject: Timestamps and Cardinality in Queries

    Hi,

    Continuing from our talk earlier today I was wondering if you could provide more information about how timestamps could be queried in Rya.
    Also, we are trying to support a type of query that would essentially be limiting on cardinality (different from the normal SPARQL limit because it’s for node cardinality rather than total results). I saw in one of Caleb’s talks that Rya’s query optimization involves checking cardinality first. I was wondering if there would be some way to tap into this feature for usage in queries?

    Thanks,
    Eric Liu

Re: Timestamps and Cardinality in Queries

Posted by "Liu, Eric" <Er...@capitalone.com>.
We’d like to be able to query by timestamp; specifically, we want to be able to find all statements that were made within a given time range. Is this what I should be looking at? https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2

On 2/22/17, 6:21 PM, "Meier, Caleb" <Ca...@parsons.com> wrote:

    Hey Eric,
    
    Currently timestamps can't be queried in Rya.  Do you need to be able to query by timestamp, or simply discover the timestamp for a given node?  Rya does have a temporal index, but that requires you to use a temporal ontology to model the temporal properties of your graph nodes.
    ________________________________________
    From: Liu, Eric <Er...@capitalone.com>
    Sent: Wednesday, February 22, 2017 6:38 PM
    To: dev@rya.incubator.apache.org
    Subject: Timestamps and Cardinality in Queries
    
    Hi,
    
    Continuing from our talk earlier today I was wondering if you could provide more information about how timestamps could be queried in Rya.
    Also, we are trying to support a type of query that would essentially be limiting on cardinality (different from the normal SPARQL limit because it’s for node cardinality rather than total results). I saw in one of Caleb’s talks that Rya’s query optimization involves checking cardinality first. I was wondering if there would be some way to tap into this feature for usage in queries?
    
    Thanks,
    Eric Liu

Re: Timestamps and Cardinality in Queries

Posted by "Meier, Caleb" <Ca...@parsons.com>.
Hey Eric,

Currently timestamps can't be queried in Rya.  Do you need to be able to query by timestamp, or simply discover the timestamp for a given node?  Rya does have a temporal index, but that requires you to use a temporal ontology to model the temporal properties of your graph nodes.
________________________________________
From: Liu, Eric <Er...@capitalone.com>
Sent: Wednesday, February 22, 2017 6:38 PM
To: dev@rya.incubator.apache.org
Subject: Timestamps and Cardinality in Queries

Hi,

Continuing from our talk earlier today I was wondering if you could provide more information about how timestamps could be queried in Rya.
Also, we are trying to support a type of query that would essentially be limiting on cardinality (different from the normal SPARQL limit because it’s for node cardinality rather than total results). I saw in one of Caleb’s talks that Rya’s query optimization involves checking cardinality first. I was wondering if there would be some way to tap into this feature for usage in queries?

Thanks,
Eric Liu