Posted to dev@vxquery.apache.org by Eldon Carman <ec...@ucr.edu> on 2013/11/28 05:16:25 UTC

Benchmark Query Update

It appears that our query process is taking longer than expected. I have
created a small set of sensors to test our benchmark queries. The data set
is about 250 MB and the queries execute in 10 to 20 seconds with the SAXON
XSLT processor. When I tried a few of the queries on VXQuery, the process
ran for one hour and still did not complete. I am now looking into where
the time is being spent in our queries to see why it's taking so long.

Re: Benchmark Query Update

Posted by Vinayak Borkar <vi...@gmail.com>.
Excellent!


On 12/13/13, 1:12 PM, Eldon Carman wrote:
> I added the rule to take the previously mentioned subplan and make it
> into a single assign for child. The change dropped 4 minutes off each
> child path step that was found in the pattern mentioned. I have attached
> the new query plan and the results of several modified queries to show
> the change in times based on new additions to the query.
>
> Saxon Execution time: 0m36.009s
> VXQuery Execution time: 1m33.632s
>
>
> On Thu, Dec 12, 2013 at 11:51 AM, Eldon Carman <ecarm002@ucr.edu> wrote:
>
>     After finishing the rewrite rule to merge the child path steps, I
>     ran a few tests. The results of the queries and the plans are attached.
>
>     First, I noted that when the following group of operators was added to
>     the plan, the time increased by about 4 minutes (from 35s to 4m27s).
>
>     subplan {
>         aggregate [$$19] <- [function-call:
>             vxquery:{urn:org.apache.vxquery.operators-ext}sequence,
>             Args:[function-call:
>             vxquery:{urn:org.apache.vxquery.operators-ext}child,
>             Args:[function-call:
>             vxquery:{urn:org.apache.vxquery.operators-ext}treat,
>             Args:[%0->$$17,
>             {http://www.w3.org/2001/XMLSchema}int QUANT_ONE(bytes[5] = [1d000000ee])],
>             {http://www.w3.org/2001/XMLSchema}int QUANT_ONE(bytes[5] = [1d0000010b])]]]
>         -- AGGREGATE  |LOCAL|
>           unnest $$17 <- function-call:
>               vxquery:{urn:org.apache.vxquery.operators-ext}iterate, Args:[%0->$$15]
>           -- UNNEST  |LOCAL|
>             nested tuple source
>             -- NESTED_TUPLE_SOURCE  |LOCAL|
>     }
>     -- SUBPLAN  |PARTITIONED|
>
>     The above query plan section appears twice in the original query. If
>     each takes 4 minutes that would account for most of the time. My
>     test with the original query has a time of 9m16.336s.
>
>     I suggest a rewrite rule that could change this plan section to a
>     single assign.
>
>     Does anything in this plan section stand out as being slow? Is it
>     just the number of operators? The child path step function is fairly
>     fast.
>
>
>     On Tue, Dec 3, 2013 at 3:48 PM, Eldon Carman <ecarm002@ucr.edu> wrote:
>
>         The first query (q00.xq) was executed 10 times on the 10
>         stations of data. The data contains 6,827 files
>         (/dataCollection) with 206,686 sensor readings
>         (/dataCollection/data) amounting to ~55 MB. The query was
>         executed 10 times to amortize the overhead of starting and stopping
>         the cluster and node controllers in VXQuery.
>
>         (: XQuery Filter Query :)
>         (: See historical data for Riverside, CA (ASN00008113) station by selecting :)
>         (: the weather readings for December 25 over the last 10 years.            :)
>         let $collection := "/tmp/1.0_partition_ghcnd_all_xml/sensors"
>         for $r in collection($collection)/dataCollection/data
>         let $date := xs:date(fn:substring(xs:string(fn:data($r/date)), 0, 11))
>         where $r/station eq "GHCND:ASN00008113"
>              and fn:year-from-date($date) >= (2003)
>              and fn:month-from-date($date) eq 12
>              and fn:day-from-date($date) eq 25
>         return $r
>
>         Saxon processed this query 10 times in 35.936s with an average
>         of 3.5936s per query.
>         VXQuery processed this query 10 times in 504.715s with an
>         average of 50.4715s per query.
>
>         I ran the query again without the date filter conditions. The
>         query then returns all data from station GHCND:ASN00008113.
>         Saxon processed this query 10 times in 35.953s with an average
>         of 3.5953s per query.
>         VXQuery processed this query 10 times in 376.325s with an
>         average of 37.6325s per query.
>
>         The modified query below takes an average of 4.0028s. The query
>         basically touches each sensor reading but does nothing. The
>         select is much simpler and the plan does not have the two subplans
>         for the path steps used in the select.
>
>         let $collection := "/tmp/1.0_partition_ghcnd_all_xml/sensors/ASN"
>         for $r in collection($collection)/dataCollection/data
>         where empty($r)
>         return $r
>
>         The process seems to take a lot of time to prepare data and then
>         execute the select for the where clause.
>
>         Notes on VXQuery performance:
>         ========================
>         The frame size was set to 1 MB.
>         CPU usage is at 100% to 260% on an 8-core machine (100% means
>         one core is fully used).
>         The disk has sporadic activity.
>         The system has one cluster controller and one node controller
>         set up from inside the CLI script.
>
>         Suggested Options:
>         1. Remove the subplans for path steps going into the select.
>              * The subplan iterates over a field created by an unnest
>         operator. The unnest operator is guaranteed to produce
>         single-value items. The subplan is not required when the input
>         is a single item that is iterated over and then aggregated back
>         together. The process could be a simple assign for the value
>         inside the aggregate (including the rest of the nested plan
>         operators minus the unnest).
>         2. Project unused variables out of the tuple during local
>         execution.
>              * Depends on how the tuples are being passed between
>         operators. Right now a lot of information is stored in the tuple
>         (XML file, all path steps, etc.). Reducing the size could help
>         by copying less information during each new path step.
>
>         Questions?
>         * Can you track to see which operators are taking the longest?
>         * Can you explain the tuple stream and how it interacts with
>         each operator? Is there one stream? Does it only grow or change
>         size at each operator?
>
>
>         On Mon, Dec 2, 2013 at 8:14 PM, Vinayak Borkar
>         <vinayakb@gmail.com> wrote:
>
>             Preston,
>
>             Let me suggest a way to track down our performance issues in
>             VXQuery. Let's approach our queries one at a time. First,
>             let's start with the single collection, scan-based queries
>             and reason about their performance in comparison to Saxon.
>             As an even smaller goal, can you take your first query and
>             report running times on the 250MB of data along with Saxon's
>             running times?
>
>             Thanks,
>             Vinayak
>
>
>
>
>             On 11/29/13, 12:48 PM, Eldon Carman wrote:
>
>                 The query plans are so big, I attached a document with
>                 the queries and
>                 plans.
>
>
>                 On Wed, Nov 27, 2013 at 8:53 PM, Vinayak Borkar
>                 <vinayakb@gmail.com> wrote:
>
>                      Preston,
>
>                      For each query, please send the following:
>
>                      1. The query
>                      2. The translated logical plan
>                      3. The optimized physical plan
>
>                      Thanks,
>                      Vinayak
>

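The subplan-to-assign rewrite discussed in the quoted messages above can be illustrated with a toy model. The Python sketch below is not the VXQuery or Algebricks rule. The dictionary-based plan encoding, the placeholder arguments `arg-ee` and `arg-10b`, and the names `subplan`, `substitute`, `matches_pattern` and `rewrite_subplan_to_assign` are all assumptions made for illustration. It also assumes, as the Dec 3 message argues, that the iterated variable always holds a single item, so iterating over it and aggregating the result back is a no-op. Whether the real rule keeps the `sequence` wrapper is not stated in the thread; the sketch drops it.

```python
# Toy plan nodes as plain dicts. This encoding is hypothetical and only
# mirrors the shape of the plan fragment printed in the thread.

def subplan(agg_var, agg_expr, unnest_var, source_var):
    """Build: subplan { aggregate(sequence(expr)) <- unnest(iterate(src)) <- NTS }."""
    return {
        "op": "subplan",
        "nested": {
            "op": "aggregate",
            "var": agg_var,
            "expr": ("sequence", agg_expr),
            "input": {
                "op": "unnest",
                "var": unnest_var,
                "expr": ("iterate", source_var),
                "input": {"op": "nested-tuple-source"},
            },
        },
    }


def substitute(expr, old, new):
    """Replace every occurrence of variable `old` with `new` in a nested tuple."""
    if expr == old:
        return new
    if isinstance(expr, tuple):
        return tuple(substitute(e, old, new) for e in expr)
    return expr


def matches_pattern(node):
    """True only for the aggregate/unnest/nested-tuple-source shape shown above."""
    if node.get("op") != "subplan":
        return False
    agg = node["nested"]
    if agg.get("op") != "aggregate" or agg["expr"][0] != "sequence":
        return False
    unnest = agg["input"]
    return (unnest.get("op") == "unnest"
            and unnest["expr"][0] == "iterate"
            and unnest["input"].get("op") == "nested-tuple-source")


def rewrite_subplan_to_assign(node):
    """Collapse the matched subplan into one assign; leave other nodes untouched."""
    if not matches_pattern(node):
        return node
    agg = node["nested"]
    unnest = agg["input"]
    inner_expr = agg["expr"][1]        # expression wrapped by sequence(...)
    source_var = unnest["expr"][1]     # variable fed to iterate(...)
    return {
        "op": "assign",
        "var": agg["var"],
        "expr": substitute(inner_expr, unnest["var"], source_var),
    }


# The fragment from the thread: $$19 <- sequence(child(treat($$17, ...), ...)),
# with $$17 unnested from iterate($$15).
plan = subplan("$$19", ("child", ("treat", "$$17", "arg-ee"), "arg-10b"), "$$17", "$$15")
rewritten = rewrite_subplan_to_assign(plan)
print(rewritten)
```

In this model the rewritten node computes `child(treat($$15, ...), ...)` directly on the outer tuple, which removes three nested operators per path step.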

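To put the timings reported in the quoted messages above on a common scale, the following Python sketch computes the per-query averages and the VXQuery-to-Saxon slowdown factors. The helper names `per_query`, `slowdown` and `parse_time` are hypothetical; the numbers are copied from the thread. The thread does not say how many runs the 0m36.009s and 1m33.632s figures cover, so only their ratio is used, which assumes both engines ran the same workload.

```python
def per_query(total_seconds, runs):
    """Average time of one query execution."""
    return total_seconds / runs


def slowdown(vxquery_seconds, saxon_seconds):
    """How many times slower VXQuery is than Saxon on the same workload."""
    return vxquery_seconds / saxon_seconds


def parse_time(text):
    """Parse a `time`-style string such as '1m33.632s' into seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


runs = 10
with_filter = slowdown(per_query(504.715, runs), per_query(35.936, runs))
without_filter = slowdown(per_query(376.325, runs), per_query(35.953, runs))
after_rewrite = slowdown(parse_time("1m33.632s"), parse_time("0m36.009s"))

print(f"with date filter:    {with_filter:.1f}x")     # 14.0x
print(f"without date filter: {without_filter:.1f}x")  # 10.5x
print(f"after rewrite rule:  {after_rewrite:.1f}x")   # 2.6x
```

Read this way, the subplan rewrite brings the gap from roughly 14x down to under 3x for this query.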
Re: Benchmark Query Update

Posted by Vinayak Borkar <vi...@gmail.com>.
Yes.


On 12/16/13, 11:04 PM, Eldon Carman wrote:
> Just to confirm: you're talking about the XQTS tests (~19,000 queries), correct?
>
>
>> On Dec 16, 2013, at 9:34 PM, Vinayak Borkar <vi...@gmail.com> wrote:
>>
>> Preston,
>>
>>
>> Can you try running the W3C XQuery tests against your current codebase with all the rules and optimizations and compare the outcome with running the tests on our last release?
>>
>> Please report the outcome on this list. Let's ensure that we are not regressing while adding these optimizations.
>>
>> Thanks,
>> Vinayak


Re: Benchmark Query Update

Posted by Eldon Carman <ec...@ucr.edu>.
Just to confirm: you're talking about the XQTS tests (~19,000 queries), correct?


> On Dec 16, 2013, at 9:34 PM, Vinayak Borkar <vi...@gmail.com> wrote:
> 
> Preston,
> 
> 
> Can you try running the W3C XQuery tests against your current codebase with all the rules and optimizations and compare the outcome with running the tests on our last release?
> 
> Please report the outcome on this list. Let's ensure that we are not regressing while adding these optimizations.
> 
> Thanks,
> Vinayak

Re: Benchmark Query Update

Posted by Vinayak Borkar <vi...@gmail.com>.
Preston,


Can you try running the W3C XQuery tests against your current codebase
with all the rules and optimizations and compare the outcome with
running the tests on our last release?

Please report the outcome on this list. Let's ensure that we are not 
regressing while adding these optimizations.

Thanks,
Vinayak


On 12/13/13, 1:12 PM, Eldon Carman wrote:
> I added the rule to take the previously mentioned subplan and make it
> into a single assign for child. The change dropped 4 minutes off each
> child path step that was found in the pattern mentioned. I have attached
> the new query plan and the results of several modified queries to show
> the change in times based on new additions to the query.
>
> Saxon Execution time: 0m36.009s
> VXQuery Execution time: 1m33.632s
>
>
> On Thu, Dec 12, 2013 at 11:51 AM, Eldon Carman <ecarm002@ucr.edu
> <ma...@ucr.edu>> wrote:
>
>     After finishing the rewrite rule to merge the child path steps, I
>     ran a few tests. The results of the query's and plans are attached.
>
>     First I noted when the following group of operators were added to
>     the plan, the time changed by 4 minutes (from 35s to 4m27s).
>
>                subplan {
>                          aggregate [$$19] <- [function-call:
>     vxquery:{urn:org.apache.vxquery.operators-ext}sequence,
>     Args:[function-call:
>     vxquery:{urn:org.apache.vxquery.operators-ext}child,
>     Args:[function-call:
>     vxquery:{urn:org.apache.vxquery.operators-ext}treat, Args:[%0->$$17,
>     {http://www.w3.org/2001/XMLSchema}int
>     <http://www.w3.org/2001/XMLSchema%7Dint> QUANT_ONE(bytes[5] =
>     [1d000000ee])], {http://www.w3.org/2001/XMLSchema}int
>     <http://www.w3.org/2001/XMLSchema%7Dint> QUANT_ONE(bytes[5] =
>     [1d0000010b])]]]
>                          -- AGGREGATE  |LOCAL|
>                            unnest $$17 <- function-call:
>     vxquery:{urn:org.apache.vxquery.operators-ext}iterate, Args:[%0->$$15]
>                            -- UNNEST  |LOCAL|
>                              nested tuple source
>                              -- NESTED_TUPLE_SOURCE  |LOCAL|
>                       }
>                -- SUBPLAN  |PARTITIONED|
>
>     The above query plan section appears twice in the original query. If
>     each takes 4 minutes that would account for most of the time. My
>     test with the original query has a time of 9m16.336s.
>
>     I suggest a rewrite rule that could change this plan section to a
>     single assign.
>
>     Does anything in this plan section stand out as being slow? Is it
>     just the number of operators? The child path step function is fairly
>     fast.
>
>


Re: Benchmark Query Update

Posted by Eldon Carman <ec...@ucr.edu>.
I added the rule that takes the previously mentioned subplan and turns it into
a single assign for the child step. The change dropped about 4 minutes off each
child path step that matched the pattern. I have attached the new query plan
and the results of several modified queries to show how the times change as
parts are added to the query.

Saxon Execution time: 0m36.009s
VXQuery Execution time: 1m33.632s



Re: Benchmark Query Update

Posted by Eldon Carman <ec...@ucr.edu>.
After finishing the rewrite rule to merge the child path steps, I ran a few
tests. The query results and plans are attached.

First, I noted that when the following group of operators was added to the
plan, the time increased by about 4 minutes (from 35s to 4m27s).

          subplan {
                    aggregate [$$19] <- [function-call:
vxquery:{urn:org.apache.vxquery.operators-ext}sequence,
Args:[function-call: vxquery:{urn:org.apache.vxquery.operators-ext}child,
Args:[function-call: vxquery:{urn:org.apache.vxquery.operators-ext}treat,
Args:[%0->$$17, {http://www.w3.org/2001/XMLSchema}int QUANT_ONE(bytes[5] =
[1d000000ee])], {http://www.w3.org/2001/XMLSchema}int QUANT_ONE(bytes[5] =
[1d0000010b])]]]
                    -- AGGREGATE  |LOCAL|
                      unnest $$17 <- function-call:
vxquery:{urn:org.apache.vxquery.operators-ext}iterate, Args:[%0->$$15]
                      -- UNNEST  |LOCAL|
                        nested tuple source
                        -- NESTED_TUPLE_SOURCE  |LOCAL|
                 }
          -- SUBPLAN  |PARTITIONED|

The above query plan section appears twice in the original query. If each
occurrence takes about 4 minutes, that would account for most of the time. My
test with the original query has a time of 9m16.336s.
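
The estimate above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the 35s and 4m27s runs differ by exactly one occurrence of the subplan, and that the cost of the two occurrences simply adds up. The helper `to_seconds` is a hypothetical name, not part of VXQuery.

```python
import re


def to_seconds(text: str) -> float:
    """Convert a time such as '9m16.336s' or '35s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Timings quoted in the message above.
baseline = to_seconds("35s")            # plan without the subplan
with_one_subplan = to_seconds("4m27s")  # plan with one subplan added
original = to_seconds("9m16.336s")      # original query (two subplans)

# Assumption: each subplan costs the same and the costs are additive.
per_subplan = with_one_subplan - baseline
predicted = baseline + 2 * per_subplan
share = 2 * per_subplan / original

print(f"cost per subplan:      {per_subplan:.0f}s")
print(f"predicted total:       {predicted:.0f}s")
print(f"measured total:        {original:.3f}s")
print(f"share due to subplans: {share:.0%}")
```

Under these assumptions the two subplans explain a bit more than four fifths of the measured time, which is consistent with "most of the time".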

I suggest a rewrite rule that could change this plan section to a single
assign.

Does anything in this plan section stand out as being slow? Is it just the
number of operators? The child path step function is fairly fast.



Re: Benchmark Query Update

Posted by Eldon Carman <ec...@ucr.edu>.
The first query (q00.xq) was executed 10 times on the 10 stations of data.
The data contains 6,827 files (/dataCollection) with 206,686 sensor
readings (/dataCollection/data) amounting to ~55 MB. The query was executed
10 times to amortize the overhead of starting and stopping the cluster and
node controllers in VXQuery.

(: XQuery Filter Query :)
(: See historical data for Riverside, CA (ASN00008113) station by selecting :)
(: the weather readings for December 25 over the last 10 years. :)
let $collection := "/tmp/1.0_partition_ghcnd_all_xml/sensors"
for $r in collection($collection)/dataCollection/data
let $date := xs:date(fn:substring(xs:string(fn:data($r/date)), 0, 11))
where $r/station eq "GHCND:ASN00008113"
    and fn:year-from-date($date) >= (2003)
    and fn:month-from-date($date) eq 12
    and fn:day-from-date($date) eq 25
return $r
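
The `fn:substring(..., 0, 11)` call in the query can look off by one, since XQuery string positions start at 1. The sketch below emulates the standard `fn:substring` rule in Python to show that a start of 0 with a length of 11 keeps the first 10 characters. The sample date value is hypothetical (the actual format of the date element is not shown in this thread), and `xquery_substring` and `matches_filter` are illustrative names only.

```python
import math
from datetime import date


def xquery_substring(text: str, start: float, length: float) -> str:
    """Emulate fn:substring: positions are 1-based, and the character at
    position p is kept when round(start) <= p < round(start) + round(length)."""
    first = math.floor(start + 0.5)          # fn:round rounds halves up
    end = first + math.floor(length + 0.5)
    return "".join(ch for pos, ch in enumerate(text, start=1)
                   if first <= pos < end)


def matches_filter(station: str, raw_date: str) -> bool:
    """Mirror the where clause of the query above."""
    day = date.fromisoformat(xquery_substring(raw_date, 0, 11))
    return (station == "GHCND:ASN00008113"
            and day.year >= 2003
            and day.month == 12
            and day.day == 25)


# Hypothetical reading; the real date format is an assumption.
sample_date = "2003-12-25T00:00:00.000"
print(xquery_substring(sample_date, 0, 11))
print(matches_filter("GHCND:ASN00008113", sample_date))
```

Position 0 does not exist, so the window of 11 positions starting at 0 covers only positions 1 through 10, which is the date part of the assumed value.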

Saxon processed this query 10 times in 35.936s with an average of 3.5936s
per query.
VXQuery processed this query 10 times in 504.715s with an average
of 50.4715s per query.

I ran the query again without the date filter conditions. The query returns
all data from station GHCND:ASN00008113.
Saxon processed this query 10 times in 35.953s with an average of 3.5953s
per query.
VXQuery processed this query 10 times in 376.325s with an average
of 37.6325s per query.
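
To put the two sets of timings side by side, the following sketch recomputes the per-query averages and the slowdown of VXQuery relative to Saxon from the totals reported above. It only restates those numbers; the dictionary layout and the `summarize` helper are illustrative.

```python
RUNS = 10

# Total wall-clock seconds for 10 runs, as reported above.
totals = {
    "station and date filter": {"saxon": 35.936, "vxquery": 504.715},
    "station filter only": {"saxon": 35.953, "vxquery": 376.325},
}


def summarize(saxon_total: float, vxquery_total: float, runs: int) -> dict:
    """Return per-query averages and the VXQuery/Saxon slowdown factor."""
    return {
        "saxon_avg": saxon_total / runs,
        "vxquery_avg": vxquery_total / runs,
        "slowdown": vxquery_total / saxon_total,
    }


summary = {name: summarize(t["saxon"], t["vxquery"], RUNS)
           for name, t in totals.items()}

for name, s in summary.items():
    print(f"{name}: Saxon {s['saxon_avg']:.4f}s, "
          f"VXQuery {s['vxquery_avg']:.4f}s, slowdown {s['slowdown']:.1f}x")

# Extra time per query that the date conditions add in each engine.
date_cost_vxquery = (summary["station and date filter"]["vxquery_avg"]
                     - summary["station filter only"]["vxquery_avg"])
date_cost_saxon = (summary["station and date filter"]["saxon_avg"]
                   - summary["station filter only"]["saxon_avg"])
print(f"date filter cost: VXQuery {date_cost_vxquery:.3f}s, "
      f"Saxon {date_cost_saxon:.4f}s")
```

VXQuery comes out roughly 14 times slower with the date conditions and roughly 10.5 times slower without them. The date conditions add close to 13 seconds per query in VXQuery, while the Saxon times barely change.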

The modified query below takes an average of 4.0028s. The query basically
touches each sensor reading but does nothing with it. The select is much
simpler and the plan does not have the two subplans for the path steps used in
the select.

let $collection := "/tmp/1.0_partition_ghcnd_all_xml/sensors/ASN"
for $r in collection($collection)/dataCollection/data
where empty($r)
return $r

The process seems to take a lot of time to prepare data and then execute
the select for the where clause.

Notes on VXQuery performance:
========================
The frame size was set to 1 MB.
CPU usage is between 100% and 260% on an 8-core machine (100% means one
core is fully used).
The disk has sporadic activity.
The system has one cluster controller and one node controller set up from
inside the CLI script.

Suggested Options:
1. Remove the subplans for path steps going into the select.
    * The subplan iterates over a field created by an unnest operator. The
unnest operator is guaranteed to produce single-item values. The subplan is
not required when the input is a single item that is iterated over and then
aggregated back together. The process could be a simple assign for
the value inside the aggregate (including the rest of the nested plan
operators minus the unnest).
2. Project unused variables out of the tuple during local execution.
    * Depends on how the tuples are being passed between operators. Right
now a lot of information is stored in the tuple (XML file, all path steps,
etc.). Reducing the size could help by copying less information during
each new path step.
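
Option 1 can be illustrated with a toy model of the plan. The sketch below is not the Algebricks or VXQuery rewrite-rule API: the dictionary-based plan, the tuple-based expressions, and the names `substitute` and `rewrite_subplan` are all hypothetical. It assumes, as stated above, that the iterated variable always holds a single item, so the `sequence` wrapper and the `iterate` unnest can be dropped.

```python
def substitute(expr, old, new):
    """Replace every reference to variable `old` with variable `new`."""
    if expr == ("var", old):
        return ("var", new)
    if isinstance(expr, tuple):
        return tuple(substitute(part, old, new) for part in expr)
    return expr


def rewrite_subplan(op):
    """Collapse subplan { aggregate(sequence(E)) <- unnest(iterate(v)) }
    into a single assign of E with the unnest variable replaced by v.
    Any operator that does not match the pattern is returned unchanged."""
    if op["op"] != "subplan":
        return op
    agg = op["nested"]
    if agg["op"] != "aggregate" or agg["expr"][0] != "sequence":
        return op
    unnest = agg["input"]
    if unnest["op"] != "unnest" or unnest["expr"][0] != "iterate":
        return op
    if unnest["input"]["op"] != "nested-tuple-source":
        return op
    source_var = unnest["expr"][1][1]   # the v in iterate(v)
    inner = agg["expr"][1]              # the E in sequence(E)
    return {
        "op": "assign",
        "var": agg["var"],
        "expr": substitute(inner, unnest["var"], source_var),
    }


# Toy version of the plan section from this thread; "type-a" and "type-b"
# are placeholders for the two type constants in the real plan.
plan = {
    "op": "subplan",
    "nested": {
        "op": "aggregate",
        "var": "$$19",
        "expr": ("sequence",
                 ("child", ("treat", ("var", "$$17"), "type-a"), "type-b")),
        "input": {
            "op": "unnest",
            "var": "$$17",
            "expr": ("iterate", ("var", "$$15")),
            "input": {"op": "nested-tuple-source"},
        },
    },
}

rewritten = rewrite_subplan(plan)
print(rewritten)
```

The rewrite is only safe under the single-item assumption. If `$$15` could hold a longer sequence, the subplan would still be needed to apply the child step to each item.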

Questions?
* Can you track to see which operators are taking the longest?
* Can you explain the tuple stream and how it interacts with each operator?
Is there one stream? Does it only grow or change size at each operator?



Re: Benchmark Query Update

Posted by Vinayak Borkar <vi...@gmail.com>.
Preston,

Let me suggest a way to track down our performance issues in VXQuery.
Let's approach our queries one at a time. First, let's start with the
single collection, scan-based queries and reason about their performance
in comparison to Saxon. As an even smaller goal, can you take your first
query and report running times on the 250MB of data along with Saxon's
running times?

Thanks,
Vinayak





Re: Benchmark Query Update

Posted by Eldon Carman <ec...@ucr.edu>.
The query plans are so big, I attached a document with the queries and
plans.



Re: Benchmark Query Update

Posted by Vinayak Borkar <vi...@gmail.com>.
Preston,

For each query, please send the following:

1. The query
2. The translated logical plan
3. The optimized physical plan

Thanks,
Vinayak

