You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@bahir.apache.org by yanglei99 <gi...@git.apache.org> on 2017/04/07 23:27:16 UTC

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

GitHub user yanglei99 opened a pull request:

    https://github.com/apache/bahir/pull/41

    [BAHIR-102] Initial support of Cloudant Query with python examples

    All the existing examples (5 in scala, 3 in python) are tested successfully with and without Query enabled. 2 python examples are added with can drive into Query, and tested successfully with and without Query enabled. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanglei99/bahir sql-cloudant-query-new

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/bahir/pull/41.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #41
    
----
commit 8ddbad23fef26bbb4c5e2f175bf345f2b4c126c3
Author: Yang Lei <ge...@gmail.com>
Date:   2017-04-07T23:23:43Z

    [BAHIR-102] Initial support of Cloudant Query with python examples

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by yanglei99 <gi...@git.apache.org>.

Github user yanglei99 commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110701584
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    --- End diff --
    
    how about this: 
    cloudant.useQuery|false|By default _all_docs endpoint is used if configuration 'view' and 'index' (see below) are not set. When useQuery is enabled, _find endpoint will be used in place of _all_docs, some of the query predicates will be driven into datastore. As RDD partition is ONE during _find, parallel loading is not achieved


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir issue #41: [BAHIR-102] Initial support of Cloudant Query with python e...

Posted by lresende <gi...@git.apache.org>.

Github user lresende commented on the issue:

    https://github.com/apache/bahir/pull/41
  
    LGTM, merging if there are no more comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir issue #41: [BAHIR-102] Initial support of Cloudant Query with python e...

Posted by lresende <gi...@git.apache.org>.

Github user lresende commented on the issue:

    https://github.com/apache/bahir/pull/41
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by yanglei99 <gi...@git.apache.org>.

Github user yanglei99 commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110697071
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    --- End diff --
    
    Setting the flag to true does not guarantee using _find. As find is only used to replace the _all_docs query, which will be done under condition that it is not a _view nor _index query. That is the reason the description is stated the way it is



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir issue #41: [BAHIR-102] Initial support of Cloudant Query with python e...

Posted by ApacheBahir <gi...@git.apache.org>.

Github user ApacheBahir commented on the issue:

    https://github.com/apache/bahir/pull/41
  
    
    Refer to this link for build results (access rights to CI server needed): 
    http://169.45.79.58:8080/job/bahir_spark_pr_builder/54/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by emlaver <gi...@git.apache.org>.

Github user emlaver commented on a diff in the pull request:

https://github.com/apache/bahir/pull/41#discussion_r110706772

--- Diff: sql-cloudant/README.md ---
@@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
cloudant.host||cloudant host url
cloudant.username||cloudant userid
cloudant.password||cloudant password
+cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
--- End diff --

Ok - I'm still having a hard time understanding the statement `some of the query predicates will be driven into datastore`. What do you mean by driven? Does it have to do with `partitions used to drive JsonStoreRDD loading query result in parallel` ?
Also, `As RDD partition is ONE during _find` - one what? set to one partition for the `jsonstore.rdd.partitions` configuration?
And for `parallel loading is not achieved` - This sounds awkward to me. Is it not achieved because the `_find` endpoint only supports one partition?

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by emlaver <gi...@git.apache.org>.

Github user emlaver commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110696516
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    --- End diff --
    
    This description is a bit unclear.  How about: `When set to true, the _find endpoint will be queried.  The _all_docs endpoint will be used when set to false (default setting).  Note: When querying against the _find endpoint, RDD partition is limited to one so parallel loading is not an available option.`  Also, I'm having a hard time understanding this: `some query predicates will be driven into datastore`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by emlaver <gi...@git.apache.org>.

Github user emlaver commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110706837
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    +cloudant.queryLimit|25|the limit set onto _find query
    --- End diff --
    
    Ok great, thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by yanglei99 <gi...@git.apache.org>.

Github user yanglei99 commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110726127
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    --- End diff --
    
    I chose the word useQuery, as Query is Cloudant's special word.  Also when useQuery is not used, pushing down can also happens for primary key (_id) query. So the above will not be accurate. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/bahir/pull/41


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir issue #41: [BAHIR-102] Initial support of Cloudant Query with python e...

Posted by ApacheBahir <gi...@git.apache.org>.

Github user ApacheBahir commented on the issue:

    https://github.com/apache/bahir/pull/41
  
    Build successful
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by yanglei99 <gi...@git.apache.org>.

Github user yanglei99 commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110700478
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    +cloudant.queryLimit|25|the limit set onto _find query
    --- End diff --
    
    Will change the cloudant.queryLimit|25 comment as you suggested



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by emlaver <gi...@git.apache.org>.

Github user emlaver commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110697369
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    +cloudant.queryLimit|25|the limit set onto _find query
    --- End diff --
    
    `The maximum number of results returned when querying the _find endpoint`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by lresende <gi...@git.apache.org>.

Github user lresende commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110724252
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    --- End diff --
    
    my $ 0.02c, how about something like this... :
    
    Clouding.optimizeQuery | false | When the optimizeQuery is enabled, the data source will look for ways to optimize the query execution by rewriting it or pushing down the execution to the data store. One example is the utilization of explicit _find with query predicates to be executed in the datastore instead of _all_docs. However, as RDD partition is ONE during _find, parallel loading is not achieved.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by yanglei99 <gi...@git.apache.org>.

Github user yanglei99 commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110726384
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    --- End diff --
    
    "some of the query predicates will be driven into datastore", as we do not do a full SQL translation for all the query predicates. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir pull request #41: [BAHIR-102] Initial support of Cloudant Query with p...

Posted by yanglei99 <gi...@git.apache.org>.

Github user yanglei99 commented on a diff in the pull request:

    https://github.com/apache/bahir/pull/41#discussion_r110727111
  
    --- Diff: sql-cloudant/README.md ---
    @@ -62,6 +62,8 @@ cloudant.protocol|https|protocol to use to transfer data: http or https
     cloudant.host||cloudant host url
     cloudant.username||cloudant userid
     cloudant.password||cloudant password
    +cloudant.useQuery|false|When enabled, for query not using index or view, _find will be used instead of _all_docs, some query predicates will be driven into datastore. However, RDD partition is ONE during _find, so parallel loading is not achieved
    --- End diff --
    
    I can use something @lresende suggested : the data source will look for ways to optimize the query execution by pushing down some of the query predicates into the data store.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir issue #41: [BAHIR-102] Initial support of Cloudant Query with python e...

Posted by ApacheBahir <gi...@git.apache.org>.

Github user ApacheBahir commented on the issue:

    https://github.com/apache/bahir/pull/41
  
    
    Refer to this link for build results (access rights to CI server needed): 
    http://169.45.79.58:8080/job/bahir_spark_pr_builder/55/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir issue #41: [BAHIR-102] Initial support of Cloudant Query with python e...

Posted by ApacheBahir <gi...@git.apache.org>.

Github user ApacheBahir commented on the issue:

    https://github.com/apache/bahir/pull/41
  
    Build successful
     



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] bahir issue #41: [BAHIR-102] Initial support of Cloudant Query with python e...

Posted by ApacheBahir <gi...@git.apache.org>.

Github user ApacheBahir commented on the issue:

    https://github.com/apache/bahir/pull/41
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---