Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/11/08 08:33:44 UTC
[GitHub] spark pull request #22973: [SPARK-25972][SQL] Missed JSON options in streami...
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22973
[SPARK-25972][SQL] Missed JSON options in streaming.py
## What changes were proposed in this pull request?
Added the JSON options for `json()` in streaming.py that are already present in the corresponding method in readwriter.py. In particular, the missing options are `dropFieldIfAllNull` and `encoding`.
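A minimal usage sketch of the two options on the streaming reader, assuming a Spark build that includes this change. The helper `build_json_stream`, its arguments, and the choice of `UTF-16BE` are hypothetical and only for illustration; `spark` is assumed to be an existing `SparkSession`.

```python
def build_json_stream(spark, path, schema=None):
    """Return a streaming DataFrame over JSON files (hypothetical helper).

    `spark` is assumed to be an existing SparkSession.
    """
    return spark.readStream.json(
        path,
        schema=schema,
        multiLine=True,           # treat each file as one JSON document
        encoding="UTF-16BE",      # force the charset instead of auto-detecting it
        dropFieldIfAllNull=True,  # only relevant when the schema is inferred
    )
```

Note that streaming file sources normally need an explicit schema unless `spark.sql.streaming.schemaInference` is enabled, so `dropFieldIfAllNull` has an effect only in the inference case.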
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 streaming-missed-options
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22973.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22973
----
commit da697acc60326017ce0b9a999df124ffb1ca9276
Author: Maxim Gekk <ma...@...>
Date: 2018-11-08T08:28:35Z
Adding missed options
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22973: [SPARK-25972][PYTHON] Missed JSON options in stre...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22973#discussion_r232484720
--- Diff: python/pyspark/sql/streaming.py ---
@@ -467,11 +468,18 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
:param allowUnquotedControlChars: allows JSON Strings to contain unquoted control
characters (ASCII characters with value less than 32,
including tab and line feed characters) or not.
+ :param encoding: allows to forcibly set one of standard basic or extended encoding for
+ the JSON files. For example UTF-16BE, UTF-32LE. If None is set,
+ the encoding of input JSON will be detected automatically
+ when the multiLine option is set to ``true``.
:param lineSep: defines the line separator that should be used for parsing. If None is
set, it covers all ``\\r``, ``\\r\\n`` and ``\\n``.
:param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
it uses the default value, ``en-US``. For instance, ``locale`` is used while
parsing dates and timestamps.
+ :param dropFieldIfAllNull: whether to ignore column of all null values or empty
+ array/struct during schema inference. If None is set, it
+ uses the default value, ``false``.
--- End diff --
@MaxGekk, let's match its order (the doc and parameters).
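One way to check this kind of ordering automatically is sketched below. It is not part of the PR: `param_order_mismatches` and `json_reader_stub` are hypothetical names, and the stub's signature is a toy stand-in rather than the real `json()` signature.

```python
import inspect
import re


def param_order_mismatches(func):
    """Compare the order of ``:param name:`` entries with the signature order."""
    documented = re.findall(r":param\s+(\w+):", func.__doc__ or "")
    declared = [name for name in inspect.signature(func).parameters if name != "self"]
    # Only compare parameters that are actually documented.
    expected = [name for name in declared if name in documented]
    return [(doc, exp) for doc, exp in zip(documented, expected) if doc != exp]


def json_reader_stub(self, path, lineSep=None, locale=None,
                     dropFieldIfAllNull=None, encoding=None):
    """Toy stand-in for the reader method, documented in a different order.

    :param path: input path.
    :param encoding: forced charset.
    :param lineSep: line separator.
    :param locale: locale tag.
    :param dropFieldIfAllNull: drop all-null columns during inference.
    """


# Each pair is (name found in the docstring, name expected at that position).
mismatches = param_order_mismatches(json_reader_stub)
```

An empty result means the documented order follows the signature; here `encoding` is documented too early, so the list is non-empty.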
---
[GitHub] spark pull request #22973: [SPARK-25972][PYTHON] Missed JSON options in stre...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22973
---
[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22973
Can one of the admins verify this patch?
---
[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22973
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98589/
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22973
**[Test build #98695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98695/testReport)** for PR 22973 at commit [`5b132ce`](https://github.com/apache/spark/commit/5b132ce59c5bdfab72bebe69123d84a52d261e1d).
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22973
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98695/
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22973
**[Test build #98695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98695/testReport)** for PR 22973 at commit [`5b132ce`](https://github.com/apache/spark/commit/5b132ce59c5bdfab72bebe69123d84a52d261e1d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22973
> Oh, let's name it [PYTHON] BTW.
I see some of the ongoing PRs have the tag `[PYSPARK]`. Is this documented somewhere?
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22973
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98639/
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22973
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22973
Oh, let's name it `[PYTHON]` BTW.
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22973
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22973
**[Test build #98639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98639/testReport)** for PR 22973 at commit [`4ca71fc`](https://github.com/apache/spark/commit/4ca71fc75d0a25ced9803372b0594ae8342b5eb9).
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22973
**[Test build #98639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98639/testReport)** for PR 22973 at commit [`4ca71fc`](https://github.com/apache/spark/commit/4ca71fc75d0a25ced9803372b0594ae8342b5eb9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22973
Can one of the admins verify this patch?
---
[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22973
**[Test build #98589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98589/testReport)** for PR 22973 at commit [`da697ac`](https://github.com/apache/spark/commit/da697acc60326017ce0b9a999df124ffb1ca9276).
---
[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22973
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22973: [SPARK-25972][PYTHON] Missed JSON options in stre...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22973#discussion_r232485771
--- Diff: python/pyspark/sql/streaming.py ---
@@ -467,11 +468,18 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
:param allowUnquotedControlChars: allows JSON Strings to contain unquoted control
characters (ASCII characters with value less than 32,
including tab and line feed characters) or not.
+ :param encoding: allows to forcibly set one of standard basic or extended encoding for
+ the JSON files. For example UTF-16BE, UTF-32LE. If None is set,
+ the encoding of input JSON will be detected automatically
+ when the multiLine option is set to ``true``.
:param lineSep: defines the line separator that should be used for parsing. If None is
set, it covers all ``\\r``, ``\\r\\n`` and ``\\n``.
:param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
it uses the default value, ``en-US``. For instance, ``locale`` is used while
parsing dates and timestamps.
+ :param dropFieldIfAllNull: whether to ignore column of all null values or empty
+ array/struct during schema inference. If None is set, it
+ uses the default value, ``false``.
--- End diff --
re-ordered
---
[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22973
@HyukjinKwon Thank you for pointing out the missed options. Please have a look at the PR.
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22973
Merged to master.
---
[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22973
Yeah, actually that's documented in https://spark.apache.org/contributing.html. Strictly, it should be `PYTHON`.
> The PR title should be of the form [SPARK-xxxx][COMPONENT] Title, where SPARK-xxxx is the relevant JIRA number, COMPONENT is one of the PR categories shown at spark-prs.appspot.com and Title may be the JIRA’s title or a more specific title describing the PR itself.
![screen shot 2018-11-08 at 7 19 07 pm](https://user-images.githubusercontent.com/6477701/48195647-2e353b80-e38b-11e8-8ac0-73a458ee00c0.png)
I wanted to be clear on this myself, so I had looked it up before, but it's not a big deal at all :D.
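The title format quoted above can be checked mechanically. The following is only a sketch: `TITLE_RE` and `parse_pr_title` are hypothetical names, and the pattern checks the shape `[SPARK-xxxx][COMPONENT] Title` without verifying that the component is one of the categories listed at spark-prs.appspot.com.

```python
import re

# Shape only: a JIRA tag, one or more bracketed component tags, then the title.
TITLE_RE = re.compile(
    r"^\[SPARK-(?P<jira>\d+)\]"
    r"(?P<tags>(?:\[[A-Z][A-Z0-9 _-]*\])+)"
    r"\s+(?P<title>\S.*)$"
)


def parse_pr_title(title):
    """Split a PR title into its JIRA id, component tags, and free-text title."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    components = re.findall(r"\[([^\]]+)\]", match.group("tags"))
    return {
        "jira": "SPARK-" + match.group("jira"),
        "components": components,
        "title": match.group("title"),
    }


parsed = parse_pr_title("[SPARK-25972][PYTHON] Missed JSON options in streaming.py")
```

A title without the bracketed prefix makes `parse_pr_title` return `None`.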
---
[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22973
**[Test build #98589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98589/testReport)** for PR 22973 at commit [`da697ac`](https://github.com/apache/spark/commit/da697acc60326017ce0b9a999df124ffb1ca9276).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---