You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/10/03 11:21:07 UTC

[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/22620

    [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement

    ## What changes were proposed in this pull request?
    
    This PR proposes to register Grouped aggregate UDF Vectorized UDFs for SQL Statement, for instance:
    
    ```python
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    
    @pandas_udf("integer", PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
    def sum_udf(v):
        return v.sum()
    
    spark.udf.register("sum_udf", sum_udf)  # doctest: +SKIP
    q = "SELECT sum_udf(v1) FROM VALUES (3, 0), (2, 0), (1, 1) tbl(v1, v2) GROUP BY v2"
    spark.sql(q).show()
    
    +-----------+
    |sum_udf(v1)|
    +-----------+
    |          1|
    |          5|
    +-----------+
    ```
    
    ## How was this patch tested?
    
    Manual test and unit test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-25601

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22620
    
----
commit 06a7bd0c7daed0f3af5b42c1ea8a9b4b5e2e6216
Author: hyukjinkwon <gu...@...>
Date:   2018-10-03T11:13:32Z

    Register Grouped aggregate UDF Vectorized UDFs for SQL Statement

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22620


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22620#discussion_r222456993
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -310,9 +319,11 @@ def register(self, name, f, returnType=None):
                         "Invalid returnType: data type can not be specified when f is"
                         "a user-defined function, but got %s." % returnType)
                 if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    -                                  PythonEvalType.SQL_SCALAR_PANDAS_UDF]:
    +                                  PythonEvalType.SQL_SCALAR_PANDAS_UDF,
    +                                  PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF]:
    --- End diff --
    
    We don't need it here:
    
    Users specify GROUPED_AGG only. GROUPED_AGG is turned to WINDOW_AGG eval type in WindowInPandasExec.
    
    Admittedly, there is a bit confusion here we can improve. We just haven't got a user specified udf type that maps to multiple evalType before WINDOW_AGG.
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    cc @BryanCutler, @gatorsmile and @cloud-fan


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22620#discussion_r222411585
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -298,6 +298,15 @@ def register(self, name, f, returnType=None):
                 >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
                 [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
     
    +            >>> @pandas_udf("integer", PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
    +            ... def sum_udf(v):
    +            ...     return v.sum()
    +            ...
    +            >>> _ = spark.udf.register("sum_udf", sum_udf)  # doctest: +SKIP
    --- End diff --
    
    what is the "_ =" thing here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    **[Test build #96917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96917/testReport)** for PR 22620 at commit [`f36bc03`](https://github.com/apache/spark/commit/f36bc03caebedc6f1901f6fb29b824a36e754541).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3658/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22620#discussion_r222487117
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -310,9 +319,11 @@ def register(self, name, f, returnType=None):
                         "Invalid returnType: data type can not be specified when f is"
                         "a user-defined function, but got %s." % returnType)
                 if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    -                                  PythonEvalType.SQL_SCALAR_PANDAS_UDF]:
    +                                  PythonEvalType.SQL_SCALAR_PANDAS_UDF,
    +                                  PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF]:
    --- End diff --
    
    These need to be clearly defined in Apache Spark 3.0 release; otherwise, it might be confusing to both developers and end users. :-)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22620#discussion_r222417513
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -298,6 +298,15 @@ def register(self, name, f, returnType=None):
                 >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
                 [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
     
    +            >>> @pandas_udf("integer", PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
    +            ... def sum_udf(v):
    +            ...     return v.sum()
    +            ...
    +            >>> _ = spark.udf.register("sum_udf", sum_udf)  # doctest: +SKIP
    --- End diff --
    
    Hide the output like ...
    
    ```
    >>> spark.udf.register("sum_udf", sum_udf)
    <function sum_udf at 0x103ff18c0>
    ```
    
    in the doctest.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3666/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    **[Test build #96916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96916/testReport)** for PR 22620 at commit [`97d0377`](https://github.com/apache/spark/commit/97d037733943e6e143088edc68ca958d31a81f33).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96917/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Merged to master and branch-2.4.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    **[Test build #96896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96896/testReport)** for PR 22620 at commit [`06a7bd0`](https://github.com/apache/spark/commit/06a7bd0c7daed0f3af5b42c1ea8a9b4b5e2e6216).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22620#discussion_r222452544
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -310,9 +319,11 @@ def register(self, name, f, returnType=None):
                         "Invalid returnType: data type can not be specified when f is"
                         "a user-defined function, but got %s." % returnType)
                 if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    -                                  PythonEvalType.SQL_SCALAR_PANDAS_UDF]:
    +                                  PythonEvalType.SQL_SCALAR_PANDAS_UDF,
    +                                  PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF]:
    --- End diff --
    
    how about SQL_WINDOW_AGG_PANDAS_UDF?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96896/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Thank you @icexelloss, @gatorsmile, @BryanCutler and @viirya.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    **[Test build #96896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96896/testReport)** for PR 22620 at commit [`06a7bd0`](https://github.com/apache/spark/commit/06a7bd0c7daed0f3af5b42c1ea8a9b4b5e2e6216).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22620#discussion_r222421940
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -298,6 +298,15 @@ def register(self, name, f, returnType=None):
                 >>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
                 [Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
     
    +            >>> @pandas_udf("integer", PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
    +            ... def sum_udf(v):
    +            ...     return v.sum()
    +            ...
    +            >>> _ = spark.udf.register("sum_udf", sum_udf)  # doctest: +SKIP
    --- End diff --
    
    Ha. I see..


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96916/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22620#discussion_r222698014
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -310,9 +319,11 @@ def register(self, name, f, returnType=None):
                         "Invalid returnType: data type can not be specified when f is"
                         "a user-defined function, but got %s." % returnType)
                 if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
    -                                  PythonEvalType.SQL_SCALAR_PANDAS_UDF]:
    +                                  PythonEvalType.SQL_SCALAR_PANDAS_UDF,
    +                                  PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF]:
    --- End diff --
    
    I opened https://issues.apache.org/jira/browse/SPARK-25640 to track this.
    
    To be clear, this is transparent to end users, but I agree it can be confusing to developers.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    **[Test build #96917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96917/testReport)** for PR 22620 at commit [`f36bc03`](https://github.com/apache/spark/commit/f36bc03caebedc6f1901f6fb29b824a36e754541).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22620
  
    **[Test build #96916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96916/testReport)** for PR 22620 at commit [`97d0377`](https://github.com/apache/spark/commit/97d037733943e6e143088edc68ca958d31a81f33).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org