Posted to commits@flink.apache.org by di...@apache.org on 2020/08/04 07:33:49 UTC

[flink-web] 01/02: Update the blog date of 'Pandas support in PyFlink'

This is an automated email from the ASF dual-hosted git repository.

dianfu pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/flink-web.git

commit 16a18c7349d05cf59bbabd6bb89c709bf46482fa
Author: Dian Fu <di...@apache.org>
AuthorDate: Tue Aug 4 15:22:20 2020 +0800

    Update the blog date of 'Pandas support in PyFlink'
---
 ...nk.md => 2020-08-04-pyflink-pandas-udf-support-flink.md} |   8 ++++----
 content/2020/07/28/pyflink-pandas-udf-support-flink.html    |   6 +++---
 .../mission-of-pyFlink.gif                                  | Bin
 .../python-scientific-stack.png                             | Bin
 .../vm-communication.png                                    | Bin
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/_posts/2020-07-28-pyflink-pandas-udf-support-flink.md b/_posts/2020-08-04-pyflink-pandas-udf-support-flink.md
similarity index 98%
rename from _posts/2020-07-28-pyflink-pandas-udf-support-flink.md
rename to _posts/2020-08-04-pyflink-pandas-udf-support-flink.md
index f9871c5..ae0ee73 100644
--- a/_posts/2020-07-28-pyflink-pandas-udf-support-flink.md
+++ b/_posts/2020-08-04-pyflink-pandas-udf-support-flink.md
@@ -1,7 +1,7 @@
 ---
 layout: post
 title: "PyFlink: The integration of Pandas into PyFlink"
-date: 2020-07-28T12:00:00.000Z
+date: 2020-08-04T00:00:00.000Z
 authors:
 - Jincheng:
   name: "Jincheng Sun"
@@ -15,7 +15,7 @@ excerpt: The Apache Flink community put some great effort into integrating Panda
 Python has evolved into one of the most important programming languages for many fields of data processing. So big has been Python’s popularity, that it has pretty much become the default data processing language for data scientists. On top of that, there is a plethora of Python-based data processing tools such as NumPy, Pandas, and Scikit-learn that have gained additional popularity due to their flexibility or powerful functionalities. 
 
 <center>
-<img src="{{ site.baseurl }}/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png" width="450px" alt="Python Scientific Stack"/>
+<img src="{{ site.baseurl }}/img/blog/2020-08-04-pyflink-pandas/python-scientific-stack.png" width="450px" alt="Python Scientific Stack"/>
 </center>
 <center>
   <a href="https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science?slide=52">Pic source: VanderPlas 2017, slide 52.</a>
@@ -50,7 +50,7 @@ While providing support for Python UDFs in PyFlink greatly improved the user exp
 The introduction of Pandas UDF is used to address these drawbacks. For Pandas UDF, a batch of rows is transferred between the JVM and PVM in a columnar format ([Arrow memory format](https://arrow.apache.org/docs/format/Columnar.html)). The batch of rows will be converted into a collection of Pandas Series and will be transferred to the Pandas UDF to then leverage popular Python libraries (such as Pandas, or NumPy) for the Python UDF implementation.
 
 <center>
-<img src="{{ site.baseurl }}/img/blog/2020-07-28-pyflink-pandas/vm-communication.png" width="550px" alt="VM Communication"/>
+<img src="{{ site.baseurl }}/img/blog/2020-08-04-pyflink-pandas/vm-communication.png" width="550px" alt="VM Communication"/>
 </center>
 
 
@@ -234,5 +234,5 @@ In this article, we introduce the integration of Pandas in Flink 1.11, including
 Future work by the community will focus on adding more features and bringing additional optimizations with follow up releases.  Such optimizations and additions include a Python DataStream API and more integration with the Python ecosystem, such as support for distributed Pandas in Flink. Stay tuned for more information and updates with the upcoming releases!
 
 <center>
-<img src="{{ site.baseurl }}/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif" width="600px" alt="Mission of PyFlink"/>
+<img src="{{ site.baseurl }}/img/blog/2020-08-04-pyflink-pandas/mission-of-pyFlink.gif" width="600px" alt="Mission of PyFlink"/>
 </center>
diff --git a/content/2020/07/28/pyflink-pandas-udf-support-flink.html b/content/2020/07/28/pyflink-pandas-udf-support-flink.html
index 05b2e6e..99a170a 100644
--- a/content/2020/07/28/pyflink-pandas-udf-support-flink.html
+++ b/content/2020/07/28/pyflink-pandas-udf-support-flink.html
@@ -211,7 +211,7 @@
 <p>Python has evolved into one of the most important programming languages for many fields of data processing. So big has been Python’s popularity, that it has pretty much become the default data processing language for data scientists. On top of that, there is a plethora of Python-based data processing tools such as NumPy, Pandas, and Scikit-learn that have gained additional popularity due to their flexibility or powerful functionalities.</p>
 
 <center>
-<img src="/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png" width="450px" alt="Python Scientific Stack" />
+<img src="/img/blog/2020-08-04-pyflink-pandas/python-scientific-stack.png" width="450px" alt="Python Scientific Stack" />
 </center>
 <center>
   <a href="https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science?slide=52">Pic source: VanderPlas 2017, slide 52.</a>
@@ -255,7 +255,7 @@ Currently, only Scalar Pandas UDFs are supported in PyFlink.</p>
 <p>The introduction of Pandas UDF is used to address these drawbacks. For Pandas UDF, a batch of rows is transferred between the JVM and PVM in a columnar format (<a href="https://arrow.apache.org/docs/format/Columnar.html">Arrow memory format</a>). The batch of rows will be converted into a collection of Pandas Series and will be transferred to the Pandas UDF to then leverage popular Python libraries (such as Pandas, or NumPy) for the Python UDF implementation.</p>
 
 <center>
-<img src="/img/blog/2020-07-28-pyflink-pandas/vm-communication.png" width="550px" alt="VM Communication" />
+<img src="/img/blog/2020-08-04-pyflink-pandas/vm-communication.png" width="550px" alt="VM Communication" />
 </center>
 
 <p>The performance of vectorized UDFs is usually much higher when compared to the normal Python UDF, as the serialization/deserialization overhead is minimized by falling back to <a href="https://arrow.apache.org/">Apache Arrow</a>, while handling <code>pandas.Series</code> as input/output allows us to take full advantage of the Pandas and NumPy libraries, making it a popular solution to parallelize Machine Learning and other large-scale, distributed data science workloads (e.g. feature  [...]
@@ -407,7 +407,7 @@ With the function, you can register and use it in the same way as the <a href="h
 <p>Future work by the community will focus on adding more features and bringing additional optimizations with follow up releases.  Such optimizations and additions include a Python DataStream API and more integration with the Python ecosystem, such as support for distributed Pandas in Flink. Stay tuned for more information and updates with the upcoming releases!</p>
 
 <center>
-<img src="/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif" width="600px" alt="Mission of PyFlink" />
+<img src="/img/blog/2020-08-04-pyflink-pandas/mission-of-pyFlink.gif" width="600px" alt="Mission of PyFlink" />
 </center>
 
       </article>
diff --git a/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif b/img/blog/2020-08-04-pyflink-pandas/mission-of-pyFlink.gif
similarity index 100%
rename from img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif
rename to img/blog/2020-08-04-pyflink-pandas/mission-of-pyFlink.gif
diff --git a/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png b/img/blog/2020-08-04-pyflink-pandas/python-scientific-stack.png
similarity index 100%
rename from img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png
rename to img/blog/2020-08-04-pyflink-pandas/python-scientific-stack.png
diff --git a/img/blog/2020-07-28-pyflink-pandas/vm-communication.png b/img/blog/2020-08-04-pyflink-pandas/vm-communication.png
similarity index 100%
rename from img/blog/2020-07-28-pyflink-pandas/vm-communication.png
rename to img/blog/2020-08-04-pyflink-pandas/vm-communication.png