Posted to commits@beam.apache.org by gi...@apache.org on 2021/10/16 00:02:21 UTC

[beam] branch asf-site updated: Publishing website 2021/10/16 00:01:48 at commit 024d96c

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 6c94813  Publishing website 2021/10/16 00:01:48 at commit 024d96c
6c94813 is described below

commit 6c948132248d45ca8f7aff559c0b0eb82fd2cb43
Author: jenkins <bu...@apache.org>
AuthorDate: Sat Oct 16 00:01:49 2021 +0000

    Publishing website 2021/10/16 00:01:48 at commit 024d96c
---
 .../documentation/dsls/dataframes/overview/index.html   | 17 +++++++++--------
 .../documentation/dsls/sql/walkthrough/index.html       |  5 +++--
 website/generated-content/sitemap.xml                   |  2 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/website/generated-content/documentation/dsls/dataframes/overview/index.html b/website/generated-content/documentation/dsls/dataframes/overview/index.html
index a67b527..a1398bc 100644
--- a/website/generated-content/documentation/dsls/dataframes/overview/index.html
+++ b/website/generated-content/documentation/dsls/dataframes/overview/index.html
@@ -22,12 +22,16 @@ function openMenu(){addPlaceholder();blockScroll();}</script><div class="clearfi
 Run in Colab</a></td></table><p><br><br><br><br></p><p>The Apache Beam Python SDK provides a DataFrame API for working with pandas-like <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>DataFrame</a> objects. The feature lets you convert a PCollection to a DataFrame and then interact with the DataFrame using the standard methods available on the pandas DataFrame API. The DataFrame API is built on top of the pandas implementation, and pandas DataFram [...]
 </code></pre><p>Note that the <em>same</em> <code>pandas</code> version should be installed on workers when executing DataFrame API pipelines on distributed runners. Reference <a href=https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt><code>base_image_requirements.txt</code></a> for the Beam release you are using to see what version of <code>pandas</code> will be used by default on workers.</p><h2 id=using-dataframes>Using DataFrames</h2><p>You c [...]
 
-<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span class=n>p</span><span class=p>:</span>
-  <span class=n>df</span> <span class=o>=</span> <span class=n>p</span> <span class=o>|</span> <span class=n>read_csv</span><span class=p>(</span><span class=s2>&#34;gs://apache-beam-samples/nyc_taxi/misc/sample.csv&#34;</span><span class=p>)</span>
-  <span class=n>agg</span> <span class=o>=</span> <span class=n>df</span><span class=p>[[</span><span class=s1>&#39;passenger_count&#39;</span><span class=p>,</span> <span class=s1>&#39;DOLocationID&#39;</span><span class=p>]]</span><span class=o>.</span><span class=n>groupby</span><span class=p>(</span><span class=s1>&#39;DOLocationID&#39;</span><span class=p>)</span><span class=o>.</span><span class=n>sum</span><span class=p>()</span>
-  <span class=n>agg</span><span class=o>.</span><span class=n>to_csv</span><span class=p>(</span><span class=s1>&#39;output&#39;</span><span class=p>)</span></code></pre></div></div></div><p>pandas is able to infer column names from the first row of the CSV data, which is where <code>passenger_count</code> and <code>DOLocationID</code> come from.</p><p>In this example, the only traditional Beam type is the <code>Pipeline</code> instance. Otherwise the example is written completely with t [...]
+<span class=k>with</span> <span class=n>pipeline</span> <span class=k>as</span> <span class=n>p</span><span class=p>:</span>
+  <span class=n>rides</span> <span class=o>=</span> <span class=n>p</span> <span class=o>|</span> <span class=n>read_csv</span><span class=p>(</span><span class=n>input_path</span><span class=p>)</span>
+
+  <span class=c1># Count the number of passengers dropped off per LocationID</span>
+  <span class=n>agg</span> <span class=o>=</span> <span class=n>rides</span><span class=o>.</span><span class=n>groupby</span><span class=p>(</span><span class=s1>&#39;DOLocationID&#39;</span><span class=p>)</span><span class=o>.</span><span class=n>passenger_count</span><span class=o>.</span><span class=n>sum</span><span class=p>()</span>
+  <span class=n>agg</span><span class=o>.</span><span class=n>to_csv</span><span class=p>(</span><span class=n>output_path</span><span class=p>)</span></code></pre></div></div></div><p>pandas is able to infer column names from the first row of the CSV data, which is where <code>passenger_count</code> and <code>DOLocationID</code> come from.</p><p>In this example, the only traditional Beam type is the <code>Pipeline</code> instance. Otherwise the example is written completely with the Dat [...]
 <span class=kn>from</span> <span class=nn>apache_beam.dataframe.convert</span> <span class=kn>import</span> <span class=n>to_pcollection</span>
 <span class=o>...</span>
+
+
     <span class=c1># Read the text file[pattern] into a PCollection.</span>
     <span class=n>lines</span> <span class=o>=</span> <span class=n>p</span> <span class=o>|</span> <span class=s1>&#39;Read&#39;</span> <span class=o>&gt;&gt;</span> <span class=n>ReadFromText</span><span class=p>(</span><span class=n>known_args</span><span class=o>.</span><span class=n>input</span><span class=p>)</span>
 
@@ -45,10 +49,7 @@ Run in Colab</a></td></table><p><br><br><br><br></p><p>The Apache Beam Python SD
     <span class=n>counted</span><span class=o>.</span><span class=n>to_csv</span><span class=p>(</span><span class=n>known_args</span><span class=o>.</span><span class=n>output</span><span class=p>)</span>
 
     <span class=c1># Deferred DataFrames can also be converted back to schema&#39;d PCollections</span>
-    <span class=n>counted_pc</span> <span class=o>=</span> <span class=n>to_pcollection</span><span class=p>(</span><span class=n>counted</span><span class=p>,</span> <span class=n>include_indexes</span><span class=o>=</span><span class=bp>True</span><span class=p>)</span>
-
-    <span class=c1># Do something with counted_pc</span>
-    <span class=o>...</span></code></pre></div></div></div><p>You can find the full wordcount example on
+    <span class=n>counted_pc</span> <span class=o>=</span> <span class=n>to_pcollection</span><span class=p>(</span><span class=n>counted</span><span class=p>,</span> <span class=n>include_indexes</span><span class=o>=</span><span class=bp>True</span><span class=p>)</span></code></pre></div></div></div><p>You can find the full wordcount example on
 <a href=https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/wordcount.py>GitHub</a>,
 along with other <a href=https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/>example DataFrame pipelines</a>.</p><p>It’s also possible to use the DataFrame API by passing a function to <a href=https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.transforms.html#apache_beam.dataframe.transforms.DataframeTransform><code>DataframeTransform</code></a>:</p><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy  [...]
 
diff --git a/website/generated-content/documentation/dsls/sql/walkthrough/index.html b/website/generated-content/documentation/dsls/sql/walkthrough/index.html
index 4287556..a0e6f89 100644
--- a/website/generated-content/documentation/dsls/sql/walkthrough/index.html
+++ b/website/generated-content/documentation/dsls/sql/walkthrough/index.html
@@ -113,8 +113,9 @@ to either a single <code>PCollection</code> or a <code>PCollectionTuple</code> w
 </span><span class=c1></span>    <span class=c1>// by joining two PCollections
 </span><span class=c1></span>    <span class=n>PCollection</span><span class=o>&lt;</span><span class=n>Row</span><span class=o>&gt;</span> <span class=n>output</span> <span class=o>=</span> <span class=n>namesAndFoods</span><span class=o>.</span><span class=na>apply</span><span class=o>(</span>
         <span class=n>SqlTransform</span><span class=o>.</span><span class=na>query</span><span class=o>(</span>
-            <span class=s>&#34;SELECT Names.appId, COUNT(Reviews.rating), AVG(Reviews.rating)&#34;</span>
-                <span class=o>+</span> <span class=s>&#34;FROM Apps INNER JOIN Reviews ON Apps.appId == Reviews.appId&#34;</span><span class=o>));</span>
+            <span class=s>&#34;SELECT Apps.appId, COUNT(Reviews.rating), AVG(Reviews.rating) &#34;</span>
+                <span class=o>+</span> <span class=s>&#34;FROM Apps INNER JOIN Reviews ON Apps.appId = Reviews.appId &#34;</span>
+                <span class=o>+</span> <span class=s>&#34;GROUP BY Apps.appId&#34;</span><span class=o>));</span>
     </code></pre></div></div></div></p></li></ul><p><a href=https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlExample.java>BeamSqlExample</a>
 in the code repository shows basic usage of both APIs.</p></div></div><footer class=footer><div class=footer__contained><div class=footer__cols><div class="footer__cols__col footer__cols__col__logos"><div class=footer__cols__col__logo><img src=/images/beam_logo_circle.svg class=footer__logo alt="Beam logo"></div><div class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg class=footer__logo alt="Apache logo"></div></div><div class=footer-wrapper><div class=wrapper-grid><div [...]
 <a href=http://www.apache.org>The Apache Software Foundation</a>
diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml
index 0bfeeb6..a962144 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b [...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b [...]
\ No newline at end of file