Posted to commits@arrow.apache.org by uw...@apache.org on 2017/11/19 14:56:17 UTC

[04/19] arrow-site git commit: API doc update

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/35611f84/docs/python/ipc.html
----------------------------------------------------------------------
diff --git a/docs/python/ipc.html b/docs/python/ipc.html
index 3a49d76..60b107b 100644
--- a/docs/python/ipc.html
+++ b/docs/python/ipc.html
@@ -169,19 +169,6 @@ IO</span></a>.</p>
 <h3>Using streams<a class="headerlink" href="#using-streams" title="Permalink to this headline">¶</a></h3>
 <p>First, let’s create a small record batch:</p>
 <div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="kn">import</span> <span class="nn">pyarrow</span> <span class="kn">as</span> <span class="nn">pa</span>
-<span class="gt">---------------------------------------------------------------------------</span>
-<span class="ne">ImportError</span><span class="g g-Whitespace">                               </span>Traceback (most recent call last)
-<span class="nn">&lt;ipython-input-1-852643f3aad4&gt;</span> in <span class="ni">&lt;module&gt;</span><span class="nt">()</span>
-<span class="ne">----&gt; </span><span class="mi">1</span> <span class="kn">import</span> <span class="nn">pyarrow</span> <span class="kn">as</span> <span class="nn">pa</span>
-
-<span class="nn">~apache-arrow/arrow/python/pyarrow/__init__.py</span> in <span class="ni">&lt;module&gt;</span><span class="nt">()</span>
-<span class="g g-Whitespace">     </span><span class="mi">30</span> 
-<span class="g g-Whitespace">     </span><span class="mi">31</span> 
-<span class="ne">---&gt; </span><span class="mi">32</span> <span class="kn">from</span> <span class="nn">pyarrow.lib</span> <span class="kn">import</span> <span class="n">cpu_count</span><span class="p">,</span> <span class="n">set_cpu_count</span>
-<span class="g g-Whitespace">     </span><span class="mi">33</span> <span class="kn">from</span> <span class="nn">pyarrow.lib</span> <span class="kn">import</span> <span class="p">(</span><span class="n">null</span><span class="p">,</span> <span class="n">bool_</span><span class="p">,</span>
-<span class="g g-Whitespace">     </span><span class="mi">34</span>                          <span class="n">int8</span><span class="p">,</span> <span class="n">int16</span><span class="p">,</span> <span class="n">int32</span><span class="p">,</span> <span class="n">int64</span><span class="p">,</span>
-
-<span class="ne">ImportError</span>: libarrow.so.0: cannot open shared object file: No such file or directory
 
 <span class="gp">In [2]: </span><span class="n">data</span> <span class="o">=</span> <span class="p">[</span>
 <span class="gp">   ...: </span>    <span class="n">pa</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]),</span>
@@ -189,60 +176,22 @@ IO</span></a>.</p>
 <span class="gp">   ...: </span>    <span class="n">pa</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="bp">True</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">])</span>
 <span class="gp">   ...: </span><span class="p">]</span>
 <span class="gp">   ...: </span>
-<span class="go">
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-2-be49d87abbaf&gt; in &lt;module&gt;()</span>
-<span class="go">      1 data = [</span>
-<span class="go">----&gt; 2     pa.array([1, 2, 3, 4]),</span>
-<span class="go">      3     pa.array([&#39;foo&#39;, &#39;bar&#39;, &#39;baz&#39;, None]),</span>
-<span class="go">      4     pa.array([True, None, False, True])</span>
-<span class="go">      5 ]</span>
-
-<span class="go">NameError: name &#39;pa&#39; is not defined</span>
 
 <span class="gp">In [3]: </span><span class="n">batch</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">RecordBatch</span><span class="o">.</span><span class="n">from_arrays</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="p">[</span><span class="s1">&#39;f0&#39;</span><span class="p">,</span> <span class="s1">&#39;f1&#39;</span><span class="p">,</span> <span class="s1">&#39;f2&#39;</span><span class="p">])</span>
-<span class="go">
 
 
 
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-3-d86ae0d33275&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 batch = pa.RecordBatch.from_arrays(data, [&#39;f0&#39;, &#39;f1&#39;, &#39;f2&#39;])</span>
-
-<span class="go">NameError: name &#39;pa&#39; is not defined</span>
 
 <span class="gp">In [4]: </span><span class="n">batch</span><span class="o">.</span><span class="n">num_rows</span>
-<span class="go">
 
 
 
 
 
 
 
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-4-1c9023f6baf6&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 batch.num_rows</span>
-
-<span class="go">NameError: name &#39;batch&#39; is not defined</span>
+<span class="gh">Out[4]: </span><span class="go">4</span>
 
 <span class="gp">In [5]: </span><span class="n">batch</span><span class="o">.</span><span class="n">num_columns</span>
-<span class="go">
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-5-8a6dbe0adf3d&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 batch.num_columns</span>
-
-<span class="go">NameError: name &#39;batch&#39; is not defined</span>
+<span class="go">Out[5]: 3</span>
 </pre></div>
 </div>
 <p>Now, we can begin writing a stream containing some number of these batches. For
-this we use <code class="xref py py-class docutils literal"><span class="pre">RecordBatchStreamWriter</span></code>, which can write to a writeable
+this we use <a class="reference internal" href="generated/pyarrow.RecordBatchStreamWriter.html#pyarrow.RecordBatchStreamWriter" title="pyarrow.RecordBatchStreamWriter"><code class="xref py py-class docutils literal"><span class="pre">RecordBatchStreamWriter</span></code></a>, which can write to a writeable
 <code class="docutils literal"><span class="pre">NativeFile</span></code> object or a writeable Python object:</p>
 <div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [6]: </span><span class="n">sink</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">BufferOutputStream</span><span class="p">()</span>
-<span class="gt">---------------------------------------------------------------------------</span>
-<span class="ne">NameError</span><span class="g g-Whitespace">                                 </span>Traceback (most recent call last)
-<span class="nn">&lt;ipython-input-6-2d52de97a6bb&gt;</span> in <span class="ni">&lt;module&gt;</span><span class="nt">()</span>
-<span class="ne">----&gt; </span><span class="mi">1</span> <span class="n">sink</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">BufferOutputStream</span><span class="p">()</span>
-
-<span class="ne">NameError</span>: name &#39;pa&#39; is not defined
 
 <span class="gp">In [7]: </span><span class="n">writer</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">RecordBatchStreamWriter</span><span class="p">(</span><span class="n">sink</span><span class="p">,</span> <span class="n">batch</span><span class="o">.</span><span class="n">schema</span><span class="p">)</span>
-<span class="go">---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-7-c4194fd1d755&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 writer = pa.RecordBatchStreamWriter(sink, batch.schema)</span>
-
-<span class="go">NameError: name &#39;pa&#39; is not defined</span>
 </pre></div>
 </div>
 <p>Here we used an in-memory Arrow buffer stream, but this could have been a
@@ -253,84 +202,38 @@ particular stream. Now we can do:</p>
 <div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [8]: </span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
 <span class="gp">   ...: </span>   <span class="n">writer</span><span class="o">.</span><span class="n">write_batch</span><span class="p">(</span><span class="n">batch</span><span class="p">)</span>
 <span class="gp">   ...: </span>
-<span class="gt">---------------------------------------------------------------------------</span>
-<span class="ne">NameError</span><span class="g g-Whitespace">                                 </span>Traceback (most recent call last)
-<span class="nn">&lt;ipython-input-8-bffd3bbe4251&gt;</span> in <span class="ni">&lt;module&gt;</span><span class="nt">()</span>
-<span class="g g-Whitespace">      </span><span class="mi">1</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
-<span class="ne">----&gt; </span><span class="mi">2</span>    <span class="n">writer</span><span class="o">.</span><span class="n">write_batch</span><span class="p">(</span><span class="n">batch</span><span class="p">)</span>
-<span class="g g-Whitespace">      </span><span class="mi">3</span> 
-
-<span class="ne">NameError</span>: name &#39;writer&#39; is not defined
 
 <span class="gp">In [9]: </span><span class="n">writer</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
-<span class="go">---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-9-5f6d4868f1d2&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 writer.close()</span>
-
-<span class="go">NameError: name &#39;writer&#39; is not defined</span>
 
 <span class="gp">In [10]: </span><span class="n">buf</span> <span class="o">=</span> <span class="n">sink</span><span class="o">.</span><span class="n">get_result</span><span class="p">()</span>
-<span class="go">
 
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-10-a94efaead943&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 buf = sink.get_result()</span>
-
-<span class="go">NameError: name &#39;sink&#39; is not defined</span>
 
 <span class="gp">In [11]: </span><span class="n">buf</span><span class="o">.</span><span class="n">size</span>
-<span class="go">
 
 
 
 
 
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-11-118f90487584&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 buf.size</span>
-
-<span class="go">NameError: name &#39;buf&#39; is not defined</span>
+<span class="gh">Out[11]: </span><span class="go">2108</span>
 </pre></div>
 </div>
 <p>Now <code class="docutils literal"><span class="pre">buf</span></code> contains the complete stream as an in-memory byte buffer. We can
-read such a stream with <code class="xref py py-class docutils literal"><span class="pre">RecordBatchStreamReader</span></code> or the
+read such a stream with <a class="reference internal" href="generated/pyarrow.RecordBatchStreamReader.html#pyarrow.RecordBatchStreamReader" title="pyarrow.RecordBatchStreamReader"><code class="xref py py-class docutils literal"><span class="pre">RecordBatchStreamReader</span></code></a> or the
 convenience function <code class="docutils literal"><span class="pre">pyarrow.open_stream</span></code>:</p>
 <div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [12]: </span><span class="n">reader</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">open_stream</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>
-<span class="gt">---------------------------------------------------------------------------</span>
-<span class="ne">NameError</span><span class="g g-Whitespace">                                 </span>Traceback (most recent call last)
-<span class="nn">&lt;ipython-input-12-818efbf851e7&gt;</span> in <span class="ni">&lt;module&gt;</span><span class="nt">()</span>
-<span class="ne">----&gt; </span><span class="mi">1</span> <span class="n">reader</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">open_stream</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>
-
-<span class="ne">NameError</span>: name &#39;pa&#39; is not defined
 
 <span class="gp">In [13]: </span><span class="n">reader</span><span class="o">.</span><span class="n">schema</span>
-<span class="go">---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-13-701a37ca4899&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 reader.schema</span>
-
-<span class="go">NameError: name &#39;reader&#39; is not defined</span>
+<span class="gh">Out[13]: </span><span class="go"></span>
+<span class="go">f0: int64</span>
+<span class="go">f1: string</span>
+<span class="go">f2: bool</span>
+<span class="go">metadata</span>
+<span class="gt">--------</span>
+<span class="p">{}</span>
 
 <span class="gp">In [14]: </span><span class="n">batches</span> <span class="o">=</span> <span class="p">[</span><span class="n">b</span> <span class="k">for</span> <span class="n">b</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">]</span>
-<span class="go">
 
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-14-677b5cf09c5c&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 batches = [b for b in reader]</span>
-
-<span class="go">NameError: name &#39;reader&#39; is not defined</span>
 
 <span class="gp">In [15]: </span><span class="nb">len</span><span class="p">(</span><span class="n">batches</span><span class="p">)</span>
-<span class="go">
 
 
 
 
 
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-15-4ac424215b05&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 len(batches)</span>
-
-<span class="go">NameError: name &#39;batches&#39; is not defined</span>
+<span class="gh">Out[15]: </span><span class="go">5</span>
 </pre></div>
 </div>
 <p>We can check the returned batches are the same as the original input:</p>
 <div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [16]: </span><span class="n">batches</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">equals</span><span class="p">(</span><span class="n">batch</span><span class="p">)</span>
-<span class="gt">---------------------------------------------------------------------------</span>
-<span class="ne">NameError</span><span class="g g-Whitespace">                                 </span>Traceback (most recent call last)
-<span class="nn">&lt;ipython-input-16-acb722b16630&gt;</span> in <span class="ni">&lt;module&gt;</span><span class="nt">()</span>
-<span class="ne">----&gt; </span><span class="mi">1</span> <span class="n">batches</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">equals</span><span class="p">(</span><span class="n">batch</span><span class="p">)</span>
-
-<span class="ne">NameError</span>: name &#39;batches&#39; is not defined
+<span class="gh">Out[16]: </span><span class="go">True</span>
 </pre></div>
 </div>
 <p>An important point is that if the input source supports zero-copy reads
@@ -339,99 +242,40 @@ batches are also zero-copy and do not allocate any new memory on read.</p>
 </div>
 <div class="section" id="writing-and-reading-random-access-files">
 <h3>Writing and Reading Random Access Files<a class="headerlink" href="#writing-and-reading-random-access-files" title="Permalink to this headline">¶</a></h3>
-<p>The <code class="xref py py-class docutils literal"><span class="pre">RecordBatchFileWriter</span></code> has the same API as
-<code class="xref py py-class docutils literal"><span class="pre">RecordBatchStreamWriter</span></code>:</p>
+<p>The <a class="reference internal" href="generated/pyarrow.RecordBatchFileWriter.html#pyarrow.RecordBatchFileWriter" title="pyarrow.RecordBatchFileWriter"><code class="xref py py-class docutils literal"><span class="pre">RecordBatchFileWriter</span></code></a> has the same API as
+<a class="reference internal" href="generated/pyarrow.RecordBatchStreamWriter.html#pyarrow.RecordBatchStreamWriter" title="pyarrow.RecordBatchStreamWriter"><code class="xref py py-class docutils literal"><span class="pre">RecordBatchStreamWriter</span></code></a>:</p>
 <div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [17]: </span><span class="n">sink</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">BufferOutputStream</span><span class="p">()</span>
-<span class="gt">---------------------------------------------------------------------------</span>
-<span class="ne">NameError</span><span class="g g-Whitespace">                                 </span>Traceback (most recent call last)
-<span class="nn">&lt;ipython-input-17-2d52de97a6bb&gt;</span> in <span class="ni">&lt;module&gt;</span><span class="nt">()</span>
-<span class="ne">----&gt; </span><span class="mi">1</span> <span class="n">sink</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">BufferOutputStream</span><span class="p">()</span>
-
-<span class="ne">NameError</span>: name &#39;pa&#39; is not defined
 
 <span class="gp">In [18]: </span><span class="n">writer</span> <span class="o">=</span> <span class="n">pa</span><span class="o">.</span><span class="n">RecordBatchFileWriter</span><span class="p">(</span><span class="n">sink</span><span class="p">,</span> <span class="n">batch</span><span class="o">.</span><span class="n">schema</span><span class="p">)</span>
-<span class="go">---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-18-94b770282636&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 writer = pa.RecordBatchFileWriter(sink, batch.schema)</span>
-
-<span class="go">NameError: name &#39;pa&#39; is not defined</span>
 
 <span class="gp">In [19]: </span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
 <span class="gp">   ....: </span>   <span class="n">writer</span><span class="o">.</span><span class="n">write_batch</span><span class="p">(</span><span class="n">batch</span><span class="p">)</span>
 <span class="gp">   ....: </span>
-<span class="go">
 
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-19-2c620e9e000a&gt; in &lt;module&gt;()</span>
-<span class="go">      1 for i in range(10):</span>
-<span class="go">----&gt; 2    writer.write_batch(batch)</span>
-<span class="go">      3 </span>
-
-<span class="go">NameError: name &#39;writer&#39; is not defined</span>
 
 <span class="gp">In [20]: </span><span class="n">writer</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
-<span class="go">
 
 
 
 
 
 ---------------------------------------------------------------------------</span>
-<span class="go">NameError                                 Traceback (most recent call last)</span>
-<span class="go">&lt;ipython-input-20-5f6d4868f1d2&gt; in &lt;module&gt;()</span>
-<span class="go">----&gt; 1 writer.close()</span>
-
-<span class="go">NameError: name &#39;writer&#39; is not defined</span>
 
 <span class="gp">In [21]: </span><span class="n">buf</span> <span class="o">=</span> <span class="n">sink</span><span class="o">.</span><span class="n">get_result</span><span class="p">()</span>
-<span class="go">
 
 
 
 

<TRUNCATED>