Posted to commits@arrow.apache.org by uw...@apache.org on 2018/12/23 16:31:24 UTC

[17/51] [partial] arrow-site git commit: Upload nightly docs

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/62ef7145/docs/latest/python/benchmarks.html
----------------------------------------------------------------------
diff --git a/docs/latest/python/benchmarks.html b/docs/latest/python/benchmarks.html
new file mode 100644
index 0000000..b64d97c
--- /dev/null
+++ b/docs/latest/python/benchmarks.html
@@ -0,0 +1,247 @@
+
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  <title>Benchmarks &mdash; Apache Arrow v0.11.1.dev413+g23dfc1c5</title>
+  
+
+  
+  
+  
+  
+
+  
+
+  
+  
+    
+
+  
+
+  <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
+    <link rel="index" title="Index" href="../genindex.html" />
+    <link rel="search" title="Search" href="../search.html" /> 
+
+  
+  <script src="../_static/js/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav">
+
+   
+  <div class="wy-grid-for-nav">
+
+    
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search">
+          
+
+          
+            <a href="../index.html" class="icon icon-home"> Apache Arrow
+          
+
+          
+          </a>
+
+          
+            
+            
+              <div class="version">
+                0.11.1.dev413+g23dfc1c5
+              </div>
+            
+          
+
+          
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+
+          
+        </div>
+
+        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+          
+            
+            
+              
+            
+            
+              <p class="caption"><span class="caption-text">Memory Format</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../format/README.html">Arrow specification documents</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/Guidelines.html">Implementation guidelines</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/Layout.html">Physical memory layout</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/Metadata.html">Metadata: Logical types, schemas, data headers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/IPC.html">Interprocess messaging / communication (IPC)</a></li>
+</ul>
+<p class="caption"><span class="caption-text">Languages</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../cpp/index.html">C++ Implementation</a></li>
+<li class="toctree-l1"><a class="reference internal" href="index.html">Python bindings</a></li>
+</ul>
+
+            
+          
+        </div>
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+
+      
+      <nav class="wy-nav-top" aria-label="top navigation">
+        
+          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+          <a href="../index.html">Apache Arrow</a>
+        
+      </nav>
+
+
+      <div class="wy-nav-content">
+        
+        <div class="rst-content">
+        
+          
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<div role="navigation" aria-label="breadcrumbs navigation">
+
+  <ul class="wy-breadcrumbs">
+    
+      <li><a href="../index.html">Docs</a> &raquo;</li>
+        
+      <li>Benchmarks</li>
+    
+    
+      <li class="wy-breadcrumbs-aside">
+        
+            
+            <a href="../_sources/python/benchmarks.rst.txt" rel="nofollow"> View page source</a>
+          
+        
+      </li>
+    
+  </ul>
+
+  
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+            
+  <div class="section" id="benchmarks">
+<h1>Benchmarks<a class="headerlink" href="#benchmarks" title="Permalink to this headline">¶</a></h1>
+<p>The <code class="docutils literal notranslate"><span class="pre">pyarrow</span></code> package comes with a suite of benchmarks meant to
+run with <code class="docutils literal notranslate"><span class="pre">asv</span></code>, which is also their only requirement.  You’ll need to install the <code class="docutils literal notranslate"><span class="pre">asv</span></code> package first
+(<code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">asv</span></code> or <code class="docutils literal notranslate"><span class="pre">conda</span> <span class="pre">install</span> <span class="pre">-c</span> <span class="pre">conda-forge</span> <span class="pre">asv</span></code>).</p>
+<div class="section" id="running-the-benchmarks">
+<h2>Running the benchmarks<a class="headerlink" href="#running-the-benchmarks" title="Permalink to this headline">¶</a></h2>
+<p>To run the benchmarks, call <code class="docutils literal notranslate"><span class="pre">asv</span> <span class="pre">run</span> <span class="pre">--python=same</span></code>. You cannot use the
+plain <code class="docutils literal notranslate"><span class="pre">asv</span> <span class="pre">run</span></code> command at the moment as asv cannot handle Python packages
+in subdirectories of a repository.</p>
+</div>
+<div class="section" id="running-with-arbitrary-revisions">
+<h2>Running with arbitrary revisions<a class="headerlink" href="#running-with-arbitrary-revisions" title="Permalink to this headline">¶</a></h2>
+<p>ASV can store results and generate graphs of the benchmarks over
+the project’s evolution.  For this you need the latest development version of ASV:</p>
+<div class="code highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">git</span><span class="o">+</span><span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">github</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">airspeed</span><span class="o">-</span><span class="n">velocity</span><span class="o">/</span><span class="n">asv</span>
+</pre></div>
+</div>
+<p>Now you should be ready to run <code class="docutils literal notranslate"><span class="pre">asv</span> <span class="pre">run</span></code> or whatever other command
+suits your needs.</p>
+</div>
+<div class="section" id="compatibility">
+<h2>Compatibility<a class="headerlink" href="#compatibility" title="Permalink to this headline">¶</a></h2>
+<p>We only expect the benchmarking setup to work with Python 3.6 or later,
+on a Unix-like system.</p>
+</div>
+</div>
+
+
+           </div>
+           
+          </div>
+          <footer>
+  
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2016-2018 Apache Software Foundation
+
+    </p>
+  </div>
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 
+
+</footer>
+
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+  
+
+
+  
+
+    
+    
+      <script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
+        <script type="text/javascript" src="../_static/jquery.js"></script>
+        <script type="text/javascript" src="../_static/underscore.js"></script>
+        <script type="text/javascript" src="../_static/doctools.js"></script>
+    
+
+  
+
+  <script type="text/javascript" src="../_static/js/theme.js"></script>
+
+  <script type="text/javascript">
+      jQuery(function () {
+          SphinxRtdTheme.Navigation.enable(true);
+      });
+  </script>
+<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments);}
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+
+</body>
+</html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/62ef7145/docs/latest/python/csv.html
----------------------------------------------------------------------
diff --git a/docs/latest/python/csv.html b/docs/latest/python/csv.html
new file mode 100644
index 0000000..38a84c5
--- /dev/null
+++ b/docs/latest/python/csv.html
@@ -0,0 +1,324 @@
+
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  <title>Reading CSV files &mdash; Apache Arrow v0.11.1.dev473+g6ed02454</title>
+  
+
+  
+  
+  
+  
+
+  
+
+  
+  
+    
+
+  
+
+  <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
+    <link rel="index" title="Index" href="../genindex.html" />
+    <link rel="search" title="Search" href="../search.html" />
+    <link rel="next" title="Reading and Writing the Apache Parquet Format" href="parquet.html" />
+    <link rel="prev" title="Pandas Integration" href="pandas.html" /> 
+
+  
+  <script src="../_static/js/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav">
+
+   
+  <div class="wy-grid-for-nav">
+
+    
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search">
+          
+
+          
+            <a href="../index.html" class="icon icon-home"> Apache Arrow
+          
+
+          
+          </a>
+
+          
+            
+            
+              <div class="version">
+                0.11.1.dev473+g6ed02454
+              </div>
+            
+          
+
+          
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+
+          
+        </div>
+
+        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+          
+            
+            
+              
+            
+            
+              <p class="caption"><span class="caption-text">Memory Format</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../format/README.html">Arrow specification documents</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/Guidelines.html">Implementation guidelines</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/Layout.html">Physical memory layout</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/Metadata.html">Metadata: Logical types, schemas, data headers</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../format/IPC.html">Interprocess messaging / communication (IPC)</a></li>
+</ul>
+<p class="caption"><span class="caption-text">Languages</span></p>
+<ul class="current">
+<li class="toctree-l1"><a class="reference internal" href="../cpp/index.html">C++ Implementation</a></li>
+<li class="toctree-l1 current"><a class="reference internal" href="index.html">Python bindings</a><ul class="current">
+<li class="toctree-l2"><a class="reference internal" href="install.html">Installing PyArrow</a></li>
+<li class="toctree-l2"><a class="reference internal" href="memory.html">Memory and IO Interfaces</a></li>
+<li class="toctree-l2"><a class="reference internal" href="data.html">Data Types and In-Memory Data Model</a></li>
+<li class="toctree-l2"><a class="reference internal" href="ipc.html">Streaming, Serialization, and IPC</a></li>
+<li class="toctree-l2"><a class="reference internal" href="filesystems.html">File System Interfaces</a></li>
+<li class="toctree-l2"><a class="reference internal" href="plasma.html">The Plasma In-Memory Object Store</a></li>
+<li class="toctree-l2"><a class="reference internal" href="numpy.html">NumPy Integration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="pandas.html">Pandas Integration</a></li>
+<li class="toctree-l2 current"><a class="current reference internal" href="#">Reading CSV files</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="#usage">Usage</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#customized-parsing">Customized parsing</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#customized-conversion">Customized conversion</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#performance">Performance</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="parquet.html">Reading and Writing the Apache Parquet Format</a></li>
+<li class="toctree-l2"><a class="reference internal" href="extending.html">Using pyarrow from C++ and Cython Code</a></li>
+<li class="toctree-l2"><a class="reference internal" href="api.html">API Reference</a></li>
+<li class="toctree-l2"><a class="reference internal" href="development.html">Development</a></li>
+<li class="toctree-l2"><a class="reference internal" href="getting_involved.html">Getting Involved</a></li>
+</ul>
+</li>
+</ul>
+
+            
+          
+        </div>
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+
+      
+      <nav class="wy-nav-top" aria-label="top navigation">
+        
+          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+          <a href="../index.html">Apache Arrow</a>
+        
+      </nav>
+
+
+      <div class="wy-nav-content">
+        
+        <div class="rst-content">
+        
+          
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<div role="navigation" aria-label="breadcrumbs navigation">
+
+  <ul class="wy-breadcrumbs">
+    
+      <li><a href="../index.html">Docs</a> &raquo;</li>
+        
+          <li><a href="index.html">Python bindings</a> &raquo;</li>
+        
+      <li>Reading CSV files</li>
+    
+    
+      <li class="wy-breadcrumbs-aside">
+        
+            
+            <a href="../_sources/python/csv.rst.txt" rel="nofollow"> View page source</a>
+          
+        
+      </li>
+    
+  </ul>
+
+  
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+            
+  <div class="section" id="reading-csv-files">
+<span id="csv"></span><h1>Reading CSV files<a class="headerlink" href="#reading-csv-files" title="Permalink to this headline">¶</a></h1>
+<p>Arrow provides preliminary support for reading data from CSV files.
+The features currently offered are the following:</p>
+<ul class="simple">
+<li>multi-threaded or single-threaded reading</li>
+<li>automatic decompression of input files (based on the filename extension,
+such as <code class="docutils literal notranslate"><span class="pre">my_data.csv.gz</span></code>)</li>
+<li>fetching column names from the first row in the CSV file</li>
+<li>column-wise type inference and conversion to one of <code class="docutils literal notranslate"><span class="pre">null</span></code>, <code class="docutils literal notranslate"><span class="pre">int64</span></code>,
+<code class="docutils literal notranslate"><span class="pre">float64</span></code>, <code class="docutils literal notranslate"><span class="pre">timestamp[s]</span></code>, <code class="docutils literal notranslate"><span class="pre">string</span></code> or <code class="docutils literal notranslate"><span class="pre">binary</span></code> data</li>
+<li>detecting various spellings of null values such as <code class="docutils literal notranslate"><span class="pre">NaN</span></code> or <code class="docutils literal notranslate"><span class="pre">#N/A</span></code></li>
+</ul>
+<div class="section" id="usage">
+<h2>Usage<a class="headerlink" href="#usage" title="Permalink to this headline">¶</a></h2>
+<p>CSV reading functionality is available through the <code class="xref py py-mod docutils literal notranslate"><span class="pre">pyarrow.csv</span></code> module.
+In many cases, you will simply call the <a class="reference internal" href="generated/pyarrow.csv.read_csv.html#pyarrow.csv.read_csv" title="pyarrow.csv.read_csv"><code class="xref py py-func docutils literal notranslate"><span class="pre">read_csv()</span></code></a> function
+with the file path you want to read from:</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pyarrow</span> <span class="k">import</span> <span class="n">csv</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="n">fn</span> <span class="o">=</span> <span class="s1">&#39;tips.csv.gz&#39;</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="n">table</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">fn</span><span class="p">)</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="n">table</span>
+<span class="go">pyarrow.Table</span>
+<span class="go">total_bill: double</span>
+<span class="go">tip: double</span>
+<span class="go">sex: string</span>
+<span class="go">smoker: string</span>
+<span class="go">day: string</span>
+<span class="go">time: string</span>
+<span class="go">size: int64</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="nb">len</span><span class="p">(</span><span class="n">table</span><span class="p">)</span>
+<span class="go">244</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="n">df</span> <span class="o">=</span> <span class="n">table</span><span class="o">.</span><span class="n">to_pandas</span><span class="p">()</span>
+<span class="gp">&gt;&gt;&gt; </span><span class="n">df</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
+<span class="go">   total_bill   tip     sex smoker  day    time  size</span>
+<span class="go">0       16.99  1.01  Female     No  Sun  Dinner     2</span>
+<span class="go">1       10.34  1.66    Male     No  Sun  Dinner     3</span>
+<span class="go">2       21.01  3.50    Male     No  Sun  Dinner     3</span>
+<span class="go">3       23.68  3.31    Male     No  Sun  Dinner     2</span>
+<span class="go">4       24.59  3.61  Female     No  Sun  Dinner     4</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="customized-parsing">
+<h2>Customized parsing<a class="headerlink" href="#customized-parsing" title="Permalink to this headline">¶</a></h2>
+<p>To alter the default parsing settings when reading CSV files with an
+unusual structure, create a <a class="reference internal" href="generated/pyarrow.csv.ParseOptions.html#pyarrow.csv.ParseOptions" title="pyarrow.csv.ParseOptions"><code class="xref py py-class docutils literal notranslate"><span class="pre">ParseOptions</span></code></a> instance
+and pass it to <a class="reference internal" href="generated/pyarrow.csv.read_csv.html#pyarrow.csv.read_csv" title="pyarrow.csv.read_csv"><code class="xref py py-func docutils literal notranslate"><span class="pre">read_csv()</span></code></a>.</p>
+</div>
+<div class="section" id="customized-conversion">
+<h2>Customized conversion<a class="headerlink" href="#customized-conversion" title="Permalink to this headline">¶</a></h2>
+<p>To alter how CSV data is converted to Arrow types and data, you should create
+a <a class="reference internal" href="generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions" title="pyarrow.csv.ConvertOptions"><code class="xref py py-class docutils literal notranslate"><span class="pre">ConvertOptions</span></code></a> instance and pass it to <a class="reference internal" href="generated/pyarrow.csv.read_csv.html#pyarrow.csv.read_csv" title="pyarrow.csv.read_csv"><code class="xref py py-func docutils literal notranslate"><span class="pre">read_csv()</span></code></a>.</p>
+</div>
+<div class="section" id="performance">
+<h2>Performance<a class="headerlink" href="#performance" title="Permalink to this headline">¶</a></h2>
+<p>Due to the structure of CSV files, one cannot expect the same levels of
+performance as when reading dedicated binary formats like
+<a class="reference internal" href="parquet.html#parquet"><span class="std std-ref">Parquet</span></a>.  Nevertheless, Arrow strives to reduce the
+overhead of reading CSV files.</p>
+<p>Performance options can be controlled through the <a class="reference internal" href="generated/pyarrow.csv.ReadOptions.html#pyarrow.csv.ReadOptions" title="pyarrow.csv.ReadOptions"><code class="xref py py-class docutils literal notranslate"><span class="pre">ReadOptions</span></code></a> class.
+Multi-threaded reading is the default for highest performance, distributing
+the workload efficiently over all available cores.</p>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">The number of threads to use concurrently is automatically inferred by Arrow
+and can be inspected using the <a class="reference internal" href="generated/pyarrow.cpu_count.html#pyarrow.cpu_count" title="pyarrow.cpu_count"><code class="xref py py-func docutils literal notranslate"><span class="pre">cpu_count()</span></code></a> function.</p>
+</div>
+</div>
+</div>
+
+
+           </div>
+           
+          </div>
+          <footer>
+  
+    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
+      
+        <a href="parquet.html" class="btn btn-neutral float-right" title="Reading and Writing the Apache Parquet Format" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
+      
+      
+        <a href="pandas.html" class="btn btn-neutral" title="Pandas Integration" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
+      
+    </div>
+  
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2016-2018 Apache Software Foundation
+
+    </p>
+  </div>
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 
+
+</footer>
+
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+  
+
+
+  
+
+    
+    
+      <script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
+        <script type="text/javascript" src="../_static/jquery.js"></script>
+        <script type="text/javascript" src="../_static/underscore.js"></script>
+        <script type="text/javascript" src="../_static/doctools.js"></script>
+    
+
+  
+
+  <script type="text/javascript" src="../_static/js/theme.js"></script>
+
+  <script type="text/javascript">
+      jQuery(function () {
+          SphinxRtdTheme.Navigation.enable(true);
+      });
+  </script>
+<script async src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1"></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments);}
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+
+</body>
+</html>
\ No newline at end of file