Posted to user@spark.apache.org by Dan Bikle <bi...@gmail.com> on 2016/09/25 08:27:25 UTC
How to use Spark-Scala to download a CSV file from the web?
Hello spark-world,
How can I use Spark and Scala to download a CSV file from the web and load the
file into a spark-csv DataFrame?
Currently I depend on curl in a shell command to get my CSV file.
Here is the syntax I want to enhance:
/* fb_csv.scala
This script should load FB prices from Yahoo.

Demo:
spark-shell -i fb_csv.scala
*/

// I should get prices:
import sys.process._
"/usr/bin/curl -o /tmp/fb.csv http://ichart.finance.yahoo.com/table.csv?s=FB"!

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

val fb_df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("/tmp/fb.csv")

fb_df.head(9)
I want to enhance the above script so it is pure Scala with no shell syntax
inside.
Re: How to use Spark-Scala to download a CSV file from the web?
Posted by Marco Mistroni <mm...@gmail.com>.
Hi,
I'm not sure spark-csv supports loading data from an http:// URL the way
you want. I just tried it and got an exception:
scala> val df = sqlContext.read.
| format("com.databricks.spark.csv").
| option("inferSchema", "true").
| load("http://ichart.finance.yahoo.com/table.csv?s=FB")
16/09/25 10:08:09 WARN : Your hostname, MarcoLaptop resolves to a
loopback/non-reachable address: fe80:0:0:0:3c1f:e7b4:c7cc:d2bd%wlan3, but
we couldn't find any external IP address!
java.io.IOException: No FileSystem for scheme: http
But it does support reading from a CSV file, so you could write a Spark
program that:
1. downloads your FB data from Yahoo (I have code that does exactly
what you are doing, and I am using the com.github.tototoshi.csv package for
downloading CSV data from the web)
2. creates an RDD (or a DataFrame) out of it
3. does whatever processing you need
hth
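The steps above can be sketched in plain Scala using only the JDK, with no shell call. This is a minimal sketch, not code from this thread: the helper name downloadTo is hypothetical, and the Spark lines are left as comments because they assume a spark-shell session where sc is already defined and the spark-csv package is on the classpath.

```scala
import java.net.URL
import java.nio.file.{Files, Paths, StandardCopyOption}

// Hypothetical helper (not from the thread): copy the bytes behind `url`
// to the local path `dest` and return the number of bytes written.
def downloadTo(url: String, dest: String): Long = {
  val in = new URL(url).openStream()
  try Files.copy(in, Paths.get(dest), StandardCopyOption.REPLACE_EXISTING)
  finally in.close()
}

// Assumed usage inside spark-shell (step 1, then step 2):
//   downloadTo("http://ichart.finance.yahoo.com/table.csv?s=FB", "/tmp/fb.csv")
//   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//   val fb_df = sqlContext.read.format("com.databricks.spark.csv")
//     .option("header", "true").option("inferSchema", "true").load("/tmp/fb.csv")
```

Writing to a local file first keeps the load path identical to the curl version, so the "No FileSystem for scheme: http" error does not come up. On a multi-node cluster the file would presumably need to be on storage every executor can read, which this sketch does not handle.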
Re: How to use Spark-Scala to download a CSV file from the web?
Posted by Jörn Franke <jo...@gmail.com>.
Use a tool like Flume and/or Oozie to reliably download files over HTTP and handle errors (e.g. request timeouts). Afterwards, process the data with Spark.
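If a dedicated tool is more than a small script needs, the kind of error handling mentioned above (timeouts, failed requests) can be approximated in plain Scala. The following is only a sketch under assumptions: fetchOnce and fetchWithRetry are hypothetical names, the retry count and timeout defaults are arbitrary, and it is not a substitute for what Flume or Oozie provide.

```scala
import java.io.IOException
import java.net.{HttpURLConnection, URL}
import java.nio.file.{Files, Paths, StandardCopyOption}
import scala.util.{Failure, Success, Try}

// Hypothetical helper: a single download attempt with connect and read timeouts.
def fetchOnce(url: String, dest: String, timeoutMs: Int): Long = {
  val conn = new URL(url).openConnection()
  conn.setConnectTimeout(timeoutMs)
  conn.setReadTimeout(timeoutMs)
  conn match {
    // Treat any non-200 HTTP status as a failure.
    case http: HttpURLConnection if http.getResponseCode != HttpURLConnection.HTTP_OK =>
      throw new IOException(s"HTTP ${http.getResponseCode} for $url")
    case _ => // a 200 response, or a non-HTTP URL such as file:
  }
  val in = conn.getInputStream
  try Files.copy(in, Paths.get(dest), StandardCopyOption.REPLACE_EXISTING)
  finally in.close()
}

// Hypothetical helper: try up to `attempts` times, rethrowing the last failure.
def fetchWithRetry(url: String, dest: String,
                   attempts: Int = 3, timeoutMs: Int = 10000): Long =
  Try(fetchOnce(url, dest, timeoutMs)) match {
    case Success(bytes)             => bytes
    case Failure(_) if attempts > 1 => fetchWithRetry(url, dest, attempts - 1, timeoutMs)
    case Failure(e)                 => throw e
  }
```

A call such as fetchWithRetry("http://ichart.finance.yahoo.com/table.csv?s=FB", "/tmp/fb.csv") would then replace the curl line. Note that a failed attempt may leave a partial file at the destination, so a more careful version would download to a temporary path and move it into place on success.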
> On 25 Sep 2016, at 10:27, Dan Bikle <bi...@gmail.com> wrote:
>
> hello spark-world,
>
> How to use Spark-Scala to download a CSV file from the web and load the file into a spark-csv DataFrame?
>
> Currently I depend on curl in a shell command to get my CSV file.
>
> Here is the syntax I want to enhance:
>
> /* fb_csv.scala
> This script should load FB prices from Yahoo.
>
> Demo:
> spark-shell -i fb_csv.scala
> */
>
> // I should get prices:
> import sys.process._
> "/usr/bin/curl -o /tmp/fb.csv http://ichart.finance.yahoo.com/table.csv?s=FB"!
>
> import org.apache.spark.sql.SQLContext
>
> val sqlContext = new SQLContext(sc)
>
> val fb_df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("/tmp/fb.csv")
>
> fb_df.head(9)
>
> I want to enhance the above script so it is pure Scala with no shell syntax inside.
>