Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/24 16:21:05 UTC
[GitHub] [arrow-datafusion] DataPsycho opened a new pull request, #4360: Adding more dataframe example to read csv files
DataPsycho opened a new pull request, #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360
# Which issue does this PR close?
Not related to an existing issue; this PR improves the documentation.
Closes #.
# Rationale for this change
This PR does not propose any change in behavior; it only adds examples.
# What changes are included in this PR?
Changes to the DataFrame example file.
# Are these changes tested?
The current implementation does not have any tests to run.
# Are there any user-facing changes?
No
No breaking changes to the public API.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#issuecomment-1330944183
Thanks again @DataPsycho !
[GitHub] [arrow-datafusion] alamb merged pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
alamb merged PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360
[GitHub] [arrow-datafusion] DataPsycho commented on a diff in pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
DataPsycho commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1032785785
##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,47 @@ async fn main() -> Result<()> {
Ok(())
}
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = r#"id,time,vote,unixtime,rating
+ a1,\"10 6, 2013\",3,1381017600,5.0
+ a2,\"08 9, 2013\",2,1376006400,4.5"#;
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
+ // Create a session context
+ let ctx = SessionContext::new();
+ // Register a lazy DataFrame using the context
+ let df = ctx.read_csv(path, CsvReadOptions::default()).await.unwrap();
+ df
+}
+
+// Example to read csv file with a given csv file
+async fn example_read_csv_file_with_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = r#"id,time,vote,unixtime,rating
+ a1,\"10 6, 2013\",3,1381017600,5.0
+ a2,\"08 9, 2013\",2,1376006400,4.5"#;
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
+ // Create a session context
+ let ctx = SessionContext::new();
+ // Define the schema
+ let schema = Schema::new(vec![
+ Field::new("id", DataType::Utf8, false),
+ Field::new("time", DataType::Utf8, false),
+ Field::new("vote", DataType::Int32, true),
+ Field::new("unixtime", DataType::Int64, false),
+ Field::new("rating", DataType::Float32, true),
+ ]);
+ // Create a csv option provider
+ let mut csv_read_option = CsvReadOptions::default();
+ // Update the option provider with the defined schema
+ csv_read_option.schema = Some(&schema);
Review Comment:
Committed the suggested changes.
[GitHub] [arrow-datafusion] DataPsycho commented on a diff in pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
DataPsycho commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1033684577
##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,66 @@ async fn main() -> Result<()> {
Ok(())
}
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = r#"id,time,vote,unixtime,rating
+ a1,\"10 6, 2013\",3,1381017600,5.0
+ a2,\"08 9, 2013\",2,1376006400,4.5"#;
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
Review Comment:
Sure thing. This is my first commit, so I just wanted to add a few things. I have many more examples to add; when I do, I will update this and create a separate function for each type of data file, such as Excel, JSON, etc.
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1032785312
##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,47 @@ async fn main() -> Result<()> {
Ok(())
}
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = r#"id,time,vote,unixtime,rating
+ a1,\"10 6, 2013\",3,1381017600,5.0
+ a2,\"08 9, 2013\",2,1376006400,4.5"#;
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
+ // Create a session context
+ let ctx = SessionContext::new();
+ // Register a lazy DataFrame using the context
+ let df = ctx.read_csv(path, CsvReadOptions::default()).await.unwrap();
+ df
+}
+
+// Example to read csv file with a given csv file
+async fn example_read_csv_file_with_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = r#"id,time,vote,unixtime,rating
+ a1,\"10 6, 2013\",3,1381017600,5.0
+ a2,\"08 9, 2013\",2,1376006400,4.5"#;
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
+ // Create a session context
+ let ctx = SessionContext::new();
+ // Define the schema
+ let schema = Schema::new(vec![
+ Field::new("id", DataType::Utf8, false),
+ Field::new("time", DataType::Utf8, false),
+ Field::new("vote", DataType::Int32, true),
+ Field::new("unixtime", DataType::Int64, false),
+ Field::new("rating", DataType::Float32, true),
+ ]);
+ // Create a csv option provider
+ let mut csv_read_option = CsvReadOptions::default();
+ // Update the option provider with the defined schema
+ csv_read_option.schema = Some(&schema);
Review Comment:
If you wanted to use slightly more idiomatic Rust syntax (and avoid `mut`), you could do:
```suggestion
// Create a csv option provider with the desired schema
let csv_read_option = CsvReadOptions {
    // Update the option provider with the defined schema
    schema: Some(&schema),
    ..Default::default()
};
```
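A minimal, self-contained sketch of the struct update syntax this comment refers to. `ReadOptions`, its fields, and `options_with_schema` are hypothetical stand-ins chosen for illustration; they are not DataFusion's `CsvReadOptions`, whose actual fields may differ.

```rust
// Hypothetical stand-in for an options struct such as CsvReadOptions.
// The field names are illustrative assumptions, not DataFusion's definition.
#[derive(Debug, Clone, PartialEq)]
pub struct ReadOptions<'a> {
    pub has_header: bool,
    pub delimiter: u8,
    pub schema: Option<&'a str>,
}

impl<'a> Default for ReadOptions<'a> {
    fn default() -> Self {
        Self {
            has_header: true,
            delimiter: b',',
            schema: None,
        }
    }
}

/// Builds options with only `schema` overridden; no `mut` binding is needed.
pub fn options_with_schema(schema: &str) -> ReadOptions<'_> {
    ReadOptions {
        schema: Some(schema),
        // Every remaining field is taken from the Default implementation.
        ..Default::default()
    }
}
```

Called as `options_with_schema("id,time")`, this yields a value whose `schema` is set and whose other fields keep their defaults.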
[GitHub] [arrow-datafusion] DataPsycho commented on a diff in pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
DataPsycho commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1032640050
##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,37 @@ async fn main() -> Result<()> {
Ok(())
}
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
+ // Create a session context and create a lazy
Review Comment:
Updated as suggested, and completed the unfinished comment.
[GitHub] [arrow-datafusion] ursabot commented on pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#issuecomment-1330956784
Benchmark runs are scheduled for baseline = e4d790d495d65a23e0a7dc2994786c59bfe5d66c and contender = fa4bea871086db70a8d19820a2f266de826836e1. fa4bea871086db70a8d19820a2f266de826836e1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/d3e9629c9fa44ba69f60a8fdd6f85b35...7298ca8b049542f5b78d4b293cdc350f/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/dfe16e9a1ae54580ac5c202ca5cf2883...d4815997bf3542af8785fe0e57335647/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c6750a3ae80f4a4faa178aecfec06fe3...f49c2361daed4f918f99f2061b20e298/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f9bb16508ad24dc9be63098cee15cf50...bfdd8e16a101423b8632c90b9cdce3af/)
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow-datafusion] DataPsycho commented on a diff in pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
DataPsycho commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1033722661
##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,66 @@ async fn main() -> Result<()> {
Ok(())
}
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = r#"id,time,vote,unixtime,rating
+ a1,\"10 6, 2013\",3,1381017600,5.0
+ a2,\"08 9, 2013\",2,1376006400,4.5"#;
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
Review Comment:
I have refactored it according to your suggestion. Thanks
[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1033661515
##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,66 @@ async fn main() -> Result<()> {
Ok(())
}
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = r#"id,time,vote,unixtime,rating
+ a1,\"10 6, 2013\",3,1381017600,5.0
+ a2,\"08 9, 2013\",2,1376006400,4.5"#;
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
Review Comment:
minor nit: this code is repeated a few times and could potentially be moved to its own function
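One way the repeated setup could be factored out, sketched under assumptions: `write_example_csv` is a hypothetical helper name, not something in the PR, and it returns `io::Result` rather than panicking so that callers can decide how to handle a failed write.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: writes the sample CSV used by the examples to `path`.
pub fn write_example_csv(path: &Path) -> io::Result<()> {
    // Header row followed by two records. The quoted `time` field contains
    // a comma, so the quotes must be present in the file itself.
    let content = "id,time,vote,unixtime,rating\n\
                   a1,\"10 6, 2013\",3,1381017600,5.0\n\
                   a2,\"08 9, 2013\",2,1376006400,4.5";
    fs::write(path, content)
}
```

Each example function could then call the helper once with its path instead of repeating the literal and the `fs::write` call.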
[GitHub] [arrow-datafusion] martin-g commented on a diff in pull request #4360: Adding more dataframe example to read csv files
Posted by GitBox <gi...@apache.org>.
martin-g commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1032439566
##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,37 @@ async fn main() -> Result<()> {
Ok(())
}
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
+ // Create a session context and create a lazy
Review Comment:
`... create a lazy ???`
The sentence seems unfinished.
##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,37 @@ async fn main() -> Result<()> {
Ok(())
}
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";
Review Comment:
```suggestion
let content = r#"id,time,vote,unixtime,rating
a1,\"10 6, 2013\",3,1381017600,5.0
a2,\"08 9, 2013\",2,1376006400,4.5"#;
```
##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,37 @@ async fn main() -> Result<()> {
Ok(())
}
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";
+ // write the data
+ fs::write(path, content).expect("Problem with writing file!");
+ // Create a session context and create a lazy
+ let ctx = SessionContext::new();
+ let df = ctx.read_csv(path, CsvReadOptions::default()).await.unwrap();
+ df
+}
+
+// Example to read csv file with a given csv file
+async fn example_read_csv_file_with_schema() -> Arc<DataFrame> {
+ let path = "example.csv";
+ // Create the data to put into the csv file with headers
+ let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";
Review Comment:
```suggestion
let content = r#"id,time,vote,unixtime,rating
a1,\"10 6, 2013\",3,1381017600,5.0
a2,\"08 9, 2013\",2,1376006400,4.5"#;
```
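One detail worth checking with this change: Rust raw string literals do not process escape sequences, so `\"` inside `r#"..."#` is kept as a backslash followed by a quote. The sketch below compares the forms; the function names are hypothetical and only a two-column subset of the sample data is used.

```rust
/// Regular string literal: `\"` is an escape for a quote and `\n` for a newline.
pub fn escaped_csv() -> &'static str {
    "id,time\na1,\"10 6, 2013\""
}

/// Raw string literal: no escapes are processed, so the quotes are written
/// bare and the line break is typed literally.
pub fn raw_csv() -> &'static str {
    r#"id,time
a1,"10 6, 2013""#
}

/// Raw string that keeps the backslashes: they become part of the data.
pub fn raw_csv_with_backslashes() -> &'static str {
    r#"id,time
a1,\"10 6, 2013\""#
}
```

The first two functions return identical text, while the third contains literal backslashes that a CSV reader would see as part of the field.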