Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/24 16:21:05 UTC

[GitHub] [arrow-datafusion] DataPsycho opened a new pull request, #4360: Adding more dataframe example to read csv files

DataPsycho opened a new pull request, #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360

   # Which issue does this PR close?
   
   Not tied to an existing issue; this PR improves the documentation.
   
   Closes #.
   
   # Rationale for this change
   
   No functional change is proposed; this PR only adds examples.
   
   # What changes are included in this PR?
   
   Changes to the DataFrame example file (`datafusion-examples/examples/dataframe.rs`).
   
   # Are these changes tested?
   
   No. The current example has no tests to run.
   
   # Are there any user-facing changes?
   
   No
   
   No Breaking change in public API.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

alamb commented on PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#issuecomment-1330944183

   Thanks again @DataPsycho !



[GitHub] [arrow-datafusion] alamb merged pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

alamb merged PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360



[GitHub] [arrow-datafusion] DataPsycho commented on a diff in pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

DataPsycho commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1032785785


##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,47 @@ async fn main() -> Result<()> {
 
     Ok(())
 }
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = r#"id,time,vote,unixtime,rating
+    a1,\"10 6, 2013\",3,1381017600,5.0
+    a2,\"08 9, 2013\",2,1376006400,4.5"#;
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");
+    // Create a session context
+    let ctx = SessionContext::new();
+    // Register a lazy DataFrame using the context
+    let df = ctx.read_csv(path, CsvReadOptions::default()).await.unwrap();
+    df
+}
+
+// Example to read csv file with a given csv file
+async fn example_read_csv_file_with_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = r#"id,time,vote,unixtime,rating
+    a1,\"10 6, 2013\",3,1381017600,5.0
+    a2,\"08 9, 2013\",2,1376006400,4.5"#;
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");
+    // Create a session context
+    let ctx = SessionContext::new();
+    // Define the schema
+    let schema = Schema::new(vec![
+        Field::new("id", DataType::Utf8, false),
+        Field::new("time", DataType::Utf8, false),
+        Field::new("vote", DataType::Int32, true),
+        Field::new("unixtime", DataType::Int64, false),
+        Field::new("rating", DataType::Float32, true),
+    ]);
+    // Create a csv option provider
+    let mut csv_read_option = CsvReadOptions::default();
+    // Update the option provider with the defined schema
+    csv_read_option.schema = Some(&schema);

Review Comment:
   Committed the suggested changes.




[GitHub] [arrow-datafusion] DataPsycho commented on a diff in pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

DataPsycho commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1033684577


##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,66 @@ async fn main() -> Result<()> {
 
     Ok(())
 }
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = r#"id,time,vote,unixtime,rating
+    a1,\"10 6, 2013\",3,1381017600,5.0
+    a2,\"08 9, 2013\",2,1376006400,4.5"#;
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");

Review Comment:
   Sure thing. This is my first commit, so I just wanted to add a few examples. I have many more to add; when I do, I will update this and create a separate function for each type of data file, such as Excel, JSON, etc.




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

alamb commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1032785312


##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,47 @@ async fn main() -> Result<()> {
 
     Ok(())
 }
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = r#"id,time,vote,unixtime,rating
+    a1,\"10 6, 2013\",3,1381017600,5.0
+    a2,\"08 9, 2013\",2,1376006400,4.5"#;
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");
+    // Create a session context
+    let ctx = SessionContext::new();
+    // Register a lazy DataFrame using the context
+    let df = ctx.read_csv(path, CsvReadOptions::default()).await.unwrap();
+    df
+}
+
+// Example to read csv file with a given csv file
+async fn example_read_csv_file_with_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = r#"id,time,vote,unixtime,rating
+    a1,\"10 6, 2013\",3,1381017600,5.0
+    a2,\"08 9, 2013\",2,1376006400,4.5"#;
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");
+    // Create a session context
+    let ctx = SessionContext::new();
+    // Define the schema
+    let schema = Schema::new(vec![
+        Field::new("id", DataType::Utf8, false),
+        Field::new("time", DataType::Utf8, false),
+        Field::new("vote", DataType::Int32, true),
+        Field::new("unixtime", DataType::Int64, false),
+        Field::new("rating", DataType::Float32, true),
+    ]);
+    // Create a csv option provider
+    let mut csv_read_option = CsvReadOptions::default();
+    // Update the option provider with the defined schema
+    csv_read_option.schema = Some(&schema);

Review Comment:
   If you want slightly more idiomatic Rust syntax (and to avoid `mut`), you could do:
   
   ```suggestion
       // Create a csv option provider with the desired schema 
       let csv_read_option = CsvReadOptions {
         // Update the option provider with the defined schema
         schema: Some(&schema),
         ..Default::default()
       };
   ```
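
A minimal standalone sketch of the struct update syntax used in the suggestion above. `ReadOptions` and its fields are hypothetical stand-ins defined here so the snippet compiles without DataFusion; this is not the real `CsvReadOptions` definition. It shows one field being set explicitly while `..Default::default()` fills in the rest, so no `mut` binding is needed.

```rust
// Hypothetical stand-in for an options struct such as CsvReadOptions.
// The field names are illustrative only, not the DataFusion API.
#[derive(Debug, Clone, PartialEq)]
struct ReadOptions<'a> {
    has_header: bool,
    delimiter: u8,
    schema: Option<&'a str>,
}

impl<'a> Default for ReadOptions<'a> {
    fn default() -> Self {
        Self {
            has_header: true,
            delimiter: b',',
            schema: None,
        }
    }
}

// Build the options in one expression: set `schema`, take every
// other field from the Default implementation.
fn options_with_schema(schema: &str) -> ReadOptions<'_> {
    ReadOptions {
        schema: Some(schema),
        ..Default::default()
    }
}

fn main() {
    let opts = options_with_schema("id,time,vote");
    println!("{:?}", opts);
}
```

Note that the path is `Default::default()` (trait, then method); `default::Default()` does not resolve.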




[GitHub] [arrow-datafusion] DataPsycho commented on a diff in pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

DataPsycho commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1032640050


##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,37 @@ async fn main() -> Result<()> {
 
     Ok(())
 }
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");
+    // Create a session context and create a lazy

Review Comment:
   Updated as suggested, and completed the unfinished comment.




[GitHub] [arrow-datafusion] ursabot commented on pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

ursabot commented on PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#issuecomment-1330956784

   Benchmark runs are scheduled for baseline = e4d790d495d65a23e0a7dc2994786c59bfe5d66c and contender = fa4bea871086db70a8d19820a2f266de826836e1. fa4bea871086db70a8d19820a2f266de826836e1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/d3e9629c9fa44ba69f60a8fdd6f85b35...7298ca8b049542f5b78d4b293cdc350f/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/dfe16e9a1ae54580ac5c202ca5cf2883...d4815997bf3542af8785fe0e57335647/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c6750a3ae80f4a4faa178aecfec06fe3...f49c2361daed4f918f99f2061b20e298/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f9bb16508ad24dc9be63098cee15cf50...bfdd8e16a101423b8632c90b9cdce3af/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   



[GitHub] [arrow-datafusion] DataPsycho commented on a diff in pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

DataPsycho commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1033722661


##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,66 @@ async fn main() -> Result<()> {
 
     Ok(())
 }
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = r#"id,time,vote,unixtime,rating
+    a1,\"10 6, 2013\",3,1381017600,5.0
+    a2,\"08 9, 2013\",2,1376006400,4.5"#;
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");

Review Comment:
   I have refactored it according to your suggestion. Thanks




[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

andygrove commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1033661515


##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,66 @@ async fn main() -> Result<()> {
 
     Ok(())
 }
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = r#"id,time,vote,unixtime,rating
+    a1,\"10 6, 2013\",3,1381017600,5.0
+    a2,\"08 9, 2013\",2,1376006400,4.5"#;
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");

Review Comment:
   minor nit: this code is repeated a few times and could potentially be moved to its own function
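
One way the repeated setup could be factored out, as a sketch only. The helper name `write_example_csv` and the constant `EXAMPLE_CSV` are hypothetical and are not taken from the PR; the CSV rows mirror the ones in the diff above, written with ordinary escapes. Returning `io::Result` instead of calling `expect` is a design choice made here so that callers decide how to handle a failed write.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Sample rows shared by every example. A trailing `\` at the end of a
// line in a normal string literal skips the newline and the leading
// whitespace of the next line, so the data has no stray indentation.
const EXAMPLE_CSV: &str = "id,time,vote,unixtime,rating\n\
    a1,\"10 6, 2013\",3,1381017600,5.0\n\
    a2,\"08 9, 2013\",2,1376006400,4.5\n";

// Hypothetical helper: write the sample CSV to `path`.
fn write_example_csv(path: &Path) -> io::Result<()> {
    fs::write(path, EXAMPLE_CSV)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("example.csv");
    write_example_csv(&path)?;
    println!("wrote {}", path.display());
    Ok(())
}
```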




[GitHub] [arrow-datafusion] martin-g commented on a diff in pull request #4360: Adding more dataframe example to read csv files

Posted by GitBox <gi...@apache.org>.

martin-g commented on code in PR #4360:
URL: https://github.com/apache/arrow-datafusion/pull/4360#discussion_r1032439566


##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,37 @@ async fn main() -> Result<()> {
 
     Ok(())
 }
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");
+    // Create a session context and create a lazy

Review Comment:
   `... create a lazy ???`
   the sentence seems unfinished



##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,37 @@ async fn main() -> Result<()> {
 
     Ok(())
 }
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";

Review Comment:
   ```suggestion
       let content = r#"id,time,vote,unixtime,rating
       a1,\"10 6, 2013\",3,1381017600,5.0
       a2,\"08 9, 2013\",2,1376006400,4.5"#;
   ```
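
A note on the raw string form, as a small sketch using only the standard library. Inside `r#"..."#` a backslash is not an escape character, so `\"` stays in the data as two characters, and indentation on continuation lines becomes part of the string. The function names below are illustrative and do not come from the PR.

```rust
// Raw string that keeps the backslashes and the indentation literally.
fn raw_content() -> &'static str {
    r#"id,time
    a1,\"10 6, 2013\""#
}

// Ordinary string literal: `\"` is an escape for a single quote character.
fn escaped_content() -> &'static str {
    "id,time\na1,\"10 6, 2013\""
}

// Raw string with no backslashes and no indentation on the second line.
fn plain_raw_content() -> &'static str {
    r#"id,time
a1,"10 6, 2013""#
}

fn main() {
    // The raw form without backslashes matches the escaped form.
    println!("{}", plain_raw_content() == escaped_content());
    // The raw form with `\"` contains literal backslashes, so it differs.
    println!("{}", raw_content() == escaped_content());
}
```

If the quotes should reach the CSV reader unescaped, the raw string would need plain `"` characters and unindented continuation lines, as in `plain_raw_content`.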



##########
datafusion-examples/examples/dataframe.rs:
##########
@@ -41,3 +44,37 @@ async fn main() -> Result<()> {
 
     Ok(())
 }
+
+// Example to read data from a csv file with inferred schema
+async fn example_read_csv_file_with_inferred_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";
+    // write the data
+    fs::write(path, content).expect("Problem with writing file!");
+    // Create a session context and create a lazy
+    let ctx = SessionContext::new();
+    let df = ctx.read_csv(path, CsvReadOptions::default()).await.unwrap();
+    df
+}
+
+// Example to read csv file with a given csv file
+async fn example_read_csv_file_with_schema() -> Arc<DataFrame> {
+    let path = "example.csv";
+    // Create the data to put into the csv file with headers
+    let content = "id,time,vote,unixtime,rating\na1,\"10 6, 2013\",3,1381017600,5.0\na2,\"08 9, 2013\",2,1376006400,4.5";

Review Comment:
   ```suggestion
       let content = r#"id,time,vote,unixtime,rating
       a1,\"10 6, 2013\",3,1381017600,5.0
       a2,\"08 9, 2013\",2,1376006400,4.5"#;
   ```


