You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "dadepo (via GitHub)" <gi...@apache.org> on 2023/05/30 12:36:37 UTC

[GitHub] [arrow-datafusion] dadepo opened a new issue, #6485: Create a scalar from array of type Map

dadepo opened a new issue, #6485:
URL: https://github.com/apache/arrow-datafusion/issues/6485

   ### Is your feature request related to a problem or challenge?
   
   I am not sure if this is a bug, feature request or things are working as it should.
   
   How can I fix an error of this kind
   
   Background, I have a UDF that mimics json_build_object from postgres. This UDF returns a MapArray as the returning value.
   
   With a table like
   
   ```
   let df = ctx.sql(r#"
   select * from test"#).await?;
   df.show().await?;
   
   +-----------+-----------+-----------+
   | clientid  | name      | parentid  |
   +-----------+-----------+-----------+
   | c-string1 | n-string1 | p-string1 |
   | c-string2 | n-string2 | p-string2 |
   +-----------+-----------+-----------+
   ```
   
   I can build a json like this fine
   
   ```
           let df = ctx.sql(r#"
           select json_build_object('name', name, 'type', 'test')
           from test"#).await?;
           df.show().await?;
   
   +---------------------------------------------------------------------+
   | json_build_object(Utf8("name"),test.name,Utf8("type"),Utf8("test")) |
   +---------------------------------------------------------------------+
   | {name: n-string1, type: test}                                       |
   | {name: n-string2, type: test}                                       |
   +---------------------------------------------------------------------+
   ```
   
   But if I remove name or any other column that exist in the table, I get the error
   
   ```
           let df = ctx.sql(r#"
           select json_build_object('type', 'test')
           from test"#).await?;
           df.show().await?;
   
   Error: This feature is not implemented: Can't create a scalar from array of type "Map(Field { name: "entries", data_type: Struct([Field { name: "keys", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "values", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)"
   ```
   
   A simple work around will be to always include a column from the table in the query, but unfortunately i cannot guarantee this. Especially when I have use case where json_build_object works on computed values or values created via other udfs.
   
   How can I make my UDF not dependent on the present of a column? For example I can use the make_array function in this way. That is, I can use it without including a column from the table
   
   ```
           let df = ctx.sql(r#"
           select make_array('type', 'test')
           from test"#).await?;
           df.show().await?;
   
   +--------------------------------------+
   | makearray(Utf8("type"),Utf8("test")) |
   +--------------------------------------+
   | [type, test]                         |
   | [type, test]                         |
   +--------------------------------------+
   ```
   
   ### Describe the solution you'd like
   
   Ability to have a UDF not dependent on the present of a column, generate as much results in a column as needed in the resulting record batch 
   
   ### Describe alternatives you've considered
   
   if I change the return type to UTF8 instead of Map, that is
      
   ```
   let json_build_object_return_type: ReturnTypeFunction = Arc::new(move |_| Ok(Arc::new(DataType::Utf8)));
   ```
   
   And i return `StringArray` from the UDF, I get the behaviour I am looking for. But this means the json structure returned from the UDF get formatted into strings, which makes processing more difficult down the line as I end up with values like this
   
   ```
   let value = r#"\"{\\\"key_one\\\":\\\"val_one\\\"}\""#;
   ```
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org