You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "sgilmore10 (via GitHub)" <gi...@apache.org> on 2023/05/30 14:46:08 UTC

[GitHub] [arrow] sgilmore10 opened a new pull request, #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

sgilmore10 opened a new pull request, #35827:
URL: https://github.com/apache/arrow/pull/35827

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   This change lets users control toggle the automatic null-value detection behavior. By default, values MATLAB considers to be missing  (e.g. `NaN` for `double`, `<missing>` for `string`, and `NaT` for `datetime`) will be treated as `null` values. Users can toggle this behavior on and off using the `InferNulls` name-value pair. 
   
   **Example**
   ```matlab
   >> matlabArray = [1 NaN 3]'
   
   matlabArray =
   
        1
        NaN
        3
         
   % Treat NaN as a null value
    >> arrowArray1 = arrow.array.Float64Array(maltabArray, InferNulls=true)
   
   arrowArray1 = 
   
   [
     1,
     null,
     3
   ]
   
   % Don't treat NaN as a null value 
    >> arrowArray2 = arrow.array.Float64Array(maltabArray, InferNulls=false)
      
   arrowArray2 = 
   
   [
     1,
     nan,
     3
   ]
   
   ```
   We've only added this nv-pair to `arrow.array.Float64Array` for now. We'll add this nv-pair to the other types in a followup changelist.
   
   ### What changes are included in this PR?
   
   1. Added `InferNulls` name-value pair to `arrow.array.Float64Array`.
   2. Added common validation function `arrow.args.validateTypeAndShape` to remove duplicate validation code among the numeric classes.
   3. Added a functioned called `arrow.args.parseValidElements` that the `arrow.array.<Type>Array` classes will be able to share for generating the logical mask of valid elements.
   
   ### Are these changes tested?
   
   Yes, we added a test pointed called `InferNulls` to  the test class`tFloat64Array.m`.
   
   ### Are there any user-facing changes?
   
   Yes, users can know control how `NaN` values are treated when creating an `arrow.array.Float64Array`.
   
   ### Future Directions
   
   1. Add a name-value pair called `ValidIndices` or `NullIndices` to allow users to supply the list of valid elements themselves.
   2. Extend null support to other numeric types.
   3. We've been working on adding error-handling support to `mathworks/libmexclass`. We have a prototype to do this using status-like and result-like objects already pushed to a [branch](https://github.com/mathworks/libmexclass/tree/33). Once this branch is merged with the `main` branch of `mathworks/libmexclass`, we'll port it over.
   
   
   ### Notes
   
   Thank you @kevingurney for all the help with this PR! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sgilmore10 commented on a diff in pull request #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

Posted by "sgilmore10 (via GitHub)" <gi...@apache.org>.
sgilmore10 commented on code in PR #35827:
URL: https://github.com/apache/arrow/pull/35827#discussion_r1210934445


##########
matlab/src/matlab/+arrow/+args/parseValidElements.m:
##########
@@ -0,0 +1,28 @@
+function validElements = parseValidElements(data, detectNulls, nullDetectionFcn)
+    % Returns a logical vector of the validElements in data. If inferNulls

Review Comment:
   Good catch. I thought I renamed the parameter, but I guess I did not. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on a diff in pull request #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on code in PR #35827:
URL: https://github.com/apache/arrow/pull/35827#discussion_r1210796082


##########
matlab/src/matlab/+arrow/+args/parseValidElements.m:
##########
@@ -0,0 +1,28 @@
+function validElements = parseValidElements(data, detectNulls, nullDetectionFcn)
+    % Returns a logical vector of the validElements in data. If inferNulls

Review Comment:
   ```suggestion
       % Returns a logical vector of the validElements in data. If detectNulls
   ```
   
   A caller uses `inferNulls`. So we may want to rename `detectNulls` in this function to `inferNulls`...



##########
matlab/src/matlab/+arrow/+args/parseValidElements.m:
##########
@@ -0,0 +1,28 @@
+function validElements = parseValidElements(data, detectNulls, nullDetectionFcn)
+    % Returns a logical vector of the validElements in data. If inferNulls
+    % is true, calls ismissing on data to determine which elements are 
+    % null.
+
+    % Licensed to the Apache Software Foundation (ASF) under one or more
+    % contributor license agreements.  See the NOTICE file distributed with
+    % this work for additional information regarding copyright ownership.
+    % The ASF licenses this file to you under the Apache License, Version
+    % 2.0 (the "License"); you may not use this file except in compliance
+    % with the License.  You may obtain a copy of the License at
+    %
+    %   http://www.apache.org/licenses/LICENSE-2.0
+    %
+    % Unless required by applicable law or agreed to in writing, software
+    % distributed under the License is distributed on an "AS IS" BASIS,
+    % WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+    % implied.  See the License for the specific language governing
+    % permissions and limitations under the License.

Review Comment:
   Can we move this license header to the top level?
   
   ```matlab
   % Licensed to the Apache ...
   % ...
   
   function validElements = ...
   ```



##########
matlab/src/matlab/+arrow/+args/parseValidElements.m:
##########
@@ -0,0 +1,28 @@
+function validElements = parseValidElements(data, detectNulls, nullDetectionFcn)

Review Comment:
   It seems that `nullDetectionFcn` isn't used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou merged pull request #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou merged PR #35827:
URL: https://github.com/apache/arrow/pull/35827


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sgilmore10 commented on a diff in pull request #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

Posted by "sgilmore10 (via GitHub)" <gi...@apache.org>.
sgilmore10 commented on code in PR #35827:
URL: https://github.com/apache/arrow/pull/35827#discussion_r1210934130


##########
matlab/src/matlab/+arrow/+args/parseValidElements.m:
##########
@@ -0,0 +1,28 @@
+function validElements = parseValidElements(data, detectNulls, nullDetectionFcn)

Review Comment:
   That's a mistake on my part. I meant to delete it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35827:
URL: https://github.com/apache/arrow/pull/35827#issuecomment-1568568988

   :warning: GitHub issue #35676 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sgilmore10 commented on a diff in pull request #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

Posted by "sgilmore10 (via GitHub)" <gi...@apache.org>.
sgilmore10 commented on code in PR #35827:
URL: https://github.com/apache/arrow/pull/35827#discussion_r1210939664


##########
matlab/src/matlab/+arrow/+args/parseValidElements.m:
##########
@@ -0,0 +1,28 @@
+function validElements = parseValidElements(data, detectNulls, nullDetectionFcn)
+    % Returns a logical vector of the validElements in data. If inferNulls
+    % is true, calls ismissing on data to determine which elements are 
+    % null.
+
+    % Licensed to the Apache Software Foundation (ASF) under one or more
+    % contributor license agreements.  See the NOTICE file distributed with
+    % this work for additional information regarding copyright ownership.
+    % The ASF licenses this file to you under the Apache License, Version
+    % 2.0 (the "License"); you may not use this file except in compliance
+    % with the License.  You may obtain a copy of the License at
+    %
+    %   http://www.apache.org/licenses/LICENSE-2.0
+    %
+    % Unless required by applicable law or agreed to in writing, software
+    % distributed under the License is distributed on an "AS IS" BASIS,
+    % WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+    % implied.  See the License for the specific language governing
+    % permissions and limitations under the License.

Review Comment:
   Will do.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sgilmore10 commented on pull request #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

Posted by "sgilmore10 (via GitHub)" <gi...@apache.org>.
sgilmore10 commented on PR #35827:
URL: https://github.com/apache/arrow/pull/35827#issuecomment-1573941957

   I think I've addressed all the current feedback. Let me know if we feel anything else needs to be changed.
   
   Thanks!
   Sarah


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35827:
URL: https://github.com/apache/arrow/pull/35827#issuecomment-1575466429

   Benchmark runs are scheduled for baseline = 8878fc8161db50eb245d14d96d3d057981173d8a and contender = f44f7685638e4afb97f8d6d536d8538b5292b47d. f44f7685638e4afb97f8d6d536d8538b5292b47d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/125af350767d4742a69483a0e30bb749...21f7d06d83284217bb382dc3ff5df7b9/)
   [Finished :arrow_down:1.8% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/37b261ca16e647aa96e776df3ea4ab7e...492672e829644d50b05170d26553069c/)
   [Finished :arrow_down:0.98% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/ce3daf8904e64b56af46d150f2acf9a6...6d82a4f3a13240529b294c9602a80554/)
   [Finished :arrow_down:1.6% :arrow_up:0.48%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/34c5b58ab31b45e193a88b48e63815af...b7d14591347a4b67b73a94811e6df198/)
   Buildkite builds:
   [Finished] [`f44f7685` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2978)
   [Finished] [`f44f7685` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/3014)
   [Finished] [`f44f7685` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2979)
   [Finished] [`f44f7685` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/3004)
   [Finished] [`8878fc81` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2977)
   [Finished] [`8878fc81` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/3013)
   [Finished] [`8878fc81` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2978)
   [Finished] [`8878fc81` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/3003)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sgilmore10 commented on a diff in pull request #35827: GH-35676: [MATLAB] Add an `InferNulls` name-value pair for controlling null value inference during construction of `arrow.array.Array`

Posted by "sgilmore10 (via GitHub)" <gi...@apache.org>.
sgilmore10 commented on code in PR #35827:
URL: https://github.com/apache/arrow/pull/35827#discussion_r1210936274


##########
matlab/src/matlab/+arrow/+args/parseValidElements.m:
##########
@@ -0,0 +1,28 @@
+function validElements = parseValidElements(data, detectNulls, nullDetectionFcn)
+    % Returns a logical vector of the validElements in data. If inferNulls

Review Comment:
   I renamed the parameter to `inferNulls`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org