Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2017/12/15 19:05:01 UTC

[jira] [Comment Edited] (JENA-1449) tdbloader2 handling multi-argument --sort-args

    [ https://issues.apache.org/jira/browse/JENA-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292617#comment-16292617 ] 

Andy Seaborne edited comment on JENA-1449 at 12/15/17 7:04 PM:
---------------------------------------------------------------

Yes to making {{SORT_ARGS}} the primary mechanism, handled end-to-end. We can keep {{--sort-args}} and have it set {{SORT_ARGS}} as it currently does, only exported as well

{noformat}
 -s|--sort-args)
      # Sort arguments
      shift
      SORT_ARGS=$1
      shift
{noformat}

and then not use the argument when the main script calls tdbloader2index (calls to the individual scripts can be useful even if {{--phase}} is preferred).

tdbloader2index already does 
{noformat}SORT_ARGS="${SORT_ARGS:-}"{noformat}
so I wonder if this was intended all along.
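
A minimal sketch of that hand-off, under stated assumptions: {{parse_args}} is a hypothetical stand-in for the option loop in the main script, and the {{bash -c}} child stands in for tdbloader2index. Only the {{-s|--sort-args}} case and the {{SORT_ARGS="${SORT_ARGS:-}"}} line come from the scripts quoted here.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the main script's option loop.
parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      -s|--sort-args)
        # Sort arguments: keep the whole string as one value and export it
        shift
        SORT_ARGS="${1:-}"
        export SORT_ARGS
        if [ $# -gt 0 ]; then shift; fi
        ;;
      *)
        shift
        ;;
    esac
  done
}

parse_args --sort-args "--parallel=3 --buffer-size=80%"

# The child does not receive --sort-args on its command line;
# it picks the value up from the environment, defaulting to empty.
child_view=$(bash -c 'SORT_ARGS="${SORT_ARGS:-}"; printf "%s" "$SORT_ARGS"')
printf 'child sees: %s\n' "$child_view"
```

With the export in place, the main script no longer has to re-quote the string when it invokes the index phase.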

That's the easy part :-). The scripts do not just pass "SORT_ARGS" on to the call of sort(1). They also interpret SORT_ARGS, for example to check that there is enough space in the temp directory. This is presumably based on [~rvesse]'s experience of running the script and/or of users running the script. It has been useful to me as well.

{noformat}
# Check where we are storing temporary sort files
debug "Sort Arguments: $SORT_ARGS"
SORT_TEMP_DIR=
if [[ "$SORT_ARGS" == *"-T "* ]]; then
  # Specified via -T argument
  SORT_TEMP_DIR=(${SORT_ARGS/-T /})
  SORT_TEMP_DIR=${SORT_TEMP_DIR[0]}
elif [[ "$SORT_ARGS" == *"--temporary-directory="* ]]; then
  # Specified via --temporary-directory argument
  SORT_TEMP_DIR=(${SORT_ARGS/--temporary-directory=/})
  SORT_TEMP_DIR=${SORT_TEMP_DIR[0]}
else
  # Using the system temp directory
  SORT_TEMP_DIR="$TMPDIR"
fi
{noformat}

which parses SORT_ARGS assuming it contains only one argument, and does not protect against metacharacters and spaces in the value {noformat}${SORT_TEMP_DIR[0]}{noformat}.
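
To make the failure concrete, here is the {{--temporary-directory=}} branch from the snippet above run against the multi-argument string from the issue description. Nothing here is new apart from the final {{printf}}.

```shell
#!/usr/bin/env bash

# Example value taken from the issue description
SORT_ARGS="--parallel=3 --temporary-directory=mytmp --buffer-size=80%  --compress-program=gzip"

# Same two lines as the existing script: strip the option name,
# word-split what is left into an array, then keep element 0.
SORT_TEMP_DIR=(${SORT_ARGS/--temporary-directory=/})
SORT_TEMP_DIR=${SORT_TEMP_DIR[0]}

# Element 0 is the first word of the whole string, not the directory
printf '%s\n' "$SORT_TEMP_DIR"   # prints --parallel=3
```

The substitution only deletes the option name, so the directory is element 0 only when {{--temporary-directory=}} happens to be the first argument.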

Shell parsing is hard work. One way is to create a subshell and process the arguments with {{set -- $SORT_ARGS}}, but getting the quoting right is still tricky.
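
A rough sketch of the subshell idea, not a tested patch. {{sort_temp_dir}} is a hypothetical helper name, the {{/tmp}} fallback when {{TMPDIR}} is unset is my assumption, and taking the last temp-directory option when several are given is a simplification. It still splits on whitespace only, so a directory name containing spaces is not handled.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the temp directory named in a sort(1)
# argument string, or the system temp directory if none is given.
sort_temp_dir() {
  (
    set -f        # no pathname expansion while word-splitting
    # shellcheck disable=SC2086
    set -- $1     # deliberate word splitting into positional parameters
    dir="${TMPDIR:-/tmp}"   # assumed fallback; the script uses $TMPDIR
    while [ $# -gt 0 ]; do
      case "$1" in
        --temporary-directory=*)
          dir="${1#--temporary-directory=}"
          ;;
        -T|--temporary-directory)
          # Value is the next word, if there is one
          if [ $# -gt 1 ]; then dir="$2"; shift; fi
          ;;
        -T?*)
          # Value attached to the short option, e.g. -T/data/tmp
          dir="${1#-T}"
          ;;
      esac
      shift
    done
    printf '%s\n' "$dir"
  )
}

sort_temp_dir "--parallel=3 --temporary-directory=mytmp --buffer-size=80%"   # prints mytmp
```

Because the parsing runs inside {{( ... )}}, neither {{set -f}} nor the replaced positional parameters leak into the calling script.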







> tdbloader2 handling multi-argument --sort-args
> ----------------------------------------------
>
>                 Key: JENA-1449
>                 URL: https://issues.apache.org/jira/browse/JENA-1449
>             Project: Apache Jena
>          Issue Type: Bug
>            Reporter: Andy Seaborne
>            Priority: Minor
>         Attachments: index.diff, main.diff
>
>
> tdbloader2 does not handle {{--sort-args}} if the argument string for sort(1) has multiple arguments.
> e.g.
> {{--parallel=3 --temporary-directory=mytmp --buffer-size=80%  --compress-program=gzip}}


