Posted to dev@jena.apache.org by rvesse <gi...@git.apache.org> on 2015/06/30 17:39:46 UTC

[GitHub] jena pull request: [JENA-977] tdbloader2 script rewrite

GitHub user rvesse opened a pull request:

    https://github.com/apache/jena/pull/84

    [JENA-977] tdbloader2 script rewrite

    This pull request contains a substantial rewrite of the `tdbloader2` scripts to make them more user-friendly, flexible and robust.
    
    ## Dev Environment Notes
    
    Previously it was a pain to run the scripts in a dev environment because they assume a class path of `$JENA_HOME/lib/*`, and no `lib/` directory exists in a working copy. The POM for the distribution module has therefore been updated to use the Maven Dependency Plugin (`copy-dependencies` goal) to generate the `lib/` directory during the `package` phase, and the Maven Clean Plugin to remove it during the `clean` phase. After running `mvn package`, you can set `JENA_HOME` to the distribution module directory in your working copy and run or enhance the scripts directly.
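    As a rough illustration of why the generated `lib/` directory matters, here is a minimal sketch of how a script could build its class path from `$JENA_HOME/lib/*` and fail early with a hint when that directory is missing or empty, as it would be in a working copy before `mvn package` has been run. The function name `checkJenaLib` and its messages are hypothetical; this is not the code from the scripts.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: verify that $1/lib exists and contains at least one
    # JAR, then print a class path entry suitable for java -cp.
    checkJenaLib() {
      local home="$1" jar
      if [ -z "$home" ]; then
        echo "JENA_HOME is not set" >&2
        return 1
      fi
      if [ ! -d "$home/lib" ]; then
        echo "No lib/ directory in $home, try running 'mvn package' first" >&2
        return 1
      fi
      for jar in "$home"/lib/*.jar; do
        if [ -e "$jar" ]; then
          # Quoted so the shell does not glob it; the JVM expands the
          # trailing * class path wildcard itself
          echo "$home/lib/*"
          return 0
        fi
      done
      echo "lib/ directory in $home contains no JAR files" >&2
      return 1
    }
    ```

    A caller might then run `CP=$(checkJenaLib "$JENA_HOME") || exit 1` before invoking `java -cp "$CP" ...`.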
    
    ## Script Changes
    
    The script changes are fairly extensive, covering a number of areas. The two existing scripts have been split into four:
    
    - `tdbloader2` - Main entry point which coordinates running the other scripts
    - `tdbloader2data` - Script which runs the data phase of the build
    - `tdbloader2index` - Script which runs the index phase of the build
    - `tdbloader2common` - Script which provides functions common to all scripts
    
    The now-defunct `tdbloader2worker` script has been removed, as have some outdated and broken scripts in `jena-tdb/bin/`.
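    Splitting out `tdbloader2common` means each of the other scripts has to source it before doing anything else. One way this could look, assuming the common script sits in `$JENA_HOME/bin/` (the function name `sourceCommon` is hypothetical), including bailing out when the file cannot be found:

    ```shell
    #!/usr/bin/env bash
    # Hypothetical sketch: load the shared functions from tdbloader2common,
    # refusing to continue if the file is missing or fails to load.
    sourceCommon() {
      local common="$1/bin/tdbloader2common"
      if [ ! -f "$common" ]; then
        echo "Unable to locate common functions script $common" >&2
        return 1
      fi
      # Functions defined by the sourced file become available globally
      # shellcheck source=/dev/null
      source "$common" || return 1
    }
    ```

    Each child script would then start with something like `sourceCommon "$JENA_HOME" || exit 1`.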
    
    ### Symbolic Link and relative path handling
    
    In rewriting the scripts, some bugs in the current treatment of `JENA_HOME` were addressed:
    
    - If `JENA_HOME` is not set, the scripts try to locate it from their own path, using `readlink -f` when the script is a symbolic link. However, the `-f` option has a completely different meaning on BSD/OS X, so this could fail in some cases. The scripts now all contain a `resolveLink` function which handles the OS-specific behaviour appropriately.
    - If `JENA_HOME` was itself set to a symbolic link, the scripts could fail to invoke the other scripts. Such a link is now resolved appropriately.
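    The function body is not shown here, so the following is only a minimal sketch of what an OS-aware `resolveLink` could look like, assuming GNU `readlink -f` on Linux and a manual, hop-by-hop loop using plain `readlink` elsewhere. It is not the code from this pull request.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical sketch of an OS-aware resolveLink: prints the fully resolved
    # absolute path of $1, following chains of symbolic links.
    resolveLink() {
      local name="$1" target dir hops=0
      if [ "$(uname -s)" = "Linux" ]; then
        # GNU readlink: -f canonicalises the path, following every link
        readlink -f "$name"
        return
      fi
      # BSD/OS X: avoid -f and follow links one hop at a time
      while [ -L "$name" ]; do
        hops=$((hops + 1))
        if [ "$hops" -gt 40 ]; then
          echo "Too many levels of symbolic links: $1" >&2
          return 1
        fi
        target=$(readlink "$name")
        case "$target" in
          /*) name="$target" ;;                     # absolute target
          *)  name="$(dirname "$name")/$target" ;;  # relative to the link's directory
        esac
      done
      # Resolve any links in the directory part, then re-attach the file name
      dir=$(CDPATH= cd -P "$(dirname "$name")" && pwd -P) || return 1
      echo "${dir%/}/$(basename "$name")"
    }
    ```

    The hop limit guards against link cycles, which would otherwise make the fallback loop forever.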
    
    There were also similar bugs that could occur if the database location or data file paths given were relative and/or symbolic links. At various points the scripts now resolve symbolic links and make paths absolute, which makes them less error-prone.
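    A minimal sketch of making a database location or data file path absolute, assuming `cd -P` and `pwd -P` are used to resolve links in the directory part. The name `makeAbsolute` is hypothetical, and this version deliberately leaves a symbolic link in the final component of a file path untouched.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: print an absolute form of $1 with symbolic links in
    # its directory components resolved. A link in the final component of a
    # non-directory path is left as-is.
    makeAbsolute() {
      local path="$1" dir
      if [ -d "$path" ]; then
        # Existing directory (e.g. the database location): resolve all of it
        (CDPATH= cd -P "$path" && pwd -P)
        return
      fi
      # File, or a path that does not exist yet: resolve the parent directory
      dir=$(CDPATH= cd -P "$(dirname "$path")" && pwd -P) || return 1
      echo "${dir%/}/$(basename "$path")"
    }
    ```

    Resolving paths once, up front, means the child scripts can later change directory without relative paths silently pointing somewhere else.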
    
    ### Option Handling
    
    The scripts now all support a variety of user-friendly options and have built-in help for them. The main script `tdbloader2` accepts all the options and passes the relevant ones through to the appropriate child scripts as necessary.
    
    All options that were previously only exposed via environment variables are now exposed as command line options. For backwards compatibility, the existing environment variables (`JVM_ARGS` and `SORT_ARGS`) are still honoured if the corresponding options are not specified.
    
    Each of the tdbloader2 scripts now provides a `printUsage` function which contains a detailed and user-friendly help summary.  A user can view this by running with the `-h` or `--help` option on each script.
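    The option handling described above might be structured roughly as follows. This is a sketch, not the scripts' actual code: `--phase`, `-k/--keep-work`, `-h/--help`, `--debug`, `--trace` and the `--` separator are taken from this PR, while `--jvm-args` and the function and variable names are hypothetical, chosen to show the environment variable fallback.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical sketch of tdbloader2-style option parsing. --jvm-args is an
    # assumed option name used only to illustrate the JVM_ARGS fallback.
    printUsage() {
      echo "Usage: tdbloader2 [options] [--] datafile ..."
    }

    parseOptions() {
      PHASE="all"
      KEEP_WORK=0
      DEBUG=0
      TRACE=0
      JVM_ARGS_OPT=""
      DATA_FILES=()
      while [ $# -gt 0 ]; do
        case "$1" in
          -h|--help)
            printUsage
            return 2 ;;
          --phase)
            case "$2" in
              all|data|index) PHASE="$2"; shift 2 ;;
              *) echo "Invalid --phase '$2', expected all, data or index" >&2
                 return 1 ;;
            esac ;;
          -k|--keep-work) KEEP_WORK=1; shift ;;
          --debug) DEBUG=1; shift ;;
          --trace) TRACE=1; shift ;;
          --jvm-args)
            [ $# -ge 2 ] || { echo "--jvm-args requires a value" >&2; return 1; }
            JVM_ARGS_OPT="$2"; shift 2 ;;
          --)
            # Everything after -- is a data file, even if it starts with -
            shift
            DATA_FILES+=("$@")
            break ;;
          -*) echo "Unknown option $1" >&2; return 1 ;;
          *) DATA_FILES+=("$1"); shift ;;
        esac
      done
      # Honour the JVM_ARGS environment variable only if the option was not given
      EFFECTIVE_JVM_ARGS="${JVM_ARGS_OPT:-${JVM_ARGS:-}}"
    }
    ```

    Returning a distinct status (2 here) after printing help lets the caller exit cleanly without treating `--help` as an error.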
    
    ### Incremental Builds
    
    A `--phase` option is now supported on `tdbloader2`, which takes a value of `all`, `data` or `index`. `all` does a full build and is the default behaviour if the option is omitted.

    The other two values run only the correspondingly named phase of the build. This allows a build to be done in smaller incremental steps, and also allows the index phase to be restarted on its own. This is useful because, in my experience, once a build gets past the data phase the index phase has far more scope for error.
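    Phase selection, combined with checking each child script's return code (added in one of the later commits), might look roughly like the following. The function name `runPhases`, and passing the same arguments to both child scripts, are simplifying assumptions; the PR describes passing only the relevant options to each child.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical sketch: run the requested phase(s) by invoking the child
    # scripts in $2, aborting as soon as one of them fails.
    runPhases() {
      local phase="$1" bindir="$2" rc
      shift 2
      if [ "$phase" = "all" ] || [ "$phase" = "data" ]; then
        "$bindir/tdbloader2data" "$@"
        rc=$?
        if [ "$rc" -ne 0 ]; then
          echo "Data phase failed with return code $rc, aborting" >&2
          return "$rc"
        fi
      fi
      if [ "$phase" = "all" ] || [ "$phase" = "index" ]; then
        "$bindir/tdbloader2index" "$@"
        rc=$?
        if [ "$rc" -ne 0 ]; then
          echo "Index phase failed with return code $rc, aborting" >&2
          return "$rc"
        fi
      fi
    }
    ```

    Restarting a failed index phase is then just a matter of rerunning with `--phase index` against the same database location.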
    
    ### Indexing Improvements
    
    There have been a lot of improvements made to the indexing scripts:
    
    - Warns if the disk where sort stores its temporary files looks too full (10% or less free space)
    - Aborts if there is insufficient free disk space to sort an input file
    - Warns if a given sort is likely to be external, with additional warnings if that sort may run short of space on the disk where sort stores its temporary files
    - Provides progress reporting for sort when running in the foreground provided that the `pv` ([PipeViewer](http://www.ivarch.com/programs/pv.shtml)) tool is available
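    Two of these safeguards can be sketched as follows, assuming POSIX `df -P -k` output for free space and `wc -c` for file sizes. The first check follows the reasoning in one of the commits below: the sorted output is the same size as the input, so if fewer bytes are free than the input size the build can abort early. The second shows why `PIPESTATUS` is needed when `pv` feeds `sort`, since `$?` only reports the last command in a pipeline. The function names, and using `[ -t 2 ]` as a stand-in for "running in the foreground", are assumptions.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical sketch of two indexing safeguards.

    # Free space in bytes on the filesystem holding $1 (POSIX df, 1K blocks)
    freeBytes() {
      local kb
      kb=$(df -P -k "$1" | awk 'NR == 2 { print $4 }')
      [ -n "$kb" ] || return 1
      echo $((kb * 1024))
    }

    # Abort early if directory $2 cannot hold a sorted copy of file $1
    checkSortSpace() {
      local input="$1" dir="$2" size free
      if [ ! -f "$input" ]; then
        echo "Input file $input does not exist" >&2
        return 1
      fi
      # tr strips the leading padding some wc implementations add
      size=$(wc -c < "$input" | tr -d ' ')
      free=$(freeBytes "$dir") || return 1
      if [ "$size" -gt "$free" ]; then
        echo "Insufficient space in $dir to sort $input ($size bytes needed, $free free)" >&2
        return 1
      fi
    }

    # Sort $1 into $2, passing any further arguments to sort; show progress
    # with pv when it is installed and stderr is a terminal
    sortFile() {
      local input="$1" output="$2"
      local -a status
      if command -v pv > /dev/null 2>&1 && [ -t 2 ]; then
        pv "$input" | sort "${@:3}" > "$output"
        # Capture both exit codes; a failing sort must not be masked by pv
        status=("${PIPESTATUS[@]}")
        [ "${status[0]}" -eq 0 ] && [ "${status[1]}" -eq 0 ]
      else
        sort "${@:3}" "$input" > "$output"
      fi
    }
    ```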
    
    ### Debugging
    
    All scripts now support `--debug` and `--trace` options, which add extra output:
    
    - `--debug` adds various debugging output during a build
    - `--trace` enables `set -x` in the scripts
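    A minimal sketch of how the two flags could be wired up, assuming option parsing sets `DEBUG` and `TRACE` to 1 when the corresponding flag is present. The names `debug` and `enableTracing` are hypothetical.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical sketch of --debug/--trace support; DEBUG and TRACE would be
    # set to 1 by option parsing when the corresponding flag is present.
    DEBUG=${DEBUG:-0}
    TRACE=${TRACE:-0}

    # Print a message to stderr only when --debug was given
    debug() {
      if [ "$DEBUG" -eq 1 ]; then
        echo "DEBUG: $*" >&2
      fi
    }

    # Turn on bash's xtrace so every command is echoed when --trace was given
    enableTracing() {
      if [ "$TRACE" -eq 1 ]; then
        set -x
      fi
    }
    ```

    Because the main script forwards these flags to the child scripts, the same level of output would then apply across the whole build.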
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/jena JENA-977

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/84.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #84
    
----
commit d92e336263da3f0f2a58dfc24cb9b5f23449cc5c
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-25T15:56:29Z

    Initial work on refactoring tdbloader2 scripts (JENA-977)
    
    - Better option processing
    - Split tdbloader2worker into a data and index phase script
    - Support only running a specific phase

commit 7b61a144854d81acbd180b5debfd5c8638d2af57
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-25T16:04:36Z

    Further tweak new tdbloader2 scripts (JENA-977)
    
    - Add proper usage to tdbloader2
    - Check for temporary data files needed for index phase in
      tdbloader2index

commit a96b0164c43142791ac030e5332b3f54df6fb4ba
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-26T11:25:57Z

    Further refactoring of tdbloader2 scripts (JENA-977)
    
    - Proper usage summaries in all scripts
    - -k/--keep-work option instead of hidden environment variable
      for keeping work
    - Short forms for all options

commit 7770596bc94613409fe2753240b603ae22a38b57
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-26T15:15:18Z

    Various further improvements to the scripts (JENA-977)
    
    - Validate sort temporary directory when indexing and WARN if the disk
      it is on is low on space (10% or less free)
    - Support --debug and --trace flags in all scripts, add various debug
      output throughout scripts
    - Fix a bug with not detecting sort failure when pv is used to monitor
      progress
    - Fix a bug in size calculations used for progress monitoring and sort
      failure detection
    
    This commit includes some temporary DEV changes that will be reverted
    later

commit 3c59213e273711836628d9d030df23dac142ee1b
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-29T12:12:03Z

    Fix script usage in dev environment (JENA-977)
    
    This commit enhances the distribution module to make it much easier to
    use in dev environments.  The dependency plugin is used with the
    copy-dependencies goal to produce the lib/ directory during a package
    phase and then clean plugin is configured to clean the lib/ directory
    during a clean.  This means that developers can now set JENA_HOME to the
    distribution module directory in their working copy and provided they
    have done a mvn package all the scripts should work.
    
    This also allows the temporary hacks in the new tdbloader2 scripts to be
    removed so these scripts now run against Jena 3 libraries and don't need
    the path to the new scripts to be hacked.

commit c55c1f74b4571eee2c9e333967b5671e862adff7
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-29T16:21:18Z

    Further refactoring of tdbloader2 scripts (JENA-977)
    
    - Move common functions into tdbloader2common script
    - Remove duplicated definitions from other scripts and source in the new
      common script
    - Add helper function for getting drive information
    - Add check in tdbloader2index script which will abort the build if
      there is insufficient free space to sort the data file; since the
      sorted output will be the same size as the input, if there are fewer
      bytes free than the size of the input we can abort early

commit a7ac2797856bf60476204b8997b5a5bf4cfa15c5
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-30T12:44:29Z

    Further improvements to tdbloader2 scripts (JENA-977)
    
    - Auto-detection of JENA_HOME now exports it so it is visible to the
      child scripts
    - Force making database directory path absolute and resolving any
      symbolic links in the path
    - Additional checks in tdbloader2index to warn if sort is going to be
      external and it may run out of temporary disk space for the sort

commit cc4a80ac3c44d738a8904ac91b1ece71b446d74a
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-30T13:25:46Z

    Check for return codes from children in tdbloader2 (JENA-977)
    
    Ensures that the main script checks for the return code of the child
    scripts and aborts if they fail

commit d4a0bc50a6d82ab5bbb43ab90e65216e5b165621
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-30T14:04:50Z

    Finish up first pass of work on tdbloader2 script refactoring (JENA-977)
    
    - Add options for setting the JVM and sort arguments that do not rely on
      environment variables.  NB - For backwards compatibility the existing
      environment variables are still honoured if the new command line
      options are not used
    - Improve some error messages
    - Explicitly support -- for separating data files from options for cases
      where file names may be confused

commit f64dbdcb6ac77cfb6654916e43797fdca3d4fb5c
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-30T14:33:09Z

    Ensure data file paths are absolute (JENA-977)
    
    This commit improves the tdbloader2 script to ensure that data file
    paths are made absolute and any symbolic links are resolved.

commit d9ff26ec96b6cbf15d6649704dbcfe7f1d8d09eb
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-30T14:59:33Z

    Fix bug where JENA_HOME is a symbolic link (JENA-977)
    
    This commit fixes a bug that can occur when JENA_HOME is a symbolic
    link; the scripts need to resolve the link, as otherwise they cannot
    source the common function scripts successfully.
    
    Scripts now also bail out if they can't find the common functions script
    to source.

commit c25ad5d800779ca829a7bde581f98d62c417719b
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-30T15:04:42Z

    Minor clean up of OS type testing (JENA-977)

commit 12dc2cc66640e432a4e2f5b45ebf2fb16c995440
Author: Rob Vesse <rv...@apache.org>
Date:   2015-06-30T15:08:52Z

    Final pieces of tdbloader2 script clean up (JENA-977)
    
    - Fix white space inconsistencies in tdbloader2 scripts
    - Removed defunct tdbloader2worker script
    - Removed defunct and broken scripts from jena-tdb/bin/

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request: [JENA-977] tdbloader2 script rewrite

Posted by rvesse <gi...@git.apache.org>.
Github user rvesse commented on the pull request:

    https://github.com/apache/jena/pull/84#issuecomment-117563015
  
    Good point; I have now applied this fix to `template.bin` and regenerated all the scripts.  I also updated `cmd-maker` slightly since `tdbloader2` is no longer based on the template at all.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request: [JENA-977] tdbloader2 script rewrite

Posted by afs <gi...@git.apache.org>.
Github user afs commented on the pull request:

    https://github.com/apache/jena/pull/84#issuecomment-117264364
  
    The find `JENA_HOME` fragment was from `apache-jena/template.bin` -- should that be updated as well? (and the scripts regenerated).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request: [JENA-977] tdbloader2 script rewrite

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/84


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] jena pull request: [JENA-977] tdbloader2 script rewrite

Posted by rvesse <gi...@git.apache.org>.
Github user rvesse commented on the pull request:

    https://github.com/apache/jena/pull/84#issuecomment-118903151
  
    @afs Any further comments or should I go ahead and merge?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---