Posted to dev@jena.apache.org by afs <gi...@git.apache.org> on 2018/05/20 18:29:38 UTC

[GitHub] jena pull request #424: JENA-1550: TDB2 loader

GitHub user afs opened a pull request:

    https://github.com/apache/jena/pull/424

    JENA-1550: TDB2 loader

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/afs/jena tdb2-loader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #424
    
----
commit 2b87a3da593411190bce8793cf520dd52901d15f
Author: Andy Seaborne <an...@...>
Date:   2018-05-20T12:16:13Z

    Rename to avoid overload with TDB "Loader".

commit f5d9e6fa1a21f69b5c8cca6282614402c6d20b4b
Author: Andy Seaborne <an...@...>
Date:   2018-05-20T16:43:17Z

    JENA-1550: TDB2 Bulk Loaders

----


---

Posted by rvesse <gi...@git.apache.org>.
GitHub user rvesse commented on a diff in the pull request:

    https://github.com/apache/jena/pull/424#discussion_r189540060
  
    --- Diff: jena-base/src/main/java/org/apache/jena/atlas/lib/Timer.java ---
    @@ -69,4 +69,13 @@ static public String timeStr(long timeInterval) {
         protected String timeStr(long timePoint, long startTimePoint) {
             return timeStr(timePoint - startTimePoint) ;
         }
    +
    +    /** Time an operation. Return the elapsed time in milliseconds. */
    +    public static long time(Runnable action) {
    --- End diff --
    
    There's an implied assumption here that any exceptions thrown by the runnable are handled by the caller.
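
To make the point about exceptions concrete, here is a minimal sketch. The diff shows only the signature of `time(Runnable)`, so the method body and the class name `TimerSketch` below are assumptions for illustration, not the code in the pull request. If the helper simply calls `action.run()` between two clock readings, any unchecked exception escapes to the caller and no elapsed time is returned.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the helper under review. Only the signature
// appears in the diff; the body here is an assumption.
class TimerSketch {
    /** Time an operation. Return the elapsed time in milliseconds. */
    static long time(Runnable action) {
        long start = System.nanoTime();
        // No try/catch here: an unchecked exception propagates to the caller.
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long elapsed = time(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("elapsed ms: " + elapsed);

        try {
            time(() -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException e) {
            // The caller has to handle the failure itself.
            System.out.println("caller saw: " + e.getMessage());
        }
    }
}
```

A caller that wants a timing even when the action fails would need its own try/finally around the call, or a variant of the helper that records the time in a finally block.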


---

Posted by rvesse <gi...@git.apache.org>.
GitHub user rvesse commented on a diff in the pull request:

    https://github.com/apache/jena/pull/424#discussion_r189542491
  
    --- Diff: jena-db/jena-tdb2/src/main/java/org/apache/jena/tdb2/loader/parallel/LoaderParallel.java ---
    @@ -0,0 +1,155 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.tdb2.loader.parallel;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.function.Consumer;
    +
    +import org.apache.jena.atlas.lib.tuple.Tuple;
    +import org.apache.jena.graph.Node;
    +import org.apache.jena.riot.system.StreamRDF;
    +import org.apache.jena.sparql.core.DatasetGraph;
    +import org.apache.jena.tdb2.loader.DataLoader;
    +import org.apache.jena.tdb2.loader.base.LoaderBase;
    +import org.apache.jena.tdb2.loader.base.LoaderOps;
    +import org.apache.jena.tdb2.loader.base.MonitorOutput;
    +import org.apache.jena.tdb2.store.DatasetGraphTDB;
    +import org.apache.jena.tdb2.store.DatasetPrefixesTDB;
    +import org.apache.jena.tdb2.store.NodeId;
    +import org.apache.jena.tdb2.sys.TDBInternal;
    +
    +/**
    + * The parallel Loader.
    + * <p>
    + * The process is:
    + * <blockquote>
    + * {@code DataBatcher -> DataToTuples -> Indexer}
    + * <blockquote>
    + * {@link DataBatcher} produces {@link DataBlock}s - grouping of triples and quads. It uses 
    + * <br/>
    + * {@link DataToTuples} processes {@link DataBlock} to create 2 outputs blocks of {@code Tuple<NodeId>}, one output for triples, oue for quads.
    --- End diff --
    
    Typo `oue` -> `one`
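
The staged process named in the Javadoc above (`DataBatcher -> DataToTuples -> Indexer`) can be pictured as threads connected by bounded queues. The sketch below is not the code in the pull request. It assumes strings in place of RDF nodes, `long` values in place of `NodeId`, a single output stream rather than separate triple and quad outputs, and a plain list standing in for the index. Every name in it (`PipelineSketch`, `load`, `END_BLOCK`, `END_TUPLES`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative three-stage pipeline: batcher -> to-tuples -> indexer.
class PipelineSketch {
    // End-of-data markers, recognised by identity rather than by content.
    static final List<String[]> END_BLOCK = new ArrayList<>();
    static final List<long[]> END_TUPLES = new ArrayList<>();

    static List<long[]> load(List<String[]> triples, int blockSize) {
        BlockingQueue<List<String[]>> blocks = new ArrayBlockingQueue<>(4);
        BlockingQueue<List<long[]>> tupleBlocks = new ArrayBlockingQueue<>(4);
        List<long[]> index = new ArrayList<>(); // written by the indexer thread only

        // Stage 2: replace each term with a numeric id.
        Thread toTuples = new Thread(() -> {
            Map<String, Long> ids = new HashMap<>();
            try {
                for (;;) {
                    List<String[]> block = blocks.take();
                    if (block == END_BLOCK) { tupleBlocks.put(END_TUPLES); return; }
                    List<long[]> out = new ArrayList<>(block.size());
                    for (String[] t : block) {
                        long[] tuple = new long[t.length];
                        for (int i = 0; i < t.length; i++) {
                            Long id = ids.get(t[i]);
                            if (id == null) { id = (long) ids.size(); ids.put(t[i], id); }
                            tuple[i] = id;
                        }
                        out.add(tuple);
                    }
                    tupleBlocks.put(out);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Stage 3: consume tuple blocks; here this only collects them.
        Thread indexer = new Thread(() -> {
            try {
                for (;;) {
                    List<long[]> tuples = tupleBlocks.take();
                    if (tuples == END_TUPLES) return;
                    index.addAll(tuples);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        toTuples.start();
        indexer.start();
        try {
            // Stage 1 (calling thread): group the input into fixed-size blocks.
            List<String[]> current = new ArrayList<>();
            for (String[] t : triples) {
                current.add(t);
                if (current.size() == blockSize) { blocks.put(current); current = new ArrayList<>(); }
            }
            if (!current.isEmpty()) blocks.put(current);
            blocks.put(END_BLOCK);
            toTuples.join();
            indexer.join(); // join() makes the indexer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"s1", "p", "o1"});
        data.add(new String[] {"s1", "p", "o2"});
        data.add(new String[] {"s2", "p", "o1"});
        for (long[] tuple : load(data, 2)) System.out.println(Arrays.toString(tuple));
    }
}
```

The bounded queues provide back-pressure: a fast stage blocks on `put` rather than buffering unbounded amounts of data ahead of a slower one.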


---

Posted by afs <gi...@git.apache.org>.
GitHub user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/424#discussion_r189671869
  
    --- Diff: jena-db/jena-tdb2/src/main/java/org/apache/jena/tdb2/loader/parallel/LoaderParallel.java ---
    @@ -0,0 +1,155 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.tdb2.loader.parallel;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.function.Consumer;
    +
    +import org.apache.jena.atlas.lib.tuple.Tuple;
    +import org.apache.jena.graph.Node;
    +import org.apache.jena.riot.system.StreamRDF;
    +import org.apache.jena.sparql.core.DatasetGraph;
    +import org.apache.jena.tdb2.loader.DataLoader;
    +import org.apache.jena.tdb2.loader.base.LoaderBase;
    +import org.apache.jena.tdb2.loader.base.LoaderOps;
    +import org.apache.jena.tdb2.loader.base.MonitorOutput;
    +import org.apache.jena.tdb2.store.DatasetGraphTDB;
    +import org.apache.jena.tdb2.store.DatasetPrefixesTDB;
    +import org.apache.jena.tdb2.store.NodeId;
    +import org.apache.jena.tdb2.sys.TDBInternal;
    +
    +/**
    + * The parallel Loader.
    + * <p>
    + * The process is:
    + * <blockquote>
    + * {@code DataBatcher -> DataToTuples -> Indexer}
    + * <blockquote>
    + * {@link DataBatcher} produces {@link DataBlock}s - grouping of triples and quads. It uses 
    --- End diff --
    
    Done.
    
    Also made some internal classes "public" to allow further experimentation (as time permits). Loading at larger scale, and to disk, benefits from phasing the index builds (cf. tdbloader1), but that needs experimentation.


---

Posted by rvesse <gi...@git.apache.org>.
GitHub user rvesse commented on a diff in the pull request:

    https://github.com/apache/jena/pull/424#discussion_r189542444
  
    --- Diff: jena-db/jena-tdb2/src/main/java/org/apache/jena/tdb2/loader/parallel/LoaderParallel.java ---
    @@ -0,0 +1,155 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.tdb2.loader.parallel;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.function.Consumer;
    +
    +import org.apache.jena.atlas.lib.tuple.Tuple;
    +import org.apache.jena.graph.Node;
    +import org.apache.jena.riot.system.StreamRDF;
    +import org.apache.jena.sparql.core.DatasetGraph;
    +import org.apache.jena.tdb2.loader.DataLoader;
    +import org.apache.jena.tdb2.loader.base.LoaderBase;
    +import org.apache.jena.tdb2.loader.base.LoaderOps;
    +import org.apache.jena.tdb2.loader.base.MonitorOutput;
    +import org.apache.jena.tdb2.store.DatasetGraphTDB;
    +import org.apache.jena.tdb2.store.DatasetPrefixesTDB;
    +import org.apache.jena.tdb2.store.NodeId;
    +import org.apache.jena.tdb2.sys.TDBInternal;
    +
    +/**
    + * The parallel Loader.
    + * <p>
    + * The process is:
    + * <blockquote>
    + * {@code DataBatcher -> DataToTuples -> Indexer}
    + * <blockquote>
    + * {@link DataBatcher} produces {@link DataBlock}s - grouping of triples and quads. It uses 
    --- End diff --
    
    Unterminated sentence here?


---

Posted by rvesse <gi...@git.apache.org>.
GitHub user rvesse commented on a diff in the pull request:

    https://github.com/apache/jena/pull/424#discussion_r189542354
  
    --- Diff: jena-db/jena-tdb2/src/main/java/org/apache/jena/tdb2/loader/parallel/LoaderConst.java ---
    @@ -0,0 +1,41 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.tdb2.loader.parallel;
    +
    +import java.util.Collections;
    +import java.util.List;
    +
    +import org.apache.jena.atlas.lib.tuple.Tuple;
    +import org.apache.jena.tdb2.store.NodeId;
    +
    +public class LoaderConst {
    +
    +    /** Chunk size for the triple->tuples output pipe */  
    +    public final static int ChunkSize = 100_000 ;
    --- End diff --
    
    Didn't even know this little syntax trick for making numeric constants more readable existed in Java!
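
For reference, the syntax in question is the underscore separator in numeric literals, available since Java 7. The compiler ignores the underscores, so `100_000` and `100000` are the same value. A small sketch follows; the class and constant names are made up for illustration and are not from the pull request.

```java
// Underscores in numeric literals (Java 7 and later) only aid readability.
class NumericLiteralSketch {
    static final int CHUNK_SIZE = 100_000;        // same value as 100000
    static final long BIG = 1_000_000_000_000L;   // also valid in long literals
    static final int MASK = 0xFF_FF;              // and in hexadecimal literals

    public static void main(String[] args) {
        System.out.println(CHUNK_SIZE == 100000);     // prints true
        System.out.println(BIG == 1000000000000L);    // prints true
        System.out.println(MASK == 0xFFFF);           // prints true
    }
}
```

Underscores may only appear between digits: a literal such as `_100`, `100_` or `1_.5` does not compile.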


---

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/jena/pull/424


---