Posted to dev@stanbol.apache.org by Amindri Udugala <am...@gmail.com> on 2015/02/09 07:33:07 UTC

Null pointer exception while building the index for freebase data dump

Hi All,

I need to create an index from a Freebase data dump, so I followed the
instructions in the README file in entityhub\indexing\freebase.

First I executed java -jar
org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar init to
generate the folder structure. The folder structure was generated
successfully, apart from the following warnings:

16:16:20,530 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'nsogi' valid , namespace 'http://prefix.cc/nsogi:'
invalid -> mapping ignored!
16:16:21,279 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'category' valid , namespace '
http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
16:16:21,435 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'chebi' valid , namespace '
http://bio2rdf.org/chebi:' invalid -> mapping ignored!
16:16:21,435 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'hgnc' valid , namespace 'http://bio2rdf.org/hgnc:'
invalid -> mapping ignored!
16:16:21,450 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'dbptmpl' valid , namespace '
http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'pubmed' valid , namespace '
http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'dbc' valid , namespace '
http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'dbt' valid , namespace '
http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'dbrc' valid , namespace '
http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
16:16:21,809 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'call' valid , namespace '
http://webofcode.org/wfn/call:' invalid -> mapping ignored!
16:16:21,809 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
Namespace Mapping: prefix 'affymetrix' valid , namespace '
http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!

Then I copied the Freebase dump (freebase-rdf-latest.gz) to the
indexing/resources/rdfdata folder and the incoming_links.txt file,
generated by fbrankings-uri.sh, to the indexing/resources folder, and
executed the indexing process. (I used all the default config files.)

While executing the indexing process I noticed the following log entries:


16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityDataIterable: null
16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityIterator:
org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator@1880249c
16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityDataProvider:
org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource@4e38a55
16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityScoreProvider: null

Finally it threw a NullPointerException, as follows:

16:38:40,837 [Thread-3] INFO  source.ResourceLoader -  ... 1 files imported
in 0 seconds
16:38:40,837 [Thread-3] INFO  source.ResourceLoader - Loding 0 File ...
16:38:40,837 [Thread-3] INFO  source.ResourceLoader -  ... 0 files imported
in 0 seconds
16:38:42,912 [Thread-0] INFO  solryard.SolrYardIndexingDestination -    ...
create SolrYard
16:38:42,959 [main] INFO  impl.IndexerImpl -  ... delete existing
IndexedEntityId file
C:\cygwin64\home\User\code\stanbol_indexing\indexing\destination\indexed-entities-ids.zip
16:38:42,974 [main] INFO  impl.IndexerImpl - Initialisation completed
16:38:42,974 [main] INFO  impl.IndexerImpl -   ... initialisation completed
16:38:42,974 [main] INFO  impl.IndexerImpl - start indexing ...
16:38:42,974 [main] INFO  impl.IndexerImpl - Indexing started ...
Exception in thread "Indexing: Entity Source Reader Deamon"
java.lang.NullPointerException
        at java.lang.StringBuilder.<init>(Unknown Source)
        at
org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.parseEntityFormLine(LineBasedEntityIterator.java:435)
        at
org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.getNext(LineBasedEntityIterator.java:379)
        at
org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.hasNext(LineBasedEntityIterator.java:356)
        at
org.apache.stanbol.entityhub.indexing.core.impl.EntityIdBasedIndexingDaemon.run(EntityIdBasedIndexingDaemon.java:55)
        at java.lang.Thread.run(Unknown Source)
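
Editorial note: the trace above is consistent with a null String reaching
the StringBuilder constructor, for example when a prefix has no namespace
mapped to it. The following is a minimal sketch of a guard that fails with
a descriptive message instead. It is not the Stanbol code: the class name
PrefixGuardSketch, the "fb" prefix and the NAMESPACES table are assumptions
made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Stanbol.
final class PrefixGuardSketch {

    // Assumed prefix -> namespace table ("fb" is an invented prefix).
    static final Map<String, String> NAMESPACES = new HashMap<>();
    static {
        NAMESPACES.put("fb", "http://rdf.freebase.com/ns/");
    }

    // Builds a full URI from a prefix and a local name. An unmapped prefix
    // yields a descriptive exception rather than a bare NullPointerException.
    static String toUri(String prefix, String localName) {
        String namespace = NAMESPACES.get(prefix);
        if (namespace == null) {
            // Without this check, new StringBuilder(namespace) would throw
            // a NullPointerException inside StringBuilder.<init>.
            throw new IllegalStateException("No namespace mapped for prefix '"
                    + prefix + "' (local name: " + localName + ")");
        }
        return new StringBuilder(namespace).append(localName).toString();
    }

    public static void main(String[] args) {
        System.out.println(toUri("fb", "m.0432b"));
        try {
            toUri("", "m.0432b"); // the empty prefix is not mapped in this sketch
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```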

I'm not sure whether this happens because I haven't configured an important
property in a configuration file. I'm pretty new to Stanbol, and any help
would be much appreciated.

Thanks in advance.
-- 
Regards
Amindri Udugala

Re: Null pointer exception while building the index for freebase data dump

Posted by Amindri Udugala <am...@gmail.com>.
Hi Rupert,

Thank you very much for the very informative reply. It seems the indexing
tool is running as expected.
I will keep you updated on how it goes.
Many thanks again.

Regards
Amindri



On 23 February 2015 at 18:50, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Amindri,
>
> You can ignore those WARNINGS. They simple tell you that a literal
> value failed to validate the stated data type. I am not completely
> sure what jena does with such triples. But I think that it does store
> them anyway in the triplestore. When I imported freebase I piped the
> lodgings to a grep that removed all lines containing "WARN
> jena.riot".
>
> You will also see some similar warnings during indexing (e.g. dates
> like the 31th February ...). During indexing those data are stored as
> string values.
>  Yes I saw these as well...
>


> On Mon, Feb 23, 2015 at 3:06 AM, Amindri Udugala
> <am...@gmail.com> wrote:
> > However I noticed that the indexing process uses up to 14 GB of ram and
> > very little cpu (0% - 1%. Mostly it  is 0%). Also does not seem to use
> any
> > disk space at all. Is this something to be worried about?
>
> Jena TDB uses memory mapped files. AFAIK it will use all memory it can
> get for those. CPU is expected to be minimal. Most of the time is
> spent in index lookups. For every triple Jena needs to lookup the
> subject, predicate and object in the nodes table. After that it needs
> to lookup the triple in the triple table.
> In case any node or the triple does not exist it needs to update the
> tables.
>
> So most of the time is spent in lookups and write operations. As soon
> the the table get to big to be mapped in memory things start to get
> slow. Depending on the hardware even very slow ....
>
> The WARN messages state the line number. When you do a line count on
> the source file you can easily determine how much of the dump you have
> already imported. You should also see loggings about the current
> import speed. Combining this you can estimate the remaining time.
>
> best
> Rupert
>
> >
> > Thanks
> > Amindri
> >
> >
> >
> > On 13 February 2015 at 17:16, Amindri Udugala <am...@gmail.com>
> > wrote:
> >
> >> Hi Rupert,
> >>
> >> The fix is in the indexing tool.
> >> (entityhub/indexing/core/source/LineBasedEntityIterator.java). I created
> >> the issue and submitted the patch.
> >>
> >> Yes Rupert, the problem was jena TDB is not importing the, Freebase
> dump.
> >> The reason behind this was file name of my freebase data dump. It was
> named
> >> as freebase_latest.gz, and JenaTDB was trying to map the extension of
> the
> >> file with a map of Lang objects. (Check line no 61 in
> RdfResourceImporter).
> >> Once I renamed my Freebase dump as freebase.rdf.gz, Jena TDB started to
> >> import the data.
> >>
> >> Then again it threw a riot exception and now I'm running the fixit.pl
> >> tool on the dump. Will keep you updated on how the indexing process will
> >> turn out.
> >>
> >> Thanks for the valuable tips on indexing.
> >>
> >> Thanks
> >> Amindri
> >>
> >>
> >
> >
> > --
> > Regards
> > Amindri Udugala
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                              ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO
> ..........................................................................
> | http://redlink.co/
>



-- 
Regards
Amindri Udugala

Re: Null pointer exception while building the index for freebase data dump

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Amindri,

You can ignore those WARNINGS. They simply tell you that a literal
value failed to validate against the stated data type. I am not completely
sure what Jena does with such triples, but I think that it stores
them in the triple store anyway. When I imported Freebase I piped the
log output to a grep that removed all lines containing "WARN
jena.riot".
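
Editorial note: for readers without grep at hand (the original poster was
on Windows), the same filtering can be sketched in Java. This is only an
illustration, assuming each log entry is on a single line as it would be in
the real log file. The class name RiotWarnFilterSketch is hypothetical, and
the sample entries are taken from the log excerpt elsewhere in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.StringReader;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Stanbol or Jena.
final class RiotWarnFilterSketch {

    // Matches "WARN jena.riot" with any amount of whitespace in between.
    static final Pattern RIOT_WARN = Pattern.compile("WARN\\s+jena\\.riot");

    // True if the line should be kept, i.e. it is not a RIOT warning.
    static boolean keep(String line) {
        return !RIOT_WARN.matcher(line).find();
    }

    // Copies all lines that are not RIOT warnings from in to out.
    static void filter(BufferedReader in, PrintStream out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (keep(line)) {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String sample =
              "11:35:08,676 [Thread-3] WARN  jena.riot - [line: 1773466825, col: 128] "
            + "Lexical form 'T13:00' not valid for datatype "
            + "http://www.w3.org/2001/XMLSchema#dateTime\n"
            + "11:35:12,191 [Thread-3] INFO  jenatdb.RdfResourceImporter - Add: "
            + "350,200,000 triples (Batch: 333 / Avg: 702)\n";
        // Only the INFO entry is written to standard output.
        filter(new BufferedReader(new StringReader(sample)), System.out);
    }
}
```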

You will also see some similar warnings during indexing (e.g. dates
like the 31st of February ...). During indexing such data are stored as
string values.

On Mon, Feb 23, 2015 at 3:06 AM, Amindri Udugala
<am...@gmail.com> wrote:
> However I noticed that the indexing process uses up to 14 GB of ram and
> very little cpu (0% - 1%. Mostly it  is 0%). Also does not seem to use any
> disk space at all. Is this something to be worried about?

Jena TDB uses memory-mapped files. AFAIK it will use all the memory it can
get for those. CPU usage is expected to be minimal. Most of the time is
spent in index lookups. For every triple Jena needs to look up the
subject, predicate and object in the nodes table. After that it needs
to look up the triple in the triple table.
If any node or the triple does not exist, it needs to update the tables.

So most of the time is spent in lookups and write operations. As soon
as the tables get too big to be mapped in memory, things start to get
slow. Depending on the hardware, even very slow ....

The WARN messages state the line number. If you do a line count on
the source file you can easily determine how much of the dump you have
already imported. You should also see log entries about the current
import speed. Combining the two, you can estimate the remaining time.
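
Editorial note: the estimate described above can be sketched as follows.
This is a rough illustration, not part of the indexing tool. It derives a
lines-per-second rate from two WARN entries instead of the logged import
speed, assumes both entries are from the same day and have different
timestamps, and uses a placeholder for the total line count, which is not
given in this thread. The class name ImportProgressSketch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Stanbol or Jena.
final class ImportProgressSketch {

    // Captures hour, minute, second and the line number of a log entry
    // such as "11:33:27,960 [Thread-3] WARN  jena.riot - [line: N, col: M]".
    static final Pattern WARN_LINE = Pattern.compile(
            "^(\\d{2}):(\\d{2}):(\\d{2}),\\d{3} .*\\[line: (\\d+), col: \\d+\\]");

    // Returns {secondsOfDay, lineNumber}, or null if the entry does not match.
    static long[] parse(String logEntry) {
        Matcher m = WARN_LINE.matcher(logEntry);
        if (!m.find()) {
            return null;
        }
        long seconds = Long.parseLong(m.group(1)) * 3600
                + Long.parseLong(m.group(2)) * 60
                + Long.parseLong(m.group(3));
        return new long[] { seconds, Long.parseLong(m.group(4)) };
    }

    // Estimated seconds remaining, given two samples and the total line count.
    static double remainingSeconds(long[] earlier, long[] later, long totalLines) {
        double linesPerSecond =
                (double) (later[1] - earlier[1]) / (later[0] - earlier[0]);
        return (totalLines - later[1]) / linesPerSecond;
    }

    public static void main(String[] args) {
        long[] a = parse("11:33:27,960 [Thread-3] WARN  jena.riot - [line: 1773380405, col: 129]");
        long[] b = parse("11:42:04,305 [Thread-3] WARN  jena.riot - [line: 1773858075, col: 98]");
        // Assumption: placeholder value; replace with the real line count of the dump.
        long totalLines = 2_000_000_000L;
        System.out.printf("imported: %.1f%%%n", 100.0 * b[1] / totalLines);
        System.out.printf("remaining: %.1f hours%n",
                remainingSeconds(a, b, totalLines) / 3600);
    }
}
```

Because the rate is measured over a short window, the result is only a
rough guide. As noted above, the import slows down once the tables no
longer fit in memory.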

best
Rupert

>
> Thanks
> Amindri
>
>
>
> On 13 February 2015 at 17:16, Amindri Udugala <am...@gmail.com>
> wrote:
>
>> Hi Rupert,
>>
>> The fix is in the indexing tool.
>> (entityhub/indexing/core/source/LineBasedEntityIterator.java). I created
>> the issue and submitted the patch.
>>
>> Yes Rupert, the problem was jena TDB is not importing the, Freebase dump.
>> The reason behind this was file name of my freebase data dump. It was named
>> as freebase_latest.gz, and JenaTDB was trying to map the extension of the
>> file with a map of Lang objects. (Check line no 61 in RdfResourceImporter).
>> Once I renamed my Freebase dump as freebase.rdf.gz, Jena TDB started to
>> import the data.
>>
>> Then again it threw a riot exception and now I'm running the fixit.pl
>> tool on the dump. Will keep you updated on how the indexing process will
>> turn out.
>>
>> Thanks for the valuable tips on indexing.
>>
>> Thanks
>> Amindri
>>
>>
>
>
> --
> Regards
> Amindri Udugala



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/

Re: Null pointer exception while building the index for freebase data dump

Posted by Amindri Udugala <am...@gmail.com>.
Hi Rupert,

I started to index the Freebase dump 5 days ago and everything seems
to be fine when I check the logs. The following lines are some of the log
entries I got:


11:32:58,216 [Thread-3] INFO  jenatdb.RdfResourceImporter - Filtered:
1423200000 triples (80.25462627530949%)
11:33:27,960 [Thread-3] WARN  jena.riot - [line: 1773380405, col: 129]
Lexical form 'T17:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:33:27,960 [Thread-3] WARN  jena.riot - [line: 1773380406, col: 126]
Lexical form 'T10:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:33:27,961 [Thread-3] WARN  jena.riot - [line: 1773380407, col: 126]
Lexical form 'T17:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:33:37,043 [Thread-3] WARN  jena.riot - [line: 1773388185, col: 123]
Lexical form 'T00:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:33:37,043 [Thread-3] WARN  jena.riot - [line: 1773388186, col: 126]
Lexical form 'T00:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:35:08,676 [Thread-3] WARN  jena.riot - [line: 1773466822, col: 125]
Lexical form 'T08:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:35:08,676 [Thread-3] WARN  jena.riot - [line: 1773466823, col: 127]
Lexical form 'T16:30' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:35:08,676 [Thread-3] WARN  jena.riot - [line: 1773466825, col: 128]
Lexical form 'T13:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:35:12,191 [Thread-3] INFO  jenatdb.RdfResourceImporter - Add:
350,200,000 triples (Batch: 333 / Avg: 702)
11:36:16,585 [Thread-3] WARN  jena.riot - [line: 1773516793, col: 125]
Lexical form 'T08:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:36:16,585 [Thread-3] WARN  jena.riot - [line: 1773516797, col: 127]
Lexical form 'T19:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:36:16,585 [Thread-3] WARN  jena.riot - [line: 1773516801, col: 126]
Lexical form 'T08:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:36:16,585 [Thread-3] WARN  jena.riot - [line: 1773516803, col: 128]
Lexical form 'T18:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:36:16,586 [Thread-3] WARN  jena.riot - [line: 1773516806, col: 129]
Lexical form 'T19:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:36:16,586 [Thread-3] WARN  jena.riot - [line: 1773516807, col: 123]
Lexical form 'T08:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:36:16,586 [Thread-3] WARN  jena.riot - [line: 1773516808, col: 125]
Lexical form 'T09:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:36:16,586 [Thread-3] WARN  jena.riot - [line: 1773516809, col: 123]
Lexical form 'T08:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:36:22,545 [Thread-3] INFO  jenatdb.RdfResourceImporter - Filtered:
1423300000 triples (80.25274111404639%)
11:37:41,460 [Thread-3] INFO  jenatdb.RdfResourceImporter - Add:
350,250,000 triples (Batch: 334 / Avg: 702)
11:39:29,824 [Thread-3] DEBUG file.BlockAccessMapped - Segment: 706
11:39:38,981 [Thread-3] INFO  jenatdb.RdfResourceImporter - Filtered:
1423400000 triples (80.25097751824737%)
11:40:14,344 [Thread-3] INFO  jenatdb.RdfResourceImporter - Add:
350,300,000 triples (Batch: 327 / Avg: 702)
11:40:23,581 [Thread-3] DEBUG file.BlockAccessMapped - Segment: 1930
11:41:55,656 [Thread-3] INFO  jenatdb.RdfResourceImporter - Add:
350,350,000 triples (Batch: 493 / Avg: 702)
11:41:58,858 [Thread-3] INFO  jenatdb.RdfResourceImporter - Filtered:
1423500000 triples (80.2491098335781%)
11:42:04,305 [Thread-3] WARN  jena.riot - [line: 1773858075, col: 98]
Lexical form 'T15:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:42:04,325 [Thread-3] WARN  jena.riot - [line: 1773858119, col: 100]
Lexical form 'T18:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:42:04,364 [Thread-3] WARN  jena.riot - [line: 1773858168, col: 98]
Lexical form 'T13:30' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:42:04,364 [Thread-3] WARN  jena.riot - [line: 1773858169, col: 100]
Lexical form 'T00:00' not valid for datatype
http://www.w3.org/2001/XMLSchema#dateTime
11:43:46,111 [Thread-3] WARN  jena.riot - [line: 1773975044, col: 88] Bad
IRI: <http://www.amazon.de:80/exec/obidos/ASIN/B00005V6S1> Code:
13/DEFAULT_PORT_SHOULD_BE_OMITTED in PORT: If the port is the default one
for the scheme it should be omitted.
11:43:46,111 [Thread-3] WARN  jena.riot - [line: 1773975044, col: 88] Bad
IRI: <http://www.amazon.de:80/exec/obidos/ASIN/B00005V6S1> Code:
14/PORT_SHOULD_NOT_BE_WELL_KNOWN in PORT: Ports under 1024 should be
accessed using the appropriate scheme name.


However, I noticed that the indexing process uses up to 14 GB of RAM and
very little CPU (0% - 1%, mostly 0%). It also does not seem to use any
disk space at all. Is this something to be worried about?

Thanks
Amindri



On 13 February 2015 at 17:16, Amindri Udugala <am...@gmail.com>
wrote:

> Hi Rupert,
>
> The fix is in the indexing tool.
> (entityhub/indexing/core/source/LineBasedEntityIterator.java). I created
> the issue and submitted the patch.
>
> Yes Rupert, the problem was jena TDB is not importing the, Freebase dump.
> The reason behind this was file name of my freebase data dump. It was named
> as freebase_latest.gz, and JenaTDB was trying to map the extension of the
> file with a map of Lang objects. (Check line no 61 in RdfResourceImporter).
> Once I renamed my Freebase dump as freebase.rdf.gz, Jena TDB started to
> import the data.
>
> Then again it threw a riot exception and now I'm running the fixit.pl
> tool on the dump. Will keep you updated on how the indexing process will
> turn out.
>
> Thanks for the valuable tips on indexing.
>
> Thanks
> Amindri
>
>


-- 
Regards
Amindri Udugala

Re: Null pointer exception while building the index for freebase data dump

Posted by Amindri Udugala <am...@gmail.com>.
Hi Rupert,

The fix is in the indexing tool
(entityhub/indexing/core/source/LineBasedEntityIterator.java). I created
the issue and submitted the patch.

Yes Rupert, the problem was that Jena TDB was not importing the Freebase dump.
The reason was the file name of my Freebase data dump. It was named
freebase_latest.gz, and Jena TDB was trying to look up the extension of the
file in a map of Lang objects (see line 61 in RdfResourceImporter).
Once I renamed my Freebase dump to freebase.rdf.gz, Jena TDB started to
import the data.
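
Editorial note: the effect described above can be illustrated with a small
sketch of extension-based detection. This is not the RdfResourceImporter
code. The class name RdfExtensionSketch and the set of recognised
extensions are assumptions chosen for illustration, and the sketch uses
plain strings where the real code uses Jena Lang objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not the Stanbol implementation.
final class RdfExtensionSketch {

    // Assumed set of recognised RDF file extensions.
    static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("rdf", "nt", "ttl", "nq"));

    // Returns the recognised RDF extension of a (possibly gzipped) file name,
    // or null if the name gives no usable hint.
    static String rdfExtension(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".gz")) {
            // Drop the compression suffix before looking at the extension.
            name = name.substring(0, name.length() - ".gz".length());
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return null; // e.g. "freebase_latest": nothing left to look up
        }
        String ext = name.substring(dot + 1);
        return KNOWN.contains(ext) ? ext : null;
    }

    public static void main(String[] args) {
        for (String f : new String[] { "freebase_latest.gz", "freebase.rdf.gz" }) {
            System.out.println(f + " -> " + rdfExtension(f));
        }
    }
}
```

Under these assumptions, a name such as freebase_latest.gz yields no
extension once ".gz" is removed, so a tool working this way would skip the
file without importing anything.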

Then it threw a RIOT exception, and I'm now running the fixit.pl tool
on the dump. I will keep you updated on how the indexing process turns
out.

Thanks for the valuable tips on indexing.

Thanks
Amindri

Re: Null pointer exception while building the index for freebase data dump

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Amindri

> http://rdf.freebase.com/ns/.0432b)!
>
>
> It seems like the 'm' in the entity id is being dropped. I created a patch
> for that, so that preceding elements of the entity id are not dropped when
> the default name space is used.
>

This is definitely a bug. Where have you fixed it? In the
NamespacePrefixService or in the indexing tool? Can you please create
an issue and provide the patch?

With this fixed, the log entries of the 2nd try look fine. However, it
looks as if no data are in the Jena TDB store.

> When I checked the code, this happens because
> indexingDataset.getDefaultGraph()
> (RdfIndexingSource.getEntityData(String id) - 406) returns en empty graph so
> it cannot find the parsed entity in it.. The indexing/resources/tdb folder,
> which is used to create the  indexingDataset exists with 26 data files.
>

The files are created as soon as you start Jena TDB. Depending on the
OS the initial size of the files differs. On Mac they are several GByte
in size (as the OS allocates the full size of the memory-mapped files);
on Linux and Windows they are just some kBytes.

With the Freebase data imported the size of all files in this
directory should be much higher. I still have a directory with a dump
I imported in April 2013. This one has about 70 GByte. Back then I was
using a machine with an SSD to import the RDF data. The process needed
about a week to complete.
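
Editorial note: a quick way to check whether the store has actually been
populated is to sum the file sizes in the TDB directory. The sketch below
does that with the standard java.nio.file API. The class name TdbSizeSketch
is hypothetical, and the default path is the folder mentioned in this
thread. On Mac the files are pre-allocated, as described above, so their
size alone says little there.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of Stanbol or Jena.
final class TdbSizeSketch {

    // Sums the sizes of all regular files below the given directory.
    static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tdb = Paths.get(args.length > 0 ? args[0] : "indexing/resources/tdb");
        if (!Files.isDirectory(tdb)) {
            System.out.println("Not a directory: " + tdb);
            return;
        }
        System.out.printf("%s: %.2f GByte%n", tdb, totalBytes(tdb) / 1e9);
    }
}
```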

To import the data you need to copy the compressed file with the
corrected RDF data (output of step (4) in the README) to the
"indexing/resources/rdfdata" folder. The indexing tool will import all
RDF files in this folder to the Jena TDB store. Imported files will be
moved over to "indexing/resources/imported" (to avoid importing them
again on follow-up executions).

In addition I recommend cancelling the indexing tool after it has
finished importing all data, and then starting it again. Experience
showed that restarting the indexing tool after importing nearly 2 billion
triples to Jena TDB improved the indexing time quite a bit.

In my setup importing the RDF data to Jena TDB needed about a week.
Indexing the imported data needed about 10 hours.

best
Rupert

On Thu, Feb 12, 2015 at 6:35 AM, Amindri Udugala
<am...@gmail.com> wrote:
> Hi Rupert,
>
> I was following the readme file, but the problem still exists.
>
> I enabled debug and saw the following lines
>
> 15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
> source.LineBasedEntityIterator - > line =     141 m.0432b
> 15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
> source.LineBasedEntityIterator -  - id = m.0432b
> 15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
> source.LineBasedEntityIterator -  - entity =
> http://rdf.freebase.com/ns/.0432b
> 15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
> source.LineBasedEntityIterator -  - score =
> 15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
> jenatdb.RdfIndexingSource - No Statements found for id
> http://rdf.freebase.com/key/.0432b (Node:
> http://rdf.freebase.com/ns/.0432b)!
>
>
> It seems like the 'm' in the entity id is being dropped. I created a patch
> for that, so that preceding elements of the entity id are not dropped when
> the default name space is used.
>
>
> However even after fixing this problem, I still get the same Debug log with
> the correct entity URL
>
> 15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
> impl.EntityIdBasedIndexingDaemon - unable to get Data for Entity
> http://rdf.freebase.com/key/m.041yjm (score=norm:0.314499|orig:141.0)
> 15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
> source.LineBasedEntityIterator - > line =     141 m.041s8n
> 15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
> source.LineBasedEntityIterator -  - id = m.041s8n
> 15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
> source.LineBasedEntityIterator -  - entity =
> http://rdf.freebase.com/ns/m.041s8n
> 15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
> source.LineBasedEntityIterator -  - score =
> 15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
> jenatdb.RdfIndexingSource - No Statements found for id
> http://rdf.freebase.com/ns/m.041s8n (Node:
> http://rdf.freebase.com/key/m.041s8n)!
> 15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
> impl.EntityIdBasedIndexingDaemon - unable to get Data for Entity
> http://rdf.freebase.com/ns/m.041s8n (score=norm:0.314499|orig:141.0)
>
> When I checked the code, this happens because
> indexingDataset.getDefaultGraph()
> (RdfIndexingSource.getEntityData(String id) - 406) returns en empty graph so
> it cannot find the parsed entity in it.. The indexing/resources/tdb folder,
> which is used to create the  indexingDataset exists with 26 data files.
>
>
> Do you have any idea why this happens?
>
> Thanks
> Amindri
>
>
>
> On 11 February 2015 at 23:03, Rupert Westenthaler
> <ru...@gmail.com> wrote:
>>
>> Hi Amindri,
>>
>> The file to look is the README.md file of the freebase indexer [1]. If
>> something is missing in this file please create an issue [2] and if
>> possible provide a patch.
>>
>> thx
>> Rupert
>>
>>
>> [1]
>> http://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/freebase/README.md
>> [2] https://issues.apache.org/jira/browse/STANBOL
>>
>>
>> On Wed, Feb 11, 2015 at 1:00 AM, Amindri Udugala
>> <am...@gmail.com> wrote:
>> > Hi Rupert,
>> >
>> > Thanks for the informative reply.
>> > I was able to specify an empty String as the namespace prefix
>> > namespaceprefix.mapping
>> > file. Exactly as you mentioned, indexing started with no loggings for
>> > quite
>> > some time. Then the process finish without indexing a single entity.
>> >
>> > I used all the default configuration files created by the init process.
>> > I'm
>> > trying to build a freebase index for multilingual FST linking. I would
>> > much
>> > appreciate if you can point me to resource where I can get the
>> > information
>> > to correctly configure the properties files.
>> >
>> > Thanks,
>> > Amindri
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                              ++43-699-11108907
>> | A-5500 Bischofshofen
>> | REDLINK.CO
>> ..........................................................................
>> | http://redlink.co/
>
>
>
>
> --
> Regards
> Amindri Udugala
>
>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/

Re: Null pointer exception while building the index for freebase data dump

Posted by Amindri Udugala <am...@gmail.com>.
Hi Rupert,

I was following the README file, but the problem still exists.

I enabled debug logging and saw the following lines:

15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
source.LineBasedEntityIterator - > line =     141 m.0432b
15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
source.LineBasedEntityIterator -  - id = m.0432b
15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
source.LineBasedEntityIterator -  - entity =
http://rdf.freebase.com/ns/.0432b
15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
source.LineBasedEntityIterator -  - score =
15:48:43,318 [Indexing: Entity Source Reader Deamon] DEBUG
jenatdb.RdfIndexingSource - No Statements found for id
http://rdf.freebase.com/key/.0432b (Node:
http://rdf.freebase.com/ns/.0432b)!


It seems like the 'm' in the entity id is being dropped. I created a patch
for that, so that leading characters of the entity id are not dropped when
the default namespace is used.


However, even after fixing this problem, I still get the same debug log with
the correct entity URL:

15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
impl.EntityIdBasedIndexingDaemon - unable to get Data for Entity
http://rdf.freebase.com/key/m.041yjm (score=norm:0.314499|orig:141.0)
15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
source.LineBasedEntityIterator - > line =     141 m.041s8n
15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
source.LineBasedEntityIterator -  - id = m.041s8n
15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
source.LineBasedEntityIterator -  - entity =
http://rdf.freebase.com/ns/m.041s8n
15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
source.LineBasedEntityIterator -  - score =
15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
jenatdb.RdfIndexingSource - No Statements found for id
http://rdf.freebase.com/ns/m.041s8n (Node:
http://rdf.freebase.com/key/m.041s8n)!
15:48:43,319 [Indexing: Entity Source Reader Deamon] DEBUG
impl.EntityIdBasedIndexingDaemon - unable to get Data for Entity
http://rdf.freebase.com/ns/m.041s8n (score=norm:0.314499|orig:141.0)

When I checked the code, this happens because
indexingDataset.getDefaultGraph()
(RdfIndexingSource.getEntityData(String id), line 406) returns an empty
graph, so it cannot find the parsed entity in it. The indexing/resources/tdb
folder, which is used to create the indexingDataset, exists and contains
26 data files.


Do you have any idea why this happens?

Thanks
Amindri



On 11 February 2015 at 23:03, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Amindri,
>
> The file to look is the README.md file of the freebase indexer [1]. If
> something is missing in this file please create an issue [2] and if
> possible provide a patch.
>
> thx
> Rupert
>
>
> [1]
> http://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/freebase/README.md
> [2] https://issues.apache.org/jira/browse/STANBOL
>
>
> On Wed, Feb 11, 2015 at 1:00 AM, Amindri Udugala
> <am...@gmail.com> wrote:
> > Hi Rupert,
> >
> > Thanks for the informative reply.
> > I was able to specify an empty String as the namespace prefix
> > namespaceprefix.mapping
> > file. Exactly as you mentioned, indexing started with no loggings for
> quite
> > some time. Then the process finish without indexing a single entity.
> >
> > I used all the default configuration files created by the init process.
> I'm
> > trying to build a freebase index for multilingual FST linking. I would
> much
> > appreciate if you can point me to resource where I can get the
> information
> > to correctly configure the properties files.
> >
> > Thanks,
> > Amindri
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                              ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO
> ..........................................................................
> | http://redlink.co/
>



-- 
Regards
Amindri Udugala

Re: Null pointer exception while building the index for freebase data dump

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Amindri,

The file to look at is the README.md file of the Freebase indexer [1]. If
something is missing in this file please create an issue [2] and, if
possible, provide a patch.

thx
Rupert


[1] http://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/freebase/README.md
[2] https://issues.apache.org/jira/browse/STANBOL


On Wed, Feb 11, 2015 at 1:00 AM, Amindri Udugala
<am...@gmail.com> wrote:
> Hi Rupert,
>
> Thanks for the informative reply.
> I was able to specify an empty String as the namespace prefix
> namespaceprefix.mapping
> file. Exactly as you mentioned, indexing started with no loggings for quite
> some time. Then the process finish without indexing a single entity.
>
> I used all the default configuration files created by the init process. I'm
> trying to build a freebase index for multilingual FST linking. I would much
> appreciate if you can point me to resource where I can get the information
> to correctly configure the properties files.
>
> Thanks,
> Amindri



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/

Re: Null pointer exception while building the index for freebase data dump

Posted by Amindri Udugala <am...@gmail.com>.
Hi Rupert,

Thanks for the informative reply.
I was able to specify an empty String as the namespace prefix in the
namespaceprefix.mapping file. Exactly as you mentioned, indexing started
with no log output for quite some time. Then the process finished without
indexing a single entity.

I used all the default configuration files created by the init process. I'm
trying to build a Freebase index for multilingual FST linking. I would very
much appreciate it if you could point me to a resource where I can get the
information needed to correctly configure the properties files.

Thanks,
Amindri

Re: Null pointer exception while building the index for freebase data dump

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Amindri

This is valuable information.

The important thing is that

>>       1 m.0___xpc
>>       1 m.0___xk_

"m.0___xpc" is processed as "http://rdf.freebase.com/ns/m.0___xpc".
Make sure that your "indexing/config/iditerator.properties" is
configured accordingly.
If it is not, you will see the log message noting that indexing has
started, followed by no log output for quite some time. After that,
indexing will finish without a single entity having been indexed. The
reason is that the URIs for the entities are built incorrectly and
therefore cannot be found in the source triple store.

If the "iditerator.properties" is correctly configured you will see
logs every few thousand indexed entities.

>> So the entities are not preceded by a namespace. Therefore, when calling
>> String prefix = NamespaceMappingUtils.getPrefix(entity);
>> (LineBasedEntityIterator.parseEntityFormLine(String line) - 425), prefix is
>> assigned an empty String.
>> Is it correct to define an empty namespace mapping as follows in
>> namespaceprefix.mapping?

An empty String represents the default namespace. You can provide a
mapping for an empty String in the "namespaceprefix.mapping" file. If
that does not work, it would be a bug.
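
To illustrate the point about URI construction, here is a minimal Java
sketch (not Stanbol's actual code) of how a line from incoming_links.txt
might be expanded into a full URI, and why a wrongly built URI matches
nothing in the source. The class name, the expandPrefix flag, and the
in-memory Set standing in for the triple store are all assumptions made
for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class EntityIdExpansionSketch {

    /**
     * Takes one line such as "      1 m.0___xpc" (count, then entity id) and
     * returns either the raw id or the id expanded with its namespace.
     * The expandPrefix flag is hypothetical; it only mimics the idea of a
     * setting that switches prefix handling on or off.
     */
    static String toUri(Map<String, String> namespaces, String line, boolean expandPrefix) {
        String[] tokens = line.trim().split("\\s+");
        String id = tokens[tokens.length - 1];          // last token is the entity id
        if (!expandPrefix) {
            return id;                                  // used as-is, no namespace added
        }
        int colon = id.indexOf(':');
        // No colon means no explicit prefix: fall back to the empty (default) prefix.
        String prefix = colon < 0 ? "" : id.substring(0, colon);
        String localName = id.substring(colon + 1);
        String namespace = namespaces.get(prefix);
        if (namespace == null) {
            throw new IllegalArgumentException("No namespace mapped to prefix '" + prefix + "'");
        }
        return namespace + localName;
    }

    public static void main(String[] args) {
        Map<String, String> namespaces = new HashMap<>();
        namespaces.put("", "http://rdf.freebase.com/ns/");

        // Stand-in for the source triple store: it only knows full URIs.
        Set<String> store = Collections.singleton("http://rdf.freebase.com/ns/m.0___xpc");

        String line = "      1 m.0___xpc";
        System.out.println(store.contains(toUri(namespaces, line, true)));   // full URI: found
        System.out.println(store.contains(toUri(namespaces, line, false)));  // raw id: not found
    }
}
```

With every id built the second way, each lookup misses, which would match
the symptom of a run that finishes without indexing anything.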

best
Rupert


On Tue, Feb 10, 2015 at 3:50 AM, Amindri Udugala
<am...@gmail.com> wrote:
> Hi Rupert,
>
> Sorry about the previous mail. I set the ns-prefix-state property in the
> iditerator.properties file to false, and then the indexing process finished
> without any error. However, I'm not sure whether what I did was correct.
> If it is correct, it would be quite helpful to throw an exception when the
> prefix is empty and the prefix state is set to true.
> I'm sorry again if any of the things I mentioned don't make sense :)
> Thanks




Re: Null pointer exception while building the index for freebase data dump

Posted by Amindri Udugala <am...@gmail.com>.
Hi Rupert,

Sorry about the previous mail. I set the ns-prefix-state property in the
iditerator.properties file to false, and then the indexing process finished
without any error. However, I'm not sure whether what I did was correct.
If it is correct, it would be quite helpful to throw an exception when the
prefix is empty and the prefix state is set to true.
I'm sorry again if any of the things I mentioned don't make sense :)
Thanks

On 10 February 2015 at 11:53, Amindri Udugala <am...@gmail.com>
wrote:

> Hi Rupert,
>
> Thanks for the prompt reply.
>
> When I checked incoming_links.txt, the final lines were as follows:
>       1 m.0___xpc
>       1 m.0___xk_
>       1 m.0___ttg
>       1 m.0___t6s
>       1 m.0___t6h
>       1 m.0___t5v
>       1 m.0___t5c
>       1 m.0___rw7
>       1 m.0___qhn
>       1 m.0___p3v
>       1 m.0___nm5
>       1 m.0___n4s
>       1 m.0___n
>       1 m.0___jk_
>       1 m.0___hv4
>       1 m.0___c6k
>       1 m.0___b4g
>       1 m.0___8
>       1 m.0___7yv
>       1 m.0___2fw
>       1 m.0____
>
> So the entities are not preceded by a namespace. Therefore, when calling
> String prefix = NamespaceMappingUtils.getPrefix(entity);
> (LineBasedEntityIterator.parseEntityFormLine(String line) - 425), prefix is
> assigned an empty String.
> Is it correct to define an empty namespace mapping as follows in
> namespaceprefix.mapping?
>
> fb    http://rdf.freebase.com/ns/
> ns    http://rdf.freebase.com/ns/
> key    http://rdf.freebase.com/key/
>     http://rdf.freebase.com/ns/
>
> Thanks
>
>
> Regards
> Amindri


-- 
Regards
Amindri Udugala

Re: Null pointer exception while building the index for freebase data dump

Posted by Amindri Udugala <am...@gmail.com>.
Hi Rupert,

Thanks for the prompt reply.

When I checked incoming_links.txt, the final lines were as follows:
      1 m.0___xpc
      1 m.0___xk_
      1 m.0___ttg
      1 m.0___t6s
      1 m.0___t6h
      1 m.0___t5v
      1 m.0___t5c
      1 m.0___rw7
      1 m.0___qhn
      1 m.0___p3v
      1 m.0___nm5
      1 m.0___n4s
      1 m.0___n
      1 m.0___jk_
      1 m.0___hv4
      1 m.0___c6k
      1 m.0___b4g
      1 m.0___8
      1 m.0___7yv
      1 m.0___2fw
      1 m.0____

So the entities are not preceded by a namespace. Therefore, when calling
String prefix = NamespaceMappingUtils.getPrefix(entity);
(LineBasedEntityIterator.parseEntityFormLine(String line) - 425), prefix is
assigned an empty String.
Is it correct to define an empty namespace mapping as follows in
namespaceprefix.mapping?

fb    http://rdf.freebase.com/ns/
ns    http://rdf.freebase.com/ns/
key    http://rdf.freebase.com/key/
    http://rdf.freebase.com/ns/
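
For illustration only, a file of this shape could be read as in the Java
sketch below. It assumes whitespace-separated "prefix namespace" pairs and
treats a line holding only a namespace as the empty (default) prefix.
Whether Stanbol's own parser accepts such a line is exactly the open
question here, so the class and method names are hypothetical and say
nothing about Stanbol's real behaviour.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PrefixMappingSketch {

    /** Parses "prefix namespace" lines; a lone namespace maps to the empty prefix. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> mappings = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;                       // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length == 1) {
                mappings.put("", parts[0]);     // no prefix column: default namespace
            } else {
                mappings.put(parts[0], parts[1]);
            }
        }
        return mappings;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "fb    http://rdf.freebase.com/ns/",
                "ns    http://rdf.freebase.com/ns/",
                "key    http://rdf.freebase.com/key/",
                "    http://rdf.freebase.com/ns/");
        Map<String, String> mappings = parse(lines);
        // Print each mapping with the prefix in quotes so the empty one is visible.
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            System.out.println("'" + e.getKey() + "' -> " + e.getValue());
        }
    }
}
```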

Thanks


Regards
Amindri

On 9 February 2015 at 17:52, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Amindri
>
> Based on the code, the NPE could originate from a namespace prefix
> unknown to the namespace prefix service.
>
> Can you please check the data of the "incoming_links.txt" file against
> the mappings defined in the "indexing/config/namespaceprefix.mappings"
> file? My guess is that "incoming_links.txt" uses a prefix that is
> not defined in the mappings file.
>
> It is recommended to explicitly define namespace prefix mappings for
> all namespaces used by the indexing process (config data and rdf
> data). For missing mappings http://prefix.cc/ is used as a fallback.
>
> best
> Rupert



-- 
Regards
Amindri Udugala

Re: Null pointer exception while building the index for freebase data dump

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Amindri

Based on the code, the NPE could originate from a namespace prefix
unknown to the namespace prefix service.

Can you please check the data of the "incoming_links.txt" file against
the mappings defined in the "indexing/config/namespaceprefix.mappings"
file? My guess is that "incoming_links.txt" uses a prefix that is
not defined in the mappings file.

It is recommended to explicitly define namespace prefix mappings for
all namespaces used by the indexing process (config data and rdf
data). For missing mappings http://prefix.cc/ is used as a fallback.
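
The stack trace ends in java.lang.StringBuilder.<init>, which is consistent
with a null String being handed to the StringBuilder constructor. Below is
a minimal Java sketch of that mechanism. It assumes (a guess, not verified
against the Stanbol source) that the namespace lookup returns null for an
unmapped prefix; the class and method names are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class MissingPrefixSketch {

    /** Unguarded: an unmapped prefix yields null, and new StringBuilder(null) throws. */
    static String unguarded(Map<String, String> namespaces, String prefix, String localName) {
        String namespace = namespaces.get(prefix);      // null when the prefix is unmapped
        return new StringBuilder(namespace).append(localName).toString();
    }

    /** Guarded: same lookup, but the failure names the offending prefix. */
    static String guarded(Map<String, String> namespaces, String prefix, String localName) {
        String namespace = namespaces.get(prefix);
        if (namespace == null) {
            throw new IllegalStateException("Prefix '" + prefix + "' has no namespace mapping");
        }
        return namespace + localName;
    }

    public static void main(String[] args) {
        Map<String, String> namespaces = new HashMap<>();   // deliberately empty

        try {
            unguarded(namespaces, "", "m.0___xpc");
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup failed with a NullPointerException");
        }

        try {
            guarded(namespaces, "", "m.0___xpc");
        } catch (IllegalStateException e) {
            System.out.println("guarded lookup failed: " + e.getMessage());
        }
    }
}
```

A check like the guarded variant would turn the bare NPE into a message
that points directly at the missing mapping.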

best
Rupert


On Mon, Feb 9, 2015 at 7:33 AM, Amindri Udugala
<am...@gmail.com> wrote:
> Hi All,
>
> I need to create an index from a Freebase data dump. So I followed the
> instructions in the README file in entityhub\indexing\freebase.
>
> First I executed java -jar
> org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar init, to
> generate the folder structure. The folder structure was successfully
> generated except for the following warnings
>
> 16:16:20,530 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'nsogi' valid , namespace 'http://prefix.cc/nsogi:'
> invalid -> mapping ignored!
> 16:16:21,279 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'category' valid , namespace '
> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 16:16:21,435 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'chebi' valid , namespace '
> http://bio2rdf.org/chebi:' invalid -> mapping ignored!
> 16:16:21,435 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'hgnc' valid , namespace 'http://bio2rdf.org/hgnc:'
> invalid -> mapping ignored!
> 16:16:21,450 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'dbptmpl' valid , namespace '
> http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> 16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'pubmed' valid , namespace '
> http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
> 16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'dbc' valid , namespace '
> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'dbt' valid , namespace '
> http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> 16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'dbrc' valid , namespace '
> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 16:16:21,809 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'call' valid , namespace '
> http://webofcode.org/wfn/call:' invalid -> mapping ignored!
> 16:16:21,809 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid
> Namespace Mapping: prefix 'affymetrix' valid , namespace '
> http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
>
> Then I copied the Freebase dump (freebase-rdf-latest.gz) to the
> indexing/resources/rdfdata folder
> and the incoming_links.txt file, generated by fbrankings-uri.sh  to
> indexing/resources folder and executed the indexing process. (I used all
> the default config files)
>
> While executing the index process I noticed the following log.
>
>
> 16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityDataIterable: null
> 16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityIterator:
> org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator@1880249c
> 16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityDataProvider:
> org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource@4e38a55
> 16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityScoreProvider: null
>
> Finally it threw a null pointer exception as follows
>
> 16:38:40,837 [Thread-3] INFO  source.ResourceLoader -  ... 1 files imported
> in 0 seconds
> 16:38:40,837 [Thread-3] INFO  source.ResourceLoader - Loding 0 File ...
> 16:38:40,837 [Thread-3] INFO  source.ResourceLoader -  ... 0 files imported
> in 0 seconds
> 16:38:42,912 [Thread-0] INFO  solryard.SolrYardIndexingDestination -    ...
> create SolrYard
> 16:38:42,959 [main] INFO  impl.IndexerImpl -  ... delete existing
> IndexedEntityId file
> C:\cygwin64\home\User\code\stanbol_indexing\indexing\destination\indexed-entities-ids.zip
> 16:38:42,974 [main] INFO  impl.IndexerImpl - Initialisation completed
> 16:38:42,974 [main] INFO  impl.IndexerImpl -   ... initialisation completed
> 16:38:42,974 [main] INFO  impl.IndexerImpl - start indexing ...
> 16:38:42,974 [main] INFO  impl.IndexerImpl - Indexing started ...
> Exception in thread "Indexing: Entity Source Reader Deamon"
> java.lang.NullPointerException
>         at java.lang.StringBuilder.<init>(Unknown Source)
>         at
> org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.parseEntityFormLine(LineBasedEntityIterator.java:435)
>         at
> org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.getNext(LineBasedEntityIterator.java:379)
>         at
> org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.hasNext(LineBasedEntityIterator.java:356)
>         at
> org.apache.stanbol.entityhub.indexing.core.impl.EntityIdBasedIndexingDaemon.run(EntityIdBasedIndexingDaemon.java:55)
>         at java.lang.Thread.run(Unknown Source)
>
> I'm not sure if this happens because I haven't configured an important
> property in a configuration file. I'm pretty new to Stanbol and any help
> would be much appreciated.
>
> Thanks in advance.
> --
> Regards
> Amindri Udugala


