
Posted to users@jena.apache.org by Paolo Castagna <ca...@googlemail.com> on 2012/04/27 12:52:05 UTC

CrunchBase data in RDF?

CrunchBase [1] (I am sure you all know it) is an interesting website (with APIs).
It is a great source of information about companies, people, financial
organizations, service providers, funding rounds and acquisitions. The website
is already very useful as it is if you want to look up a company or browse
around. However, it is not possible, for example, to search for trends in
funding, movements of people between companies, etc.

They have an API and all data is available in JSON format. It's quite easy to
crawl and extract what you want. A conversion of this data to RDF would be quite
useful to people wanting to do some CrunchBase data mining/analysis.

I started writing a crunchbase2rdf crawler/conversion tool using Apache Jena (of
course!) and JSoup. The main code for crawling and converting the data is there,
however it is incomplete and just an initial hack.

Help with data modeling, suggestions on RDF vocabularies to use (other than FOAF,
DC, ...) and more RDFExtractors are all welcome, which is why I am posting this
message to the jena-users mailing list.

An RDFExtractor is very easy to write. Here is one:

public class TwitterRdfExtractor extends AbstractRdfExtractor {
	public TwitterRdfExtractor() { super("twitter_username"); }
	@Override
	public Model extract ( Resource subject, JSON json ) {
		Model model = ModelFactory.createDefaultModel();
		Object object = json.object().get(name());
		if ( object != null ) {
			String username = object.toString().trim();
			if ( username.length() > 0 ) {
				model.add(subject, ResourceFactory.createProperty(Run.CRUNCHBASE_NS, name()), username);
			}
		}
		return model;
	}
}

The crawler will automatically trigger the execution of this if the JSON
document has a field named "twitter_username". Maybe this is overcomplicated
and something easier/simpler is better.
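
The field-name dispatch described above could be sketched roughly as follows.
This is not the crunchbase2rdf code: FieldExtractor, ExtractorRegistry and the
example.org namespace are hypothetical names made up for illustration, the JSON
document is modelled as a plain Map, and triples are emitted as N-Triples-like
strings so the sketch runs without Jena.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an extractor: one JSON field name, one conversion rule. */
interface FieldExtractor {
    String fieldName();
    List<String> extract(String subject, Object value);
}

/** Hypothetical registry that triggers extractors by JSON field name. */
class ExtractorRegistry {
    // Assumed namespace, for illustration only.
    static final String NS = "http://example.org/crunchbase/";

    private final Map<String, FieldExtractor> byField = new LinkedHashMap<>();

    void register(FieldExtractor extractor) {
        byField.put(extractor.fieldName(), extractor);
    }

    /** Runs every registered extractor whose field is present (and non-null) in the JSON object. */
    List<String> convert(String subject, Map<String, Object> json) {
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, Object> field : json.entrySet()) {
            FieldExtractor extractor = byField.get(field.getKey());
            if (extractor != null && field.getValue() != null) {
                triples.addAll(extractor.extract(subject, field.getValue()));
            }
        }
        return triples;
    }

    /** Extractor that emits one literal triple for a non-empty field (no literal escaping). */
    static FieldExtractor literal(String name) {
        return new FieldExtractor() {
            public String fieldName() { return name; }
            public List<String> extract(String subject, Object value) {
                List<String> out = new ArrayList<>();
                String text = value.toString().trim();
                if (!text.isEmpty()) {
                    out.add("<" + subject + "> <" + NS + name + "> \"" + text + "\" .");
                }
                return out;
            }
        };
    }
}
```

Fields without a registered extractor are simply skipped, so coverage can grow
one extractor at a time.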

Do you have a generic JSON to RDF conversion code in Java?

Of course, in an ideal world CrunchBase would publish a data dump or a public
SQL/SPARQL (or any other query language they choose) endpoint, so that
interested people could explore their data as they wish.

Last but not least, see also:

 - http://bnode.org/blog/2008/07/29/semantic-web-by-example-semantic-crunchbase
 - http://cb.semsol.org/ (yep... not there, unfortunately)

Paolo

PS:
Benji, you should really resurrect Semantic CrunchBase and have time to work on
it. ;-)

 [1] http://www.crunchbase.com/

Re: CrunchBase data in RDF?

Posted by Paolo Castagna <ca...@googlemail.com>.

Andy Seaborne wrote:
> 
>> Do you have a generic JSON to RDF conversion code in Java?
> 
> Have you tried using JSON-LD to "upgrade" to RDF?
> 
> https://github.com/tristan/jsonld-java

Hi Andy,
interesting, I did not know about this project.

I gave it a try:

  Model model = ...
  InputStream inputStream = ...
  Object jsonObject = JSONUtils.fromInputStream(inputStream);
  JSONLDProcessor processor = new JSONLDProcessor();
  JenaTripleCallback callback = new JenaTripleCallback();
  callback.setJenaModel(model);
  processor.triples(jsonObject, callback);

Thanks for the link.

Paolo

> 
>     Andy
> 
>

Re: CrunchBase data in RDF?

Posted by Andy Seaborne <an...@apache.org>.

> Do you have a generic JSON to RDF conversion code in Java?

Have you tried using JSON-LD to "upgrade" to RDF?

https://github.com/tristan/jsonld-java

	Andy

Re: CrunchBase data in RDF?

Posted by Paolo Castagna <ca...@googlemail.com>.

Paolo Castagna wrote:
> I started writing a crunchbase2rdf [2] crawler/conversion tool using Apache Jena (of
> course!) and JSoup. The main code for crawling and converting the data is there,
> however it is incomplete and just an initial hack.

Oops, sorry... missing link! :-)

Paolo

 [2] https://github.com/castagna/crunchbase2rdf