Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2013/11/22 12:18:35 UTC

[jira] [Resolved] (STANBOL-1214) Fix for fbranking.sh script

     [ https://issues.apache.org/jira/browse/STANBOL-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-1214.
------------------------------------------

    Resolution: Fixed

Tested the new script with "freebase-rdf-2013-11-17-00-00.gz". Works perfectly.

Here are the first ten lines of the produced incoming_links.txt file:

8207306 m.0kpv11
3465907 m.019h
2667804 m.0775xvm
1889043 m.01xryvm
1606734 m.09jd9nh
1522976 m.05zppz
1340895 m.0jst35z
1334458 m.0g4g
1191781 m.04l
1077343 m.09gn



> Fix for fbranking.sh script
> ---------------------------
>
>                 Key: STANBOL-1214
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1214
>             Project: Stanbol
>          Issue Type: Bug
>            Reporter: Viktor Gal
>            Assignee: Rupert Westenthaler
>
> The format of the Freebase dump has changed: the dumps now contain full URIs, so the incoming-link counting in fbranking.sh is obsolete. Here is a quick fix for the new dump format:
> gunzip -c db/freebase-rdf-2013-11-17.gz \
> | grep "^<http://rdf.freebase.com/ns/m\..*<.*>.*<http://rdf.freebase.com/ns/m\." \
> | cut -f 3 \
> | sed 's/.*\/ns\/\(.*\)>/\1/g' \
> | sort -S "$MAX_SORT_MEM" \
> | uniq -c \
> | sort -nr -S "$MAX_SORT_MEM" > "$INCOMING_FILE"
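
The pipeline above can be tried on a tiny input before running it against the full dump. The following is only a sketch under stated assumptions: the four sample triples, the MIDs (m.aaa, m.bbb, m.ccc), the predicate names, and the helper functions sample and count_incoming_links are made up for illustration and are not part of fbranking.sh. The -S option is left out because the sample is tiny.

```shell
#!/bin/sh
# Namespace prefix used by the full-URI dump format.
NS='http://rdf.freebase.com/ns'

# Hypothetical sample data: tab-separated triples in the new format.
# Three triples link one topic to another; the last one has a literal
# object and must not be counted.
sample() {
  printf '<%s/m.aaa>\t<%s/p.link>\t<%s/m.bbb>\t.\n' "$NS" "$NS" "$NS"
  printf '<%s/m.ccc>\t<%s/p.link>\t<%s/m.bbb>\t.\n' "$NS" "$NS" "$NS"
  printf '<%s/m.bbb>\t<%s/p.link>\t<%s/m.aaa>\t.\n' "$NS" "$NS" "$NS"
  printf '<%s/m.aaa>\t<%s/p.name>\t"Example"@en\t.\n' "$NS" "$NS"
}

# Same steps as the quick fix: keep topic-to-topic triples, take the
# object column, strip the namespace and the closing '>', then count
# occurrences and sort by count, highest first.
count_incoming_links() {
  grep "^<$NS/m\..*<.*>.*<$NS/m\." \
    | cut -f 3 \
    | sed 's/.*\/ns\/\(.*\)>/\1/' \
    | sort \
    | uniq -c \
    | sort -nr
}

sample | count_incoming_links
```

With this sample, m.bbb is listed first with a count of 2, followed by m.aaa with a count of 1. The exact padding in front of the counts depends on the uniq implementation.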



--
This message was sent by Atlassian JIRA
(v6.1#6144)