[okfn-discuss] ANN: DBpedia - New version of the DBpedia dataset released.
Jonathan
jonathan.gray at okfn.org
Mon Sep 10 17:51:24 UTC 2007
Sören,
Thanks for posting this.
I noticed it on the DBpedia/LOD lists a few days ago and it looks brilliant!
I've just blogged it:
http://blog.okfn.org/2007/09/10/dbpedia-20/
All the best,
Jonathan
Sören Auer wrote:
> Hi all,
>
> we released a new version of the DBpedia datasets a few days ago (cf.
> Chris's announcement, attached). DBpedia might be one of the largest
> pieces of Open Knowledge - it comprises more than 100M facts extracted
> from Wikipedia and represented in RDF.
>
> --Sören
>
> ------------------------------------------------------------------------
>
> Subject: ANN: DBpedia - New version of the DBpedia dataset released.
> From: "Chris Bizer" <chris at bizer.de>
> Date: Wed, 5 Sep 2007 17:48:29 +0200
> To: <semantic-web at w3.org>, <dbpedia-discussion at lists.sourceforge.net>,
> "Linking Open Data" <linking-open-data at simile.mit.edu>
>
> Hi all,
>
> after putting quite some work into improving the DBpedia information
> extraction framework, we have released a new version of the DBpedia
> dataset today.
>
> DBpedia is a community effort to extract structured information from
> Wikipedia and to make this information available on the Web. DBpedia
> allows you to ask sophisticated queries against Wikipedia and to link
> other datasets on the Web to Wikipedia data.
>
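> For example, here is a minimal sketch of such a query in Python, using
> only the standard library. The endpoint URL, the birthPlace property
> and the "format" parameter are assumptions for illustration, not taken
> from this announcement:
>
>     import json
>     import urllib.parse
>     import urllib.request
>
>     # A simple SPARQL query: things whose birth place is Berlin. The
>     # property URI is assumed for illustration.
>     query = """
>     SELECT ?person WHERE {
>         ?person <http://dbpedia.org/property/birthPlace>
>                 <http://dbpedia.org/resource/Berlin> .
>     }
>     LIMIT 10
>     """
>
>     # Ask the endpoint for JSON results over a plain HTTP GET. The
>     # "format" parameter is how Virtuoso-based endpoints commonly
>     # select a result serialization; an Accept header also works.
>     params = urllib.parse.urlencode({
>         "query": query,
>         "format": "application/sparql-results+json",
>     })
>     with urllib.request.urlopen("http://dbpedia.org/sparql?" + params) as resp:
>         results = json.load(resp)
>
>     for binding in results["results"]["bindings"]:
>         print(binding["person"]["value"])
>
> The same pattern works for any SELECT query against the endpoint.
>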
> The DBpedia dataset describes 1,950,000 "things", including at least
> 80,000 persons, 70,000 places, 35,000 music albums and 12,000 films. It
> contains 657,000 links to images, 1,600,000 links to relevant external
> web pages and 440,000 external links into other RDF datasets.
> Altogether, the DBpedia dataset consists of around 103 million RDF
> triples.
>
> The dataset has been extracted from the July 2007 dumps of the
> English, German, French, Spanish, Italian, Portuguese, Polish,
> Swedish, Dutch, Japanese, Chinese, Russian, Finnish and Norwegian
> versions of Wikipedia. It contains descriptions in all of these languages.
>
> Compared to the previous version, we made the following changes:
>
> 1. Improved the Data Quality
>
> We increased the quality of the data by improving the DBpedia
> information extraction algorithms. So if you decided that the old
> version of the dataset was too dirty for your application, please look
> again; you will be surprised :-)
>
> 2. Third Classification Schema Added
>
> We have added a third classification schema to the dataset. Besides
> the Wikipedia categorization and the YAGO classification, concepts are
> now also classified by associating them with WordNet synsets.
>
> 3. Geo-Coordinates
>
> The dataset contains geo-coordinates for geographic locations.
> Geo-coordinates are expressed using the W3C Basic Geo Vocabulary. This
> enables location-based SPARQL queries.
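>
> As a rough illustration, such a location-based query could select
> everything whose coordinates fall inside a bounding box around Berlin.
> This sketch uses the geo:lat and geo:long properties of the Basic Geo
> Vocabulary; the bounding-box values are illustrative:
>
>     # SPARQL query string only; run it against the endpoint exactly as
>     # in the earlier Python sketch.
>     geo_query = """
>     PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
>     SELECT ?thing ?lat ?long WHERE {
>         ?thing geo:lat ?lat ;
>                geo:long ?long .
>         FILTER (?lat > 52.3 && ?lat < 52.7 &&
>                 ?long > 13.1 && ?long < 13.7)
>     }
>     LIMIT 20
>     """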
>
> 4. RDF Links to other Open Datasets
>
> We interlinked DBpedia with further open datasets and ontologies. The
> dataset now contains 440,000 external RDF links into the Geonames,
> Musicbrainz, WordNet, World Factbook, EuroStat, Book Mashup, DBLP
> Bibliography and Project Gutenberg datasets. Altogether, the network
> of interlinked data sources around DBpedia currently amounts to around
> 2 billion RDF triples, which are accessible as Linked Data on the Web.
>
> The DBpedia dataset is licensed under the terms of the GNU Free
> Documentation License. The dataset can be accessed online via a SPARQL
> endpoint and as Linked Data. It can also be downloaded in the form of
> RDF dumps.
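>
> As a minimal sketch of the Linked Data access path: fetch a single
> resource URI with an Accept header asking for RDF/XML. The resource
> URI is illustrative, and we assume the server performs the usual
> Linked Data content negotiation:
>
>     import urllib.request
>
>     # The Accept header asks the server for RDF/XML instead of the
>     # human-readable HTML page; urlopen follows the resulting redirect.
>     req = urllib.request.Request(
>         "http://dbpedia.org/resource/Berlin",
>         headers={"Accept": "application/rdf+xml"},
>     )
>     with urllib.request.urlopen(req) as resp:
>         print(resp.read().decode("utf-8")[:500])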
>
> Please refer to the DBpedia webpage for more information about the
> dataset and its use cases:
>
> http://dbpedia.org/
>
> Many thanks to the following people for their excellent work:
>
> 1. Georgi Kobilarov (Freie Universität Berlin) who redesigned and
> improved the extraction framework and implemented many of the
> interlinking algorithms.
> 2. Piet Hensel (Freie Universität Berlin) who improved the infobox
> extraction code and wrote the unit test suite.
> 3. Richard Cyganiak (Freie Universität Berlin) for his advice on
> redesigning the architecture of the extraction framework and for
> helping to solve many annoying Unicode and URI problems.
> 4. Zdravko Tashev (OpenLink Software) for his patience in trying
> several times to import buggy versions of the dataset into Virtuoso.
> 5. OpenLink Software as a whole for providing the server that hosts
> the DBpedia SPARQL endpoint.
> 6. Sören Auer, Jens Lehmann and Jörg Schüppel (Universität Leipzig)
> for the original version of the infobox extraction code.
> 7. Tom Heath and Peter Coetzee (Open University) for the RDFS version
> of the YAGO class hierarchy.
> 8. Fabian M. Suchanek and Gjergji Kasneci (Max-Planck-Institut
> Saarbrücken) for allowing us to integrate the YAGO classification.
> 9. Christian Becker (Freie Universität Berlin) for writing the
> geo-coordinates and homepage extractors.
> 10. Ivan Herman, Tim Berners-Lee, Rich Knopman and many others for
> their bug reports.
>
> Have fun exploring the new dataset :-)
>
> Cheers
>
> Chris
>
> --
> Chris Bizer
> Freie Universität Berlin
> Phone: +49 30 838 54057
> Mail: chris at bizer.de
> Web: www.bizer.de
>
>
>
> ------------------------------------------------------------------------