[okfn-discuss] ANN: DBpedia - New version of the DBpedia dataset released.
Jonathan
jonathan.gray at okfn.org
Mon Sep 10 17:51:24 UTC 2007
Sören,
Thanks for posting this.
I noticed it on the DBpedia/LOD lists a few days ago and it looks brilliant!
I've just blogged it:
http://blog.okfn.org/2007/09/10/dbpedia-20/
All the best,
Jonathan
Sören Auer wrote:
> Hi all,
>
> we released a new version of the DBpedia datasets a few days ago (cf.
> Chris's announcement, attached). DBpedia might be one of the largest
> pieces of Open Knowledge - it comprises more than 100M facts extracted
> from Wikipedia and represented in RDF.
>
> --Sören
>
> ------------------------------------------------------------------------
>
> Subject: ANN: DBpedia - New version of the DBpedia dataset released.
> From: "Chris Bizer" <chris at bizer.de>
> Date: Wed, 5 Sep 2007 17:48:29 +0200
> To: <semantic-web at w3.org>, <dbpedia-discussion at lists.sourceforge.net>,
> "Linking Open Data" <linking-open-data at simile.mit.edu>
>
> Hi all,
>
> after putting quite some work into improving the DBpedia information
> extraction framework, we have released a new version of the DBpedia
> dataset today.
>
> DBpedia is a community effort to extract structured information from
> Wikipedia and to make this information available on the Web. DBpedia
> allows you to ask sophisticated queries against Wikipedia and to link
> other datasets on the Web to Wikipedia data.
>
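> For example, here is a minimal sketch of such a query in Python, using
> only the standard library. The endpoint URL, the birthPlace property
> and the "format" parameter are assumptions for illustration, not taken
> from this announcement:
>
>     import json
>     import urllib.parse
>     import urllib.request
>
>     # A simple SPARQL query: things whose birth place is Berlin. The
>     # property URI is assumed for illustration.
>     query = """
>     SELECT ?person WHERE {
>         ?person <http://dbpedia.org/property/birthPlace>
>                 <http://dbpedia.org/resource/Berlin> .
>     }
>     LIMIT 10
>     """
>
>     # Ask the endpoint for JSON results over a plain HTTP GET. The
>     # "format" parameter is how Virtuoso-based endpoints commonly
>     # select a result serialization; an Accept header also works.
>     params = urllib.parse.urlencode({
>         "query": query,
>         "format": "application/sparql-results+json",
>     })
>     with urllib.request.urlopen("http://dbpedia.org/sparql?" + params) as resp:
>         results = json.load(resp)
>
>     for binding in results["results"]["bindings"]:
>         print(binding["person"]["value"])
>
> The same pattern works for any SELECT query against the endpoint.
>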
> The DBpedia dataset describes 1,950,000 "things", including at least
> 80,000 persons, 70,000 places, 35,000 music albums and 12,000 films. It
> contains 657,000 links to images, 1,600,000 links to relevant external
> web pages and 440,000 external links into other RDF datasets.
> Altogether, the DBpedia dataset consists of around 103 million RDF
> triples.
>
> The dataset has been extracted from the July 2007 dumps of the
> English, German, French, Spanish, Italian, Portuguese, Polish,
> Swedish, Dutch, Japanese, Chinese, Russian, Finnish and Norwegian
> versions of Wikipedia. It contains descriptions in all of these languages.
>
> Compared to the previous version, we made the following changes:
>
> 1. Improved the Data Quality
>
> We increased the quality of the data by improving the DBpedia
> information extraction algorithms. So if you decided that the old
> version of the dataset was too dirty for your application, please look
> again; you will be surprised :-)
>
> 2. Third Classification Schema Added
>
> We have added a third classification schema to the dataset. Besides
> the Wikipedia categorization and the YAGO classification, concepts are
> now also classified by associating them with WordNet synsets.
>
> 3. Geo-Coordinates
>
> The dataset contains geo-coordinates for geographic locations.
> Geo-coordinates are expressed using the W3C Basic Geo Vocabulary. This
> enables location-based SPARQL queries.
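>
> As a rough illustration, such a location-based query could select
> everything whose coordinates fall inside a bounding box around Berlin.
> This sketch uses the geo:lat and geo:long properties of the Basic Geo
> Vocabulary; the bounding-box values are illustrative:
>
>     # SPARQL query string only; run it against the endpoint exactly as
>     # in the earlier Python sketch.
>     geo_query = """
>     PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
>     SELECT ?thing ?lat ?long WHERE {
>         ?thing geo:lat ?lat ;
>                geo:long ?long .
>         FILTER (?lat > 52.3 && ?lat < 52.7 &&
>                 ?long > 13.1 && ?long < 13.7)
>     }
>     LIMIT 20
>     """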
>
> 4. RDF Links to other Open Datasets
>
> We interlinked DBpedia with further open datasets and ontologies. The
> dataset now contains 440,000 external RDF links into the Geonames,
> Musicbrainz, WordNet, World Factbook, EuroStat, Book Mashup, DBLP
> Bibliography and Project Gutenberg datasets. Altogether, the network
> of interlinked data sources around DBpedia currently amounts to around
> 2 billion RDF triples, which are accessible as Linked Data on the Web.
>
> The DBpedia dataset is licensed under the terms of the GNU Free
> Documentation License. The dataset can be accessed online via a SPARQL
> endpoint and as Linked Data. It can also be downloaded in the form of
> RDF dumps.
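>
> As a minimal sketch of the Linked Data access path: fetch a single
> resource URI with an Accept header asking for RDF/XML. The resource
> URI is illustrative, and we assume the server performs the usual
> Linked Data content negotiation:
>
>     import urllib.request
>
>     # The Accept header asks the server for RDF/XML instead of the
>     # human-readable HTML page; urlopen follows the resulting redirect.
>     req = urllib.request.Request(
>         "http://dbpedia.org/resource/Berlin",
>         headers={"Accept": "application/rdf+xml"},
>     )
>     with urllib.request.urlopen(req) as resp:
>         print(resp.read().decode("utf-8")[:500])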
>
> Please refer to the DBpedia webpage for more information about the
> dataset and its use cases:
>
> http://dbpedia.org/
>
> Many thanks to the following people for their excellent work:
>
> 1. Georgi Kobilarov (Freie Universität Berlin) who redesigned and
> improved the extraction framework and implemented many of the
> interlinking algorithms.
> 2. Piet Hensel (Freie Universität Berlin) who improved the infobox
> extraction code and wrote the unit test suite.
> 3. Richard Cyganiak (Freie Universität Berlin) for his advice on
> redesigning the architecture of the extraction framework and for
> helping to solve many annoying Unicode and URI problems.
> 4. Zdravko Tashev (OpenLink Software) for his patience in trying
> several times to import buggy versions of the dataset into Virtuoso.
> 5. OpenLink Software as a whole for providing the server that hosts
> the DBpedia SPARQL endpoint.
> 6. Sören Auer, Jens Lehmann and Jörg Schüppel (Universität Leipzig)
> for the original version of the infobox extraction code.
> 7. Tom Heath and Peter Coetzee (Open University) for the RDFS version
> of the YAGO class hierarchy.
> 8. Fabian M. Suchanek and Gjergji Kasneci (Max-Planck-Institut
> Saarbrücken) for allowing us to integrate the YAGO classification.
> 9. Christian Becker (Freie Universität Berlin) for writing the
> geo-coordinates and homepage extractors.
> 10. Ivan Herman, Tim Berners-Lee, Rich Knopman and many others for
> their bug reports.
>
> Have fun exploring the new dataset :-)
>
> Cheers
>
> Chris
>
> --
> Chris Bizer
> Freie Universität Berlin
> Phone: +49 30 838 54057
> Mail: chris at bizer.de
> Web: www.bizer.de
>
>
>
> ------------------------------------------------------------------------