DBpedia Spotlight
1. Shedding Light on the Web of Documents
DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. DBpedia Spotlight performs named entity extraction, including entity detection and Name Resolution (a.k.a. disambiguation). It can also be used for building your solution for Named Entity Recognition, amongst other information extraction tasks.
Text annotation has the potential of enhancing a wide range of applications, including search, faceted browsing and navigation. By connecting text documents with DBpedia, our system enables a range of interesting use cases. For instance, the ontology can be used as background knowledge to display complementary information on web pages or to enhance information retrieval tasks. Moreover, faceted browsing over documents and customization of web feeds based on semantics become feasible. Finally, by following links from DBpedia into other data sources, the Linked Open Data cloud is pulled closer to the Web of Documents.
Take a look at our Known Uses page for other examples of how DBpedia Spotlight can be used. If you use DBpedia Spotlight in your project, please add a link to it here. If you use it in a paper, please use the citation at the end of this page.
1. Online Access
You can try out DBpedia Spotlight through our Web Application or Web Service endpoints. The Web Application is a user interface that allows you to enter text in a form and generates an HTML annotated version of the text with links to DBpedia. The Web Service endpoints provide programmatic access to the demo, allowing you to retrieve data also in XML or JSON. Example calls are displayed below.
1.1. Web Application
1.2. Web Service
The available service endpoints are listed below and described in more detail in the User's Manual.
- http://spotlight.dbpedia.org/rest/annotate
- http://spotlight.dbpedia.org/rest/disambiguate
- http://spotlight.dbpedia.org/rest/candidates
The WADL (Web Application Description Language) file describing our endpoint is available at:
http://spotlight.dbpedia.org/rest/application.wadl
Example 1: Simple request
- text= President Obama called Wednesday on Congress to extend a tax break for students included in last year's economic stimulus package, arguing that the policy provides more generous assistance.
- confidence = 0.2; support=20
- whitelist all types.
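As a minimal sketch of how Example 1 could be issued programmatically, the Python snippet below builds (but does not send) a GET request to the annotate endpoint listed above. The parameter names `text`, `confidence` and `support` are taken from the example. Selecting the response format through the `Accept` header, and the helper name `build_annotate_request`, are assumptions made for illustration; check the User's Manual for the authoritative details.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as listed on this page.
ANNOTATE_ENDPOINT = "http://spotlight.dbpedia.org/rest/annotate"


def build_annotate_request(text, confidence=0.2, support=20,
                           accept="application/json"):
    """Build (but do not send) a GET request for the annotate endpoint.

    Hypothetical helper: the Accept header is assumed to be how the
    service chooses between its output formats.
    """
    query = urlencode({
        "text": text,              # urlencode percent-encodes the text
        "confidence": confidence,
        "support": support,
    })
    return Request(f"{ANNOTATE_ENDPOINT}?{query}", headers={"Accept": accept})


request = build_annotate_request(
    "President Obama called Wednesday on Congress to extend a tax break "
    "for students included in last year's economic stimulus package, "
    "arguing that the policy provides more generous assistance."
)
print(request.full_url)
```

To actually send the request, the object can be passed to `urllib.request.urlopen(request)`. The sketch stops short of that so it runs without network access.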
Example 2: Using SPARQL for filtering
This example demonstrates how to keep the annotations constrained to only politicians related to Chicago.
- text= President Obama called Wednesday on Congress to extend a tax break for students included in last year's economic stimulus package, arguing that the policy provides more generous assistance.
- confidence = 0.2; support=20
- whitelist sparql = SELECT DISTINCT ?politician WHERE { ?politician a <http://dbpedia.org/ontology/OfficeHolder> . ?politician ?related <http://dbpedia.org/resource/Chicago> }
Notice: Due to system resource restrictions, this demo uses only the first 2000 results returned for each query (the default for the public DBpedia SPARQL endpoint). For real-world use cases, you are welcome to download the software and data and install them on your own server.
Attention: Make sure to URL-encode your SPARQL query before adding it as the value of the sparql parameter (see java.net.URLEncoder.encode()).
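The encoding step can be sketched as follows. This is an illustrative Python counterpart to the java.net.URLEncoder.encode() call mentioned above, using the standard library's `urllib.parse.urlencode`. The query and the `sparql`, `confidence` and `support` parameters come from Example 2; the helper name `build_filtered_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as listed on this page.
ANNOTATE_ENDPOINT = "http://spotlight.dbpedia.org/rest/annotate"

# The whitelist query from Example 2.
SPARQL_WHITELIST = (
    "SELECT DISTINCT ?politician WHERE { "
    "?politician a <http://dbpedia.org/ontology/OfficeHolder> . "
    "?politician ?related <http://dbpedia.org/resource/Chicago> }"
)


def build_filtered_url(text, sparql, confidence=0.2, support=20):
    """Return an annotate URL whose sparql parameter is percent-encoded."""
    params = {
        "text": text,
        "confidence": confidence,
        "support": support,
        "sparql": sparql,  # '?', '<', '{' and spaces are escaped by urlencode
    }
    return f"{ANNOTATE_ENDPOINT}?{urlencode(params)}"


url = build_filtered_url("President Obama called Wednesday on Congress",
                         SPARQL_WHITELIST)

# Round-trip check: decoding the URL recovers the original query unchanged.
decoded = parse_qs(urlsplit(url).query)["sparql"][0]
assert decoded == SPARQL_WHITELIST
print(url)
```

Without the encoding, characters such as `?`, `<`, `{` and spaces in the query would be misread as part of the URL structure rather than as the parameter value.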
2. Documentation
We split the documentation according to the depth at which we give explanations. Please feel free to take a look at our:
- User's Manual, if you are not interested in details of how things happen, but you would like to use the system in your website or software project.
- Technical Documentation, if you want to have an overview of technical details before you go into the source code.
- Source code, if you really want to know every detail. Our source code is open, free and loves to meet new people. ;)
3. Downloads
If you are interested in running DBpedia Spotlight on your own server, or in joining our development effort, please check our download and installation instructions. DBpedia Spotlight is downloadable from its project page on Sourceforge. The latest Java / Scala source code is available from the project's Subversion repository and can be browsed online. The latest stable build is 0.5, but if you feel adventurous, feel free to try trunk. Since DBpedia Spotlight uses the entire Wikipedia in order to learn how to annotate DBpedia resources, the entire dataset cannot be distributed alongside the code; it can be downloaded in various sizes from the download page. A tiny dataset is included in the distribution for demonstration purposes only.
3.1. Quickstart
Download DBpedia Spotlight jar:
Download a default configuration file:
Download necessary data files:
wget
# Download tiny index
wget
tar zxvf index.tgz
# Download pos tagger model
wget
# Run the Server class in the jar
java -cp dbpedia-spotlight-0.5.jar org.dbpedia.spotlight.web.rest.Server server.properties
The files you've downloaded above contain only a very small subset of the DBpedia resources. They are used to demonstrate DBpedia Spotlight in a lightweight environment. Please see our downloads page for more information on other alternatives that are more useful in real world scenarios.
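Once a server is running locally, a client only needs a different base URL to reach the same three endpoints offered by the public service. The sketch below illustrates this in Python. The local address `http://localhost:2222/rest` is an assumption for illustration only: the real value is whatever your server.properties configures. The helper name `endpoint_url` is likewise hypothetical.

```python
# Endpoint names as listed for the public web service on this page.
ENDPOINT_NAMES = ("annotate", "disambiguate", "candidates")

# Public base URL from this page, and an ASSUMED local one
# (check server.properties for the address your server really uses).
PUBLIC_BASE_URL = "http://spotlight.dbpedia.org/rest"
LOCAL_BASE_URL = "http://localhost:2222/rest"


def endpoint_url(base_url, name):
    """Join a service base URL and one of the known endpoint names."""
    if name not in ENDPOINT_NAMES:
        raise ValueError(f"unknown endpoint: {name!r}")
    return f"{base_url.rstrip('/')}/{name}"


for name in ENDPOINT_NAMES:
    print(endpoint_url(LOCAL_BASE_URL, name))
```

Keeping the base URL in one place makes it easy to develop against the tiny demonstration index and later switch to a server loaded with a larger dataset.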
4. Licenses
The program can be used under the terms of the Apache License, 2.0.
Part of the code uses LingPipe under the Royalty Free License. Therefore, this license also applies to the output of the currently deployed web service.
The documentation on this website is shared under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
5. Citation
If you use this work in your research, please cite:
Pablo N. Mendes, Max Jakob, Andrés García-Silva and Christian Bizer. DBpedia Spotlight: Shedding Light on the Web of Documents. In the Proceedings of the 7th International Conference on Semantic Systems (I-Semantics). Graz, Austria, 7-9 September 2011.
title = {DBpedia Spotlight: Shedding Light on the Web of Documents},
author = {Pablo N. Mendes and Max Jakob and Andr\'{e}s Garc\'{i}a-Silva and Christian Bizer},
year = {2011},
booktitle = {Proceedings of the 7th International Conference on Semantic Systems (I-Semantics)},
abstract = {Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.}
}
The corpus used to evaluate DBpedia Spotlight in this work is described here.
6. Support and Feedback
The best way to get help with DBpedia Spotlight is to send a message to our mailing list at dbp-spotlight-users@lists.sourceforge.net.
You can also join the #dbpedia-spotlight IRC channel on Freenode.
We'd love it if you gave us some feedback.
7. Team
The DBpedia Spotlight team includes the names cited below. Individual contributions are acknowledged in the source code and publications.
7.1.1. Maintainers
Pablo Mendes (Freie Universität Berlin), Jun 2010-present.
Max Jakob (Freie Universität Berlin), Jun 2010-Sep 2011.
Jo Daiber (Charles University in Prague), Mar 2011-present.
Prof. Dr. Chris Bizer (Freie Universität Berlin), supervisor, Jun 2010-present.
7.1.2. Collaborators
Andrés García-Silva (Universidad Politécnica de Madrid), Jul-Dec 2010.
Rohana Rajapakse (Goss Interactive Ltd.), Oct-2011.
8. Acknowledgements
This work has been funded by:
- Neofonie GmbH, a Berlin-based company offering leading technologies in the area of Web search, social media and mobile applications (http://www.neofonie.de/). (Jun / 2010 to Jun / 2011)
- The European Commission through the project LOD2 – Creating Knowledge out of Linked Data (http://lod2.eu/). (Jun / 2010 to present)
Information
Last Modification:
2012-04-25 15:30:03 by Pablo Mendes