Welcome to semanticizest’s documentation!

semanticizest (Semanticizer, STandalone) is a library for entity linking (also known as wikification.)

Quick usage

First we need to create a model for the semanticizer. The following command will download and read a wikipedia dump (in this case the Limburgish wiki) and subsequently create and store the corresponding model.

python -m semanticizest.parse_wikidump --download liwiki liwiki.model

Import the required modules:

>>> import re
>>> from semanticizest import Semanticizer

Load the model from disk:

>>> sem = Semanticizer('liwiki.model')

Set up a piece of sample text and tokenize it:

>>> text = """'ne Donjon is 'ne middeleëuwse, zjwaore, verdedigingstaore,
... meistal geboewd óp 'ne hoage heuvel, de opperhaof, dae deil oetmaak van 'n
... motte."""
>>> tokens = re.findall('\w+', text)

Feed the tokens to the semanticizer to get the entity link candidates:

>>> for cand in sem.all_candidates(tokens):
...     print cand
(7, 8, u'Taore (boewwerk)', 1.0)
(13, 14, u'Heuvel', 1.0)
(15, 16, u'Opperhaof', 1.0)
(21, 22, u'Motte', 1.0)

As we can see, It finds four entity candidates in this short text. The first entity found is ‘Taore (boewwerk)’, corresponding to the seventh token: ‘verdedigingstaore’.

Indices and tables

Table Of Contents

Next topic

API Reference

This Page