In this data release, we provide the first annotations of referentially grounded texts, so-called reference texts, with referential relations to structured data and FrameNet-style semantic roles. The annotations target specific event types and the documents that refer to the same event instances of a given type, e.g. “mass shooting”, “disease outbreak”, “auto races”. The data consists of:

  • structured data derived from Wikidata, in JSON following the SEM format
  • annotated reference texts in English and Dutch, in XML following the NAF format
  • a Dutch FrameNet lexicon in JSON format
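
As a rough illustration of how the NAF files might be inspected, the sketch below parses a small, hand-written NAF fragment with Python's standard library and lists its tokens and frame labels. The fragment is hypothetical and heavily simplified: the element names (wf, term, predicate, externalRef) follow the general NAF layout, but the attribute values, the "FrameNet" resource label, and the helper name frames_in_naf are assumptions for illustration and may not match the files in this release.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified NAF fragment (NOT taken from the release).
# Real files are expected to contain more layers and attributes.
NAF_SAMPLE = """<NAF xml:lang="en" version="v3.1">
  <text>
    <wf id="w1" sent="1">The</wf>
    <wf id="w2" sent="1">shooting</wf>
    <wf id="w3" sent="1">occurred</wf>
  </text>
  <terms>
    <term id="t2" lemma="shooting" pos="NOUN">
      <span><target id="w2"/></span>
    </term>
  </terms>
  <srl>
    <predicate id="pr1">
      <externalReferences>
        <externalRef resource="FrameNet" reference="Killing"/>
      </externalReferences>
      <span><target id="t2"/></span>
    </predicate>
  </srl>
</NAF>"""

# The xml:lang attribute lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def frames_in_naf(root):
    """Return (predicate id, frame label) pairs from the srl layer.

    Assumes frame labels are stored as externalRef elements whose
    'resource' attribute equals "FrameNet" (an assumption of this sketch).
    """
    pairs = []
    for predicate in root.iterfind("./srl/predicate"):
        for ref in predicate.iterfind("./externalReferences/externalRef"):
            if ref.get("resource") == "FrameNet":
                pairs.append((predicate.get("id"), ref.get("reference")))
    return pairs


root = ET.fromstring(NAF_SAMPLE)
language = root.get(XML_LANG)                         # document language
tokens = [wf.text for wf in root.iterfind("./text/wf")]  # token layer
frames = frames_in_naf(root)                          # frame annotations

print(language, tokens, frames)
```

To apply the same idea to a file from the release, ET.parse(path).getroot() can replace ET.fromstring(NAF_SAMPLE); the paths and attribute checks would need adjusting to the actual files.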

The annotated output consists of 214 annotated reference texts, 172 Dutch and 42 English. 18960 mentions were annotated with 6066 tokens of 1973 different lexical units, covering 486 different frames (avg. 12.5 annotations per frame). To enable correct frame annotation, 1205 (19.8%) mentions received a markable correction (avg. 5.5 per text): 393 multi-words and 812 compounds. In addition, 5068 (26.7%) of these mentions were annotated with instance links.

The data can be used freely under the Creative Commons CC BY 4.0 license. 

Have you ever wondered how the same event is described in different languages? Then this dataset might be useful to you.
From Wikidata, we’ve selected 25 event types, e.g., military operation (see Section 6 of the paper for more information).
In total, we collected 19,979 Wikidata items that belong to these 25 event types.
For each Wikidata item, we attempted to retrieve the first paragraph of the Wikipedia page describing the Wikidata item.
We included English, Italian, and Dutch texts, which we processed using various NLP systems.
We also represent structured data about each Wikidata item, which facilitates research into the framing of events.

