Investment component: data-to-text

The investment component subproject applies the data-to-text method to collect texts of different genres and from different sources that describe similar situations. This method was developed in the NWO Spinoza project Understanding Language by Machines and was used to organize SemEval-2018 Task 5 on counting events and participants. The method starts from registries of real-world events that hold structured data and looks for so-called reference texts that report on those specific events. Many such registries exist: manually curated databases containing thousands of incidents of different types, such as calamities, crime, sports, economic transactions, and labor conflicts. In some cases, these databases also include collections of news articles and blogs that report specifically on these events. This allows us to create formal representations of the events as “microworlds” and pair them with reference texts. A microworld represents the minimal referential data that captures the implications of an event, e.g. what type of event happened to whom, when, and where. The reference texts (news, blogs) mention at least some aspects of these microworlds and exhibit the typical ways of framing these events. For the SemEval-2018 task, thousands of incidents involving gun violence, fires (burning), and people being fired from their jobs were collected, each together with the reference texts on that specific incident.
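To make the pairing of microworlds and reference texts concrete, the sketch below shows one possible way to represent it in Python. It is only an illustration: the class names, field names, and the `pair_with_texts` helper are assumptions made for this example and do not reflect the actual data format used in the project or in the SemEval-2018 task.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Microworld:
    """Minimal referential data for one registered incident (hypothetical schema)."""
    incident_id: str
    event_type: str                 # what type of event, e.g. "fire"
    participants: tuple[str, ...]   # to whom it happened
    when: date                      # when it happened
    where: str                      # where it happened


@dataclass(frozen=True)
class ReferenceText:
    """A news article or blog post that reports on one specific incident."""
    incident_id: str
    genre: str                      # e.g. "news" or "blog"
    text: str


def pair_with_texts(microworlds, texts):
    """Map each incident id to its microworld and the texts that report on it.

    Texts whose incident id is not in the registry are ignored.
    """
    pairs = {mw.incident_id: (mw, []) for mw in microworlds}
    for ref in texts:
        if ref.incident_id in pairs:
            pairs[ref.incident_id][1].append(ref)
    return pairs
```

With such a structure, the same microworld can be linked to several texts of different genres, which is what makes it possible to compare how one event is framed across sources.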