Improving worker safety with an oil & gas company
Data analysis
Data classification
Data matching



Our client was an oil & gas company with an annual revenue of EUR 36 billion and more than 25,000 employees. They prioritized safety by implementing an incident management tool that would allow them to monitor hazards and incidents.

A key goal of this initiative was reducing lost-time injuries.

The client had made initial progress, with a 56% reduction in lost-time injuries over six years, but they were not confident that their approach correctly identified the root causes of incidents.

If the current incident classifications were inaccurate, their safety efforts and KPIs could be focused on the wrong things, hindering their continued improvement.


Incident reports included a free-text description of the incident, as well as a list of options from which the reporter was supposed to select the appropriate classification for the incident.

This data was not normalized, and because the client operated internationally, incidents were reported in several different languages. The client did not have the capability to classify incident reports from the text description and compare the result against the reporter-selected classification to find discrepancies.


We leveraged our expertise in free-text classification to quickly develop a high-accuracy solution.

Automated language recognition identified the primary language of each text fragment. Data normalization then prepared the text for automated classification, including detecting spelling errors and removing word endings (stemming) so that words could be matched by their root.
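To make the normalization step more concrete, here is a minimal Python sketch, not the production pipeline. It assumes the text has already been identified as English, and the suffix list, the vocabulary, and the function names are all hypothetical, chosen only for illustration.

```python
import difflib
import re

# Hypothetical suffix list and vocabulary, for illustration only.
SUFFIXES = ("ing", "ed", "es", "s")
KNOWN_WORDS = ("worker", "slip", "ladder", "fall", "wet", "floor")


def strip_suffix(word):
    """Remove one common English ending, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def correct_spelling(word, vocabulary=KNOWN_WORDS):
    """Return the closest known word, or the word itself if none is close."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else word


def normalize(text):
    """Lowercase, split into alphabetic tokens, strip endings, fix near-misses."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [correct_spelling(strip_suffix(token)) for token in tokens]
```

With these assumptions, `normalize("Worker slipped on the wet flor")` maps "slipped" to "slip" and the misspelled "flor" to "floor". A real multilingual setup would need language-specific stemmers and dictionaries.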

An expert from our team and domain experts from the client performed an iterative exploration of the data, using techniques like tokenizing, statistical word counts, word combination/pattern detection, fuzzy search, and contextual data analysis.
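The exploration techniques named above can be sketched with the Python standard library alone. This is an illustrative sketch under simple assumptions: the sample reports are made up, and the function names are hypothetical rather than part of any tool used in the project.

```python
import difflib
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(reports):
    """Count how often each token appears across all reports."""
    counts = Counter()
    for report in reports:
        counts.update(tokenize(report))
    return counts


def bigram_counts(reports):
    """Count adjacent word pairs, a simple form of word-combination detection."""
    counts = Counter()
    for report in reports:
        tokens = tokenize(report)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def fuzzy_search(term, reports, cutoff=0.8):
    """Return reports containing a token that closely matches the term."""
    hits = []
    for report in reports:
        if difflib.get_close_matches(term, tokenize(report), n=1, cutoff=cutoff):
            hits.append(report)
    return hits


# Made-up sample reports, for illustration only.
SAMPLE_REPORTS = [
    "Worker slipped on wet floor near pump",
    "Wet floor caused a fall in the workshop",
    "Operator cut hand on sharp edge",
    "Slippery scafold, worker nearly fell",
]
```

On the sample data, `bigram_counts(SAMPLE_REPORTS)` surfaces ("wet", "floor") as a recurring pair, and `fuzzy_search("scaffold", SAMPLE_REPORTS)` finds the report with the misspelled "scafold".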

After the automated classification was complete, machine learning algorithms and cluster rules identified groups of records that described similar scenarios.
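As a rough illustration of grouping records that describe similar scenarios, the sketch below uses a simple rule: Jaccard similarity on token sets with a greedy, single-pass assignment. This is a stand-in chosen for brevity, not the machine learning method used in the project, and the threshold and function names are assumptions.

```python
import re


def token_set(text):
    """Return the set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of tokens the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_similar(reports, threshold=0.4):
    """Greedy single-pass grouping: each report joins the first group whose
    seed report is similar enough, otherwise it starts a new group."""
    groups = []  # list of (seed_tokens, [report indices])
    for index, report in enumerate(reports):
        tokens = token_set(report)
        for seed_tokens, members in groups:
            if jaccard(tokens, seed_tokens) >= threshold:
                members.append(index)
                break
        else:
            groups.append((tokens, [index]))
    return [members for _, members in groups]
```

The result is a list of index groups. The outcome depends on report order and on the threshold, which is one reason a production system would more likely use a proper clustering algorithm.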

Our data-first semantic analysis allowed classification of an incident from its unstructured text alone. We identified types of incidents that were routinely classified correctly, and types that were not consistently classified.
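One way to separate consistently classified incident types from inconsistent ones is to compare the reporter-selected label with the text-derived label for each record and compute an agreement rate per category. The sketch below assumes records are available as simple label pairs; the 0.8 cut-off and the function names are hypothetical.

```python
from collections import defaultdict


def agreement_by_category(records):
    """records: iterable of (reporter_label, text_label) pairs.
    Returns {text_label: share of records where the reporter agreed}."""
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for reporter_label, text_label in records:
        totals[text_label] += 1
        if reporter_label == text_label:
            agreed[text_label] += 1
    return {label: agreed[label] / totals[label] for label in totals}


def inconsistent_categories(records, minimum_agreement=0.8):
    """List the text-derived categories whose agreement rate is below the cut-off."""
    rates = agreement_by_category(records)
    return sorted(label for label, rate in rates.items() if rate < minimum_agreement)
```

Categories with a low agreement rate are candidates for clearer definitions or for merging, while high-agreement categories can be left as they are.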

Using a token-based classification module with a backtracking algorithm, we identified a new set of incident categories that more accurately reflected the actual incidents that occurred.
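The details of the classification module are not described here, so the following is only a sketch of how token-based matching with backtracking can work in general. It assumes each category is defined by an ordered token pattern whose tokens must appear within a limited gap of each other. The rule format, `max_gap`, and all names are hypothetical.

```python
def _match(pattern, tokens, p, lo, hi, max_gap):
    """Try to place pattern[p] somewhere in tokens[lo:hi], then the rest."""
    if p == len(pattern):
        return True
    for i in range(lo, min(hi, len(tokens))):
        if tokens[i] == pattern[p]:
            # Tentatively accept position i. If the remaining pattern cannot
            # be matched from here, the loop continues: that is the backtrack.
            if _match(pattern, tokens, p + 1, i + 1, i + 2 + max_gap, max_gap):
                return True
    return False


def pattern_matches(pattern, tokens, max_gap=2):
    """True if the pattern tokens occur in order, with at most max_gap
    other tokens between consecutive pattern tokens."""
    return _match(pattern, tokens, 0, 0, len(tokens), max_gap)


def classify(tokens, rules, max_gap=2):
    """Return every category whose pattern matches the token list."""
    return [name for name, pattern in rules.items()
            if pattern_matches(pattern, tokens, max_gap)]


# Hypothetical rules, for illustration only.
RULES = {
    "fall from height": ["fall", "ladder"],
    "hand injury": ["cut", "hand"],
}
```

Backtracking matters when a pattern token occurs more than once. In "fall hazard sign was noted before the fall from ladder", the first "fall" is too far from "ladder", so the matcher abandons it and succeeds with the second occurrence.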


We found that the free-text portion of an incident report was usually descriptive and effective, but the selected classification was often wrong, to varying degrees.

Some types of incidents were simply classified differently by different reporters. In many cases, however, the reporter selected the first classification option available, apparently without regard to its accuracy.

With increased insight into the types of safety incidents that occurred, and the ability to investigate their root causes, the client could set new KPIs that accurately reflected their safety improvement needs and begin working toward raising their safety standards even higher.

Are you facing any data challenges? Meet with one of our experts and find out how we can help.