Automatic text analysis in the pharmaceutical industry

German pharmaceutical industry

Effective work in the pharmaceutical industry often relies on applying new technologies to the huge amounts of available data, both to find the best ways of developing pharmaceuticals and to analyze feedback from patients and medical professionals so that their needs can be answered in a shorter time. This data often contains sensitive health information, so medical texts must be effectively de-identified before they are used in further pharmaceutical AI scenarios.


Our NLP experts have been supporting many R&D and production scenarios on a daily basis, such as:

• de-identification of medical notes, allowing further data analysis without additional permissions,
• finding new adverse drug events in huge volumes of social media and internet posts,
• tracking and analyzing the sentiment of public posts during a new drug's market launch, as well as detecting the context in which particular groups discuss it,
• automatic detection of medical entities in texts, such as drug names or doses,
• management and optimization of processing pipelines in a production big data environment.
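
To give a rough idea of what de-identification can look like at its simplest, here is a minimal rule-based sketch in Python. It is an illustration only and not the approach used in the projects described here: the patterns, the placeholder labels and the `deidentify` function are all hypothetical, and a real system would also have to cover names, addresses and identifiers, usually with trained models and in several languages.

```python
import re

# Hypothetical patterns for illustration only. They cover a few obvious
# identifier formats and are far from complete.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("DATE", re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{2,4}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s/-]{7,}\d")),
]


def deidentify(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Patient seen on 12.03.2021, contact anna@example.com or +49 170 1234567."
    print(deidentify(note))
```

Dates are replaced before phone numbers so that the looser phone pattern cannot consume part of a date. Short numeric values such as doses are left untouched because the phone pattern requires a longer run of digits.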


During the projects, we worked with a multitude of languages, including German, English, French, Italian and Spanish, which required building advanced preprocessing and NLP algorithms able to handle all of them. Social media and online communication often contain many abbreviations and colloquialisms, which, combined with specialized medical language, posed another challenge that we had to address.