Automating GDPR-compliant data anonymization for market research


PROFESSIONAL SERVICES · DATA PRIVACY & GDPR

Automating GDPR-compliant data anonymization for market research at scale

InSaaS.ai processes massive volumes of textual data from social media, online forums, and customer sources for market research and product planning. Every dataset potentially contains personally identifiable information that must be removed before analysis – manually reviewing it all was impossible at their scale. We integrated ShareMedix as an anonymization API that automatically detects and masks PII across their entire data pipeline.

Client: InSaaS.ai – a data analytics company specializing in marketing, market research, and product development insights.

KEY RESULTS

GDPR: Full compliance with European data privacy regulations

API: Integrated directly into existing data pipelines

Parallel: Handles multiple concurrent anonymization requests at scale

Auto: Names, addresses, phone numbers, IBANs detected and masked

INDUSTRY: Professional Services / Market Research

USE CASE: PII anonymization in text data

AI APPROACH: NLP-based PII detection

DATA SOURCES: Social media, forums, customer data

INTEGRATION: API (parallel processing)

PRODUCT: ShareMedix

The challenge

InSaaS.ai builds data analytics solutions for marketing, market research, and product development. Their work depends on analyzing large volumes of textual data – social media posts, online forum discussions, customer feedback, and internal datasets. The insights hidden in this data are valuable, but the data itself is full of personally identifiable information: names, addresses, phone numbers, IBANs, and other sensitive details.

Under GDPR, this data cannot be processed or analyzed in its raw form. Every piece of PII must be detected and removed before the data enters InSaaS.ai’s analytics pipeline. Doing this manually was not an option – the volume of data was far too large, the variety of PII types too broad, and the regulatory stakes too high.

The core problem: InSaaS.ai needed to make vast amounts of textual data GDPR-compliant before it could be analyzed – but the volume and variety of personally identifiable information made manual anonymization impossible at their scale.

What we built

We integrated ShareMedix – our NLP-powered data anonymization engine – directly into InSaaS.ai’s data processing pipeline via API. The system automatically detects and masks PII in textual data before it reaches the analytics layer.

Automatic PII detection across data types. ShareMedix identifies names, addresses, phone numbers, IBANs, email addresses, and other personally identifiable information using NLP techniques such as named entity recognition. The system handles data from diverse sources – social media posts written in informal language, structured customer records, and everything in between.
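To make the detect-and-mask idea concrete, here is a minimal Python sketch. It does not show how ShareMedix works internally, which this case study does not describe. It only covers the pattern-like PII types (email addresses, IBANs, phone numbers) with simplified regular expressions; names and addresses would need an NLP model on top. The function name `mask_pii`, the placeholder labels, and the patterns themselves are assumptions made for illustration.

```python
import re

# Simplified, illustrative patterns (assumptions, not ShareMedix rules).
# Names and addresses are not covered: they need a trained NER model.
# Order matters: IBANs are masked before phone numbers so that the digit
# groups inside an IBAN are not mistaken for a phone number.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"),
    "PHONE": re.compile(r"(?<!\w)\+?\d[\d ()/-]{7,}\d(?!\w)"),
}


def mask_pii(text: str) -> str:
    """Replace every pattern match with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Mail anna@example.com or call +49 40 1234567. IBAN: DE89 3704 0044 0532 0130 00."
print(mask_pii(sample))
```

Pattern matching of this kind is cheap and predictable, but it misses anything without a fixed shape, which is why free-text sources such as forum posts call for model-based detection as well.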

API-first integration. Rather than a standalone tool requiring manual operation, ShareMedix was deployed as an API that InSaaS.ai calls programmatically within their existing data pipelines. Data flows in, anonymized data flows out – no manual steps, no separate interface.
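As a rough illustration of what calling an anonymization API from a pipeline step can look like, the Python sketch below posts raw text and reads back the masked text. The endpoint URL, the JSON request and response shape (`{"text": ...}`), and the function names are hypothetical; the real ShareMedix API is not documented in this case study and may differ.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
ANONYMIZE_URL = "https://anonymizer.example/v1/anonymize"


def build_request(text: str, url: str = ANONYMIZE_URL) -> urllib.request.Request:
    """Wrap raw text in a JSON POST request (assumed payload: {"text": ...})."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def anonymize_text(text: str, opener=urllib.request.urlopen) -> str:
    """Send text to the service and return the masked text from the response.

    `opener` is injectable so the step can be tested without network access.
    The response is assumed to be JSON with a "text" field.
    """
    with opener(build_request(text), timeout=30) as response:
        return json.load(response)["text"]
```

In a pipeline, a step like `anonymize_text` would sit between ingestion and analytics, so that downstream stages only ever receive masked text.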

Parallel processing at scale. Market research datasets can be massive. The anonymization API handles multiple concurrent requests, processing large volumes of text without becoming a bottleneck in InSaaS.ai’s data preparation workflow.
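On the client side, concurrent requests can be issued with a thread pool, as in the sketch below. It assumes some `anonymize` callable that masks a single text (for example, a function wrapping one API call); that callable, the function name `anonymize_concurrently`, and the default worker count are illustrative choices, not details from the project.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def anonymize_concurrently(
    texts: Iterable[str],
    anonymize: Callable[[str], str],
    max_workers: int = 8,
) -> List[str]:
    """Run `anonymize` on many texts in parallel threads.

    Threads suit this workload because each call mostly waits on network I/O.
    `Executor.map` returns results in input order, so outputs line up with inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(anonymize, texts))
```

A reasonable worker count depends on how many concurrent requests the service accepts, so it is worth treating `max_workers` as a tunable setting.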

Customizable anonymization rules. Different data sources and use cases require different handling. ShareMedix supports white lists, black lists, and configurable anonymization rules, allowing InSaaS.ai to fine-tune what gets masked and what gets preserved, adapting as regulations and data sources evolve.
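One way to picture white lists and black lists is as a filter applied on top of whatever a detector flags. The Python sketch below is an assumption about how such rules could be modeled, not the ShareMedix configuration format: `MaskingRules`, `apply_rules`, and the `[MASKED]` placeholder are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class MaskingRules:
    allow: Set[str] = field(default_factory=set)  # white list: keep even if detected
    deny: Set[str] = field(default_factory=set)   # black list: always mask


def apply_rules(text: str, detected: Iterable[str], rules: MaskingRules) -> str:
    """Mask detected terms unless white-listed, plus every black-listed term.

    If a term appears on both lists, the black list wins and it is masked.
    """
    to_mask = {term for term in detected if term not in rules.allow} | rules.deny
    # Mask longer terms first so a short term cannot split a longer one.
    for term in sorted(to_mask, key=len, reverse=True):
        text = text.replace(term, "[MASKED]")
    return text


rules = MaskingRules(allow={"Acme"}, deny={"Project Falcon"})
print(apply_rules("Anna from Acme leaked Project Falcon.", {"Anna", "Acme"}, rules))
```

Keeping the rules in a separate object means they can be changed per data source without touching the detection step.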

The results

BEFORE

PII scattered throughout massive text datasets. Manual review impossible at scale. GDPR compliance difficult to guarantee. Data utilization limited by privacy risk.

AFTER

Automated PII detection and masking via API. GDPR compliance built into the data pipeline. Full datasets available for analysis without privacy risk. No manual intervention required.

InSaaS.ai can now process and analyze their entire data volume with confidence that GDPR compliance is maintained automatically. The anonymization step, previously a manual blocker that limited what data could be used, became a seamless, invisible part of the pipeline.

This also unlocked data that was previously too risky to use. Customer data and other sensitive sources that would have required extensive manual review can now be anonymized automatically and fed into analytics workflows, expanding the scope of insights InSaaS.ai can deliver to their clients.

Technology used

Natural Language Processing · Named Entity Recognition · PII Detection · API Integration · ShareMedix Platform · GDPR Compliance

