Mask Framework: De-identifying unstructured free-text medical data

Privacy protection through data protection and screening is very important in today’s information-dominant world. For structured data held in a formatted database repository, it is a simple matter. Unstructured data such as free text, however, requires more sophisticated methods of processing and analysis.

For many organizations, dealing with unstructured data is a prominent part of their operation. Finding a means of analyzing their datasets and de-identifying the confidential information in them is therefore a crucial undertaking. One such organization, facing the complex challenge of organizing and analyzing its unstructured data, turned to Evenset’s machine learning specialists to find a solution.

Initially, a SAS-based macro system was developed to de-identify (encode) text-based results. The system was a rule-based application aimed at identifying and masking individual names, post box numbers, phone numbers, email addresses, any 10-digit number (OHIP or SIN), unit/apartment numbers, postal codes, and street numbers. The system answered the immediate needs, but it lacked the more complex, data-science-based analysis techniques needed for more efficient and accurate use.
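To make the rule-based approach concrete, here is a minimal sketch in Python of how pattern rules can mask some of the identifiers listed above. It is an illustration only and not the original SAS macro: the rule list, the placeholder labels, and the `mask_text` function are all assumptions made for this example. Names and street numbers are left out because simple patterns handle them poorly.

```python
import re

# Hypothetical rule set: (placeholder label, pattern).
# Rules are applied in order, so earlier rules consume text first.
RULES = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    # Phone numbers written with separators, e.g. 416-555-0199.
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    # Any remaining bare 10-digit number is treated as an ID.
    ("ID_NUMBER", re.compile(r"\b\d{10}\b")),
    # Canadian postal code shape: letter digit letter, digit letter digit.
    ("POSTAL_CODE", re.compile(r"\b[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d\b")),
    ("PO_BOX", re.compile(r"\bP\.?\s?O\.?\s?Box\s+\d+\b", re.IGNORECASE)),
    ("UNIT", re.compile(r"\b(?:unit|apt|apartment|suite)\.?\s*#?\s*\d+\w?\b",
                        re.IGNORECASE)),
]


def mask_text(text):
    """Replace every match of every rule with a bracketed placeholder."""
    for label, pattern in RULES:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(mask_text("Call 416-555-0199 or email jane.doe@example.com"))
```

A rule list like this is easy to audit and extend, but it only catches identifiers that follow a predictable shape, which is the limitation that motivated a more data-driven approach.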

Our Solution

Evenset came on board to enhance the process and build an open-source platform for flexible masking of personal information, ensuring that de-identified medical text still contained enough information to facilitate research. Evenset software specialists began a large-scale cataloguing of information and identifiers to cover all fields of data comprehension and scenario analysis.

The next step was to devise a de-identification system that would assess the different types of personal identifiers in free-text data, determine the ideal approach to masking, and perform disclosure risk analysis on the data. Evenset takes great pride in building solutions that upgrade and update the methods of others, bringing forth a new age of ability, efficiency and security.


