Applications of Natural Language Processing in Healthcare and Medicine


At Evenset, we build complex custom software solutions that help companies in the healthcare system analyze vast amounts of unstructured free-text data. The scope of these projects ranges from designing Natural Language Processing (NLP) pipelines to extracting meaning from free text with machine learning algorithms. In the next section, we discuss recent applications of NLP in the healthcare, medical and pharmaceutical industries.

Applications and usage of NLP in healthcare

With the recent emergence of Electronic Medical Record (EMR) platforms, healthcare systems in most countries have started switching from pen-and-paper records of patient and hospital operational information to digital storage systems. Organizations such as HL7 have launched initiatives to standardize data capture and transfer protocols; however, each organization or health entity has adopted a custom version of those protocols to meet its own operational requirements. We can categorize patients’ health data into two types:

  • Structured data
  • Unstructured data

Structured data refers to data that has a predefined structure, type and purpose in the database, such as age, sex and name. Unstructured data, on the other hand, refers to fields that have a purpose in the system but in which the operator or healthcare provider can enter essentially anything without restriction, such as physicians’ notes or lab results for a patient.

De-identification of unstructured text data

Since unstructured data has no restrictions, it can contain protected health information (PHI) or personally identifiable information (PII), such as a patient’s name, which can jeopardize patients’ privacy when data is transferred or released between health entities. A first approach to this problem is to use predefined dictionaries, such as lists of first names, last names, zip codes and occupations, to find and mask sensitive patient information. However, this approach has low accuracy, and the number of false positives is often high.
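The dictionary approach, and the reason it produces false positives, can be illustrated with a minimal Python sketch. The name lists, the `mask_with_dictionary` helper and the sample note are all hypothetical and only meant to show the idea; a real system would load far larger dictionaries.

```python
import re

# Hypothetical, tiny lookup lists used only for illustration.
FIRST_NAMES = {"john", "mary", "rose", "mark"}
LAST_NAMES = {"smith", "brown", "white"}


def mask_with_dictionary(text, terms, placeholder="[NAME]"):
    """Replace every alphabetic word found in `terms` with `placeholder`."""
    def replace(match):
        word = match.group(0)
        # Case-insensitive lookup: the dictionary has no notion of context.
        return placeholder if word.lower() in terms else word

    return re.sub(r"[A-Za-z]+", replace, text)


note = ("Mary Smith reports brown sputum and a rose rash; "
        "mark improvement since last visit.")
masked = mask_with_dictionary(note, FIRST_NAMES | LAST_NAMES)
print(masked)
# [NAME] [NAME] reports [NAME] sputum and a [NAME] rash; [NAME] improvement since last visit.
```

Only the first two masks are real names. The words "brown", "rose" and "mark" are ordinary vocabulary here, but a context-free lookup cannot tell the difference, which is the false-positive problem described above.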

We can use Natural Language Processing (NLP) together with machine learning techniques to automatically parse unstructured free-text data and detect PII or PHI. Named Entity Recognition (NER) and text classification are two types of algorithms used for this task. Working with healthcare data is challenging, both because it contains technical terms that common NLP and ML models have not seen and because there is not enough publicly available data to train models.
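As a rough illustration of how NER output turns into de-identified text, the sketch below assumes a model that labels each token with a BIO tag (B- for the beginning of an entity, I- for its continuation, O for everything else). The tags are written by hand to stand in for model predictions, and the label names (NAME, HOSPITAL, DATE) and the `mask_entities` helper are hypothetical.

```python
def mask_entities(tokens, tags):
    """Rebuild the text, replacing each BIO-tagged entity with its label.

    Assumes well-formed tags: every "I-" tag follows a "B-" or "I-" tag
    of the same entity.
    """
    output = []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            output.append(token)
        elif tag.startswith("B-"):
            # Emit one placeholder per entity, named after its label.
            output.append("[" + tag[2:] + "]")
        # "I-" tokens continue an entity that is already masked, so skip them.
    return " ".join(output)


tokens = ["Seen", "by", "Dr", "Jane", "Doe", "at",
          "Toronto", "General", "on", "March", "3"]
# Hand-written tags standing in for the predictions of a trained NER model.
tags = ["O", "O", "O", "B-NAME", "I-NAME", "O",
        "B-HOSPITAL", "I-HOSPITAL", "O", "B-DATE", "I-DATE"]

print(mask_entities(tokens, tags))
# Seen by Dr [NAME] at [HOSPITAL] on [DATE]
```

Unlike a dictionary lookup, the tags depend on context, so the same word can be masked in one sentence and left alone in another.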

Question answering systems

Family doctors check their medical references 10-12 times a day, for questions such as: “What is the recommended first-line agent for sepsis?”

They currently use information summary products such as UpToDate, which hire physicians and specialists to summarize the most recent updates in the field for providers at the point of care. Even so, it still takes a physician around 5 minutes to find an answer each time they consult such a product. This translates to around 10 hours a month spent on reference checks.

Tali subscribes to medical journals, references and government guidelines, and uses Natural Language Processing techniques to reduce the time family doctors spend finding an answer to a matter of seconds.

Tali saves providers time so they can have more face time with their patients. It is much easier to use than their current solutions, and it reduces the number of medical errors.

The same engine can be used on top of content from well-known providers such as Mayo Clinic or WebMD to help patients find answers to their questions faster.
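One common building block of a question answering system is a retrieval step that ranks reference passages against the question. The sketch below is not Tali's implementation; it is a minimal illustration that scores passages by TF-IDF-weighted word overlap, and the `rank_passages` helper and placeholder passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_passages(question, passages):
    """Return passages sorted by TF-IDF-weighted overlap with the question."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)

    def idf(term):
        # Smoothed inverse document frequency: rarer terms weigh more.
        df = sum(1 for d in docs if term in d)
        return math.log((n + 1) / (df + 1)) + 1

    q_terms = set(tokenize(question))
    scored = []
    for passage, doc in zip(passages, docs):
        score = sum(doc[t] * idf(t) for t in q_terms)
        scored.append((score, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored]


# Placeholder passages; they contain no clinical content.
passages = [
    "Sepsis guideline: this section lists the recommended first-line agent.",
    "Diabetes guideline: this section covers screening intervals.",
    "Hypertension guideline: this section covers blood pressure targets.",
]
question = "What is the recommended first-line agent for sepsis?"
best = rank_passages(question, passages)[0]
print(best)
```

A production system would typically add an answer extraction step on top of retrieval and use a stronger ranking model, but the overall shape (index the references once, score them per question) stays the same.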


A chatbot is an interface through which users interact with a computer system by typing and chatting. Chatbots can either be powered by NLP, in which case they are often called conversational chatbots and can hold a conversation, or be simpler systems that facilitate user interaction by asking questions and gathering information.

The global chatbot market was around 1.17B USD in 2018 and is expected to reach 10.09B USD by 2026, at a CAGR of 30.09%. The global natural language processing market could grow at a CAGR of 20% during the forecast period, with its market size predicted to reach 23B USD by 2024.

In terms of applications, chatbots have been very successful in customer support roles in other industries. In healthcare, this translates into tools that capture pre-visit or post-visit information, follow up with patients, and monitor and evaluate patients’ adherence to a given treatment. This can significantly reduce time and costs for the health provider and puts the patient at the center of care.

Patient Identification for Improved Care

Patients’ social and economic status, psychological conditions, behaviors and habits are often buried inside the unstructured data of EMR systems. It is almost impossible to search for patients who are prone to, or have a history of, substance abuse, diabetes, frailty or trauma by looking only at disease or diagnosis codes in the EMR. However, physicians and care providers often write down their observations at each visit. We use NLP to analyze this unstructured data and extract trends and meaning from it, in order to design treatments for a specific cohort of patients or to develop indices for preventing certain conditions in higher-risk patients.
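As a simplified illustration of searching notes instead of diagnosis codes, the sketch below flags cohorts by looking for key phrases in free text and skipping mentions that are negated earlier in the same sentence. The cohort names, phrase lists, negation cues and sample notes are all hypothetical, and a real pipeline would rely on trained models rather than keyword matching.

```python
import re

# Hypothetical phrase lists per cohort, for illustration only.
COHORT_TERMS = {
    "substance_use": ["alcohol abuse", "opioid misuse", "substance abuse"],
    "frailty": ["frail", "frequent falls"],
}
# Simple negation cues; a match before the phrase cancels the mention.
NEGATION = re.compile(r"\b(no|denies|without)\b")


def find_cohorts(note):
    """Return the cohorts whose phrases appear un-negated in the note."""
    found = set()
    for sentence in re.split(r"[.;\n]", note.lower()):
        for cohort, terms in COHORT_TERMS.items():
            for term in terms:
                idx = sentence.find(term)
                if idx == -1:
                    continue
                # Only the text before the first occurrence is checked.
                if NEGATION.search(sentence[:idx]):
                    continue
                found.add(cohort)
    return found


notes = {
    "patient_1": "Appears frail with frequent falls at home. Denies alcohol abuse.",
    "patient_2": "History of opioid misuse; currently in recovery.",
    "patient_3": "Routine visit. No history of substance abuse.",
}
for patient_id, note in notes.items():
    print(patient_id, sorted(find_cohorts(note)))
# patient_1 ['frailty']
# patient_2 ['substance_use']
# patient_3 []
```

Even this crude version shows why handling negation matters: without it, patients 1 and 3 would both be placed in the substance use cohort on the strength of a sentence that says the opposite.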
