Resume Analysis using NLP
In this article, I explain what you need to know to analyze your resume with the help of Natural Language Processing (NLP).
About 90% of Human Resources departments in leading industries use an ATS (Applicant Tracking System) to automatically filter suitable candidates from a large application pool before a human representative reviews a resume, which reduces manpower and improves efficiency. Given this, it has become imperative for candidates to have a working understanding of ATS in order to be considered for any job role in a competitive market. The methods introduced in this article not only help candidates improve on the skills a job requires, but also help any employer looking for the right candidates for a given role.
Table of Contents:
- Spellchecker
- Missing keywords
- Performing text summarization
- Skill-wise Classification
Spellchecker: To make sure the resume is correctly written, let's start by implementing a spellchecker.
# Importing required packages
import PyPDF2
import docx2txt
import pandas as pd
from spellchecker import SpellChecker

# Loading the resume, which is in Word format
resume = docx2txt.process('res.docx')
words = resume.split()

spell = SpellChecker()

# Function for spell check
def spell_check(words):
    spell_mistake = False
    for token in words:
        # Check only purely alphabetic tokens; skip capitalized ones (likely proper nouns)
        if not token.isalpha() or token[0].isupper():
            continue
        token = token.lower()
        suggestion = spell.correction(token)
        # correction() may return None when no candidate is found
        if suggestion is not None and suggestion != token:
            print('Wrong spelling:', token, '\nSuggestion:', suggestion)
            spell_mistake = True
    if not spell_mistake:
        print('No spelling mistakes, good to go.')

spell_check(words)
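One limitation of the check above is that `isalpha()` silently skips any token with punctuation attached, such as `experience,` or `(Python)`. A minimal sketch of a pre-processing step follows; `clean_tokens` is a hypothetical helper name, not part of the original code, and it assumes whitespace-separated text:

```python
import string

def clean_tokens(text):
    """Split text on whitespace and strip leading/trailing punctuation.

    Hypothetical helper for illustration; tokens that become empty are dropped.
    """
    tokens = []
    for raw in text.split():
        token = raw.strip(string.punctuation)
        if token:
            tokens.append(token)
    return tokens

sample = "Experienced in (Python), SQL and data-analysis."
print(clean_tokens(sample))
```

Passing `clean_tokens(resume)` to the spell check instead of `resume.split()` would let words followed by commas or periods be checked as well. Hyphenated words are still skipped, because they are not purely alphabetic.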
Missing Keywords: To increase the chance of being selected, we need to check whether the keywords mentioned in the job description also appear in the resume.
# Note: requires gensim 3.x (gensim.summarization was removed in gensim 4.0)
from gensim.summarization import keywords

# Prompt for the job description
jd = input("Enter the job description: ")

# Important keywords of the job description
a = keywords(jd, ratio=0.7)
# Important keywords of the resume
b = keywords(resume, ratio=0.7)

# Split multi-word key phrases of the job description into single words
c = []
for i in a.split('\n'):
    for j in i.split(' '):
        c.append(j)
print(c)

# Do the same for the resume keywords
d = []
for i in b.split('\n'):
    for j in i.split(' '):
        d.append(j)
print(d)

# Find which job description keywords are present in or absent from the resume
present = []
absent = []
for i in c:
    if i in d:
        present.append(i)
    else:
        absent.append(i)
print("Present words are: ", present)
print("Absent words are: ", absent)
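The comparison above can also be expressed with a set lookup, which makes it easy to report a match percentage alongside the two lists. The following is only a sketch: `compare_keywords` is a hypothetical helper, and it assumes that matching should be case-insensitive and that duplicate keywords count once.

```python
def compare_keywords(jd_words, resume_words):
    """Return (present, absent, match_ratio) for job-description keywords.

    Hypothetical helper: comparison is case-insensitive and ignores duplicates.
    """
    resume_set = {w.lower() for w in resume_words}
    # dict.fromkeys keeps first-seen order while removing duplicates
    jd_unique = list(dict.fromkeys(w.lower() for w in jd_words if w))
    present = [w for w in jd_unique if w in resume_set]
    absent = [w for w in jd_unique if w not in resume_set]
    ratio = len(present) / len(jd_unique) if jd_unique else 0.0
    return present, absent, ratio

present, absent, ratio = compare_keywords(
    ["Python", "sql", "tableau", "python"], ["python", "SQL", "excel"]
)
print(present, absent, f"{ratio:.0%}")
```

With the lists built earlier, the call would be `compare_keywords(c, d)`.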
Performing text summarization: A resume can run to more than one page, or pack a lot of information into a single page. Reading the entire document may not be ideal, so an ATS uses a shortened version of the resume to find similarities between the resume and the job description.
# Note: requires gensim 3.x (gensim.summarization was removed in gensim 4.0)
from gensim.summarization.summarizer import summarize

# Summarizing the resume
text_resume = str(resume)
resume1 = summarize(text_resume, ratio=0.1)
print(resume1)
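The `summarize` function used above is only available in gensim 3.x. For readers on a newer gensim, a rough stand-in can be sketched with the standard library alone. This is a simple word-frequency heuristic, not the TextRank approach gensim used, and `simple_summarize` is a hypothetical name. It assumes sentences end with `.`, `!` or `?`, which may not hold for bullet-style resumes.

```python
import re
from collections import Counter

def simple_summarize(text, ratio=0.3):
    """Keep the highest-scoring sentences, in their original order.

    A sentence's score is the mean document frequency of its words.
    This is a word-frequency heuristic, not gensim's TextRank.
    """
    # Naive sentence split on ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    if not sentences:
        return ""
    freq = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, round(len(sentences) * ratio))
    # Rank sentence indices by score, then restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    top = sorted(ranked[:keep])
    return " ".join(sentences[i] for i in top)

print(simple_summarize("Python and SQL. Python SQL Python. I enjoy hiking.", ratio=0.34))
```

Because common words such as "and" also raise a sentence's score, removing stop words before counting would likely improve the result.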
What is text classification?
Text classification is the process of categorizing textual data into groups. Once the data is grouped, NLP techniques assign predefined tags or scores to each group based on its content, and the results are then analyzed.
Skill-wise Classification: Here, text classification is applied to an unstructured resume. Dictionaries are created from a well-defined job description, based on the various skills required of an entry-level data analyst / data scientist, and each skill is assigned a score based on the requirement.
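# Loading required packages
import textract
import matplotlib.pyplot as plt
%matplotlib inline
from spacy.matcher import PhraseMatcher
import re
import string

# Extract the text of the PDF resume.
# pdfReader is assumed to be a PyPDF2.PdfFileReader opened on the resume PDF (not shown here).
# getPage/extractText are the PyPDF2 < 3.0 API; newer versions use reader.pages[i].extract_text().
text = ""
count = 0
num_pages = 1
while count < num_pages:
    pageObj = pdfReader.getPage(count)
    count += 1
    text += pageObj.extractText()

# Formatting the text: lowercase, then remove digits and punctuation
text = text.lower()
text = re.sub(r'\d+', '', text)
text = text.translate(str.maketrans('', '', string.punctuation))

# Calculate the score of each Data Science specialization based on the
# keywords in the resume and in the dictionary.
# data engineering
de = 0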
# data mining
dm = 0
# cloud computing
cc = 0
# ML and AI
mlai = 0
# data visualization
dv = 0

# terms is assumed to be a dict mapping each area name to a list of
# lowercase keywords built from the job description (not shown here).
scores = []
for area in terms.keys():
    if area == 'Data Engineering & Warehousing':
        for word in terms[area]:
            if word in text:
                de += 1
        scores.append(de)
    elif area == 'Data Mining & Statistical Analysis':
        for word in terms[area]:
            if word in text:
                dm += 1
        scores.append(dm)
    elif area == 'Cloud & Distributed Computing':
        for word in terms[area]:
            if word in text:
                cc += 1
        scores.append(cc)
    elif area == 'ML & AI':
        for word in terms[area]:
            if word in text:
                mlai += 1
        scores.append(mlai)
    else:
        # Any remaining area is counted as data visualization
        for word in terms[area]:
            if word in text:
                dv += 1
        scores.append(dv)

# Summary of scores for different specifications
summary = pd.DataFrame(scores, index=terms.keys(), columns=['score']).sort_values(by='score', ascending=False)
summary

# Pie chart that shows different skills of an applicant
pie = plt.figure(figsize=(7, 7))
plt.pie(summary['score'], labels=summary.index, explode=(0.07, 0, 0, 0, 0), autopct='%1.0f%%', shadow=True, startangle=90)
plt.title('Classification of Skills')
plt.axis('equal')
plt.show()
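The scoring loop above depends on a `terms` dictionary that the article does not show. A minimal sketch of what it might look like follows. The keyword lists and the `'Data Visualization'` key are illustrative assumptions, not the author's actual dictionary, and `score_skills` is a hypothetical helper. It matches whole words with a regular expression, since a plain `word in text` check would also count a short keyword such as `r` inside `experience`.

```python
import re

# Illustrative keyword lists only (assumed, not the article's dictionary).
# Keywords are lowercase and unpunctuated, to match the formatted resume text.
terms = {
    'Data Engineering & Warehousing': ['sql', 'etl', 'data warehouse', 'airflow'],
    'Data Mining & Statistical Analysis': ['regression', 'hypothesis testing', 'statistics'],
    'Cloud & Distributed Computing': ['aws', 'azure', 'spark', 'hadoop'],
    'ML & AI': ['machine learning', 'neural network', 'classification'],
    'Data Visualization': ['tableau', 'matplotlib', 'dashboard'],
}

def score_skills(text, terms):
    """Count, per area, how many keywords occur in text as whole words."""
    scores = {}
    for area, words in terms.items():
        scores[area] = sum(
            1 for w in words
            if re.search(r'\b' + re.escape(w) + r'\b', text)
        )
    return scores

sample = "built etl pipelines in sql on aws and tableau dashboards using machine learning"
print(score_skills(sample, terms))
```

Since dictionaries preserve insertion order, `list(score_skills(text, terms).values())` could stand in for the `scores` list when building the summary table. Note that whole-word matching is stricter than the original check: the plural `dashboards` no longer matches `dashboard`.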
Wrapping Up
The methods implemented here, namely text summarization, text classification, missing keyword detection and spell checking, aim to help candidates predict more accurately whether an application will be accepted or rejected. This is especially helpful for new graduates and during a hiring freeze, a recession or an unpredictable pandemic. In addition, these methods help an employer choose the right candidate for their needs, one who will help strengthen the company in the long run.