Machine Learning for Resume Matching

Resume Matcher

Business leaders at companies all across the world spend too much time sifting through resumes to identify candidates for open job requirements. Time is wasted, hiring managers are left frustrated, and candidates leave with a negative outlook of a company. There is an opportunity to use advanced analytics and Natural Language Processing (NLP) to better match job opportunities with qualified candidates.

Wouldn’t it be nice to spend less time reviewing resumes, and more time interacting with appealing candidates? Different companies places emphasis on skills listed in the resume, colleges, location, work history, or any combination of these factors.

Mosaic Data Science, a leading big data consulting company, has built a solution using a probabilistic model which ranks the potential candidate using machine learning techniques and NLP to optimize the time spent by Human Resource (HR) professionals, business leaders and hiring managers. Mosaic develops and delivers a powerful classification model to give candidates with a higher likelihood of a job offer higher preference.

The tool uses a supervised classification machine learning model to rank submitted resumes and recommend which resumes should be given further consideration. The model should take as training input the text of prior years’ resumes (the more the better), candidate tracking data, and decisions made for this group as to whether or not they were selected for phone screen.

The performance of the classification model will be optimized based on standard probabilistic evaluation metrics, such as sensitivity and specificity. Mosaic will work with stakeholders to determine the combination of metrics and objectives that best align with the business processes and costs for candidate ranking. Pre-processing of resumes will include parsing and tagging of resumes, and application of a simple ontology for identification of the work history, location, degree program, specific skills, etc.

The models will use term frequency-inverse document frequency (tf-idf) features on both uni-grams and bi-grams, as well as other specific features based on variables chosen by the customer. Mosaic will compare the performance of multiple modeling approaches, including logistic regression, naïve Bayes, support vector machine, and random forests, and will select the modeling approach that performs best on the tagged data set according to the selected metrics.  Accepted cross-validation practices will be followed to ensure that the model performance generalizes to resumes outside of the training set.

In scoring mode, the model will use stored parameters to compute scores for new candidates.

One client saw an improvement of matching accuracy increase by over 60% when switching from a keyword search model to our resume matcher!