This is Part 2 of a two-part series examining human acceptance of machine learning model outputs.
posted by Mosaic Data Science
From Part 1:
‘Big Data’ and ‘Data Science’ are the new buzzwords creating a significant amount of excitement in the world of business today. We now experience the results of machine learning models on a frequent basis through online interaction with news sites that learn our interests, retail websites that provide automated offers that are customized to our buying habits, and credit card fraud detection that warns us when a transaction occurs that is outside of our normal purchasing pattern. Many such applications of machine learning can be performed using an automated approach where human interpretation of the recommendations is not necessary. However, in many other applications, there would be great benefit from the ability of the machine learning model to provide an explanation of its output recommendation or other result.
The Explainable Principal Components Analysis technique and Gray-Box Decision Characterization approach apply across a broad range of different types of machine learning models, and to essentially any application domain of machine learning. In Phase I we will utilize two large datasets from two distinct application domains to demonstrate the generic applicability of our proposed techniques. The two application domains are resume matching to job requirements, and air traffic flow management (ATFM), both of which involve resource allocation.
Principal Components Analysis (PCA) is a technique that is used extensively within machine learning model development for dimensionality reduction. From an information-theoretic perspective, regular PCA aggregates information contained in a high-dimensional space into a form that can represent an arbitrarily large portion of the information in the data through a lower-dimension vector representation. While this aggregation process identifies the orthogonal dimensions in the data over which the greatest explanation of the variance can be achieved, the “explanation” of the variance in PCA is maximized from a statistical perspective, but not from the perspective of understandability by a human. In fact, the basis vectors created by PCA are one of the primary sources of opacity in many practical machine learning applications. A variant of PCA, which we have already referenced as Explainable Principal Components Analysis (EPCA), computes basis vectors of the problem space with understandability as a primary objective.
The Gray-Box Decision Characterization (GBDC) approach utilizes the results of the EPCA algorithm (or regular PCA if sufficiently explainable) to create an orthogonal basis for sensitivity analysis of the output of the machine learning model around the input data vector for a single decision output. Thus, the portion of the problem space that must be known to the GBDC approach is the input feature representation; the approach also needs access to a large set of training data samples.
Using these techniques to better match resumes with job REQs
Mosaic, a top data mining company, has performed extensive analysis, algorithm design and development for the resume to job requirements matching problem. This predictive analysis includes the design and evaluation of a matching algorithm that demonstrated a significant performance improvement in resume-to-job matching for a major staffing firm that processes thousands of resumes per day. Through this experience, we claim that explaining the resume-to-job matching problem is easier than explaining more general machine learning models, because problems that involve text analytics and Natural Language Processing (NLP), such as the resume-to-job matching problem, provide inherent explanation capabilities.
The application of machine learning to text-based problem domains can use the text itself as a basis for explanation. Because the text is already understandable to a human observer, the groupings of text tokens and phrases can also be readily explained and understood. Note that this is not intended to imply that all groupings or associations of words and phrases found through machine learning will be obvious and could have been found through trivial exploration. The point is that the groupings and associations derived through machine learning algorithms are more likely to be understandable because of their linguistic nature and will provide a basis for explanation of unique, unexpected, and/or hidden relationships between the resume and the job requirements.
The GBDC technique was briefly described in a prior paragraph. This technique simply performs a sensitivity analysis of the behavior of the model in the region around the specific input data feature vector that generated the decision from the machine learning model. The GBDC approach searches for changes along explainable basis vectors that result in a change in the output of the machine learning model. Although this technique is simple in principle, the large number of dimensions (even after dimensionality reduction), and the need to search along each dimension, require additional enhancements beyond the simple concept explanation provided so far. We return to the EPCA approach to describe the additions to the GBDC algorithm.
Explainability in Machine Learning Models
A very important aspect of using the EPCA approach to establish the basis for explanation of machine learning model decisions is that, in addition to obtaining orthogonal dimensions along which the ML decision can be parameterized for explainability, we also obtain explicit measures of the mean and variance of the input data samples along those dimensions. Thus, in the use of the EPCA basis for sensitivity analysis, we can compare the mean value of the entire set of input data along a dimension to the value, along that same dimension, of the specific input vector that generated the decision to be explained. This comparison gives an immediate interpretation of the input data vector for the decision, such as an understandable description of the circumstances of the current case. If labels are assigned to the dimensions of the EPCA output, textual descriptions could be assigned such as ‘the eyes of this face are particularly close to each other,’ or ‘this flight path uses a particularly long final approach segment.’
With any set of basis vectors, we can represent each input sample as a linear combination of the basis vectors:

$$x_i = \sum_j p_{ij}\, u_j$$

In matrix form for all samples of the training data:

$$X = P\,U$$

Where the U matrix is the orthonormal basis (with the basis vectors u_j taken here as its rows), the X matrix is the set of all input samples as row vectors, and the P matrix is the coefficients of the basis vectors to reconstruct the input, X. Since the U matrix is orthonormal, its inverse is the same as its transpose, thus:

$$P = X\,U^{-1} = X\,U^{T}$$
The mean and standard deviation are then computed along the columns of the P matrix to obtain the mean and standard deviation across the entire input data set. In addition to using the mean value along each dimension to characterize the input, we also use the standard deviation along each dimension to determine the appropriate step size to use in the sensitivity analysis. For example, if the standard deviation, σ_j, of the input data samples along a particular dimension, j, is 10, then testing the sensitivity along that dimension by evaluating a change of 30 would move the sensitivity test position by 3σ_j, which would likely be far outside of the range of nearly all input samples.
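As a rough illustration of the projection and per-dimension statistics described above, the following minimal sketch uses only the Python standard library. The function names (`project`, `column_stats`, `z_scores`, `step_in_sigmas`), the toy data, and the convention that basis vectors are stored as rows are assumptions made for this example; they are not taken from the EPCA or GBDC implementations.

```python
import math
import statistics

def project(samples, basis):
    """Coefficients of each sample (a row) along each orthonormal basis vector (a row of basis)."""
    return [[sum(x_k * u_k for x_k, u_k in zip(x, u)) for u in basis]
            for x in samples]

def column_stats(coeffs):
    """Mean and population standard deviation of each column of the coefficient matrix."""
    columns = list(zip(*coeffs))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds

def z_scores(sample_coeffs, means, stds):
    """How far one sample sits from the data mean along each dimension, in standard deviations."""
    return [(p - m) / s if s > 0 else 0.0
            for p, m, s in zip(sample_coeffs, means, stds)]

def step_in_sigmas(change, sigma):
    """Express a proposed sensitivity step as a multiple of the standard deviation."""
    return change / sigma

# Toy 2-D data with an orthonormal basis rotated by 45 degrees (hypothetical values).
r = math.sqrt(0.5)
basis = [[r, r], [-r, r]]
samples = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [1.0, 3.0], [3.0, 1.0]]

P = project(samples, basis)
means, stds = column_stats(P)
z = z_scores(P[2], means, stds)  # characterize the sample [4.0, 4.0]
```

In this sketch, a z-score near zero along a dimension means the case is typical along that dimension, while a large magnitude marks the dimension as a candidate for a description such as "particularly long" or "particularly close." The helper `step_in_sigmas(30, 10)` reproduces the 3σ example from the text.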
For characterization of a classification machine learning model, the GBDC technique conducts a search along each dimension to find a change in the output classification of the model. A binary search is used to find the first occurrence of change along the dimension being tested, recognizing that no change may occur at all within the realm of reasonable change values. For a regression model, the rate of change of the output given a change in the input is calculated. For both cases, if the value of the input feature vector along the current test dimension for the decision to be explained is near the mean for that dimension, then the search can be conducted in both directions. Otherwise, the search would likely be conducted in a single direction back toward the mean.
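The search described above might be sketched as follows. This is an illustration under stated assumptions, not the GBDC implementation: `first_flip`, `rate_of_change`, the `max_z` bound of three standard deviations, the tolerance, and both toy models are hypothetical. The bisection also assumes the classification changes at most once between the starting point and the bound; if it changes several times, the search returns one of the boundaries, not necessarily the first.

```python
def move(x, u, amount):
    """Shift input vector x by `amount` along basis vector u."""
    return [xi + amount * ui for xi, ui in zip(x, u)]

def first_flip(model, x, u, sigma, direction=1, max_z=3.0, tol=1e-3):
    """Signed distance (in standard deviations) at which the class changes, or None."""
    base = model(x)
    # No change within the range of reasonable values: nothing to report.
    if model(move(x, u, direction * max_z * sigma)) == base:
        return None
    lo, hi = 0.0, max_z  # class is unchanged at lo and changed at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if model(move(x, u, direction * mid * sigma)) == base:
            lo = mid
        else:
            hi = mid
    return direction * hi

def rate_of_change(model, x, u, sigma, z_step=0.5):
    """Regression case: output change per standard deviation moved along u."""
    return (model(move(x, u, z_step * sigma)) - model(x)) / z_step

# Hypothetical toy models used only to exercise the search.
def toy_classifier(x):
    return 1 if x[0] + x[1] > 5 else 0

def toy_regressor(x):
    return 2 * x[0] + x[1]

x0, u0, sigma0 = [2.0, 2.0], [1.0, 0.0], 2.0
flip_up = first_flip(toy_classifier, x0, u0, sigma0, direction=1)     # about 0.5
flip_down = first_flip(toy_classifier, x0, u0, sigma0, direction=-1)  # no flip found
slope = rate_of_change(toy_regressor, x0, u0, sigma0)
```

Passing `direction=1` and `direction=-1` covers the two-directional search; when the input lies far from the mean, a caller would pass only the direction that points back toward the mean.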
Finally, the explanation of the machine learning model decision is generated by selecting the dimension, or multiple dimensions, that generate the most significant change in the machine learning model output, or create a change in the classification decision with the smallest change (according to z-score) in the input vector.
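One way to picture this selection step for a classifier is to rank the dimensions by the size of the z-score change needed to alter the decision. The sketch below assumes the per-dimension search results are already available as a mapping from a dimension label to a signed z-score (or `None` when no change was found); the labels, values, and function names are invented for illustration.

```python
def rank_explanations(flips):
    """Order dimensions by the smallest |z| change that alters the classification."""
    found = [(abs(z), label, z) for label, z in flips.items() if z is not None]
    return [(label, z) for _, label, z in sorted(found)]

def describe(label, z):
    """Turn the top-ranked dimension into a short textual explanation."""
    direction = "increase" if z > 0 else "decrease"
    return (f"The decision would change with a {abs(z):.1f} standard deviation "
            f"{direction} along '{label}'.")

# Hypothetical search results for one decision.
flips = {
    "years of experience": 2.1,
    "Python skills": -0.4,
    "management terms": None,  # no change found within the search range
}

ranking = rank_explanations(flips)
explanation = describe(*ranking[0])
```

Dimensions with no change in range are dropped from the ranking, so the explanation rests only on changes that were actually observed in the search.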
Gray-Box Decision Characterization (GBDC) Technique and Resume Matching
The resume-to-job matching problem has a number of unique aspects that make it an ideal problem for initial evaluation of GBDC. First, we can evaluate GBDC on the resume matching case without also having to use the EPCA technique, because the resume matching problem provides a sufficiently explainable basis for the problem space through the use of regular PCA. Thus, our consideration of GBDC will not be affected by any unexpected or undesirable behavior that we may find in the EPCA technique.
The second advantageous aspect of the resume matching problem is that the feature representation of each resume will be sparse and the behavior of a machine learning model for matching a resume to a job will be relatively simple and mostly binary. In other words, the decisions to match a resume to a job will mostly depend on whether a given set of words occurs in the resume or not, a yes-or-no criterion. This is not a required characteristic for our GBDC technique in any way, but we expect that it will simplify the analysis and evaluation of the GBDC concept. Specifically, the search along each dimension in the GBDC approach will be simplified.
To make sure that our approach for resume-to-job matching does, in fact, capture subtle connections between the skill base and experience of veterans and the private sector job descriptions, we will use advanced NLP techniques within this task. Most NLP classification solutions simply use a bag-of-words approach, and the more extensive ones add bigrams and trigrams to the input feature representation. However, these approaches do not take advantage of the grammatical and syntactical dependencies that are available in textual data. Through our advanced experience and expertise in NLP and big data consulting, we will add the full dependency parse of resume text to the feature representation, and then apply the GBDC to decisions made by the machine learning model using this more detailed representation.
Mosaic recently developed an NLP engine called Mosaic Context Extractor (MCE) based on Google’s SyntaxNet. While SyntaxNet was intended by Google to be used as part of natural language understanding systems, the dependency parses that it produces are very useful for adding context beyond simple bag of words approaches in text analytics tasks. A dependency parse builds on the syntactic parse of a sentence by adding semantic relationships such as subject, direct object, indirect object, etc. Once the dependencies between the words are known, other types of information can be derived. For example, thematic role information can be applied (agent, patient, theme, etc.). In addition, one can derive binary (EAT -> BURGER) and ternary (I – EAT – BURGER) relations that can be used as higher level features (more context rich) in text classification and clustering tasks. Figure 4 shows an example of a dependency parse.
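To make the binary and ternary relations concrete, here is a small sketch that derives them from a dependency parse. The parse is written by hand for the example sentence; it is not output from SyntaxNet or MCE, and the relation labels (`nsubj`, `dobj`, and so on), the lemmatized tokens, and the function names are assumptions chosen for illustration.

```python
# Hand-written parse of "I ate the burger with my hands."
# Each arc is (head lemma, relation label, dependent lemma).
arcs = [
    ("eat", "nsubj", "I"),
    ("eat", "dobj", "burger"),
    ("burger", "det", "the"),
    ("eat", "prep", "with"),
    ("with", "pobj", "hand"),
    ("hand", "poss", "my"),
]

def binary_relations(arcs, relation="dobj"):
    """Head -> dependent pairs for one relation type, e.g. verb -> direct object."""
    return [(head, dep) for head, rel, dep in arcs if rel == relation]

def ternary_relations(arcs):
    """Subject - verb - object triples for verbs that have both a subject and an object."""
    subjects, objects = {}, {}
    for head, rel, dep in arcs:
        if rel == "nsubj":
            subjects.setdefault(head, []).append(dep)
        elif rel == "dobj":
            objects.setdefault(head, []).append(dep)
    return [(subj, verb, obj)
            for verb, subjs in subjects.items()
            for subj in subjs
            for obj in objects.get(verb, [])]

# Render the relations as feature strings for a text classifier.
binary_features = [f"{h.upper()}->{d.upper()}" for h, d in binary_relations(arcs)]
ternary_features = [f"{s.upper()}-{v.upper()}-{o.upper()}"
                    for s, v, o in ternary_relations(arcs)]
```

Feature strings like these could be added alongside ordinary bag-of-words tokens, so each feature stays readable to a human while carrying the context that a bare word count drops.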
Figure 4. Dependency parse of the sentence “I ate the burger with my hands.”
We believe that dependency parse information (particularly binary and ternary relations) will create very rich features in a machine learning model. Simple bags of words lose the richness of the context in which each word occurs. This richness, however, is preserved in a dependency parse. Not only are the relations themselves preserved but also the relation types such as subject and direct object (which help derive the role of a word or phrase in the sentence).
In order to evaluate the output decisions of a machine learning model for the resume-to-job matching problem, we need to have a machine learning model to evaluate. Mosaic, a leading analytics consulting firm, has already developed a machine learning model for exactly the problem named in the topic description: matching resumes to job descriptions.
Using the capabilities described above, we can evaluate the GBDC concept and technique by generating explanations of specific machine learning model outputs that classify a resume as being most appropriate for a particular job description. We can use our existing database of labeled resumes to perform this analysis, and enhance the database as needed to cover additional types of job descriptions. Although the search along each dimension will be simplified in this evaluation of the resume-to-job matching problem, we nonetheless can generate findings that will inform the more general application of the GBDC technique to other problem domains.
How does this relate to my business?
In today’s commercial world of “big data analytics” and “data science,” understanding why an algorithm made the predictions that it did is every bit as important as understanding the output itself. As machine learning and artificial intelligence continue to improve, we must, in parallel, understand why they selected their outputs in order to understand how to apply them.
Every organization needs to hire the right people for its workforce. Machine learning and data science make this easier for human decision makers by providing better candidates to hiring managers. Not only will this use case drive bottom-line value through reduced attrition, fewer interviewing hours, and serious competitive advantage, but it also represents a low-hanging-fruit opportunity that can garner more investment for future data science projects.
Mosaic can bring these capabilities to your organization. Contact us and mention this blog post.