Keep Mosaic Normal


Human Decision Making in Machine Learning Deployment for Resume Matching
posted by Mosaic Data Science, Part 2 of 2

The application of machine learning to text-based problem domains can use the text itself as a basis for explanation. Because the text is already understandable to a human observer, groupings of text tokens and phrases can also be readily explained and understood. Note that this does not imply that every grouping or association of words and phrases found through machine learning will be obvious or could have been found through trivial exploration. The point is that groupings and associations derived by machine learning algorithms are more likely to be understandable because of their linguistic nature, and they provide a basis for explaining unique, unexpected, or hidden relationships between the resume and the job requirements.
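
A minimal sketch of the idea, assuming a simple word-count representation (not necessarily the model used in the project): if a resume and a job description are scored by the dot product of their word counts, each shared word contributes its own separable term to the score, so the top contributing words double as a readable explanation of the match. The function names, the stop-word list, and the sample texts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a production system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "with", "for", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def explain_match(resume: str, job: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Rank the words shared by a resume and a job description.

    The match score here is the dot product of the two word-count vectors,
    which is a sum of one term per shared word. Returning those terms,
    largest first, gives a human-readable explanation of the score.
    """
    resume_counts = Counter(tokenize(resume))
    job_counts = Counter(tokenize(job))
    contributions = {
        word: resume_counts[word] * job_counts[word]
        for word in resume_counts.keys() & job_counts.keys()
    }
    # Sort by contribution (descending), then alphabetically for a stable order.
    ranked = sorted(contributions.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


resume = "Python developer with SQL and Python data pipelines"
job = "Seeking a Python engineer for SQL data pipelines"
print(explain_match(resume, job))
# → [('python', 2), ('data', 1), ('pipelines', 1), ('sql', 1)]
```

Real systems typically weight terms (for example with TF-IDF) and group multi-word phrases, but the explanatory property is the same: the score decomposes into word-level pieces a reviewer can read.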

Human Decision Making in Machine Learning Deployment for Air Traffic Flow Management
posted by Mosaic Data Science, Part 1 of 2

Although some machine learning models can provide limited insight into and explanation of their outputs, most machine learning model output is highly obfuscated and opaque. For many decision support tools in military and other safety- or life-critical applications, it is necessary and appropriate for humans to be involved in decisions that rely on the recommendations and guidance of computer automation and information systems. However, this opacity can lead users to doubt the reliability of the information or recommendations provided. This lack of understanding can breed distrust and, eventually, cause the technology to fail to gain acceptance and use in its intended operational domain.

Debating the Issues with NLP
posted by Mosaic Data Science

Since August of 2015, the presidential hopefuls from both major political parties have been joining in the primary debates to jockey for the two coveted positions in the general presidential election later this fall. The debates have been spirited and full of rich information about each of the candidates. Back in February, the folks at About Techblog analyzed the candidates’ language use in the debates up to that time (see Analyzing the Language of the Presidential Debates). We thought it would be interesting to parse all of the data, including the primary debates held since About Techblog’s analysis, using our own NLP techniques.

Word Frequency Models: A Natural Language Processing Technique
posted by Mosaic Data Science

In a recently completed project for a Mosaic client, we used natural language processing (NLP) techniques to great effect. We used a word frequency model (also called a bag-of-words model) to parse resumes and return the set of job roles each resume was most likely suited for. By the client’s metrics, our outputs were about ten times more accurate than those of the approach they had been using. These models are easy to use and can be applied to many other types of NLP problems.
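
The post does not show the client model, so the following is only a minimal sketch of a word frequency (bag-of-words) matcher, assuming cosine similarity between word-count vectors as the scoring rule. The role names and descriptions in `ROLES` and the sample resume are hypothetical placeholders, not the client’s job roles.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Word frequency model: ignore word order and keep only how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_roles(resume: str, roles: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k roles whose descriptions are most similar to the resume."""
    resume_bow = bag_of_words(resume)
    scores = [(role, cosine(resume_bow, bag_of_words(desc))) for role, desc in roles.items()]
    # Highest similarity first; ties broken alphabetically by role name.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical role descriptions, for illustration only.
ROLES = {
    "data scientist": "statistics python machine learning models data",
    "database administrator": "sql database tuning indexes backup oracle",
    "web developer": "javascript html css web frontend",
}

ranked = rank_roles("Built machine learning models in python and tuned sql queries", ROLES)
for role, score in ranked:
    print(f"{role}: {score:.3f}")
```

Because the model only counts exact tokens, "tuned" and "tuning" do not match here; stemming or lemmatization is a common next step.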

 

Ontology 101, Part 3: How to Create an Ontology
posted by Mosaic Data Science

In Part 2 of this three-part series, we discussed the motivation behind our TMI ontology and gave a high-level overview of it. If you have not yet read Part 1 and Part 2 of this series, please do so before continuing. In this final part, we look at the steps we took to create the TMI ontology. Although the examples for each step refer back to the TMI ontology, the method we used can be applied to any domain. For clarity, all references to specific classes, properties, and individuals contained within an ontology are written in italics, like this.

Ontology 101, Part 2: A Practical Application of an Ontology
posted by Mosaic Data Science

In Part 1 of this three-part series, we discussed what an ontology is and what its key components are. If you have not yet read that article, please do so before continuing. In Part 2, we look at how an ontology can be applied to a domain, specifically our Traffic Management Initiative (TMI) ontology, developed under the TMI Attribute Standardization (TAS) project. This article first briefly explains why TMI data needs an ontology and then gives a high-level overview of the ontology we created. For clarity, all references to specific classes, properties, and individuals contained within an ontology are written in italics, like this.

Ontology 101, Part 1: What is an Ontology
posted by Mosaic Data Science

Through the use of an ontology in the development process, each team member (business analysts, data architects, and developers alike) plays a crucial role in maintaining a consistent story and plan across all aspects of the application. Because the word “ontology” is new to some people, I thought it would be useful to explore the world of ontologies through a more formal introduction.

The Taylor Series and Beyond
posted by Mosaic Data Science

In the modern science of data analytics, sometimes oldies are goodies. I once took an optimization class where the answer to every question posed by the professor was “the Taylor series,” referring to a popular numerical method that will be 300 years old next year. Brook Taylor’s 1715 formulation, which can be traced back even further to James Gregory in the seventeenth century, is the foundation of a great many of today’s numerical methods, of which one of the most powerful is nonlinear batch least squares.
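
To make the connection concrete: nonlinear batch least squares is commonly solved with the Gauss-Newton method, which replaces the model with its first-order Taylor expansion about the current parameter estimate and then solves the resulting linear least-squares problem for a parameter update. The sketch below fits a hypothetical two-parameter model, y ≈ a * exp(b * x), in plain Python. The model, data, and starting guess are illustrative assumptions, and practical solvers usually add damping (as in Levenberg-Marquardt) and a convergence test.

```python
import math


def gauss_newton_exp_fit(xs, ys, a, b, iterations=20):
    """Fit y ≈ a * exp(b * x) by nonlinear batch least squares (Gauss-Newton).

    Each iteration linearizes the model with a first-order Taylor expansion
    about the current (a, b), solves the normal equations
    (J^T J) delta = J^T r for the update, and applies it.
    """
    for _ in range(iterations):
        # Accumulate J^T J (2x2, symmetric) and J^T r (2-vector) over the whole batch.
        jtj_aa = jtj_ab = jtj_bb = 0.0
        jtr_a = jtr_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            residual = y - a * e
            da, db = e, a * x * e  # partial derivatives of the model w.r.t. a and b
            jtj_aa += da * da
            jtj_ab += da * db
            jtj_bb += db * db
            jtr_a += da * residual
            jtr_b += db * residual
        # Solve the 2x2 linear system by Cramer's rule.
        det = jtj_aa * jtj_bb - jtj_ab * jtj_ab
        if det == 0.0:
            break  # singular normal equations: the linearized problem has no unique solution
        a += (jtr_a * jtj_bb - jtj_ab * jtr_b) / det
        b += (jtj_aa * jtr_b - jtj_ab * jtr_a) / det
    return a, b


# Synthetic, noise-free observations generated with a = 2.0 and b = 0.7.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.7 * x) for x in xs]
a_hat, b_hat = gauss_newton_exp_fit(xs, ys, a=1.5, b=0.6)
print(a_hat, b_hat)  # should approach the generating values 2.0 and 0.7
```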

Data Architecture 101, Part 5: Indexes
posted by Mosaic Data Science

Indexes have two main purposes in relational databases. First, they can improve query performance. Second, they can implement data-integrity constraints. (For example, you can create a unique index to enforce a uniqueness constraint.) This article focuses on the former purpose, in the BI/analytics (not OLTP) context. Throughout, we use Oracle indexes as examples. Oracle’s indexing capabilities generally lead the market, so if you understand how to use indexes in an Oracle database, it’s easy to transfer that knowledge to other (less capable) RDBMS platforms. For example, SQL Server clustered tables approximate Oracle index-organized tables.
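
The article uses Oracle, but both purposes of an index can be sketched with SQLite through Python’s standard `sqlite3` module, which needs no database server. This is a stand-in, not Oracle syntax or behavior: the table and index names are hypothetical, and SQLite’s `EXPLAIN QUERY PLAN` output is far simpler than an Oracle execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Purpose 2 (data integrity): a unique index rejects duplicate email values.
conn.execute("CREATE TABLE customer (customer_id INTEGER, email TEXT)")
conn.execute("CREATE UNIQUE INDEX ux_customer_email ON customer (email)")
conn.execute("INSERT INTO customer VALUES (1, 'ann@example.com')")
try:
    conn.execute("INSERT INTO customer VALUES (2, 'ann@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Purpose 1 (query performance): an ordinary index on the filter column lets the
# optimizer search the index instead of scanning every row of the table.
conn.execute("CREATE TABLE sale (sale_id INTEGER, region TEXT, amount REAL)")
conn.execute("CREATE INDEX ix_sale_region ON sale (region)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM sale WHERE region = ?", ("West",)
).fetchall()
# The last column of each plan row is a text description that names any index used.
plan_details = [row[-1] for row in plan]

print("duplicate rejected:", duplicate_rejected)
print(plan_details)
```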

Data Debt
posted by Mosaic Data Science

In 2011 Chris Sterling published the very instructive book Managing Software Debt: Building for Inevitable Change. The book generalizes the concept of technical debt to account for a variety of similar classes of software-development process debt. Besides technical debt, Mr. Sterling describes quality debt, configuration-management debt, design debt, and platform-experience debt.

Data Architecture 101, Part 4: Ontology-Driven Development is Lean
posted by Mosaic Data Science

In software-development nirvana, the business analysts, database technologists, and application developers all speak the same language.  Everyone agrees about what each user story means.  Everyone knows what’s in each database table and column, just by looking at them.  The source code practically explains itself.  Nobody creates database tables that never get used.  Nobody writes orphaned code.

Sound too good to be true?  Not really.  It’s not even that hard.  To do it, you just need to add two documents and a few straightforward steps to your agile/scrum development process.  Here’s how.

Data Architecture 101, Part 3: Dimensions
posted by Mosaic Data Science

Data marts, data warehouses, and some operational datastores use dimension tables. A dimension table categorizes the rows of a fact table that joins to it. At query time, one filters the facts by values in the dimension table and uses those values to label the query results. For example, four dimensions in Figure 2 of our second data-architecture post, “Overview of Relational Architectures,” categorize a sale line-item fact.
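
A minimal star-schema sketch of the filter-and-label pattern described above, using SQLite through Python’s standard `sqlite3` module. The tables `product_dim` and `sale_fact` and their rows are hypothetical; they are not the four dimensions shown in the second post’s Figure 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: one row per product, carrying descriptive attributes.
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    -- Fact: one row per sale line item, joined to the dimension by product_key.
    CREATE TABLE sale_fact (product_key INTEGER REFERENCES product_dim, quantity INTEGER, amount REAL);
    INSERT INTO product_dim VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
    INSERT INTO sale_fact VALUES (1, 2, 20.0), (2, 1, 15.0), (3, 4, 40.0), (1, 1, 10.0);
""")

# Filter the facts by one dimension attribute (category) and label the
# aggregated results with another (product_name).
rows = conn.execute("""
    SELECT d.product_name, SUM(f.amount) AS total_amount
    FROM sale_fact AS f
    JOIN product_dim AS d ON d.product_key = f.product_key
    WHERE d.category = ?
    GROUP BY d.product_name
    ORDER BY d.product_name
""", ("Hardware",)).fetchall()

print(rows)  # → [('Gadget', 15.0), ('Widget', 30.0)]
```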

Data Architecture 101, Part 2: Overview of Relational Architectures
posted by Mosaic Data Science

In our first post we reviewed the rudiments of relational data architecture.  This post uses those concepts to survey the main types of relational architectures.  These divide fundamentally into two types, the second having four sub-types:
• online transaction processing (OLTP)
• business intelligence (BI)
    • online analytical processing (OLAP) cube
    • data mart
    • (enterprise) data warehouse
    • operational datastore (ODS)

Data Architecture 101, Part 1: Rudiments
posted by Mosaic Data Science

This post is the first in a series on relational database architecture and tuning. It’s a mature subject, but we continue to encounter programmers and data scientists who have limited exposure to the material. This series aims to become a “nutshell” treatment of the subject, so those of you who work with data in a relational database management system (RDBMS) can quickly learn how to make the best possible use of it.
