Data marts, data warehouses, and some operational datastores use dimension tables. A dimension table categorizes a fact table that joins to the dimension. At query time one filters the facts by values in the dimension table, and uses those values to label the query results
This post uses those concepts to survey the main types of relational architectures. These divide fundamentally into two types, the second having four sub-types: OLTP & BI.
Variable selection is perhaps the most challenging activity in the data science lifecycle. Our blog highlights a repeatable approach to variable engineering.
n this post we describe some common ways to transform individual variables, and explore how doing so may benefit an analysis.
Most data science algorithms do not tolerate nulls (missing values). So, one must do something to eliminate them, before or while analyzing a data set.
This data science design pattern blog post focuses on kernel smoothing.
We hope that this blog will become a clearinghouse within the data science community for these data science design patterns, thereby extending the design-pattern tradition in software development and enterprise architecture to data science.