Basics of Data Marts
A Data Mart is a specific, subject-oriented repository of data designed to
answer specific questions for a specific set of users. An organization could
therefore have multiple data marts serving the needs of marketing, sales,
operations, collections, etc. A data mart is usually organized as a single
dimensional model made up of a fact table and multiple dimension tables.
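To make the "one fact table plus several dimension tables" layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date), the columns and the sample rows are all made up for illustration; a real mart would have many more attributes.

```python
import sqlite3


def build_sales_mart(conn):
    """Create a tiny dimensional model: one fact table, two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        -- The fact table holds the measures plus a key into each dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL);
    """)
    # Hypothetical sample data.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Apple", "Fruit"), (2, "Carrot", "Vegetable")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240101, 2024, 1), (20240201, 2024, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 20240101, 10.0), (1, 20240201, 5.0),
                      (2, 20240101, 7.5)])


def sales_by_category(conn):
    """Answer a typical mart question by joining the fact table to a dimension."""
    rows = conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_sales_mart(conn)
print(sales_by_category(conn))
```

The question a user asks ("sales by category") becomes a join from the fact table to one dimension followed by an aggregation, which is the access pattern this layout is meant to serve.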
In contrast, a Data Warehouse (DW) is a single organizational repository of
enterprise-wide data across many or all subject areas. The Data Warehouse is
the authoritative repository of all the fact and dimension data (which is also
available in the data marts) at an atomic level.
Kimball and Inmon take different views of how the Data Warehouse and the Data
Marts relate:
Kimball School:
Ralph Kimball began with the Data Mart as a dimensional model for departmental
data and viewed the Data Warehouse as the enterprise-wide collection of Data
Marts. This is the bottom-up approach. You may begin with the Sales Data Mart,
after some time put in place the Ops Data Mart, and so on and so forth. If you
want, you could have even more specific Data Marts serving specific questions
like customer churn. If you take care of consistency of metadata (making sure
each departmental Data Mart calls an Apple an Apple) and connectivity, you have
a Data Warehouse. So the Data Warehouse is really a virtual collection of Data
Marts brought together on a Data Warehouse Bus, and in that sense the data
flows from multiple Marts into the Warehouse.
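The "calls an Apple an Apple" requirement can be checked mechanically. The sketch below is only an illustration: it assumes each mart's metadata is available as a plain dictionary mapping dimension names to column lists, and the function name find_dimension_conflicts and the sample marts are hypothetical.

```python
def find_dimension_conflicts(marts):
    """Return the shared dimensions whose column definitions differ across marts.

    `marts` maps mart name -> {dimension name -> list of column names}.
    The result maps each conflicting dimension to its per-mart definitions.
    """
    seen = {}
    for mart_name, dimensions in marts.items():
        for dim_name, columns in dimensions.items():
            seen.setdefault(dim_name, {})[mart_name] = tuple(columns)
    # A dimension is consistent only if every mart defines it identically.
    return {
        dim_name: by_mart
        for dim_name, by_mart in seen.items()
        if len(set(by_mart.values())) > 1
    }


# Hypothetical metadata for two departmental marts.
marts = {
    "sales": {
        "customer": ["customer_id", "name", "region"],
        "product": ["product_id", "name"],
    },
    "ops": {
        "customer": ["cust_no", "name", "region"],  # same concept, different key
        "product": ["product_id", "name"],
    },
}

conflicts = find_dimension_conflicts(marts)
print(conflicts)
```

Here the customer dimension is flagged because the two marts name its key differently, which is exactly the kind of mismatch that makes marts hard to combine later.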
Inmon School:
Inmon’s approach is the exact opposite: it avoids the problem of metadata
consistency by treating the Enterprise Data Warehouse as a single repository
that feeds subject-oriented Data Marts. You still have your Sales, Marketing,
Ops and Churn Data Marts containing atomic or aggregated information, but they
are based on the Data Warehouse and are really subsets of the data contained
therein. This is the top-down approach.
Kimball’s approach is easier to implement because you are dealing with smaller
subject areas to begin with, but the end result often has metadata
inconsistencies and can be a nightmare to integrate. Inmon’s approach, on the
other hand, does not defer the integration and consistency issues, but takes
far longer to implement (which makes it easier for the project to fail). Also,
in my experience, organizations that are just starting to do analytics usually
do not have the patience or commitment required for Inmon’s approach.
Any BI initiative is highly iterative in nature. Unless you are confident that
you will still have the CEO’s buy-in and a budget one year down the line, it
might be better to begin with a Data Mart (to start delivering, and to manage
expectations), keeping the metadata consistency requirements in mind, and then
scale towards the Data Warehouse.
What Is a Data Mart?
A data mart is a simple form of a data warehouse that is focused
on a single subject (or functional area), such as Sales, Finance, or Marketing.
Data marts are often built and controlled by a single department within an
organization. Given their single-subject focus, data marts usually draw data
from only a few sources. The sources could be internal operational systems, a
central data warehouse, or external data.
How Is It Different from a Data Warehouse?
A data warehouse, unlike a data mart, deals with multiple subject
areas and is typically implemented and controlled by a central organizational
unit such as the corporate Information Technology (IT) group. Often, it is
called a central or enterprise data warehouse. Typically, a data warehouse
assembles data from multiple source systems.
Nothing in these basic definitions limits the size of a data mart
or the complexity of the decision-support data that it contains. Nevertheless,
data marts are typically smaller and less complex than data warehouses; hence,
they are typically easier to build and maintain.
Category | Data Warehouse | Data Mart
Scope | Corporate | Line of Business (LOB)
Subject | Multiple | Single subject
Data Sources | Many | Few
Size (typical) | 100 GB to TB+ | < 100 GB
Implementation Time | Months to years | Months
Dependent and Independent Data Marts
There are two basic types of data marts: dependent and
independent. The categorization is based primarily on the data source that
feeds the data mart. Dependent data marts draw data from a central data
warehouse that has already been created. Independent data marts, in contrast,
are standalone systems built by drawing data directly from operational or
external sources of data, or both.
The main difference between independent and dependent data marts
is how you populate the data mart; that is, how you get data out of the sources
and into the data mart. This step, called the Extraction, Transformation, and
Loading (ETL) process, involves moving data from operational systems, filtering
it, and loading it into the data mart.
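As a rough illustration of those three steps, here is a minimal ETL sketch in Python. Everything in it is an assumption made for the example: the operational source is just an in-memory list of dictionaries, the target is an in-memory SQLite table called mart_orders, and the filtering rule (drop rows with no amount) is arbitrary.

```python
import sqlite3


def extract(source_rows):
    """Extract: read raw records from the operational source."""
    return list(source_rows)


def transform(rows):
    """Transform: filter out incomplete records and normalize the rest."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue  # skip records that cannot be used for analysis
        cleaned.append((
            row["order_id"],
            row["region"].strip().upper(),  # one consistent spelling per region
            float(row["amount"]),
        ))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned records into the data mart table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mart_orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO mart_orders VALUES (?, ?, ?)", rows)
    conn.commit()


# Hypothetical operational data with typical quality problems.
operational_rows = [
    {"order_id": 1, "region": " east ", "amount": "19.99"},
    {"order_id": 2, "region": "West", "amount": None},
    {"order_id": 3, "region": "west", "amount": 5},
]

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(operational_rows)))
loaded = conn.execute(
    "SELECT order_id, region, amount FROM mart_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions is a design choice for readability; in practice each step is usually far more involved, especially the transformation.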
With dependent data marts, this process is somewhat simplified
because formatted and summarized (clean) data has already been loaded into the
central data warehouse. The ETL process for dependent data marts is mostly a
process of identifying the right subset of data relevant to the chosen data
mart subject and moving a copy of it, perhaps in a summarized form.
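A dependent data mart load can therefore be little more than a filtered, summarized copy of warehouse data. The sketch below assumes a hypothetical warehouse table named warehouse_transactions that already holds clean data, and builds a mart table (mart_by_region, also hypothetical) for one department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the central warehouse: the data here is assumed to be clean.
conn.executescript("""
    CREATE TABLE warehouse_transactions (
        txn_id INTEGER PRIMARY KEY, department TEXT, region TEXT, amount REAL);
    INSERT INTO warehouse_transactions VALUES
        (1, 'sales', 'EAST', 100.0),
        (2, 'sales', 'EAST', 50.0),
        (3, 'sales', 'WEST', 25.0),
        (4, 'ops',   'EAST', 999.0);
""")


def build_dependent_mart(conn, department):
    """Copy the subset for one department into the mart, summarized by region."""
    conn.execute("DROP TABLE IF EXISTS mart_by_region")
    conn.execute(
        "CREATE TABLE mart_by_region "
        "(region TEXT PRIMARY KEY, total_amount REAL, txn_count INTEGER)"
    )
    # Select only the relevant subject area and aggregate while copying.
    conn.execute("""
        INSERT INTO mart_by_region
        SELECT region, SUM(amount), COUNT(*)
        FROM warehouse_transactions
        WHERE department = ?
        GROUP BY region
    """, (department,))
    conn.commit()


build_dependent_mart(conn, "sales")
mart_rows = conn.execute(
    "SELECT region, total_amount, txn_count FROM mart_by_region ORDER BY region"
).fetchall()
print(mart_rows)
```

No cleaning logic appears here, because that work is assumed to have been done when the warehouse itself was loaded.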
With independent data marts, however, you must deal with all
aspects of the ETL process, much as you do with a central data warehouse. The
number of sources is likely to be smaller, though, and the amount of data
associated with the data mart is less than in the warehouse, given your focus
on a single subject.
The motivations behind the creation of these two types of data
marts are also typically different. Dependent data marts are usually built to
achieve improved performance and availability, better control, and lower telecommunication
costs resulting from local access of data relevant to a specific department.
The creation of independent data marts is often driven by the need to have a
solution within a shorter time.