The diagram depicts the basic components of a typical data warehouse. On the left is the Source Data component. Next comes the Data Staging component, where extracted data is prepared before it is loaded. The Data Storage component, which holds the warehouse's data, sits in the center. It consists of two parts: an offline archive and a fast online storage system.
Finally, on the right, we have the Data Quality module. This module filters out incorrect or incomplete information in order to produce clean, consistent data that can be used for reporting purposes.
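The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`customer_id`, `amount`, `date`), the rule that amounts must be non-negative, and the helper names `is_clean` and `filter_clean` are all hypothetical and do not come from any particular warehouse product.

```python
# Hypothetical required fields; a real warehouse would define its own rules.
REQUIRED_FIELDS = ("customer_id", "amount", "date")


def is_clean(record: dict) -> bool:
    """Return True if all required fields are filled and amount is a non-negative number."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "":
            return False
    amount = record["amount"]
    return isinstance(amount, (int, float)) and amount >= 0


def filter_clean(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (clean, rejected) so rejected rows can still be reviewed."""
    clean, rejected = [], []
    for record in records:
        (clean if is_clean(record) else rejected).append(record)
    return clean, rejected


# Made-up staged rows for illustration only.
staged = [
    {"customer_id": 1, "amount": 120.0, "date": "2023-01-05"},
    {"customer_id": 2, "amount": None, "date": "2023-01-06"},     # incomplete
    {"customer_id": 3, "amount": -15.0, "date": "2023-01-07"},    # invalid value
    {"customer_id": None, "amount": 40.0, "date": "2023-01-08"},  # missing key field
]
clean_rows, rejected_rows = filter_clean(staged)
```

Keeping the rejected rows in a separate list, rather than discarding them, is a design choice made here so that data problems can be traced back to their source.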
In summary, a data warehouse combines source systems, tools for extracting and cleaning data from those sources, and methods for storing and analyzing it.
Que. | Which of the following is not a component of a data warehouse? |
---|---|
b. | Data warehouse data |
c. | Data metadata |
d. | None of the above are data warehouse components. |
Answer: | Data metadata |
A data warehouse integrates data from several heterogeneous sources and organizes it according to a uniform structure. A data warehouse can be built with either a top-down or a bottom-up approach, both of which are discussed further below. An external source is any location from which data is obtained, regardless of the type of data. For example, customer information is obtained from customers, sales information from employees, and inventory data from warehouses and vendors.
Data warehouses contain historical records of activity at organizations, such as banks, retail stores, and manufacturers. A data warehouse may contain information about customers, products sold, amounts received, and other aspects of a company's business. Data warehouses are used by companies to store and manage this information for future use. Companies can also use data warehouses to analyze past performance by region or other categories of interest.
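The idea of a data warehouse was described by Barry Devlin and Paul Murphy in a 1988 IBM Systems Journal article, and the term was popularized by Bill Inmon, whose book Building the Data Warehouse was published in 1992. Inmon defined a data warehouse as a subject-oriented, integrated, time-variant, non-volatile collection of data that supports management decision-making. In practice, this means a centralized repository that holds relevant information about an organization's business across various time periods. Business users can query the data warehouse using structured queries (i.e., queries written in the Structured Query Language, or SQL) to find patterns in the data that can be used to make business decisions.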
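As a small illustration of the kind of structured query mentioned above, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The `sales` table, its columns, and the figures are made up for the example; a production warehouse would run on a dedicated database server rather than SQLite.

```python
import sqlite3

# In-memory database standing in for a warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", 2022, 100.0),
        ("North", 2023, 150.0),
        ("North", 2023, 50.0),
        ("South", 2022, 80.0),
        ("South", 2023, 60.0),
    ],
)

# Total sales per region and year: a typical historical summary query.
sales_by_region_year = conn.execute(
    "SELECT region, year, SUM(amount) "
    "FROM sales GROUP BY region, year ORDER BY region, year"
).fetchall()
conn.close()

for region, year, total in sales_by_region_year:
    print(region, year, total)
```

Comparing the totals for the same region across years is one simple way such a query can reveal a trend.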
Data warehouses can be built using one of two methods: the top-down approach or the bottom-up approach. Both methods start with identifying what information needs to be stored about each object in the business universe.
Data warehouses typically feature a three-tier design: a bottom tier (data warehouse server), a middle tier (OLAP server), and a top tier (front-end tools).
The bottom tier is made up of the database server on which the warehouse resides. This server must be capable of performing many functions including storing large volumes of data, providing rapid access to that data, and supporting business applications requiring direct access to the data stored in the warehouse.
The middle tier consists of one or more OLAP (online analytical processing) servers that perform analytical functions on the bottom-tier data. These servers may run on the same machine as the bottom-tier server, or they may be separate machines. OLAP servers analyze the data, create models, and report findings using multidimensional techniques.
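To make "multidimensional techniques" more concrete, here is a minimal sketch of two common OLAP-style operations, roll-up and slice, on a tiny in-memory cube. The dimension names, the sales figures, and the helper functions `roll_up` and `slice_cube` are assumptions made for illustration; real OLAP servers implement these operations very differently and at much larger scale.

```python
from collections import defaultdict

# Each cell is addressed by (region, product, quarter) and holds a sales value.
DIMENSIONS = ("region", "product", "quarter")
cube = {
    ("North", "Widget", "Q1"): 100,
    ("North", "Widget", "Q2"): 120,
    ("North", "Gadget", "Q1"): 70,
    ("South", "Widget", "Q1"): 90,
    ("South", "Gadget", "Q2"): 60,
}


def roll_up(cells, keep):
    """Sum the cube over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for coords, value in cells.items():
        totals[tuple(coords[i] for i in positions)] += value
    return dict(totals)


def slice_cube(cells, dimension, member):
    """Keep only the cells where `dimension` equals `member`."""
    position = DIMENSIONS.index(dimension)
    return {c: v for c, v in cells.items() if c[position] == member}


by_region = roll_up(cube, ["region"])
by_region_quarter = roll_up(cube, ["region", "quarter"])
q1_only = slice_cube(cube, "quarter", "Q1")
```

Rolling up to `region` collapses the product and quarter dimensions, while slicing on `quarter` fixes one dimension and leaves the others intact.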
Top-tier (front-end) tools present the user with views of the data that are easy to navigate and search. They also provide formatting capabilities for producing reports.
For example, suppose a company wants to create a financial reporting system for use by management to summarize the company's performance by division. The first step would be to determine what type of analysis is needed to satisfy this requirement. Then, a data model could be created to represent the relevant aspects of the business within an acceptable degree of accuracy.
A data warehouse architecture defines how data is transferred, processed, and presented inside the company for end-user computing. Each data warehouse is unique, yet they all share several essential components. A data warehouse typically begins with a client/server architecture in which a relational database server (SQL Server is one example) acts as the server component.
The data warehouse also includes an information framework called Dimension Management, which describes what kind of data is stored in the data warehouse and how it can be analyzed. Dimensions are descriptive attributes such as customer name, date purchased, or product description. Dimension tables are linked to sales or other transaction records through keys, and queries combine them using joins. For example, one could join sales information to a date dimension and group by month and year to analyze yearly trends in sales.
Finally, there is a measure management system that is used to track performance against objectives within each dimension. Measures are numeric values which can be added together to create aggregates such as monthly revenue. For example, one could calculate monthly average price to determine whether prices have increased or decreased over time.
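The dimensions and measures described above can be sketched as a very small star schema. The table names `fact_sales` and `dim_date`, their columns, and the sample figures are hypothetical; the example uses Python's built-in `sqlite3` module purely for convenience. It computes monthly revenue (an additive measure) and a monthly average unit price derived from revenue and quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dimension table: one row per calendar date, described by year and month.
conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER)")
# Fact table: numeric measures plus a key pointing at the date dimension.
conn.execute("CREATE TABLE fact_sales (date_key INTEGER, quantity INTEGER, revenue REAL)")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, 2023, 1), (2, 2023, 1), (3, 2023, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 10, 100.0), (2, 6, 60.0), (3, 4, 60.0)])

# Join facts to the date dimension, then aggregate the measures per month.
monthly = conn.execute(
    """
    SELECT d.year,
           d.month,
           SUM(f.revenue)                   AS monthly_revenue,
           SUM(f.revenue) / SUM(f.quantity) AS avg_unit_price
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    GROUP BY d.year, d.month
    ORDER BY d.year, d.month
    """
).fetchall()
conn.close()
```

Note that revenue can simply be summed across rows, whereas the average price is computed from the summed revenue and quantity rather than by averaging per-row prices.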
Data warehouses contain large amounts of data that need to be organized and accessible for analysis. The database structure behind the data warehouse must be able to store multidimensional views of the data as well as historical data. Data warehouses often use multidimensional (OLAP) databases because pre-aggregating data along dimensions can make summary queries faster than computing them on demand from normalized relational tables.
There are two approaches to constructing a data warehouse: top-down and bottom-up. 1. Top-down strategy: The essential components are discussed below. An external source is any source from which data is collected, irrespective of the type of data. Data can be structured, semi-structured, or unstructured. Structured data is stored in databases with defined columns and rows. Semi-structured data has no fixed table schema but carries tags or keys that describe its fields; examples include JSON and XML documents. Unstructured data has no predefined data model. For example, an archive of news articles would be considered unstructured data because it has no defined fields for storing data; word processing documents, PDFs, images, audio files, and video clips are other examples.
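As a hedged illustration of how the categories differ in practice, the sketch below takes JSON records, a common semi-structured format, and flattens them into fixed-column rows of the kind a structured table would hold. The event layout, the column names, and the `flatten` helper are invented for the example.

```python
import json

# Semi-structured input: fields are labeled, but records vary in shape
# (the second event has no "coupon" and has two items).
raw_events = [
    '{"customer": "Ana", "items": [{"sku": "A1", "qty": 2}], "coupon": "SPRING"}',
    '{"customer": "Ben", "items": [{"sku": "B7", "qty": 1}, {"sku": "A1", "qty": 3}]}',
]

# Structured target: every row has exactly these columns, in this order.
COLUMNS = ("customer", "sku", "qty", "coupon")


def flatten(event_json: str) -> list[tuple]:
    """Turn one JSON event into one row per purchased item."""
    event = json.loads(event_json)
    return [
        (event["customer"], item["sku"], item["qty"], event.get("coupon"))
        for item in event["items"]
    ]


rows = [row for raw in raw_events for row in flatten(raw)]
```

Missing optional fields become `None` in the structured rows, which corresponds to a NULL column value once the rows are loaded into a table.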
• External sources may include government agencies, companies, organizations, and others that provide data for use by businesses.
• "External" here means external to the warehouse, not to the organization. As the customer, sales, and inventory examples show, data that the organization collects or generates itself is a typical source.
• There are three types of external sources: qualified, unqualified, and special needs sources. Qualified sources are those that are known to provide reliable data. They may be specified by contract or otherwise agreed upon by the data user and supplier. Unqualified sources provide data that may not be suitable for statistical purposes but could be used for exploratory analysis. These sources may include company reports, analyst forecasts, and surveys.