Title: The Essentials of Business Intelligence and Data Warehousing
Introduction
The world of business intelligence (BI) and data warehousing uses a unique terminology and deploys its own set of technologies, design techniques, and products. For newcomers all this can be overwhelming. What do all these new terms exactly mean, such as star schema, data mart, ETL, self-service BI, data science, big data, staging area, and BI in the cloud? In this new seminar, which has been set up specifically for newcomers,
all the typical BI terms, concepts, techniques, architectures, and technologies are explained. It’s a complete and critical introduction.
The well-known data warehouse architectures of Bill Inmon and Ralph Kimball are explained in detail, including all the database components that they are made up of: staging area, operational data store, enterprise data warehouse, and data mart. Their respective use cases, benefits, and drawbacks are discussed. Additionally, new upcoming architectures are clarified, such as datawarehouse in the cloud and the logical data
warehouse architecture.
For designing all these databases special techniques exist, such as star schemas, snowflake schemas, and datavault. Here, a specific terminology is used as well, such as fact table, dimensional table, hierarchy, hub table, and slowly-changing dimensions. Working with ETL products will receive quite some attention and is discussed in relationship to topics such as data quality, data profiling, and master data management.
The wide range of available technologies for data storage with their pros and cons are systematically discussed; from the traditional SQL databases to the new Hadoop technology, from analytical SQL database servers to SQL-on-Hadoop, and from on-premise data storage to storage in the cloud. And with all this new technology, what’s the role of good old SQL?
For reporting en for delivering data to business users a wide range of products is available, from simple reporting tools to the most advanced data science tools. Products from IBM, Information Builders, Microsoft, Oracle, Qlik, SAP, SAS, Tableau, and many others are discussed and compared. The impact of new technologies for analytics, such as Spark and Hadoop, are explained. The capabilities for analyzing unstructured
data, such as text, audio and video are discussed, together with the new world of streaming analytics.
To summarize, after this seminar you have a thorough understanding of the world of business intelligence and data warehousing. You will know about the techniques, the technologies, and the numerous products that are being applied. It is a complete and practical introduction to business intelligence and data warehousing and will help you on your way in BI projects.
Subjects
1. The Importance of Business Intelligence for Organizations
-
Data as competitive business asset
-
The history of business intelligence
-
Why reporting and analytics directly on the production systems is not recommended?
-
From reporting via self-service BI to data science and statistics
-
The data sources for business intelligence: transactional systems and open data
2. Overview of Database Technology
-
Working with standard SQL database servers
-
The importance of processing database queries as close to the stored data as possible
-
Pros and cos of in-memory database technology
-
What’s the added value of analytical SQL products, such as Exasol, IBM Netezza, and Teradata?
-
Making use of multi-dimensional cubes to speed up data access
-
Keeping data in memory with BI tools and Apache Spark
3. Traditional Data Warehouse Architectures
-
Every traditional data warehouse architecture consist of a chain of databases
-
Well-known databases in the chain: staging area, operational data store, data warehouse, and data marts
-
Why do we need a staging area and what is the relationship with change data capture and data replication?
-
The enterprise data warehouse as centralized data store for all the data needed for reporting and analytics
-
Using data marts to speed up query performance
-
The influence of big data
4. Designing Data Warehouses and Data Marts
-
Overview of design techniques: normalization, denormalization, star schema, snowflake schema
-
Facts and dimensions
-
Hierarchies of dimensions
-
Modeling time with slowly-changing dimensions
-
Storing derived and aggregated data
-
The special tables date and time
-
The influence of datavault on the architecture and usage
5. Using ETL to Pump Data Through the Chain
-
What is the functionality of an ETL product?
-
ETL is about data integration, filtering, cleansing, aggregation, and transformation
-
What do they mean exactly with pushdown?
-
Improving data quality
-
What is the difference between data cleansing and data profiling?
-
Overview of ETL products
-
The role of master data management when integrating data
6. Overview of the Tools for Reporting and Analytics
-
Identifying different types of users
-
The importance of a central metadata repository
-
Features and overview of tools for reporting and dashboards,
-
Tools for self-service BI and self-service data preparation, including PowerBI, QlikSense, Tableau, Paxata, SnapLogic, and Trifacta
-
Tools for statistics and data science
-
The role of an integrated BI platform
7. Modern data warehouse architectures
-
The logical data warehouse architecture
-
Data integration with data virtualization tools
-
The data lake for the data scientists
-
Fast data, streaming data, and the Internet-of-Things
8. The organization of the data warehouse
-
The relationship between business intelligence and data governance
-
The difference between data owners, data stewards, and data custodians
-
The importance of a data strategy and the alignment with the business strategy
-
The data lifecycle: creation, distribution, use maintenance, preservation, and disposal
-
The waterfall or the agile approach
9. Closing Remarks
Related Articles and Blogs:
The Roots of the Logical Data Warehouse Architecture
The Logical Data Warehouse Architecture is Not the Same as Data Virtualization
The Big BI Dilemma
The Logical Data Warehouse Architecture is Tolerant to Change
The Need for Flexible, Bi-Modal Data Warehouse Architectures
Related Whitepapers:
Developing a Bi-Modal Logical Data
Warehouse Architecture Using Data Virtualization, September 2016,
sponsored by Denodo
Designing a Logical Data
Warehouse, February 2016, sponsored by RedHat
Designing a Data Virtualization Environment; A
Step-By-Step Approach, January 2016
Developing a Data Delivery Platform with Composite
Information Server; June 2010; sponsored by Cisco (Composite
Software)
|