Title: Data Virtualization: Technology and Use Cases
Abstract: Data virtualization is a relatively new data integration technology. It enables more agile data integration by decoupling data consumers from data stores.
But why do we need a new technology? Data is increasingly becoming a crucial asset for organizations to survive in today's fast-moving business world. In addition, data becomes more valuable when it is enriched and/or fused with other data. Unfortunately, most organizations have dispersed their enterprise data over numerous systems, each using different technologies. Bringing all that data together is, and always has been, a major technological challenge.
In addition, more and more data is available outside the traditional enterprise systems. Besides traditional databases, it's stored in big data platforms, cloud applications, spreadsheets, simple file systems, weblogs, social media systems, and so on. For each system that requires data from several other systems, a different integration solution is deployed. In other words, integration silos have been developed that, over time, have led to a complex integration labyrinth. The disadvantages are clear:
Inconsistent integration specifications
Inconsistent results
Longer time to market
Increased development costs
Increased maintenance costs
The bar for integration tools and technology has been raised: the integration labyrinth has to disappear. It must become easier to integrate data from multiple systems, and integration solutions should be easier to design and maintain to keep up with the fast-changing business world.
All these new demands are changing the rules of the integration game: they demand that integration solutions be developed in a more agile way. One of the technologies making this possible today is Data Virtualization.
This seminar focuses on Data Virtualization. The technology is explained, its advantages and disadvantages are discussed, products are compared, design guidelines are given, and use cases are presented.
What you will learn:
How Data Virtualization can be used to integrate data in a more agile way
How to embed Data Virtualization in Business Intelligence systems
How Data Virtualization can be used for integrating on-premises and Cloud applications
How to migrate to a more agile integration system
How Data Virtualization products work
How to avoid well-known pitfalls
How to learn from real-life experiences with Data Virtualization
Main Topics:
Introduction to Data Virtualization
The changing world of data and application integration
Under the hood of a Data Virtualization server
Caching for performance and scalability
Query optimization techniques
Data Virtualization and the Logical Data Warehouse Architecture
Data Virtualization and Big Data
Data Virtualization and Master Data Management
Data Virtualization and Data Lakes
The future of Data Virtualization
Topics:
1. Introduction to Data Virtualization
What is data virtualization?
Use cases of data virtualization: business intelligence, data science, democratizing data, master data management, and distributed data
Differences between data abstraction, data federation, and data integration
Open versus closed data virtualization servers
Market overview: AtScale, Data Virtuality, Denodo Platform, Dremio, Fraxses, IBM Data Virtualization Manager for z/OS, Stone Bond Enterprise Enabler, TIBCO Data Virtualization, and Zetaris
2. How Do Data Virtualization Servers Work?
The key building block: the virtual table
Integrating data sources via virtual tables
Implementing transformation rules in virtual tables
Stacking virtual tables
Impact analysis and lineage
Running transactions – updating data
Securing access to data in virtual tables
Importing non-relational data, such as XML and JSON documents, web services, NoSQL, and Hadoop data
The importance of an integrated business glossary and centralization of metadata specifications
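To make the idea of virtual tables more concrete, here is a minimal Python sketch, assuming two hypothetical in-memory sources (`crm.customers` and `erp.orders`) in place of real databases. The classes `Source` and `VirtualTable` are illustrative names only, not the API of any data virtualization product. The sketch shows a transformation rule implemented in a virtual table, a second virtual table stacked on the first that integrates two sources, and simple lineage derived from the stack.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict


@dataclass
class Source:
    """A physical data store, simulated here by a function returning rows."""
    name: str
    fetch: Callable[[], list[Row]]

    def rows(self) -> list[Row]:
        return self.fetch()

    def lineage(self) -> list[str]:
        return [self.name]


@dataclass
class VirtualTable:
    """A named table whose rows are derived on demand from its inputs."""
    name: str
    inputs: list  # Sources or other VirtualTables (stacking)
    definition: Callable[..., Iterable[Row]]

    def rows(self) -> list[Row]:
        # Inputs are queried at request time; nothing is copied or stored.
        return list(self.definition(*(i.rows() for i in self.inputs)))

    def lineage(self) -> list[str]:
        # Walk down the stack to list this table and everything it depends on.
        names = [self.name]
        for i in self.inputs:
            names.extend(i.lineage())
        return names


# Hypothetical sources standing in for a CRM and an ERP system.
crm = Source("crm.customers", lambda: [
    {"id": 1, "name": " alice "}, {"id": 2, "name": "BOB"}])
erp = Source("erp.orders", lambda: [
    {"cust_id": 1, "amount": 100.0}, {"cust_id": 1, "amount": 50.0},
    {"cust_id": 2, "amount": 20.0}])

# Layer 1: a transformation rule that cleanses customer names.
v_customer = VirtualTable("v_customer", [crm], lambda cs: (
    {"id": c["id"], "name": c["name"].strip().title()} for c in cs))


# Layer 2: stacked on v_customer and integrated with the ERP source.
def revenue_per_customer(customers, orders):
    totals = {}
    for o in orders:
        totals[o["cust_id"]] = totals.get(o["cust_id"], 0.0) + o["amount"]
    return ({"name": c["name"], "revenue": totals.get(c["id"], 0.0)}
            for c in customers)


v_revenue = VirtualTable("v_revenue", [v_customer, erp], revenue_per_customer)

print(v_revenue.rows())
# [{'name': 'Alice', 'revenue': 150.0}, {'name': 'Bob', 'revenue': 20.0}]
print(v_revenue.lineage())
# ['v_revenue', 'v_customer', 'crm.customers', 'erp.orders']
```

Because every virtual table is evaluated on demand, changing the cleansing rule in `v_customer` changes the result of every table stacked on it without copying any data. Real data virtualization servers express such definitions as SQL views and hand them to a query optimizer rather than running Python functions.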
3. Performance Improving Features
Caching of virtual tables to improve query performance, create consistent report results, or minimize interference on source systems
Different styles of refreshing caches: full, incremental, live, query-based, trigger-based, and offline refreshing
Different query optimization techniques, including query substitution, pushdown, query expansion, ship joins, sort-merge joins, statistical data, and SQL override
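The difference between full and incremental cache refreshing can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular product implements caching: the class name, the `fetch_all`/`fetch_since` functions, and the `version` column used to detect changed rows are all hypothetical.

```python
from typing import Callable

Row = dict


class CachedVirtualTable:
    """Serves queries from a cached copy of a virtual table's rows.

    Hypothetical assumption: every source row carries a unique "id" and a
    monotonically increasing "version" that incremental refresh can use.
    """

    def __init__(self, fetch_all: Callable[[], list[Row]],
                 fetch_since: Callable[[int], list[Row]]):
        self._fetch_all = fetch_all
        self._fetch_since = fetch_since
        self._cache: dict[int, Row] = {}
        self._high_water = 0  # highest version copied so far

    def full_refresh(self) -> None:
        # Rebuild the entire cache from the source.
        rows = self._fetch_all()
        self._cache = {r["id"]: r for r in rows}
        self._high_water = max((r["version"] for r in rows), default=0)

    def incremental_refresh(self) -> None:
        # Fetch only rows changed since the last refresh and upsert them.
        # Deleted source rows are not detected by this simple scheme.
        for r in self._fetch_since(self._high_water):
            self._cache[r["id"]] = r
            self._high_water = max(self._high_water, r["version"])

    def query(self, predicate: Callable[[Row], bool]) -> list[Row]:
        # Queries read the cache only; the source system is not touched.
        return [r for r in self._cache.values() if predicate(r)]


# Usage: a hypothetical in-memory "source system" that counts its calls.
source = [{"id": 1, "status": "open", "version": 1},
          {"id": 2, "status": "open", "version": 2}]
calls = {"all": 0, "since": 0}


def fetch_all() -> list[Row]:
    calls["all"] += 1
    return list(source)


def fetch_since(version: int) -> list[Row]:
    calls["since"] += 1
    return [r for r in source if r["version"] > version]


def is_open(r: Row) -> bool:
    return r["status"] == "open"


table = CachedVirtualTable(fetch_all, fetch_since)
table.full_refresh()
source[0] = {"id": 1, "status": "closed", "version": 3}   # update in source
source.append({"id": 3, "status": "open", "version": 4})  # insert in source
print([r["id"] for r in table.query(is_open)])  # [1, 2]  (cache is stale)
table.incremental_refresh()
print([r["id"] for r in table.query(is_open)])  # [2, 3]
print(calls)  # {'all': 1, 'since': 1}
```

Between refreshes, queries return a consistent (if slightly stale) result and never interfere with the source. Incremental refreshing transfers fewer rows, but, as the code comment notes, it misses deletions unless the source also records them; a periodic full refresh is one way to compensate.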
4. Use Case 1: The Logical Data Warehouse Architecture
The limitations of the classic data warehouse architecture
On-demand versus scheduled integration and transformation
Making a BI system more agile with data virtualization
The advantages of virtual data marts
Strategies for adopting data virtualization
The need for powerful analytical database servers
Migrating to a data virtualization-based BI system
5. Use Case 2: Data Virtualization and Master Data Management
How data virtualization can help create a 360° view of business objects
Developing MDM with a data virtualization server – from a stored to a virtual solution
On-demand data profiling and data cleansing
6. Use Case 3: From the Physical Data Lake to the Logical Data Lake
Practical limitations of developing one physical data lake
Shortening the data preparation phase of data science with data virtualization
Sharing metadata specifications between data scientists
Implementing analytical models inside a data virtualization server
7. Use Case 4: Democratizing Enterprise Data
Increasing the business value of the data asset by making all the data available to a larger group of users within the organization
The business value of consistent data integration
Using lean data integration to make data available for analytics and reporting faster
One consistent data view for the entire organization
How the business glossary and search features help business users
The coming of the data marketplace
8. Use Case 5: Dealing with Big Data
Big data can be too big to move: the data can't be transported to the place of integration
Data virtualization pushes data processing to where the data is produced
Hiding the physical location of the data
With data virtualization, the network becomes the database
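The effect of pushing processing to where the data resides can be illustrated with a small simulation. The `RemoteSource` class below is hypothetical: it stands in for a remote system and counts how many rows it ships to the data virtualization layer, and a Python predicate stands in for the SQL `WHERE` clause a real server would push down.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict


@dataclass
class RemoteSource:
    """Hypothetical remote store that counts rows crossing the 'network'."""
    rows: list[Row]
    rows_shipped: int = 0

    def scan(self, where: Optional[Callable[[Row], bool]] = None,
             columns: Optional[list[str]] = None) -> list[Row]:
        # Filtering and projection happen at the source when requested.
        out = [r for r in self.rows if where is None or where(r)]
        if columns is not None:
            out = [{c: r[c] for c in columns} for r in out]
        self.rows_shipped += len(out)
        return out


def query_without_pushdown(src, where, columns):
    # Ship every row, then filter and project in the virtualization layer.
    return [{c: r[c] for c in columns} for r in src.scan() if where(r)]


def query_with_pushdown(src, where, columns):
    # Hand the predicate and the column list to the source instead.
    return src.scan(where=where, columns=columns)


def is_eu(r: Row) -> bool:
    return r["region"] == "EU"


events = [{"id": i, "region": "EU" if i % 10 == 0 else "US",
           "payload": "x" * 100} for i in range(1_000)]
a, b = RemoteSource(events), RemoteSource(events)

without = query_without_pushdown(a, is_eu, ["id"])
with_pd = query_with_pushdown(b, is_eu, ["id"])
print(without == with_pd)              # True
print(a.rows_shipped, b.rows_shipped)  # 1000 100
```

Both strategies return the same answer, but with pushdown only the 100 matching rows leave the source, and projection also leaves the unused `payload` column behind. The simulation counts rows only; actual savings depend on data volumes, the network, and which operations the source is able to execute.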
9. Closing Remarks
The Future of Data Virtualization
Data virtualization as driving force for data integration
Potential new product features
Geared to: IT architects; enterprise architects; business intelligence specialists; data analysts; data warehouse designers; business analysts; data scientists; technology planners; technical architects; IT consultants; IT strategists; systems analysts; database developers; database administrators; solutions architects; data architects.