
Modern Data Architecture: Mine, Refine, & Turn Data Into Digital Gold

See how to turn raw data into practical insights and help drive operational efficiency and growth.

Just as oil fueled industrial transformation in the 20th century, data is fueling innovation in the 21st. As leaders strive to transform their organizations, putting data to work can help steer an enterprise in the right direction. Data is generated at massive scale (approximately 402 quintillion bytes every day)[1] and comes from numerous sources, including finance, human resources, healthcare, retail, education, science, surveys, government, and IoT devices. This in-depth guide provides a detailed overview of modern data architecture.

Five Characteristics of Modern Data

Today’s modern data differs from “traditional” data in five areas:

  1. Volume: Data sets keep growing; sizes are now measured in terabytes, petabytes, or exabytes.
  2. Velocity: Data is generated faster than before. It can arrive monthly, daily, hourly, or every minute or second.
  3. Variety: The sources of data are more numerous and varied than ever before, including phones, vehicles, point-of-sale systems, social media, IoT devices, and homegrown apps. The data can be structured or unstructured, such as pictures or videos.
  4. Veracity: This refers to the quality and reliability of data. It’s important to maintain quality output to help ensure data integrity.
  5. Value: With data more diverse and detailed than ever before, businesses can unlock the full potential of their data to help enhance decision making, improve customer satisfaction, improve operational efficiency, and drive revenue growth.

Handling modern data requires a robust extract, transform, and load (ETL) data integration process. ETL consists of:

  • Extracting data from various sources
  • Transforming the data, which includes:
      • Cleansing (improving quality and consistency)
      • Integration (combining sources to provide a unified view)
      • Normalization
      • Aggregation (providing insight by summarizing)
      • Enrichment (adding information to enhance the data)
  • Loading the data into purpose-built databases or other target stores

Then, data from multiple sources can be ingested, stored, updated, analyzed, visualized, and governed properly.
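
The ETL steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the sample CSV, the region-to-manager lookup, and the `region_sales` table are hypothetical, and an in-memory SQLite database stands in for a purpose-built target store.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, API, or database.
RAW_CSV = """order_id,region,amount
1, east ,100.50
2,WEST,200.00
3,East,
4,west,50.25
"""

# Hypothetical lookup table used for the enrichment step.
REGION_MANAGERS = {"east": "Avery", "west": "Jordan"}


def extract(raw_text):
    """Extract: parse rows from the raw source into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: cleanse, normalize, aggregate, and enrich the rows."""
    totals = {}
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleansing: drop rows with a missing amount
        region = row["region"].strip().lower()  # normalization: one spelling per region
        totals[region] = totals.get(region, 0.0) + float(amount)  # aggregation
    # Enrichment: attach the manager for each region.
    return [
        {"region": region, "total": total,
         "manager": REGION_MANAGERS.get(region, "unknown")}
        for region, total in sorted(totals.items())
    ]


def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region_sales "
        "(region TEXT PRIMARY KEY, total REAL, manager TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO region_sales VALUES (:region, :total, :manager)",
        records,
    )
    conn.commit()


def run_etl(raw_text, conn):
    """Run the three stages in order and return what was loaded."""
    records = transform(extract(raw_text))
    load(records, conn)
    return records
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` loads one summarized row per region. Keeping each stage in its own function makes it easier to test or replace a stage independently.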

The foundational goal of ETL is to enable advanced analytics, which uses tools and technology to benefit your business through data-driven decisions. Our analytics team at Forvis Mazars can provide these services.

How can organizations begin to answer myriad complicated questions within all sorts of complex environments? A sophisticated, disciplined, and modern data architecture is the answer.

The Evolution of Data Architecture

Data architecture has evolved over the years, alongside computing itself: from single-PC computing to network-based systems, then to web-based and grid computing, and now to the cloud.

[Figure: As systems evolve over time, so does their complexity. From left to right: multitasking, client-server, web-based, service-based, cloud-based.]

Organizations still reliant upon on-premises infrastructure are often constrained by rigid scalability, fragmented systems, and outdated tooling. Transitioning to cloud-based storage, compute, and ETL can provide a wide range of benefits, especially within the cloud ecosystem.

By adopting cloud-native services, businesses can integrate with their customer relationship management (CRM) and enterprise resource planning (ERP) systems (e.g., Dynamics 365), connect visualization and dashboarding tools (e.g., Tableau, Power BI, Looker Studio, Qlik), and run end-to-end ETL operations on platforms such as Microsoft Fabric or Databricks. For instance, storing data in a data lake or SQL database allows real-time access from CRM/ERP applications, so customer-facing teams can work with accurate and up-to-date insights. Cloud visualization tools connect natively to cloud-based services, allowing automated report refreshes and data-driven dashboards without complex pipelines or data duplication.

Cloud platforms also offer elastic scalability, automated backups, better disaster recovery, and the ability to innovate faster using artificial intelligence (AI) and machine learning (ML) tools such as Amazon SageMaker, Azure Machine Learning, OpenAI integrations, and Azure AI Foundry.

Understanding Modern Data Storage Models

Sources for a relational data warehouse (RDW) include sales data, HR data, product data, finance data, customer data, inventory data, schedule data, and marketing data.

Early data architecture relied mostly on data warehouses, which centralize data and organize it into structured fact and dimension tables. The star schema was, and still is, prevalent in data warehouse databases. Warehouses are schema-on-write repositories: data is structured and transformed before it is stored. The design targets high-performance batch loading, processing, and fast queries, and the ultimate goal is business intelligence (reporting and analysis).
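
To make the fact-and-dimension idea concrete, here is a small sketch of a star schema using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical, and SQLite is used only because it needs no setup; a real warehouse would run on a dedicated platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-write: the structure is declared before any data is stored.
conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id       INTEGER PRIMARY KEY,
    calendar_date TEXT NOT NULL,
    month         TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    quantity   INTEGER NOT NULL,
    revenue    REAL NOT NULL
);
""")

# Hypothetical sample rows for the two dimensions and the central fact table.
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20250101, "2025-01-01", "2025-01"), (20250201, "2025-02-01", "2025-02")],
)
conn.executemany(
    "INSERT INTO fact_sales (product_id, date_id, quantity, revenue) VALUES (?, ?, ?, ?)",
    [(1, 20250101, 2, 20.0), (2, 20250101, 1, 15.0),
     (3, 20250201, 4, 40.0), (1, 20250201, 1, 10.0)],
)


def revenue_by_category(connection):
    """Typical BI query: join the fact table to a dimension and summarize."""
    return connection.execute("""
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
```

The fact table holds the measurable events (sales), while each dimension table describes one way to slice them (by product or by date), which is what keeps reporting queries short.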

Examples of sources that can feed a data lake include files, cloud sources, web apps, IoT data, databases (on-premises or cloud), audio/video files, and social media feeds.

In the early 2000s, with the growth in “volume” and “variety” of data types, it became clear that traditional data warehouses for structured data were not enough. Data lakes emerged: schema-on-read repositories that store raw structured, semistructured, and unstructured data at scale. Netflix, for example, uses data lakes to collect and process big data about customer behavior to improve its recommendation algorithms. Data lakes are excellent for advanced analytics but may require specialized software for analysis.
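
Schema-on-read means the raw records are stored as they arrive, and structure is applied only when someone reads them. The sketch below illustrates the idea with plain Python and JSON strings; the event fields (`user`, `event`, `title`, `seconds`) are made up for illustration and are not taken from any real system.

```python
import json

# Hypothetical raw events as they might land in a data lake:
# stored as-is, with inconsistent fields and even a malformed line.
RAW_EVENTS = [
    '{"user": "u1", "event": "play", "title": "Show A", "seconds": 120}',
    '{"user": "u2", "event": "play", "title": "Show B"}',
    '{"user": "u1", "event": "pause", "device": "tv"}',
    'not valid json',
]


def read_plays(raw_lines):
    """Apply a schema at read time: keep 'play' events and fill missing fields."""
    plays = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unreadable lines are skipped when reading, not rejected on write
        if not isinstance(record, dict) or record.get("event") != "play":
            continue
        plays.append({
            "user": str(record.get("user", "")),
            "title": str(record.get("title", "")),
            "seconds": int(record.get("seconds", 0)),  # default when the field is absent
        })
    return plays
```

Nothing was rejected when the data was stored, so a different reader could later apply a different schema to the same raw lines, for example to study `pause` events by device.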

Recently, the data lakehouse emerged (data lake + data warehouse), combining the best features of both architectures and providing a single platform for both structured and unstructured data. For example, retail businesses use data lakehouses to manage and analyze customers’ transactions and product inventory. They can leverage a “medallion architecture” to store raw data (e.g., point-of-sale system data) in a bronze layer, identify and remove duplicate data in a silver layer, and then aggregate data in a gold layer to generate insights for marketing forecasts. A data lakehouse uses open file formats such as Parquet and JSON to enable structured analytics on top of a lake, and it supports big data analytics.
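
The bronze, silver, and gold layers can be illustrated with a few plain Python functions. This is a simplified sketch: the transaction fields (`txn_id`, `store`, `sku`, `amount`) are hypothetical, and in-memory lists stand in for the tables a lakehouse platform would actually manage.

```python
from collections import defaultdict

# Bronze layer: raw point-of-sale records kept exactly as received (hypothetical data).
BRONZE = [
    {"txn_id": "t1", "store": "north", "sku": "A", "amount": 10.0},
    {"txn_id": "t2", "store": "north", "sku": "B", "amount": 5.0},
    {"txn_id": "t1", "store": "north", "sku": "A", "amount": 10.0},  # duplicate of t1
    {"txn_id": "t3", "store": "south", "sku": "A", "amount": 7.5},
]


def to_silver(bronze_rows):
    """Silver layer: remove duplicates, keeping the first record per transaction ID."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if row["txn_id"] in seen:
            continue
        seen.add(row["txn_id"])
        silver.append(dict(row))  # copy so the bronze records stay untouched
    return silver


def to_gold(silver_rows):
    """Gold layer: aggregate the cleaned transactions into revenue per store."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)
```

Because each layer is derived from the one before it, the raw bronze data is never modified, and the silver and gold layers can be rebuilt if the cleansing or aggregation rules change.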

A data mart is a curated, subject-oriented subset of a data warehouse that focuses on a specific business function or department. Because data marts are smaller, they allow faster data retrieval and analysis. There are three types of data marts: dependent, independent, and hybrid. A dependent data mart is created from a central data warehouse. An independent data mart is built directly from standalone sources, without going through a data warehouse. A hybrid data mart combines both approaches.
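
One simple way to picture a dependent data mart is as a curated view over a warehouse table. The sketch below uses Python's built-in `sqlite3` module with hypothetical names (`warehouse_sales`, `mart_marketing`); it is only an illustration, since many data marts are materialized as separate tables or databases rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Central warehouse table covering every department (hypothetical sample data).
CREATE TABLE warehouse_sales (
    sale_id    INTEGER PRIMARY KEY,
    department TEXT NOT NULL,
    region     TEXT NOT NULL,
    revenue    REAL NOT NULL
);
INSERT INTO warehouse_sales (department, region, revenue) VALUES
    ('marketing', 'east', 100.0),
    ('finance',   'east', 300.0),
    ('marketing', 'west',  50.0),
    ('hr',        'west',  20.0);

-- Dependent data mart: a subject-oriented subset derived from the warehouse.
CREATE VIEW mart_marketing AS
    SELECT region, SUM(revenue) AS total_revenue
    FROM warehouse_sales
    WHERE department = 'marketing'
    GROUP BY region;
""")


def marketing_totals(connection):
    """Query the mart the way a marketing analyst would, without touching other departments' data."""
    return connection.execute(
        "SELECT region, total_revenue FROM mart_marketing ORDER BY region"
    ).fetchall()
```

The marketing team sees only the rows and columns relevant to it, while the warehouse remains the single source the mart is derived from.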

Pros & Cons of Modern Data Storage Models

Data Warehouse

Pros:

  • Optimized for structured data and fast SQL queries
  • Ideal for reporting and dashboarding tools like Power BI
  • Mature ecosystem and strong governance support

Cons:

  • Less suited for unstructured or semistructured data
  • Expensive for storing large volumes of raw data
  • Schema changes can be cumbersome, and scalability is typically limited

Data Lake

Pros:

  • Handles vast volumes of structured and unstructured raw data
  • Cost-effective storage
  • Most data lake implementations enable AI/ML functionalities not available in traditional data warehouses

Cons:

  • Failure to optimize data storage can lead to poor performance
  • Requires strong governance to avoid becoming a “data swamp”
  • Complexity in managing schema and data quality

Data Lakehouse

Pros:

  • Merges benefits of lakes and warehouses
  • Open format, e.g., Delta, allows easy interoperability
  • Atomicity, consistency, isolation, and durability (ACID) transactions bring reliability to big data

Cons:

  • Still a newer architecture with evolving standards
  • Requires technical sophistication to manage effectively
  • Integration with some legacy BI tools may need customization

Data Marts

Pros:

  • Targeted, fast access for specific teams or functions
  • Easier to maintain than a full data warehouse
  • Promotes self-service analytics

Cons:

  • Risk of data silos and duplication
  • Can be hard to scale if not governed properly
  • Requires alignment with enterprise data models

Common Architectures

| | Data Warehouse | Data Lake | Data Lakehouse | Data Mart |
| --- | --- | --- | --- | --- |
| Data Structure | Structured | Unstructured | Both structured and unstructured | Both structured and unstructured |
| Schema Changes | Cumbersome | Difficult to manage | Complex | Simpler |
| SQL Queries | Optimized | Poor performance for traditional BI queries | Optimized | Optimized |
| Reporting Tool | Ideal | Needs configuration | Needs configuration | Ideal |
| Ecosystem Maturity | Mature | Mature | Mature | Mature |
| Data Sciences | Ideal for relational data | Flexible | Very flexible | Ideal for targeted data science projects |
| Cost | Expensive | Effective | Expensive | Effective |

How Forvis Mazars Can Help

A well-thought-out data architecture strategy can lay the foundation for leveraging data as a strategic asset. Once the data is mined and refined, your organization can maintain high-quality, accessible data, equipping you with enhanced decision-making capabilities. Leaders can rely on relevant data to help drive organizational growth, leading to operational efficiency and significant cost savings. Not sure where to begin? Contact our analytics team at Forvis Mazars for a consultation!

  • [1] “Big data statistics: How much data is there in the world?” rivery.io, May 28, 2025.
