September 15, 2023 | Blog Featured Insights Technology

Microsoft Fabric: Insights and Considerations from Our Customers

It has been almost four months since Microsoft Fabric entered preview at the end of May. Conversations with our customers have shown a significant level of interest and raised expectations surrounding this offering. Let us explore the reasons behind this interest based on our discussions with customers and address essential considerations for those thinking about adopting it.

Solving Traditional Problems

Interest in Microsoft Fabric primarily arises from its ability to tackle familiar challenges in cloud infrastructure management. Fabric simplifies complex cloud data infrastructures, previously composed of multiple services and platforms, into a unified SaaS service. This consolidation also extends to pricing, offering a more straightforward and clear cost structure. We will go into the pricing model in more detail shortly.

Data Science and Machine Learning Capabilities

Many of our customers are intrigued by Fabric’s built-in data science capabilities. Some already have advanced analytics solutions in their current environment and are eager to explore what Fabric could add to their toolkit. Others have focused on traditional business intelligence but are now considering taking their data-driven business further. Microsoft Fabric’s Synapse Data Science integration brings these capabilities to the forefront, and customers recognize the value of having these tools as part of the Fabric package.

Navigating Concerns

While the interest in Microsoft Fabric is apparent, it is not without its share of questions and concerns. One of the foremost considerations is the maturity level of the product. Given its brief time in the preview phase and the absence of an official release date, customers understandably question its readiness for production use. Moreover, there are still some rough edges in the product that need to be addressed before the official release. Fortunately, familiar tools like Data Factory, Power BI, and the Synapse tools retain their user experience within Fabric, so there is no need to learn them from scratch.

Pricing and Capacity Management

The Fabric Preview has a few licensing options that affect Power BI and sharing capabilities, but at its base is time-based billing (per second, with a minimum of one minute) for provisioned compute, known as Fabric capacity. Pricing varies by region and capacity size.
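To make the billing rule concrete, here is a minimal sketch of the arithmetic. It assumes that partial seconds are rounded up and that the one-minute minimum applies to each period the capacity is running; both are our assumptions, not confirmed billing behaviour. The function names and the hourly rate are hypothetical and chosen only for illustration.

```python
import math


def billable_seconds(run_seconds: float, minimum_seconds: int = 60) -> int:
    """Seconds billed for one period in which the capacity was running.

    Assumption: partial seconds round up, and the one-minute minimum
    applies per running period.
    """
    if run_seconds <= 0:
        return 0
    return max(minimum_seconds, math.ceil(run_seconds))


def capacity_cost(run_seconds: float, hourly_rate: float) -> float:
    """Cost of one running period at a hypothetical hourly rate."""
    return billable_seconds(run_seconds) * hourly_rate / 3600


# A 20-second run is still billed as one full minute.
short_run = billable_seconds(20)
# Half an hour at a made-up rate of 2.0 per hour.
half_hour_cost = capacity_cost(1800, hourly_rate=2.0)
```

The real rate depends on region and capacity size, so check the current price list before using numbers like these in a budget.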

Currently, only a pay-as-you-go model is available, but a reserved capacity pricing model is in the works. Additionally, OneLake storage is billed at a pay-as-you-go rate. Because costs accrue for as long as the capacity is running, efficient cost management means pausing the capacity when it is not needed, either manually or through automation. If Fabric eventually offers built-in auto-pause functionality for capacity, this concern will be alleviated. In the meantime, at Evitec, we have developed an automated solution to pause capacity when not in use.
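As a rough illustration of what such automation has to decide, the sketch below checks whether a capacity has been idle long enough to be paused. It is a generic example and not a description of any particular solution. The function name, the 30-minute idle limit, and the idea of tracking a "last activity" timestamp are all assumptions. The actual pause call is left out on purpose, because it depends on the management tooling you use.

```python
from datetime import datetime, timedelta


def should_pause(
    last_activity: datetime,
    now: datetime,
    idle_limit: timedelta = timedelta(minutes=30),
) -> bool:
    """Return True when the capacity has been idle for at least idle_limit.

    Hypothetical rule: how "last activity" is measured is left to the caller.
    """
    return now - last_activity >= idle_limit


# Example: the last job finished at 18:00 and the check runs at 18:45.
last_job_finished = datetime(2023, 9, 15, 18, 0)
pause_now = should_pause(last_job_finished, datetime(2023, 9, 15, 18, 45))
```

A scheduled job could run a check like this every few minutes and trigger the pause when it returns True.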

Let us get back to the question of having enough capacity – computing power – for your organization. In the simplest scenario, the entire organization can use a single capacity for all their needs in Fabric. When you start using Fabric, it is a good idea to begin with the smallest capacity and increase it as necessary. As your use of Fabric grows and your organization’s needs become more diverse, you might need to scale up the capacity.

Now, it is important to understand another fundamental term in Microsoft Fabric: workspaces. Workspaces function as containers for Fabric items and can be created, for example, around different business domains; this decision is up to your organization. Each workspace is associated with a capacity, and multiple workspaces can share a single capacity. However, as your usage and use cases become more varied, your organization might want to have multiple capacities of different sizes available for different purposes. In other words, you can have different levels of computing power at your disposal.
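The relationship between workspaces and capacities can be pictured as a simple mapping. The sketch below is only a mental model, not a Fabric API: the workspace and capacity names are made up, and the helper functions are hypothetical.

```python
from collections import defaultdict

# Each workspace is associated with exactly one capacity (hypothetical names).
assignments = {
    "sales": "shared-small",
    "finance": "shared-small",
    "data-science": "shared-small",
}


def workspaces_by_capacity(assignments: dict[str, str]) -> dict[str, list[str]]:
    """Group workspace names under the capacity they run on."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for workspace, capacity in sorted(assignments.items()):
        grouped[capacity].append(workspace)
    return dict(grouped)


def reassign(assignments: dict[str, str], workspace: str, capacity: str) -> dict[str, str]:
    """Return a new mapping with one workspace moved to another capacity."""
    if workspace not in assignments:
        raise KeyError(f"unknown workspace: {workspace}")
    updated = dict(assignments)
    updated[workspace] = capacity
    return updated


# Heavy calculations in one workspace: give it a larger capacity of its own.
later = reassign(assignments, "data-science", "dedicated-large")
```

In the starting state all three workspaces share one small capacity; after the reassignment the organization runs two capacities of different sizes.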

Changing the capacity that a workspace uses is a straightforward process. Based on the increased need for capacity within a specific workspace, such as heavy calculations, you can temporarily or permanently scale up the capacity of that workspace. The following illustration demonstrates how your organization’s Fabric capacity and workspaces might evolve as your experience with Fabric grows.

Security and Region Availability

Security is paramount in data solutions, and many companies have strict policies about data storage locations. Currently, Fabric Preview is not available in all regions. For example, at the time of writing, the closest regions to Finland offering Fabric Preview are West Europe and Norway East. We hope that region availability will expand when Microsoft releases the official version. However, if company policy dictates a specific region that does not currently support Fabric Preview, this may be an obstacle to considering Fabric as an option.

Migration Strategy Dependent on Current Data Environments

Customer situations regarding their current data solutions vary greatly. Some rely solely on Excel-based reporting, while others maintain on-premises data warehouses, and some have extensive experience with cloud-based data solutions. These varying starting points influence their needs and questions concerning Fabric.

For those migrating from on-premises environments to the cloud, Microsoft Fabric provides a straightforward option, albeit requiring a clean slate approach. In contrast, businesses with existing cloud environments seek to understand how Fabric complements their current stack and its potential for hybrid solutions. The starting point of each customer significantly influences the migration effort required. Due to Fabric’s one-copy data approach, development should be faster compared to previous cloud migrations.

Conclusion

Microsoft Fabric has garnered significant interest and raised expectations among our customers. Its ability to simplify cloud infrastructure management, offer powerful data science capabilities, and address traditional challenges is compelling. However, it is essential to address concerns around product maturity, pricing, capacity management, security, and region availability before making a decision.

We at Evitec are excited about the potential that Microsoft Fabric brings to the world of data and analytics and look forward to its continued evolution.

Written by

Henni Niiranen

Data Consultant

August 3, 2023 | Analytics Blog Featured Insights

What is Data Mesh?

Microsoft Fabric entered preview at the end of May. A few update releases have been made since, and the discussion around Fabric is active. Let’s take a closer look at one of the topical themes: Data Mesh.

If this buzzword is completely new to you, or you have heard the term a couple of times but have not had the time to figure out what it means, here is a short primer on Data Mesh and how it connects to Microsoft Fabric. Enjoy!

Written by

Henni Niiranen

Data Consultant

Data Mesh in short

The concept of Data Mesh is relatively new: it was introduced in 2019 by Zhamak Dehghani, a pioneer in managed data decentralization. Data Mesh is an enterprise data architecture that, in contrast to traditional “monolithic”, centralized data lakes and warehouses, embraces an intentionally distributed approach. It is meant especially for large and complex organizations dealing with big data.

Dehghani’s key message was that the traditional way of implementing data warehouses or lakes as big, centralized structures has not been able to unleash the true value of the data. Instead, it has created big, complex and expensive data architectures full of delivery bottlenecks, especially in large organizations with rich domains, several sources and a diverse set of consumers.

To harness the full potential of data, the Data Mesh approach advocates distributing data ownership and governance to individual domain teams, enabling them to take ownership of their data and work in an agile way. The four core principles of Data Mesh are domain thinking, data product thinking, self-serve data platforms, and federated data governance. Let’s dig a bit deeper into these core principles.

Domain Thinking

Domain thinking is a fundamental aspect of Data Mesh. It involves aligning data infrastructure, tools, and processes with specific business domains rather than treating data as a monolithic entity. Each domain team becomes responsible for its data products, including data collection, processing, storage, and analytics.

This approach promotes a deep understanding of domain-specific data requirements, leading to better insights and faster decision-making. In large organizations, a single platform or data integration team causes bottlenecks and hinders getting business value out of data. Integration work also requires expertise in the data itself, which is hard for a small, centralized team to build up in a large organization with a number of data sources. The domain approach follows the way business domains are naturally distributed in organizations. Data domains and the teams around them should be long-term.

Data Products

Data Mesh introduces a product-oriented mindset to data management. Each domain team treats its data products as assets and focuses on delivering data products that support the specific needs of their users. This approach encourages teams to think beyond just data pipelines and storage and to consider the end-to-end data product lifecycle, including data discovery, documentation, accessibility, and continuous improvement that brings long-term value for the business. The customers of these data products can be data scientists, data engineers or business users within the organization. Data products can be, for example, APIs, reports, tables or datasets. Data products are also the way data is shared between different domain teams when needed.

Self-Serve Data Platform

Data Mesh encourages the creation of self-serve data infrastructure as a platform for the domain teams within the organization. The domain teams have the autonomy to choose and manage their data storage, processing, and analysis tools based on their unique needs so that they can deliver successful data products, but their job is not to manage technical infrastructure. That job belongs to a centralized platform team, which is responsible for creating a domain-agnostic infrastructure platform that supports domain teams in creating their data products with low lead time. Automation capabilities are one of the key features of the platform.

Federated Data Governance

Data governance plays a crucial role in Data Mesh: in the distributed domain approach, it is very important to make sure that we don’t fall back into creating silos and duplicating data, or turn the enterprise data architecture into a wild west.

As a recap, data governance is a set of processes, policies, and guidelines that ensure the effective and secure management of an organization’s data assets. Instead of relying solely on a centralized data governance model, Data Mesh promotes a federated data governance approach. Federation means that a set of data governance guidelines and standards is defined and managed centrally in the organization, and each data domain team must comply with these rules. However, each domain is free to decide how it will comply with the governance rules in practice, taking into account domain-specific requirements.
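One way to picture the federated model is a centrally defined checklist that every domain’s data products are validated against, while each domain fills in the details in its own way. The sketch below is purely illustrative: the required fields, the example products, and the function name are assumptions and do not come from Data Mesh literature or from Fabric.

```python
# Central rule, owned by the organization: every data product must declare these.
REQUIRED_FIELDS = {"owner", "description", "classification"}


def missing_fields(product: dict[str, str]) -> set[str]:
    """Return the centrally required fields that a data product leaves empty."""
    return {name for name in REQUIRED_FIELDS if not product.get(name)}


# Each domain describes its products in its own way, within the common rules.
sales_orders = {
    "owner": "sales-team",
    "description": "Monthly orders per customer",
    "classification": "internal",
    "refresh": "daily",  # domain-specific extra, allowed by the central rules
}
hr_headcount = {"owner": "hr-team", "description": "Headcount per unit"}

sales_gaps = missing_fields(sales_orders)
hr_gaps = missing_fields(hr_headcount)
```

Here the sales product passes the central check, while the HR product is flagged for a missing classification; how each team fixes such gaps remains its own decision.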

It is important to make the distinction between dusty data silos and the decentralized data domains in the data mesh. Data silos refer to isolated data storage where data is stored within individual teams or departments in an organization. Each silo typically has its own data formats, definitions, and access controls, making it challenging to share and integrate data across different silos. This results in data duplication, inconsistencies, and limited data accessibility, hindering collaboration and a holistic view of data across the organization.

The key difference between data silos and decentralized data domains lies in their approach to data management and governance. Data silos isolate data within specific teams or departments, leading to fragmentation and limited data sharing. Decentralized data domains, by contrast, emphasize a culture of collaboration and follow standardized common practices across the organization, while keeping the autonomy to define their data products, data schemas, and access controls in the way that best supports their use cases.

Fabric & Data Mesh

OK, so now we know what a Data Mesh is, but how does it relate to Microsoft Fabric? It is important to remember that Data Mesh itself is not a technology and is not tied to any tech provider. It is an architectural paradigm that can be implemented in many ways, and multiple paths can lead to a Data Mesh. Currently, the Fabric Preview makes it possible to organize data into domains and thus supports the domain thinking of Data Mesh. In future releases, federated governance capabilities will be enabled. In general, Microsoft now presents Data Mesh as one approach in its data architecture guidance and gives implementation instructions for the technical side as well as for the conceptual and change management perspective. Data Mesh is here to stay as one enterprise data architecture.

Conclusion

Even though Fabric eases the technical requirements of Data Mesh, no tool is going to do the actual groundwork of setting up the working methods, defining and organizing the teams, and generally tuning the mindset of the organization to the Data Mesh frequency. Changing the way people work is never an easy task. An organization does not necessarily need to start building a Data Mesh from scratch. Maybe you have a solid existing implementation and only need to change how it is governed, managed, and developed, and by what kind of teams.

It is also important to remember that Data Mesh is not always the best approach, as it requires autonomous domain teams that work independently. The biggest benefits of Data Mesh are achieved in large and complex organizations with a rich data landscape. For smaller organizations, a single centralized team might be a better alternative from the team setup perspective.

But still, it is not a waste of time to understand the concepts of product thinking, general technical requirements of a data mesh platform or the importance of data governance.

June 5, 2023 | Blog Featured Insights Technology

Painting the future with Microsoft Fabric – data landscape in one frame

The data world is abuzz with excitement as Microsoft has launched a public preview of its latest offering, Microsoft Fabric. This so-called all-in-one analytics solution has generated significant market hype across the data community, promising to revolutionize and simplify data and analytics infrastructures and bring “data into the era of AI”. What does this all mean in practice? Take a minute and let us tell you what Fabric is all about.

Microsoft Fabric is a Software-as-a-Service (SaaS) solution that wraps all the different components of the data landscape together in one package. With one licence you get everything you need for your data environment: Data Factory, Synapse, Power BI and OneLake. You don’t need to buy the different resources separately anymore; it is all included in a single service and managed and governed centrally.

OneLake = centralized data storage for all your analytics data

OneLake is one of the most remarkable features of Fabric, as it aims to reduce the need for data duplication within the whole solution. If you have worked with data infrastructures, you probably know that data commonly needs to be duplicated across the layers of a data solution so that different analytical engines can support different use cases. In OneLake the data is stored in compressed Parquet format, and all the different analytical engines within Fabric can query the same data efficiently.

To put this in context, both the T-SQL engine for building a data warehouse and the Analysis Services engine for Power BI reports can use the same data equally efficiently. Microsoft promises to extend this “one copy of data” paradigm further by enabling shortcuts to the data, so that different teams can use the same data for their specific purposes by creating virtual data products. In addition, OneLake makes it possible to extend the lake to some third-party data stores, such as Amazon S3, without physically moving the data into OneLake. Quite impressive.

Introducing AI to empower developers

The other remarkable feature of Fabric is the inclusion of AI across the whole solution. This means introducing Copilot into all building blocks of Fabric to assist you in your work and increase your efficiency. For example, in the future you can ask Copilot to build an HR report for you in Power BI. It will be interesting to see how well this feature works. With Copilot, Microsoft aims to empower citizen developers to be a more integral part of the data development process and thus help organizations become even more data-driven. Most of the Copilot features are still in Private Preview though, so we all must wait a bit longer to get our hands on these cool new features.

More sustainable tomorrow through innovation in resource efficiency

At Evitec, we have already begun exploring the capabilities that Microsoft Fabric offers. Our own OneLake is already up and running, and we are well on our way to uncovering the possibilities of Fabric. While the service is still in preview, and some teething problems are to be expected, many of the features seem promising. We are truly impressed by its ability to eliminate the need for data duplication.

As the volume of data in the world continues to grow, so does the carbon footprint of data storage. As we strive towards a more sustainable tomorrow, it is important that data solutions are also designed to be as resource-efficient as possible. Here Fabric seems to make a clear difference by keeping only one copy of the data, provided of course that the processing of the data does not cancel out the benefits gained from reduced storage.

Time will tell whether Fabric can deliver on all the promises Microsoft has made for it, but if it does, we think that Fabric is a real game changer in the data field. Join us on the journey to unravel the potential of your data with Microsoft Fabric!

Written by

Henni Niiranen

Data Consultant