
In the two earlier posts of the Incremental Refresh in Power BI series, we learned what incremental refresh is, how to implement it, and best practices for safely publishing semantic model changes to Microsoft Fabric (aka Power BI Service). This post focuses on a few more best practices for implementing incremental refresh on large semantic models in Power BI.
Note
Since May 2023, when Microsoft announced Microsoft Fabric for the first time, Power BI has been part of Microsoft Fabric. Hence, we use the term Microsoft Fabric throughout this post to refer to Power BI or the Power BI Service.
Implementing incremental refresh in Power BI is usually straightforward if we carefully follow the implementation steps. However, in some real-world scenarios, following the implementation steps is not enough. In various parts of my latest book, Expert Data Modeling with Power BI, 2nd Edition, I emphasise that understanding business requirements is the key to every single development project, and data modelling is no different. Let me explain it more in the context of incremental data refresh implementation.
Let's say we followed all the required implementation steps, we also followed the deployment best practices, and everything runs pretty well in our development environment; the first data refresh takes longer, as we anticipated, all the partitions are created, and everything looks fine. So, we deploy the solution to the production environment and refresh the semantic model. Our production data source holds significantly more data than the development data source, so the data refresh takes way too long. We wait a couple of hours and leave it to run overnight. The next day we find out that the first refresh failed. Some of the possibilities that lead the first data refresh to fail are Timeout, Out of resources, or Out of memory errors. This can happen regardless of your licensing plan, even on Power BI Premium capacities.
Another issue you may face usually happens during development. Many development teams try to keep their development data source's size as close as possible to their production data source. And… NO, I am NOT suggesting using the production data source for development. Anyway, you may be tempted to do so. You set one month's worth of data using the RangeStart and RangeEnd parameters, just to find out that the data source actually has hundreds of millions of rows in a month. Now the PBIX file is way too large, so you cannot even save it on your local machine.
This post provides some best practices. Some of the practices this post focuses on require implementation. To keep this post at an optimal length, I save the implementations for future posts. With that in mind, let's begin.
So far, we have scratched the surface of some common challenges that we may face if we do not pay attention to the requirements and the size of the data being loaded into the data model. The good news is that this post explores a couple of good practices to guarantee smoother and more controlled implementations, avoiding data refresh issues as much as possible. Indeed, there might still be cases where we follow all best practices and still face challenges.
Note
While incremental refresh is available in Power BI Pro semantic models, the restrictions on parallelism and the lack of an XMLA endpoint might be a deal breaker in many scenarios. So many of the techniques and best practices discussed in this post require a premium semantic model backed by either Premium Per User (PPU), Power BI Capacity (P/A/EM) or Fabric Capacity.
The next few sections explain some best practices to mitigate the risks of facing difficult challenges down the road.
Practice 1: Investigate the data source in terms of its complexity and size
This one sounds easy; well, not really. It is essential to know what sort of beast we are dealing with. If you have access to the pre-production data source, or to production, it is good to know how much data will be loaded into the semantic model. Let's say the source table contains 400 million rows of data for the past 2 years. Quick math suggests that on average we will have more than 16 million rows per month. While these are just hypothetical numbers, you may have even larger data sources. So having some estimate of the data source's size and growth is always helpful for taking the next steps more thoroughly.
Practice 2: Keep the date range between the RangeStart and RangeEnd parameters small
Continuing from the previous practice, if we deal with fairly large data sources, then waiting for millions of rows to be loaded into the data model at development time does not make much sense. So, depending on the numbers you get from the previous point, pick a date range that is small enough to let you easily continue your development without needing to wait a long time to load the data into the model with every single change in the Power Query layer. Remember, the date range chosen between the RangeStart and RangeEnd parameters does NOT affect the creation of the partitions on Microsoft Fabric after publishing. So there would not be any issues if you chose the values of RangeStart and RangeEnd to be on the same day, or even at the very same time. One important point to remember is that we cannot change the values of the RangeStart and RangeEnd parameters after publishing the model to Microsoft Fabric.
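For reference, a typical incremental refresh filter step in Power Query looks like the following sketch. The server, database, table, and column names here are assumptions for illustration; only the RangeStart and RangeEnd parameter names are fixed by Power BI:

```m
// Hypothetical example: filtering a FactSales table on an OrderDate column
// using the reserved RangeStart and RangeEnd parameters. Keeping the two
// parameter values close together (even on the same day) keeps the local
// model small. Note the filter uses >= on one side and < on the other,
// so no row can fall into two partitions.
let
    Source = Sql.Database("MyServer", "MyDatabase"),   // assumed source
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    FilteredRows = Table.SelectRows(
        FactSales,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    FilteredRows
```

Because `Table.SelectRows` over a SQL source folds, this filter is pushed back to the data source rather than being applied after a full load.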
Practice 3: Be mindful of the degree of parallelism
As mentioned before, one of the common challenges arises after the semantic model is published to Microsoft Fabric and is refreshed for the first time. It is not uncommon, when refreshing large semantic models, for the first refresh to time out and fail. There are a couple of possibilities causing the failure. Before we dig deeper, let's take a moment to remind ourselves of what really happens behind the scenes on Microsoft Fabric when a semantic model containing a table with an incremental refresh configuration refreshes for the first time. For your reference, this post explains everything in more detail.
What happens in Microsoft Fabric to semantic models containing tables with incremental refresh configuration?
When we publish a semantic model from Power BI Desktop to Microsoft Fabric, each table in the published semantic model has a single partition. That partition contains all rows of the table that are also present in the data model in Power BI Desktop. When the first refresh operation runs, Microsoft Fabric creates data partitions, categorised as incremental and historical partitions, and optionally a real-time DirectQuery partition, based on the incremental refresh policy configuration. When the real-time DirectQuery partition is configured, the table is a Hybrid table. I will discuss Hybrid tables in a future post.
Microsoft Fabric starts loading the data from the data source into the semantic model in parallel jobs. We can control the parallelism from Power BI Desktop, from Options -> CURRENT FILE -> Data Load -> Parallel loading of tables. This configuration controls the number of tables or partitions that will be processed in parallel jobs. It affects the parallelism of the current file in Power BI Desktop while loading the data into the local data model, and it also influences the parallelism of the semantic model after publishing it to Microsoft Fabric.

As the preceding image shows, I increased the Maximum number of concurrent jobs to 12.
The following image shows refreshing the semantic model with 12 concurrent jobs on a Premium workspace on Microsoft Fabric:

The default is 6 concurrent jobs, meaning that when we refresh the model in Power BI Desktop, or after publishing it to Microsoft Fabric, the refresh process picks 6 tables, or 6 partitions, to run in parallel.
The following image shows refreshing the semantic model with the default number of concurrent jobs on a Premium workspace on Microsoft Fabric:

Tip
I used the Analyse my Refresh tool to visualise my semantic model refreshes. A big shout out to the legendary Phil Seamark for creating such an amazing tool. Read more about how to use the tool on Phil's blog.
We can also change the Maximum number of concurrent jobs from third-party tools such as Tabular Editor; thanks to the amazing Daniel Otykier for creating this excellent tool. Tabular Editor uses the SSAS Tabular model property called MaxParallelism, which is shown as Max Parallelism Per Refresh in the tool (see the image below from Tabular Editor 3).

While loading the data in parallel might improve performance, depending on the data volume being loaded into each partition, the concurrent query limitations on the data source, and the resource availability in your capacity, there is still a risk of getting timeouts. So, as much as increasing the Maximum number of concurrent jobs is tempting, it is advised to change it with care. It is also worth mentioning that the behaviour of Power BI Desktop in refreshing the data is different from Microsoft Fabric's semantic model data refresh activity. Therefore, while changing the Maximum number of concurrent jobs may influence the engine behind Microsoft Fabric's semantic model refresh, it does not guarantee better performance. I encourage you to read Chris Webb's blog on this matter.
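Parallelism can also be controlled per refresh operation rather than at the model level: the TMSL Sequence command accepts a maxParallelism property. The following is a sketch of such a script, run against the XMLA endpoint (for example from SSMS); the database and table names are hypothetical:

```json
{
  "sequence": {
    "maxParallelism": 6,
    "operations": [
      {
        "refresh": {
          "type": "full",
          "objects": [
            { "database": "MySemanticModel", "table": "FactSales" }
          ]
        }
      }
    ]
  }
}
```

This is handy when you want a one-off, carefully throttled first refresh without permanently changing the model's MaxParallelism property.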
Practice 4: Consider applying incremental refresh policies without partition refresh on premium semantic models
When working with large premium semantic models, implementing incremental refresh policies is a key strategy to manage and optimise data refreshes efficiently. However, there might be scenarios where we need to apply the incremental refresh policy to our semantic model without immediately refreshing the data within the partitions. This practice is particularly helpful for controlling the heavy lifting of the initial data refresh. By doing so, we ensure that our model is ready and aligned with our incremental refresh strategy, without triggering a time-consuming and resource-intensive data load.
There are a couple of ways to achieve this. The simplest is to use Tabular Editor to apply the incremental refresh policy, meaning that all partitions are created but not processed. The following image shows the preceding process:

The other method, which some developers might find helpful, especially if you are not allowed to use third-party tools such as Tabular Editor, is to add a new query parameter in the Power Query Editor in Power BI Desktop to control the data refreshes. This method ensures that the first refresh of the semantic model after publishing it to Microsoft Fabric will be fairly quick, without using any third-party tools. This means Microsoft Fabric creates and refreshes (aka processes) the partitions, but since there is no data to load, the processing will be fairly fast.
The implementation of this method is simple; we define a new query parameter and use it to filter out all data from the table containing the incremental refresh configuration. Of course, we want this filter to fold so that the entire query on the Power Query side remains fully foldable. After we publish the semantic model to Microsoft Fabric, we run the initial refresh. Since the new query parameter is accessible via the semantic model's settings on Microsoft Fabric, we change its value after the initial data refresh so the data is loaded when the next data refresh takes place.
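A minimal sketch of the idea follows. The text parameter name LoadData, along with the source, table, and column names, are assumptions for illustration, not the author's exact implementation:

```m
// Sketch: a hypothetical text parameter named LoadData ("Yes"/"No") gates
// the incremental refresh filter. While LoadData = "No", the filter
// returns no rows, so the first refresh on Microsoft Fabric creates and
// processes all partitions almost instantly. Because the whole expression
// folds, no data is transferred from the source.
let
    Source = Sql.Database("MyServer", "MyDatabase"),   // assumed source
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    FilteredRows = Table.SelectRows(
        FactSales,
        each [OrderDate] >= RangeStart
            and [OrderDate] < RangeEnd
            and LoadData = "Yes"   // no rows pass while LoadData = "No"
    )
in
    FilteredRows
```

After the initial refresh, flipping LoadData to "Yes" in the semantic model's settings lets subsequent refreshes load data normally.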
It is important to note that changing the parameter's value after the initial data refresh will not populate the historical range. This means that when the next refresh happens, Microsoft Fabric assumes the historical partitions are already refreshed and ignores them. Therefore, after the initial refresh the historical partitions remain empty, while the incremental partitions will be populated. To refresh the historical partitions, we need to refresh them manually via the XMLA endpoint, which can be done using SSMS or Tabular Editor.
Explaining the implementation of this method would make this blog very long, so I will save it for a separate post. Stay tuned if you are interested in learning how to implement this method.
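Manually refreshing the historical partitions through the XMLA endpoint can be scripted with a TMSL refresh command, for example from SSMS. This is a sketch; the database, table, and partition names are assumptions, and the actual partition names are generated by the refresh policy, so check them under the table's partitions in SSMS first:

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "MySemanticModel",
        "table": "FactSales",
        "partition": "2022"
      },
      {
        "database": "MySemanticModel",
        "table": "FactSales",
        "partition": "2023"
      }
    ]
  }
}
```

Refreshing a few historical partitions per run, rather than all of them at once, keeps each operation within capacity and timeout limits.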
Practice 5: Validate your partitioning strategy before implementation
Partitioning strategy refers to planning how the data is going to be divided into partitions to match the business requirements. For example, let's say we need to analyse the data for 10 years. As the volume of data to be loaded into the table is large, it does not make sense to truncate the table and fully refresh it every night. During the discovery workshops, you found out that the data changes daily and that it is highly unlikely for the data to change after 7 days.
In the preceding scenario, the historical range is 10 years and the incremental range is 7 days. As there are no indications of any real-time data change requirements, there is no need to keep the incremental range in DirectQuery mode, which would turn our table into a hybrid table.
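Under the hood, a 10-year/7-day policy is stored on the table as a refreshPolicy object in the model's TMSL definition. The following is a sketch of what that fragment might look like; the real definition also carries a sourceExpression property holding the table's M query template, omitted here for brevity:

```json
"refreshPolicy": {
  "policyType": "basic",
  "rollingWindowGranularity": "year",
  "rollingWindowPeriods": 10,
  "incrementalGranularity": "day",
  "incrementalPeriods": 7
}
```

Seeing the policy in this form makes it clear why changing the strategy later is expensive: altering these properties redefines the partition scheme itself.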
The incremental refresh policy for this scenario should look like the following image:

So, after publishing the semantic model to Microsoft Fabric and running the first refresh, the engine only refreshes the last 7 partitions on subsequent refreshes, as shown in the following image:

Deciding on the incremental refresh policy is a strategic decision. An inaccurate understanding of the business requirements leads to an inaccurate partitioning strategy, hence an inefficient incremental refresh, which may have serious side effects down the road. This is one of those cases that can lead to erasing the existing partitions, creating new partitions, and refreshing them all over again. As you can see, a simple mistake in our partitioning strategy leads to an incorrect implementation, which leads to a change in the partitioning policy, which means a full data load will be required.
While understanding the business requirements during the discovery workshops is essential, we all know that business requirements evolve from time to time; and in reality, the pace of change is often quite high.
For example, what happens if a new business requirement comes up involving real-time data processing for the incremental range, aka a hybrid table? While it might sound like a simple change in the incremental refresh configuration, in reality it is not that simple. To explain more: to get the best out of a hybrid table implementation, we should switch the storage mode of all the dimensions related to the hybrid table to Dual mode. But that is not a simple process either if the existing dimensions' storage mode is already set to Import. We cannot switch the storage mode of tables from Import to either Dual or DirectQuery modes. This means we have to remove and add those tables again, which in real-world scenarios is not that easy. As mentioned before, I will write another post about hybrid tables in the future, so you may consider subscribing to my blog to get notified of all new posts.
Practice 6: Consider using Detect data changes for more efficient data refreshes
Let's explain this section using our previous example, where we configured the incremental refresh to archive 10 years of data and incrementally refresh 7 days of data. This means Power BI is configured to refresh only a subset of the data, specifically the data from the last 7 days, rather than the entire semantic model. The default refresh mechanism in Power BI for tables with an incremental refresh configuration is to keep all the historical partitions intact, truncate the incremental partitions, and reload them. However, in scenarios dealing with large semantic models, the incremental partitions could be fairly large, so the default truncate-and-load of the incremental partitions would not be an optimal approach. Here is where the Detect data changes feature can help. Configuring this feature in the incremental refresh policy requires an extra DateTime column, such as LastUpdated, in the data source, which Power BI uses to first detect the data changes, then refresh only the specific partitions that have changed since the previous refresh, instead of truncating and reloading all incremental partitions. Therefore, the refreshes potentially process smaller amounts of data, utilising fewer resources compared to the regular incremental refresh configuration. The column used for detecting data changes must be different from the one used to partition the data with the RangeStart and RangeEnd parameters. Power BI takes the maximum value of the Detect data changes column to identify the changes since the previous refresh, stores it in the refreshBookmark property of the partitions within the incremental range, and refreshes only the changed partitions.
While Detect data changes can improve data refresh performance, we can enhance it even further. One possible enhancement is to avoid importing the LastUpdated column into the semantic model, as it is likely to be a high-cardinality column. One option is to create a new query in the Power Query Editor in Power BI Desktop that identifies the maximum date within the date range filtered by the RangeStart and RangeEnd parameters. We then use this query in the pollingExpression property of our refresh policy. This can be done in various ways, such as running TMSL scripts via the XMLA endpoint or using Tabular Editor. I will also explain this method in more detail in a future post, so stay tuned.
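A sketch of such a polling query follows, with assumed source, table, and column names. It returns the maximum LastUpdated value within the current partition's RangeStart/RangeEnd window, so the high-cardinality column itself never needs to be imported into the model:

```m
// Sketch of a custom polling expression for Detect data changes. The
// scalar result is what the engine stores in each incremental partition's
// refreshBookmark and compares on the next refresh.
let
    Source = Sql.Database("MyServer", "MyDatabase"),   // assumed source
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    PartitionRows = Table.SelectRows(
        FactSales,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    ),
    MaxLastUpdated = List.Max(PartitionRows[LastUpdated])
in
    MaxLastUpdated
```

An expression like this would then be assigned to the pollingExpression property of the table's refresh policy via the XMLA endpoint or Tabular Editor, as mentioned above.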
This post of the Incremental Refresh in Power BI series delved into some best practices for implementing incremental refresh strategies, particularly for large semantic models, and underscored the importance of aligning these strategies with business requirements and data complexities. We navigated through common challenges and offered practical best practices to mitigate risks, improve performance, and ensure smoother data refresh processes. I have a couple more blogs from this series in my pipeline, so stay tuned for those and subscribe to my blog to get notified when I publish a new post. I hope you enjoyed reading this long blog and find it helpful.
As always, feel free to leave your comments and ask questions, and follow me on LinkedIn and @_SoheilBakhshi on X (formerly Twitter).