It is our intent that Power BI be used as a real-time visualization tool for many use cases, including:
- Analytics on Compressor Efficiency
- Real-Time Monitoring of Natural Gas Consumption in our Methanol Unit
- Work Order Wrench Time Optimization
- Shift Schedule and OT Optimization
The problem with the Power BI connector is that it currently treats CDF as a data source and imports data on a scheduled basis. There is no federated connection available in which data is neither imported nor duplicated, and Microsoft has confirmed this limitation lies with Cognite, not on their side. The current Power BI connector forces an import and duplicates data in the Power BI space. This method goes by two names, Import Data and Scheduled Refresh, and both describe its behavior: data from the source is loaded into Power BI, and loading into Power BI means consuming memory and disk space. While you develop on your machine with Power BI Desktop, that is the memory and disk space of your machine; once you publish the report to the Power BI service, it is the memory and disk space of the Power BI cloud machines on Azure.
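To make the import behavior concrete, here is a minimal sketch of what a scheduled refresh effectively does, written with the Cognite Python SDK (cognite-sdk). The external ID and file name are hypothetical, and authentication setup is assumed to be configured already; this is an illustration of the pattern, not the connector's actual implementation.

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes authentication is already configured

# On every scheduled refresh, the whole window is re-extracted from CDF
# and written into Power BI's own storage; this copy is the duplication.
snapshot = client.time_series.data.retrieve(
    external_id="methanol:gas-consumption",  # hypothetical external ID
    start="30d-ago",
    end="now",
).to_pandas()

snapshot.to_parquet("imported_copy.parquet")  # stand-in for the imported model
```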
If you have one million rows in a source table and load it into Power BI with no filtering, you end up with the same number of rows in Power BI. If you have a database with 1,000 tables but load only 10 of them into Power BI, you consume memory for only those 10 tables. The bottom line is that you spend memory and disk space in proportion to how much data you load into Power BI. The import method also imposes limits on how much data can be imported and how frequently it can be refreshed, so it not only constrains us at scale but also fails to provide data in real time.
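As a rough illustration of that proportionality (ignoring Power BI's columnar compression, which shrinks the actual footprint), here is a quick pandas estimate of what one million imported rows occupy in memory:

```python
import numpy as np
import pandas as pd

rows = 1_000_000
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=rows, freq="s"),
    "value": np.random.rand(rows),
})

# Two 8-byte columns over 1M rows: roughly 16 MB before any compression.
print(f"{df.memory_usage(deep=True).sum() / 1e6:.1f} MB")
```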
A Power BI live connection is very similar to DirectQuery in how it works with the data source. It does not store data in Power BI; instead, Power BI acts as a visualization layer and queries the data source every time. Power BI stores only the metadata of tables (table names, column names, relationships, and so on), not the data itself. The Power BI file is therefore much smaller, and you will most likely never hit the size limitation, because no data is stored in the model.
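The contrast with the import sketch above, in the same hypothetical Python terms: each visual refresh issues a fresh query against CDF, and nothing is persisted locally.

```python
import pandas as pd
from cognite.client import CogniteClient

client = CogniteClient()  # assumes authentication is already configured

def fetch_for_visual(external_id: str, start: str, end: str) -> pd.DataFrame:
    """Query CDF at render time; results pass through, nothing is cached."""
    return client.time_series.data.retrieve(
        external_id=external_id, start=start, end=end
    ).to_pandas()

# Every refresh hits the source directly instead of an imported copy.
latest = fetch_for_visual("compressor:efficiency", "1h-ago", "now")  # hypothetical ID
```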
Because these data sources are modeling engines themselves, Power BI only connects to them and fetches the model metadata (measure names, attribute names, relationships, and so on). With this method, you handle all your modeling requirements in the data source, which in our case is CDF, and Power BI simply surfaces that data through visualization.
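For example, shaping the data in CDF rather than in Power BI could mean requesting server-side aggregates, so the visualization layer receives already-modeled results. A sketch under the same assumptions as above (cognite-sdk, hypothetical external ID, authentication preconfigured):

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes authentication is already configured

# The modeling (hourly averages) happens inside CDF; Power BI would only
# surface the returned, already-aggregated series.
hourly = client.time_series.data.retrieve(
    external_id="compressor:efficiency",  # hypothetical external ID
    start="7d-ago",
    end="now",
    aggregates=["average"],
    granularity="1h",
).to_pandas()
```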
I hope this makes sense.