CDF Extractors


I have to build two types of extractors:

1. PI extractor: connect to the PI server, fetch the data, and ingest it into CDF.

2. SharePoint Online extractor: extract data from files stored in SharePoint Online and ingest it into CDF.


I would like to understand how extractors are typically structured. In particular, I want to account for scenarios where the PI server is unavailable and the PI extractor cannot fetch the data. How should these situations be handled in the extractor code? Also, how should monitoring be handled while the extractor is running? Are there sample code repositories I can reference to get a complete picture of how to build extractors?



9 replies

We have pre-built extractors for both of these systems. You can read about them on our docs page.

The extractors themselves can be downloaded from CDF after you have logged in.


Thanks @mathialo for your input. Could you please share monitoring guidelines and explain how the data stream is handled if there are hiccups in PI server connectivity? How does the extractor resume from where it stopped, and how should those scenarios be handled in the extractor scripts?


The extractor reconnects automatically on connectivity issues. It also keeps track of its extraction state, so it can resume from where it left off if it is shut down or crashes.
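For anyone writing a custom extractor rather than using the pre-built one, the two behaviours described here (retry on connection loss, resume from a saved checkpoint) could be sketched roughly as follows. This is only an illustration and not how the PI extractor is implemented: `fetch`, `ingest`, the JSON state file, and all parameter names are hypothetical placeholders for your own source and CDF calls.

```python
import json
import time
from pathlib import Path


def load_state(path):
    """Return the last persisted timestamp, or None on the first run."""
    p = Path(path)
    if not p.exists():
        return None
    return json.loads(p.read_text())["last_timestamp"]


def save_state(path, last_timestamp):
    """Write the checkpoint to a temp file, then rename it over the old one."""
    p = Path(path)
    tmp = p.with_name(p.name + ".tmp")
    tmp.write_text(json.dumps({"last_timestamp": last_timestamp}))
    tmp.replace(p)


def fetch_with_retry(fetch, start, max_attempts=5, base_delay=1.0,
                     max_delay=60.0, sleep=time.sleep):
    """Call fetch(start), retrying with exponential backoff on ConnectionError."""
    for attempt in range(max_attempts):
        try:
            return fetch(start)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(min(max_delay, base_delay * 2 ** attempt))


def run_once(fetch, ingest, state_path, sleep=time.sleep):
    """One cycle: resume from the checkpoint, fetch, ingest, then save."""
    start = load_state(state_path)
    points = fetch_with_retry(fetch, start, sleep=sleep)
    if points:
        ingest(points)
        # Save only after a successful ingest, so a crash never skips data.
        save_state(state_path, max(ts for ts, _ in points))
    return len(points)
```

Here `fetch(start)` is assumed to return a list of `(timestamp, value)` pairs newer than `start`. Saving the checkpoint only after ingestion means a crash can cause re-ingestion of a batch but not a gap.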


From the docs page:

If the extractor reruns after a period of downtime, it resumes the backfill task and starts a frontfill task to fill in the gap between when the extractor stopped and the current time. When the frontfill task has caught up, the extractor returns to streaming live data points.

The extractor maintains an extraction state for the time range between the first and last data point inserted into CDF. Only the streaming task can insert data points in CDF within this range. Any changes to historical values already existing in CDF will only be updated in CDF when the extractor is streaming data.
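To make the quoted behaviour concrete, here is a simplified model of how the stored state range could determine what to fetch after a restart. It is a sketch of the description above, not the extractor's actual code: `ExtractionState`, `plan_tasks`, `history_start`, and the treatment of a missing state are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExtractionState:
    first: int  # timestamp of the earliest data point already inserted
    last: int   # timestamp of the latest data point already inserted


def plan_tasks(state: Optional[ExtractionState], now: int,
               history_start: int) -> List[Tuple[str, int, int]]:
    """Return (task, start, end) ranges to fetch when the extractor starts."""
    if state is None:
        # Assumption: with no stored state, all history counts as backfill.
        return [("backfill", history_start, now)]
    tasks = []
    if state.first > history_start:
        # Older history that was not fetched before the shutdown.
        tasks.append(("backfill", history_start, state.first))
    if state.last < now:
        # The gap between when the extractor stopped and the current time.
        tasks.append(("frontfill", state.last, now))
    return tasks


# Stopped with data for 100..200 in CDF, restarted at time 300.
print(plan_tasks(ExtractionState(first=100, last=200), now=300, history_start=0))
```

Neither task touches the range between `first` and `last`, which matches the statement that only the streaming task inserts data points inside that range.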


Thanks for the input, @mathialo. So I don't need to invest in any custom coding when using the pre-built extractors. I just need to configure them with the right parameters, and the data extraction will work.

Correct. For something like OSIsoft PI, it should be relatively plug-and-play. When you download the extractor from CDF, you are also given an example configuration file that you can use as a starting point for your own setup.


Thanks @mathialo. I have two environments, DEV and PROD. Do I have to repeat the same extraction setup steps manually for PROD? Or can I set up extraction pipelines using GitHub Actions, so that these steps are automated instead of being done manually in each environment?



Hey @eashwar11, for the PI extractor you need a Windows Server machine. On it you can install and configure several extractor instances, for example one for DEV and one for PROD. Extractors usually run continuously. If you’re going to use a cloud VM, you can configure GitHub Actions to update the config files and restart the services, for example; the details depend on the particular cloud provider.
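As one possible shape for the "update the config files" step, a CI job could render one config file per environment from a shared template. The sketch below is purely illustrative: the YAML keys, project names, and host names are hypothetical, and the real keys should be taken from the example configuration file shipped with the extractor.

```python
from string import Template

# Hypothetical layout; the extractor's example config defines the actual keys.
CONFIG_TEMPLATE = Template("""\
target:
  project: $project
source:
  host: $pi_host
""")

# Hypothetical per-environment values.
ENVIRONMENTS = {
    "dev": {"project": "my-project-dev", "pi_host": "pi-dev.example.com"},
    "prod": {"project": "my-project-prod", "pi_host": "pi-prod.example.com"},
}


def render_config(env: str) -> str:
    """Fill the shared template with the values for one environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return CONFIG_TEMPLATE.substitute(ENVIRONMENTS[env])


if __name__ == "__main__":
    for name in ENVIRONMENTS:
        print(f"--- {name} ---")
        print(render_config(name))
```

Keeping a single template means DEV and PROD differ only in the values listed in `ENVIRONMENTS`, which makes the difference between the two setups easy to review.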


Thanks @roman.chesnokov. Could you please share any reference material that covers the technical details (using GitHub Actions, etc.)? The main documentation site describes this only at a fairly abstract level.

I am looking for something that has the necessary details like in the boot-camp documentation. 


@eashwar11 It strongly depends on the particular architecture and use case. That’s probably why there are no specific automation examples in the docs. Basically, you can just install a few extractor instances on a VM manually and leave them running; that’s how it’s done in many cases.