Solved

CDF Extractors

22 May 2023
11 replies
105 views

eashwar11
42 replies

I have to build two types of extractors:

1. A PI extractor that connects to the PI server, fetches the data, and ingests it into CDF.

2. A SharePoint Online extractor that extracts data from files stored in SharePoint Online and ingests it into CDF.

I would like to understand how such extractors should be structured. In particular, I want to account for scenarios where the PI server is unavailable and the PI extractor cannot fetch data from it. How should these situations be handled in the code when building the extractor? Also, how should monitoring be handled while the extractor is running? Are there sample code repositories I can reference to get a complete picture of how to build extractors?

Best answer by Dilini Fernando 22 June 2023, 08:52

eashwar11
Author
42 replies
22 May 2023

Thanks for your input, @mathialo. Could you please share monitoring guidelines and ways of handling the data stream if there are any hiccups in the PI server connectivity? How does the extractor resume from where it stopped or halted, and how should those scenarios be handled in the extractor scripts?

mathialo
Practitioner
16 replies
22 May 2023

The extractor will reconnect automatically on connectivity issues. It also keeps track of extraction state so that it can resume from where it left off if the extractor is shut down or crashes.
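For a custom extractor (for example, the SharePoint one), automatic reconnection is something you would have to write yourself. A minimal sketch of retrying with exponential backoff is below. It is not how the PI extractor is implemented; `fetch_with_retry` and its parameters are hypothetical names chosen for illustration, and `fetch` stands for whatever function reads from your source system.

```python
import time


def fetch_with_retry(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each ConnectionError."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                # Out of attempts: re-raise so the failure is visible to monitoring.
                raise
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

The `sleep` argument is injectable only so the function can be exercised without real waiting; in normal use you would leave it at its default.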

From the docs page:

If the extractor reruns after a period of downtime, it resumes the backfill task and starts a frontfill task to fill in the gap between when the extractor stopped and the current time. When the frontfill task has caught up, the extractor returns to streaming live data points.
The extractor maintains an extraction state for the time range between the first and last data point inserted into CDF. Only the streaming task can insert data points in CDF within this range. Any changes to historical values already existing in CDF will only be updated in CDF when the extractor is streaming data.
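To make the resume behaviour described above more concrete, here is a rough sketch of the idea: the state stores the first and last timestamps already ingested, and on restart the gap before `first` is backfilled while the gap after `last` is frontfilled. This only illustrates the concept from the docs and is not the PI extractor's actual code; `ExtractionState`, `plan_restart`, and `history_start` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionState:
    """Time range (epoch ms) already ingested; persist this between runs."""
    first: Optional[int] = None
    last: Optional[int] = None

    def expand(self, timestamp: int) -> None:
        """Widen the covered range to include a newly ingested data point."""
        self.first = timestamp if self.first is None else min(self.first, timestamp)
        self.last = timestamp if self.last is None else max(self.last, timestamp)


def plan_restart(state: ExtractionState, history_start: int, now: int) -> dict:
    """Work out which time ranges still need to be read after a restart."""
    if state.first is None or state.last is None:
        # Nothing ingested yet: everything up to now is history to backfill.
        return {"backfill": (history_start, now), "frontfill": None}
    backfill = (history_start, state.first) if state.first > history_start else None
    frontfill = (state.last, now) if now > state.last else None
    return {"backfill": backfill, "frontfill": frontfill}
```

For example, with data already ingested between timestamps 100 and 200, calling `plan_restart(state, 0, 300)` asks for a backfill of (0, 100) and a frontfill of (200, 300).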

eashwar11
Author
42 replies
22 May 2023

Thanks for the input, @mathialo. So I don't need to invest in any custom coding when using prebuilt extractors; I just need to configure them with the right parameters, and the data extraction can be accomplished.

mathialo
Practitioner
16 replies
22 May 2023

Correct, for something like OSIsoft PI, it should be relatively plug-and-play. When you download the extractor from CDF, you also get an example configuration file that you can use as a starting point for your own setup.

eashwar11
Author
42 replies
23 May 2023

Thanks, @mathialo. I have two environments, DEV and PROD. Should I manually repeat the same extraction setup steps separately for PROD? Or can I set up extraction pipelines using GitHub Actions so that these steps are automated rather than performed manually in each environment?

roman.chesnokov
Seasoned Practitioner
49 replies
23 May 2023

Hey @eashwar11, for the PI extractor, you need a Windows Server machine. You can then install and configure separate extractor instances on it, one for DEV and one for PROD. Extractors usually run continuously. If you’re going to use a cloud VM, you can, for example, configure GitHub Actions to update the config files and restart the services; the details depend on the particular cloud provider.
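One way to keep DEV and PROD in sync is to generate each environment's config file from a single template, which a CI job could then copy to the VM before restarting the service. The sketch below only shows the rendering step. The template keys (`target_project`, `source_host`) and all values are made-up placeholders, not the PI extractor's real configuration schema; start from the example configuration file that ships with the extractor.

```python
from string import Template

# Hypothetical template: the keys are placeholders, not the real extractor schema.
CONFIG_TEMPLATE = Template(
    "target_project: ${target_project}\n"
    "source_host: ${source_host}\n"
)

# Hypothetical per-environment values; in CI these would usually come from secrets.
ENVIRONMENTS = {
    "dev": {"target_project": "my-project-dev", "source_host": "pi-dev.example.invalid"},
    "prod": {"target_project": "my-project-prod", "source_host": "pi-prod.example.invalid"},
}


def render_config(env: str) -> str:
    """Fill the shared template with the values for one environment."""
    # substitute() raises KeyError if a placeholder has no value, so an
    # incomplete environment definition fails early instead of being deployed.
    return CONFIG_TEMPLATE.substitute(ENVIRONMENTS[env])
```

Calling `render_config("dev")` and `render_config("prod")` then yields two config texts that differ only in the environment-specific values.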

eashwar11
Author
42 replies
23 May 2023

Thanks, @roman.chesnokov. Could you please share any reference material that provides the full technical details (using GH Actions, etc.)? I find that the main documentation site covers these details only in a rather abstract fashion.

I am looking for something that has the necessary details like in the boot-camp documentation.

roman.chesnokov
Seasoned Practitioner
49 replies
23 May 2023

@eashwar11 It strongly depends on the particular architecture and the use case, which is probably why there are no specific automation examples in the docs. Basically, you can just install a few extractor instances on a VM manually and leave them running; that’s how it’s done in many cases.