Prerequisites
Administrative permissions are required for both the Azure Active Directory instance, where the app registration is hosted, and the SharePoint Server instance to which the extractor will connect. Make sure this access is in place before you start.
NOTE: The SharePoint Server instance may be hosted by the client and inaccessible to Cognite. In that case, work with a member of the client team to complete the steps that require admin access to SharePoint.
Azure Active Directory
First, create a new app registration associated with the project and generate a client secret. Navigate to the “API Permissions” tab and add permissions for the Microsoft Graph API with the following access:
- Sites.FullControl.All - Delegated
- Sites.Read.All - Application
- User.Read - Delegated
- (Other permissions) Sites.FullControl.All - Application
SharePoint Settings
- Warning: This section requires administrative privileges. Contact the owner of the SharePoint Site to gain access.
- Note: If you are using the free developer version of SharePoint Server, insert “-admin” after the site name in “site.sharepoint.com” (e.g. “site-admin.sharepoint.com”).
Setting up an app-only principal with tenant permissions
Navigate to a site in your tenant (e.g. https://contoso.sharepoint.com) and then open the appregnew.aspx page (e.g. https://contoso.sharepoint.com/_layouts/15/appregnew.aspx). On this page, click the Generate button to generate a client ID and client secret, and fill in the remaining information as shown in the image below.
The next step is to grant permissions to the newly created principal. Because these are tenant-scoped permissions, they can only be granted via the appinv.aspx page on the tenant administration site. You can reach this page at https://contoso-admin.sharepoint.com/_layouts/15/appinv.aspx. Once the page has loaded, enter your client ID and look up the created principal:
To grant permissions, you need to provide the permission XML that describes the required permissions. Because this application must be able to access all sites and also uses search with app-only access, it needs the permissions below.
<AppPermissionRequests AllowAppOnlyPolicy="true">
  <AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
</AppPermissionRequests>
When you click on “Create” you'll be presented with a permission consent dialog. Press Trust It to grant the permissions.
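The two page URLs used in this section follow a fixed pattern, so they can be derived from the tenant name. The sketch below is only an illustration of that pattern: the function name `tenant_admin_pages` and the tenant name `contoso` are assumptions taken from the example URLs above, not part of any SharePoint tooling.

```python
def tenant_admin_pages(tenant):
    """Return the app registration and app permission page URLs for a tenant.

    `tenant` is the short tenant name, e.g. "contoso" (hypothetical example).
    """
    site = f"https://{tenant}.sharepoint.com"
    # The permission page lives on the tenant administration site ("-admin").
    admin_site = f"https://{tenant}-admin.sharepoint.com"
    return {
        "appregnew": f"{site}/_layouts/15/appregnew.aspx",
        "appinv": f"{admin_site}/_layouts/15/appinv.aspx",
    }


pages = tenant_admin_pages("contoso")
print(pages["appregnew"])
print(pages["appinv"])
```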
Linking Azure App to SharePoint Site
- Getting the Site-ID
Append “/_api/site/id” to the base URL of the SharePoint site. For example, if your SharePoint base URL is https://cognite.sharepoint.com/sites/CogniteExampleDocSite, the resulting URL is https://cognite.sharepoint.com/sites/CogniteExampleDocSite/_api/site/id. This will return the Site-ID as highlighted in the box below.
- Getting the web-ID
Similarly, append “/_api/web/id” to the base URL of the SharePoint site. For example, if your SharePoint base URL is https://cognite.sharepoint.com/sites/CogniteExampleDocSite, the resulting URL is https://cognite.sharepoint.com/sites/CogniteExampleDocSite/_api/web/id. This will return the Web-ID as highlighted in the box below.
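As a minimal sketch, the two lookup URLs can be built from the site's base URL as follows. The helper name `id_endpoints` is hypothetical, and the base URL is the example used above; the snippet only constructs the URLs and does not call SharePoint.

```python
def id_endpoints(site_url):
    """Build the Site-ID and Web-ID lookup URLs for a SharePoint site."""
    # Drop a trailing slash so the result never contains "//_api".
    base = site_url.rstrip("/")
    return {
        "site_id": f"{base}/_api/site/id",
        "web_id": f"{base}/_api/web/id",
    }


urls = id_endpoints("https://cognite.sharepoint.com/sites/CogniteExampleDocSite")
print(urls["site_id"])
print(urls["web_id"])
```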
In the next steps, we will use the Microsoft Graph API tool. Using the tool, start a GET request with the following URL:
https://graph.microsoft.com/v1.0/sites/<SITE>,<Site-ID>,<Web-ID>/permissions
- <SITE> : SharePoint site hostname. E.g.: cognite.sharepoint.com
- <Site-ID> : Site-ID retrieved from step 1 (Getting the Site-ID).
- <Web-ID> : Web-ID retrieved from step 2 (Getting the web-ID).
Then the final URL should look something like this: https://graph.microsoft.com/v1.0/sites/8g1fsr.sharepoint.com,84d271dd-1f22-4ebe-829d-45019865c2a7,c30567ee-553d-49a7-ad4b-72b404d15bab/permissions
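The URL is the site hostname, the Site-ID, and the Web-ID joined by commas. A small sketch of that assembly, assuming a hypothetical helper `graph_permissions_url` that takes the full hostname (such as 8g1fsr.sharepoint.com) and the two IDs:

```python
GRAPH_SITES = "https://graph.microsoft.com/v1.0/sites"


def graph_permissions_url(hostname, site_id, web_id):
    """Assemble the Graph permissions URL from hostname, Site-ID and Web-ID."""
    # The composite site identifier is "<hostname>,<Site-ID>,<Web-ID>".
    return f"{GRAPH_SITES}/{hostname},{site_id},{web_id}/permissions"


print(
    graph_permissions_url(
        "8g1fsr.sharepoint.com",
        "84d271dd-1f22-4ebe-829d-45019865c2a7",
        "c30567ee-553d-49a7-ad4b-72b404d15bab",
    )
)
```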
Use the following JSON package as the request body.
If everything is done correctly, a successful run should give a response similar to the following:
{
  "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#sites('8g1fsr.sharepoint.com%2C84d271dd-1f22-4ebe-829d-45019865c2a7%2Cc30567ee-553d-49a7-ad4b-72b404d15bab')/permissions",
  "value": [
    {
      "id": "aTowaS50fG1zLnNwLmV4dHwxOWYwZTI4ZS01ZjE1LTQ4NmItYjY4YS1iNDE1YjE2OWMxNDZAMjUxYjdhMTYtNzllYi00YmEyLWE0YjQtMDlkMTJiMzQ3YTJi",
      "grantedToIdentitiesV2": [
        {
          "application": {
            "displayName": "Sharepoint-extractor-cognite",
            "id": "19f0e28e-5f15-486b-b68a-b415b169c146"
          }
        }
      ],
      "grantedToIdentities": [
        {
          "application": {
            "displayName": "Sharepoint-extractor-cognite",
            "id": "19f0e28e-5f15-486b-b68a-b415b169c146"
          }
        }
      ]
    },
    {
      "id": "aTowaS50fG1zLnNwLmV4dHxiMDhmNDQ3YS01ZTU5LTQ0NjItYjdkYi1jNTIyOTY5MmIxYzhAMjUxYjdhMTYtNzllYi00YmEyLWE0YjQtMDlkMTJiMzQ3YTJi",
      "grantedToIdentitiesV2": [
        {
          "application": {
            "displayName": "Sharepoint-extractor-cognite",
            "id": "b08f447a-5e59-4462-b7db-c5229692b1c8"
          }
        }
      ],
      "grantedToIdentities": [
        {
          "application": {
            "displayName": "Sharepoint-extractor-cognite",
            "id": "b08f447a-5e59-4462-b7db-c5229692b1c8"
          }
        }
      ]
    }
  ]
}
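To check quickly that the expected app registration appears in such a response, the application entries can be pulled out of the JSON. This is a sketch only: `granted_applications` is a hypothetical helper, and the sample string is a shortened copy of the response above with a placeholder permission id.

```python
import json


def granted_applications(response_text):
    """Return (displayName, id) pairs of the apps listed in a permissions response."""
    payload = json.loads(response_text)
    apps = []
    for permission in payload.get("value", []):
        for identity in permission.get("grantedToIdentitiesV2", []):
            app = identity.get("application")
            if app:
                apps.append((app["displayName"], app["id"]))
    return apps


# Shortened sample; "example" stands in for the long permission id.
sample = """
{
  "value": [
    {
      "id": "example",
      "grantedToIdentitiesV2": [
        {
          "application": {
            "displayName": "Sharepoint-extractor-cognite",
            "id": "19f0e28e-5f15-486b-b68a-b415b169c146"
          }
        }
      ]
    }
  ]
}
"""
print(granted_applications(sample))
```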
Run the Cognite File Extractor
Download the latest version of the Cognite File extractor from CDF. Open the extractor config file and adjust the following fields as illustrated below. (Note: If the extraction is failing, set the logging level to DEBUG for more information.)
- For the “files” section in the configuration, use the client ID and secret that were generated in the “SharePoint Settings” section previously.
- The tenant ID is also generated in the “SharePoint Settings” section.
- For base URL, site, and document library, refer to the image below.
Now you are all set. To start the extractor on MS Windows, open the command prompt and change to the directory where the 'file_extractor-<version>.exe' file is located. You can start the extractor using the command 'file_extractor-<version>.exe config.yaml'.
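If you prefer to launch the extractor from a script, the same command can be assembled in Python. This is a sketch under assumptions: `extractor_command` is a hypothetical helper, the glob pattern simply mirrors the 'file_extractor-<version>.exe' name above, and picking the last match by name is a simple heuristic rather than a real version comparison.

```python
from pathlib import Path


def extractor_command(directory, config_name="config.yaml"):
    """Build the command line for the extractor executable found in `directory`."""
    candidates = sorted(Path(directory).glob("file_extractor-*.exe"))
    if not candidates:
        raise FileNotFoundError(f"no file_extractor-*.exe found in {directory}")
    # If several versions are present, take the last one by file name.
    return [str(candidates[-1]), config_name]
```

The returned list can be passed to `subprocess.run(..., cwd=directory)` so that the config file is resolved relative to the extractor directory.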