
How to Upload Large Files to CDF [Cognite Official]


Overview

When working with large datasets, video recordings, or log archives, you may need to upload very large files (5 GiB–1 TiB) to Cognite Data Fusion (CDF).

CDF supports two upload methods:

  1. Single-part upload — for files up to 5 GiB.
     
  2. Multipart upload — for files larger than 5 GiB or when improved reliability is needed.
     

This article explains why multipart upload is required, how it works, and how to use the Cognite Python SDK to perform such uploads safely and efficiently.

 

Why Multipart Upload Is Needed

Uploading very large files through a single HTTP request often fails because of:

  • Connection resets / timeouts — the CDF API gateway limits how long a single request can run.
  • Network instability — even brief packet loss can interrupt a long-running upload.
  • Retry inefficiency — a single failure at 9 GiB forces you to restart the upload from zero.
  • Memory constraints — loading a huge file into one request can exhaust system RAM or buffers.

Multipart upload solves these issues by:

  • Splitting the file into smaller, manageable chunks (5 MiB–4000 MiB each); the sketch after this list shows the arithmetic.
  • Uploading each part separately (optionally in parallel).
  • Assembling all parts on the server after a successful upload.
  • Allowing retries per part instead of re-uploading the whole file.
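
As a quick illustration of the first point, the sketch below (plain Python, no CDF calls) shows how a part size inside the 5 MiB–4000 MiB window translates into a part count; the 512 MiB default is an arbitrary example value.

MIB = 1024 * 1024

def parts_for(file_size_bytes: int, part_size_bytes: int = 512 * MIB) -> int:
    """Number of chunks needed when splitting a file into fixed-size parts."""
    if not 5 * MIB <= part_size_bytes <= 4000 * MIB:
        raise ValueError("Part size must be between 5 MiB and 4000 MiB")
    return (file_size_bytes + part_size_bytes - 1) // part_size_bytes  # ceiling division

# A 100 GiB file split into 512 MiB parts needs 200 parts, so a failed part
# costs only one 512 MiB re-upload instead of the whole 100 GiB.
print(parts_for(100 * 1024 * MIB))  # 200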

 

Multipart Upload Workflow

 

 How it works:

  1. The SDK calls POST /files/initmultipartupload.
  2. CDF returns an uploadId and a list of uploadUrls.
  3. Each URL represents a file part (indexed 0…n).
  4. The SDK uploads each chunk separately.
  5. Finally, POST /files/completemultipartupload merges them (the raw REST flow is sketched below).

    If you’re uploading a 10 GB+ file, always use multipart_upload_session(). It allows retries per chunk, parallel uploads, and avoids connection resets.
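
Under the hood, multipart_upload_session() drives exactly these three REST calls. The sketch below traces that flow with plain requests so the moving parts are visible; the project URL prefix, the parts query parameter, and the init/complete body shapes are assumptions made for illustration (only the endpoint names, uploadId, and uploadUrls come from the workflow above), so prefer the SDK wrapper for real uploads.

# Illustrative trace of the raw REST flow. The URL prefix, the "parts" query
# parameter, and the body shapes are assumptions; in practice use the SDK wrapper.
import os
import requests

BASE = "https://<your_cluster>.cognitedata.com/api/v1/projects/<your-project-name>"
HEADERS = {"Authorization": "Bearer <token>"}  # token acquisition omitted

file_path = "big_file.bin"                     # placeholder local file
part_size = 512 * 1024 * 1024                  # 512 MiB, inside the 5 MiB-4000 MiB window
num_parts = -(-os.path.getsize(file_path) // part_size)  # ceiling division

# 1. Init: CDF returns an uploadId and one presigned URL per part.
init = requests.post(
    f"{BASE}/files/initmultipartupload",
    params={"parts": num_parts},                 # assumed query parameter
    json={"name": os.path.basename(file_path)},  # assumed metadata body
    headers=HEADERS,
)
init.raise_for_status()
upload_id, urls = init.json()["uploadId"], init.json()["uploadUrls"]

# 2. PUT each chunk to its presigned URL, in part order (0…n).
with open(file_path, "rb") as f:
    for url in urls:
        requests.put(url, data=f.read(part_size)).raise_for_status()

# 3. Complete: CDF assembles the uploaded parts into a single file.
requests.post(
    f"{BASE}/files/completemultipartupload",
    json={"id": init.json()["id"], "uploadId": upload_id},  # assumed body shape
    headers=HEADERS,
).raise_for_status()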

| Description                   | API                                      | Documentation              |
|-------------------------------|------------------------------------------|----------------------------|
| Upload small files            | POST /files                              | InitFileUpload             |
| Upload multipart files (init) | POST /files/initmultipartupload          | Multipart File Upload API  |
| Complete multipart upload     | POST /files/completemultipartupload      | Complete Multipart Upload  |
| SDK Wrapper                   | client.files.multipart_upload_session()  | Cognite Python SDK: Files  |
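
For comparison, the single-part path in the first row is a one-liner in the SDK and is the simpler choice for files up to 5 GiB. A minimal sketch, assuming a client configured as in the code example below and using a placeholder file name:

# Single-part upload: suitable for files up to 5 GiB.
file_meta = client.files.upload(
    path="small_report.pdf",   # placeholder local file
    name="small_report.pdf",
)
print(f"Uploaded file id: {file_meta.id}")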

 

Code Example (Python SDK)

 

Below is a working script using Cognite’s Python SDK to upload large files (up to 1 TiB):

import os
from typing import Tuple

from cognite.client import CogniteClient
from cognite.client.config import ClientConfig
from cognite.client.credentials import OAuthClientCredentials


# ---------- STEP 1: Setup Cognite Client ----------
creds = OAuthClientCredentials(
    token_url="https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token",
    client_id="<your-client-id>",
    client_secret="<your-client-secret>",
    scopes=["https://<your_cluster>.cognitedata.com/.default"],
)

cnf = ClientConfig(
    client_name="large-file-uploader",
    base_url="https://<your_cluster>.cognitedata.com",
    project="<your-project-name>",
    credentials=creds,
    timeout=600,  # SDK timeout per request (seconds); adjust as needed
)

client = CogniteClient(cnf)


# ---------- STEP 2: Helper Function for Part Calculation ----------
def calculate_parts(file_size: int, max_parts: int = 250, target_part_mb: int = 512) -> Tuple[int, int]:
    """Returns (part_size, num_parts) for multipart upload."""
    part_size = target_part_mb * 1024 * 1024
    num_parts = (file_size + part_size - 1) // part_size

    if num_parts > max_parts:
        part_size = (file_size + max_parts - 1) // max_parts
        num_parts = (file_size + part_size - 1) // part_size

    # CDF limits: each part must be between 5 MiB and 4000 MiB
    if part_size < 5 * 1024 * 1024:
        part_size = 5 * 1024 * 1024
        num_parts = (file_size + part_size - 1) // part_size
    if part_size > 4000 * 1024 * 1024:
        raise ValueError("Part size exceeds Cognite maximum of 4000 MiB")

    return part_size, num_parts


# ---------- STEP 3: Multipart Upload ----------
def upload_large_file(file_path: str):
    """Uploads large files (>5 GiB) to Cognite Data Fusion."""
    file_size = os.path.getsize(file_path)
    part_size, num_parts = calculate_parts(file_size)

    print(f"Uploading {file_path} ({file_size / 1024 / 1024:.2f} MB)")
    print(f"→ Splitting into {num_parts} parts of ~{part_size / 1024 / 1024:.2f} MB")

    with client.files.multipart_upload_session(name=os.path.basename(file_path), parts=num_parts) as session:
        with open(file_path, "rb") as f:
            for i in range(num_parts):
                chunk = f.read(part_size)
                if not chunk:
                    break
                print(f"➡️ Uploading part {i + 1}/{num_parts} ({len(chunk) / 1024 / 1024:.2f} MB)")
                session.upload_part(i, chunk)
                print(f"✅ Part {i + 1} uploaded successfully")

    print("🎉 Upload complete!")


# ---------- STEP 4: Run ----------
upload_large_file("Data_models.mp4")
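
The loop above uploads parts sequentially and aborts on the first error. Because per-part retries are one of the main reasons to use multipart upload, you may want to wrap session.upload_part() in a small retry helper; the sketch below is one way to do it, with an arbitrary retry count and backoff.

# Optional helper: retry an individual part before giving up.
# Retry count and backoff values are arbitrary examples.
import time

def upload_part_with_retry(session, part_no: int, chunk: bytes, retries: int = 3) -> None:
    for attempt in range(1, retries + 1):
        try:
            session.upload_part(part_no, chunk)
            return
        except Exception as exc:  # narrow to the exceptions you actually expect
            if attempt == retries:
                raise
            wait = 2 ** attempt
            print(f"Part {part_no} failed ({exc}); retrying in {wait}s...")
            time.sleep(wait)

Call it in place of session.upload_part(i, chunk) inside the loop in STEP 3. After the run, you can confirm the file landed in CDF with client.files.list(name=...) or client.files.retrieve().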