Overview
When working with large datasets, video recordings, or log archives, you may need to upload very large files (5 GiB–1 TiB) to Cognite Data Fusion (CDF).
CDF supports two upload methods:
- Single-part upload — for files up to 5 GiB.
- Multipart upload — for files larger than 5 GiB or when improved reliability is needed.
This article explains why multipart upload is required, how it works, and how to use the Cognite Python SDK to perform such uploads safely and efficiently.
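For files under 5 GiB, the single-part path is a one-line call. Below is a minimal sketch, assuming client is an already configured CogniteClient (see STEP 1 of the code example later in this article) and report.csv is a placeholder file name:
# Single-part upload, suitable for files up to 5 GiB
file_meta = client.files.upload("report.csv", name="report.csv")
print(f"Created file with id {file_meta.id}")
The rest of this article focuses on the multipart path.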
Why Multipart Upload Is Needed
Uploading very large files through a single HTTP request often fails because of:
- Connection resets / timeouts — CDF API gateway limits individual requests.
- Network instability — Even brief packet loss can interrupt a long-running upload.
- Retry inefficiency — A single failure at 9 GiB forces you to restart the entire upload from zero.
- Memory constraints — Large files can overload system RAM or buffers.
Multipart upload solves these issues by:
- Splitting the file into smaller, manageable chunks (5 MiB–4000 MiB each).
- Uploading each part separately (optionally in parallel).
- Assembling all parts on the server after successful upload.
- Allowing retries per part, instead of re-uploading the whole file.
Multipart Upload Workflow
How it works:
- The SDK calls POST /files/initmultipartupload.
- CDF returns an uploadId and a list of uploadUrls.
- Each URL represents a file part (indexed 0…n).
- The SDK uploads each chunk separately to its upload URL.
- Finally, POST /files/completemultipartupload merges the parts into a single file (a raw-HTTP sketch of this flow is shown below).
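The SDK session wraps these calls for you, but over raw HTTP the flow looks roughly like the sketch below. This is illustrative only: BASE_URL, PROJECT, HEADERS, the parts query parameter, and the exact request/response field names are assumptions based on the workflow above, so prefer the SDK wrapper for real uploads.
import requests

PART_SIZE = 512 * 1024 * 1024  # assumed part size of 512 MiB

# 1. Initialize the multipart upload and receive one upload URL per part
init = requests.post(
    f"{BASE_URL}/api/v1/projects/{PROJECT}/files/initmultipartupload",
    params={"parts": 3},              # number of parts (illustrative)
    json={"name": "large_file.bin"},  # file metadata (simplified)
    headers=HEADERS,
).json()

# 2. Upload each chunk to its uploadUrl (parts are indexed 0…n)
with open("large_file.bin", "rb") as f:
    for url in init["uploadUrls"]:
        requests.put(url, data=f.read(PART_SIZE))

# 3. Ask CDF to assemble the parts into a single file
requests.post(
    f"{BASE_URL}/api/v1/projects/{PROJECT}/files/completemultipartupload",
    json={"id": init["id"], "uploadId": init["uploadId"]},  # field names simplified
    headers=HEADERS,
)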
If you’re uploading a 10 GB+ file, always use multipart_upload_session(). It allows retries per chunk, parallel uploads, and avoids connection resets.
| Description | API |
| --- | --- |
| Upload small files | POST /files |
| Upload multipart files (init) | POST /files/initmultipartupload |
| Complete multipart upload | POST /files/completemultipartupload |
| SDK wrapper | client.files.multipart_upload_session() |
Code Example (Python SDK)
Below is a working script using Cognite’s Python SDK to upload large files (up to 1 TB):
import os
import math
from typing import Tuple
from cognite.client import CogniteClient
from cognite.client.config import ClientConfig
from cognite.client.credentials import OAuthClientCredentials
# ---------- STEP 1: Setup Cognite Client ----------
creds = OAuthClientCredentials(
    token_url="https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token",
    client_id="<your-client-id>",
    client_secret="<your-client-secret>",
    scopes=["https://<your_cluster>.cognitedata.com/.default"],
)
cnf = ClientConfig(
    client_name="large-file-uploader",
    base_url="https://<your_cluster>.cognitedata.com",
    project="<your-project-name>",
    credentials=creds,
    timeout=600,  # SDK timeout per request, in seconds; adjust as needed
)
client = CogniteClient(cnf)
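# Optional sanity check before uploading (assumes the credentials above are valid):
# print(client.iam.token.inspect())  # confirms the client can authenticate against the project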
# ---------- STEP 2: Helper Function for Part Calculation ----------
def calculate_parts(file_size: int, max_parts: int = 250, target_part_mb: int = 512) -> Tuple[int, int]:
    """Returns (part_size, num_parts) for multipart upload."""
    part_size = target_part_mb * 1024 * 1024
    num_parts = (file_size + part_size - 1) // part_size
    if num_parts > max_parts:
        part_size = (file_size + max_parts - 1) // max_parts
        num_parts = (file_size + part_size - 1) // part_size
    # CDF limits
    if part_size < 5 * 1024 * 1024:
        part_size = 5 * 1024 * 1024
    if part_size > 4000 * 1024 * 1024:
        raise ValueError("Part size exceeds Cognite maximum of 4000 MiB")
    return part_size, num_parts
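# Example: a 100 GiB file with the default 512 MiB target gives
# ceil(102400 MiB / 512 MiB) = 200 parts, which stays under the 250-part limit.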
# ---------- STEP 3: Multipart Upload ----------
def upload_large_file(file_path: str):
    """Uploads large files (>5 GiB) to Cognite Data Fusion."""
    file_size = os.path.getsize(file_path)
    part_size, num_parts = calculate_parts(file_size)
    print(f"Uploading {file_path} ({file_size / 1024 / 1024:.2f} MiB)")
    print(f"→ Splitting into {num_parts} parts of ~{part_size / 1024 / 1024:.2f} MiB")
    with client.files.multipart_upload_session(name=os.path.basename(file_path), parts=num_parts) as session:
        with open(file_path, "rb") as f:
            for i in range(num_parts):
                chunk = f.read(part_size)
                if not chunk:
                    break
                print(f"➡️ Uploading part {i + 1}/{num_parts} ({len(chunk) / 1024 / 1024:.2f} MiB)")
                session.upload_part(i, chunk)
                print(f"✅ Part {i + 1} uploaded successfully")
    print("🎉 Upload complete!")
# ---------- STEP 4: Run ----------
upload_large_file("Data_models.mp4")
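The loop above uploads parts one at a time and stops at the first error. Because each part can be retried independently, you can wrap the upload call in a small retry helper. This is a minimal sketch; upload_part_with_retry is not part of the SDK, and the retry count and backoff values are illustrative:
import time

def upload_part_with_retry(session, index: int, chunk: bytes, retries: int = 3) -> None:
    """Retry a single part a few times before giving up (illustrative helper)."""
    for attempt in range(1, retries + 1):
        try:
            session.upload_part(index, chunk)
            return
        except Exception as exc:  # narrow this to the errors you actually expect
            if attempt == retries:
                raise
            wait = 2 ** attempt  # simple exponential backoff: 2 s, 4 s, 8 s
            print(f"Part {index} failed ({exc}); retrying in {wait}s (attempt {attempt}/{retries})")
            time.sleep(wait)
Inside the upload loop, replace session.upload_part(i, chunk) with upload_part_with_retry(session, i, chunk) to get per-part retries without restarting the whole file.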