Solved

Streaming upload to CDF Files

  • June 16, 2023
  • 10 replies
  • 183 views

thomafred
Seasoned ⭐️⭐️⭐️

Hi there!

I have a use case where a user uploads a file to an API, and the API then uploads the file to CDF Files. We want to avoid holding the full file in memory at once, so we must stream the file contents from the request handler directly into CDF Files.

There are two ways of achieving this:

  • Stream the request body from the request handler directly into CDF Files’ upload URL
  • Chunk the request body and upload each chunk as a separate request.

The first option may be achievable, but I don’t believe the second option is possible.

Do you have any insight into whether it is possible to chunk a file upload like this in CDF Files?

Best answer by Dilini Fernando

Hi @thomafred,

As of now, I will close this thread. If you have any questions, please feel free to reply here.

Best regards,
Dilini  

10 replies

roman.chesnokov
Expert ⭐️⭐️⭐️⭐️
  • June 16, 2023

Hey @thomafred,

According to the documentation:

If the uploadUrl contains the string '/v1/files/gcs_proxy/', you can make a Google Cloud Storage (GCS) resumable upload request as documented in https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload.

The linked page includes instructions for uploading files in chunks.
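To make the chunking part concrete, here is a minimal sketch of how the `Content-Range` header could be computed for each chunk of a GCS resumable upload when the total size is not known up front. It only builds the headers and sends nothing. The function name `content_ranges` is hypothetical, and it assumes every chunk is non-empty; the 256 KiB multiple for non-final chunks comes from the GCS resumable upload documentation.

```python
from typing import Iterable, Iterator, Tuple

# GCS asks for non-final chunks to be a multiple of 256 KiB.
CHUNK_MULTIPLE = 256 * 1024


def content_ranges(chunks: Iterable[bytes]) -> Iterator[Tuple[bytes, str]]:
    """Pair each non-empty chunk with its Content-Range header value.

    The total size is written as '*' until the last chunk, which
    declares the final size and thereby completes the upload.
    """
    offset = 0
    previous = None
    for chunk in chunks:
        if previous is not None:
            if len(previous) % CHUNK_MULTIPLE:
                raise ValueError("non-final chunks must be a multiple of 256 KiB")
            end = offset + len(previous) - 1
            yield previous, f"bytes {offset}-{end}/*"
            offset += len(previous)
        # Hold one chunk back so we know which one is the last.
        previous = chunk
    if previous is not None:
        total = offset + len(previous)
        yield previous, f"bytes {offset}-{total - 1}/{total}"
```

Each pair would then be sent as a `PUT` to the resumable session URI with that header. Holding one chunk back means at most two chunks are in memory at any time.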

I hope it helps you. Feel free to let me know if you have any additional questions.


thomafred
Seasoned ⭐️⭐️⭐️
  • Author
  • June 16, 2023

Will this work if our CDF tenant is located on Azure?


roman.chesnokov
Expert ⭐️⭐️⭐️⭐️
  • June 16, 2023

@thomafred could you double-check that? If your IdP is Azure, that doesn’t necessarily mean that your CDF resources are also hosted on Azure. But if they are, then I suppose you can leverage the documentation for Azure Blob Storage.


thomafred
Seasoned ⭐️⭐️⭐️
  • Author
  • June 16, 2023

We are exclusively running CDF on Azure :p We are also using Azure as IdP.

In other words - yes to both :p


roman.chesnokov
Expert ⭐️⭐️⭐️⭐️
  • June 16, 2023

@thomafred then, most probably, you need to create a blob with Put Blob as described in the documentation, then use Put Block to upload the chunks and Put Block List to commit them once they are all uploaded. But this should be verified experimentally; I haven’t dealt with that particular case.
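As a rough illustration of the Put Block / Put Block List flow, here is a minimal sketch that reads a stream in fixed-size chunks, stages each chunk as a block, and then commits the block list. It is untested against CDF and covers only those two calls. The `put` callable is a hypothetical stand-in for whatever HTTP client performs the `PUT`, `upload_url` is assumed to be the upload URL returned by CDF Files, and the 4 MiB chunk size is just a conservative choice. The query parameters (`comp=block`, `blockid`, `comp=blocklist`) and the XML body follow the Azure Blob Storage REST documentation.

```python
import base64
from typing import BinaryIO, Callable, Iterator, List
from urllib.parse import urlencode
from xml.sax.saxutils import escape


def read_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks so only one chunk is held in memory."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def block_id(index: int) -> str:
    # Block IDs must be base64 strings of equal length within one blob.
    return base64.b64encode(f"{index:08d}".encode("ascii")).decode("ascii")


def with_params(url: str, **params: str) -> str:
    # Append to the URL as-is so an existing query string stays untouched.
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(params)


def block_list_xml(ids: List[str]) -> bytes:
    items = "".join(f"<Latest>{escape(i)}</Latest>" for i in ids)
    body = f'<?xml version="1.0" encoding="utf-8"?><BlockList>{items}</BlockList>'
    return body.encode("utf-8")


def upload_in_blocks(
    stream: BinaryIO,
    upload_url: str,
    put: Callable[[str, bytes], None],  # hypothetical: performs an HTTP PUT
    chunk_size: int = 4 * 1024 * 1024,
) -> List[str]:
    ids: List[str] = []
    for index, chunk in enumerate(read_chunks(stream, chunk_size)):
        bid = block_id(index)
        put(with_params(upload_url, comp="block", blockid=bid), chunk)  # Put Block
        ids.append(bid)
    # Put Block List commits the staged blocks in the given order.
    put(with_params(upload_url, comp="blocklist"), block_list_xml(ids))
    return ids
```

Passing `put` in keeps the sketch independent of any particular HTTP library; an async variant would follow the same shape with an awaitable `put`.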


Dilini Fernando
Expert ⭐️⭐️⭐️⭐️
  • June 27, 2023

Hi @thomafred,

I hope Roma’s reply has helped you. Let us know if you have more questions.  

Best regards,
Dilini

 


Dilini Fernando
Expert ⭐️⭐️⭐️⭐️
  • Answer
  • July 3, 2023

Hi @thomafred,

As of now, I will close this thread. If you have any questions, please feel free to reply here.

Best regards,
Dilini  


thomafred
Seasoned ⭐️⭐️⭐️
  • Author
  • July 3, 2023

Sorry about the delay, I have only now been able to follow up on this.

I did a quick proof of concept using Postman, and using `PUT block` and `PUT blocklist` seems to work. However, there are some limitations.

First and foremost, it appears that I am not authorized to perform the `GET blocklist` operation (https://learn.microsoft.com/en-us/rest/api/storageservices/get-block-list?tabs=azure-ad), and instead get the following error:

<?xml version="1.0" encoding="utf-8"?>
<Error>
<Code>AuthorizationPermissionMismatch</Code>
<Message>This request is not authorized to perform this operation using this permission.
RequestId:44177ea5-601e-0030-78d2-ada960000000
Time:2023-07-03T17:21:35.0950174Z</Message>
</Error>

I was able to do an async Python POC though :)

 


thomafred
Seasoned ⭐️⭐️⭐️
  • Author
  • July 13, 2023

An update on this.

Block updates work quite well. However, after a few minutes (typically around 4 minutes, though it appears to be somewhat random), the upload may fail due to a read error on a block update. There doesn’t appear to be any response from the server.
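One possible mitigation, sketched here under assumptions, is to retry a failed block update with exponential backoff. This does not diagnose the cause of the read error. The helper `with_retries` and its parameters are hypothetical, and it assumes the HTTP client surfaces the read error as an `OSError` subclass (such as `TimeoutError` or `ConnectionError`). As far as the Put Block documentation describes, re-sending a block with the same block ID replaces the earlier uncommitted block, which is what would make a retry safe.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on the given exceptions with backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A single block update would then be wrapped as `with_retries(lambda: send_block(url, chunk))`, where `send_block` stands for whatever function issues the request.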


  • Expert ⭐️⭐️⭐️⭐️
  • July 18, 2023

Just for documentation, if others are interested, there’s a follow-up post on the issue above: