Downloading Files Asynchronously with Rust Using Futures


1/4/2025

When working on a service that had to download many files from multiple sources, it became apparent that the files should be downloaded in groups rather than one at a time. In making the code asynchronous, I made extensive use of Futures. I felt they were worth writing about because they made the changes fairly straightforward.

Originally, when handling a download request, the program iterated through a list of URLs. Each URL was downloaded in turn, and the returned file handle was pushed into a Vec that was cleaned up at the end of the request.

Waiting for each download to finish before starting the next one was clearly unnecessary.

How does async work?

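Asynchronous code in Rust works by creating Futures, which are values of types that implement the Future trait. Futures are lazy: they do nothing until they are awaited (or otherwise polled). Awaiting a Future inside an async function does not block the current thread. If the Future is not ready yet, the enclosing task is suspended, and the executor is free to run other tasks on that thread until the Future can make progress. In that way, a Future in Rust is a lot like a Promise in JavaScript, with one notable difference: a JavaScript Promise starts running as soon as it is created, while a Rust Future only runs when it is polled.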
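
To see the laziness in practice, here is a small std-only sketch.
- fetch_len is a hypothetical stand-in for a download.
- poll_once is a helper written for this example (not a library function). It polls a future exactly once, using a waker that does nothing.

Creating the future runs none of its body; only polling it does.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Set to true as soon as the body of `fetch_len` starts running.
static STARTED: AtomicBool = AtomicBool::new(false);

// Hypothetical stand-in for a download: records that it ran, then returns a length.
async fn fetch_len() -> usize {
    STARTED.store(true, Ordering::SeqCst);
    1024
}

// A waker that does nothing; enough for a future that never returns Pending.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Polls `fut` exactly once and returns its output if it is already Ready.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    match fut.poll(&mut cx) {
        Poll::Ready(output) => Some(output),
        Poll::Pending => None,
    }
}
```

Calling `fetch_len()` leaves STARTED false. Only `poll_once(fut)` runs the body and yields `Some(1024)`.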

Writing your own Future is much like writing a coroutine in Lua, C#, and potentially other languages: it is code that can be suspended and resumed. The Future trait requires a poll method to be implemented. Each call to poll returns either Poll::Pending or Poll::Ready with the final result. When a Future returns Pending, it is responsible for getting polled again. It keeps the Waker passed in through poll's Context, and calling wake on that Waker tells the executor to poll the Future again. A Waker can be cloned and sent to other threads, which lets another thread call wake to signal that the Future can make progress.

Additionally, the Future trait requires an associated Output type: the type of the value the Future resolves to.
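
The following sketch shows the poll contract with a hand-written Future. CountDown and block_on are hypothetical names for this example.
- CountDown returns Poll::Pending a few times, waking itself each time, and then returns Poll::Ready with its Output.
- block_on is a bare-bones executor that parks the current thread until the Waker unparks it. Unlike .await inside a task, block_on really does block the thread it runs on.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// A hand-written Future: returns Pending `remaining` times, then Ready.
struct CountDown {
    remaining: u32,
    polls: u32,
}

impl Future for CountDown {
    // The value this Future resolves to: how many times poll was called.
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        self.polls += 1;
        if self.remaining == 0 {
            return Poll::Ready(self.polls);
        }
        self.remaining -= 1;
        // Returning Pending without arranging a wake-up would stall forever,
        // so ask to be polled again right away.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Minimal executor: polls the future, parking the thread between wake-ups.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

For example, `block_on(CountDown { remaining: 3, polls: 0 })` returns 4: three Pending polls followed by one Ready poll.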

note: More specifically, top-level Futures are wrapped in Tasks, which are handed to an executor. In the simple executor described in the Async Book, the executor monitors a channel for Tasks to execute (that is, to call poll on). When a Future is woken, its Task is pushed back onto the executor's queue. Please see the Async Book for further reading on this topic.
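
Here is a rough sketch of such a channel-based executor. It loosely follows the design in the Async Book but uses only std. Task, spawn, run and YieldNow are names chosen for this example.
- Each Task holds its future and a sender for the queue.
- Waking a Task simply sends it back through the channel.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send>>;

// A Task: a top-level future plus a handle to the executor's queue.
struct Task {
    future: Mutex<Option<BoxFuture>>,
    queue: SyncSender<Arc<Task>>,
}

impl Wake for Task {
    // Waking a Task pushes it back onto the executor's queue.
    fn wake(self: Arc<Self>) {
        let queue = self.queue.clone();
        queue.send(self).expect("executor queue closed");
    }
}

// Wraps a future in a Task and queues it for its first poll.
fn spawn(queue: &SyncSender<Arc<Task>>, fut: impl Future<Output = ()> + Send + 'static) {
    let fut: BoxFuture = Box::pin(fut);
    let task = Arc::new(Task { future: Mutex::new(Some(fut)), queue: queue.clone() });
    queue.send(task).expect("executor queue closed");
}

// Polls queued Tasks until every sender (the spawner's and each Task's) is gone.
fn run(ready: Receiver<Arc<Task>>) {
    while let Ok(task) = ready.recv() {
        let mut slot = task.future.lock().unwrap();
        if let Some(mut fut) = slot.take() {
            let waker = Waker::from(Arc::clone(&task));
            let mut cx = Context::from_waker(&waker);
            if fut.as_mut().poll(&mut cx).is_pending() {
                // Not finished: keep it until the next wake re-queues the Task.
                *slot = Some(fut);
            }
        }
    }
}

// Helper future that yields once, so the interleaving of Tasks is visible.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}
```

Suppose two Tasks each log "1", yield once, then log "2". Run together, their entries interleave as a1, b1, a2, b2. A Task that returns Pending is set aside until its wake call re-queues it behind the other Task.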

note: While reviewing the details, I found this bitbashing article, which explains the philosophy behind Futures in Rust and compares them to Go's goroutines: https://bitbashing.io/async-rust.html

The Implementation

In the app, I implemented the Future trait for a DownloadFuture struct. The DownloadFuture's poll function spawned a thread on its first call, which ensured that the download only began once the future was first polled (normally, when it was awaited). This thread performed the actual download and updated the shared state to record either an error or the finished download. In either case, it then called wake, so the Future could return the relevant result to the caller on its next poll.

To illustrate this, the following code functions the same way. First, the program defines the structs that will be used, along with a new function for initializing the Future:
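
A minimal sketch of those definitions follows. It assumes the download result is the file's bytes (Vec<u8>) and that errors are plain Strings. DownloadFuture comes from the text; DownloadState and Shared are hypothetical names. The state and the Waker live behind an Arc<Mutex<Shared>> so that the worker thread can update them.

```rust
use std::sync::{Arc, Mutex};
use std::task::Waker;

// Progress of one download, shared between the future and its worker thread.
#[derive(Debug)]
enum DownloadState {
    NotStarted,
    InProgress,
    Done(Result<Vec<u8>, String>),
}

// Data behind the mutex: the current state, plus the Waker of the task that
// last polled the future, so the worker thread can wake it.
struct Shared {
    state: DownloadState,
    waker: Option<Waker>,
}

// One pending download. Creating it starts nothing; the download begins on
// the first poll.
pub struct DownloadFuture {
    url: String,
    shared: Arc<Mutex<Shared>>,
}

impl DownloadFuture {
    pub fn new(url: impl Into<String>) -> Self {
        DownloadFuture {
            url: url.into(),
            shared: Arc::new(Mutex::new(Shared {
                state: DownloadState::NotStarted,
                waker: None,
            })),
        }
    }
}
```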

The implementation of the poll function as described follows:
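
A sketch of that poll method, under the same assumptions (bytes on success, String errors):
- download is a hypothetical blocking function that only simulates the work.
- block_on is a minimal stand-in executor used to run the example.
- The block declares the supporting types as well, so it compiles on its own.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

enum DownloadState { NotStarted, InProgress, Done(Result<Vec<u8>, String>) }

struct Shared { state: DownloadState, waker: Option<Waker> }

pub struct DownloadFuture { url: String, shared: Arc<Mutex<Shared>> }

impl DownloadFuture {
    pub fn new(url: impl Into<String>) -> Self {
        let shared = Shared { state: DownloadState::NotStarted, waker: None };
        DownloadFuture { url: url.into(), shared: Arc::new(Mutex::new(shared)) }
    }
}

// Hypothetical blocking download; it only simulates the work.
fn download(url: &str) -> Result<Vec<u8>, String> {
    thread::sleep(Duration::from_millis(20));
    if url.starts_with("https://") {
        Ok(url.as_bytes().to_vec())
    } else {
        Err(format!("unsupported url: {url}"))
    }
}

impl Future for DownloadFuture {
    type Output = Result<Vec<u8>, String>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut shared = self.shared.lock().unwrap();
        // Keep the latest Waker: the future may have moved to another task.
        shared.waker = Some(cx.waker().clone());
        match std::mem::replace(&mut shared.state, DownloadState::InProgress) {
            DownloadState::NotStarted => {
                // First poll: start the download on its own thread.
                let url = self.url.clone();
                let worker = Arc::clone(&self.shared);
                thread::spawn(move || {
                    let result = download(&url);
                    // Record the result, then wake the task outside the lock.
                    let waker = {
                        let mut s = worker.lock().unwrap();
                        s.state = DownloadState::Done(result);
                        s.waker.take()
                    };
                    if let Some(waker) = waker {
                        waker.wake();
                    }
                });
                Poll::Pending
            }
            DownloadState::InProgress => Poll::Pending,
            DownloadState::Done(result) => Poll::Ready(result),
        }
    }
}

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) { self.0.unpark(); }
}

// Minimal executor for the example: park until woken, then poll again.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

The Waker is refreshed on every poll because only the Waker from the most recent poll is guaranteed to reach the right task. The worker thread calls wake only after releasing the lock, so the woken task does not immediately contend for the mutex.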

edit: There is an assumption in the above implementation that makes it inefficient: the download function still blocks the Task it is run in. This means the thread the Task is running on is blocked too, along with any other Tasks on that thread. At best, this implementation will download as many files concurrently as there are threads available. For better performance, the Task should instead use an async function that correctly wakes it when progress can be made. For example, the Client struct provided by the reqwest crate builds requests with methods such as get and post, and sending a request is async. Client relies on its async runtime's OS-level I/O notifications to call wake when its socket becomes readable, letting other Tasks make progress in the meantime. Credit goes to reddit user simonask_ for pointing this out.

These DownloadFutures were put into a Vec, which was joined into a single Future with the try_join_all function from the futures crate, so that multiple files could be downloaded at the same time. Here is a tokio main function that does the same:
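
A tokio main function would typically build the Vec of DownloadFutures and await futures::future::try_join_all on it. To keep this sketch free of dependencies, it makes three substitutions:
- It hand-rolls a simplified try_join_all. This is not the futures crate's implementation.
- A minimal block_on stands in for the tokio runtime.
- Trivial async blocks take the place of DownloadFutures.

What it does show is the behaviour that matters here: every future is polled together, outputs keep their input order, and the first error ends the whole join.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

type BoxFut<T, E> = Pin<Box<dyn Future<Output = Result<T, E>> + Send>>;

// Simplified stand-in for futures::future::try_join_all.
struct TryJoinAll<T, E> {
    futures: Vec<Option<BoxFut<T, E>>>,
    outputs: Vec<Option<T>>,
}

// Outputs are only stored, never pinned, so moving the combinator is fine.
impl<T, E> Unpin for TryJoinAll<T, E> {}

fn try_join_all<T, E>(futs: Vec<BoxFut<T, E>>) -> TryJoinAll<T, E> {
    let outputs = futs.iter().map(|_| None).collect();
    TryJoinAll { futures: futs.into_iter().map(Some).collect(), outputs }
}

impl<T, E> Future for TryJoinAll<T, E> {
    type Output = Result<Vec<T>, E>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = &mut *self;
        for (slot, output) in this.futures.iter_mut().zip(this.outputs.iter_mut()) {
            if let Some(fut) = slot {
                match fut.as_mut().poll(cx) {
                    Poll::Ready(Ok(value)) => {
                        *output = Some(value);
                        *slot = None; // finished: never poll it again
                    }
                    // First error wins; the remaining futures are dropped with `self`.
                    Poll::Ready(Err(e)) => return Poll::Ready(Err(e)),
                    Poll::Pending => {} // it has registered cx's waker
                }
            }
        }
        if this.futures.iter().all(Option::is_none) {
            Poll::Ready(Ok(this.outputs.iter_mut().map(|o| o.take().unwrap()).collect()))
        } else {
            Poll::Pending
        }
    }
}

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) { self.0.unpark(); }
}

// Minimal stand-in for an async runtime's entry point.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

This simplified version re-polls every unfinished future whenever any of them wakes the shared waker. That is acceptable for a handful of downloads but wasteful for very large batches.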

This gave a significant boost in speed, but it had the unintended side effect of overwhelming the network interface when a single request contained too many files. To work around this, instead of combining all of the Futures at once, the Vec was split into chunks of a configurable size, and each chunk was awaited in turn. The program could be made more efficient with a queue that keeps a fixed number of downloads in flight, but this solution was simpler and sufficiently fast.
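
The chunking strategy itself can be sketched without any Futures. This version swaps in std::thread::scope and a hypothetical blocking download, which also records the peak number of simultaneous downloads. It is therefore a thread-based analogue of the approach rather than the code described. Each chunk of at most chunk_size URLs runs at once, and the next chunk starts only when the whole chunk has finished.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

// Track how many downloads run at once, to check the chunk limit.
static ACTIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

// Hypothetical blocking download that only simulates the work.
fn download(url: &str) -> Result<Vec<u8>, String> {
    let now = ACTIVE.fetch_add(1, Ordering::SeqCst) + 1;
    PEAK.fetch_max(now, Ordering::SeqCst);
    thread::sleep(Duration::from_millis(10));
    ACTIVE.fetch_sub(1, Ordering::SeqCst);
    Ok(url.as_bytes().to_vec())
}

// Downloads `urls` in groups of `chunk_size` (must be > 0). Every download in
// a group runs at once; the next group starts only after the whole group has
// finished. Results keep the input order, and the first error is returned.
fn download_in_chunks(urls: &[String], chunk_size: usize) -> Result<Vec<Vec<u8>>, String> {
    let mut files = Vec::with_capacity(urls.len());
    for chunk in urls.chunks(chunk_size) {
        let results: Vec<Result<Vec<u8>, String>> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|url| s.spawn(move || download(url)))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("download thread panicked"))
                .collect()
        });
        for result in results {
            files.push(result?);
        }
    }
    Ok(files)
}
```

One slow file holds up its entire chunk. That is exactly the inefficiency a download queue would remove.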

edit: Reddit user awesomeusername2w pointed out that a more convenient option than implementing a Future would be to spawn threads and use a semaphore to limit how many of them run at a time. Alternatively, as suggested by user alpako-sl, you could turn multiple Futures into a Stream and drive them with StreamExt::buffer_unordered, which limits how many Futures are in progress at once to a given amount. While I hope my post has been informative on Futures, either of these approaches would result in a more concise program. I have included an example of the first below.
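
A sketch of the first suggestion using only the standard library. Since std has no semaphore, Semaphore and Permit below are a small hand-rolled counting semaphore built from a Mutex and a Condvar. download and download_all are hypothetical names, and download only simulates the work.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Minimal counting semaphore built from a Mutex and a Condvar.
struct Semaphore {
    permits: Mutex<usize>,
    freed: Condvar,
}

// Holding a Permit counts against the limit; dropping it returns the permit.
struct Permit<'a>(&'a Semaphore);

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), freed: Condvar::new() }
    }

    // Blocks the calling thread until a permit is available.
    fn acquire(&self) -> Permit<'_> {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.freed.wait(permits).unwrap();
        }
        *permits -= 1;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.permits.lock().unwrap() += 1;
        self.0.freed.notify_one();
    }
}

static ACTIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

// Hypothetical blocking download that only simulates the work.
fn download(url: &str) -> Vec<u8> {
    let now = ACTIVE.fetch_add(1, Ordering::SeqCst) + 1;
    PEAK.fetch_max(now, Ordering::SeqCst);
    thread::sleep(Duration::from_millis(10));
    ACTIVE.fetch_sub(1, Ordering::SeqCst);
    url.as_bytes().to_vec()
}

// One thread per URL, but at most `limit` of them download at the same time.
fn download_all(urls: Vec<String>, limit: usize) -> Vec<Vec<u8>> {
    let semaphore = Arc::new(Semaphore::new(limit));
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            let semaphore = Arc::clone(&semaphore);
            thread::spawn(move || {
                let _permit = semaphore.acquire(); // released when dropped
                download(&url)
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("download thread panicked"))
        .collect()
}
```

Unlike the chunked approach, a waiting thread starts as soon as any download finishes, so the limit stays saturated. The cost is one OS thread per URL, most of them parked while they wait. For async code, tokio provides a comparable Semaphore in tokio::sync.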

Thank you for reading. I hope this article has been informative.