Downloading Files Asynchronously with Rust Using Futures
1/4/2025
When working on a service that had to download many files from multiple sources, it became apparent that the files should be downloaded in groups rather than one at a time. In making the code asynchronous, I made extensive use of Futures, and I felt they were worth writing about because they made the changes pretty straightforward.
Originally, when handling a download request, the program iterated through a list of URLs. Each URL was downloaded, and the returned file handle was put into a Vec that was later deleted at the end of the request.
Having to wait on each download to finish before starting the next download was clearly unnecessary.
How does async work?
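Asynchronous code in Rust works by creating Futures, which are values of types that implement the Future trait. Futures don't do anything until they are awaited (polled). When a Future is awaited, the surrounding async function is suspended until the Future has a return value; the thread itself is not blocked, so the executor can run other work in the meantime. In that way, a Future in Rust is a lot like a Promise in JavaScript, except that a Promise starts running as soon as it is created.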
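A minimal sketch of this laziness, using only the standard library. block_on is a hypothetical, bare-bones executor that parks the calling thread between polls (a runtime such as tokio plays this role in practice), and fake_download is a stand-in for a real download. Creating the future runs none of its body; only polling it does.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the executor thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical minimal executor: polls `fut` until Ready, parking between polls.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for a download: records that it ran, returns the URL length.
async fn fake_download(started: Arc<AtomicBool>, url: &str) -> usize {
    started.store(true, Ordering::SeqCst); // runs only once the future is polled
    url.len()
}

/// Returns (started before polling, started after polling, result).
fn laziness_demo() -> (bool, bool, usize) {
    let started = Arc::new(AtomicBool::new(false));
    let fut = fake_download(Arc::clone(&started), "https://example.com/a.txt");
    let before = started.load(Ordering::SeqCst); // creating the future ran nothing
    let len = block_on(fut); // polling drives the body to completion
    let after = started.load(Ordering::SeqCst);
    (before, after, len)
}
```

Calling laziness_demo() yields (false, true, 25): the flag only flips once block_on polls the future.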
Writing your own Future is much like writing a coroutine in Lua, C#, and potentially other languages. The Future trait requires the poll function to be implemented. This function returns either a pending state or the final result, if it is available. When a Future returns pending, it is polled again once the wake function of its Waker is called. The Waker is passed to poll through its Context argument, and it can be cloned and sent to other threads, allowing those threads to call wake to signal when the Future can make progress.
Additionally, the Future trait requires an Output type to be defined: the type of the value the Future eventually produces.
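As a sketch of the poll contract, here is a hypothetical Countdown future that returns Poll::Pending a fixed number of times before completing. To keep it small, it wakes itself immediately instead of handing its Waker to another thread; block_on is a bare-bones executor included only so the example runs.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical future that returns Pending `remaining` times, then completes.
struct Countdown {
    remaining: u32,
    polls: u32,
}

impl Countdown {
    fn new(remaining: u32) -> Self {
        Countdown { remaining, polls: 0 }
    }
}

impl Future for Countdown {
    // The associated Output type: here, the number of times poll was called.
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        self.polls += 1;
        if self.remaining == 0 {
            return Poll::Ready(self.polls);
        }
        self.remaining -= 1;
        // Before returning Pending, make sure wake will be called. This future
        // wakes itself at once; a real one would clone cx.waker() and hand it to
        // whatever thread or OS event will signal that progress is possible.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Waker that unparks the executor thread.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Bare-bones executor: polls `fut` until Ready, parking between polls.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

block_on(Countdown::new(3)) returns 4: three Pending polls followed by the Ready one.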
note: More specifically, Futures are wrapped in Tasks, which are Futures that have been handed to an executor. A simple executor monitors a channel for Tasks to execute (that is, to call poll on). When a Future is woken, the associated Task is pushed back onto the executor's queue. Please see the Async Book for further reading on this topic.
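The relationship between Tasks, wake, and the executor's queue can be sketched with only the standard library, loosely following the executor built in the Async Book. Task, Executor, YieldNow, and interleaving_demo are illustrative names, not a real library's API; production code would use a runtime such as tokio.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

/// A Future that has been handed to the executor, plus a handle to its queue.
struct Task {
    future: Mutex<Option<BoxFuture>>, // None once the future has completed
    queue: Mutex<Sender<Arc<Task>>>,
}

impl Wake for Task {
    // Waking a Task pushes it back onto the executor's queue.
    fn wake(self: Arc<Self>) {
        let queue = self.queue.lock().unwrap().clone();
        let _ = queue.send(self); // ignore the error if the executor has stopped
    }
}

struct Executor {
    ready: Receiver<Arc<Task>>,
    queue: Sender<Arc<Task>>,
}

impl Executor {
    fn new() -> Self {
        let (queue, ready) = channel();
        Executor { ready, queue }
    }

    /// Wraps a future in a Task and queues it for its first poll.
    fn spawn(&self, fut: impl Future<Output = ()> + Send + 'static) {
        let task = Arc::new(Task {
            future: Mutex::new(Some(Box::pin(fut))),
            queue: Mutex::new(self.queue.clone()),
        });
        self.queue.send(task).expect("executor queue closed");
    }

    /// Polls queued Tasks until every spawned Task has completed.
    fn run(self) {
        let Executor { ready, queue } = self;
        drop(queue); // the loop ends once no Task (and so no Sender) is left
        while let Ok(task) = ready.recv() {
            let mut slot = task.future.lock().unwrap();
            if let Some(mut fut) = slot.take() {
                let waker = Waker::from(Arc::clone(&task));
                let mut cx = Context::from_waker(&waker);
                if fut.as_mut().poll(&mut cx).is_pending() {
                    *slot = Some(fut); // not finished: keep it until the next wake
                }
            }
        }
    }
}

/// Returns Pending once (waking itself first) so other Tasks get a turn.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Runs two Tasks that each log two steps, yielding between steps.
fn interleaving_demo() -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let executor = Executor::new();
    for name in ["a", "b"] {
        let log = Arc::clone(&log);
        executor.spawn(async move {
            for step in 0..2 {
                log.lock().unwrap().push(format!("{name}{step}"));
                YieldNow(false).await;
            }
        });
    }
    executor.run();
    let out = log.lock().unwrap().clone();
    out
}
```

Because each wake re-queues its Task behind the other one, interleaving_demo() returns ["a0", "b0", "a1", "b1"].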
note: When reviewing the details, I found this bitbashing article that explains the philosophy of Futures in Rust and compares them to goroutines from Go: https://bitbashing.io/async-rust.html
The Implementation
In the app, I implemented the Future trait for a DownloadFuture struct. The DownloadFuture's poll function spawned a thread on its first call. This ensures that the download only begins when the future is awaited. This thread performed the actual download and updated the shared state to reflect any errors or the completion of the download. In either case, the wake function was called and the Future was able to return the relevant result to the caller.
To illustrate this, the following code works the same way. First, the program defines the structs that will be used, along with a new function for initializing the Future:
The implementation of the poll function as described follows:
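A sketch of the structs, new function, and poll implementation described above, under stated assumptions rather than the app's actual code: download is a hypothetical blocking function simulated with a short sleep, errors are plain Strings, and block_on is a minimal executor included only so the example runs. The worker thread is spawned on the first poll, and the result and the latest Waker live in state shared with that thread.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

type DownloadResult = Result<Vec<u8>, String>;

/// State shared between the future and the worker thread it spawns.
struct Shared {
    result: Option<DownloadResult>, // set by the worker when the download ends
    waker: Option<Waker>,           // latest waker passed to poll
}

struct DownloadFuture {
    url: String,
    started: bool,
    shared: Arc<Mutex<Shared>>,
}

impl DownloadFuture {
    fn new(url: &str) -> Self {
        DownloadFuture {
            url: url.to_string(),
            started: false, // nothing runs until the first poll
            shared: Arc::new(Mutex::new(Shared { result: None, waker: None })),
        }
    }
}

/// Hypothetical blocking download, simulated so the sketch runs offline.
fn download(url: &str) -> DownloadResult {
    thread::sleep(Duration::from_millis(10));
    if url.starts_with("https://") {
        Ok(format!("contents of {url}").into_bytes())
    } else {
        Err(format!("unsupported URL: {url}"))
    }
}

impl Future for DownloadFuture {
    type Output = DownloadResult;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<DownloadResult> {
        {
            let mut shared = self.shared.lock().unwrap();
            if let Some(result) = shared.result.take() {
                return Poll::Ready(result);
            }
            // Store the current waker; the executor may pass a different one each poll.
            shared.waker = Some(cx.waker().clone());
        }
        if !self.started {
            // First poll: start the download on its own thread.
            self.started = true;
            let url = self.url.clone();
            let shared = Arc::clone(&self.shared);
            thread::spawn(move || {
                let result = download(&url);
                let mut state = shared.lock().unwrap();
                state.result = Some(result);
                let waker = state.waker.take();
                drop(state); // release the lock before waking
                if let Some(waker) = waker {
                    waker.wake(); // tell the executor to poll again
                }
            });
        }
        Poll::Pending
    }
}

/// Waker that unparks the executor thread.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Bare-bones executor: polls `fut` until Ready, parking between polls.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}
```

Storing the waker and checking for a result under the same lock means a download that finishes between polls cannot be missed.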
edit: There is an assumption made in the above implementation that makes it inefficient. The download function still blocks the Task it is being run in. This means that the thread the Task is running on is also blocked, along with any other Tasks on that thread. At best, this implementation will concurrently download as many files as there are threads available. For better performance, the Task should instead use an async function that correctly wakes itself when progress can be made. For example, the Client struct provided by the reqwest crate offers async functions for the different HTTP requests, such as get and post. Client works with OS primitives to call wake when the socket it uses is readable, letting other tasks progress in the meantime. Credit goes to reddit user simonask_ for pointing this out.
These DownloadFutures were put into a Vec, which was joined into one Future with the try_join_all function from the futures crate, so that multiple files could be downloaded at the same time. Here is a tokio main function that does the same:
This made for a significant boost in speed, but it had the unintended side effect of overwhelming the network interface when there were too many files in one request. To get around this, instead of combining all of the Futures, the Vec was split into chunks of a configurable size, and each chunk was awaited one by one. The program could be made more efficient with a queue that keeps a fixed number of downloads in flight, but this solution was simpler and sufficiently fast.
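A std-only sketch of joining and chunking as described above. try_join_vec is a hand-rolled, hypothetical stand-in for try_join_all from the futures crate (it simply polls every unfinished future whenever it is woken), fake_download simulates a download by yielding a few times, and block_on stands in for the tokio runtime. Each chunk's futures run concurrently, and the next chunk starts only after the previous one has finished.

```rust
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Bare-bones executor standing in for a tokio runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

/// Returns Pending once (waking itself first), giving sibling futures a turn.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Hypothetical simulated download: yields `steps` times, then returns the URL length.
async fn fake_download(url: &'static str, steps: u32) -> Result<usize, String> {
    for _ in 0..steps {
        YieldNow(false).await;
    }
    if url.is_empty() { Err("empty URL".to_string()) } else { Ok(url.len()) }
}

/// Hypothetical stand-in for try_join_all: outputs in input order, or the first error.
fn try_join_vec<T, E, F>(futs: Vec<F>) -> impl Future<Output = Result<Vec<T>, E>>
where
    F: Future<Output = Result<T, E>>,
{
    let mut pending: Vec<Option<Pin<Box<F>>>> = futs.into_iter().map(|f| Some(Box::pin(f))).collect();
    let mut outputs: Vec<Option<T>> = pending.iter().map(|_| None).collect();
    poll_fn(move |cx| {
        for (slot, out) in pending.iter_mut().zip(outputs.iter_mut()) {
            let polled = match slot.as_mut() {
                Some(fut) => fut.as_mut().poll(cx),
                None => continue, // this future already finished
            };
            match polled {
                Poll::Ready(Ok(value)) => {
                    *out = Some(value);
                    *slot = None;
                }
                Poll::Ready(Err(e)) => return Poll::Ready(Err(e)),
                Poll::Pending => {}
            }
        }
        if pending.iter().all(Option::is_none) {
            Poll::Ready(Ok(outputs.iter_mut().map(|o| o.take().unwrap()).collect::<Vec<T>>()))
        } else {
            Poll::Pending
        }
    })
}

/// Downloads `urls` in chunks; panics if `chunk_size` is 0, as slice::chunks does.
async fn download_chunked(urls: &[&'static str], chunk_size: usize) -> Result<Vec<usize>, String> {
    let mut results = Vec::with_capacity(urls.len());
    for chunk in urls.chunks(chunk_size) {
        let futs: Vec<_> = chunk.iter().map(|&url| fake_download(url, 2)).collect();
        results.extend(try_join_vec(futs).await?); // wait for the whole chunk
    }
    Ok(results)
}
```

With a chunk size of 2, three URLs are fetched as one concurrent pair followed by a single download, and the results keep the input order.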
edit: Reddit user awesomeusername2w pointed out that a more convenient alternative to implementing a Future would be to spawn threads and regulate how many of them run at a time with semaphores. Alternatively, as suggested by user alpako-sl, you could turn multiple Futures into a Stream and await them using StreamExt::buffer_unordered, which limits how many Futures are awaited at once. While I hope my post has been informative on Futures, either of these would result in more concise programs. I have included an example of the first below.
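A sketch of the thread-and-semaphore approach, assuming blocking downloads. The standard library has no semaphore, so Semaphore here is a minimal, hypothetical one built from a Mutex and a Condvar (tokio::sync::Semaphore fills this role in async code), and download is simulated. The function also records the peak number of simultaneous downloads, which never exceeds max_concurrent.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built from a Mutex and a Condvar.
struct Semaphore {
    permits: Mutex<usize>,
    freed: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), freed: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.freed.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Returns a permit and wakes one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}

/// Hypothetical blocking download, simulated with a short sleep.
fn download(url: &str) -> usize {
    thread::sleep(Duration::from_millis(5));
    url.len()
}

/// Spawns one thread per URL but lets at most `max_concurrent` (at least 1) download at once.
/// Returns the results in input order and the peak number of simultaneous downloads.
fn download_all(urls: &[&str], max_concurrent: usize) -> (Vec<usize>, usize) {
    let semaphore = Semaphore::new(max_concurrent);
    let (active, peak) = (AtomicUsize::new(0), AtomicUsize::new(0));
    let (semaphore, active, peak) = (&semaphore, &active, &peak);
    let results = thread::scope(|s| {
        let mut handles = Vec::new();
        for &url in urls {
            handles.push(s.spawn(move || {
                semaphore.acquire(); // wait while max_concurrent downloads are running
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                let bytes = download(url);
                active.fetch_sub(1, Ordering::SeqCst);
                semaphore.release(); // let the next waiting thread start
                bytes
            }));
        }
        handles.into_iter().map(|h| h.join().unwrap()).collect::<Vec<usize>>()
    });
    (results, peak.load(Ordering::SeqCst))
}
```

Scoped threads (std::thread::scope) let each thread borrow the semaphore and counters without wrapping them in Arc.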
Thank you for reading. I hope this article has been informative.