Problem

Bounded Concurrent Crawler (Rust)

Topics: graphs, BFS, concurrency, worker pool, visited dedup

Problem

Crawl every page reachable from a seed by following its links, fetching pages concurrently but without hammering the site: cap the number of in-flight fetches, and never fetch the same page twice.

pub fn crawl<F>(seed: &str, workers: usize, fetch: F) -> Vec<String>
where F: Fn(&str) -> Vec<String> + Sync;

fetch(page) returns the pages linked from page; it is the only way to discover the graph.
Visit every page reachable from seed (including seed itself).
Each distinct page is fetched at most once, even when it is linked many times or sits on a cycle.
At most workers fetches run concurrently.
Return the set of visited pages, sorted (the crawl order is nondeterministic; the result is not).

{a:[b], b:[c], c:[]}; seed=a          → ["a","b","c"]
{a:[b,c], b:[a], c:[]}; seed=a        → ["a","b","c"]   (cycle visited once)
{home:[about], island:[home]}; seed=home → ["about","home"]   (island unreachable)

This is a concurrent flood fill: workers pull pages from a shared frontier, fetch their links outside the lock, and enqueue any not-yet-seen page. A BTreeSet of visited pages dedups, stops cycles, and yields the sorted result; an in-flight counter (plus a Condvar) lets a worker tell "more links may still arrive" apart from "the graph is exhausted, exit." thread::scope lets the workers borrow the shared state directly.

Grading

Your solution.rs is compiled together with a trusted tests.rs (which include!s it) using rustc --test. Only solution.rs is yours to edit.

Leaderboard score: lower is better, 1.0 ≈ reference

#	Name	score	ns/op	allocs

Output

No submission yet.