Problem

Bounded Concurrent Crawler (Go)

Topics: graphs, BFS, concurrency, worker pool, visited dedup

Problem

Crawl every page reachable from a seed by following its links, fetching pages concurrently but without hammering the site: cap the number of in-flight fetches, and never fetch the same page twice.

func Crawl(seed string, workers int, fetch func(page string) []string) []string

fetch(page) returns the pages linked from page; it is the only way to discover the graph.
Visit every page reachable from seed (including seed itself).
Each distinct page is fetched at most once, even when it is linked many times or sits on a cycle.
At most workers fetches run concurrently.
Return the set of visited pages, sorted (the crawl order is nondeterministic; the result is not).

{a:[b], b:[c], c:[]}; seed=a          → [a, b, c]
{a:[b,c], b:[a], c:[]}; seed=a        → [a, b, c]   (cycle visited once)
{home:[about], island:[home]}; seed=home → [about, home]   (island unreachable)

This is a concurrent flood fill: workers pull pages from a shared frontier, fetch their links outside the lock, and enqueue any not-yet-seen page. The visited set both dedups and stops cycles; an in-flight counter (plus a sync.Cond) lets a worker tell "more links may still arrive" apart from "the graph is exhausted, exit."

Run

go test -v ./challenges/bounded-crawler/go/

Leaderboard score: lower is better, 1.0 ≈ reference

#	Name	score	ns/op	allocs

Output

No submission yet.