A crappy course on torrents
In 8 points1:
- You start seeding a file -- that means you split the file in a certain way, hash the pieces and wait.
- If anyone connects to you (either by TCP or UDP -- and now there's the webRTC transport) and ask for a piece you'll send it.
- Before downloading anything leechers must understand how many pieces exist and what are they -- and other things. For that exists the .torrent file, it contains the final hash of the file, metadata about all files, the list of pieces and hash of each.
- To know where you are so people can connect to you2, there exists an HTTP (or UDP) server called "tracker". A list of trackers is also contained in the .torrent file.
- When you add a torrent to your client, it gets a list of peers from the trackers. Then you try to connect to them (and you keep getting peers from the trackers while simultaneously sending data to the tracker like "I'm downloading, I have x bytes already" or "I'm seeding").
- Magnet links contain a tracker URL and a hash of the metadata contained in the .torrent file -- with that you can safely download the same data that should be inside a .torrent file -- but now you ask it from a peer before requesting any actual file piece.
- DHTs are an afterthought and I don't know how important they are for the torrent ecosystem (trackers work just fine). They intend to replace the centralized trackers with message passing between DHT peers (DHT peers are different and independent from file-download peers).
- All these things (.torrent files, tracker messages, messages passed between peers) are done in a peculiar encoding format called "bencode" that is just a slightly less verbose, less readable JSON.
- Posted first as this Twitter thread.
- Also your torrent client must be accessible from the external internet, NAT hole-punching is almost a myth.