BitTorrent.org community
You are not logged in.
The last missing piece of the puzzle of decentralized torrents are scrapes. Scrapes are necessary for seeding queue ordering and generally to give an impression of the swarm's state.
There are 2 problems to solve:
a) the number of torrents to scrape can vastly outnumber the torrents to announce/get peers for. thus scrape-lookups have to be more optimized for lower resource usage than regular lookups. More aggressive caching of lookup paths, lower concurrency during the lookup are starting points.
b) nodes can't return any seeds:peers statistics at the moment, announces would have to include flags and we'd have to come up with a statistically reasonable way to add stats from multiple nodes together (i was thinking along the lines of using the similarity of small bloom filters as coincidence weights... but 1280 bytes doen't allow you to carry all that much data)
So, i'd just like to discuss this in general and see if anyone might come up with some shortcuts to make this reasonably efficient. Trivial implementations would be either too inaccurate to give more than an order-of-magnitude estimate or they'd increase the DHT traffic too much for too little gain.
Offline
I think you're right about caching, it definitely makes sense. I would think that caching can be solved separately though, don't you think?
I spent some time thinking about torrent metadata in general, and I wonder if it would make sense to have some slightly more generalized way of specifying how to treat data when it's requested. For instance there could be some prefixes to the names that indicates that its data either return average, median, max, min over some window. This would assume that clients who participate in the swarm announce their understanding of the number of peers and seeds.
For this particular case though, it might make more sense to have the DHT nodes that track the torrent be the authoritative source of the scrape stats, instead of having nodes that announce to it say how many seeds and peers it knows of.
Offline
Well, caching node-addresses is something rather specific for high-load scenarios (like scraping 800 torrents every 30 minutes), otherwise regular, uncached lookups are preferable since the DHT piggybacks its routingtable management on them.
About torrent-statistics... i wouldn't overcomplicate things. Getting the simple case of scraping the current seed/peer count is complex enough in the DHT environment because - for large swarms - no single node will have a complete view of the entire swarm (neither DHT nodes nor peers in the swarm). While in smaller swarms each seed/peer list is almost exhaustive.
That's also why i'm considering bloom filters as weighting. Let's say you have
DHT Node nA
DHT Node nB
Torrent t1 (5 seed, 10 peers)
Torrent t2 (100 seeds, 1000 peers)
For t1 both nA and nB will contain a almost-complete, almost-identical seed/peer-lists. Thus querying the t1 seed and peer count from nA and nB cannot be added up but should be averaged instead. Or rather, they should be added together with a weight applied based on their similiarity, effectively only averaging their common nodes and adding those where they differ.
For t2 nA and nB might contain almost disjoint subsets of seed/peer lists. Thus it would make sense of adding their counts together. Again, with a weight based on similiarity so that we only (on average) add together the counts for differing peers and average those few that they share.
Since we cannot request the complete lists i think we could use bloom-filters as stand-in for the sets to compare their similarity and thus derive a weight for the addition.
I'm just not sure if this is actually feasible and statistically sound, especially considering that we're restricted to 1280 bytes per packet.
Offline
The last missing piece of the puzzle of decentralized torrents are scrapes. Scrapes are necessary for seeding queue ordering...
Since I'm not convinced that ordering the seeding queue according to peer count is such a hot idea, I'd rather we didn't add extra complexity for this purpose.
(Ordering the queue according to seed count is a case of negative feedback, and it causes persistent oscillations in torrent state.)
--Juliusz
Offline
You have to do some ordering if you don't want to run all torrents at once. There are several possible options as ranking schemes.
"seeds : peers ratio" allows every torrent to settle into a stable ratio, based on demand.
"peers-seeds" allows you to find the torrents the easiest to seed (private trackers)
"seed count only, fall back to seeds : peers ratio after N seeds" allows you to keep rare content alive
Anyway, any of those rules can be written in a way to avoid oscillation. But that is up to the client developers. Imo scrapes are a useful feature, not just for queueing seeds but also to gauge the liveliness of a torrent... or rather, the current phase of its lifecycle.
Offline
Or rather, they should be added together with a weight applied based on their similiarity, effectively only averaging their common nodes and adding those where they differ.
Just take the max. The node that you queried that has the most peers is the one that has been around the longest, and has the most complete vision of the swarm.
--Juliusz
Offline
That is assuming that all nodes have the same view, if nodes have a set limit of peers they store per key or peers get stored in different sets of nodes (e.g. due to perimeter widening) then each storing node will differing subsets of the actual peer list.
Offline
> e.g. due to perimeter widening
First of all, what you call "perimeter widening" doesn't work, at least not in Kademlia. Don't do it, unless you are able to give a proof of your extension to Kademlia. (Not only that it doesn't break the DHT, but also that it serves a useful purpose.)
Second, even if it did work, you wouldn't need it. If the torrent is very popular, and its 8 closest nodes are overloaded, you won't be able to announce yourself, but you'll still get information about other peers; that gives you 8*50=400 peers to connect to. While not all of those will be reachable, the 200 or so that will are more than enough to propagate your contact address over PEX. (And I know that you do implement the "p" parameter to LTEP.)
Third, if Kademlia is implemented right, then the oldest among the 8 nodes have roughly the same data -- excepth when some of them have dropped some data due to overload. In the latter case, you don't care that you're inaccurate -- the information that "the swarm has a lot of peers" is enough for you to make your queueing decisions.
I'd suggest a simple extension -- get_peers replies should contain the total number of values that the replying node has for a given torrent (which can be, in general, more than the number of returned values).
Another extension could be adding a parameter to get_peers to specify the maximum number of values that the replying node should return -- the default being 50, as per BEP-5.
--Juliusz
Last edited by jch (2009-11-14 09:49:19)
Offline
Speaking about PEX supplementing the DHT -- I see that Azureus doesn't send the "ipv6" parameter in the LTEP handshake. (µTorrent does, and I've added this for Transmission 1.80 -- in some swarms, the effects are rather dramatic for a node that's firewalled in IPv4 but has good IPv6 connectivity.)
--Julusz
Offline
jch wrote:
Speaking about PEX supplementing the DHT -- I see that Azureus doesn't send the "ipv6" parameter in the LTEP handshake.
We do since version 4.3.0.0. Which... has been released yesterday. ![]()
The issue is that IPv6 + Windows + Java's nonblocking IO requires Java 7 to work properly, which is still in beta. So most ipv6 capable users are either linux or osx peers and the select few early adopters.
jch wrote:
> e.g. due to perimeter widening
First of all, what you call "perimeter widening" doesn't work, at least not in Kademlia. Don't do it, unless you are able to give a proof of your extension to Kademlia. (Not only that it doesn't break the DHT, but also that it serves a useful purpose.)
Intuition tells me that - if properly implemented - it should work as it is no different from nodes close to the key simply failing and thus the nodes around them taking up the duty. Add that get-peers lookups can terminate early if they receive enough value lists and you would achieve your goal by putting load on other nodes than only the closest 8 ones.
jch wrote:
Second, even if it did work, you wouldn't need it. If the torrent is very popular, and its 8 closest nodes are overloaded, you won't be able to announce yourself, but you'll still get information about other peers; that gives you 8*50=400 peers to connect to. While not all of those will be reachable, the 200 or so that will are more than enough to propagate your contact address over PEX. (And I know that you do implement the "p" parameter to LTEP.)
that is probably right, i'm just viewing this from the exhaustiveness angle. I.e. how to cover all peers/seeds and get mostly accurate statistics on big torrents.
Consider that many NATed peers might insert themselves in the table, storing more nodes simply means possibly more non-nated ones to pick from. Assuming 50% nated nodes you'll only have 200 reachable nodes to test left. On a really big torrent (10k seeds, 15k peers for example) that would mean that those 200 nodes would have to act as gateways via PEX into the swarm, but it's possible that they can't simply because all those incoming connections make them reach their connection limit and thus they won't accept further connections.
Attempting to store more peers in the DHT would spread the load and thus improve the time-to-join into the swarm and prevent a few peers from being hammered.
And it's not just the peers that would get hammed... the 8 DHT nodes responsible for those values might get hammered by 25k different IPs with gets and puts over the course of one DHT announce interval.
jch wrote:
Third, if Kademlia is implemented right, then the oldest among the 8 nodes have roughly the same data -- excepth when some of them have dropped some data due to overload. In the latter case, you don't care that you're inaccurate -- the information that "the swarm has a lot of peers" is enough for you to make your queueing decisions.
"has lots of peers" is not sufficient for sorting torrents properly
jch wrote:
I'd suggest a simple extension -- get_peers replies should contain the total number of values that the replying node has for a given torrent (which can be, in general, more than the number of returned values).
Another extension could be adding a parameter to get_peers to specify the maximum number of values that the replying node should return -- the default being 50, as per BEP-5.
Well, for simplicity's sake i guess this would be better than nothing. But we still should do separate accounting for seeds and peers, thus we'll also have to include an is seed/peer flag into the announces and store those.
Offline
what you call "perimeter widening" doesn't work, at least not in Kademlia. Don't do it, unless you are able to give a proof of your extension to Kademlia
Intuition tells me that it should work
Oh my, there's not much I can answer to such a strong argument, can I?
More seriously, please read the Kademlia paper again.
While Kademlia is resilient to node outages, it is not resilient to frequent fluctuations in the set of nodes. When a node joins or leaves the network, there is a short period during which the router tables of its neighborhood are out of sync; during that time, different announces will store data in different sets of nodes.
(Kademlia, as described in the paper, has a mechanism for making the window of time during which tables are unsynchronised smaller, but this is not in the Mainline DHT.)
When you're doing your "perimeter widening" trick, you're basically pretending that a given node doesn't exist, and thus falling back to further away nodes. In effect, you're making the set of nodes vary on a very short timescale. Kademlia is not designed to work in the presence of such instabilities.
Please don't do it.
Offline
You are mixing up routing tables and lookup result sets here. The routing table would remain the same for all nodes, only GET_PEERS lookups would return doctored result sets and thus lead to stores on nodes that are slightly off-target.
This does not lead to any routing table instabilities. In fact, it spreads load over more nodes and thus makes it less likely to force nodes to drop out completely due to overload.
To keep your local buckets of the routing table up to date you'll be using FIND_NODES and PING requests, which would not be affected by perimeter widening. Only GET_PEERS requests would be.
I don't see any problem here.
In fact, what i call perimeter widening has been considered by the kademlia researchers before under the term "caching along the path". It's just not spelled out in detail in their paper. To quote the relevant section:
Like Chord’s clockwise circle metric, XOR is unidirectional.Fo r any given
point x and distance Δ > 0, there is exactly one point y such that d(x, y) = Δ.
Unidirectionality ensures that all lookups for the same key converge along the
same path, regardless of the originating node.Th us, caching <key,value> pairs
along the lookup path alleviates hot spots.Lik e Pastry and unlike Chord, the
XOR topology is also symmetric (d(x, y) = d(y, x) for all x and y).
So, this strategy is actually a part of the Kademlia spec.
jch wrote:
Oh my, there's not much I can answer to such a strong argument, can I?
Strong enough now? :]
jch wrote:
(Kademlia, as described in the paper, has a mechanism for making the window of time during which tables are unsynchronised smaller, but this is not in the Mainline DHT.)
One extremely useful thing in the Kademlia spec are replacement buckets, i suggest implementing them, even though they're not mentioned in the mainline spec. In our implementation they work quite well.
Offline
[DELETED: I realise that now scrape it's needed for other stuff than central indexing]
Last edited by h2p (2009-11-18 05:22:29)
Offline
Okay, I think I've finally managed to get my head around this. Apologies for being a little slow.
Assuming we use the new semantics of get_peers (i.e. we always send nodes even when values are also being sent), perimeter widening should not break anything.
I don't see why two distinct nodes should follow the same path when performing an announce, especially when they have cached nodes from a previous announce. Hence, I'm not convinced that it buys us anything. (This should be taken litterally -- I'm not saying they do not follow the same path, I'm only saying that I, personally, individually, speaking only for myself, do not see any argument why they should do, especially when nodes are caching the results of previous announces.)
Perimeter widening does break nodes that follow the older semantics of get_peers (values and nodes being mutually exclusive).
In conclusion, I'm not going to implement it myself, but have no objections to others implementing it.
--Juliusz
Offline
could you please describe how old nodes which break? I still don't quite understand what you see as issue here. Routing table maintenance happens through find_node requests, not get_peers ones.
Assuming the following get_peers + announce logic:
1. send get-peers to the N (N = concurrency) closest Nodest to target key T you know
2. repeat step 1 until we got K (K = bucket size) value-containing responses OR the set of the closest nodes which returned a token so far does not change for N queries
3. send announce to the first K nodes with values+tokens we have found, if we did not find enough nodes then perform the remaining announces on the closest-set
Then storing nodes may or may not hit the "core" bucket around that key during their lookup. But they will only store data on
a) nodes that already contain data
b) nodes that are closer to the target than nodes in a)
alternatively nodes may use the following (more conservative) algorithm:
1. send get-peers to the N (N = concurrency) closest Nodest to target key T you know
2. repeat step 1 until the set of the closest nodes which returned a token so far does not change for N queries
3. send announces to the K closest nodes found so far that did return a token
The "worst" thing that could happen that the perimeter shields the closest nodes completely from incoming queries. But to achieve this a significantly larger set of nodes shielding them is necessary.
Which is the whole point of parameter widening... to spread load.
As for the path that nodes take. They do not take exactly the same path. But since XOR coordinates are always a projection onto a flat keyspace a lookup is projected onto the distance between the currently found node and the target key. Each point along the distance-projected-keyspace can only be inhabited by one node. Where the closest node to the key is the last node you'll find.
A lookup will jump along this linear keyspace in exponentially decreasing steps, only ever moving forward, closer towards the K closest nodes.
The first hops will be mostly random since they'll most likely stem from the buckets spanning one half of the keyspace. But as they hop closer to the targets the routes their hops become smaller and the coincidences will increase, regardless where they come from. That in turn means that close to the target key the paths taken are mostly identical or at least have a high coincidence count.
The more popular a key gets the more nodes will approach the target during their lookups and more nodes will be statistically covered by the paths taken by the searching nodes. Which means more nodes near the target can perform value caching/accept stores.
And when the key isn't popular then the coincidences among the incoming paths are smaller, but in that case there should be room for all data in the closest set and perimeter widening will not be necessary anyway.
It is just important that nodes do at least one thing:
They have to keep track of who sent them tokens and do not count nodes which don't send tokens towards the K closest nodes set. This way the actual K closest nodes may refuse to store data by not sending a token and thus force store operations on slightly further away nodes.
This will increase the number of nodes storing the <key, value> pairs and allow other lookups to abort early/take load off the K closest ones.
Offline
anyway, we drifted offtopic. This is supposed to be about scrapes.
To enable scrapes we have to do 2 things:
- provide a get-scrape request and response to query nodes for seed and peer counts stored under a specific key. This reuqest should be implemented with a lower concurrency than regular lookups and caching of the set of first K nodes along the lookup path that returned scrape values
- add a isSeed=1/0 flag to announces (if it is not set at all then the announcing client is assumed to be a peer)
Offline
I've implemented a bloom filter and i'm currently testing if i can find a better method than ANDing them to sum/average up (depending on common values) their sizes to calculate an aggregate value.
But even ANDing seems good at the moment if the filters are correctly parametrized.
With 512 bytes filters (4096 bits) configured for an expected set size of 1000 i can estimate an aggregate count of up to ~10000 before it breaks down, which should be sufficient for most swarms.
2 filters, 512 bytes each would do for 10k seeds and 10k peers, i.e. it would fit into a single packet, but not much else, especially not a list of 8 IPv6 node addresses. Which means this has to go into a separate, new packet.
Since we don't really want peer lists either if we're only scrapeing i suggest the following extension:
Adding a "scr" flag to the "want" list in a GET_PEERS request instructs the responding node NOT to include a "values" list even if it has peers in its database.
Instead it should respond with a list "has" that contains the value "scr", which indicates that this node can be scraped and has values.
Then the requesting node can send a new RPC type, called "scrape", containing only the target key. the response contains two 512 byte long fields, named "seeds" and "peers", which are the bloom filters containing all seeds and peers for that infohash known by that node.
To allow nodes to know whether it is a seed or peer ANNOUNCE_PEER requests should contain a new list named "is", containing the entry "seed" if the announcing peer is seeding. If the list is not present or doe not cotain the entry "seed" then it is considered as a peer.
Additionally, this new flag and nodes storing it in their database allows GET_PEER requests and responses to be adapted to a nodes needs with various additional flags. See below.
Summary:
"want"-list in GET_PEERS extended, now supports (together with BEP32):
n4 = node should send a nodes list (default on IPv4 sourced requests)
n6 = node should send a nodes6 list (default on IPv6 sourced requests)
scr = node may omit the values list even if it has values
vals = node should send the values list if it has any even when "scr" is set
peer = node should only send peers in its values list, implies "vals"
"has"-list added to GET_PEERS response:
scr = node understands scrape requests and has a values, even when no values list is present in the response
new RPC call "scrape":
request only contains a "target" key
response contains 2 bloom filters (details not yet specified) "seeds" and "peers"
"is"-list added to ANNOUNCE_PEER requests:
seed = the announcing client is a seed and should be stored as such in the database
Edit: Thinking of it. To prevent the bloom filters from overflowing a node should not send tokens once it has more than R entries under the particular key in its database. where R still has to be parametrized, together with the bloom filters. Assuming properly implemented nodes this will also lead to perimeter widening on popular torrents as other nodes are forced store values further away from the closest-set.
Offline
After some tinkering i've found an averaging algorithm that performs better than ANDing in the worst case, but i'm lacking a way to calculate |intersection(A1,...,An)| for bloom filters atm and combining |intersection(A1,A2)| recursively increases errors too much for the algorithm to perform well in the average case;
So ANDing is the way to go for now, but that can be changed in the future since it's only a client-side algorithm. To estimate the number of entries of a bloom filter the following formula can be used:
int size = round(log(1.0 - bitCount/m) / (k * log(1-1.0/m)))
Anyway, apart from finding a better combining method the math stuff work fine, so we just have agree on the protocol-specific aspects.
Feedback please ^^
Offline
Summarising what I said on IRC.
One must not forget why we are scraping in the first place: (1) to order torrents in the queue, and (2) to give a rough estimate of a torrent's health to the user. Neither of these applications requires accurate measures, rough estimates are enough. Additionally, estimates for large swarms can be extremely imprecise -- if the swarm is large and healthy, that's all you need to know, you don't care how large it is exactly.
In the light of the above, I'm not sure if your solution isn't over-engineered. I'd be willing to bet that with the right fudge factors, you could find out sufficiently accurate data just by collecting (seeds, leeches) pairs of integers from the DHT nodes.
If the bloom filters are actually needed, then the size you suggest is certainly too large. And the encoding of the "want" key is certainly uselessly flexible.
--Juliusz
Offline
for documentation purposes:
The goal is to calculate the cardinality of the union of N bloom filters, since the error rate increases towards 1.0 (the |S| towards infinity) as c increases we can't just AND the filters together if the individual filters are already close to their max. capacity. Thus calculating the cardinality via the inclusion-exclusion principle would be the preferred approach, the problem is that intersecting bloom filters is not straight forward due to the probabilities associated with the bits set by each filter.
For bloom filters representing the set Sx the bit count is denoted as cx. 0 <= c <= m. The 0-bit count is m - c.
Bloom filters have the same parameters m (bits) and k (hashes).
exponentiations and logarithms are performed to the basis b = 1 - 1/m.
only applies to singular or unioned bloom filters, not to intersected ones:
|S| = log_b(1 - c/m) / k
Edit, rewritten the problem statement a bit:
_______________________________________________________
Task: intersection of N bloom filters
Basics: Bloom filters are a set representation with a false positive rate. They can map (through k hash functions) an infinite set S onto a finite set BF (represented by a bit array of m elements) where the probability of each bit being set after inserting n elements is p = 1 - (1-1/m)^k*n.
I'll use b = (1-1/m) as basis for exponentials and logarithms.
|S| =~ log_b(1.0 - |BF|/m) / k
Intersection of 2 bloom filters BF1, BF2 representing S1, S2:
As per http://www.eecs.harvard.edu/~michaelm/p … m2005b.pdf
|Si| = the actual cardinality of intersection
|BFi| = the bits set to 1 in the intersected bloom filters
p_i = (1-b^(k*|Si|)) + b^(k*|Si|) * ( 1 - b^(k*(|S1|-|Si|))) * ( 1 - b^(k*(|S2|-|Si|)))
(term 1) (term 2) (tern 3) (term 4)
term1 : probability for each bit = 1 if we calculated the bloom filter of the intersection directly
term2 : probability of bits that should be 0
term3 * 4 : but aren't 0 because they were set by different elements in S1 and S2 and thus aren't part of the real intersection
|BFi| =~ m * p_i
replacing |S1|,|S2| with the approximation based on |BF1|,|BF2| we get
|Si| = |intersect(S1,S2)| =~ log_b( (1-|BF1|/m) * (1-|BF2|/m) / ((1-|BF1|/m) + (1-|BF2|/m) - (1 - |BFi|/m))) / k;
Intersection of n bloom filters BF1,...BFn:
|intersect(S1,...,Sn)| = ???
Offline
*sigh* ... forum swallowed the message, so here it goes again. Demonstrating the simplicity of bloom filters with fixed parameters:
k = 2
m = 256*8 // power of 2, makes bit arithemetic easy
(everything is unsigned):
// the filter
byte[256] bloom
// insert IP 192.168.1.1 into the filter:
byte[20] hash = sha1(byte[4] {192,168,1,1})
int index1 = hash[0] << 16 | hash[1]
int index2 = hash[2] << 16 | hash[3]
// truncate index to m
index1 %= m
index2 %= m
bloom[index1 / 8] |= 0x01 << index1 % 8
bloom[index2 / 8] |= 0x01 << index2 % 8
// done
Offline
Note, worst case message size:
nodes6 + nodes + 40 ipv6 values =
8 * 38 + 8 * 28 + 40 * 21 = 1368 bytes (ignoring fixed packet content and IP/UDP overhead) => already too big for teredo tunnels. good thing we usually only need either nodes or nodes6
future(?):
nodes6 + 2 bloom filters (each 256bytes) + X ipv6 values
8 * 38 + 512 + 15 * 21 = 1.131 => can send 15 ipv6 values in addition to the bloom filters, maybe a bit less if we account for the remaining overhead
without bloom filters but 1 byte flags per peer:
8 * 38 + 35 * 21 + 35 = 1.074 bytes
Offline