You are not logged in.
We're hoping to replace this web forum with a mailing list (hosted by google groups).
Sign up for it here:
How about kicking it off on our brand new mailing list (which is meant to replace this web forum).
Check it out at:
tracker exchange is not widely implemented in uTorrent mostly for the reasons dwk mentions. It's also of questionable value to add a lot of trackers.
I don't doubt that it's possible to have a conservative specification to avoid DDoS attacks. It might be tricky though.
However, I also don't doubt that it's possible to bittorrent-seed a million torrents (especially not in the future).
I think there are simpler ways of achieving the same goals. For instance, if backwards compatibility is not an issue, each file should simply just have its own set of piece hashes, or its own hash tree root.
What you're proposing is essentially identical to an RSS feed. You can already create these kinds of "torrents", if you're not too concerned about whether it's bencoded or XML. You can simply express this as an RSS feed with all the single file torrents in it.
The problem of adding files can be solved separately. In fact, one of the thing we've been working on lately is to have a feature where a new torrent can pull in files/pieces from a previous one, to avoid downloading them.
When uTP was first released, years ago, the packet header looked different. In the first few releases of uTorrent the connection ID was the first 16 bits (iirc). No uTorrent client in the last few years has used this old protocol though, and the old protocol was never published. So, trust the protocol cbrown posted, and not the wireshark dissector. (it would be really nice with a wireshark dissector for uTP though).
However, the bigger problem here is that wireshark incorrectly classifies that packet as uTP. If you look in the hexdump of the packet you can see that you're looking at a DHT packet. It's bencoded and has the DHT fields.
currently there's a folder under %appdata%/uTorrent where a user can put trusted certificates. Any torrent signed by a cert in this folder is considered trusted. uTorrent ships with BitTorrent Inc.'s cert by default, to always be trusted.
Currently being trusted isn't very interesting. I can imagine that we might want to add some graphical treatment to trusted torrents, and possibly other features as well.
In order to integrate well with the web, we might want to create a file name extension like .btcert that we handle by installing when invoked. That way web sites could distribute their cert easily.
I did, and I have. I haven't found any techniques that would actually work - not on top of the current DHT anyway. In other words, they might be interesting and all, but they would only work if a major client implements it and that's a different way of saying "it's not gonna happen".
My understanding of your proposal is that it also would not work unless a significant portion of the DHT nodes would implement it. It doesn't seem like a very useful (or fruitful) criteria of a distributed search algorithm to work on the current BitTorrent DHT, without client support.
I updated my proposal to use a more clever way of restricting hte node IDs, by not mapping it 1-1 to an IP. it still lacks a couple of things, most notably test vectors and citation to the original paper proposing this technique.
Trying to capture the spirit of caleb's proposal, I wrote this up (and implemented it in libtorrent):
http://www.rasterbar.com/products/libto … store.html
It's simpler by only using SHA-1 as hash function and only use RSA-2048 for signatures and by just adding "put" and "get". I also fixed up some of the things I mentioned in earlier posts, like sticking "token" inside the "a" dict and allowing "v" to be of other types than strings.
The thing I haven't done (yet) is having a separate ID to hash the public key with. This has another side effect of simplifying the protocol since the public key doesn't have to be sliced (as in "target" + "k" in get messages).
Comments?
@Caleb: we should probably not assume that MTU is 1500 bytes, 1280 seems like more reasonable, to avoid fragmentation on teredo.
Also, in a previous proposal, The 8472 suggested (iirc) having the target for mutable keys be the hash of the public key + name, where name was the name of the feed. The benefit would be that you could use a single key pair for multiple key-value pairs. Now, name of a channel doesn't make sense for the generic case, but maybe we could add some other key that can be combined with the public key.
regarding using separate RPCs for mutable/non-mutable; I mostly like the idea of differentiating the concepts. I agree that it is nicer to just add two new RPCs.
In particular, I think mutable and immutable should be different *types*, by which I mean they should be stored in separate containers in the local node, and have separate limits on how many are allowed to be stored, and be evicted separately.
I haven't thought it all out yet, but it feels like a good idea to not have one able to overrun the other, just like these generic items should not be able to push out torrents.
Speaking of eviction, maybe the BEP should mention some ideas of how to determine what to evict when adding new items. The way I have implemented this in libtorrent is by keeping a bloom filter and a counter for each item, containing all IPs that have announced it. When evicting, items with more nodes announcing are prioritized. It seems like a reasonable general concept. Maybe it can be improved in some way.
I suppose there could be limits on how many items any single IP is allowed to announce as well.
@The 8472: The use of the terms "seed" and "peer" is a bit confusing in this BEP. I would imagine that both downloaders and seeds would be considered peers, but it sounds like in the BEP, "peers" are limited to just downloaders. I think it would be clearer if you would use the term "downloader" instead.
@The 8472: Personally I would prefer option (a).
@Caleb: Is there a good reason to require implementations to support all of sha-1, sha-256, dsap and rsa? It would simplify implementations a lot by picking one of each and stick to it.
I like the idea of separating get/put and getm/putm. I would imagine they would be separate types. i.e. you can't store something with "putm" and get it with "get".
would it make sense to allow "v" to be any bencoded type (not just a string), and the hash is the hash of the encoded form.
That way, uses of this extension would blend in nicely with the encoding of the message.
I also think the "token" argument should be passed inside the "a" dict, to be consistent with "announce_peer".
I think that you hand wave away the performance problems of openssl a bit too lightly.
In my experience, the diffie hellman handshake is currently my far the most expensive operations in bittorrent. Even more expensive than sha-1 and rc4. Obviously it depends on peer churn. Using ssl would be pouring gas onto the fire.
Another possible conclusion from the fact that people can identify and throttle MSE is to stop using it all together.
Even with extended accounting, you would still have the problem of resolving who's responsible for discrepancies. Which is the same problem as you started out with. Many private trackers that do ratio tracking have already solved this. The only reasonable further development in this area, in my opinion, is to actually make it possible to tell for sure who's lying in their stats. This is a less trivial patch and includes the tracker issuing certificates and digital coins, which peers exchange for data.
As a suggestion, if you want any traction, it's probably a good idea to at least implement your idea in one client and release the source.
You would probably still have problems with hot-spots in the DHT. You might have to come up with a hard coded list of stop-words, which might be hard to build.
How about doing search in a bit more decentralized way, with less complete search results. Like this:
every client sniffs the DHT for get_peers, and announce_peer sent to it. For each of those messages, it pulls out the info-hash and downloads the .torrent file for that hash (either straight from torcache.com or by simply joining the swarm and slurping it down over the metadata extension). Ones it has the .torrent file, it indexes it in a local database (or a simple index). This means every client is slowly building a set of random torrents that it knows about.
Now, add a peer-wire extension message to let you ask your peers to search their indices. When the user searches for a specific word, look through the local index as well as query all peers for the same search.
In order to distribute the popular torrents into more peers' indices, each response could contain a set of torrents that peer has seen a lot of searches for (maybe the top results from the last other peer that made a search).
If you're talking about rating of torrents, uTorrent 3.0 beta has a DHT based rating system. The general format is described here:
http://comments.gmane.org/gmane.network … devel/1009
The hash votes are announced to is sha1(info-hash + "rating")
How about just sticking the trackers public key in the .torrent itself. The client can send a challenge to the tracker on the first connect and verify the response using the key. It would be a very simple extension on top of the existing UDP tracker protocol.
Caleb, those pastes have expired now. Would you mind posting them directly in the forum or somewhere more persistent?
I've updated this document with your feedback: http://www.rasterbar.com/products/libto … t_rss.html
1. Does the signature need to be specified in advance? Because ECDSA over Curve25519 hasn't been implemented, much less time-tested, I'm not comfortable with it. I don't mind using OpenSSL and FIPS 186-3 P-256; I assume you are worried about potential patent infringement? It seems that just specifying which signature type is being used satisfies everyone, and allows it to be extensible in the future.
I think RSA would be reasonable.
2. I would really like to see individual feed items time out of the swarm. I'm currently running a few simulations of different drop patterns against different algorithms for assigning skip-list pointers to see how they fare. It seems like a possibly viable solution, but if we time out items in an undetermined order, most skip list implementations don't act well.
I would imagine that a pretty common use case would be to keep items alive for as long as possible, i.e. until the interest in them has entirely died out. I would imagine you could specify timeouts in the items though, say, as posix time, at which time the item is dropped and no longer accepted by nodes.
3. It shouldn't matter too much what the actual document content is, but to the extent that this RFC targets a single document type, Atom is more flexible and extensible than RSS. Again, since you're not enforcing a single type in the spec, this is just a side note.
Right, I think it makes sense to specify the fundamental "tags" for torrents, and then it's possible to pull in names of tags from Atom or RSS to extend the metadata for each item. Though, a 1-1 mapping of names may not be practical.
4. Keeping virtual torrents open, or using a gossip algorithm over DHT neighbors might introduce less load than clients periodically polling for new information on feedname+key. Using gossip over DHT is something we've already tried, and might stick to for the sake of speed in getting information out.
In order to store keys in the DHT, you need a write token from the node you're writing to, which means you need to do a get_item at some point.
You mean using gossip over the DHT to find out whether or not a key needs to be re-announced?
Btw. I checked in my implementation (and the source document I linked to in this post) to libtorrent trunk. It doesn't verify any signature yet, and it's only the storing end of the DHT node, not the announcing (yet).
I updated the spec to use linked lists instead.
http://www.libtorrent.org/dht_rss.html
Using linked lists also lifts the requirement for small keys, we could use RSA. Although, Bram might be implementing an EDCSA for 25519 as part of his live project.
Yeah, you're right. It might not make that much sense to tie node IDs to IPv6 addresses. It does seem to make sense for IPv4 though.
Actually, I think the DDoS attack might even be a bit simpler than blocking specific torrents. All you need to do is running many nodes on one machine (or having an implementation that pretends to have many many IDs). I believe you could even circumvent the plausibility check by distributing the node IDs you claim far enough apart, to make it unlikely for them to ever refer to each other in find_nodes responses, or any one node ever interacting with two of the nodes.
Really the only thing you need for the DDoS is to be popular enough, attract enough traffic and hi-jack every single get_peers request by injecting your victim. It's unclear how effective it would be, it would obviously be less effective than always spin up new (imaginary) nodes whenever you get an announce_peer or get_peers which happens to terminate that search (on the same machine).
That said, I still think blocking torrents is a more serious attack to protect against. There are so many other ways to mitigate the DDoS attack, and I'm not entirely convinced that it even works that well today, since every peer you trick into adding your victims IP to, will only try to connect 3 times and then never again, so you need to constantly find new peers to use for the attack, or it will be very short-lived.
1. Each feed entry is stored under its own exclusive DHT key, e.g. sha1(entry-signature)
2. Each feed entry contains backwards pointer to the next-older entry (linked list) and optionally to exponentially older entries (skip-list)
3. Each entry is signed and the pointers are signed separately
4. The feed root is stored under SHA1(feedname + pubkey) and returns
a) a pointer to the head entry
b) random pointers into the list
All pointers are signed
I like this general approach. One way to keep the number of lookups low, would be to still accumulate up to a certain number of items in each node (i.e. item in the linked list). Unless each node is the actual torrent, under the actual info-hash, in which case this wouldn't work.
Some concerns I have are:
1. Every participant should help keeping all items alive, so nodes should pick random items and announce them every now and then.
2. (this is the main one), if the list head doesn't move, it's important to not allowing to down-grade it to an older version, say with a list with a single node. This could be done with a sequence number (which is also signed), and making nodes ignore announces with a lower sequence number than what they already have.
This may seem to be a lot of DHT entries, but if you consider that each feed entry corresponds to a torrent then this doesn't significantly increase the load per torrent, especially since not everyone has to write feed entries to the DHT, you just have to randomly check if they're alive and republish them if they aren't.
Publishing would consist of 3 steps:
a) publish a new entry, with a forward-pointer to the now 2nd entry
b) put the pointer into the random list under the feed root
c) bump the head-pointer in the feed root
[b and c can be done in 1 call]
Using the actual torrents in the DHT as the list nodes in the linked list would have the following problems:
1. Long lists would become very expensive. In general this is hard to avoid, but it might be mitigated and the cost could be lowered of a factor of 10 at least, by bunching items.
2. a given torrent would have a limit on how many feeds it could be part of. It would either be just a single one or multiple ones if we allow multiple forward pointers keyed by the feed key. In this case popular torrents might still run out of space to store forward pointers, and there might be a privacy issue, if you can deduce all feeds a given torrent belongs to.
The main threat I've had in mind is someone blocking specific torrents (or DHT feeds), which is a subset of the DDoSing attack. It would require the plausibility checks to happen earlier, while bootstrapping the DHT and running find_nodes and get_peers which forwards you to new nodes.
How about having sequence numbers for each batch of info-hashes? The sequence number could be part of the portion that's signed as well, to lock it down. Something like this:
{
"a":
{
"ih":
[
"7ea94c240691311dc0916a2a91eb7c3db2c6f3e4",
"0d92ad53c052ac1f49cf4434afffafa4712dc062e4168d940a48e45a45a0b10808014dc267549624"
],
"sig":
[
"980774404e404941b81aa9da1da0101cab54e670cff4f0054aa563c3b5abcb0fe3c6df5dac1ea25266035f09040bf2a24ae5f614787f1fe7404bf12fee5e6101",
"3fee52abea47e4d43e957c02873193fb9aec043756845946ec29cceb1f095f03d876a7884e38c53cd89a8041a2adfb2d9241b5ec5d70268714d168b9353a2c01"
],
"i": [0, 1]
"id": "b46989156404e8e0acdb751ef553b210ef77822e",
"key": "6bc1de5443d1a7c536cdf69433ac4a7163d3c63e2f9c92d78f6011cf63dbcd5b638bbc2119cdad0c57e4c61bc69ba5e2c08b918c2db8d1848cf514bd9958d307",
"n": "my stuff"
"target": "b4692ef0005639e86d7165bf378474107bf3a762"
"token": "23ba"
},
"y": "q",
"q": "announce_item",
"t": "a421"
}"ih" is a list of strings, each string contains one or more info hashes (i.e. the length is divisible by 20). "sig" is a list with the same number of items, each item is a 64 byte signature of the corresponding info-hash string + the corresponding sequence number from "i".
This way, the sequence number is ordering the batches of info-hashes, and the order the info-hashes are listed inside the string defines their order in the feed. The sequence numbers can only increment by one. If each "batch" has info-hashes appended to it until it fills some notion of an MTU before a new batch is created, the sequence numbers can be considered pages.
You could make it possible to request pages by specifying negative numbers to index from the end. Requesting page -1 would thus refer to the latest items.
This is quite compact, but it's not very generic. Turning each batch of info-hashes to a list of dicts would make it a lot more generic, since each item can be extended that way. The signatures could be signing the bencoded representation of those list items.
With an approach like this, the assumption is that all feed items are always lumped together until it fills one MTU, so it wouldn't
make that much sense to have a bloom filter representing individual feeds. The requesting node could (essentially) be guaranteed that requesting a single page will always fit in one MTU as well.
Just to throw it out there; Another way of doing pagination and load balancing would be to announce page 1 to SHA1(<feed-name> + <public-key> + "page1"), the second page to SHA1(<feed-name> + <public-key> + "page2") and so on.
It might add some extra traffic though, since everyone needs to look up one node past the last one.