forum.bittorrent.org

BitTorrent.org community

#1 2010-12-11 00:33:02

arvid
Administrator

DHT security feature

Another proposal to make the DHT a bit harder to attack and snoop.

http://www.libtorrent.org/dht_sec.html

I believe Vuze has a very similar feature in its DHT. Any thoughts?


#2 2010-12-11 11:50:59

The 8472
Azureus Developer

Re: DHT security feature

This approach is quite pointless because it will break down with IPv6 where every single home user will have a whole /64 block at their disposal or even a /48 with such things as 6to4.

It also complicates the logic of a node since its routing table will become invalid if it is forced to change its node ID when its external IP changes.

Another issue is that it does not prevent data harvesting at all, you just need a bunch of IP addresses (~30) to get enough data, i have demonstrated this with my DHT indexer. Right now it chooses an optimal distribution of node IDs to cover the entire keyspace, but a random distribution would work too.
The only attack that the restricted node ID scheme would prevent is a handful of nodes hijacking a small portion of the keyspace, e.g. to blackhole the traffic or to inject bogus results.
Data harvesting attacks would be unaffected.
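
To illustrate what an "optimal distribution" over the keyspace could look like, here is a minimal sketch that spreads n node IDs evenly over the 160-bit ID space. The function name `evenly_spaced_ids` and the choice of putting each ID in the middle of its slice are assumptions made for illustration; this is not the indexer's actual code.

```python
KEYSPACE_BITS = 160  # Mainline DHT node IDs are 160 bits wide

def evenly_spaced_ids(n):
    """Return n node IDs spread evenly over the keyspace (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    step = (1 << KEYSPACE_BITS) // n
    # Place each ID in the middle of its slice of the keyspace.
    return [(i * step + step // 2).to_bytes(KEYSPACE_BITS // 8, "big")
            for i in range(n)]

ids = evenly_spaced_ids(30)  # ~30 nodes, as in the post
print(len(ids), ids[0].hex(), ids[-1].hex())
```

A random distribution would simply replace the computed values with `os.urandom(20)`.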


The only good part i can see there is filtering out private IPs, i do that myself already to keep routing tables clean.


Az dev


#3 2010-12-11 12:05:26

arvid
Administrator

Re: DHT security feature

> This approach is quite pointless because it will break down with IPv6 where every single home user will have a whole /64 block at their disposal or even a /48 with such things as 6to4.

That doesn't make the approach pointless. First of all, the vast majority of nodes in the wild today would not be affected. Secondly, it could be addressed by not hashing the full IPv6 address, but maybe the first 5-8 bytes.
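
A minimal sketch of that idea, assuming the leading bytes of the node ID come from a SHA-1 hash of the address (or, for IPv6, of its first few bytes) and the rest is random. The function names, the 4-byte restricted ID prefix and the default of 8 address bytes for IPv6 are illustrative assumptions, not the exact scheme in the linked proposal.

```python
import hashlib
import ipaddress
import os

ID_LEN = 20  # node IDs are 160 bits

def expected_prefix(ip, v6_prefix_bytes=8, id_prefix_bytes=4):
    """Leading node-ID bytes that an address is allowed to use (assumed scheme)."""
    addr = ipaddress.ip_address(ip)
    raw = addr.packed
    if addr.version == 6:
        raw = raw[:v6_prefix_bytes]  # hash only the first bytes of an IPv6 address
    return hashlib.sha1(raw).digest()[:id_prefix_bytes]

def restricted_node_id(ip, v6_prefix_bytes=8, id_prefix_bytes=4):
    """Pick a node ID whose prefix is tied to the IP; the remainder is random."""
    prefix = expected_prefix(ip, v6_prefix_bytes, id_prefix_bytes)
    return prefix + os.urandom(ID_LEN - len(prefix))

def id_matches_ip(node_id, ip, v6_prefix_bytes=8, id_prefix_bytes=4):
    """Check that a claimed node ID is plausible for the address it came from."""
    return node_id[:id_prefix_bytes] == expected_prefix(ip, v6_prefix_bytes, id_prefix_bytes)

node_id = restricted_node_id("2001:db8:1:2::1")
print(id_matches_ip(node_id, "2001:db8:1:2::abcd"))  # same /64, so same prefix
```

With 8 address bytes hashed, every address inside one /64 maps to the same ID prefix, which is exactly the trade-off discussed in the rest of the thread.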

> It also complicates the logic of a node since its routing table will become invalid if it is forced to change its node ID when its external IP changes.

Right. It's only marginally more complicated though. You essentially just need a way to restart, and a way to prevent being tricked into changing.

> Another issue is that it does not prevent data harvesting at all, you just need a bunch of IP addresses (~30) to get enough data, i have demonstrated this with my DHT indexer. Right now it chooses an optimal distribution of node IDs to cover the entire keyspace, but a random distribution would work too.

That's not really the main concern this change tries to prevent though.

With 30 nodes, I would imagine that you would cover the vast majority of info-hashes, but you wouldn't get the majority of peers for those info hashes, would you?


#4 2010-12-11 13:18:05

The 8472
Azureus Developer

Re: DHT security feature

arvid wrote:

That doesn't make the approach pointless. First of all, the vast majority of nodes in the wild today would not be affected. Secondly, it could be addressed by not hashing the full IPv6 address, but maybe the first 5-8 bytes.

Today, yes. But IPv6 usage will only go up, not down. According to current estimates the global IPv4 pool will be depleted in 1-2 months, the RIR pools in 7-15 months.

So this solution won't hold for long.

And your suggestion is not workable for ipv6. A determined attacker could easily get a /32 prefix from the IANA (you just need your own AS), while normal home users have a /64. If you filter by /32 then home users will be massively clustered. Everyone within an ISP will have the same node ID prefix, which would utterly break Kademlia. And if you filter by anything less than a /32 then an attacker with a /32 won't be deterred by this countermeasure.

So this strategy will be pointless in the near future.


Another thing is that µT does not support IPv6 DHT at the moment, so the size of the v6 DHT is actually fairly large considering that the most popular client doesn't support it.



IPv4 depletion will also mean increased deployment of carrier-grade NAT, which means many nodes will share the same public IP, again leading to clustering.


Az dev


#5 2010-12-31 18:42:14

arvid
Administrator

Re: DHT security feature

The 8472 wrote:

And your suggestion is not workable for ipv6. A determined attacker could easily get a /32 prefix from the IANA (you just need your own AS), while normal home users have a /64. If you filter by /32 then home users will be massively clustered. Everyone within an ISP will have the same node ID prefix, which would utterly break Kademlia. And if you filter by anything less than a /32 then an attacker with a /32 won't be deterred by this countermeasure.

Even just ignoring the last 64 bits in the IPv6 address would make a big difference for certain attacks. The fact that you need to be determined to be able to launch an attack doesn't mean the counter-measure is pointless. If you can't do it from a single normal home connection, it is harder than it is today.

If you have access to a /32, and we assume normal IPv6 users have /64, that gives you 32 bits of entropy when picking a node ID, which is a lot more than the total number of nodes in the DHT, and effectively makes it possible to always pick a valid node ID that's the closest one to any given info-hash.
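
The arithmetic behind that, as a small sketch. The DHT size used here is an assumed round figure for illustration, not a measurement.

```python
attacker_prefix_len = 32       # attacker controls a /32 (from the post)
hashed_prefix_len = 64         # only the first 64 bits are hashed (from the post)
assumed_dht_nodes = 5_000_000  # assumption: rough order of magnitude, not measured

# Every distinct /64 inside the /32 yields a different valid node ID.
candidate_ids = 2 ** (hashed_prefix_len - attacker_prefix_len)
ids_per_honest_node = candidate_ids / assumed_dht_nodes

print(candidate_ids, round(ids_per_honest_node))
```

As long as the attacker has far more candidate IDs than there are honest nodes, one of them will almost always be closer to a given info-hash than any honest node.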

You could settle for something in between, say assuming /48 for home users, and hash only the first 48 bits (ignoring the last 80). Sure, there would be some clustering, but it wouldn't break the DHT, just make it slightly less efficient.

> So this strategy will be pointless in the near future.

I think you might be more optimistic about the rate at which IPv6 is being deployed, or define "near future" as farther out than I would.


#6 2011-01-01 09:20:38

The 8472
Azureus Developer

Re: DHT security feature

arvid wrote:

If you can't do it from a single normal home connection, it is harder than it is today.

I did some research. A big German ISP is planning to roll out /56s to home users by the end of the year. Other ISPs are planning to use /64s for home users. We really cannot rely on prefix lengths.

> You could settle for something in between, say assuming /48 for home users, and hash only the first 48 bits (ignoring the last 80). Sure, there would be some clustering, but it wouldn't break the DHT, just make it slightly less efficient.

Let's say we operate on that assumption. Then a big ISP that decides to allocate /64s to home users could allocate 65536 home networks from a single /48, which means up to 64k nodes could fall into a single node prefix.

If several big ISPs did that, it would mean a 2^16-fold decrease in the local keyspace density or a 2^16-fold increase in the number of keys that nodes would be responsible for (thanks to inaccurate routing on the last few hops).
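
The numbers above can be checked with a short sketch. The documentation prefix 2001:db8::/48 stands in for an ISP block, and `hashed_part` is a hypothetical helper that returns the bits a /48-based scheme would feed into the hash.

```python
import ipaddress

isp_block = ipaddress.ip_network("2001:db8::/48")  # stand-in for one ISP block
home_prefix_len = 64                               # ISP hands out /64s to homes
hashed_bits = 48                                   # scheme assumes /48 per home

# Number of /64 home networks that fit into a single /48.
home_networks = 2 ** (home_prefix_len - isp_block.prefixlen)

def hashed_part(ip, bits=hashed_bits):
    """Leading bits of the address that would determine the node ID prefix."""
    return int(ipaddress.ip_address(ip)) >> (128 - bits)

home_a = "2001:db8:0:1::1"     # two different homes in the same /48
home_b = "2001:db8:0:ffff::1"
same_prefix = hashed_part(home_a) == hashed_part(home_b)

print(home_networks, same_prefix)
```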

I would like to avoid codifying such fundamental flaws unless they're absolutely necessary.


> I think you might be more optimistic about the rate at which IPv6 is being deployed, or define "near future" as farther out than I would.

Considering the time frame required to roll out the changes it is the near future, yes.



Instead of all this SHA1ing of the node IDs we could simply add some plausibility checks to returned values, e.g. ignoring all ports below 1024 (as suggested in the 27C3 talk to avoid DDoSing specific IPs) or only accepting data from one node within a /48 per lookup.
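
A rough sketch of what such checks could look like on the results of a single lookup. The data layout (a list of responder address plus returned peers) and the function names are assumptions for illustration; the /24 used for IPv4 responders is also an assumption, since only the /48 is named above.

```python
import ipaddress

def network_key(ip, v4_prefix=24, v6_prefix=48):
    """Network a responder belongs to; the IPv4 /24 is an assumed choice."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

def filter_lookup_results(responses, min_port=1024):
    """responses: iterable of (responder_ip, [(peer_ip, peer_port), ...])."""
    seen_networks = set()
    accepted = []
    for responder_ip, peers in responses:
        key = network_key(responder_ip)
        if key in seen_networks:
            continue  # already accepted data from a node in this network
        seen_networks.add(key)
        # Drop peers on privileged ports so lookups cannot be aimed at services.
        accepted.extend((ip, port) for ip, port in peers if port >= min_port)
    return accepted

responses = [
    ("2001:db8:1::1", [("203.0.113.5", 6881), ("203.0.113.6", 80)]),
    ("2001:db8:1:5::9", [("198.51.100.7", 51413)]),  # same /48 as the first
    ("2001:db8:2::1", [("198.51.100.8", 6881)]),
]
print(filter_lookup_results(responses))
```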

And before we try to design defenses we actually need a threat model of things we want to defend against and things that are acceptable.


Az dev


#7 2011-01-01 20:50:53

arvid
Administrator

Re: DHT security feature

The main threat I've had in mind is someone blocking specific torrents (or DHT feeds), which is a subset of the DDoSing attack. It would require the plausibility checks to happen earlier, while bootstrapping the DHT and running find_nodes and get_peers, which forward you to new nodes.


#8 2011-01-02 05:29:32

The 8472
Azureus Developer

Re: DHT security feature

I think burris mentioned that not defending against this threat was a conscious design decision when they made the DHT.

Nevertheless, let's consider ways an attacker might achieve this:

for just 1 target ID:
a) the super cheap way: simply have 1 host, run ~16 nodes on it, each on its own port, and have each claim a node ID near the target.
This is trivial to defend against

b) the colo way: have one host with multiple IPs from a subnet and have each claim a node ID close to the target.
This simply needs plausibility checks in a lookup's terminal phase

c) the hard way: scatter ~16 nodes throughout various IP ranges (on different ISPs for example) and have each claim a node ID close to the target.
I currently have no idea how to defend against that except fixed node IDs


for many many target IDs:
a) the cheap way: have a single host, run a bunch of nodes on different ports scattered throughout the keyspace and forward all lookups that go to the desired target IDs to nodes that dynamically change their ID to be close to the target.
Again, trivial to defend against

b) datacenter way: you need a /16 for this to do the same thing as the previous attack but with IPs instead of ports
This, again, should only require plausibility checks during lookups

c) the hard way: again, run 16-100 nodes scattered throughout the entire IP space, run multiple nodes per IP (different ports) and carefully choose the node IDs per port to be far apart from each other. This should yield a high keyspace coverage and again allow the attacker to do the forwarding-attack with dynamic node IDs on the forwarding targets.

It's not easy to defend against this, but since the attacker now needs to dynamically change his node IDs instead of just picking ones that suit his needs and then sticking with them, it might be easier to detect. Alternatively, restricting the number of ports the DHT can run on could work too.



So i think we can already achieve quite a bit by adding plausibility checks to lookups


Az dev


#9 2011-01-02 13:20:11

arvid
Administrator

Re: DHT security feature

Yeah, you're right. It might not make that much sense to tie node IDs to IPv6 addresses. It does seem to make sense for IPv4 though.

Actually, I think the DDoS attack might even be a bit simpler than blocking specific torrents. All you need to do is run many nodes on one machine (or have an implementation that pretends to have many, many IDs). I believe you could even circumvent the plausibility check by distributing the node IDs you claim far enough apart, to make it unlikely for them to ever refer to each other in find_nodes responses, or for any one node to ever interact with two of the nodes.

Really the only thing you need for the DDoS is to be popular enough, attract enough traffic and hi-jack every single get_peers request by injecting your victim. It's unclear how effective that would be; it would obviously be less effective than always spinning up new (imaginary) nodes whenever you get an announce_peer or get_peers which happens to terminate that search (on the same machine).

That said, I still think blocking torrents is a more serious attack to protect against. There are so many other ways to mitigate the DDoS attack, and I'm not entirely convinced that it even works that well today: every peer you trick into adding your victim's IP will only try to connect 3 times and then never again, so you need to constantly find new peers to use for the attack, or it will be very short-lived.


#10 2011-01-03 05:31:21

The 8472
Azureus Developer

Re: DHT security feature

Ah, btw. I just remembered there is a defense against the "single node returns set of very close dynamic-ID nodes and thus terminates the lookup very early". It consists of splitting the lookup result sets into generations and making sure that no generation is derived from a single ancestor at any level. It basically ensures that multiple paths lead to the node. For this to work efficiently it requires an alpha of 10 or so to have a meaningful generation size.

So the possible defenses without mandating hashed node IDs are the following:
- allow only one address per /24 (v4) or /64 (v6) per bucket in the routing table
- perform the same check for the final set within a lookup
- do generational lookup checks to prevent single-ancestor-funneling
- crosscheck the ID contained in nodes/nodes6 entries vs. the actual ID that a node claims to have during a lookup (this prevents a simpler changing-node-IDs attack without colluding nodes)
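
A minimal sketch of the first and last items in that list, under stated assumptions: the `Bucket` class, its bucket size of 8 and the helper names are hypothetical and only meant to show the shape of the checks, not any client's real routing table.

```python
import ipaddress

def subnet_of(ip):
    """/24 for IPv4 and /64 for IPv6, as in the list above."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 64
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

class Bucket:
    """Routing-table bucket holding at most one node per subnet (sketch)."""

    def __init__(self, k=8):  # k=8 is an assumed bucket size
        self.k = k
        self.nodes = {}  # subnet -> (node_id, ip, port)

    def try_insert(self, node_id, ip, port):
        net = subnet_of(ip)
        if net in self.nodes or len(self.nodes) >= self.k:
            return False  # subnet already represented, or bucket is full
        self.nodes[net] = (node_id, ip, port)
        return True

def id_consistent(advertised_id, claimed_id):
    """Cross-check: the ID a referrer listed in nodes/nodes6 for an address
    must equal the ID that address reports about itself when queried."""
    return advertised_id == claimed_id

bucket = Bucket()
print(bucket.try_insert(b"\x01" * 20, "192.0.2.10", 6881))   # first in its /24
print(bucket.try_insert(b"\x02" * 20, "192.0.2.200", 6882))  # same /24, refused
```

The same `subnet_of` check can be applied to the final set of a lookup instead of a bucket.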


That would only leave the option to use "real" nodes scattered across several IP ranges to attack a few infohashes. And idk if we really need to defend against that.


Az dev


#11 2011-01-07 15:10:47

jch
Member

Re: DHT security feature

> Yeah, you're right. It might not make that much sense to tie node IDs to IPv6 addresses. It does seem to make sense for IPv4 though.

I'm not sure how useful it is, then.  It will take at least a few years until your extension can be safely deployed, and I expect ISP-side NAT to be widely deployed sometime in 2013.  I'd hope most peer-to-peer traffic will be IPv6 by that time.

(Note that this is different from ordinary client-server traffic, which will work just fine through multiple layers of NAT.)


#12 2011-03-02 07:09:12

rauljim
Member

Re: DHT security feature

We have a student whose master's thesis [1] is about the "censorship" of a given infohash in Mainline DHT.

He used nodes with different IPs (he used PlanetLab, so the IPs are distributed all over the world) and NodeIDs close enough to the target infohash. He was careful to have the closest nodes but keep reasonable distances to the infohash (to seem less suspicious).

His main finding was that he was not able to completely censor any popular swarm. He found that popular infohashes are tracked by many nodes in the DHT.
One reason for this spread of nodes tracking an infohash may be that some "misbehaving" clients announce to many nodes (for instance, we see that KTorrent sends announces to every node it encounters in a lookup).

[1] http://tslab.ssvl.kth.se/thesis/files/T … report.pdf (report, see section 7.4)
http://tslab.ssvl.kth.se/thesis/files/f … tation.pdf (slides)


#13 2011-03-09 11:56:41

The 8472
Azureus Developer

Re: DHT security feature

rauljim wrote:

His main finding was that he was not able to completely censor any popular swarm.

From what i can see he used a relatively low number of nodes though. Anti-P2P companies often have several /24 subnets at their disposal and he is not taking DHT routing-hijack attacks into account.


Az dev


#14 2011-05-26 07:45:35

arvid
Administrator

Re: DHT security feature

I updated my proposal to use a more clever way of restricting the node IDs, by not mapping them 1-1 to an IP. It still lacks a couple of things, most notably test vectors and a citation of the original paper proposing this technique.

http://www.rasterbar.com/products/libto … t_sec.html


#15 2011-05-26 11:31:07

The 8472
Azureus Developer

Re: DHT security feature

> and also to make it harder to snoop the network

This part certainly is not true as you don't need fine-grained coverage of small regions of key-space for general info-harvesting. Instead wide-spread coverage of the keyspace is enough, which can be achieved with even a few hundred nodes.

Consider that a node can easily do 100 requests to unique IPs with every lookup, that it performs lookups repeatedly over time, and that long-lived nodes are more likely to be visited during a lookup. Together this significantly increases the probability that a snooping node is visited during a lookup.
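
As a back-of-the-envelope sketch of that probability, assume every request hits a node drawn uniformly at random from the DHT. That is a simplification: real lookups are not uniform, and long-lived snooping nodes would be visited more often than this model predicts. The node counts in the example call are assumed round figures.

```python
def p_visit(snoopers, total_nodes, requests_per_lookup=100, lookups=1):
    """Chance that at least one request reaches a snooping node (uniform model)."""
    p_miss_one = 1.0 - snoopers / total_nodes
    return 1.0 - p_miss_one ** (requests_per_lookup * lookups)

# Assumed figures: 300 snooping nodes in a DHT of 5 million nodes.
print(p_visit(300, 5_000_000))               # a single lookup
print(p_visit(300, 5_000_000, lookups=100))  # the same node after 100 lookups
```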

> Specifically the attack this extension intends to make harder is launching 8 or more DHT nodes which node-IDs selected close to a specific target info-hash

Just to note, using a collusion attack it is possible to convince a node that a number of malicious nodes is the K-closest set without actually being closest to the target key. This can be achieved by triggering the look-up termination conditions (closest-set is not changing due to responses from the as-of-yet closest nodes) by significantly out-pacing the natural lookup convergence. This still requires a significant number of node IDs, does not work on the portion of the keyspace that is close to the querying node and only works in a probabilistic manner, depending on the closeness of the initial responses returned from honest nodes, but it still should require fewer node IDs than strictly necessary to genuinely dominate the closest-set.
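
For reference, the termination condition being abused can be sketched roughly like this. The helper names and K=8 are assumptions for illustration; real clients differ in the details of when they stop.

```python
def xor_distance(a, b):
    """Kademlia distance between two 20-byte IDs."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def k_closest(candidates, target, k=8):
    """The k candidates closest to the target under the XOR metric."""
    return sorted(set(candidates), key=lambda n: xor_distance(n, target))[:k]

def lookup_terminated(closest_before, closest_after, queried):
    """Stop once the closest-set no longer changes and all of it has answered."""
    return closest_before == closest_after and all(n in queried for n in closest_after)

def nid(i):
    """Tiny helper to build a 20-byte ID from an integer (for the example only)."""
    return i.to_bytes(20, "big")

target = nid(0)
candidates = [nid(i) for i in range(1, 21)]
closest = k_closest(candidates, target)
print(lookup_terminated(closest, closest, queried=set(closest)))
```

Colluding nodes that keep answering with each other's IDs make `closest_before == closest_after` hold early, before honest nodes nearer to the target have been discovered.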

So, given the ipv6 case where i can get about 100k node IDs with a single /32 (which are relatively easy to get at the moment) it should be possible to fake almost any closest-set for several million real nodes. Note that due to the forced distribution this is an all-or-nothing situation. If i can get enough nodes in one region that means i have enough nodes everywhere.

Depending on the eagerness of the lookup algorithm it might be even possible to do it with fewer nodes.


Az dev


#16 2011-06-21 01:45:00

rauljim
Member

Re: DHT security feature

Arvid, I'd be very interested to read the paper you mention. Is it available on-line?

