You are not logged in.
I've updated this document with your feedback: http://www.rasterbar.com/products/libto … t_rss.html
1. Does the signature need to be specified in advance? Because ECDSA over Curve25519 hasn't been implemented, much less time-tested, I'm not comfortable with it. I don't mind using OpenSSL and FIPS 186-3 P-256; I assume you are worried about potential patent infringement? It seems that just specifying which signature type is being used satisfies everyone, and allows it to be extensible in the future.
I think RSA would be reasonable.
2. I would really like to see individual feed items time out of the swarm. I'm currently running a few simulations of different drop patterns against different algorithms for assigning skip-list pointers to see how they fare. It seems like a possibly viable solution, but if we time out items in an undetermined order, most skip list implementations don't act well.
I would imagine that a pretty common use case would be to keep items alive for as long as possible, i.e. until the interest in them has entirely died out. I would imagine you could specify timeouts in the items though, say, as posix time, at which time the item is dropped and no longer accepted by nodes.
3. It shouldn't matter too much what the actual document content is, but to the extent that this RFC targets a single document type, Atom is more flexible and extensible than RSS. Again, since you're not enforcing a single type in the spec, this is just a side note.
Right, I think it makes sense to specify the fundamental "tags" for torrents, and then it's possible to pull in names of tags from Atom or RSS to extend the metadata for each item. Though, a 1-1 mapping of names may not be practical.
4. Keeping virtual torrents open, or using a gossip algorithm over DHT neighbors might introduce less load than clients periodically polling for new information on feedname+key. Using gossip over DHT is something we've already tried, and might stick to for the sake of speed in getting information out.
In order to store keys in the DHT, you need a write token from the node you're writing to, which means you need to do a get_item at some point.
You mean using gossip over the DHT to find out whether or not a key needs to be re-announced?
Btw. I checked in my implementation (and the source document I linked to in this post) to libtorrent trunk. It doesn't verify any signature yet, and it's only the storing end of the DHT node, not the announcing (yet).
Offline
I've updated this document with your feedback: http://www.rasterbar.com/products/libto … t_rss.html
I see that you altered my republishing algorithm, but you did not preserve all its properties.
The goal was to always republish one or more entries, as needed. Now you republish zero or more entries. The trick behind republishing at least 1 entry is that given enough subscribers we'll always refresh all entries before they expire. If we wait for data being lost then there may be gaps in the coverage.
It's late here, i'll check the other parts tomorrow.
Az dev
Offline
I had a brilliant idea when i got up today. We can do away with distinguishing between HEAD/ITEM and at the same time also gain mutability for all elements.
The data structure format needs to be changed to the following:
item
key
uid | name
seq
ptr
...
sig
uid is a random 8byte binary string.
entries are checked for validity the following way:
1. (dht-id == sha1(uid+key)) or (dht-id == sha1(name+key))
2. verify_signature(key,sig,bencode(bdecode()["item"]))
3. a) seq == old_seq -> just update timeout
3. b) seq > old_seq -> replace data
This way all entries will only be identified by their pubkey and 1 unique field (name or uid). This means they are mutable, should be practically collision-free and we don't really need to distinguish between cases apart from the DHT id checking.
I also renamed the "next" entry into "ptr" in case different data structures, such as trees or doubly linked lists, would be implemented using the storage since this is publisher and subscriber-side logic and not storage-node logic.
Az dev
Offline
Patents are an issue with point compression which is needed to keep the signatures compact. If there are other schemes that only require the X-coordinates on the curves those would work too.
FIPS 186-2 specifies a minimum of 40 bytes for a 160-bit EC signature; the OpenSSL implementation uses 40 + 6 (ASN.1) bytes. What signature schemes use less? If you're talking about the coordinate compression for the efficiently of computing and verifying signatures, I don't know how the OpenSSL implementation works, but I suspect that it is much slower than Certicom's implementations. My point was that we needn't define the signature type in the spec, and allowing it to be replaceable might be a good thing.
Dealing with missing entries would be a failure recovery mode atm, not an intended feature. If you actually want to remove entries we basically have to make them mutable to alter the skiplist after the first publication.
Right, I was assuming that the feed owner obsoletes items periodically, but doesn't know at the time that they are posted when that will be, and testing to see how a skip list would handle that. My needs may diverge enough from the main purposes of libtorrent that I just have to keep a separate fork, but I expect to replace items periodically.
DHT wide gossip certainly is not an option, since it would be too unspecific.
What I had in mind was gossiping only among subscribers, which involves each client knowing which of their neighbors subscribe to the same channels the client does and passing enough metadata around to prevent flooding or too many redundant messages. Again, this may be something that you have no interest in, but it will cause faster response and less network congestion (than polling for updates) in the clients I'm working on. My only worry with virtual torrents is that it would be less robust to high rates of node failures and targeted attacks.
I checked in my implementation (and the source document I linked to in this post) to libtorrent trunk.
Thanks!
This way all entries will only be identified by their pubkey and 1 unique field (name or uid)
I'm likely confused. Do you mean that the data structure contains both a uid and a name, or either at the submitter's discretion? I assume the latter; this also means that all entries are stored under both keys, since nodes can't tell which will be used to access an entry, right? This might not be a problem, but in case I'm confused, I thought I should ask.
Offline
FIPS 186-2 specifies a minimum of 40 bytes for a 160-bit EC signature; the OpenSSL implementation uses 40 + 6 (ASN.1) bytes.
I'll have a look at the standard then.
What I had in mind was gossiping only among subscribers, which involves each client knowing which of their neighbors subscribe to the same channels the client does and passing enough metadata around to prevent flooding or too many redundant messages. Again, this may be something that you have no interest in, but it will cause faster response and less network congestion (than polling for updates) in the clients I'm working on. My only worry with virtual torrents is that it would be less robust to high rates of node failures and targeted attacks.
Like i said, virtual torrents would require totally different connection behavior (using random walks) than normal ones.
Anyway, the DHT doesn't support any kind of gossip right now since nodes interested in similar data cannot find each other. It would require a pile of new features to support that.
I'm likely confused. Do you mean that the data structure contains both a uid and a name, or either at the submitter's discretion? I assume the latter;
The latter.
this also means that all entries are stored under both keys, since nodes can't tell which will be used to access an entry, right?
No, you would only store either a name+key or a uid+key entry, depending on your needs. name+key entries serve as HEAD entries, they are locateable by humans. uid+key entries serve as leaf nodes stored in randomized locations in the DHT.
Az dev
Offline
Ok, i think my updated proposal is reasonable now. Comments are welcome.
Here's a patch against arvid's version:
http://dhtrss.pastebay.com/pastebay.php?dl=115163
And here the patched one:
http://dhtrss.pastebay.com/115164
Az dev
Offline
I wrote up some comments on that RFC.
The most major problem I see is truncation of a SHA-256. AFAICT there is never a validation of the entire SHA-256, only the high 160 bits. There might be literature on the collision resistance of truncated SHA-256 but I have not seen any.
My review inline:
http://dhtrss.pastebay.com/115225
I also wrote my own proposal for the low level get/put functions.
http://cjdns.pastebay.com/115023
Basic properties include:
* Static content is stored as static thus removing the need for keys with each entry.
* Target id is first 20 bytes of public signing key (if mutable) or SHA-256 (if static) decreasing packet overhead and forcing publishers to use a different key for each mutable entry which, for security reasons, they should be doing anyway.
* Requesting node receives entry from storing node _without_ key (or hash if static), the requester must know the key, not take the storing node's word for it or rely on a truncated SHA-256.
* No pointer entries, these can and should be implemented at a higher level.
* Supports multiple cryptographic and hash algorithms providing a path forward as they become outdated. Packet overhead is so small enough a 768 byte entry should fit in a packet even with RSA-2048 key and signature! Default crypto algorithms should be NIST p256 and RSA-2048 for a good modern algorithm as well as an old standby.
Caleb
Offline
Caleb, those pastes have expired now. Would you mind posting them directly in the forum or somewhere more persistent?
Offline
Before I update my spec... i'd like to ask developers present here what they would be more likely to implement:
a) a generic, versioned, signed get/put and pointer RPC API on the DHT
b) something less flexible/potentially more traffic-heavy using modified bittorrent connections (limited broadcasts via random walkers on always-on virtual torrents), DHT would only serve as a way to join the virtual torrent
Az dev
Offline
Sorry, I didn't notice your message, if you're looking for me I'm always on irc.
That is my proposal:
https://github.com/cjdelisle/cjdns/blob … TStore.txt
It is partially implemented here:
https://github.com/cjdelisle/cjdns/blob … HA1Store.c
https://github.com/cjdelisle/cjdns/blob … reModule.c
and more interestingly the test is here:
https://github.com/cjdelisle/cjdns/blob … ore_test.c
Unfortunately I don't seem to have the review anymore.
Offline
@The 8472: Personally I would prefer option (a).
@Caleb: Is there a good reason to require implementations to support all of sha-1, sha-256, dsap and rsa? It would simplify implementations a lot by picking one of each and stick to it.
I like the idea of separating get/put and getm/putm. I would imagine they would be separate types. i.e. you can't store something with "putm" and get it with "get".
would it make sense to allow "v" to be any bencoded type (not just a string), and the hash is the hash of the encoded form.
That way, uses of this extension would blend in nicely with the encoding of the message.
I also think the "token" argument should be passed inside the "a" dict, to be consistent with "announce_peer".
Offline
i think mutable/non-mutable can be solved with a single query simply by properly specifying the invariants that nodes have to verify. With a properly placed OR we can both support sha1-only, non-mutable storage and signed, mutable storage with a single request.
I also don't want clients to implement tons of new RPC calls when they essentially do all the same with just minor variations. That's what parameters are good for. This also applies for the cryptographic primitives.
The thing is we have to specify binary formats for asymmetric primitives, all clients have to implement them in case someone decides to use a different one and yet we only need one at a time.
Considering that collision attacks are not a real issue and pre-image attacks on hashes are far out there compared to collisions then the real issue would be a broken private key of the asymmetric primitive. The damage of a compromised private key is the same whether we have a stronger backup function or not: Users would potentially auto-download malicious content. So the damage would be already done.
I mean, transitioning faster than we usually do would be nice, but the implementation costs are quite high for something we won't need for years to come.
Az dev
Offline
regarding using separate RPCs for mutable/non-mutable; I mostly like the idea of differentiating the concepts. I agree that it is nicer to just add two new RPCs.
In particular, I think mutable and immutable should be different *types*, by which I mean they should be stored in separate containers in the local node, and have separate limits on how many are allowed to be stored, and be evicted separately.
I haven't thought it all out yet, but it feels like a good idea to not have one able to overrun the other, just like these generic items should not be able to push out torrents.
Speaking of eviction, maybe the BEP should mention some ideas of how to determine what to evict when adding new items. The way I have implemented this in libtorrent is by keeping a bloom filter and a counter for each item, containing all IPs that have announced it. When evicting, items with more nodes announcing are prioritized. It seems like a reasonable general concept. Maybe it can be improved in some way.
I suppose there could be limits on how many items any single IP is allowed to announce as well.
Offline
@Caleb: we should probably not assume that MTU is 1500 bytes, 1280 seems like more reasonable, to avoid fragmentation on teredo.
Also, in a previous proposal, The 8472 suggested (iirc) having the target for mutable keys be the hash of the public key + name, where name was the name of the feed. The benefit would be that you could use a single key pair for multiple key-value pairs. Now, name of a channel doesn't make sense for the generic case, but maybe we could add some other key that can be combined with the public key.
Offline
regarding using separate RPCs for mutable/non-mutable; I mostly like the idea of differentiating the concepts. I agree that it is nicer to just add two new RPCs.
In particular, I think mutable and immutable should be different *types*, by which I mean they should be stored in separate containers in the local node, and have separate limits on how many are allowed to be stored, and be evicted separately.
Considering that they need to keep track of mutability anyway it's not absolutely necessary to store them in separate containers. you can just sieve them out if you need to. But that's really an implementation detail
And considering that we're using cryptographic methods with 80bit birthday space here it's not really necessary to avoid collisions by creating separate namespaces.
Speaking of eviction, maybe the BEP should mention some ideas of how to determine what to evict when adding new items. The way I have implemented this in libtorrent is by keeping a bloom filter and a counter for each item, containing all IPs that have announced it.
Since my proposal contains strategies to reduce the number of concurrent put operations on a single key that wouldn't be a good measure.
Keeping the most-requested stuff around might work better but would ultimately allow people with more resources to push out other data.
I think first-come, first-serve basis, i.e. preferring the longest-lived data just like routing table entries is the safest approach, as long as they get refreshed of course.
Also, in a previous proposal, The 8472 suggested (iirc) having the target for mutable keys be the hash of the public key + name, where name was the name of the feed.
pubkey + id. Where id can either be a randomly generated string or a human-readable one (with preference for random ones wherever possible).
That way i was able to achieve two properties:
a) decouple the target ID from the data (which is necessary for mutable data)
b) made it easy to many (billions) of possible target IDs (both human-readable and purely random) for a single public key.
My idea for dealing with mutable and non-mutable data would be the following, nodes simply have to check the following invariants:
target ID == hash(payload) || (new_version >= current_version && target ID == hash(pubkey, uid) && check_signature(pubkey, sig,payload))
The first case is for immutable content and doesn't need any signatures, the 2nd case is fore mutable content.
Az dev
Offline
Trying to capture the spirit of caleb's proposal, I wrote this up (and implemented it in libtorrent):
http://www.rasterbar.com/products/libto … store.html
It's simpler by only using SHA-1 as hash function and only use RSA-2048 for signatures and by just adding "put" and "get". I also fixed up some of the things I mentioned in earlier posts, like sticking "token" inside the "a" dict and allowing "v" to be of other types than strings.
The thing I haven't done (yet) is having a separate ID to hash the public key with. This has another side effect of simplifying the protocol since the public key doesn't have to be sliced (as in "target" + "k" in get messages).
Comments?
Offline