Many clients now implement the Extension Protocol, http://bittorrent.org/beps/bep_0010.html . Since the original python bdecoder was recursive, most (if not all) BitTorrent implementations copied it, and implemented it recursively. This combination exposes a potential problem which is fixed in several clients now, but maybe not all.
The messages in the extension protocol are bencoded, and a very deep structure could arrive off the network. In languages like Python and Java parsing this is not so bad; a StackOverflow exception occurs and parsing of that single message fails. However, in C/C++ a stack overflow means application death. I was, at one point a month ago, able to show that every C/C++ implementation with extension protocol support was vulnerable to this remote crash.
There are two obvious solutions. Easy: keep a depth counter, and bail after a certain depth (10 is possibly too shallow, 10000 is possibly too deep). Hard: rewrite the bdecoder to be iterative.
Indeed, my client (KTorrent) is probably vulnerable to this. Though these bugs will be quickly fixed by most clients if people with bad intentions start using this to disrupt swarms.
Since the encoding is recursive, you'll need some sort of stack to decode it, even if it's not the actual call stack. So there will still be an upper bound of some sort. I suggest agreeing to an upper bound on the acceptable structure depth, and make our programs adhere to it when creating torrents. What is the maximum depth a genuine torrent could be? Lets take that number and multiply it by ten.
Since the encoding is recursive, you'll need some sort of stack to decode it, even if it's not the actual call stack. So there will still be an upper bound of some sort. I suggest agreeing to an upper bound on the acceptable structure depth
Well, yes. With a call stack the max depth is somewhere in the tens of thousands deep. Using an iterative decoder with a heap-based stack of bencode structures you can reach tens of millions deep.
and make our programs adhere to it when creating torrents. What is the maximum depth a genuine torrent could be?
A limit of 1000 seems reasonable. Legitimate torrents can't really reach that sort of depth, at least on Windows the maximum depth is something like 260 characters, so no directory structure would be very deep. Not sure about linux / OS X.
However, bencode itself isn't theoretically limited, so we should probably assign these maximum depths to individual uses. So 1000 for .torrents, 100 for network messages, etc.
i would not standardize anything, it is not really necessary. bencoded values causing stack overflows are almost certainly artificial and far far from any realistic case, so it should be at discretion of the client implementation to deal with it, as clients may choose to use bencoding for other things too, such as storing configuration files, metadata or other internal program structures to disk.
Common sense and maybe a little footnote mentioning potential issues that should be considered when implementing a bencoder/decoder should be sufficient to cover this topic.