Designing And Implementing A Bittorrent Client
For CS3, my class titled "Introduction to Software Engineering", we are choosing a final project to implement in Java.
To me, the software development process is still pretty mysterious. How does an application with tens of thousands of lines of source code get written? The purpose of this series of entries will be twofold: first, I hope it will help keep my thoughts organized and help streamline the process of writing the app, and second, I hope the next undergrad or high schooler or middle schooler who comes along and wants to read about BitTorrent application design will have something well documented to read. This first post is just to give an overview of our application and what we think will happen.
Anyways, my group, which consists of two freshmen, myself, and a senior, chose to implement a BitTorrent client. This is pretty funny, after the Institute sent us this email. (Thanks for protecting us, Caltech. I'm going to assume that since this was sent to every undergrad, it's pretty much public.)
Please pay special attention to the following notice from the Caltech Office of the General Council about downloading and public sharing of copyrighted material, especially the second paragraph.
John Hall
Dean of Students
Dear student,
From the Office of the General Council: Students should know that content owners (of copyrighted materials such as music recordings, movies, TV shows, electronic books, games, software and similar files) object to the illegal downloading and public sharing of their material. Recently, copyright holders have become much more active in asserting their rights against those who illegally download and share their materials using peer to peer (P2P) file sharing software. Recently, we received Preservation Notices asking us to preserve information about the downloading and public sharing of music files by three students at Caltech. These Notices are an alert that a subpoena may be issued to Caltech, requiring us to identify the students who were using the specified IP addresses to illegally share the material. If these subpoenas are served, Caltech will be compelled by law to identify the names and addresses of the students and a lawsuit against the students may follow. These lawsuits can cost the person responsible for the downloading and sharing tens of thousands of dollars in damages and attorney fees. An alternative often used by the content owners is to email a university like Caltech what is called an “early settlement letter” and ask that we forward this letter to the person responsible. We understand from other universities that these letters have requested an amount of at least several thousand dollars (which would not include any attorney fees a student incurs) to settle the dispute and avoid a lawsuit. While some might question whether the copyright law should restrict the sharing of music and other content, the fact is that P2P sharing of copyrighted content is illegal. As you can see from the above, this can be a very expensive way to listen to music, or watch a movie or a TV show. We urge you to not download copyrighted material unless you are certain you may do so legally, and to remove any P2P file sharing software from your computer.
Did you see that last line?
"We urge you to not download copyrighted material unless you are certain you may
do so legally, and to remove any P2P file sharing software from your computer."
Alright! We're designing banned software for our final project!
Well anyways, as we were researching our project and how much work exactly we would have to do, we tried to find a well documented minimalist bittorrent client. As of yet, we haven't found one. Azureus and Transmission are open source, but the source code is pretty intimidating.... I opened up five alphabetically consecutive files in the Azureus source package... none of the five had any comments. We haven't been able to find much good information about how these are designed.
As we thought more about the design of the client, we started to break it down into classes.
If you don't know what BitTorrent is, it's a protocol for distributing files that does not require a central server to host the files themselves. One user creates a .torrent file from a source file on his computer, say a folder of MP3s or a CD image. The torrent file contains information about the directory structure of the files in the torrent, the address of the HTTP server (the tracker) that keeps track of connected peers, and a SHA-1 hash for checking each piece of a file received. The files are requested from peers in small blocks, typically 16KB each, and checked to be valid in pieces of 32KB-4MB each. If you're still curious, pictures help. Luckily, bittorrent.org has some of those! That's also where the official specification of the protocol lives.
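To make the "checking each piece" part concrete, here is a minimal sketch of how a client might verify a downloaded piece against its expected SHA-1 hash. The class and method names (PieceCheck, pieceIsValid) are made up for illustration and are not from our project; only java.security.MessageDigest is a real JDK class.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any real client.
class PieceCheck {
    // Returns the 20-byte SHA-1 digest of the given bytes.
    static byte[] sha1(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-1").digest(data);
    }

    // True if the piece we received hashes to the value listed in the .torrent file.
    static boolean pieceIsValid(byte[] piece, byte[] expectedHash)
            throws NoSuchAlgorithmException {
        return MessageDigest.isEqual(sha1(piece), expectedHash);
    }

    public static void main(String[] args) throws Exception {
        byte[] piece = "pretend this is a piece".getBytes(StandardCharsets.US_ASCII);
        byte[] expected = sha1(piece); // in a real client this comes from the .torrent file
        System.out.println("intact piece valid: " + pieceIsValid(piece, expected));
        piece[0] ^= 1; // flip one bit to simulate corruption
        System.out.println("corrupted piece valid: " + pieceIsValid(piece, expected));
    }
}
```

A piece that fails the check would presumably be thrown away and requested again from another peer.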
So, we'll probably have a class for dealing with the file I/O operations like reading .torrent files, writing .torrent files, hashing pieces of the file, and all that. And we'll have a network I/O class for talking to trackers and peers, requesting pieces, sending and receiving status messages, and all that good stuff. And we'll probably have classes for keeping track of the numerous peers and their current status.
At first glance, this kind of seems like a lot of work. After thinking about it some more, it still does seem pretty complicated. Before writing a single line of code, here's what I think we need to do. Assuming we'll use a command line interface,
a.) parse command line arguments
b.) store names of torrents to add, from command line.
c.) load previously seeding/leeching torrents from file
d.) talk to trackers about previously active torrents
e.) look for peers for previously active torrents
f.) start transferring data
g.) load and parse torrent files specified on command line
h.) perform d.), e.), and f.) for the new torrents
i.) rinse, dry, repeat.
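Here is a rough sketch of how steps a.) through c.) and g.) might fit together: previously active torrents get started first, then any new ones named on the command line. Everything here (StartupSketch, startOrder, treating any argument ending in .torrent as a torrent to add) is an assumption for illustration, not our actual design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical startup helper; the names are invented for this sketch.
class StartupSketch {
    // Previously active torrents come first, then new ones from the command line.
    // A torrent listed in both places is only started once.
    static List<String> startOrder(List<String> previouslyActive, String[] args) {
        Set<String> ordered = new LinkedHashSet<>(previouslyActive);
        for (String arg : args) {
            if (arg.endsWith(".torrent")) { // assumption: other arguments are options
                ordered.add(arg);
            }
        }
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        // Stand-in for whatever list we would load from a saved-state file.
        List<String> saved = Arrays.asList("old1.torrent", "old2.torrent");
        for (String name : startOrder(saved, args)) {
            System.out.println("would contact tracker and find peers for " + name);
        }
    }
}
```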
We'll be using preexisting libraries for the .torrent file encoding.
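For the curious, the .torrent encoding is called bencode, and it only has four kinds of values: integers (i42e), byte strings (4:spam), lists (l...e), and dictionaries (d...e). The toy decoder below is only meant to show what such a library has to do; it is not the library we'll use. The class name BencodeSketch is made up, and it cheats by turning byte strings into Java Strings with ISO-8859-1, where a real client should keep the raw bytes (the piece hashes are binary).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy bencode decoder for illustration; malformed input just throws a runtime exception.
class BencodeSketch {
    private final byte[] data;
    private int pos = 0;

    private BencodeSketch(byte[] data) { this.data = data; }

    // Decodes one value: Long, String, List<Object>, or Map<String, Object>.
    static Object decode(byte[] data) { return new BencodeSketch(data).next(); }

    private Object next() {
        char c = (char) data[pos];
        if (c == 'i') {                       // integer: i<digits>e
            int end = indexOf('e');
            long value = Long.parseLong(ascii(pos + 1, end));
            pos = end + 1;
            return value;
        }
        if (c == 'l') {                       // list: l<values>e
            pos++;
            List<Object> list = new ArrayList<>();
            while (data[pos] != 'e') list.add(next());
            pos++;
            return list;
        }
        if (c == 'd') {                       // dictionary: d<key><value>...e
            pos++;
            Map<String, Object> dict = new LinkedHashMap<>();
            while (data[pos] != 'e') {
                String key = (String) next(); // keys are always byte strings
                dict.put(key, next());
            }
            pos++;
            return dict;
        }
        int colon = indexOf(':');             // byte string: <length>:<bytes>
        int length = Integer.parseInt(ascii(pos, colon));
        String s = new String(data, colon + 1, length, StandardCharsets.ISO_8859_1);
        pos = colon + 1 + length;
        return s;
    }

    // Index of the first occurrence of target at or after the current position.
    private int indexOf(char target) {
        int i = pos;
        while (data[i] != target) i++;
        return i;
    }

    private String ascii(int from, int to) {
        return new String(data, from, to - from, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] sample = "d4:infod6:lengthi1024e4:name5:a.txtee"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(sample));
    }
}
```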
One of my first questions (after "how can we make money from this") was, "how do you deal with all of those connections?" Each torrent (and we might have a handful or a few hundred open) can itself make hundreds of connections to other peers. Each connection can be a taxing process, what with the peer sending its status, us responding with a piece request, receiving data, checking that the data is valid, writing it to a file, and repeating. Do you spawn a new process or thread for each connection? That seems absurd. How do you manage all the sockets? Well, it turns out that Java has the java.nio.channels package, whose Selector class does multiplexed I/O for sockets. I'm not sure how this works, and I have no idea how it works in other languages, but I assume that this will do what we need. I also assume that most of my future posts about this will deal with java.nio.channels.
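As a first guess at what multiplexed I/O looks like, here is a small sketch in which one thread uses a Selector to accept a connection and read from it without blocking on any single socket. The class name SelectorSketch and the demo method are invented for this post, the "peer" is just a local socket that sends "hello" and hangs up, and a real client would keep looping over many connections instead of stopping after the first one closes.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical demo class; only the java.nio types are real JDK API.
class SelectorSketch {
    // Runs one selector loop on the calling thread and returns what the peer sent.
    static String demo() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Listen on any free loopback port, in non-blocking mode.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // Stand-in for a remote peer: connect, send five bytes, disconnect.
            try (SocketChannel peer = SocketChannel.open(server.getLocalAddress())) {
                peer.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII)));
            }

            StringBuilder received = new StringBuilder();
            ByteBuffer buf = ByteBuffer.allocate(1024);
            boolean done = false;
            while (!done) {
                selector.select(); // sleeps until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // handled keys must be removed by hand
                    if (key.isAcceptable()) {
                        SocketChannel conn = server.accept();
                        if (conn != null) {
                            conn.configureBlocking(false);
                            conn.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel conn = (SocketChannel) key.channel();
                        buf.clear();
                        int n = conn.read(buf);
                        if (n == -1) {        // peer closed the connection
                            conn.close();
                            done = true;
                        } else {
                            buf.flip();
                            received.append(StandardCharsets.US_ASCII.decode(buf));
                        }
                    }
                }
            }
            return received.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("received: " + demo());
    }
}
```

The appeal is that every connection registers with the same Selector, so one loop can service whichever sockets happen to be ready instead of parking a thread on each of them.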
One of the frosh in our group has started work on a GUI in NetBeans, but as of yet, there is no code written for this project at all. I will post updates soon; this is due in a few weeks. (; Trust me, I'm just as eager to find out what happens as you are!
Stay safe, you pirates of the interweb. And let me just remind you that there are many "legitimate" uses for BitTorrent.
-robert

