Notes on P2P searching

Initially written 09 Mar 2002

Updated 2003-03-01

1. Introduction

Compared to a client-server connection, peer-to-peer distributes files far more efficiently. Instead of routing every transfer through one central server, a servent can fetch a file from any peer that already has it or is still downloading it. Bandwidth is spread across all users, which makes P2P a very exciting technology. However, current P2P implementations lack one very important aspect: decent searching.

It makes sense to concentrate on searching P2P networks, as actual transfer of files is trivial compared to searching. This document is not concerned with downloading technologies, such as swarming, parallel downloading, and resuming.

2. Metadata

Specifications of individual files, used in searches. General metadata includes:

Filename

Most sharing programs make use of the filename, though some like Freenet use a hash instead. Common filenames are artist - title and artist - album - track no - title.

Size

In bytes, can be used to derive time required to download. Some servents like WinMX 3.0 are able to download from users who have not finished downloading the file, so you'll see things like "29% of 5,464,064".

Hash

Allows the program to quickly identify whether files of the same size are identical, without comparing them byte by byte. Blubster uses MD5, but other P2P networks use SHA1 (which is superior: a 160-bit digest versus MD5's 128 bits, and better collision resistance).
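
As a rough illustration of the Hash entry above, the sketch below hashes a file with SHA1 in fixed-size chunks and uses the digest to decide whether two same-sized files are identical. The function names (sha1_of_file, same_content) and the 64 KB chunk size are assumptions made for this example, not part of any servent mentioned here.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=64 * 1024):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read until f.read() returns the empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Cheap size check first, then compare digests (hypothetical helper)."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return sha1_of_file(path_a) == sha1_of_file(path_b)
```

In practice a servent would compute each digest once and cache it, so that later comparisons only involve the short digests rather than the files themselves.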

Three types of metadata are:

Filesystem

Gained from filesystem, for example Filename and Size.

Internal

From within file's contents, most metadata falls in this category.

External

Stored in a database separate from the file; editing it does not affect the contents of the file. Fasttrack's keywords and description are examples of this.

2.1. Audio

Audio is the killer file type for P2P networks. Napster pioneered audio sharing, and some newer sharing programs such as the 23 million user Audiogalaxy and the ever-growing Blubster only allow sharing of audio.

2.1.1. MP3 ID Tags

ID3v1 tag, exactly 128 bytes long, located at very end of file.

Title
Limited to 30 characters.
Artist
Limited to 30 characters.
Album
Limited to 30 characters.
Year
Limited to 4 characters.
Comment
Limited to 30 characters. In ID3v1.1, last byte may specify track number.
Genre
One byte, 0-115, see table.
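
The field list above maps directly onto a fixed-width record, so a parser can be sketched in a few lines. This is a minimal sketch, not a reference implementation: it assumes the tag begins with the 3-byte marker "TAG" (which, with the 125 bytes of fields listed above, makes up the 128 bytes), that text is Latin-1, and that an ID3v1.1 track number is signalled by a zero byte just before the last comment byte. The names parse_id3v1 and read_id3v1 are made up for this example.

```python
import struct


def parse_id3v1(tag):
    """Parse a 128-byte ID3v1/ID3v1.1 tag into a dict, or return None."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        return None
    # Skip the 3-byte marker, then title, artist, album, year, comment, genre.
    title, artist, album, year, comment, genre = struct.unpack(
        "<3x30s30s30s4s30sB", tag
    )

    def text(raw):
        # Fields are padded with NULs (sometimes spaces); cut at the first NUL.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    track = None
    # ID3v1.1: zero byte at comment[28], track number in comment[29].
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "track": track,
        "genre": genre,
    }


def read_id3v1(path):
    """Read the last 128 bytes of a file and try to parse them as a tag."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to end of file to learn its size
        if f.tell() < 128:
            return None
        f.seek(-128, 2)
        return parse_id3v1(f.read(128))
```

The genre is returned as the raw byte; turning it into a name needs the genre table, which is not reproduced here.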

ID3v2 tag (specifications), relevant fields include:

2.1.2. MPEG Information

Length
Duration of audio in seconds, often converted to hh:mm:ss.ff.
Bitrate
Measured in kbps, kilobits of data per second. The bitrate can be constant (CBR), averaged (ABR), or variable (VBR).
Frequency
Samples per second, 44100Hz, 48000Hz, or 32000Hz.
Channel Mode
Stereo, joint-stereo, dual-channel stereo, or single channel mono.
CRCs
If used, can verify data's integrity. 16 bits following each frame header.
Copyrighted, Original, Private
Flag bits: set if file is copyrighted, set if not copy of original media, and set for application-specific purposes.
Emphasis
None, 50/15 µs, reserved, or CCITT J.17.
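
Most of the fields above come from the 4-byte header at the start of each MPEG audio frame. The sketch below decodes one such header under stated assumptions: it handles only MPEG-1 Layer III (the usual MP3 case), rejects the "free" bitrate index, and uses the bit layout as commonly documented for MPEG audio. The names parse_frame_header and cbr_length_seconds are hypothetical, and the length estimate is only meaningful for CBR files.

```python
import struct

# MPEG-1 Layer III lookup tables; index 0 ("free") and 15 are not handled.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]
EMPHASIS = ["none", "50/15 us", "reserved", "CCITT J.17"]


def parse_frame_header(header):
    """Decode a 4-byte MPEG-1 Layer III frame header, or return None."""
    if len(header) != 4:
        return None
    (word,) = struct.unpack(">I", header)
    if word >> 21 != 0x7FF:          # 11 sync bits, all set
        return None
    version = (word >> 19) & 0x3     # 0b11 means MPEG-1
    layer = (word >> 17) & 0x3       # 0b01 means Layer III
    if version != 0b11 or layer != 0b01:
        return None
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    frequency = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate is None or frequency is None:
        return None
    return {
        "bitrate_kbps": bitrate,
        "frequency_hz": frequency,
        "has_crc": ((word >> 16) & 1) == 0,   # protection bit 0 = CRC present
        "private": bool((word >> 8) & 1),
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0x3],
        "copyrighted": bool((word >> 3) & 1),
        "original": bool((word >> 2) & 1),
        "emphasis": EMPHASIS[word & 0x3],
    }


def cbr_length_seconds(audio_bytes, bitrate_kbps):
    """Estimate duration of a CBR stream: bits divided by bits per second."""
    return audio_bytes * 8 / (bitrate_kbps * 1000)
```

For instance, cbr_length_seconds(5464064, 128) comes to roughly 341.5 seconds, assuming the whole file is audio data at a constant 128 kbps. VBR and ABR files need the frame count or a per-frame walk instead of a single division.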

3. Existing Search Capabilities

Protocol     Filename  Artist  Title  Bytes  Length  Bitrate  Frequency  Hash
Gnutella     X         X       -      -      -       -        -          -
Blubster     X         -       -      X      X       X        X          X
OpenNap      X         -       -      X      X       X        X          X
Audiogalaxy  -         X       X      X      X       X        -          -

3.1. OpenNap

Reverse-engineered protocol used by Napster. Specification.

As shown above, Napster includes:

3.2. Gnutella

Once the most metadata-lacking network (as shown above), servents are now beginning to add and recognize metadata on shared files. Yet most are limited to:

BearShare, however, adds:

LimeWire can optionally use XML metadata.

3.3. Blubster

3.4. Fasttrack

King of metadata. This protocol has capabilities to store and transmit tons of information about many types of media and information. All file types can have:

Audio:

Document:

Image:

Other:

Software:

Video:

3.5. Shareaza

All:

Application:

Audio (broken):

Book:

Image:

Video:

URLs

magnet:?xt=urn:bitprint:SHA1&dn=filename - magnet-uri project, Bearshare, Xolox, Shareaza

gnutella://urn:bitprint:SHA1

urn:sha1:SHA1 - linked to by Bitzi

ed2k: - eDonkey2000, based on precursor to MD5

sig2dat:/// "UUHash" - FastTrack, first 300k is MD5, rest is custom

http://bitzi.com/lookup/SHA1 - publishes metadata, used by BearShare, Limewire, Shareaza, Xolox, Acquisition, Mutella, Atomwire, FreeAmp (music player)
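
To make the URL forms above more concrete, here is a small sketch that builds and parses a magnet link carrying a SHA1 hash and a display name. It uses the simpler urn:sha1 form rather than urn:bitprint, and assumes the hash is written in Base32 (32 characters for a 160-bit digest). The helper names sha1_base32, make_magnet, and parse_magnet are invented for this example.

```python
import base64
import hashlib
from urllib.parse import parse_qs, quote, urlsplit


def sha1_base32(data):
    """Return the SHA1 digest of data as a 32-character Base32 string."""
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


def make_magnet(data, filename):
    """Build magnet:?xt=urn:sha1:<hash>&dn=<filename> for the given bytes."""
    return "magnet:?xt=urn:sha1:%s&dn=%s" % (sha1_base32(data), quote(filename))


def parse_magnet(uri):
    """Extract the xt (exact topic) and dn (display name) parameters."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        return None
    params = parse_qs(parts.query)
    return {
        "xt": params.get("xt", [None])[0],
        "dn": params.get("dn", [None])[0],
    }
```

Because the hash identifies the content rather than a location, the same link can be resolved on any network that indexes files by SHA1.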

  • JivePlayer - portable media files

    Audio part hash - allows files with the same audio data but different ID tags to be grouped together (very good). Hashing is essential to swarming.
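
One way an audio part hash could work is sketched below: strip a leading ID3v2 tag and a trailing ID3v1 tag, then hash only what remains. This is an illustration under assumptions, not how any particular network does it. It assumes ID3v2 starts with "ID3" and stores its size as four 7-bit ("synchsafe") bytes at offsets 6 to 9, assumes ID3v1 is the last 128 bytes starting with "TAG", and ignores other tag formats and the optional ID3v2 footer. The names strip_id3 and audio_part_hash are hypothetical.

```python
import hashlib


def strip_id3(data):
    """Return data without a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4 size bytes.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)  # 7 useful bits per byte
        start = min(10 + size, end)

    # ID3v1: the final 128 bytes, beginning with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_part_hash(data):
    """SHA1 over the audio bytes only, so retagging does not change it."""
    return hashlib.sha1(strip_id3(data)).hexdigest()
```

Two copies of a track that differ only in their tags then share one audio part hash even though their whole-file hashes differ, which is what lets them be grouped as sources for the same download.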


    Modified Sun Mar 25 08:48:47 2007 generated Sun Mar 25 08:56:33 2007
    http://jeff.tk/p2p/search.html