Peer-to-Peer (P2P) overlay network commonly does not require any central server component for data storage and lookup. This is in sharp contrast to client-server based architectures. Central servers do not scale well in terms of numbers of clients, render the system vulnerable due to the single point of failure, and are expensive to maintain. Due to these disadvantages of client-server based systems, P2P systems have recently greatly gained in popularity. P2P overlay system work on top of the IP network, typically the Internet. P2P overlays feature wide area routing, efficient search of data items, redundant storage, massive scalability, and fault tolerance.
Within P2P there are two different types of networks: structured and unstructured. In unstructured overlays (Gnutella, Kazaa), there is no fixed structure to the topology of the overlay, whereas in structured networks, a Distributed Hash Table (DHT) is used. Unstructured networks are very flexible and do not exhibit any significant amount of maintenance traffic. However, searches in unstructured overlays typically employ flooding algorithms or random walk algorithms. With flooding algorithms there is a good chance the data is found in the network, but at a cost of high volume of traffic. Random walk approaches may not find data which is not duplicated often in the network. However, unstructured overlays support wildcard searches. Searches in unstructured overlays are often referred to as Blind searches as the approach is not affected by what is actually looked for.
Structured overlay networks (Chord, CAN, D1HT, EpiChord, and Tapestry) use DHTs to structure the nodes and to assign data items to particular nodes. In such systems, each node is assigned a unique node ID. This is typically generated encoding their IP address with a secure hash function, such as SHA1. Likewise, data to be stored is assigned a file ID. Again this is generated applying a hash function on the file name or similar keyword. Each node stores data whose ID falls in a certain section of the overall ID space. Nodes locate content using a protocol, often also referred to as routing algorithm. Structured systems include short path lengths to insert/retrieve data and the guarantee that existing data will be found in the network. Structured searches are not blind as the routing of the search is based on the hash of the search string. However, structured overlays do not support wildcard searches where not the full name of the data item is known. An abbreviated name will result in a different hash which bears no relation to the hash of the data item.
This project will investigate wildcard searches in structured P2P overlay networks.
There is some ongoing research on blind search algorithms in structured overlays. However, current blind-search algorithms employ multiple indexing of data items creating a very large key base. Unlike previous work, the approach proposed here is to employ blind searches in the overlay by means of intelligently broadcasting the search. At the first glance this appears to create an enormous traffic overhead in the overlay as a message will need to be sent to every node. However, some overlays employ intelligent maintenance algorithms to inform nodes of a node joining or leaving the network. One such maintenance algorithm is EDRA* used by D1HT. This project will investigate the viability using such maintenance algorithms for wildcard searches in structured P2P overlays. An important research question will be finding solutions to minimise the message overhead.