Distributed Query Processing on P2P Systems
Speaker: John Colquhoun
Abstract
Today's systems that employ relational database management systems are based on the traditional client-server architecture. In this paper, we consider an alternative based on the P2P architecture. At present, P2P systems are mainly used for file-sharing applications, but this paper investigates the design of a P2P database based on the popular BitTorrent file-sharing protocol. The peers cache the results of queries and use them to satisfy the requests of other peers, thereby reducing the load on the server and producing more scalable systems. This paper introduces both the concepts and the research issues relating to P2P database systems, and highlights the differences between P2P file sharing (BitTorrent) and a P2P database.
In the system being investigated, a database table is initially located on one machine known as a seed. When other peers receive data from this machine, they can make it available to others. A central component, known as the Tracker, stores a record of which parts of the table each peer has. A peer requesting data receives from the Tracker a list of peers that could satisfy the query.
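The Tracker's role described above can be illustrated with a minimal sketch. It is not the system's actual implementation: it assumes the table is split into numbered pieces (by analogy with BitTorrent pieces), and the names `Tracker`, `announce`, and `peers_for` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class Tracker:
    """Records which pieces of the table each peer holds (hypothetical design)."""

    def __init__(self):
        # Maps a peer identifier to the set of piece indices it can serve.
        self._holdings = defaultdict(set)

    def announce(self, peer, pieces):
        """A peer reports pieces it has received and can now make available."""
        self._holdings[peer].update(pieces)

    def peers_for(self, needed):
        """Return {peer: pieces it can supply} for the pieces a query needs."""
        needed = set(needed)
        result = {}
        for peer, held in self._holdings.items():
            overlap = held & needed
            if overlap:
                result[peer] = overlap
        return result


tracker = Tracker()
tracker.announce("seed", range(8))   # assumption: the seed holds all 8 pieces
tracker.announce("peer_a", [2, 3])   # peer_a cached pieces from an earlier query
candidates = tracker.peers_for([3, 4])
print(candidates)
```

In this sketch a query that needs pieces 3 and 4 can be served entirely by the seed, while `peer_a` can supply piece 3, so part of the load can be shifted away from the seed.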
A simulator has been written to model BitTorrent and has been extended to explore the P2P database system. Various parameters, such as the upload rate of peers and the system workload, can be altered, and various statistics, including the number of I/Os performed and the query response time, can be measured. The system has been tested with a sample database, and the paper presents results analysing its behaviour. Future work will include experiments with more complex queries and sample workloads from real client-server databases.