Data structures for a large-scale social network


2nd year Masters’ Student Project and Internship

Background

A social network is based on complex data structures, for instance the graph of ``buddy’’ links between users. Specific invariants need to hold : for instance, in Facebook, the buddy graph is symmetric (if X is Y’s buddy, then Y is X’s buddy). Consistency with respect to such invariants is problematic for scalability. Thus, for instance, Facebook executes all its buddy graph updates at a single data centre, which raises issues of cost, latency and availability.

We suggest to replicate data close to their users, either in geographically-distributed data centres, or in their homes in a peer-to-peer mode. For scalability, we use optimistic replication, which allows a replica to update without synchronising with the others ; the system ensures eventual consistency by reconciling conflicts between concurrent updates after the fact. Furthermore, partial replication creates a replica only if the local site is directly interested in the data.

Our Telex platform supports partial and optimistic replication. It ensures data replication, conflict detection and resolution, reconciliation, and maintains application invariants. Objectives and pre-requisites The objective is to study the requirements of social networks, and to design application-oriented data structures that are suited to optimistic and partial replication. This will start with a bibliographical study of existing social networks, of conflict avoidance techniques, and of scalability techniques. Then, an experimental model of relevant data structures and their updates shall be built. Improvements to these structures, invariants and algorithms, may be proposed in order to improve performance and scalability, while maintaining the essential characteristics of the applications. Finally, these proposals will be validated experimentally on the model.

The intern should be interested in distributed and/or peer-to-peer systems, and in social networking applications ; to have a good knowledge of distributed algorithms, and good programming skills ; to be capable of implementing large-scale experiments.

Contact and other information

Send your application to the advisors (see below) with the following information :

- A resume or curriculum vitæ

- The list of the courses taken during the last two years and the corresponding marks

- The name and contact information of two references (we will contact them ourselves.)

- A short essay (1—2 pages) on the proposed topic. It is free form, but here are some suggestions. Using the bibliography and your technical knowledge, motivate your interest in the topic and your qualifications for it ; discuss scientific and practical issues, problems and difficulties, possible solutions, relate the issues with your own experience, etc.

- Optionally : Any article, report or thesis (in English or French), and any code you wrote in the past year.

The internship will take place in the Regal group (INRIA & LIP6) at LIP6, 104 avenue du Président Kennedy, Paris, France. Duration : 3 to 6 months, depending on the requirements of the school.

Advisors

- Marc Shapiro, Senior Researcher at INRIA

- Lamia Benmouffok, PhD student in the Regal group.

Bibliography

- A commutative replicated data type for cooperative editing. Preguiça, Marquès, Shapiro, Letia. Int. Conf. on Distributed Computing Systems, Montréal (ICDCS), Canada, June 2009. PDF

- Telex : A Semantic Platform for Cooperative Application Development. Benmouffok et al. Conférence Française de Systèmes d’Exploitation (CFSE), Toulouse, September 2009. PDF