Structures de données avancées

Structures de Données avancées ( Pr. DE Zegour)

Préface

Cours

Programmes

Sujets d'examen

Références bibliographiques

PROJET ADA

Advanced Data structures Animation

S é m i n a i r e

Introduction

Arbres équilibrés en RAM

Structures de fichiers unidimensionnelles

Structures de fichiers multidimensionnelles

Structures de fichiers distribuées

Télécharger toutes les présentations

Préface

Les structures de données constituent un concept vital dans la science des ordinateurs : quelque soit le problème que l’on veut automatiser, on se trouve toujours confronter au choix des structures de données les mieux adaptées à l’implémentation de sa solution.

Nous envisageons deux volets fort intéressants:

- les arbres de recherche binaires équilibrés en RAM

- les structures de fichiers

Les arbres constituent les structures de données les plus utilisées en RAM. Ces derniers sont efficaces seulement quand ils sont équilibrés.

Bien que les arbres 2-3 et 2-4 sont équilibrés par construction, ils consomment un espace non utilisé du aux nœuds non complètement remplis. Afin d'éviter ceci, les arbres de Anderson ( arbres AA) et de Sedgewick (arbres Red Black) ont été proposés comme des alternatives aux arbres BB (Binary B-trees) et SBB (Symmetric Binary B-trees) qui ne sont en fait que des représentations en arbre de recherche binaire des arbres 2-3 et 2-4 respectivement. En dépit de plusieurs propositions d'arbres de recherche binaire équilibrés, les arbres d'Adelson/Velscii/Landis (arbres AVL), historiquement les premiers, restent toujours les arbres les plus équilibrés.

Dans tous les sortes d'arbres cités, les algorithmes ont pour rôle de maintenir dynamiquement l'équilibrage de l'arbre. Une autre catégorie d'algorithmes permettent d'équilibrer l'arbre à postériori et périodiquement. L'algorithme de Day/Stout/Warren (DSW) is certainement l'une des méthodes les plus efficaces.

Les structures de fichiers peuvent être étudiées de plusieurs points de vue :

- Organisation directe(table d'index ou hash-code) ou organisation d'arbre ( tables d'index hiérarchisées

- Simple attribut, plusieurs attributs

- Statique ou dynamique

- Préservation de l'ordre ou pas

- Local ou distribué

Aussi, le module que nous proposons couvre trois catégories de structures de fichiers : Les structures de fichiers unidimensionnelles , multidimensionnelles et distribuées.

Afin de donner plus de clarté et plus de facilité pour ce cours, nous avons choisi trois structures de données uni-dimensionnelles de pointe : B-arbres, LH ( Linear Hashing) et TH ( Trie Hashing ). Ensuite, nous leur avons donné les généralisations dans les cas multidimensionel (LHM : Multidimensional Linear hashing, MTH : Multidimensional Trie Hashing et MBtrees : Multidimensional B-trees ) et distribué ( LH*, TH* et RP*).

Structures de fichiers uni-dimensionnelles

Les structures de fichiers uni-dimensionnelles permettent l’accès aux fichiers par une seule clé dite clé primaire. Il existe deux classes de structures de fichiers : celles à base de hachage et celles à base d’arbre.

L’une des structures de fichiers les plus populaires est celle des B-arbres et de ses variantes. Ce sont des structures de données qui préservent l’ordre des articles et de plus elles ont un comportement dynamique.

Vers les années 80, une nouvelle forme de hachage avait vu le jour grâce aux travaux de Litwin et Larson. Cette nouvelle forme de hachage, appelée hachage dynamique, a bouleversé les idées sur le hachage au sens de Knuth qui sont caractérisées par le fait que la fonction de hachage est une fonction mathématique produisant des adresses aléatoires dans un intervalle limité. Parmi les premiers travaux sur le hachage dynamique, on peut citer LH ( Linear Hashing) et TH ( Trie Hashing).

Structures de fichiers multi-dimensionnelles

Les méthodes traditionnelles pour l’accès aux données multidimensionnelles ( par plusieurs critères ) nécessitent l’emploi des tables d’index secondaires organisées sous forme de listes inversées. Ces méthodes se sont avérées très coûteuses en espace et en temps. Des méthodes modernes ont été apparues entre les années 80 et 90. La plupart des méthodes sont à base de hachage dit hachage dynamique. Ces méthodes n’utilisent pratiquement pas d ‘index et permettent un accès très rapide à l’information. Nous présenterons trois méthodes : MHL, HTM et MB-trees ; M pour multidimensionnel .

Structures de fichiers distribuées

Une nouvelle classe de structures de données a émergé cette dernière décennie. Cette classe a été baptisée SDDS : Scalable Distributed Data Structures. Elle est caractérisée par l’absence d’un site maître, l’absence de dialogue entre les clients et la scalabilté. Cette dernière caractéristique stipule que le fichier peut grandir indéfiniment sans dégradation des performances des opérations sur le fichier. Les SDDS utilisent des machines à partage de rien ( nothing sharing ). Ces machines constituent ce qu’on appelle un multiordinateur ( réseau d’ordinateurs ). Nous commençons par décrire les SDDS et par rappeler quelques termes relatifs aux réseaux avant de présenter trois classes très différentes de SDDS : Lh*, Th* et Rp*.

Autres structures de données

A coté de ces structures de données, il existe celles qui sont spécifiques à un domaine précis. Aussi les 'quadtrees', par exemple, sont des structures de données orientées images, les 'R-trees' sont des structures de données orientées données spatiales, etc.

Cours (PPT)        Préface          Programmes      Sujets d'examen    Références bibliographiques

PARTIE 0 : Introduction & Pré-requis

1. Introduction

2. Pré- requis : Structures de données classiques

3. Pré- requis : Structures de fichiers classiques

PARTIE 1 : ARBRES équilibrés EN RAM

4. Arbres AVL

5. Arbres 2-3

6. Arbres 2-4

7. Arbres RB

8. Arbres AA

9. Arbres LLRB

PARTIE 2 : STRUCTURES DE DONNéES UNIDIMENSIONNELLES

10. B-arbres

11. B+-arbres

12. B+-arbres avec expansion partielle

13. Hachage dynamique

14. Hachage linéaire

15. Hachage digital

PARTIE 3 : STRUCTURES DE DONNéES MULTIDIMENSIONNELLES

16. Concept du multidimensionnel

17. B-arbres multidimensionnels

18. Hachage linéaire multidimensionnel ( LHM )

19. Hachage digital multidimensionnels ( THM )

Animation

PARTIE 4 : STRUCTURES DE DONNéES DISTRIBUéES

20. Structures de données distribuées

21. Concept réseau

22. Hachage linéaire distribué ( LH* )

23. B-arbres distribués ( RP* )

24 Hachage digital distribué ( TH* )

Programmes             Préface          Cours        Sujets d'examen    Références bibliographiques

Fonction de mapping dans LHM
Tableaux extensibles
Sujets d'examen      Préface        Cours     Programmes      Références bibliographiques
Sujet 2003 - Sujet 2004   - Sujet 2005(1) - Sujet 2005(2) - Sujet 2006 - Sujet 2007 (1) - Sujet 2007(2) -

Sujet 2008 - Sujet 2009 - Sujet 2010   - Sujet 2011 - Sujet 2012 - Sujet 2013 - Sujet 2015 - Sujet 2016

Références bibliographiques            Préface           Cours     Sujet d'examens    Programmes


ADEL’son - VELSKII, G. M., and Y. M. LANDIS [1962] “An algorithm for the organization of information”.
ANDERSON A. [1993] “Balanced Search trees made simple”. Workshop on algorithms and data structures, pages 60-71, Springer Verlag.

Bennour, F. Diène, A. W. Ndiaye Y. Litwin, W. Scalable and Distributed Linear Hashing LH*LH under Windows NT. SCI-2000 Orlando, Florida, USA. July 23-26, 2000.

Bayer, R. and McCreight, E. Organization and Maintenance of largeorde red indexs. Acta Informatica, 1:173-189, 1972.

Bayer, R [1972] “Symmetric Binary B-Trees”: Data structure and maintenance algorithms. Acta Informatica 1(4):290-306, 1972.

Burkhard W.A. Interpolation-based index maintenance. BIT 23 (1983), 274-294.

Devine, R. Design and Implementation of DDH: Distributed Dynamic Hashing. Int. Conf. on Foundations of Data Organizations, FODO93. 12 VLDB1994, Chile. Lecture Notes in Comp. Sc., SpringerVerlag (publ.), Oct. 1993.

Diène, A. W. Litwin, W. Performance Measurements of RP*: A Scalable Distributed Data Structure for Range Partitioning. 2000 Intl. Conf. onInformation Society in the 21st Century: Emerging Techn. and New Challenges. Aizu City, Japan, 2000.

Diène, A. W. Litwin, W. Implementation and performance measurements of the RP* Scalable Distributed Data Structure for Windows Multicomputers. PADDA 2001. Munich, Germany, April 19th-20th,2001.

GUIBAS, L.J. and SEDGEWICK R [1978] “A dichromatic framework for balanced trees”. Proceedings of the 19th Annual Symposium on Foundations of Computer Science (1978).

J. S. Karlson, W. Litwin and T. Risch. LH*LH: A Scalable High Performance Data Structure for Switched Multicomputers. In Advances in Database Technology - EDBT'96, pages 573-591, Avignon, France, March 1996. Springer.

Kroll, B., Widmayer, P. Distributing a Search Tree Among a Growing Number of Processors. To app. at ACMSIGMOD Int. Conf. On Management of Data, 1994.

P. A. Larson. Dynamic Hashing. BIT, pages 184-210, 1978.

Litwin, W. Linear Hashing : a new tool for file and tables addressing. Reprint from VLDB-80.

Litwin, W. Redundant Arrays of LH* files for high availability and security. Techn. Note GERM Paris 9 & Distributed Inf. Techn. Dep. HPL Palo Alto, Sept. 1995.

Litwin, W., Karlsson et Risch J. LH*lh: A Scalable High Performance Data Structure for Switched Multicomputers. T. Intl. Conf. on Extending

Litwin, W., Menon, J., Risch, T., Schwarz, Th. Design Issues For Scalable Availability LH* Schemes with Record Grouping. DIMACS Workshop on Distributed Data and Structures, Princeton U. Carleton Scientific, (publ.)1999.

Litwin, W., Neimat, M-A. k-RP*S : A Scalable Distributed Data Structure for High-Performance Multi-Attribute Access. Res. Rep. GERM Paris 9 & Distributed Inf. Techn. Dep. HPL Palo Alto, April 1995.

Litwin, W., Neimat, M-A. High-Availability LH* Schemes with Mirroring. Intl. Conf. on Coope. Inf. Syst. COOPIS-96, Brussels, 1996.

Litwin, W. Neimat, M-A., Schneider, D. LH* : Linear Hashing for Distributed Files. ACM-SIGMOD Int. Conf. On Management of Data, 1993.

Litwin, W., Neimat, M-A., Schneider, D. LH*: A Scalable Distributed Data Structure. (Nov. 1993). Submitted for journal publ

Litwin, W. Neimat, M-A, Schneider, D. RP*: A Family of Order-Preserved Scalable Distributed Data Structures. 20th Intl. Conf. On Very Large Data Bases (VLDB), 1994.

Litwin, W. Schwarz, Th. LH*RS: A High-Availability Scalable Distributed Data Structure using Reed Solomon Codes (CERIA Res. Rep. & ACM-SIGMOD, 2000, Dallas)

Lanin, V., Shasha, D. A Symmetric Concurrent B-tree Algorithm. Proc. FJCC, 380-389. (1986).

Litwin, W., W. and Schwarz, J. E.: LH*RS: A High-Availability Scalable Distributed Data Structure using Reed Solomon Codes. CERIA Res. Rep. 99-2, U. Paris 9, 1999.

Litwin, W., Risch, T. LH*g : a High-Availability Scalable Distributed Data Structure through Record Grouping. Rees Rep. CERIA, U. Dauphine &U. Linkoping (May. 1997).

Lehman, T., Yao, S. Efficient Locking for Concurrent Operations on B-trees. ACM TODS, 6(4) December 1981.

Litwin, W. Linear hashing : A new tool for files and tables addressing. VLDB 80, ACM, (Sep 1980),

Litwin, W. Trie hashing. SIGMOD 81. ACM, (May 1981), 19-29.

Litwin W.Trie hashing: Further properties and performance. Int. Conf. F.D.O Kyoto May 1985.

Litwin, W., Lomet, D. Bounded Disorder Access Method. 2-nd Int. Conf. on Data Eng. IEEE, Los Angeles, (Feb. 1986).

Litwin, W. Neimat, MA., Schneider, D. LH* : Linear Hashing for Distributed Files. ACMSIGMOD Int. Conf. On Management of Data, 1993.

Litwin, W., Neimat, MA., Schneider, D. RP* : A Family of Order Preserving Scalable Distributed Data

Structures. Proc. Of 2O th conf. VLDB, chile 1994

W. Litwin, T.J.E Schwarz. LH*RS : A High-Availability Scalable Distributed Data Structures using Reed Solomon Codes. ACM-Sigmod 2000, Dallas

Lomet, D. Bounded Index Exponential Hashing. ACM TODS, 8, 1. (Mar 1983), 136-165.

Matsliach, G., Shmueli, O. Distributing a B+tree in a loosely coupled environment. Inf. Proc. Letters, 34, 1990, 313321.

Matsliach, G., Shmueli, O. An Efficient Method for Distributing Search Structures. IEEEPDIS Conf., 1991.

Ndiaye, Y. Diène, A. W., Litwin, W. Risch, T. Scalable Distributed Data Structures for High-Performance Databases, WDAS 2000, Italia.

Ndiaye, Y. Diène, A. W. Litwin, W. Risch, T. AMOSSDDS: A Scalable Distributed Data Manager for Windows Multicomputers. 14th International Conference on Parallel and Distributed Computing Systems (PDCS-2001). Richardson, Texas, USA, August 8-10, 2001.

Otoo E.J. Multikey trie hashing for scientific and statistical databases. CODATA (North Holland) 1987.

Ouksel, M.   Scheuerman,   P.   Storage Mapping for Multidimensional Linear Dynamic Hashing. PODS 83. ACM, (March 1983), 90-105.

Perrizo, W., Lin, J., Hoffman, W. Algorithms for Distributed Query Processing in Broadcast Local Area Networks. IEEE TKDE, 1, 2, 1989, 215225.

SEDGEWICK R. [2008]: “Left leaning Red Black trees. http://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf

Torenvliet, L., Van Emde Boas, P. The Reconstruction and Optimization of Trie Hashing Functions. VLDB 83, (Nov. 1983), 142-157.

D.E Zegour, Litwin W, and Levy G. Multilevel Trie hashing. Int. Conf. VLDB, Venise, Italy. 1987.

D.E Zegour, W. Litwin. Trie hashing with the sequential representations of the trie. Revue internationale des technologies avancées. CDTA, Alger. 1994.

D.E Zegour, W.K Hidouci. Comparison of B-trees and trie hashing for multidimensional access. 4th Maghrebian conference on software technology and artificial intelligence. Algiers 96.

D.E Zegour, D. Boukhelef. Hi* : A new hash-based multidimensional access. WDAS 2002. Workshop on Distributed Data structures. Paris 21-22 –23 mars 2002

D.E Zegour. Adaptation of trie hashing for distributed environments. WDAS 2003. Workshop on Distributed Data structures. Greece 13 – 14 June 2003

D.E Zegour Scalable Distributed Compact Trie Hashing. Information Software Technology. Elseiver, 2004