A simple Database management system

Overview

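Before setting out on the simpledb journey, let's first look at the project as a whole.

SimpleDb is a DBMS (database management system) covering storage, operators, optimization, transactions, indexes, and more. It walks through how to implement a DBMS from scratch, and the course is arguably a prerequisite for studying TiDB and other distributed databases.

Project documentation: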

lab1 - Storage

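lab1 is mainly about storage, that is, working with files, pages, the bufferPool, and so on.

  • TupleDesc: describes the metadata of each column in a table, such as the column's type
  • Tuple: represents one row of data
  • Page: represents one page of a table. A page consists of a header and a body: the header is a bitmap recording which positions in the body hold data, and the body stores the Tuples one after another
  • DbFile: in SimpleDb, each Table is stored in one file, and each file contains a number of pages
  • BufferPool: SimpleDb's caching component, which works even better when paired with an LRU cache. It is the most central component of the whole system: any access to a page has to go through bufferPool.getPage()
  • Catalog: SimpleDb's global catalog, which holds the mapping between tableid and table, among other things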
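The header bitmap and the LRU pairing described above can be sketched as follows. This is a minimal illustration, not SimpleDb's own code: HeaderBitmap and LruPageCache are hypothetical names, the bit order (slot i stored in byte i / 8, counted from the least significant bit) is an assumption, and K and V stand in for a page id and a page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading and writing slot bits in a page header.
// Assumption: slot i lives in byte i / 8, at bit i % 8 counted from the least significant bit.
final class HeaderBitmap {
    static boolean isSlotUsed(byte[] header, int slot) {
        return ((header[slot / 8] >> (slot % 8)) & 1) == 1;
    }

    static void markSlot(byte[] header, int slot, boolean used) {
        int mask = 1 << (slot % 8);
        if (used) {
            header[slot / 8] |= mask;   // set the bit: the slot now holds a tuple
        } else {
            header[slot / 8] &= ~mask;  // clear the bit: the slot is free again
        }
    }
}

// Hypothetical LRU cache keyed by page id, built on LinkedHashMap's access order.
final class LruPageCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruPageCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // drop the least recently used entry once over capacity
    }
}
```

A real BufferPool would also have to decide what to do with dirty pages before evicting them; the sketch ignores that.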

lab2 - Operators & Volcano

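lab2 is mainly about developing operators, i.e. the various Operators such as seqScan, join, and aggregation.

Note that SimpleDb's processing model is the volcano model: every operator implements the same interface, OpIterator.

  • SeqScan: the operator that scans a table sequentially; it needs some caching
  • Join + JoinPredicate: the join operator; you can implement a simple nestedLoopJoin or a sortMergeJoin yourself
  • Filter + Predicate: the filter operator, mainly used to evaluate the conditions after where
  • Aggregate: the aggregation operator, mainly used for aggregate functions such as sum()
  • Insert / Delete: the insert and delete operators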

As for the Volcano model, the figure gives an example; lab2 covers it in more detail.

(Figure: example of the Volcano model)
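In code, the pull-based idea behind the Volcano model might look like the sketch below. RowIterator is a simplified stand-in for OpIterator (not its actual signature), rows are plain int[] arrays, and ListScan, FilterOp, and VolcanoDemo are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Simplified stand-in for OpIterator: every operator exposes the same pull interface.
interface RowIterator {
    void open();
    boolean hasNext();
    int[] next();
    void close();
}

// Leaf operator: yields rows from an in-memory list, standing in for SeqScan.
final class ListScan implements RowIterator {
    private final List<int[]> rows;
    private Iterator<int[]> it;

    ListScan(List<int[]> rows) { this.rows = rows; }

    public void open() { it = rows.iterator(); }
    public boolean hasNext() { return it != null && it.hasNext(); }
    public int[] next() {
        if (!hasNext()) throw new NoSuchElementException();
        return it.next();
    }
    public void close() { it = null; }
}

// Filter pulls rows from its child one at a time and passes on only the matching ones.
final class FilterOp implements RowIterator {
    private final RowIterator child;
    private final Predicate<int[]> predicate;
    private int[] pending; // next matching row, fetched lazily by hasNext()

    FilterOp(RowIterator child, Predicate<int[]> predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    public void open() { child.open(); pending = null; }

    public boolean hasNext() {
        while (pending == null && child.hasNext()) {
            int[] row = child.next();
            if (predicate.test(row)) pending = row;
        }
        return pending != null;
    }

    public int[] next() {
        if (!hasNext()) throw new NoSuchElementException();
        int[] row = pending;
        pending = null;
        return row;
    }

    public void close() { child.close(); pending = null; }
}

final class VolcanoDemo {
    // Roughly "SELECT id FROM t WHERE value >= 20" over rows of the form {id, value}.
    static List<Integer> matchingIds() {
        List<int[]> table = Arrays.asList(new int[]{1, 10}, new int[]{2, 20}, new int[]{3, 30});
        RowIterator plan = new FilterOp(new ListScan(table), row -> row[1] >= 20);
        List<Integer> ids = new ArrayList<>();
        plan.open();
        while (plan.hasNext()) ids.add(plan.next()[0]);
        plan.close();
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(matchingIds());
    }
}
```

The root of the plan drives execution: each call to next() on FilterOp pulls just enough rows from its child to produce one result, so operators compose without knowing what sits below them.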

lab3 -- Query Optimization

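This lab mainly covers simple approaches to data estimation and join optimization.

  • Use histograms to estimate predicate selectivity
  • Use left-deep trees and a dynamic programming algorithm for the Join Optimizer
  • The amount of code is fairly small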
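Histogram-based selectivity estimation can be sketched roughly as below. This is an illustrative equal-width histogram over integers under a uniform-within-bucket assumption; IntHistogramSketch and its method names are hypothetical and not the lab's required API.

```java
// Hypothetical equal-width histogram over integer values in [min, max].
final class IntHistogramSketch {
    private final int[] counts;
    private final int min;
    private final int max;
    private final double width; // number of integer values covered by one bucket
    private int total;

    IntHistogramSketch(int buckets, int min, int max) {
        int n = Math.max(1, Math.min(buckets, max - min + 1)); // never more buckets than values
        this.counts = new int[n];
        this.min = min;
        this.max = max;
        this.width = (max - min + 1) / (double) n;
    }

    private int bucketOf(int v) {
        return Math.min(counts.length - 1, (int) ((v - min) / width));
    }

    // Assumes min <= v <= max.
    void addValue(int v) {
        counts[bucketOf(v)]++;
        total++;
    }

    // Estimated fraction of rows with column = v, assuming values are uniform within a bucket.
    double selectivityEquals(int v) {
        if (total == 0 || v < min || v > max) return 0.0;
        return (counts[bucketOf(v)] / width) / total;
    }

    // Estimated fraction of rows with column > v.
    double selectivityGreaterThan(int v) {
        if (total == 0 || v >= max) return 0.0;
        if (v < min) return 1.0;
        int b = bucketOf(v);
        double bucketRight = min + (b + 1) * width;                   // exclusive right edge of bucket b
        double fraction = Math.max(0.0, (bucketRight - v - 1) / width); // share of bucket b above v
        double rows = counts[b] * fraction;
        for (int i = b + 1; i < counts.length; i++) rows += counts[i]; // buckets entirely above v
        return rows / total;
    }
}
```

With the values 1 to 100 spread over 10 buckets, the estimate for "column > 45" is 0.55 and for "column = 50" is 0.01, which matches the true fractions because the data really is uniform; skewed data would make the estimates approximate.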

The flow is shown in the following figure:

(Figure: lab3 flow chart)

lab4 -- Transaction

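Lab 4 asks us to implement transactions based on the 2PL protocol. First, here is how transactions are implemented in simpleDB:

(Figure: how a transaction is implemented in SimpleDB)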

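In SimpleDB, every transaction has a Transaction object, and a TransactionId uniquely identifies the transaction; the TransactionId is obtained automatically when the Transaction object is created. Before a transaction starts, a Transaction object is created, and its transactionId is passed into every operator of the SQL execution tree. Locks are acquired according to the page being locked, the lock type, and the id of the locking transaction.

For example, the bottom-level seqScan operators on A and B take read locks on the corresponding pages.

We know that pages are always fetched through bufferPool.getPage(), so the locking logic lives in bufferPool.getPage().

Concretely, we implement a lockManager, which keeps, for each page, the queue of transactions that hold its lock.

When a transaction finishes, transactionComplete is called to do the final processing. transactionComplete handles success and failure separately: on success, it writes the dirty pages belonging to the transaction id to disk; on failure, it evicts the transaction's dirty pages from the bufferpool and reloads the original data pages from disk. Once the dirty pages have been handled, it releases the locks the transaction id holds on all data pages.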
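The commit and abort paths of transactionComplete could be sketched like this. TinyBufferPool is a hypothetical in-memory model, not SimpleDb's BufferPool: pages are plain strings, a HashMap stands in for the disk, and lock release is only marked by a comment.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of a buffer pool that tracks which transaction dirtied each page.
final class TinyBufferPool {
    static final class CachedPage {
        String data;
        Long dirtiedBy; // id of the transaction that dirtied the page, or null if clean

        CachedPage(String data) { this.data = data; }
    }

    final Map<Integer, String> disk = new HashMap<>();      // stand-in for the table file
    final Map<Integer, CachedPage> cache = new HashMap<>(); // stand-in for the buffer pool

    // Returns the cached page, loading it from "disk" on a miss.
    CachedPage getPage(int pageId) {
        return cache.computeIfAbsent(pageId, id -> new CachedPage(disk.get(id)));
    }

    void write(long tid, int pageId, String data) {
        CachedPage page = getPage(pageId);
        page.data = data;
        page.dirtiedBy = tid;
    }

    void transactionComplete(long tid, boolean commit) {
        Iterator<Map.Entry<Integer, CachedPage>> it = cache.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, CachedPage> entry = it.next();
            CachedPage page = entry.getValue();
            if (page.dirtiedBy == null || page.dirtiedBy != tid) continue;
            if (commit) {
                disk.put(entry.getKey(), page.data); // success: flush the dirty page
                page.dirtiedBy = null;
            } else {
                it.remove(); // failure: discard it, so the next getPage() re-reads the old version
            }
        }
        // A full implementation would release every lock held by tid at this point.
    }
}
```

Discarding on abort only works because uncommitted changes never reach the disk in this model.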

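  • Implement a LockManager that tracks the locks held by each transaction and manages them.
  • Implement a LifeTime lock, i.e. a bounded-wait strategy
  • Implement DeadLock detection, either with wait timeouts or by checking a dependency graph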
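A page-level lock manager with a bounded wait might be sketched as follows. LockManagerSketch is a hypothetical class: it uses the timeout variant of deadlock handling (a request that cannot be granted in time returns false, and the caller is expected to abort the transaction), not a dependency graph, and page ids are plain ints.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lock manager: shared/exclusive locks per page, bounded wait on conflict.
final class LockManagerSketch {
    private static final class PageLock {
        final Set<Long> holders = new HashSet<>(); // transactions currently holding the lock
        boolean exclusive;                          // true when the single holder has a write lock
    }

    private final Map<Integer, PageLock> locks = new HashMap<>();

    // Returns true if the lock was granted within timeoutMillis, false if the wait timed out.
    synchronized boolean acquire(long tid, int pageId, boolean wantExclusive, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        PageLock lock = locks.computeIfAbsent(pageId, id -> new PageLock());
        while (!canGrant(lock, tid, wantExclusive)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return false; // bounded wait exceeded: treat as a possible deadlock
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        lock.holders.add(tid);
        lock.exclusive = lock.exclusive || wantExclusive;
        return true;
    }

    private static boolean canGrant(PageLock lock, long tid, boolean wantExclusive) {
        if (lock.holders.isEmpty()) return true;
        boolean soleHolder = lock.holders.size() == 1 && lock.holders.contains(tid);
        if (soleHolder) return true;              // repeated request or shared-to-exclusive upgrade
        return !wantExclusive && !lock.exclusive; // readers may share with other readers
    }

    // Called when a transaction completes: drop all of its locks and wake any waiters.
    synchronized void releaseAll(long tid) {
        for (PageLock lock : locks.values()) {
            if (lock.holders.remove(tid) && lock.holders.isEmpty()) {
                lock.exclusive = false;
            }
        }
        notifyAll();
    }
}
```

Under 2PL, acquire would be called from bufferPool.getPage() and releaseAll only once the transaction commits or aborts.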

lab5 -- B+ tree

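lab5 is mainly about implementing a B+ tree index, with lookup, insertion, deletion, and related operations.

  • Lookup simply recurses down the tree, following the properties of the B+ tree
  • Insertion has to handle node splits (when a node's tuples are full)
  • Deletion has to handle redistributing entries within nodes (when one page is fairly empty and an adjacent page is fairly full) and merging sibling nodes (when two adjacent pages are both fairly empty)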
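The recursive lookup from the first bullet can be sketched on an in-memory tree. BPlusNode is a hypothetical class, not the lab's BTreeFile page types, and the key convention (child i holds keys smaller than keys[i], the last child holds the rest) is an assumption made for the example. Splitting and merging are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory B+ tree node; real pages would live in a file behind the buffer pool.
final class BPlusNode {
    final boolean leaf;
    final List<Integer> keys = new ArrayList<>();
    // Internal nodes only: always keys.size() + 1 children.
    final List<BPlusNode> children = new ArrayList<>();

    BPlusNode(boolean leaf) { this.leaf = leaf; }

    // Descends from this node to the leaf that would hold the key.
    // Assumed convention: child i holds keys < keys[i]; the last child holds keys >= the last key.
    BPlusNode findLeaf(int key) {
        if (leaf) return this;
        int i = 0;
        while (i < keys.size() && key >= keys.get(i)) i++;
        return children.get(i).findLeaf(key);
    }

    boolean contains(int key) {
        return findLeaf(key).keys.contains(key);
    }
}
```

The depth of the recursion equals the height of the tree, which is what keeps lookups logarithmic in the number of entries.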

lab6 -- log & rollback & recover

lab6 is mainly about implementing a redo log & undo log logging system, so that simpledb supports log-based rollback and crash recovery.
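The interplay of undo and redo can be sketched with a toy in-memory log. TinyWal is a hypothetical class and not the lab's LogFile format: each update record carries a before-image and an after-image of a page (as a string), rollback restores before-images in reverse order, and recovery replays all updates and then undoes those of uncommitted transactions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical write-ahead log holding page-level before/after images.
final class TinyWal {
    static final class Update {
        final long tid;
        final int pageId;
        final String before;
        final String after;

        Update(long tid, int pageId, String before, String after) {
            this.tid = tid;
            this.pageId = pageId;
            this.before = before;
            this.after = after;
        }
    }

    final List<Update> log = new ArrayList<>();
    final Set<Long> committed = new HashSet<>();

    void logUpdate(long tid, int pageId, String before, String after) {
        log.add(new Update(tid, pageId, before, after));
    }

    void logCommit(long tid) {
        committed.add(tid);
    }

    // Undo: walk the log backwards and restore the before-image of every page tid changed.
    void rollback(long tid, Map<Integer, String> pages) {
        for (int i = log.size() - 1; i >= 0; i--) {
            Update u = log.get(i);
            if (u.tid == tid) pages.put(u.pageId, u.before);
        }
    }

    // Crash recovery: redo every logged update in order, then undo uncommitted ones in reverse.
    void recover(Map<Integer, String> pages) {
        for (Update u : log) pages.put(u.pageId, u.after);
        for (int i = log.size() - 1; i >= 0; i--) {
            Update u = log.get(i);
            if (!committed.contains(u.tid)) pages.put(u.pageId, u.before);
        }
    }
}
```

Because recovery rebuilds page contents from the log alone, it does not matter whether the crash left the old or the new version of a page on disk.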

Summary

Overall, the labs are not very difficult, but they offer a quick way into the database field; this is arguably a top-tier database course.

Comments
  • Applied Refactoring techniques to improve code quality

    Please review my pull request, and if you find it useful, please accept it. It is very important to me that this happens on or before 27 March, 2022. Thanks in advance.

    Description:

    1. Refactoring name: EXTRACT METHOD Location: src/main/java/simpledb/algorithm/Join • File: src/main/java/simpledb/algorithm/Join/SortMergeJoin.java • Class: SortMergeJoin.java • Method: mergeJoin

    To reduce duplicate code and cyclomatic complexity, a code block was extracted from mergeJoin() into two new methods: a) joinTuples(), which joins the tuples based on the predicate operator passed in (LESS_THAN_OR_EQ or GREATER_THAN_OR_EQ), and b) equalsPredicate.

    2. Refactoring name: RENAME VARIABLE Location: src/main/java/simpledb/index • File: src/main/java/simpledb/index/BTreeFile.java • Class: BTreeFile.java The class-level field is renamed from f to file for better understanding and readability.

    3. Refactoring name: PULL UP VARIABLE Location: src/main/java/simpledb/execution • File: src/main/java/simpledb/execution/Operator.java • Class: Operator.java • Variable: TupleDesc td;

    Location: src/main/java/simpledb/execution • File: src/main/java/simpledb/execution/Aggregate.java • Class: Aggregate.java

    Location: src/main/java/simpledb/execution • File: src/main/java/simpledb/execution/Delete.java • Class: Delete.java

    Location: src/main/java/simpledb/execution • File: src/main/java/simpledb/execution/Filter.java • Class: Filter.java

    Location: src/main/java/simpledb/execution • File: src/main/java/simpledb/execution/HashEquiJoin.java • Class: HashEquiJoin.java

    Location: src/main/java/simpledb/execution • File: src/main/java/simpledb/execution/Insert.java • Class: Insert.java

    Location: src/main/java/simpledb/execution • File: src/main/java/simpledb/execution/Join.java • Class: Join.java

    Location: src/main/java/simpledb/execution • File: src/main/java/simpledb/execution/OrderBy.java • Class: OrderBy.java

    Location: src/main/java/simpledb/execution • File: src/main/java/simpledb/execution/Project.java • Class: Project.java

    To remove the duplicated TupleDesc variable in 8 classes, a Pull Up Variable refactoring was performed: the variable was removed from the 8 classes and placed in the parent class Operator.

    4. Refactoring name: PUSH DOWN METHOD Location: src/main/java/simpledb/algorithm/Join • File: src/main/java/simpledb/algorithm/Join/JoinStrategy.java • Class: JoinStrategy.java • Method: close()

    Location: src/main/java/simpledb/algorithm/Join • File: src/main/java/simpledb/algorithm/Join/HashJoin.java • Class: HashJoin.java • Method: close()

    Location: src/main/java/simpledb/algorithm/Join • File: src/main/java/simpledb/algorithm/Join/NestedLoopJoin.java • Class: NestedLoopJoin.java • Method: close()

    Location: src/main/java/simpledb/algorithm/Join • File: src/main/java/simpledb/algorithm/Join/SortMergeJoin.java • Class: SortMergeJoin.java • Method: close()

    The close() method declared in JoinStrategy was used in only one child class, SortMergeJoin, and was defined with an empty body in the other 2 classes. Hence, close() was moved to the subclass SortMergeJoin and removed from the parent class Join/JoinStrategy.

    5. Refactoring name: Change bidirectional association to unidirectional association Location: src/main/java/simpledb/optimizer • File: src/main/java/simpledb/optimizer/JoinOptimizer.java • Class: JoinOptimizer.java

    Location: src/main/java/simpledb/optimizer • File: src/main/java/simpledb/optimizer/LogicalPlan.java • Class: LogicalPlan.java

    Removed the bidirectional association between the JoinOptimizer and LogicalPlan classes. This reduces the dependency between the two classes, since independent classes are easier to maintain.

    Please let me know if you have any questions.

    opened by damank16