Dalí main-memory database system architecture

Post #11, 2003-07-01 22:45

2.3 Storage Allocation

This section describes the Dalí storage allocation mechanism. Each database file in Dalí is composed of segments, which are contiguous, page-aligned units of allocation, similar to clusters in a file system. As Figure 3 illustrates, a chunk is a collection of segments. The recovery characteristics of memory (transient, zeroed, or persistent) are specified per chunk when the chunk is created. Zeroed memory remains allocated on recovery, but each byte is set to zero. Transient memory is no longer allocated after recovery, so its contents are lost. Users allocate memory within a chunk, but they do not specify a particular segment. Because segments can be arbitrarily large (within the size of the database), equally large objects can be stored contiguously. When allocating space within a chunk, the system returns a standard Dalí pointer to the space, which specifies the offset within the file. The elements shown linking together the segments of a chunk are themselves stored in a special chunk used for control information. Storing control information separately from the data reduces the likelihood that it will be corrupted by stray application pointers.
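The three per-chunk recovery characteristics can be illustrated with a toy model. This is a sketch under stated assumptions: `Recovery`, `Chunk`, and the dictionary from file offset to allocated bytes are hypothetical names, not Dalí's interface, and persistent data is simply left in place rather than rebuilt from a checkpoint and the log.

```python
from dataclasses import dataclass, field
from enum import Enum


class Recovery(Enum):
    PERSISTENT = "persistent"  # contents are restored by recovery
    ZEROED = "zeroed"          # stays allocated, but every byte becomes 0
    TRANSIENT = "transient"    # no longer allocated after recovery


@dataclass
class Chunk:
    recovery: Recovery
    # Hypothetical representation: file offset -> bytes allocated there.
    allocations: dict = field(default_factory=dict)

    def recover(self) -> None:
        """Apply this chunk's recovery characteristic after a restart."""
        if self.recovery is Recovery.ZEROED:
            self.allocations = {off: bytes(len(data))
                                for off, data in self.allocations.items()}
        elif self.recovery is Recovery.TRANSIENT:
            self.allocations = {}
        # PERSISTENT: contents are kept (in Dalí, rebuilt from checkpoint and log).


scratch = Chunk(Recovery.ZEROED, {4096: b"abc"})
scratch.recover()
print(scratch.allocations)  # {4096: b'\x00\x00\x00'}
```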

Within a chunk, several types of allocators are available, trading off speed, safety, and space. To avoid excessive overhead for small items, no allocator retains a record of the allocated space; the user must therefore keep track of the size of the allocated data. If such a record is required, the user must provide it in a layer above the allocator. The allocators currently defined and implemented in Dalí are:

The power-of-two allocator, which assigns storage in buckets of size 2^i * m, where m is some minimum item size; and
The coalescing allocator, which allocates exact amounts of space and merges adjacent free space using a free tree.

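The two allocators can be sketched as follows, assuming byte offsets within a chunk and a first-fit search; `bucket_index` and `CoalescingAllocator` are hypothetical names. The text says Dalí's coalescing allocator uses a free tree; this sketch substitutes a sorted list of free extents, which shows the same merging logic with O(n) operations. Note that `release` takes the size from the caller, because no allocator records it.

```python
def bucket_index(size: int, m: int) -> int:
    """Power-of-two allocator: smallest i such that 2**i * m >= size."""
    i = 0
    while (m << i) < size:
        i += 1
    return i


class CoalescingAllocator:
    """Exact-size allocation that merges adjacent free extents on release.

    Sketch only: a sorted list of (offset, length) free extents stands in
    for Dalí's free tree.
    """

    def __init__(self, start: int, length: int):
        self.free = [(start, length)]

    def alloc(self, size: int) -> int:
        for idx, (off, length) in enumerate(self.free):
            if length >= size:                      # first fit
                if length == size:
                    del self.free[idx]
                else:
                    self.free[idx] = (off + size, length - size)
                return off
        raise MemoryError("no free extent is large enough")

    def release(self, off: int, size: int) -> None:
        # No allocator records sizes, so the caller must supply `size`.
        extents = sorted(self.free + [(off, size)])
        merged = [extents[0]]
        for o, l in extents[1:]:
            last_off, last_len = merged[-1]
            if last_off + last_len == o:            # adjacent: coalesce
                merged[-1] = (last_off, last_len + l)
            else:
                merged.append((o, l))
        self.free = merged


print(bucket_index(100, 16))   # 3, since 16 * 2**3 = 128 >= 100
heap = CoalescingAllocator(0, 100)
a, b = heap.alloc(30), heap.alloc(20)
heap.release(a, 30)
heap.release(b, 20)
print(heap.free)               # [(0, 100)]
```

The power-of-two scheme is fast but wastes up to half of each bucket; the coalescing allocator uses exact amounts of space at the cost of searching and merging free extents.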

[Attached image: image015.jpg]

Post #12 by 无双, 2003-07-01 22:45

3 TRANSACTION MANAGEMENT IN DALÍ

This section describes how Dalí achieves transaction atomicity, isolation, and durability. In Dalí, transaction management is based on the principles of multilevel recovery [6]. Dalí is one of the few implementations of explicit multilevel recovery reported to date and, to our knowledge, the only implementation of multilevel recovery for main memory.

In our recovery scheme, data is logically organized into regions. A region is a unit of physical locking for recovery, not directly related to the concepts of segments or chunks. A region, therefore, can be a tuple or an object that fits in a single segment, or it can be an arbitrary data structure like a list or a tree, possibly comprising an entire chunk. Each region has a single associated lock with exclusive (X) and shared (S) modes. This lock, referred to as the region lock, guards accesses and updates to the region.

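A region lock with the usual shared/exclusive compatibility might look like the following. `RegionLock`, `Mode`, and `try_acquire` are hypothetical names; a real lock manager would make a conflicting request wait rather than return `False`.

```python
from enum import Enum


class Mode(Enum):
    S = "shared"
    X = "exclusive"


def compatible(held: Mode, requested: Mode) -> bool:
    # Only shared/shared is compatible; anything involving X conflicts.
    return held is Mode.S and requested is Mode.S


class RegionLock:
    """One lock per region, with S and X modes (non-blocking sketch)."""

    def __init__(self):
        self.holders = {}  # transaction id -> Mode

    def try_acquire(self, txn: str, mode: Mode) -> bool:
        for other, held in self.holders.items():
            if other != txn and not compatible(held, mode):
                return False  # a real lock manager would make txn wait
        if self.holders.get(txn) is not Mode.X:  # keep X; allow S -> X upgrade
            self.holders[txn] = mode
        return True

    def release(self, txn: str) -> None:
        self.holders.pop(txn, None)


lock = RegionLock()
print(lock.try_acquire("T1", Mode.S), lock.try_acquire("T2", Mode.S))  # True True
print(lock.try_acquire("T3", Mode.X))                                  # False
```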

Post #13 by 无双, 2003-07-01 22:46

3.1 Multilevel Recovery

Multilevel recovery provides recovery support for enhanced concurrency based on the semantics of operations. It allows weaker operation locks to be used in place of stronger shared/exclusive region locks.

A common example of the need for multilevel recovery occurs in index management, where holding region locks until transaction commit leads to unacceptably low concurrency. If undo logging (explained later, in the section titled "Logging Model") has been performed physically, for example by recording exactly which bytes were modified to insert a key into the index, then the transaction management system must ensure that these physical undo descriptions remain valid until transaction commit. Because the descriptions refer to specific updates at specific positions, this typically implies that the region locks on updated index nodes must be retained to ensure correct recovery, even though they are no longer needed for correct concurrent access to the index.

The multilevel recovery approach replaces these low-level physical-undo log records with higher-level logical-undo log records, which contain undo descriptions at the level of operations. For an insert operation, the physical-undo records would be replaced by a logical-undo record indicating that the inserted key must be deleted. Once this replacement is made, the region locks can be released and replaced by less restrictive operation locks. For example, the region locks on the nodes involved in an insert can be released, while an operation lock is retained on the newly inserted key to prevent other transactions from accessing or deleting it.

Post #15 by 无双, 2003-07-01 22:47

3.2 System Overview

Figure 4 presents an overview of the structures used for recovery. The database is mapped into the address space of each process, as described earlier in "Architecture." (The database here represents a single database file. In fact, different database files can be checkpointed at different times, and transactions can span database files arbitrarily. The generalization to multiple database files is straightforward, but it is omitted for clarity and brevity.) Two checkpoint images of the database, Ckpt_A and Ckpt_B, reside on disk. Also stored on disk are cur_ckpt, an "anchor" pointing to the most recent valid checkpoint image for the database, and a single system log containing redo information, whose tail is kept in memory. The variable end_of_stable_log stores a pointer into the system log indicating that all records before the pointer (in both time and position in the log) have been flushed to the stable system log.

A single active transaction table (ATT), stored in the system database, contains separate redo and undo logs for each active transaction. A dirty page table, dpt, is maintained for the database (also in the system database); it records the pages that have been updated since the last checkpoint. The ATT (with undo logs) and the dirty page table are also stored with each checkpoint. The dirty page table in a checkpoint is referred to as ckpt_dpt.

Post #16 by 无双, 2003-07-01 22:48

3.3 Transactions and Operations

In our model, transactions consist of a sequence of operations. We assume, as did Lomet [7], that each operation has a level Li associated with it. An operation at level Li can consist of a sequence of operations at level Li-1. Transactions, assumed to be at level Ln, call operations at level Ln-1. Physical updates to regions are level L0 operations. For transactions, we distinguish between pre-commit and commit. When a transaction pre-commits, its commit record enters the system log in memory, establishing a point in the serialization order. The transaction commits when the commit record reaches the stable log (on disk). We use the same terminology for operations, for which only the pre-commit point is meaningful, although in this paper it is sometimes referred to as "operation commit."
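The distinction between pre-commit and commit can be modeled with a log that has a stable prefix and an in-memory tail. `SystemLog` and its methods are hypothetical, and the actual write to disk is elided.

```python
class SystemLog:
    """Log whose prefix [0, end_of_stable_log) is on disk; the tail is in memory."""

    def __init__(self):
        self.records = []
        self.end_of_stable_log = 0

    def append(self, record) -> int:
        """Append to the in-memory tail; returns the record's log position."""
        self.records.append(record)
        return len(self.records) - 1

    def flush(self) -> None:
        # A real system would write records[end_of_stable_log:] to disk here.
        self.end_of_stable_log = len(self.records)

    def is_stable(self, pos: int) -> bool:
        return pos < self.end_of_stable_log


log = SystemLog()
pos = log.append(("commit", "T1"))  # T1 pre-commits: its serialization point is fixed
print(log.is_stable(pos))          # False: T1 is pre-committed, not yet committed
log.flush()
print(log.is_stable(pos))          # True: commit record is stable, so T1 has committed
```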

A transaction obtains an operation lock before executing an operation (the lock is granted to the transaction if it commutes with the other operation locks held by active transactions). L0 operations are required to obtain region locks. The locks on a region are released as soon as the L1 operation pre-commits; in general, an operation lock at level Li is held until the transaction or the containing operation (at level Li+1) pre-commits. All the locks acquired by a transaction are released as soon as the transaction pre-commits. Deadlocks are detected by checking for a cycle in the wait-for graph after a transaction has waited a certain amount of time for a lock. The multilevel variation of two-phase locking described above guarantees the isolation of a transaction (operation) from other transactions (operations).

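The lock-holding rules above can be sketched with a lock table in which each lock is tagged with the transaction or containing operation whose pre-commit releases it. The commutativity rule in `commutes` is an illustrative assumption for index operations, `LockTable` is hypothetical, and region locks are modeled with the same (operation, target) tuples, where `write` behaves like X and `lookup` like S. Waiting and deadlock detection are not shown.

```python
def commutes(a: tuple, b: tuple) -> bool:
    """Hypothetical rule for (operation, target) locks: operations on
    different targets commute, and two lookups on the same target commute."""
    (op_a, target_a), (op_b, target_b) = a, b
    return target_a != target_b or (op_a == "lookup" and op_b == "lookup")


class LockTable:
    def __init__(self):
        self.held = []  # (owner, txn, lock); owner is whoever releases it

    def acquire(self, owner: str, txn: str, lock: tuple) -> bool:
        # Grant only if the lock commutes with every lock held by other
        # transactions; a real system would block and run deadlock detection.
        if not all(commutes(lock, l) for _, t, l in self.held if t != txn):
            return False
        self.held.append((owner, txn, lock))
        return True

    def precommit(self, owner: str) -> None:
        # When a transaction or containing operation pre-commits, the locks
        # acquired on its behalf by its suboperations are released.
        self.held = [h for h in self.held if h[0] != owner]


locks = LockTable()
locks.acquire("T1", "T1", ("insert", 5))             # L1 lock, held until T1 pre-commits
locks.acquire("T1.insert5", "T1", ("write", "n3"))  # L0 region lock, held by the L1 insert
locks.precommit("T1.insert5")                        # insert pre-commits: region lock freed
print(locks.acquire("T2", "T2", ("write", "n3")))    # True: node n3 is free again
print(locks.acquire("T2", "T2", ("lookup", 5)))      # False: key 5 still locked by T1
```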

Post #17 by 无双, 2003-07-01 22:48

3.4 Logging Model

Dalí uses logging to implement the atomicity and durability of transactions. Undo logs ensure that incomplete transactions (operations) can be rolled back in case the transaction (operation) is aborted, while redo logs ensure durability in the face of system failure. The recovery algorithm maintains separate undo and redo logs in memory for each transaction. These are stored as linked lists off the entry for the transaction in the ATT. Each update (to a part of a region) generates physical undo and redo log records that are appended to the transaction's undo and redo logs, respectively.

When a transaction/operation pre-commits, all the log records in the transaction's redo log are appended to the system log, and the logical-undo description for the operation is included in the operation commit log record within the system log. With the exception of these logical-undo descriptors, only redo records are written to the system log during normal processing. When an operation pre-commits, two things happen: the undo log records for its suboperations/updates are deleted from the transaction's undo log, and a logical-undo log record containing the undo description for the operation is appended to the transaction's undo log. The in-memory undo logs of transactions that have pre-committed are deleted, because they are no longer needed. Locks acquired by an operation/transaction are released as soon as it pre-commits.
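The following sketches the per-transaction logs and the pre-commit step of one operation. `ATTEntry`, `update`, `op_precommit`, the tuple record formats, and the `undo_mark` bookkeeping (the undo-log length when the operation started) are assumptions made for illustration; Dalí stores these logs as linked lists in the ATT, not Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class ATTEntry:
    """Hypothetical ATT entry: a transaction's private redo and undo logs."""
    redo: list = field(default_factory=list)
    undo: list = field(default_factory=list)


def update(entry: ATTEntry, offset: int, old: bytes, new: bytes) -> None:
    # Each physical update yields one physical undo and one physical redo record.
    entry.undo.append(("phys_undo", offset, old))
    entry.redo.append(("phys_redo", offset, new))


def op_precommit(entry: ATTEntry, undo_mark: int, logical_undo, system_log: list) -> None:
    """Pre-commit of one operation that started when len(entry.undo) == undo_mark."""
    system_log.extend(entry.redo)                      # redo records go to the system log
    entry.redo.clear()
    system_log.append(("op_commit", logical_undo))     # commit record carries logical undo
    del entry.undo[undo_mark:]                         # drop suboperation undo records...
    entry.undo.append(("logical_undo", logical_undo))  # ...and replace them


t1, system_log = ATTEntry(), []
mark = len(t1.undo)
update(t1, 0, b"a", b"b")       # e.g. two byte ranges touched by an index insert
update(t1, 8, b"c", b"d")
op_precommit(t1, mark, ("delete", 42), system_log)
print(t1.undo)                  # [('logical_undo', ('delete', 42))]
```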

When a transaction decides to commit, the system log is flushed to disk, ensuring the durability of committed transactions. The flushing procedure marks as dirty, in the dirty page table dpt, those pages updated by the redo log records written to disk. In our recovery scheme, update actions do not obtain latches on pages; instead, region locks ensure that updates do not interfere with each other.

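The effect of a flush on dpt can be sketched as follows, assuming 4096-byte pages (the text does not give a page size) and hypothetical log record tuples of the form ("phys_redo", offset, data); `flush` is a hypothetical name, and the disk write itself is elided.

```python
PAGE_SIZE = 4096  # assumption: the text does not state a page size


def flush(system_log: list, end_of_stable_log: int, dpt: set) -> int:
    """Flush the log tail and mark pages touched by flushed redo records dirty.

    Returns the new end_of_stable_log. Writing to disk is not shown.
    """
    for record in system_log[end_of_stable_log:]:
        if record[0] == "phys_redo":
            _, offset, data = record
            first = offset // PAGE_SIZE
            last = (offset + len(data) - 1) // PAGE_SIZE
            dpt.update(range(first, last + 1))
    return len(system_log)


dpt = set()
log = [("phys_redo", 4090, b"0123456789"), ("op_commit", ("delete", 1))]
end = flush(log, 0, dpt)
print(end, sorted(dpt))   # 2 [0, 1]: the 10-byte write straddles pages 0 and 1
```

Marking pages at flush time, rather than at update time, means dpt only names pages whose updates are described by records already on disk.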

Post #18 by 无双, 2003-07-01 22:48

3.5 Ping-pong Checkpointing

Checkpointing ensures that only a final portion of the log is needed for recovery. To implement checkpointing in Dalí, two copies of the database image are stored on disk, and successive checkpoints write dirty pages to alternate copies. This strategy is called ping-pong checkpointing (see, for example, Salem and Garcia-Molina [3]). The ping-pong checkpointing strategy permits a checkpoint that is being created to be temporarily inconsistent; that is, updates may have been written out before the corresponding undo records have been written. After the dirty pages are written out, however, enough redo and undo log information is written out to bring the checkpoint to a consistent state. If a failure occurs while one checkpoint is being created, the other checkpoint is still consistent and can be used for recovery.
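Ping-pong alternation and its crash behavior can be illustrated with two in-memory dictionaries standing in for Ckpt_A and Ckpt_B. This toy `PingPong` class is hypothetical and toggles cur_ckpt as soon as the pages are written, omitting the ATT checkpoint and log flush that Dalí performs before declaring a checkpoint complete.

```python
class PingPong:
    """Two checkpoint images; cur_ckpt names the last completed one."""

    def __init__(self):
        self.images = {"Ckpt_A": {}, "Ckpt_B": {}}  # page number -> contents
        self.cur_ckpt = "Ckpt_A"

    def target(self) -> str:
        # A new checkpoint always overwrites the image that is NOT current.
        return "Ckpt_B" if self.cur_ckpt == "Ckpt_A" else "Ckpt_A"

    def checkpoint(self, dirty_pages: dict, crash_after: int = -1) -> None:
        tgt = self.target()
        for n, (page, data) in enumerate(dirty_pages.items()):
            if n == crash_after:
                return  # simulated crash: cur_ckpt still names the old image
            self.images[tgt][page] = data  # tgt may be inconsistent here
        self.cur_ckpt = tgt  # toggle only once the new image is complete


db = PingPong()
db.checkpoint({1: "v1"})                          # completes into Ckpt_B
db.checkpoint({1: "v2", 2: "w1"}, crash_after=1)  # crash midway through Ckpt_A
print(db.cur_ckpt, db.images["Ckpt_B"])           # Ckpt_B {1: 'v1'}
```

The half-written Ckpt_A is never named by cur_ckpt, so recovery starts from the intact Ckpt_B.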

Before writing any dirty data to disk, the checkpoint notes the current end of the stable log in the variable end_of_stable_log, which will be stored with the checkpoint. This variable indicates where to start scanning the system log to recover from a crash. Next, the contents of the (in-memory) ckpt_dpt are set to those of the dpt, and the dpt is zeroed (noting end_of_stable_log and zeroing dpt are performed atomically with respect to log flushing). The pages written out are those that were dirty in the ckpt_dpt of the last completed checkpoint, in the current (in-memory) ckpt_dpt, or in both. In other words, all pages modified since the current checkpoint image was last written, namely, pages dirtied since the last-but-one checkpoint, are written out. This ensures that the updates described by log records preceding the current checkpoint's end_of_stable_log are included in the database image of the current checkpoint.

Checkpoints write out dirty pages without obtaining any latches and therefore do not interfere with normal operations. This fuzzy checkpointing is possible because all updates generate physical-redo log records; these are used during restart recovery, and their effects are idempotent. After the database image has been written, undo log records are written out to disk for any uncommitted update whose effects have reached the checkpoint image. This is done by checkpointing the ATT after checkpointing the data; the checkpoint of the ATT writes out undo log records, as well as some other status information.
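Idempotence is what makes the fuzzy checkpoint safe: a physical redo record carries the after-image of the bytes it changed, so replaying it against a page that already contains the update changes nothing. A minimal sketch, with `apply_redo` as a hypothetical name:

```python
def apply_redo(image: bytearray, offset: int, new_bytes: bytes) -> None:
    """Physical redo: overwrite bytes at `offset` with their after-image."""
    image[offset:offset + len(new_bytes)] = new_bytes


page = bytearray(b"hello world")
apply_redo(page, 6, b"WORLD")   # update already captured by the fuzzy checkpoint
apply_redo(page, 6, b"WORLD")   # replayed again during restart recovery
print(page)                     # bytearray(b'hello WORLD')
```

By contrast, a logical record such as "add 1 to this counter" is not idempotent: replaying it against a page that already reflects it would apply the update twice.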

After checkpointing is performed, a log flush must be done before declaring the checkpoint completed (and consistent) by toggling cur_ckpt to point to the new checkpoint. Undo logs are deleted on transaction or operation pre-commit, which may happen before the checkpoint of the ATT. If the checkpoint completes but the system fails before a log flush, the checkpoint may contain uncommitted updates for which no undo information exists. The log flush ensures that the transaction or operation has committed, so the updates will not have to be undone (except perhaps by a compensating operation, for which undo information will be present in the log).

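The checkpoint steps described in this section can be put in order as follows. The dictionary layout of `db`, the step tuples, and `run_checkpoint` are hypothetical; the sketch only records what would be written, and it shows the atomic noting of end_of_stable_log and the swap of dpt as ordinary assignments.

```python
def run_checkpoint(db: dict, steps: list) -> None:
    """Order of operations for one checkpoint, recorded into `steps`.

    `db` is a hypothetical dict with keys "log", "end_of_stable_log", "dpt",
    "ckpt_dpt" (from the last completed checkpoint), and "cur_ckpt".
    """
    # 1. Note the begin-recovery point and move dpt into ckpt_dpt
    #    (atomic with respect to log flushes in the real system).
    begin = db["end_of_stable_log"]
    last_ckpt_dpt, db["ckpt_dpt"], db["dpt"] = db["ckpt_dpt"], db["dpt"], set()
    target = "Ckpt_B" if db["cur_ckpt"] == "Ckpt_A" else "Ckpt_A"
    # 2. Write pages dirty since the last-but-one checkpoint; no latches taken.
    for page in sorted(last_ckpt_dpt | db["ckpt_dpt"]):
        steps.append(("write_page", target, page))
    # 3. Checkpoint the ATT (with undo logs) after the data, with `begin`.
    steps.append(("write_att", target, begin))
    # 4. Flush the system log so no needed undo information is missing.
    db["end_of_stable_log"] = len(db["log"])
    steps.append(("flush_log",))
    # 5. Only now declare the checkpoint complete by toggling the anchor.
    db["cur_ckpt"] = target
    steps.append(("toggle_cur_ckpt", target))


db = {"log": ["r1", "r2", "r3"], "end_of_stable_log": 2,
      "dpt": {7}, "ckpt_dpt": {2}, "cur_ckpt": "Ckpt_A"}
steps = []
run_checkpoint(db, steps)
print(steps)
```

In the example, page 2 is written even though it is clean in the current dpt: it was dirtied before the previous checkpoint, which wrote it only to Ckpt_A, so Ckpt_B still lacks that update.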

Post #19 by 无双, 2003-07-01 22:49

3.6 Abort Processing

When a transaction aborts (that is, does not successfully complete execution), the updates/operations described by log records in the transaction's undo log are undone by traversing the undo log sequentially from the end. The transaction is aborted by executing every undo record, in reverse order, just as if the execution were part of the transaction. Correct abort processing, even in the face of system failure, ensures the atomicity of transactions in Dalí.

Following the philosophy of repeating history [6], new physical-redo log records are created for each physical-undo record encountered during the abort. Similarly, for each logical-undo record encountered, a new "compensation" or "proxy" operation is executed based on the undo description. Log records for updates performed by the proxy operation are generated just as they are during normal processing. Furthermore, when the proxy operation commits, the aborting transaction deletes all the proxy operation's undo log records and the logical-undo record for the operation that was undone. The commit record for the proxy operation serves a purpose similar to that of compensation log records (CLRs) in ARIES [6]. During restart recovery, when the commit record for the proxy operation is encountered, the logical-undo log record for the operation that was undone is deleted from the transaction's undo log, preventing it from being undone again.

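Abort processing as described here can be sketched as a loop over the undo log from its end. The record formats and the `run_proxy` callback are assumptions; the sketch does not model the proxy operation's own undo records, which Dalí deletes when the proxy commits.

```python
def abort(image: bytearray, undo_log: list, redo_out: list, run_proxy) -> None:
    """Roll back one transaction by walking its undo log from the end.

    `run_proxy` is a hypothetical callback that executes the compensation
    (proxy) operation for a logical-undo description.
    """
    while undo_log:
        record = undo_log[-1]
        if record[0] == "phys_undo":
            _, offset, old = record
            image[offset:offset + len(old)] = old        # restore before-image
            redo_out.append(("phys_redo", offset, old))  # repeat history
        else:  # ("logical_undo", description)
            run_proxy(record[1])                         # logs like a normal op
            redo_out.append(("proxy_commit", record[1]))
        undo_log.pop()  # an undone record is never undone again


image = bytearray(b"AAAA")
undo = [("logical_undo", ("delete", 5)), ("phys_undo", 1, b"xy")]
redo, proxies = [], []
abort(image, undo, redo, proxies.append)
print(image, proxies)   # bytearray(b'AxyA') [('delete', 5)]
```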

Post #20 by 无双, 2003-07-01 22:49

3.7 Recovery

Recovery refers to the actions taken after a system failure to return the database to a transaction-consistent state, one that reflects exactly the effects of the transactions that have committed. Recovery uses the stable log, which has been flushed to disk, as well as information stored in the last checkpoint. A checkpoint is taken when the database is created, and others may be taken later, so recovery is described relative to the last completed checkpoint.

As part of the checkpoint operation, the end of the system log on disk is noted before the database image is checkpointed; this becomes the "begin-recovery point" for this checkpoint as soon as the checkpoint has completed. All updates described by log records preceding this point are guaranteed to be reflected in the checkpointed database image. After initializing the ATT and the transaction undo logs from the copies stored in the most recent checkpoint, restart recovery loads the database image and sets dpt to zero. It then "rolls forward" the portion of the system log following the begin-recovery point by applying all redo log records to the checkpoint image. (The appropriate pages in dpt are set to dirty for each log record.) While redo log records are applied, any necessary actions are taken to keep the checkpointed image of the ATT consistent with the log applied so far. These actions mirror those taken during normal processing. For example, when an operation commit log record is encountered, the lower-level log records for the operation in the transaction's undo log are replaced by a higher-level undo description.
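The redo phase of restart recovery might be sketched as follows. The checkpoint and log record formats, the operation ids used to find an operation's lower-level undo records, the "logical" tag, and the 4096-byte page size are all assumptions, not Dalí's actual layout. Transactions still present in the returned ATT are the ones to roll back.

```python
PAGE_SIZE = 4096  # assumed page size


def redo_phase(ckpt: dict, stable_log: list):
    """Reload a checkpoint, then roll forward from its begin-recovery point.

    Hypothetical formats: ckpt = {"image": bytes, "att": {txn: [(op_id, undo)]},
    "begin": log index}; log records are ("redo", txn, offset, data),
    ("op_commit", txn, op_id, logical_undo), and ("txn_commit", txn).
    """
    image = bytearray(ckpt["image"])
    att = {txn: list(undo) for txn, undo in ckpt["att"].items()}
    dpt = set()                                    # dpt starts out zeroed
    for record in stable_log[ckpt["begin"]:]:
        kind, txn = record[0], record[1]
        if kind == "redo":
            _, _, offset, data = record
            image[offset:offset + len(data)] = data
            dpt.update(range(offset // PAGE_SIZE,
                             (offset + len(data) - 1) // PAGE_SIZE + 1))
        elif kind == "op_commit":                  # mirror normal processing
            _, _, op_id, logical_undo = record
            undo = [u for u in att.get(txn, []) if u[0] != op_id]
            undo.append(("logical", logical_undo))
            att[txn] = undo
        elif kind == "txn_commit":
            att.pop(txn, None)                     # committed: nothing to undo
    return image, att, dpt


ckpt = {"image": bytes(8), "att": {"T1": [("op1", ("phys", 0, b"\x00"))]}, "begin": 1}
log = [("redo", "T0", 0, b"Z"),                    # before begin: already in image
       ("redo", "T1", 2, b"ab"), ("op_commit", "T1", "op2", ("delete", 9)),
       ("redo", "T2", 4, b"c"), ("op_commit", "T2", "op3", ("delete", 1)),
       ("txn_commit", "T2")]
image, att, dpt = redo_phase(ckpt, log)
print(sorted(att))                                 # ['T1']
```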

As soon as all the redo log records have been applied, the active transactions are rolled back. To do this, every completed operation that was invoked directly by the transaction, or directly by an incomplete operation, must be rolled back. The order in which the operations of different transactions are rolled back is important, however, to ensure that an undo at level Li sees consistent data structures [7]. First, all operations (across all transactions) at L0 that must be rolled back are rolled back, followed by all operations at level L1, then L2, and so on.

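Level-by-level rollback across transactions can be sketched as repeated passes over the undo logs, one pass per level. `rollback_by_level` and the (level, description) record pairs are hypothetical, and the order among transactions within a level is an arbitrary fixed choice here.

```python
def rollback_by_level(undo_logs: dict, undo_one) -> list:
    """Roll back all active transactions, lowest level first.

    `undo_logs` maps a transaction id to its undo records as (level, description)
    pairs in log order; `undo_one(txn, description)` is a hypothetical callback
    that performs one undo. Returns the order in which undos ran.
    """
    order = []
    top = max((lvl for log in undo_logs.values() for lvl, _ in log), default=-1)
    for level in range(top + 1):            # L0 across all transactions, then L1, ...
        for txn in sorted(undo_logs):       # assumption: any fixed order within a level
            for lvl, desc in reversed(undo_logs[txn]):  # newest record first
                if lvl == level:
                    undo_one(txn, desc)
                    order.append((txn, level, desc))
    return order


logs = {"T1": [(1, "delete key k1"), (0, "restore node n3")],
        "T2": [(0, "restore node n7")]}
print(rollback_by_level(logs, lambda txn, desc: None))
```

In the example, both L0 undos (T1's restore of n3 and T2's restore of n7) run before T1's L1 delete, so the delete operates on a structurally consistent index.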