Database Note - C10 Storage and File Structure

Table of Contents

1 Overview of Physical Storage Media

TYPE DESCRIPTION
primary storage fastest storage media, eg. cache, main memory
second/online storage eg. magnetic disks
tertiary/offline storage slowest storage media, eg. magnetic type
volatile storage lose contents when power off
nonvolatile storage keep contents when power off

F10.1.png

2 Magnetic Disk and Flash storage

2.1 Disk Storage

F10.2.png

TERM DESCRIPTION
track logically divided disk surface
sector smallest unit of information read/write from/to disk
block logical unit consisting of a fixed number of contiguous sectors
page in disk context, refer to block

Two type of requests for blocks from disk:

  • sequential access: fast
  • random access: slow (require frequently seek to track)

Some techniques to improve the speed of access to blocks:

  • Buffering
  • Read-ahead
  • Scheduling
  • File organization
  • Nonvolatile write buffers
  • Log disk

2.2 Flash Storage

Two type of flash:

  • NOR flash: fast but expensive, random access
  • NAND flash: slow but cheap, read entire page

3 Raid

SKIP

4 File Organization

Two principle:

  • No record is larger than a block
  • Each record is entirely contained in a single block

4.1 Fixed-Length Records

For example, a file of instructor records for university database:

type instructor = record
                    ID varchar(5);
                    name varchar(20);
                    dept_name varchar(20);
                    salary numeric(8,2);
                  end

If each character occupies 1 byte and numeric(8,2) occupies 8 bytes. We can allocate \( 5+20+20+8 = 53 \) bytes for every entries of instructor.

This method has two short comings:

  • Unless the block size happens to be a multiple of 53, part of the record will store across two blocks.
  • Difficult to delete a record.

Improvement:

  • Allocate only as many records to a block as would fit entirely in the block. For example, assume block size is 1000 bytes, we can store \( \lfloor blockSize / maxRecordSize \rfloor = \lfloor 1000/53 \rfloor = 18 \) entries in one block.
  • Instead of deleting one entry and move all following entries, moving the last entry to the deleted entry's position.
    Or just let the space empty, and reuse this space when new entry insertion.
  • Use file header (metadata?): Allocate certain number of bytes as a file header containing a variety of infomation about the file.
  • Use freelist: use pointers to form a linked list of deleted records.

F10.7.png

4.2 Variable-Length Records

Some causes that will result in variable-length records:

  • Storage of multiple record types in a file.
  • Record types that allow variable lengths for one or more fields.
  • Record types that allow repeating fields, such as arrays or multisets.

Records with variable-length attributes has two part: fixed-length attributes and variable-length attributes. So we can seperate them, use a pair (offset, length), where offset denotes where the data for that attributes begins within the record, and length is the length in bytes of the variable-sized attribute.

For example, an instructor entry, whose first three attributes ID, name, dept_name are variable-length strings, and fourth attribute salary is a fixed-length number. If the (offset, length) pair occupies 4 bytes, salary occupies 8 bytes.

F10.8.png

Use slotted-page structure to store variable-length record in a block. Allocating a header at the beginning of each block, containing:

  • Number of record entries in the header
  • The end of free space in the block
  • An array of locations and sizes of records

F10.9.png

Block headers are allocated contiguously from left to right, and records allocated from right to left. Therefore, the space between the last block header and the start of the first record block is free space.

Insertion
Allocate space at the end of free space, and an entry containing its size and location is added to the header.
Deletion
the space that it occupies is freed, and its entry is set to deleted (eg. set its size to -1). The records in the block before (left to) the deleted record are moved, and their headers are updated too, so all free space is again between the final entry in the header array and the first record. Finally, update the end-of-free-space pointer in the header.

5 Organization of Records in Files

Several possible ways of organizing records in files:

  • Heap file organization
  • Sequential file organization
  • Hashing file organization

5.1 Sequential File Organization

F10.10.png

Efficient in processing sorted records. Define a search key, then use pointers to chain entries together. This organization enable records to be read in sorted order, but it is difficult to insert and delete an entry.

F10.11.png

Created: 2019-02-19 Tue 20:14