Chapter 1: A Tour of Computer Systems

In a sense, the goal of this book is to help you understand what happens, and why, when you run hello-world on your system.

1.1 Information Is Bits + Context

All information in a system is represented as a bunch of bits.
The only thing that distinguishes different data objects is the context in which we view them.
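As a minimal sketch of this idea (the byte values and outputs assume a little-endian machine with IEEE-754 floats, such as x86-64), the very same four bytes mean different things depending on the type through which we view them:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* One fixed bit pattern, stored as raw bytes. */
        unsigned char bytes[4] = {0x00, 0x00, 0x80, 0x3f};
        int   as_int;
        float as_float;

        /* Copy the identical bits into two differently typed objects. */
        memcpy(&as_int, bytes, sizeof as_int);
        memcpy(&as_float, bytes, sizeof as_float);

        /* Same bits, different context: prints 1065353216 and 1.000000
           on a typical little-endian IEEE-754 machine. */
        printf("as int:   %d\n", as_int);
        printf("as float: %f\n", as_float);
        return 0;
    }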


1.2 Programs Are Translated by Other Programs into Different Forms

CSAPP-Figure-1.3
Figure 1.3: The compilation system.

The programs that perform the four phases (preprocessor, compiler, assembler, and linker) are known collectively as the compilation system.
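On a typical Unix system you can ask the gcc driver to stop after each phase; a sketch using the book's hello.c example (the intermediate file names are a convention, not a requirement):

    gcc -E hello.c -o hello.i    # preprocessor: expand #include files and macros
    gcc -S hello.i -o hello.s    # compiler: translate to assembly language
    gcc -c hello.s -o hello.o    # assembler: produce a relocatable object file
    gcc hello.o -o hello         # linker: merge object files into an executable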


1.3 It Pays to Understand How Compilation Systems Work

  • Optimizing program performance.
  • Understanding link-time errors.
  • Avoiding security holes.

1.4 Processors Read and Interpret Instructions Stored in Memory

1.4.1 Hardware Organization of a System

CSAPP-Figure-1.4
Figure 1.4: Hardware organization of a typical system.
CPU: central processing unit, ALU: arithmetic/logic unit, PC: program counter, USB: Universal Serial Bus.

Buses

Running throughout the system is a collection of electrical conduits called buses that carry bytes of information back and forth between the components.

I/O Devices

Input/output (I/O) devices are the system’s connection to the external world.
Each I/O device is connected to the I/O bus by either a controller or adapter.

Main Memory

The main memory is a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program.
Physically, main memory consists of a collection of dynamic random access memory (DRAM) chips.

Processor

The central processing unit (CPU), or simply processor, is the engine that interprets (or executes) instructions stored in main memory.
We can distinguish the processor’s instruction set architecture, describing the effect of each machine-code instruction, from its microarchitecture, describing how the processor is actually implemented.

1.4.2 Running the hello Program


1.5 Caches Matter

From a programmer’s perspective, much of this copying is overhead that slows down the “real work” of the program.
To deal with the processor-memory gap, system designers include smaller, faster storage devices called cache memories (or simply caches) that serve as temporary staging areas for information that the processor is likely to need in the near future.
The idea behind caching is that a system can get the effect of both a very large memory and a very fast one by exploiting locality, the tendency for programs to access data and code in localized regions.
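A minimal sketch of locality in C (the array dimensions are arbitrary): summing a 2D array row by row walks memory sequentially and tends to hit in the cache, while summing it column by column strides through memory and tends to miss:

    #define ROWS 1024
    #define COLS 1024

    /* Good spatial locality: C stores arrays in row-major order,
       so consecutive iterations touch adjacent memory locations. */
    long sum_rows(int a[ROWS][COLS]) {
        long sum = 0;
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                sum += a[i][j];
        return sum;
    }

    /* Poor spatial locality: each iteration jumps COLS elements ahead,
       so cached neighbors are rarely reused before being evicted. */
    long sum_cols(int a[ROWS][COLS]) {
        long sum = 0;
        for (int j = 0; j < COLS; j++)
            for (int i = 0; i < ROWS; i++)
                sum += a[i][j];
        return sum;
    }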


1.6 Storage Devices Form a Hierarchy

CSAPP-Figure-1.9
Figure 1.9: An example of a memory hierarchy.

The main idea of a memory hierarchy is that storage at one level serves as a cache for storage at the next lower level.


1.7 The Operating System Manages the Hardware

CSAPP-Figure-1.11
Figure 1.11: Abstractions provided by an operating system.

1.7.1 Processes

A process is the operating system’s abstraction for a running program.
Multiple processes can run concurrently on the same system, and each process appears to have exclusive use of the hardware. By concurrently, we mean that the instructions of one process are interleaved with the instructions of another process.
The operating system performs this interleaving with a mechanism known as context switching.
The operating system keeps track of all the state information that the process needs in order to run. This state, which is known as the context, includes information such as the current values of the PC, the register file, and the contents of main memory.
The transition from one process to another is managed by the operating system kernel.
When an application program requires some action by the operating system, such as reading or writing a file, it executes a special system call instruction, transferring control to the kernel. The kernel then performs the requested operation and returns control to the application program.
Note that the kernel is not a separate process. Instead, it is a collection of code and data structures that the system uses to manage all the processes.
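As a minimal POSIX sketch (not taken from the book), here are two processes and the system calls that create and reap them; each call traps into the kernel, which performs the operation and returns control to the caller:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();              /* system call: kernel creates a new process */
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0) {                  /* child process */
            printf("child:  pid %d\n", (int)getpid());
            exit(0);
        }
        waitpid(pid, NULL, 0);           /* system call: wait for the child to terminate */
        printf("parent: pid %d\n", (int)getpid());
        return 0;
    }

While both processes exist, the kernel may context-switch between them (and everything else running on the machine) at any point.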

1.7.2 Threads

Although we normally think of a process as having a single control flow, in modern systems a process can actually consist of multiple execution units, called threads, each running in the context of the process and sharing the same code and global data.
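A minimal POSIX threads sketch (compile with the -pthread flag on gcc or clang): two threads run the same function and see the same global data, all inside one process:

    #include <pthread.h>
    #include <stdio.h>

    /* Shared by every thread in the process. */
    static const char *shared_msg = "hello from a thread";

    static void *thread_func(void *arg) {
        long id = (long)arg;
        printf("%s (id %ld)\n", shared_msg, id);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread_func, (void *)1L);
        pthread_create(&t2, NULL, thread_func, (void *)2L);
        /* Wait for both threads before the process exits. */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }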

1.7.3 Virtual Memory

Virtual memory is an abstraction that provides each process with the illusion that it has exclusive use of the main memory.

CSAPP-Figure-1.13
Figure 1.13: Process virtual address space.

For virtual memory to work, the basic idea is to store the contents of a process’s virtual memory on disk and then use the main memory as a cache for the disk.
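A small sketch that hints at the distinct regions of a process's virtual address space (the printed addresses are virtual, vary from run to run, and depend on the system; casting a function pointer to void * is a common POSIX idiom rather than strict ISO C):

    #include <stdio.h>
    #include <stdlib.h>

    int global_var;                                /* data region */

    int main(void) {
        int  local_var;                            /* stack */
        int *heap_var = malloc(sizeof *heap_var);  /* run-time heap */

        printf("code  (main):    %p\n", (void *)main);
        printf("data  (global):  %p\n", (void *)&global_var);
        printf("heap  (malloc):  %p\n", (void *)heap_var);
        printf("stack (local):   %p\n", (void *)&local_var);

        free(heap_var);
        return 0;
    }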

1.7.4 Files

A file is a sequence of bytes, nothing more and nothing less.
Every I/O device, including disks, keyboards, displays, and even networks, is modeled as a file.
All input and output in the system is performed by reading and writing files, using a small set of system calls known as Unix I/O.
Unix I/O provides applications with a uniform view of all the varied I/O devices.
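A minimal Unix I/O sketch: copy standard input to standard output using the read and write system calls, the same calls that work for disk files, terminals, pipes, and network connections:

    #include <unistd.h>

    int main(void) {
        char buf[4096];
        ssize_t n;

        /* Descriptor 0 is standard input, descriptor 1 is standard output. */
        while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
            write(STDOUT_FILENO, buf, n);   /* simplified: assumes the whole buffer is written */
        return 0;
    }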


1.8 Systems Communicate with Other Systems Using Networks

From the point of view of an individual system, the network can be viewed as just another I/O device.
When the system copies a sequence of bytes from main memory to the network adapter, the data flows across the network to another machine.


1.9 Important Themes

1.9.1 Amdahl’s Law

Consider a system in which executing some application requires time \(T_{old}\). Suppose some part of the system requires a fraction \(\alpha\) of this time, and that we improve its performance by a factor of k.
The overall execution time would thus be \(T_{new} = (1-\alpha)T_{old} + (\alpha T_{old})/k\).
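Equivalently, the speedup is \(S = T_{old}/T_{new} = 1/((1-\alpha) + \alpha/k)\). As a worked example (the numbers are chosen only for illustration): if the improved part accounts for 60% of the original time (\(\alpha = 0.6\)) and we make it 3 times faster (\(k = 3\)), then \(S = 1/(0.4 + 0.6/3) = 1/0.6 \approx 1.67\), a net speedup of only about 1.67x.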
Even though we made a substantial improvement to a major part of the system, our net speedup was significantly less than the speedup for the one part.
This is the major insight of Amdahl's law: to significantly speed up the entire system, we must improve the speed of a very large fraction of the overall system.

1.9.2 Concurrency and Parallelism

We use the term concurrency to refer to the general concept of a system with multiple, simultaneous activities, and the term parallelism to refer to the use of concurrency to make a system run faster.
Parallelism can be exploited at multiple levels of abstraction in a computer system. We highlight three levels here, working from the highest to the lowest level in the system hierarchy.

Thread-Level Concurrency

Building on the process abstraction, we are able to devise systems where multiple programs execute at the same time, leading to concurrency. With threads, we can even have multiple control flows executing within a single process.
Traditionally, this concurrent execution was only simulated, by having a single computer rapidly switch among its executing processes.
When we construct a system consisting of multiple processors, all under the control of a single operating system kernel, we have a multiprocessor system (as opposed to a traditional uniprocessor system).
Multi-core processors have several CPUs, each with its own L1 and L2 caches, and with each L1 cache split into two parts: one to hold recently fetched instructions and one to hold data. The cores share higher levels of cache as well as the interface to main memory.
Hyperthreading, sometimes called simultaneous multi-threading, is a technique that allows a single CPU to execute multiple flows of control. It involves having multiple copies of some of the CPU hardware, such as program counters and register files, while having only single copies of other parts of the hardware.
Whereas a conventional processor requires around 20,000 clock cycles to shift between different threads, a hyperthreaded processor decides which of its threads to execute on a cycle-by-cycle basis.
The use of multiprocessing can reduce the need to simulate concurrency when performing multiple tasks, and it can run a single application program faster (but only if that program is expressed in terms of multiple threads that can effectively execute in parallel).

Instruction-Level Parallelism

At a much lower level of abstraction, modern processors can execute multiple instructions at one time.
With pipelining, the actions required to execute an instruction are partitioned into different steps and the processor hardware is organized as a series of stages, each performing one of these steps. The stages can operate in parallel, working on different parts of different instructions.
Processors that can sustain execution rates faster than 1 instruction per cycle are known as superscalar processors.

Single-Instruction, Multiple-Data (SIMD) Parallelism

At the lowest level, many modern processors have special hardware that allows a single instruction to cause multiple operations to be performed in parallel.
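As a sketch of the kind of code this targets, a simple element-wise loop like the one below is what optimizing compilers often map onto SIMD instructions (for example, gcc or clang at -O3 may auto-vectorize it with SSE or AVX on x86-64; whether that actually happens depends on the compiler, flags, and target):

    /* Add two float arrays element by element. When vectorized, one
       instruction processes several adjacent elements (e.g., 4 or 8
       floats) per iteration. */
    void vadd(float *restrict z, const float *restrict x,
              const float *restrict y, int n) {
        for (int i = 0; i < n; i++)
            z[i] = x[i] + y[i];
    }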

1.9.3 The Importance of Abstractions in Computer Systems

CSAPP-Figure-1.18
Figure 1.18: Some abstractions provided by a computer system. A major theme in computer systems is to provide abstract representations at different levels to hide the complexity of the actual implementations.

On the processor side, the instruction set architecture provides an abstraction of the actual processor hardware.
With this abstraction, a machine-code program behaves as if it were executed on a processor that performs just one instruction at a time. The underlying hardware is far more elaborate, executing multiple instructions in parallel, but always in a way that is consistent with the simple, sequential model.
By keeping the same execution model, different implementations can execute the same machine code while offering a range of cost and performance.
On the operating system side, we have introduced three abstractions: files, virtual memory, and processes.
To these abstractions, we add a new one: the virtual machine, providing an abstraction of the entire computer, including the operating system, the processor, and the programs.


1.10 Summary

A computer system consists of hardware and system software that cooperate to run application programs.