Everything About Computer Architecture: Computer architecture is the set of rules that describes a computer's hardware and software components, how they join together, and how they are configured to interact and make the computer work. These rules describe the complete functionality, organization, and implementation of computer systems.
Computer Architecture includes system design, instruction set architecture, and microarchitecture.
System Design
System design is the process of defining the architecture, modules, interfaces, and data for a system so that it satisfies the specified requirements. It can be thought of as the application of systems theory to product development; its basic aim is to understand the parts of a system and their interactions with each other. Systems design used to play a vital role in the technical industry. After the standardization of computer hardware and software, the importance of software running on these basic platforms increased and gave way to the rise of software engineering.
System Design further is divided into subparts:
- Architectural Design: This emphasizes the design of system architecture, which consists of the view, behavior, and structure of the system.
- Logical Design: This deals with the abstract representation of data flow and the system’s input/output. Logical design is done by making the abstract model of the system and analyzing it through different perspectives.
- Physical Design: This design is about the actual input and output processes of the system. It consists of how the data gets input into the system, analyzed, stored, verified, displayed, and processed. It states the input requirements, output requirements, storage requirements, processing requirements, system control, and backup recovery. Physical design can further be divided into:
- User interface Design: Concerned with how the user will add information to the system and how the system will present the information back to the user.
- Data Design: Concerned with how the data is represented and stored within the system.
- Process Design: Concerned with the movement of data through the system, and with where and how it is validated as it flows through and out of the system.
Instruction Set Architecture
Instruction Set Architecture (ISA) is an abstract model of a computer; the Central Processing Unit (CPU) is an implementation of an ISA. An ISA generally defines the data types, registers, hardware support for memory management, fundamental features like memory consistency, addressing modes, and virtual memory, and the computer's input/output model. Machine code written for a specific ISA runs on any implementation of that ISA and does not depend on implementation characteristics. This provides binary compatibility: there can be different implementations of an ISA that differ in cost, power, and performance, and they can be upgraded without replacing the software. This concept of the ISA led to the later development of microarchitecture. If an operating system maintains a compatible application binary interface (ABI) for an ISA, machine code for that ISA and OS will run on newer implementations of the ISA, and the newer implementations will also be able to support newer versions of the OS.
We can classify ISAs in different ways, and a common classification criterion is architectural complexity. A complex instruction set computer (CISC) has many specialized instructions, some of which are only rarely used in programs. A reduced instruction set computer (RISC) simplifies the processor by implementing only the instructions that are frequently used in programs, while less frequent operations are implemented as subroutines. Other ISA types seek to exploit instruction-level parallelism with less hardware by relying on a compiler for instruction scheduling; examples include very long instruction word (VLIW), long instruction word (LIW), and explicitly parallel instruction computing (EPIC).
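As a rough illustration of the CISC/RISC contrast, the sketch below (hypothetical instruction names, not a real ISA) models one memory-to-memory "complex" add against the equivalent RISC-style load/operate/store sequence:

```python
# Toy model: one CISC-style memory-to-memory add vs. the equivalent
# RISC-style load/operate/store sequence. All names are made up.
memory = {0x10: 7, 0x14: 5, 0x18: 0}
regs = {"r1": 0, "r2": 0}

def cisc_add_mem(dst, a, b):
    # A single complex instruction: read both operands from memory,
    # add them, and write the result back to memory.
    memory[dst] = memory[a] + memory[b]

def risc_sequence(dst, a, b):
    # The same work expressed as four simple register instructions.
    regs["r1"] = memory[a]                 # LOAD  r1, [a]
    regs["r2"] = memory[b]                 # LOAD  r2, [b]
    regs["r1"] = regs["r1"] + regs["r2"]   # ADD   r1, r1, r2
    memory[dst] = regs["r1"]               # STORE [dst], r1

cisc_add_mem(0x18, 0x10, 0x14)
cisc_result = memory[0x18]
memory[0x18] = 0
risc_sequence(0x18, 0x10, 0x14)
assert memory[0x18] == cisc_result == 12   # same result, different instruction counts
```

One complex instruction does the work of four simple ones here, which is exactly the code-density trade-off discussed later in this article.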
Machine instructions are made up of discrete statements, and they may specify:
o Particular registers – for arithmetic, addressing and control functions
o Particular memory locations
o Particular addressing modes – to interpret the operands
Different instruction types correspond to different types of operations:
- Data handling and memory operations
o Set register to fixed constant value
o Copy data from memory location to register and from register to memory location
o Read and write data from hardware devices
- Arithmetic and logic operations
o Addition, subtraction, multiplication, division of the values of two registers, and placing the result in a register
o Bitwise operations
o Comparison of two values in registers
o Floating-point instructions for arithmetic on floating-point numbers
- Control flow operations
o Branching – going to a different location in the program and executing the instructions there
o Conditional branching – going to a different location only if a certain condition holds
o Indirect branching – going to a location taken from a register or memory
o Call – executing a block of code and saving the location of the next instruction to return to
- Coprocessor instructions
o Loading and storing data from a coprocessor and exchanging it with CPU registers
o Performing coprocessor operations
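The categories above can be made concrete with one toy instruction from each (a hypothetical three-operand machine, modeled directly as Python statements; register names and addresses are made up):

```python
# One toy instruction from each major category, modeled as plain Python.
regs = [0] * 4
mem = {100: 21}

regs[0] = 5                  # data handling: set register to a constant  (LI r0, 5)
regs[1] = mem[100]           # data handling: load from memory            (LOAD r1, [100])
regs[2] = regs[0] + regs[1]  # arithmetic:    add two registers           (ADD r2, r0, r1)
flag = regs[2] > 25          # comparison:    compare register to value   (CMP r2, 25)
if not flag:                 # control flow:  conditional branch, taken only if r2 <= 25
    regs[3] = 1              #                (branch target would set r3)
mem[104] = regs[2]           # data handling: store to memory             (STORE [104], r2)
```

Since 5 + 21 = 26 exceeds 25, the conditional branch falls through and r3 stays 0.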
Complex Instructions: A single complex instruction may do the work of many instructions on other computers. Such instructions can take multiple steps, control multiple functional units, or otherwise appear on a larger scale than the bulk of simple instructions. Examples of complex instructions are:
o Transferring multiple registers to or from memory at once
o Copying large blocks of memory
o Complicated integer and floating-point arithmetic
o SIMD (single instruction, multiple data) instructions, which operate on many homogeneous values in parallel
o Performing an atomic test-and-set or other read-modify-write instruction
o Instructions that perform ALU operations with an operand from memory rather than a register
Instruction Encoding: An instruction includes an opcode and operand specifiers. The opcode is the part of a machine language instruction that specifies what type of operation is to be performed. Operand specifiers name registers, memory locations, or literal data. Some conditional instructions also have a predicate field: bits that encode the condition which determines whether the operation is performed.
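A minimal sketch of such an encoding, assuming a made-up 16-bit format (4-bit opcode, 4-bit register specifier, 8-bit immediate operand):

```python
# Hypothetical 16-bit format: 4-bit opcode | 4-bit register | 8-bit immediate.
def encode(opcode, reg, imm):
    assert 0 <= opcode < 16 and 0 <= reg < 16 and 0 <= imm < 256
    return (opcode << 12) | (reg << 8) | imm

def decode(word):
    # Recover the three fields by shifting and masking.
    return (word >> 12) & 0xF, (word >> 8) & 0xF, word & 0xFF

ADDI = 0b0011                  # made-up opcode for "add immediate"
word = encode(ADDI, 5, 42)     # ADDI r5, 42
assert word == 0x352A
assert decode(word) == (ADDI, 5, 42)
```

Real encodings (variable-length x86, fixed 32-bit RISC formats) are far more involved, but the opcode/operand-specifier split is the same idea.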
Instruction sets are also categorized by the maximum number of operands explicitly specified in an instruction: zero, one, two, three, or more operands. Some instructions take some of their operands implicitly, for example from the top of a stack, so fewer operands need to be given explicitly.
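To see how operand counts differ for the same computation a = b + c, here is a sketch (hypothetical assembly syntax in the comments) ending with a tiny stack machine whose ADD names no operands at all:

```python
# a = b + c with different explicit operand counts (made-up syntax):
#   3-operand: ADD a, b, c            -> a = b + c
#   2-operand: MOV a, b; ADD a, c     -> destination doubles as a source
#   0-operand: PUSH b; PUSH c; ADD; POP a   -> operands live on a stack
stack = []

def push(x):
    stack.append(x)

def add():
    # Operands are implicit: the top two stack entries.
    stack.append(stack.pop() + stack.pop())

def pop():
    return stack.pop()

push(7); push(5); add()
a = pop()
assert a == 12 and stack == []
```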
Register Pressure: Register pressure measures the availability of free registers at a point during program execution. It is said to be high when a large number of registers are in use and few remain free. Reducing register pressure typically requires providing more registers in the architecture, which increases cost.
Instruction Length: Instruction length varies widely, from as few as four bits in some microcontrollers to hundreds of bits in some VLIW systems. In some architectures, such as RISC designs, the instruction length is fixed, while in others it is variable.
Code Density: Code density is an important consideration in computer architecture. Memory was costly in the early days, so programs had to be kept as small as possible to work within limited memory capacity. The drive for high code density often resulted in very complex instructions for procedure entry, parameterized returns, loops, and so on. CISC and RISC are two major classes of instruction set. A CISC (complex instruction set computer) can perform several low-level operations with a single instruction and is capable of multi-step operations. CISC designs make good use of code density and let computers operate with less memory.
A RISC (reduced instruction set computer) uses a small, highly optimized set of instructions rather than a specialized instruction for each operation. This somewhat sacrifices code density and makes less optimal use of cache memory, because programs need longer sequences of instructions. A MISC (minimal instruction set computer) uses a stack-based instruction set in which multiple instructions fit into a single machine word, so implementations take less memory; even so, its code density ends up similar to RISC, because more primitive instructions are required to do a given task.
Designing instruction sets is itself a very complex problem. Two types of microprocessor were designed around these instruction-set philosophies. CISC had many different instructions, but research showed that most of them were rarely used and that a smaller instruction set could do the work. RISC improved processor speed, size, and power consumption but required more memory, while CISC used memory very efficiently, which was a major consideration at the time.
Instruction set implementation: There are a variety of ways to implement an instruction set. All of them provide the same programming model and can run the same executables; they differ in power consumption, size, efficiency, speed, memory requirements, and cost. Some designs use a hardwired control unit, some use microcode, and some use a writable control store, where the instruction set is compiled into writable RAM inside the CPU. An instruction set can also be run by a software interpreter; this is slower than running programs directly, but it is still widely used today to test an instruction set before its hardware is built.
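A minimal interpreter of this kind might look as follows (a made-up four-register ISA; the fetch/decode/execute loop mimics a CPU in software):

```python
# Minimal software interpreter for a made-up instruction set.
# Each instruction is a tuple: (opcode, operands...).
def run(program):
    regs, pc = [0] * 4, 0
    while pc < len(program):
        op, *args = program[pc]   # fetch and decode
        pc += 1
        if op == "LI":            # load immediate into a register
            regs[args[0]] = args[1]
        elif op == "ADD":         # rd = rs1 + rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "SUB":         # rd = rs1 - rs2
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "BNZ":         # branch to absolute index if register nonzero
            if regs[args[0]]:
                pc = args[1]
        elif op == "HLT":
            break
    return regs

# Sum 3 + 2 + 1 by counting r0 down to zero; r1 accumulates the total.
program = [("LI", 0, 3), ("LI", 1, 0), ("LI", 3, 1),
           ("ADD", 1, 1, 0), ("SUB", 0, 0, 3), ("BNZ", 0, 3),
           ("HLT",)]
result = run(program)
assert result[1] == 6
```

This is exactly the "interpreter" implementation style the paragraph describes: slower than hardware, but convenient for testing an ISA design.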
Microarchitecture
Microarchitecture deals with the way a given instruction set is implemented in a particular processor. An instruction set architecture may be implemented with different microarchitectures, owing to different design goals or to changes in technology. The microarchitecture of a machine is represented as the interconnections of its microarchitectural elements, which range from single gates and registers up to a complete arithmetic logic unit (ALU). A microarchitecture diagram usually resembles a data flow diagram: three-state buses, unidirectional buses, and individual control lines connect parts such as arithmetic and logic units and register files. A three-state bus requires a three-state buffer for each device that drives the bus, while a unidirectional bus is always driven by a single source, like the address bus driven by the memory address register. Some very simple computers consist of only a single three-state bus. Microarchitectures are represented as connections between logic gates, and the logic gates, in turn, as connections between transistors. New circuit techniques and developments in the semiconductor industry lead to the newer generations of processors that we see.
The pipelined datapath, which implements instruction-level parallelism within a single processor, is the most commonly used datapath design in microarchitectures. It allows multiple instructions to overlap during execution. The instruction pipeline has stages such as instruction fetch, instruction decode, execute, and write back; some architectures also include a memory access stage. The processor's operations and calculations are performed by execution units, including arithmetic logic units (ALUs), floating-point units (FPUs), load/store units, branch prediction units, and SIMD units. Microarchitectural design also decides the size, latency, throughput, and interconnection of the memories within the system, and whether to include peripherals such as memory controllers, which affect the performance level and the connectivity of other peripherals.
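The stage overlap can be sketched numerically: in a classic five-stage pipeline with no stalls, instruction i entering at cycle i occupies stage s at cycle i + s (a simplifying assumption; real pipelines stall and forward):

```python
# Idealized five-stage pipeline: which cycle each instruction spends in each stage.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Map instruction index -> {stage name: cycle it runs in}, assuming no stalls."""
    return {i: {stage: i + s for s, stage in enumerate(STAGES)}
            for i in range(n_instructions)}

d = pipeline_diagram(3)
assert d[0]["WB"] == 4   # first instruction writes back in cycle 4 (cycles numbered from 0)
assert d[2]["IF"] == 2   # third instruction is fetched while the first is executing
assert d[2]["WB"] == 6   # three overlapped instructions take 7 cycles, not 15 sequential ones
```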
Running a program on a single-chip or multi-chip CPU requires the processor to:
- Read an instruction and decode it
- Find the data that is associated with that instruction
- Process/execute the instruction
- Write/display the final result
This type of architecture is developed to use the least power and the fewest logic gates while giving good timing and highly reliable performance; no pipeline has to be stalled for conditional branches or interrupts. In such a computer, the four steps above execute in sequence over several clock cycles. In the control logic, a cycle counter, the cycle state, and the instruction decode register determine exactly what each part of the computer should be doing. A table can be created that describes the control signals for each part of the computer in every cycle, and this table can be used to test the design in software simulation.
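Such a control table can be sketched as a small lookup structure (signal and state names here are illustrative, not from any real design):

```python
# Hypothetical control table for a four-step multicycle machine:
# each clock state asserts a set of control signals.
CONTROL_TABLE = {
    "FETCH":     {"mem_read": 1, "ir_write": 1, "pc_inc": 1},
    "DECODE":    {"reg_read": 1},
    "EXECUTE":   {"alu_enable": 1},
    "WRITEBACK": {"reg_write": 1},
}
SEQUENCE = ["FETCH", "DECODE", "EXECUTE", "WRITEBACK"]

def signals_at(cycle):
    """Control signals asserted in a given clock cycle; the state repeats every 4 cycles."""
    return CONTROL_TABLE[SEQUENCE[cycle % len(SEQUENCE)]]

assert signals_at(0)["ir_write"] == 1     # cycle 0 fetches and latches an instruction
assert "reg_write" in signals_at(7)       # cycle 7 (7 mod 4 = 3) is a write-back state
```

Driving a simulator from a table like this is precisely how the control logic can be tested before committing to hardware.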
Increasing Execution Speed
Memory units such as caches, main memory, and non-volatile hard disks have always been slower than the processor. Therefore there was a need to make instructions more elaborate and compact to reduce memory requirements. A great deal of research has gone into this field, and for a long time the goal has been to execute more and more instructions in parallel. Advances in semiconductor development have also greatly helped this research.
Instruction Set Choice
There have been quite a lot of changes in the instruction sets and storage architectures of computers. Very long instruction word (VLIW) and explicitly parallel instruction computing (EPIC) designs have been a recent trend, and single instruction, multiple data (SIMD) and vector processors have evolved for data parallelism. After CISC computers came RISC computers, which reduced complexity: RISC processors simplified instructions to the lowest level and enabled easier fetch, decode, and execution of instructions in a pipelined fashion.
Earlier, with sequential execution of instructions, different parts of the processor remained idle most of the time: all the steps for a single instruction were carried out before the processor moved on to the next instruction. Then came pipelining, which improved performance by letting the processor work on multiple instructions at the same time. The RISC architecture, with its simple instruction set, supported the pipelining method well. The whole process operates in assembly-line fashion, with instructions entering from one side and leaving from the other, which also explains the faster execution of RISC. Because of RISC's lower complexity, a pipelined core plus an instruction cache could be placed on the same size of die that a CISC core alone would have occupied.
With advancements in hardware, semiconductors, and more space-efficient processors, there was eventually more room on chips to accommodate additional circuitry. Designers experimented with ways of utilizing the saved space, and more and more cache memory was added to the silicon chip. Cache memory can be accessed in far fewer cycles than main memory, so the processor could reach cached data very quickly with little time lost. Cache and pipelining together allowed much faster processing: using on-chip cache let the pipeline run at a speed matching the cache.
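The benefit can be sketched with a toy direct-mapped cache model (the cycle costs and cache size here are illustrative assumptions, not figures from any real design):

```python
# Toy direct-mapped cache: hits cost 1 cycle, misses cost 10 (illustrative numbers).
CACHE_LINES, HIT_COST, MISS_COST = 4, 1, 10
cache = [None] * CACHE_LINES       # each slot remembers which address it holds

def access(addr):
    slot = addr % CACHE_LINES      # direct mapping: address picks exactly one slot
    if cache[slot] == addr:        # hit: the data is already on chip
        return HIT_COST
    cache[slot] = addr             # miss: fetch from main memory, then fill the slot
    return MISS_COST

# A loop reusing the same few addresses mostly hits after warming up.
cycles = sum(access(a) for a in [0, 1, 0, 1, 0, 1])
assert cycles == 2 * MISS_COST + 4 * HIT_COST   # 2 cold misses, then 4 cheap hits
```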
Along with the increase in processing speed, pipelining also brought some difficulties: errors in read and write operations caused by consecutive instructions operating simultaneously. For example, if instruction A has to update a variable by adding 10 to it, and a following instruction B has to use that same variable for a further operation, there could be an error: because of pipelining, B may read the old value of the variable, since A will not yet have written the updated value back by that time. To resolve this conflict, the processor may have to stall the succeeding instructions until the earlier instruction has completely executed and the value has been written back. To avoid the delays caused by branches, a technique like branch prediction is used: the hardware itself guesses which way a branch will go, by analyzing past instructions and predicting future behavior. This allows the processor to continue fetching and executing instructions instead of stalling.
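The read-after-write hazard in the example above can be sketched directly (a deliberately simplified model: the `stall` flag stands in for the pipeline waiting on A's write-back):

```python
# Sketch of a read-after-write hazard: instruction B reads a variable
# before instruction A's update has been written back.
def run_pipeline(stall):
    x = 5
    a_result = x + 10      # instruction A computes x + 10 ...
    if stall:
        x = a_result       # ... and the pipeline waits for the write-back,
        b_reads = x        #     so B sees the updated value
    else:
        b_reads = x        # no stall: B reads x while A is still in flight,
        x = a_result       #     and A's write-back lands too late for B
    return b_reads

assert run_pipeline(stall=True) == 15    # correct: B sees A's update
assert run_pipeline(stall=False) == 5    # hazard: B read the stale value
```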
Advancements in semiconductor manufacturing did not halt there. With further improvements, more and more logic gates could be used, and programs could execute even faster if many instructions were processed simultaneously. Superscalar processors achieved this by replicating the functional units of the CPU: newer designs could accommodate two load units, one store unit, two or more integer calculation units, two or more floating-point units, and perhaps a SIMD (single instruction, multiple data) unit.
Out of Order Execution
In pipelined execution we encountered the problem of stalls, and we tried to reduce it with caches, which increase execution speed, and with branch prediction. Still, those solutions do not completely eliminate stalls in the instruction cycle. There is, however, a way to make use of the stall period: with out-of-order execution, any instruction whose operands are ready is processed while another instruction waits for its data. The system then reorders the results of the instructions to maintain the proper order of execution.
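A very simplified scheduler illustrates the idea (instruction names and latencies are made up; a real machine issues from a hardware window, not a Python loop):

```python
# Sketch of out-of-order issue: each cycle, start the oldest instruction whose
# source registers are available, skipping instructions that are still waiting.
def issue_order(instructions, latency):
    """instructions: list of (name, reads, writes); returns the order of issue."""
    ready_at = {}                    # register -> cycle its value becomes available
    order, started, cycle = [], set(), 0
    while len(started) < len(instructions):
        for i, (name, reads, writes) in enumerate(instructions):
            if i in started:
                continue
            if all(ready_at.get(r, 0) <= cycle for r in reads):
                order.append(name)   # issue this instruction now
                started.add(i)
                for r in writes:
                    ready_at[r] = cycle + latency[name]
                break                # one issue per cycle in this toy model
        cycle += 1
    return order

prog = [("load",  [], ["r1"]),       # slow memory load: r1 ready after 5 cycles
        ("use",   ["r1"], ["r2"]),   # depends on the load, so it must wait
        ("indep", [], ["r3"])]       # independent work, ready immediately
lat = {"load": 5, "use": 1, "indep": 1}
assert issue_order(prog, lat) == ["load", "indep", "use"]
```

The independent instruction slips ahead of the stalled one, which is exactly the stall-period reuse the paragraph describes.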
Independent program instructions may get assigned to the same register when there is nothing to distinguish between them. This results in serialized execution of those instructions, costing the processor time. It is therefore a good idea to rename registers to distinguish them: when an instruction needs a destination register, it is given one that is not already in use, allowing parallel execution. This technique is known as register renaming.
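Register renaming can be sketched as a mapping from architectural to physical registers (register names here are illustrative; real renamers also recycle physical registers when values retire):

```python
# Sketch of register renaming: each write to an architectural register is
# mapped to a fresh physical register, removing the false dependence.
def rename(instructions):
    """instructions: list of (dest, srcs) over architectural register names."""
    mapping, next_free, renamed = {}, 0, []
    for dest, srcs in instructions:
        new_srcs = [mapping.get(s, s) for s in srcs]  # read through the current mapping
        phys = f"p{next_free}"                        # allocate a fresh physical register
        next_free += 1
        mapping[dest] = phys
        renamed.append((phys, new_srcs))
    return renamed

# Both instructions write r1, but after renaming they target p0 and p1,
# so they no longer conflict and can execute in parallel.
out = rename([("r1", ["r2"]), ("r1", ["r3"])])
assert out == [("p0", ["r2"]), ("p1", ["r3"])]
```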
Multiprocessing and Multithreading
With the development of different kinds of CPUs and RAMs running at different frequencies, there was often a mismatch between them. The techniques previously used for instruction-level parallelism could not compensate for the long stalls that occurred when fetching data from main memory. In addition, higher-frequency operation and the extra hardware for ILP dissipated a lot of heat, and all of this called for a new generation of computers. Multiprocessing was one solution that was widely accepted in mainframes and supercomputers: multiprocessing systems are computers with multiple CPUs. Multi-core CPUs also appeared, with more than one CPU circuit on a single chip. Another solution was multithreading, in which the processor switches between different processes: whenever data access stalls, the processor switches to executing a different program that is ready to run.
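The stall-driven switching can be sketched as follows (a toy coarse-grained model, assuming a memory stall lasts exactly one scheduling turn; thread names and ops are made up):

```python
# Sketch of coarse-grained multithreading: when the running thread stalls on a
# memory access, the processor switches to another thread that is ready to run.
def run_threads(threads):
    """threads: name -> list of ops; a 'mem' op stalls that thread for one turn."""
    trace, stalled = [], set()
    while any(threads.values()):
        for name, ops in threads.items():
            if not ops:
                continue
            if name in stalled:          # this thread is waiting on memory
                stalled.discard(name)    # assume its data arrives by the next pass
                continue
            op = ops.pop(0)
            trace.append((name, op))
            if op == "mem":
                stalled.add(name)        # stall: another thread runs meanwhile
    return trace

t = run_threads({"T0": ["alu", "mem", "alu"], "T1": ["alu", "alu"]})
# T1 executes while T0 waits on its memory access, instead of the core idling.
assert t == [("T0", "alu"), ("T1", "alu"), ("T0", "mem"),
             ("T1", "alu"), ("T0", "alu")]
```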