What is a computer program, really?
In introductory programming courses here at Cabrillo (and elsewhere), you will learn a specific language like Java, C++, JavaScript, Python, PHP, etc. These are some of the many programming languages in use today.
Oddly enough, computers don't run code written in any of these languages.
So what do they run?
Computers can only execute machine code, which is not intended to be written or read directly by humans, and consists of very simple instructions encoded as sequences of bits (i.e., “zeros and ones”). The code that runs on computers such as modern desktops, laptops, smartphones, and tablets generally consists of instructions from the x86-64 or ARM instruction sets.
We use tools called compilers and interpreters to convert relatively human-friendly languages like Java, C, C++, JavaScript, Python, etc. into machine code.
Just to give you an idea of the differences between programming languages, the following are different versions of a very simple program that prints one line of text to standard output containing only an exclamation point (AKA "bang") character, i.e.:
!
Machine Code
Why not start from the most basic? Here is an entire, functioning, executable program that will run on any x86-64 (64-bit Intel/AMD) GNU/Linux system, e.g. on our server:
ELF>x@@x@8@@@££ ∆D$ˇ!∆$ Hçt$ˇø∫∏ø∏<x@x@£` £`®`__bss_start_edata_end.symtab.strtab.shstrtab.textx@x+®ê 8Q!
If you think that looks like gibberish, you're right! From the first few characters, you might think it's an ELF (and indeed it is! ELF, short for Executable and Linkable Format, is the standard executable file format on Linux).
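The “ELF” near the start is no accident: files in this format begin with four fixed “magic” bytes, 0x7F followed by the ASCII letters E, L, and F, and the next byte records whether the file is 32-bit or 64-bit. Here is a minimal Python sketch of how a tool might check those bytes. The function names are hypothetical, and a real program loader validates far more of the header than this.

```python
ELF_MAGIC = b"\x7fELF"  # 0x7F, then the ASCII codes for 'E', 'L', 'F'


def is_elf(data: bytes) -> bool:
    """Return True if the data starts with the ELF magic number."""
    return data[:4] == ELF_MAGIC


def elf_class(data: bytes) -> str:
    """Byte 4 of the header (EI_CLASS): 1 means 32-bit, 2 means 64-bit."""
    if not is_elf(data) or len(data) < 5:
        return "unknown"
    return {1: "32-bit", 2: "64-bit"}.get(data[4], "unknown")


# The first five bytes of this executable, as they appear in its hexadecimal dump
start = bytes.fromhex("7f454c4602")
print(is_elf(start), elf_class(start))  # True 64-bit
```

To try it on a real file, you could read the executable's bytes first, e.g. `is_elf(open("asm", "rb").read())` for the assembly-language version built later on this page.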
In reality, this is a mixture of machine code and other information bundled together into an “executable”. Most of the non-English-letter characters are not characters at all, but rather the result of attempting to render a sequence of machine code (and other) bits as text. In fact, here is the same data, rendered as binary numbers (i.e., each sequence of 8 bits is one byte of “code” from the executable):
011111110100010101001100010001100000001000000001000000010000000000000000000000000000000 000000000000000000000000000000000000000000000001000000000001111100000000000000001000000 000000000000000000011110000000000001000000000000000000000000000000000000000000000001000 000000000000000000000000000000000000000000000000000000000000111100000000001000000000000 000000000000000000000000000000000000000000000000000000000000000000000100000000000000001 110000000000000000001000000000100000000000000000001010000000000000100000000000000000100 000000000000000000000000000101000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000100000000000000000000000000000000000000 000000000000000000000000010000000000000000000000000000000000000000000000101000110000000 000000000000000000000000000000000000000000000000010100011000000000000000000000000000000 000000000000000000000000000000000000000000001000000000000000000000000000000000000000000 000110001100100010000100100111111110010000111000110000001000010010000001010010010001000 110101110100001001001111111110111111000000010000000000000000000000001011101000000010000 000000000000000000000101110000000000100000000000000000000000000001111000001011011111100 000000000000000000000000000000101110000011110000000000000000000000000000001111000001010 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000001100000000000000010000000001111000000000000100000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000001100000 000000000000000000000001000000000000000000010000000001111000000000000100000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 
000000000000010000000000000000000000000001000000000000000000010000000010100011000000000 110000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000011010000000000000000000000000001000000000000000000010000000 010100011000000000110000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000101000000000000000000000000000001000000000 000000000010000000010101000000000000110000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000101111101011111011 000100111001101110011010111110111001101110100011000010111001001110100000000000101111101 100101011001000110000101110100011000010000000001011111011001010110111001100100000000000 000000000101110011100110111100101101101011101000110000101100010000000000010111001110011 011101000111001001110100011000010110001000000000001011100111001101101000011100110111010 001110010011101000110000101100010000000000010111001110100011001010111100001110100000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000011011000000000000000000000000000000010000000 000000000000000000000011000000000000000000000000000000000000000000000000000000000011110 000000000001000000000000000000000000000000000000000000000001111000000000000000000000000 000000000000000000000000000000000000010101100000000000000000000000000000000000000000000 
000000000000000000000000000000000000000000000000000000000000000000000000000000000001000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000010000000000000000000000000000001000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000101010000000000000000000000000000000000 000000000000000000000000010010000000000000000000000000000000000000000000000000000000000 000000001100000000000000000000000000000010000000000000000000000000000010000000000000000 000000000000000000000000000000000000000000000011000000000000000000000000000000000000000 000000000000000000000000100100000000000000000000000000000011000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000011100000000001000000000000000000000000000000000 000000000000000000110010000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000100000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000010001000000000000000000000000000000110000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000001010001000000010000000000000000000000000000000000000000000 000000010000100000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000001000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Here is that same data rendered as hexadecimal numbers:
7f454c4602010100000000000000000002003e0001000000780040000000000040000000000000007801000 000000000000000004000380001004000050004000100000005000000000000000000000000004000000000 000000400000000000a300000000000000a3000000000000000000200000000000c64424ff21c604240a488 d7424ffbf01000000ba02000000b8010000000f05bf00000000b83c0000000f050000000000000000000000 000000000000000000000000000000000000000000000300010078004000000000000000000000000000060 0000010000100780040000000000000000000000000000100000010000100a3006000000000000000000000 0000000d00000010000100a30060000000000000000000000000001400000010000100a8006000000000000 000000000000000005f5f6273735f7374617274005f6564617461005f656e6400002e73796d746162002e73 7472746162002e7368737472746162002e74657874000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 00000000001b000000010000000600000000000000780040000000000078000000000000002b00000000000 000000000000000000001000000000000000000000000000000010000000200000000000000000000000000 000000000000a80000000000000090000000000000000300000002000000080000000000000018000000000 000000900000003000000000000000000000000000000000000003801000000000000190000000000000000 000000000000000100000000000000000000000000000011000000030000000000000000000000000000000 000000051010000000000002100000000000000000000000000000001000000000000000000000000000000
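Both dumps are just different spellings of the same bytes. As a small Python sketch (the helper names are made up for illustration), each rendering takes only a line of code; the example uses the file's first four bytes, which open both dumps above. Unlike the binary dump above, this version puts a space between bytes for readability.

```python
def to_binary(data: bytes) -> str:
    """Write each byte as 8 binary digits, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in data)


def to_hex(data: bytes) -> str:
    """Write each byte as 2 hex digits (similar to xxd -plain, minus its line wrapping)."""
    return data.hex()


first_bytes = b"\x7fELF"
print(to_binary(first_bytes))  # 01111111 01000101 01001100 01000110
print(to_hex(first_bytes))     # 7f454c46
```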
Only a small section of the data in that executable is actually machine code (in the hexadecimal dump, the run of bytes beginning with c64424ff21 and ending with 0f05). There are ten machine-code instructions in this program, and they are precisely as follows:
c6 44 24 ff 21 c6 04 24 0a 48 8d 74 24 ff bf 01 00 00 00 ba 02 00 00 00 b8 01 00 00 00 0f 05 bf 00 00 00 00 b8 3c 00 00 00 0f 05
Remember, this is a program that simply prints an exclamation point to the screen. Nothing more, nothing less.
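How does the operating system know where in the file those instructions start? The ELF header stores an entry-point address. Below is a minimal Python sketch that reads it, assuming a little-endian 64-bit ELF file, where the 8-byte entry address sits at byte offset 24 of the header. The function name is hypothetical.

```python
import struct


def elf64_entry_point(header: bytes) -> int:
    """Return e_entry from a little-endian 64-bit ELF header.

    Assumed layout: 16 bytes of identification (e_ident), then
    e_type (2 bytes), e_machine (2) and e_version (4), so the
    8-byte e_entry field starts at offset 24.
    """
    if header[:4] != b"\x7fELF" or header[4] != 2 or header[5] != 1:
        raise ValueError("not a little-endian 64-bit ELF header")
    (entry,) = struct.unpack_from("<Q", header, 24)  # unsigned 64-bit, little-endian
    return entry


# The first 32 bytes of the hexadecimal dump above
header = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "02003e00010000007800400000000000"
)
print(hex(elf64_entry_point(header)))  # 0x400078
```

Judging from the program header in the dump, this small executable is loaded at address 0x400000 starting from the beginning of the file, so the entry point 0x400078 corresponds to file offset 0x78 (byte 120), which is exactly where the bytes c6 44 24 ff 21 begin.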
Here are those same ten instructions in binary, i.e. the actual machine code understood by the CPU:
1100011001000100001001001111111100100001 11000110000001000010010000001010 0100100010001101011101000010010011111111 1011111100000001000000000000000000000000 1011101000000010000000000000000000000000 1011100000000001000000000000000000000000 0000111100000101 1011111100000000000000000000000000000000 1011100000111100000000000000000000000000 0000111100000101
This is weird, scary-looking stuff. And it's supposed to be!
Remember, machine code is not meant to be written or read by humans. But it is the only kind of instruction CPUs can execute, so we use tools to convert other, friendlier languages into machine code.
Assembly Language
So, how did that machine code above come to be? Well, I didn't write it by hand, but I did generate it from another language. In fact, it was generated from a program I wrote in assembly language.
Assembly language is, in some ways, not a huge improvement over machine code. Each statement in assembly language is equivalent to one instruction in machine code, which means that no single statement can accomplish very much. For reference, even very simple Android or iOS games probably consist of millions of machine-code/assembly-language instructions!
However, assembly language is at least somewhat readable, and exists as regular text. Here is the x86-64 GNU/Linux assembly-language version of the same program:
.text
.globl _start
_start:
    movb $'!', -1(%rsp)
    movb $'\n', (%rsp)
    leaq -1(%rsp), %rsi
    movl $1, %edi
    movl $2, %edx
    movl $1, %eax
    syscall
    movl $0, %edi
    movl $60, %eax
    syscall
Technically, this program consists of 10 statements: the first three lines are two assembler directives and a label, none of which become machine code on their own. Each of these 10 statements corresponds to exactly one of the machine-code instructions mentioned above.
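To get a feel for that one-to-one correspondence, here is a toy Python “assembler” that handles just two of the forms used in the program, `movl $number, %register` and `syscall`. It is a hypothetical sketch, not how GNU `as` works internally. It relies on the x86 encoding in which opcode 0xB8 plus a register number moves a 32-bit immediate into that register, with the immediate stored little-endian (lowest byte first); prefixes such as REX are ignored.

```python
import re

# 32-bit registers in x86 encoding order (eax = 0 ... edi = 7)
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def assemble(statement: str) -> bytes:
    """Assemble a tiny AT&T-syntax subset: 'movl $imm, %reg' and 'syscall'."""
    statement = statement.strip()
    if statement == "syscall":
        return bytes([0x0F, 0x05])
    match = re.fullmatch(r"movl\s+\$(\d+),\s*%(\w+)", statement)
    if match is None or match.group(2) not in REGISTERS:
        raise ValueError(f"unsupported statement: {statement!r}")
    value = int(match.group(1))
    opcode = 0xB8 + REGISTERS.index(match.group(2))    # B8+r: move imm32 into r32
    return bytes([opcode]) + value.to_bytes(4, "little")  # immediate, low byte first


for line in ["movl $1, %edi", "movl $2, %edx", "movl $60, %eax", "syscall"]:
    print(f"{line:16} -> {assemble(line).hex(' ')}")
```

Running it prints `bf 01 00 00 00`, `ba 02 00 00 00`, `b8 3c 00 00 00`, and `0f 05`, matching the corresponding bytes in the machine-code listing above.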
What does it mean? Well, that's kind of complicated. Writing assembly language requires detailed knowledge of the type of system you are programming for, i.e. both the hardware and the operating system. The two syscall instructions call on the operating system to perform two operations, namely:
- Write an exclamation point followed by a newline (i.e. one line of text) to standard output (system call #1, write).
- Exit from the program (system call #60, exit).
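The other 8 instructions set up those system calls: they store the '!' and newline characters in memory at the stack pointer, then load each call's number into %eax and its arguments into %edi, %rsi, and %edx as needed.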
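One way to picture this division of labor is a toy model of the kernel's side of `syscall`, written in Python. Everything here is a simplified, hypothetical simulation (the memory array, the register dictionary, and the `toy_syscall` helper are invented for illustration); only the conventions come from x86-64 Linux: the call number goes in rax, the first three arguments go in rdi, rsi, and rdx, write is call 1, and exit is call 60.

```python
SYS_WRITE = 1   # Linux x86-64 system call numbers
SYS_EXIT = 60


def toy_syscall(regs: dict, memory: bytearray, files: dict) -> int:
    """Simulate the two system calls this program makes.

    regs maps register names to integers; files maps a file
    descriptor to the bytes written to it so far.
    """
    number = regs["rax"]
    if number == SYS_WRITE:
        fd, address, count = regs["rdi"], regs["rsi"], regs["rdx"]
        files.setdefault(fd, bytearray()).extend(memory[address:address + count])
        return count                   # write returns the number of bytes written
    if number == SYS_EXIT:
        raise SystemExit(regs["rdi"])  # the exit status comes from rdi
    raise NotImplementedError(f"system call {number} is not modeled")


# Mirror the setup instructions with a tiny 16-byte "stack"
memory = bytearray(16)
rsp = 8
memory[rsp - 1] = ord("!")         # movb $'!', -1(%rsp)
memory[rsp] = ord("\n")            # movb $'\n', (%rsp)
regs = {
    "rsi": rsp - 1,                # leaq -1(%rsp), %rsi  (buffer address)
    "rdi": 1,                      # movl $1, %edi        (fd 1 = standard output)
    "rdx": 2,                      # movl $2, %edx        (2 bytes)
    "rax": SYS_WRITE,              # movl $1, %eax
}
files: dict = {}
toy_syscall(regs, memory, files)   # first syscall
print(bytes(files[1]))             # b'!\n'

regs.update(rdi=0, rax=SYS_EXIT)   # movl $0, %edi; movl $60, %eax
try:
    toy_syscall(regs, memory, files)  # second syscall
except SystemExit as stop:
    print("exit status:", stop.code)  # exit status: 0
```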
Assembly language is not easy to read, nor to write. These days, very little code is written in assembly language, though it is occasionally an important part of programming operating systems and games, and interacting with hardware devices.
You can assemble and run this program yourself, if you'd like!
- Save the source code (i.e. the assembly language instructions) to a file named bang.s.
- Assemble the code into machine code in a file named bang.o:
  as -o bang.o bang.s
- Link the machine code into an executable named asm:
  ld -o asm bang.o
- Run the program by specifying the path to the executable:
  ./asm
You can also verify the contents of the resulting executable:
- In binary:
  xxd -b asm
- In hexadecimal:
  xxd -plain asm
- By disassembling the machine code:
  objdump -D asm
The last of these will show you the exact correspondence between the assembly-language instructions and the machine-code instructions:
asm:     file format elf64-x86-64

Disassembly of section .text:

0000000000400078 <_start>:
  400078:   c6 44 24 ff 21    movb   $0x21,-0x1(%rsp)
  40007d:   c6 04 24 0a       movb   $0xa,(%rsp)
  400081:   48 8d 74 24 ff    lea    -0x1(%rsp),%rsi
  400086:   bf 01 00 00 00    mov    $0x1,%edi
  40008b:   ba 02 00 00 00    mov    $0x2,%edx
  400090:   b8 01 00 00 00    mov    $0x1,%eax
  400095:   0f 05             syscall
  400097:   bf 00 00 00 00    mov    $0x0,%edi
  40009c:   b8 3c 00 00 00    mov    $0x3c,%eax
  4000a1:   0f 05             syscall
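Notice that the addresses in the left-hand column don't increase by a fixed amount: x86-64 instructions vary in length (here, from 2 to 5 bytes), and each one starts where the previous one ends. A quick Python check, with the lengths read off the disassembly above:

```python
# Length in bytes of each instruction, in order, from the disassembly
lengths = [5, 4, 5, 5, 5, 5, 2, 5, 5, 2]
start = 0x400078  # address of _start, the entry point


def instruction_addresses(start: int, lengths: list[int]) -> list[int]:
    """Each instruction begins right where the previous one ends."""
    addresses = []
    address = start
    for length in lengths:
        addresses.append(address)
        address += length
    return addresses


print([hex(a) for a in instruction_addresses(start, lengths)])
print(sum(lengths))  # 43
```

The computed addresses match the left-hand column, and the total of 43 bytes matches the number of bytes in the ten machine-code instructions listed earlier.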
C
Machine code and assembly language are generally regarded as "low-level" programming, since there is little abstraction between program statements and the actual system on which they will execute.
Most programming languages in wide use today are, conversely, "high-level" languages which often require little to no knowledge of the specific kind of system that will execute the program.
Probably the most famous and long-lived high-level language is C, which remains one of the most widely used programming languages.
Here is the program written in C:
#include <stdio.h>

int main() {
    puts("!");
}
This C version consists of, essentially, one statement:
puts("!");
As you might imagine, this prints the specified string, followed by a newline, to standard output (“putting a string” there, you might say). Maybe it's not self-explanatory, but at least it's much friendlier than assembly language!
You can compile and run this C program yourself, if you'd like!
- Save the source code (i.e. the C-language instructions) to a file named bang.c.
- Compile and link the code into an executable named c:
  gcc -o c bang.c
- Run the program by specifying the path to the executable:
  ./c
You can also verify the contents of the resulting executable, but you will now notice much more code, since this version of the program incorporates and uses code from the C-language standard library:
objdump -d c
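If you would like to put a number on “much more code”, one rough approach is to count instruction lines in objdump's output. The Python sketch below assumes GNU objdump's usual layout for disassembly lines (an address and colon, a tab, the raw bytes, another tab, then the instruction) and requires objdump to be installed; the function names are hypothetical, and the counts you get will vary with your compiler version and options.

```python
import re
import subprocess

# Assumed layout of a GNU objdump instruction line:
#   "  <address>:<TAB><hex bytes><TAB><mnemonic> <operands>"
# Lines that only carry leftover bytes of a long instruction have no
# second tab/mnemonic, so they are not counted.
INSTRUCTION_LINE = re.compile(r"^[ \t]*[0-9a-f]+:\t[^\t\n]*\t\S", re.MULTILINE)


def count_instructions(disassembly: str) -> int:
    """Roughly count the instructions in `objdump -d` text output."""
    return len(INSTRUCTION_LINE.findall(disassembly))


def disassemble(path: str) -> str:
    """Run `objdump -d` on an executable (objdump must be installed)."""
    result = subprocess.run(
        ["objdump", "-d", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, comparing `count_instructions(disassemble("c"))` with `count_instructions(disassemble("asm"))` contrasts the C executable with the ten-instruction assembly version.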
C++
Past C, we start getting into so-called “object-oriented” languages that further abstract the idea of data in programs. C++ is one of the more iconic object-oriented languages. Here is a C++ version of the program:
#include <iostream>

int main() {
    std::cout << "!\n";
}
This C++ version, like its C counterpart, essentially consists of one statement. It's fairly readable, though you need some background to know what things like int main() and std::cout really are.
You can compile and run this C++ program yourself, if you'd like!
- Save the source code (i.e. the C++-language statements) to a file named bang.cpp.
- Compile and link the code into an executable named cpp:
  g++ -o cpp bang.cpp
- Run the program by specifying the path to the executable:
  ./cpp
You can also verify the contents of the resulting executable, but you will now notice much more code, since this version of the program incorporates and uses code from the C++-language standard library:
objdump -d cpp
Java
Java popularized the idea of running programs inside a virtual machine, i.e. a piece of software pretending to be hardware and performing real-time translation of virtual-machine instructions into actual machine code.
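To make “a piece of software pretending to be hardware” a little more concrete, here is a deliberately tiny stack-based virtual machine in Python. Its instruction names are loosely modeled on the ones `javap -c` lists for this program's main method (getstatic, ldc, invokevirtual, return), but it is a hypothetical toy: a real JVM also loads and verifies classes, manages memory with garbage collection, and compiles frequently run code to machine code rather than only interpreting it.

```python
def run(program: list[tuple[str, object]]) -> str:
    """Interpret (opcode, operand) pairs using an operand stack."""
    stack: list = []
    output: list[str] = []
    for opcode, operand in program:
        if opcode in ("getstatic", "ldc"):
            # Push a value: a (pretend) static field or a constant
            stack.append(operand)
        elif opcode == "invokevirtual":
            # Pop the argument, then the object the method is called on
            argument = stack.pop()
            target = stack.pop()
            if (target, operand) != ("System.out", "println"):
                raise NotImplementedError(f"{target}.{operand} is not modeled")
            output.append(f"{argument}\n")
        elif opcode == "return":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
    return "".join(output)


bang = [
    ("getstatic", "System.out"),
    ("ldc", "!"),
    ("invokevirtual", "println"),
    ("return", None),
]
print(run(bang), end="")  # prints: !
```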
Here is a Java version of this program:
public class Bang {
    public static void main(String[] args) {
        System.out.println("!");
    }
}
Note that this doesn't look incredibly different from the C and C++ versions, other than:
- It's a little longer.
- There is apparently a “class” involved.
These two properties apply to Java in general.
You can compile and run this Java program yourself, if you'd like!
- Save the source code (i.e. the Java-language statements) to a file named Bang.java.
- Compile the code into Java Virtual Machine (JVM) bytecode in a file named Bang.class:
  javac Bang.java
- Run the program by specifying the class whose main method the JVM should invoke:
  java Bang
You can also inspect the resulting bytecode, but you will notice that it looks quite different from the assembly/machine code we have seen thus far, since it consists of instructions for the Java Virtual Machine rather than for a physical CPU:
javap -c Bang.class
Go
There has been much interest lately in languages that approach the performance of C++ while avoiding many of its safety/security issues. Go is one such high-profile language, which also incorporates built-in support for concurrency, allowing developers to more easily take advantage of multi-core CPUs and multi-CPU systems.
Here is a Go version of this program:
package main

import "fmt"

func main() {
    fmt.Println("!")
}
Note again the syntactical similarities to C/C++/Java.
You can compile and run this Go program yourself, if you'd like!
- Save the source code (i.e. the Go-language statements) to a file named bang.go.
- Compile and link the code into an executable named go:
  go build -o go bang.go
- Run the program by specifying the path to the executable:
  ./go
You can also verify the contents of the resulting executable, but you will now notice much more code, since this version of the program incorporates and uses code from the Go standard library, along with a lot of boilerplate code that becomes part of every Go-based executable:
objdump -d go
Rust
Rust is another high-profile language that aims for the performance of C++ without its many safety/security issues. Its compiler enforces memory safety (for code not explicitly marked unsafe) in a way that C++ compilers can't.
Here is a Rust version of this program:
fn main() {
    println!("!");
}
Again, not entirely dissimilar to C/C++/Java. A little more terse, I would say.
You can compile and run this Rust program yourself, if you'd like!
- Save the source code (i.e. the Rust-language statements) to a file named bang.rs.
- Compile and link the code into an executable named rust:
  rustc -o rust bang.rs
- Run the program by specifying the path to the executable:
  ./rust
You can also verify the contents of the resulting executable, but you will now notice much more code, since this version of the program incorporates and uses code from the Rust standard library, along with a lot of boilerplate code that becomes part of every Rust-based executable:
objdump -d rust
Haskell
Functional programming languages offer many benefits in the age of parallel computing, i.e. where idiomatic programs can take advantage of multi-core CPUs and multi-CPU systems. One such language is called Haskell.
Here is a Haskell version of the program:
main = putStrLn "!"
A more complicated Haskell program would illustrate how far we now venture from C-style code, but for now it still looks fairly similar.
You can compile and run this Haskell program yourself, if you'd like!
- Save the source code (i.e. the Haskell-language statements) to a file named bang.hs.
- Compile and link the code into an executable named hs:
  ghc -o hs bang.hs
- Run the program by specifying the path to the executable:
  ./hs
You can also verify the contents of the resulting executable, but you will now notice much more code, since this version of the program incorporates and uses code from the Haskell-language standard library, along with a lot of boilerplate code that by default becomes part of every Haskell-based executable:
objdump -d hs
JavaScript
So far, the languages we have looked at go through a permanent process of translation before they can be executed. Assembly language needs to be assembled into machine code, C/C++/Go/Rust/Haskell need to be compiled into machine code, and Java must be compiled into JVM bytecode, all before a program can be run.
An alternative paradigm is the interpreted language. Interpreted programming languages remain as source code right up to the point of execution, when a program called an interpreter translates that source code on the fly (often into an internal bytecode, and sometimes all the way to machine code) and executes it on the given system.
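Python, covered later on this page, happens to expose its interpreter to the programs it runs, which makes this easy to see. In the sketch below the program is nothing but a string until run time. Note that CPython, the standard Python interpreter, first translates source code into its own bytecode and then interprets that bytecode, rather than producing machine code directly.

```python
import contextlib
import io

source = "print('!')"  # the whole program, as plain text

# CPython translates the text into its own bytecode (not machine code)...
code = compile(source, "bang.py", "exec")

# ...and its interpreter loop then executes that bytecode.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(code)

print(repr(captured.getvalue()))  # '!\n'
```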
Probably the most widely used interpreted language is JavaScript. Nearly every web page you visit runs some JavaScript. Web browsers come with JavaScript engines built in, and the speed of those engines has become a major factor in browser performance.
Here is a JavaScript version of the program:
console.log('!');
Note the simplicity! Interpreted languages often feature simpler syntax and the ability to write code that runs without surrounding context, i.e. as a “scripting” language.
You can run this JavaScript program yourself, if you'd like!
- Run it in the browser:
  - Press Ctrl-Shift-I (or Command-Option-I on a Mac) to open your browser's developer tools.
  - Make sure the Console tab is active.
  - Paste the code into the console prompt.
- Run it on the server:
  - Save the source code to a file named bang.js.
  - Run it with a JavaScript runtime such as Node.js, if one is installed:
    node bang.js
Python
The final interpreted language I'd like to discuss is called Python. It is known for the simplicity and elegance of its source code, and is widely used in scientific research.
Here is a Python version of the program:
print('!')
Again, note the lack of surrounding boilerplate, which is often a hallmark of interpreted languages. In this case, we will run this Python program as a script.
You can run this Python program yourself, if you'd like!
- Save the source code to a file named bang.py.
- Run the program by launching the Python interpreter and telling it to run your file:
  python3 bang.py
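Much as `javap -c` shows JVM bytecode, Python's standard `dis` module can show the bytecode that CPython compiles this program into. A small sketch; the exact opcode names differ between Python versions, so treat the listing it prints as illustrative.

```python
import dis

code = compile("print('!')", "bang.py", "exec")

# List the bytecode instructions; opcode names vary by Python version
dis.dis(code)

# Whatever the version, the instructions refer to the name print and the constant '!'
operands = [instruction.argval for instruction in dis.get_instructions(code)]
print("print" in operands, "!" in operands)  # True True
```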
Conclusion
Programming languages are just tools, and “code” (except for machine code) is merely text that needs to be translated by some other tool into the actual machine code that a CPU can execute.
If you're learning programming, I think you should keep these facts in mind:
- There is no perfect or best programming language, but rather different tools suited to different purposes.
- It's a good idea to focus on one language for a while, and get really good at it. It's much easier to pick up other languages later on.