What is a computer program, really?

Windows 10 menu Mac applications

In introductory programming courses here at Cabrillo (and elsewhere), you will learn a specific language like Java, C++, JavaScript, Python, PHP, etc. These are some of the many programming languages in use today.

Oddly enough, computers don't run code written in any of these languages.

So what do they run?

Computers can only execute machine code, which is not intended to be written nor read directly by humans, and consists of very simple instructions defined by sequences of bits (i.e., “zeros and ones”). The code that runs on computers such as modern desktops, laptops, smart phones, tablets, etc. generally consists of instructions written in the x86-64 or ARM standards.

We use tools called compilers and interpreters to convert relatively human-friendly languages like Java, C, C++, JavaScript, Python, etc. into machine code.

Just to give you an idea of the differences between programming languages, the following are different versions of a very simple program that prints one line of text to standard output containing only an exclamation point (AKA "bang") character, i.e.:

!

Machine Code

Why not start from the most basic? Here is an entire, functioning, executable program that will run on any 64-bit GNU/Linux operating system, e.g. on our server:

ELF>x@@x@8@@@££ ∆D$ˇ!∆$
Hçt$ˇø∫∏ø∏<x@x@£`
£`®`__bss_start_edata_end.symtab.strtab.shstrtab.textx@x+®ê	8Q!

I don't understand alien language.

If you think that looks like gibberish, you're right! From the first few characters, you might think it's an ELF (and indeed it is!).

In reality, this is a mixture of machine code and other information bundled together into an executable. Most of the non-English-letter characters are not characters at all, but rather a result of attempting to render a sequence of machine code (and other) bits as text. In fact, here is the same data, rendered as binary numbers (i.e., each sequence of 8 bits is one byte of “code” from the executable):

011111110100010101001100010001100000001000000001000000010000000000000000000000000000000
000000000000000000000000000000000000000000000001000000000001111100000000000000001000000
000000000000000000011110000000000001000000000000000000000000000000000000000000000001000
000000000000000000000000000000000000000000000000000000000000111100000000001000000000000
000000000000000000000000000000000000000000000000000000000000000000000100000000000000001
110000000000000000001000000000100000000000000000001010000000000000100000000000000000100
000000000000000000000000000101000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000100000000000000000000000000000000000000
000000000000000000000000010000000000000000000000000000000000000000000000101000110000000
000000000000000000000000000000000000000000000000010100011000000000000000000000000000000
000000000000000000000000000000000000000000001000000000000000000000000000000000000000000
000110001100100010000100100111111110010000111000110000001000010010000001010010010001000
110101110100001001001111111110111111000000010000000000000000000000001011101000000010000
000000000000000000000101110000000000100000000000000000000000000001111000001011011111100
000000000000000000000000000000101110000011110000000000000000000000000000001111000001010
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000001100000000000000010000000001111000000000000100000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000001100000
000000000000000000000001000000000000000000010000000001111000000000000100000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000010000000000000000000000000001000000000000000000010000000010100011000000000
110000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000011010000000000000000000000000001000000000000000000010000000
010100011000000000110000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000101000000000000000000000000000001000000000
000000000010000000010101000000000000110000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000101111101011111011
000100111001101110011010111110111001101110100011000010111001001110100000000000101111101
100101011001000110000101110100011000010000000001011111011001010110111001100100000000000
000000000101110011100110111100101101101011101000110000101100010000000000010111001110011
011101000111001001110100011000010110001000000000001011100111001101101000011100110111010
001110010011101000110000101100010000000000010111001110100011001010111100001110100000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000011011000000000000000000000000000000010000000
000000000000000000000011000000000000000000000000000000000000000000000000000000000011110
000000000001000000000000000000000000000000000000000000000001111000000000000000000000000
000000000000000000000000000000000000010101100000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000001000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000010000000000000000000000000000001000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000101010000000000000000000000000000000000
000000000000000000000000010010000000000000000000000000000000000000000000000000000000000
000000001100000000000000000000000000000010000000000000000000000000000010000000000000000
000000000000000000000000000000000000000000000011000000000000000000000000000000000000000
000000000000000000000000100100000000000000000000000000000011000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000011100000000001000000000000000000000000000000000
000000000000000000110010000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000100000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000010001000000000000000000000000000000110000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000001010001000000010000000000000000000000000000000000000000000
000000010000100000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000001000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

There are 10 kinds of people in the world, and I hate them both.

Here is that same data rendered as hexadecimal numbers:

7f454c4602010100000000000000000002003e0001000000780040000000000040000000000000007801000
000000000000000004000380001004000050004000100000005000000000000000000000000004000000000
000000400000000000a300000000000000a3000000000000000000200000000000c64424ff21c604240a488
d7424ffbf01000000ba02000000b8010000000f05bf00000000b83c0000000f050000000000000000000000
000000000000000000000000000000000000000000000300010078004000000000000000000000000000060
0000010000100780040000000000000000000000000000100000010000100a3006000000000000000000000
0000000d00000010000100a30060000000000000000000000000001400000010000100a8006000000000000
000000000000000005f5f6273735f7374617274005f6564617461005f656e6400002e73796d746162002e73
7472746162002e7368737472746162002e74657874000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000001b000000010000000600000000000000780040000000000078000000000000002b00000000000
000000000000000000001000000000000000000000000000000010000000200000000000000000000000000
000000000000a80000000000000090000000000000000300000002000000080000000000000018000000000
000000900000003000000000000000000000000000000000000003801000000000000190000000000000000
000000000000000100000000000000000000000000000011000000030000000000000000000000000000000
000000051010000000000002100000000000000000000000000000001000000000000000000000000000000

Dogs are notably unable to deal effectively with hexadecimal numbers.

Only a small section of the data in that executable is actually machine code (highlighted above in red). There are ten machine-code instructions in this program, and they are precisely as follows:

c6 44 24 ff 21
c6 04 24 0a
48 8d 74 24 ff
bf 01 00 00 00
ba 02 00 00 00
b8 01 00 00 00
0f 05
bf 00 00 00 00
b8 3c 00 00 00
0f 05

Remember, this is a program that simply prints an exclamation point to the screen. Nothing more, nothing less.

Here are those same ten instructions in binary, i.e. the actual machine code understood by the CPU:

1100011001000100001001001111111100100001
11000110000001000010010000001010
0100100010001101011101000010010011111111
1011111100000001000000000000000000000000
1011101000000010000000000000000000000000
1011100000000001000000000000000000000000
0000111100000101
1011111100000000000000000000000000000000
1011100000111100000000000000000000000000
0000111100000101

Confused puppy

This is weird, scary-looking stuff. And it's supposed to be!

Remember, machine code is not meant to be written nor read by humans. But it is the only kind of program computers can run, so we use tools to convert other—more friendly—languages to machine code.

Assembly Language

GNU logo (we will use the GNU assembler!)

So, how did that machine code above come to be? Well, I didn't write it by hand, but I did generate it from another language. In fact, it was generated from a program I wrote in assembly language.

Assembly language is, in some ways, not a huge improvement over machine code. Each statement in assembly language is equivalent to one instruction in machine code, which means that no single statement can accomplish very much. For reference, even very simple Android or iOS games probably consist of millions of machine-code/assembly-language instructions!

However, assembly language is at least somewhat readable, and exists as regular text. Here is the x86-64 GNU/Linux assembly-language version of the same program:

.text
.globl _start
_start:
  movb  $'!', -1(%rsp)
  movb  $'\n', (%rsp)
  leaq  -1(%rsp), %rsi
  movl  $1, %edi
  movl  $2, %edx
  movl  $1, %eax
  syscall
  movl  $0, %edi
  movl  $60, %eax
  syscall

Technically, this program consists of 10 statements (the first three lines are not statements that become machine code). Each of these 10 statements corresponds to exactly one of the machine-code instructions mentioned above.

What does it mean? Well, that's kind of complicated. Writing assembly language requires detailed knowledge of the type of system you are programming for, i.e. both the hardware and the operating system. The two ”syscall” instructions call on the operating system to perform two operations, namely:

  1. Print an exclamation point on its own line, to standard output (system call #1).
  2. Exit from the program (system call #60).

The other 8 instructions involve appropriate setup for those system calls.

Assembly language is not easy to read, nor to write. These days, very little code is written in assembly language, though it is occasionally an important part of programming operating systems and games, and interacting with hardware devices.

You can assemble and run this program yourself, if you'd like!:

  1. Save the source code (i.e. the assembly language instructions) to a file named bang.s.
  2. Assemble the code into machine code in a file named bang.o:
    as -o bang.o bang.s
  3. Link the machine code into an executable named asm:
    ld -o asm bang.o
  4. Run the program by specifying the path to the executable:
    ./asm

You can also verify the contents of the resulting executable:

  • In binary:
    xxd -b asm
  • In hexadecimal:
    xxd -plain asm
  • By disassembling the machine code:
    objdump -D asm

The latter will show you the exact correspondence between the assembly-language instructions and the machine-code instructions:

asm:     file format elf64-x86-64

Disassembly of section .text:

0000000000400078 <_start>:
  400078:	c6 44 24 ff 21       	movb   $0x21,-0x1(%rsp)
  40007d:	c6 04 24 0a          	movb   $0xa,(%rsp)
  400081:	48 8d 74 24 ff       	lea    -0x1(%rsp),%rsi
  400086:	bf 01 00 00 00       	mov    $0x1,%edi
  40008b:	ba 02 00 00 00       	mov    $0x2,%edx
  400090:	b8 01 00 00 00       	mov    $0x1,%eax
  400095:	0f 05                	syscall 
  400097:	bf 00 00 00 00       	mov    $0x0,%edi
  40009c:	b8 3c 00 00 00       	mov    $0x3c,%eax
  4000a1:	0f 05                	syscall 

C

K&R C book cover

Machine code and assembly language is generally regarded as "low-level" programming, since there is little abstraction between program statements and the actual system on which they will execute.

Most programming languages in wide use today are, conversely, "high-level" languages which often require little to no knowledge of the specific kind of system that will execute the program.

Probably the most famous and long-lived high-level language is C, which remains (along with Java) generally the most widely used programming language.

Here is the program written in C:

#include <stdio.h>
 
int main() {
  puts("!");
}

This C version consists of, essentially, one statement:

puts("!");

As you might imagine, this prints a specified string to standard output. Maybe it's not self-explanatory, but at least it's much friendlier than assembly language!

You can compile and run this C program yourself, if you'd like!:

  1. Save the source code (i.e. the C-language instructions) to a file named bang.c.
  2. Compile and link the code into an executable named c:
    gcc -o c bang.c
  3. Run the program by specifying the path to the executable:
    ./c

You can also verify the contents of the resulting executable, but you will now notice much more code, since this version of the program incorporates and uses code from the C-language standard library:

objdump -d c

C++

C++ logo, kinda

Past C, we start getting into so-called “object-oriented” languages that further abstract the idea of data in programs. C++ is one of the more iconic object-oriented languages. Here is a C++ version of the program:

#include <iostream>
 
int main() {
  std::cout << '!' << std::endl;
}

This C++ version, like its C counterpart, also essentially consists of statement. It's fairly readable, though you need some background to know what things like int main() and std::cout really are.

You can compile and run this C++ program yourself, if you'd like!:

  1. Save the source code (i.e. the C++-language statements) to a file named bang.cpp.
  2. Compile and link the code into an executable named cpp:
    g++ -o cpp bang.cpp
  3. Run the program by specifying the path to the executable:
    ./cpp

You can also verify the contents of the resulting executable, but you will now notice much more code, since this version of the program incorporates and uses code from the C++-language standard library:

objdump -d cpp

Java

Java logo

Java popularized the idea of running programs inside a virtual machine, i.e. a piece of software pretending to be hardware and performing real-time translation of virtual-machine instructions into actual machine code.

Here is a Java version of this program:

public class Bang {
  public static void main(String[] args) {
    System.out.println("!");
  }
}

Note that this doesn't look incredibly different from the C and C++ versions, other than:

  1. It's a little longer.
  2. There is apparently a “class” involved.

These two properties apply to Java in general. :-)

You can compile and run this Java program yourself, if you'd like!:

  1. Save the source code (i.e. the Java-language statements) to a file named Bang.java.
  2. Compile the code into Java Virtual Machine (JVM)bytecode in a file named Bang.class:
    javac Bang.java
  3. Run the program by specifying which class' main method the JVM should invoke:
    java Bang

You can also verify the contents of the resulting bytecode, but you will notice that it looks nothing like the assembly/machine code we have seen thus far, since it is instructions for a Java virtual machine:

javap -c Bang.class

Rust

Rust logo

There has been much interest lately in languages that have the performance/speed of C++, but lack the many safety/security issues of C++. Rust is one such high-profile language.

Here is a Rust version of this program:

fn main() {
  println!("!");
}

Again, not entirely dissimilar to C/C++/Java. A little more terse, I would say.

You can compile and run this Rust program yourself, if you'd like!:

  1. Save the source code (i.e. the Rust-language statements) to a file named bang.rs.
  2. Compile and link the code into an executable named rust:
    rustc -o rust bang.rs
  3. Run the program by specifying the path to the executable:
    ./rust

You can also verify the contents of the resulting executable, but you will now notice much more code, since this version of the program incorporates and uses code from the Rust-language standard library, along with a lot of boilerplate code that becomes part of every Rust-based executable:

objdump -d rust

Haskell

Haskell logo

Functional programming languages offer many benefits in the age of parallel computing, i.e. where programs can run on multi-core CPU's or on entire “clouds” of devices. One such language is called Haskell.

Here is a Haskell version of the program:

main = putStrLn "!"

A more complicated Haskell program would illustrate how far we now venture from C-style code, but for now it still looks fairly similar.

You can compile and run this Haskell program yourself, if you'd like!:

  1. Save the source code (i.e. the Haskell-language statements) to a file named bang.hs.
  2. Compile and link the code into an executable named hs:
    ghc -o hs bang.hs
  3. Run the program by specifying the path to the executable:
    ./hs

You can also verify the contents of the resulting executable, but you will now notice much more code, since this version of the program incorporates and uses code from the Haskell-language standard library, along with a lot of boilerplate code that by default becomes part of every Haskell-based executable:

objdump -d hs

JavaScript

JavaScript logo, kinda

So far, the languages we have looked at go through a permanent process of translation before they can be executed. Assembly language needs to be assembled into machine code, C/C++/Rust/Haskell need to be compiled into machine code, and Java must be compiled into JVM bytecode—all before a program can be run.

An alternative paradigm is the interpreted language. Interpreted programming languages remain as source code right up to the point of execution, when a program called an interpreter translates that source code into machine code for a given system.

Probably the most widely used interpreted language is JavaScript. Nearly every web page you visit has some JavaScript running on behalf of it. Web browsers come with JavaScript interpreters included, which have become the most crucial part of browser performance.

Here is a JavaScript version the program:

console.log('!');

Note the simplicity! Interpreted languages often feature simpler syntax and the ability to write code that runs without surrounding context, i.e. as a “scripting” language“.

You can run this JavaScript program yourself, if you'd like!:

  1. Run it in the browser:
    1. Press Ctrl-Shift-I (or Command-Option-I on a Mac) to open your browser's developer tools.
    2. Make sure the Console tab is active.
    3. Paste the code into the console prompt.
  2. Run it on the server:
    1. Save the source code to a file named bang.js.
    2. Run the program by launching the Node.js interpreter and telling it to run your file:
      js bang.js

Python

Python logo

The final interpreted language I'd like to discuss is called Python. It is known for the simplicity and elegance of its source code, and is widely used in scientific research.

Here is a Python version of the program:

print('!')

Again, note the lack of surroundings that is often a hallmark of interpreted languages. In this case, we will run this Python program as a script.

You can run this Python program yourself, if you'd like!:

  1. Save the source code to a file named bang.py.
  2. Run the program by launching the Python interpreter and telling it to run your file:
    python3 bang.py

Conclusion

Programming languages are just tools, and (except for machine language) they are merely text that needs to be translated by some other tool into the actual machine code that a CPU can execute.

If you're learning programming, I think you should keep these facts in mind:

  1. There is no perfect or best programming language.
  2. It's a good idea to focus on one language for a while, and get really good at it. It's much easier to pick up other languages later on.
 
projects/what_is_a_computer_program_really.txt · Last modified: 2018-12-23 00:54 by jbergamini@jeff.cis.cabrillo.edu · []
Recent changes RSS feed Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki