Stupid Question 148: Why the love for semicolons in so many programming languages?

[To celebrate my first year of programming I will ask a ‘stupid’ question daily on my blog for a year, to make sure I learn at least 365 new things during my second year as a developer]

Why the love for semicolons in so many programming languages?

I know somebody is tempted to slap me right now, so I’ll make sure to mention that there is a great deal of semicolon hate as well. Nonetheless, this little symbol seems to be rather important. But why a semicolon (and not something else), and why anything at all?
I was hoping for a quick google/bing and a quick answer. But nope. For a tiny symbol it sure seems to be complicated.

Let’s answer the easy part first: why in so many languages?
Languages are based on other languages, and they often keep their ancestors' way of delimiting blocks and their use of statement terminators. Who the grandpa of all this was I really don’t know.

Why a statement terminator?
Well, you need to break instructions down into bits and pieces, and to be able to do that you need to signal where each statement ends. Most programming languages have conventions for statement separation, termination and line continuation.
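To make that a bit more concrete, here is a minimal sketch in Python of what a terminator buys you. The function name `split_statements` and the sample input are made up for illustration, and the splitting is deliberately naive (it would trip over a semicolon inside a string literal), so treat it as a toy, not as how a real parser works.

```python
def split_statements(source: str, terminator: str = ";") -> list[str]:
    """Split source text into statements at each terminator (hypothetical helper)."""
    pieces = source.split(terminator)
    # Collapse runs of whitespace, including newlines, so that the way the
    # programmer laid out the lines does not affect where statements end.
    return [" ".join(piece.split()) for piece in pieces if piece.strip()]


# Free-form layout: a statement may span lines, the semicolon ends it.
free_form = "x = 1; y =\n    x + 2;\nprint(y);"
print(split_statements(free_form))
# ['x = 1', 'y = x + 2', 'print(y)']

# Line-oriented layout: the end of the line is the terminator.
line_oriented = "x = 1\ny = x + 2\n"
print(split_statements(line_oriented, terminator="\n"))
# ['x = 1', 'y = x + 2']
```

With an explicit terminator the second statement can wrap onto a new line without any special line-continuation rule. With newline as the terminator, it could not.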

Why a semicolon?
Some use carriage return, some use brackets and some use the semicolon (and so on). Why exactly the semicolon I actually don’t know. There are theories about its grammatical use in the English language, about where it sits on a keyboard, about its visibility in text, and about the fact that it doesn’t conflict with other symbols already in use (so it was the left-over symbol?? Tragic!). Maybe a combination of them all?

From Wikipedia:

While terminal marks (i.e., full stops, exclamation marks, and question marks) mark the end of a sentence, the comma, semicolon and colon are normally sentence internal, making them secondary boundary marks. The semicolon falls between terminal marks and the comma; its strength is equal to that of the colon

Nonetheless the semicolon, loved and hated, will probably not disappear anytime soon. I don’t mind it, but I’m used to using it and when programming in languages that don’t require it I’m left partially confused. It gives me comfort, but I’m not sure if anything more :)

Comments

Robert MacLean

2/13/2013 2:17:20 AM

There is a historic perspective to this as well. Take C versus Basic: why does C use semicolons and Basic doesn't? Because Basic is an older language and the common keyboards at that time didn't have an easily findable semicolon key.

This moves to current trends in design as well, like significant white space. Why wasn't that always used? Because the hardware of the day in rendering text could not do a good enough job. Now we have much better rendering, fonts and tooling which make significant white space a viable design choice where it wasn't in the past.

Iris Classon

2/13/2013 2:19:29 AM

Reply to: Robert MacLean

That is a very good point! Apart from being used to using semicolons, I actually like using significant whitespace.

Peter Morlion (@petermorlion)

2/13/2013 2:29:47 AM

Funny, I wondered about that a while ago and found this on StackOverflow: http://stackoverflow.com/questions/1308591/in-which-language-did-semicolon-first-appear-as-a-terminator
It's more of a where than a why however.

James Curran

2/13/2013 9:40:04 AM

In early days, BASIC was completely line-oriented (one statement to a line, each line numbered).  You couldn't even have an if-block -- The only thing an if could do was a goto.  Hence, end-of-line indicated end-of-statement.

COBOL was (is) based on English and English grammar, with each statement formatted like a sentence, hence it uses a period (aka full stop) to indicate end-of-statement.

Fortran, like Basic, is line-oriented, so end-of-line = end-of-statement. Fortran also has some weird ideas about whitespace insignificance. In a classic bug, it's noted that "DO 100  I = 1,10" is a loop statement (roughly equal to FOR I = 1 to 10 in Basic), while "DO 100 I = 1. 10" (that's the same line with a typo of a period instead of a comma) is actually an assignment statement (essentially creating a floating point variable named DO100I and assigning 1.10 to it).
Formatting was quite strict. The only thing allowed in column 1 was either a "C" (indicating the line was a comment) or a blank. Columns 2-7 were for jump labels. Columns 8-72 were for the program statements. Columns 73-80 were for a sequence number (in case you dropped the punch card deck). Nothing beyond column 80, because that's how big a punch card was.
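
The DO typo described above can be sketched in a few lines of Python. This is only an illustration of the blanks-are-ignored rule, not a real Fortran parser: the function name `classify_fortran_statement` and both regular expressions are assumptions made up for this example, and they recognise nothing beyond these two statement shapes.

```python
import re


def classify_fortran_statement(line: str) -> str:
    """Toy classifier for the DO-loop typo (hypothetical helper)."""
    # Classic fixed-form Fortran ignored blanks inside a statement,
    # so strip them all out before looking at the text.
    squeezed = line.replace(" ", "").upper()

    # DO <label> <variable> = <start> , <end>  ... the comma makes it a loop.
    if re.fullmatch(r"DO\d+[A-Z][A-Z0-9]*=[^,=]+,[^=]+", squeezed):
        return "loop"

    # Otherwise <name> = <expression> is a plain assignment.
    match = re.fullmatch(r"([A-Z][A-Z0-9]*)=(.+)", squeezed)
    if match:
        return f"assignment: {match.group(1)} = {match.group(2)}"
    return "unknown"


print(classify_fortran_statement("DO 100  I = 1,10"))
# loop
print(classify_fortran_statement("DO 100 I = 1. 10"))
# assignment: DO100I = 1.10
```

Once the blanks are gone, the only thing separating a loop from an assignment is that single comma.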

When C came around (or actually its precursor), the idea was to rebel against the fixed formatting, and allow programmers to divide lines up however they liked. Parsing that was much easier with an explicit end-of-statement character. Since the period already had a purpose (member selection in structs), they had to look among the other characters on a standard keyboard.

Modern parsing techniques can only partly eliminate the need for end-of-statement characters. Javascript allows an "implied semi-colon" at the end-of-statement, but it's not perfect, and a missing semi-colon can lead to incorrect interpretation of the code.

Hasen Ahmad

2/13/2013 2:59:35 PM

I think the best reason, besides historical tradition, is that the semicolon key doesn't require that you press shift, doesn't conflict with any other kind of syntax, and allows for writing multiple instructions on the same line.

Peter Wone

2/13/2013 10:34:02 PM

If I recall correctly, VaxBasic DID use semicolon line terminators. But that was a long, long time ago and I can't swear to it.

Andy Dent

2/14/2013 12:57:10 AM

The semicolon is easier to see than the full stop, which was used as a terminator in Smalltalk. Remember also that a lot of early programming was done on teletypewriters - printers with keyboards - and the odd dot could vanish as the ribbon on the dot matrix was worn. That device explains the behaviour of some of the early Unix text editors.

Forogar

2/18/2013 10:05:37 AM

Go back further than many of my younger colleagues have been alive, to the DOS days: in some common command shells, the semi-colon was used as a command separator when you wanted to enter multiple commands on one line, so the concept of the semi-colon as the end of a command may have had help from there.

Paul Foster

3/4/2014 4:35:10 PM

Sorry, James, right idea, wrong columns... FORTRAN: column 1 - C = comment line; columns 2-5** = line number (usually used only when required, but lovers of BASIC could number to their heart's content); column 6 nonblank = continuation of previous line (often just a 'C', some used single digits to count them); columns 7-72 text; columns 73-80 sequence numbers, optional, but useful when your deck had been dropped so that you could use a card sorter to put the deck back in order. (** = this varied according to system)
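
As a rough sketch, the column layout described above could be pulled apart like this in Python. The `Card` class and `parse_card` function are hypothetical names invented for this example. The column boundaries follow the description in this comment (which, as noted, varied by system), and the continuation test is simplified to "any non-blank character in column 6".

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One punch card split into its fixed fields (illustrative only)."""
    is_comment: bool
    label: str
    is_continuation: bool
    text: str
    sequence: str


def parse_card(line: str) -> Card:
    # A punch card holds exactly 80 columns: pad short lines, drop any excess.
    card = line.ljust(80)[:80]
    return Card(
        is_comment=card[0] == "C",        # column 1
        label=card[1:5].strip(),          # columns 2-5
        is_continuation=card[5] != " ",   # column 6 (simplified test)
        text=card[6:72].rstrip(),         # columns 7-72
        sequence=card[72:80].strip(),     # columns 73-80
    )


# A statement labelled 100, with a sequence number punched at the far right.
line = " 100  X = 1.0".ljust(72) + "00000010"
card = parse_card(line)
print(card.label, "|", card.text, "|", card.sequence)
# 100 | X = 1.0 | 00000010
```

Because every field lives at a fixed position, no terminator character is needed at all. The card itself is the statement boundary.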

Both FORTRAN and ALGOL were developed in the late 1950s. Note that the names indicated different orientations - FORTRAN is FORmula based; ALGOL is ALGOrithm based. For whatever reason - I'm sure Backus took it to his grave - the semicolon was selected in ALGOL to separate statements. I'm sure he also took to his grave the reason behind the difference between the fixed-format FORTRAN and the free-format ALGOL. As a formula-based language, there would be only one formula per line. As an algorithm-based language, why not have more than one statement per line? When I learned ALGOL60 back in the 1970s, none of my professors -- although they could explain much else -- could explain the romance of the semicolon :(

Last modified on 2013-02-12