Apr 29, 2009

It is a very good idea to learn several programming languages; it makes you a better programmer. But suppose you want to devote all your energies to becoming a proficient and productive programmer, and you want to learn one powerful, freely available, high-level, general-purpose (not domain-specific) language. What would you choose?

The target audience for this exercise is someone with a couple of years of programming experience.

Here are my criteria for selecting a general-purpose (non-domain-specific) language to learn:

1. Programmer productivity: It should provide a high level of abstraction so that programmer productivity is high. A fast-running application written in C that takes 6 months to write is, in most cases, not as useful as one that can be completed in 1 month, because of both programmer cost and time to market.
2. Speed: It should be fast (it should approach C in speed). It should preferably have a highly optimizing compiler that generates architecture-specific binaries, since a JIT is unlikely to beat a well-optimized native binary. Second preference is a compiler that generates bytecode for a fast, JIT-capable virtual machine. The only virtual machine that fits the bill is the JVM.
3. Succinct: The language should not be verbose. This is very important. Brevity is one reason why Python and Ruby are popular.
4. Support for Scripting: You shouldn’t have to learn one language for writing scripts and another for applications. A single language allows rapid development in script mode, and the script can then be compiled into an application if required. This, together with criteria #1 and #3, implies that you should be able to write a 10-line script that does something meaningful.
5. It should take advantage of multi-core processors without requiring additional effort from the programmer. This is important: even entry-level desktops and laptops have 2 CPU cores on a chip, and the number of cores is only going to increase.
6. It should have multi-threading and/or message passing support.
7. It should be a mature, time-tested language with active development, a solid user base and lots of applications.
8. It shouldn’t be difficult to learn for people with a couple of years of programming experience. There should be good documentation, books, tutorials, an active and helpful community (IRC), developer tools and an IDE.
9. Platform agnostic: It should not favor or give an advantage to one platform. .NET comes to mind, as it gives a significant advantage to the Windows platform.
10. Code readability and maintainability: It should be relatively easy for authors and others to maintain existing code.
11. Open source is a fine model, but authors who don’t want to release their creations as open source should be able to keep them closed. After all, bills are not free. This means that reverse-engineering the code should be difficult. You get this as a by-product of #2: it is very hard to reverse-engineer source code from an optimized binary, and with virtual machine bytecode, reverse engineering can be made difficult by obfuscation.
12. It should have a test framework that can generate and run tests.
13. It should be able to call C libraries easily (a foreign function interface), for performance and code reuse.

All criteria except #10 are objective and measurable. #10 is to some extent subjective, so my bias will be reflected in the choice of language. Some people hate Python because it uses indentation to define code blocks. Indentation keeps code compact, but it makes cutting and pasting code error-prone. Others love it. Some people don’t like languages such as Perl and Ruby, where sigils ($, @, @@) are used liberally. People get personal when you belittle or disparage a computer language, and rightly so. If you have invested 10 years in Perl or Lisp, you have a vested interest in defending it, and it is hard to be objective.

Let’s begin with Common Lisp, Scheme, Fortran, Smalltalk, C, C++, Objective-C, Ada, Java, Javascript, C#, D, Prolog, Perl, PHP, Python, Ruby, Groovy, Clojure, Lua, Forth, Factor, Erlang, OCaml, F#, Clean, Haskell and Scala, and start the elimination process.

I will eliminate Prolog as it is not a general purpose language.
I will eliminate Lua for the same reason, though it has found a niche in the gaming community.
I will also eliminate Objective-C, as it is mainly used for Mac OS X and iPhone software, though of late there has been some resurgence of interest in it.
Criterion #1 eliminates C and Fortran. You probably already know C anyway.
Criterion #7 eliminates D.

Criterion #2 eliminates Smalltalk. There has been a revival of Smalltalk with Squeak/Gemstone/Seaside, and you get very powerful features, such as continuations, that make it easy to write web apps. But speed is a major concern, and Smalltalk also consumes a huge amount of memory. Also, except for GNU Smalltalk, you have to work within an image, which is highly limiting. Deployment is also an issue, because you have to ship the image. #11 is also a concern.

Criterion #10 eliminates Perl. Other reasons are #11, #5 and #2. Though Perl is a powerful language, many people consider it line noise. Code written in Perl may look alien to its own author one week later 8-). Perl was king several years ago, when it had no competition. Ruby and Python were around at that time, but they were extremely sluggish compared to Perl. Perl also has an amazing number of libraries: you can search the online CPAN archive and easily install Perl modules. However, Perl has a steeper learning curve than other scripting languages. Object-oriented functionality was bolted onto Perl in a kludgy, ugly way (“bless($self, $class_name)”) that only a mother could love and bless. Perl 6 (forever-ware) has been in the making for God knows how many years. When it arrives, it is supposed to be able to run other dynamic languages as well. I think it is DBA (dead before arrival). Damian Conway (a Perl author) says that the reason (more like an excuse) for the delay is that it is an all-volunteer effort. The number of volunteers is probably dwindling as they jump ship and move to Ruby or Python. Randal Schwartz, who makes money off Perl, is hedging his bets: he is touting Smalltalk/Seaside on a site separate from his Stonehenge site. There is a huge base of ‘legacy’ Perl software, and Perl will be here for years to come.

I will also eliminate PHP, which has been used mainly in web apps. My other reasons for removing PHP are the same as for Perl (#10, #11, #2, #5). Note that #10 is a subjective criterion and reflects my bias. A lot of object-oriented features were added in PHP 5. Without the fanfare surrounding languages such as Python and Ruby, PHP has quietly found a niche in web-based applications. You are reading this article thanks to a PHP program: WordPress. Whether you want a blog, wiki, CMS, forum or picture gallery, there are several popular PHP apps to choose from. PHP is not going away any time soon.

I will also eliminate Forth and Factor. Factor has a lot of nice features. However, stack-based languages have not appealed to the vast majority of programmers. If you enjoyed using RPN calculators, you will love Factor.

I will also eliminate Ada due to #3, #2, #7 and even #4, #5, #6.

Javascript: I rule out Javascript partly because its initial domain was client-side code in browsers (it still is, to a large extent). Mainly, though, I rule it out for the same reason we don’t program in assembly any more: highly optimizing compilers are available. In the case of Javascript, the highly optimizing compiler is GWT (Google Web Toolkit), which can generate better code than hand-crafted Javascript. The primary domain of Javascript is client-side code in browsers, and there are several browsers (Firefox, Internet Explorer, Safari, Google Chrome, Opera), each with its own quirks. Ensuring that your Javascript code works on all of them is an uphill task. More browsers will be added to the list, for example the Android browser. Google will keep optimizing its GWT compiler to generate even better Javascript (targeted to each browser). So you don’t need to learn much Javascript beyond what it takes to write the occasional JSNI (JavaScript Native Interface) method.

Criteria #1 and #3 rule out a heavyweight — Java.
Criterion #1 says that the language should provide a high level of abstraction. One of the must-haves for a high level of abstraction is first-class functions. Functions should enjoy the same privileges as other values: you can pass functions as parameters to functions, return functions from functions, and so on. In Java, functions are not first-class citizens.
Java also falls short on criterion #3: in the verbosity department, it gives COBOL a run for its money.
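To make criterion #1 concrete, here is a minimal sketch of first-class functions in Scala, the language this comparison eventually picks. One function is passed as a parameter, another is returned from a function, and functions are stored in a list. The names `applyTwice`, `makeAdder` and `pipeline` are illustrative only and do not come from any library.

```scala
object FirstClassFunctions {
  // Takes a function f as a parameter and applies it twice to x
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Returns a new function that adds n to its argument (a closure over n)
  def makeAdder(n: Int): Int => Int = (x: Int) => x + n

  // Functions can be stored in values...
  val square: Int => Int = x => x * x

  // ...and in collections, just like any other value
  val pipeline: List[Int => Int] = List(square, makeAdder(1))
}
```

For example, `applyTwice(square, 3)` computes `square(square(3))`, which is 81. Folding 4 through `pipeline` gives 17: 4 squared is 16, plus 1.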

I will rule out another heavyweight: C++. The design goal of backward compatibility with C was sound at the time C++ was developed, but it constrained the language designer. C++ fails my only subjective criterion, #10. I would argue that it also fails criterion #8, as there are so many gotchas. There are many variations of C++. Freeing memory manually is error-prone. It is not for nothing that you have books like “Exceptional C++”, “More Exceptional C++”, “More and More Exceptional C++”. Garbage collection techniques have come a long way from the “stop-everything-and-run-gc” days. The slight performance penalty is well worth paying in return for freedom from memory-management crashes. In addition, you probably already know C++, and the objective is to learn a new language.

I will rule out C#. C# never really took off and is hardly the Java killer it was touted to be. C# and F# both come from Microsoft and are both programming languages for .NET. Now if the same company produces two products named C# and F#, which one will you pick? F#, as it’s got to be better 8-) Seriously speaking, you can do with F# everything you can do with C#, and more. F# (the F stands for functional) lets you write both functional and imperative code. It is heavily influenced by OCaml (Objective Caml). F# allows scripting, whereas C# doesn’t.

Common Lisp and Scheme excel in many of the criteria, but I am eliminating them primarily because of syntax (#10). I enjoy writing and reading (primarily emacs) lisp code. However, Lisp syntax (code as data, a jungle of parentheses) has not found mainstream acceptance. Advanced apps would use the powerful macro facility liberally, and many people would find such code difficult to read and understand. Using macros correctly would also be a challenge. Support for multi-core processors (#5) is lacking, partly because it is quite difficult to provide with mutable objects. Native thread support (#6) may not be available on all platforms.

Clojure, from Rich Hickey, is an interesting dynamic, functional language with Lisp syntax that compiles directly to JVM bytecode. It has built-in support for concurrency and transactions (software transactional memory). So even though it shares its Lisp syntax with Common Lisp and Scheme, I am not eliminating it at this point.

We are left with Python, Ruby, Groovy, Clojure, Erlang, OCaml, F#, Clean, Haskell and Scala.

Static Typing for Lazy People (Type Inference): People in the dynamic typing camp (Ruby, Python) tout one advantage above almost all others: you can be more ‘productive’ because you don’t waste time specifying types. Any debate on static vs. dynamic typing resembles the one between vi and emacs, a religious war with people in both camps arguing passionately. I can’t think of any advantage of dynamic typing except that it lets the programmer be lazy and skip specifying types. This laziness has a hidden cost: you have refused the help of the compiler, so a lot more time must be spent on testing. The productivity gains are more than offset by the additional testing that is required. And if you don’t write good tests, type errors will surface at run time, where there is not much you can do about them. Why hobble the compiler just to avoid specifying types? I can never understand why you would want to catch a bug at run time when you can catch it at compile time. Here is the kicker: you can get all the benefits of static typing without bothering to enter types. You can have your cake and eat it too. The name for this magic is type inference. Put simply, the compiler infers types from the operations performed on values. Take division: a = c/d. You can divide two numbers, but you cannot divide two strings or two enums (unless you have defined division for them). So c and d must be numbers. Similarly, if you call ‘head xyz’, head applies to a list, so xyz must be a list. This is a bit simplistic, but you get the idea. The only excuse for not choosing a statically typed language, laziness, therefore doesn’t apply: you can be lazy and still have static types. With static typing, many bugs are found at compile time. IDEs can take advantage of (specified or inferred) types, and refactoring is a lot easier.
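Here is a small sketch of type inference in Scala that mirrors the `a = c/d` and `head xyz` examples above. The object and value names are made up for illustration. Unlike Haskell, Scala still needs the parameter type of `totalLength` to be written out; only its result type is inferred.

```scala
object InferenceDemo {
  // No annotations: the compiler infers Double for c, d and ratio
  val c = 10.0
  val d = 4.0
  val ratio = c / d

  // Inferred as List[String]...
  val words = List("static", "typing", "for", "lazy", "people")
  // ...so words.head is known to be a String, and firstLength an Int
  val firstLength = words.head.length

  // Uncommenting the next line fails at compile time, not at run time,
  // because String has no / method:
  // val bad = "abc" / "def"

  // The parameter type must be given, but the result type (Int) is inferred
  def totalLength(xs: List[String]) = xs.map(_.length).sum
}
```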
So, though Python and Ruby pass criterion #1, they don’t pass it by a wide margin once you include the extra time required for testing. They also pass criteria #3, #4, #7, #8, #9, #10 and #13. But they fail criterion #2 by a wide margin: Ruby and Python are slow. There are implementations of Python and Ruby on the JVM (Jython and JRuby), but they are not ready for prime time. Ruby and Python are used a lot in web frameworks, the most popular being Rails and Django, but these are not known for speed. Thread support is also limited (#6). In the standard interpreters, threads cannot execute code in parallel, so you cannot transparently take advantage of multi-core processors. So I am eliminating Python and Ruby, primarily for #2 and secondarily for #11, #5, #6 and the possibility of run-time errors due to type mismatches. For many use cases speed is not a concern, and for those, Python and Ruby are suitable. But if speed is a concern, it doesn’t make sense (in my opinion) to prototype the app in Python or Ruby first and then rewrite parts (or all) of it in C for speed. Suppose you could prototype in language X as quickly as in Python or Ruby, but with much better performance. Then you would have the best of both worlds.

I will eliminate Groovy. It doesn’t offer much beyond Java with a more succinct syntax. You may as well learn Python or Ruby, which are also dynamic languages.

I will eliminate F# due to #9 (platform agnostic). F# runs on the .NET platform. It may run on Mono (an open-source .NET implementation that runs on Linux), but possibly not as efficiently. We would also depend on Microsoft’s roadmap for F#. You can use OCaml instead, since F# is largely derived from it.

That leaves us with Clojure, Erlang, OCaml, Clean, Haskell and Scala.

Message Passing: It is quite difficult to get multi-threaded programs to work correctly. Locking is a nightmare. It is easy to get it wrong, resulting in deadlocks, starvation, etc. Overly conservative (coarse-grained) locking, on the other hand, leads to performance problems. Message passing is an effective alternative, and it has been used with great success in Erlang.
Ericsson uses Erlang to write massively concurrent programs. Some of these applications have been running non-stop on telephone switches for several years. You can even upgrade a live system (hot code swapping). Erlang is easy to learn and fun to use. Processes are very lightweight, and they communicate by sending messages to each other. Until recently, speed (criterion #2) was a concern, but this has changed with the availability of HiPE (the High-Performance Erlang project). String processing in Erlang is quite inefficient, because strings are stored as lists of (32-bit) integers. While Erlang is quite suitable for certain domains, I wouldn’t call it a general-purpose language. It didn’t come out of its niche until recently, with the advent of multi-core processors and growing interest in functional programming. The lack of third-party libraries is an issue. So, with regret, I eliminate Erlang.
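Here is a minimal sketch of the message-passing idea, written in Scala rather than Erlang. A worker thread owns its state, and other code can reach that state only by sending messages to a mailbox. The mailbox here is a `java.util.concurrent.LinkedBlockingQueue`. This is not Erlang’s API or Scala’s Actors library, and the message types `Add`, `Get` and `Stop` are hypothetical.

```scala
import java.util.concurrent.LinkedBlockingQueue

object MessagePassingDemo {
  // Hypothetical message protocol for this sketch
  sealed trait Msg
  final case class Add(n: Int) extends Msg
  final case class Get(replyTo: LinkedBlockingQueue[Int]) extends Msg
  case object Stop extends Msg

  // Starts a worker thread that owns `total`; it returns only the mailbox
  def startCounter(): LinkedBlockingQueue[Msg] = {
    val mailbox = new LinkedBlockingQueue[Msg]()
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        var total = 0     // touched only by this thread, so no locks in our code
        var running = true
        while (running) {
          mailbox.take() match { // blocks until a message arrives
            case Add(n)       => total += n
            case Get(replyTo) => replyTo.put(total) // reply by message, too
            case Stop         => running = false
          }
        }
      }
    })
    worker.setDaemon(true)
    worker.start()
    mailbox
  }

  // Sends 1..n as Add messages, asks for the total and waits for the reply
  def sumViaMessages(n: Int): Int = {
    val counter = startCounter()
    (1 to n).foreach(i => counter.put(Add(i)))
    val reply = new LinkedBlockingQueue[Int]()
    counter.put(Get(reply))
    val result = reply.take()
    counter.put(Stop)
    result
  }
}
```

The queue delivers messages in the order they were sent, so the worker handles the `Get` request after all the `Add` messages. As a result, `sumViaMessages(100)` returns 5050, and the code you write needs no locks.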

Haskell and Clean have a lot in common, so I am combining them as Haskell/Clean to get the top 4: Clojure, OCaml, Haskell/Clean and Scala. At this point, I have to eliminate Clojure. Though it has a lot going for it, the Lisp syntax will still be an issue for most people (#8, #10).

Now we are left with 3: OCaml, Haskell/Clean and Scala. These 3 languages have several things in common.

  • All use static types
  • All use type inference, so you get the benefits of static typing without having to specify types. Scala’s type inference is not as complete as Haskell’s: the types of function arguments need to be specified.
  • Provide a high level of abstraction, e.g. higher-order functions (Criterion #1)
  • Fast: OCaml approaches C in speed, and Haskell/Clean are not far behind. These can be compiled to architecture-specific binaries. Haskell frequently wins the Great Computer Language Shootout. Scala compiles to Java bytecode and thus takes advantage of the JVM’s fast JIT compiler. (Criterion #2)
  • Succinct (Criterion #3)
  • Scripting support (Criterion #4)
  • They are functional languages; Haskell and Clean are pure functional languages. Rich Hickey, who designed Clojure, says (and I agree) that “Mutable objects are the new spaghetti code”.
  • Because they are functional languages that favor immutable data, which can be shared between threads without locking, it is much easier to take advantage of multiple cores. Multi-core processors are here to stay. I predict that sooner or later you will be programming in a language that supports functional programming.
  • They also satisfy criteria #9, #11 and #12. Regarding #12, Haskell has QuickCheck. Scala has ScalaCheck, which started out as a port of QuickCheck. Both automate test-case generation: you state properties that your code should satisfy, and the tool checks them against many randomly generated inputs.
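ScalaCheck’s own API is not shown here. Instead, here is a minimal hand-rolled sketch of the idea behind QuickCheck and ScalaCheck: state a property, run it against many random inputs, and report a counterexample. `forAllLists` is a hypothetical helper. Unlike the real tools, it does not shrink a failing input down to a minimal counterexample.

```scala
import scala.util.Random

object PropertyCheckSketch {
  // Checks `prop` against `trials` random lists of Ints (length 0-19,
  // values -100..100). Returns the first counterexample, or None if all pass.
  def forAllLists(trials: Int, seed: Long)(prop: List[Int] => Boolean): Option[List[Int]] = {
    val rnd = new Random(seed) // fixed seed, so failures are reproducible
    val inputs = Iterator.fill(trials) {
      List.fill(rnd.nextInt(20))(rnd.nextInt(201) - 100)
    }
    inputs.find(xs => !prop(xs))
  }

  // A true property: reversing a list twice gives back the original list
  def reverseTwice(xs: List[Int]): Boolean = xs.reverse.reverse == xs

  // A false property: "sorting never changes a list" (fails on unsorted input)
  def sortIsIdentity(xs: List[Int]): Boolean = xs.sorted == xs
}
```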

My favorite is Haskell, as its syntax is absolutely beautiful (personal opinion): no unnecessary symbols, punctuation or words. It comes close to writing math that can be compiled into an efficient binary.

Regarding OCaml, it has a rare combination: a very high level of abstraction, with speed that almost matches C++. Even so, I find its syntax ugly (this is personal opinion). We are all used to + as the symbol for addition, - for subtraction and so on, and it is natural to write 5+10 or 5.5+12.5. However, it doesn’t work like that in OCaml. To add integers you use ‘+’, but to add floats you must use ‘+.’, and the other arithmetic operators work the same way. To find the average of two floats, you have to write “(a +. b) /. 2.0;;”. This was a deal-breaker for me. I couldn’t get past it to learn what else the language had to offer.
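For comparison, here is the same average in Scala, where `+` and `/` are overloaded for both `Int` and `Double`. This is a small illustrative sketch, and the method names are made up.

```scala
object Average {
  // The familiar + and / work on Doubles...
  def averageOfDoubles(a: Double, b: Double): Double = (a + b) / 2.0

  // ...and the same symbols work on Ints (integer division truncates)
  def averageOfInts(a: Int, b: Int): Int = (a + b) / 2

  // Mixed arithmetic is allowed: the Int 5 is widened to Double
  val mixed: Double = 5 + 12.5
}
```

Here `averageOfDoubles(5.5, 12.5)` gives 9.0, and `averageOfInts(5, 10)` gives 7 because integer division truncates.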

Scala: The language is the brainchild of Martin Odersky, ACM Fellow. He co-designed Java generics and implemented them. More than 10 years ago, Sun adopted his Java+Generics compiler as the official Java compiler, with generics switched off until they became part of the language in Java 5. If you have been programming in Java, you have been using Martin’s Java compiler. Scala is one of the few languages that support both threads and message-based concurrency. Message-based concurrency follows the actor model (as in Erlang) and is provided by a Scala library rather than by the core language. Scala supports both object-oriented and functional programming and attempts to unify them.
Functions are values, and values are objects; therefore functions are objects. Java has primitive types such as int and float, but Scala is purely object-oriented: numbers, characters, booleans and functions are all objects. (The compiler still maps types such as Int to JVM primitives where it can, for speed.)
A big deal is made of duck typing in languages such as Python and Ruby. In Scala you have “structural typing”, which is duck typing done right. It is not as flexible, but you get compile-time checking. Calls through a structural type are dispatched via reflection, though, so they carry some run-time overhead.
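Here is a minimal sketch of both points, assuming Scala 2.12 or 2.13. Scala 3 instead needs `import scala.reflect.Selectable.reflectiveSelectable` for structural calls on ordinary classes. `Duck`, `Robot` and `makeItQuack` are hypothetical names. `makeItQuack` accepts any object with a `quack(): String` method. Passing an object without that method, such as a `String`, is rejected at compile time rather than failing at run time.

```scala
import scala.language.reflectiveCalls // acknowledges the reflective structural call

object StructuralTypingDemo {
  // Two unrelated hypothetical classes that both happen to have quack()
  class Duck  { def quack(): String = "Quack!" }
  class Robot { def quack(): String = "Beep. Quack." }

  // Structural type: any object with a quack(): String method is accepted.
  // The check happens at compile time; the call itself uses reflection.
  def makeItQuack(q: { def quack(): String }): String = q.quack()

  // Functions are objects: an Int => Int value is an instance of Function1,
  // so it has methods of its own, such as apply and andThen
  val double: Int => Int = x => x * 2
  val addOne: Int => Int = x => x + 1
  val doubleThenAddOne: Int => Int = double.andThen(addOne)
}
```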

Scala is a huge language with lots of features: traits, abstract types, higher-order functions, closures, native threads, concurrency (actors), XML processing, implicits, pattern matching, partial functions and monads. You can start using it right away and slowly learn the more powerful constructs. You can easily write a DSL (domain-specific language) in Scala. There is, however, a drawback to a functional language like Scala (or Clojure) running on the Java Virtual Machine. Recursion is a common paradigm in these languages, but the JVM doesn’t support tail-call optimization. The Scala compiler can turn a method that calls itself in tail position into a loop. Other tail calls, such as mutual recursion between two functions, still use up stack space.
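Here is a sketch of the difference, using the standard `scala.annotation.tailrec` annotation; the function names are illustrative. The compiler turns `sumTo` into a loop, so it can handle a million iterations. The mutually recursive `isEven`/`isOdd` pair uses one stack frame per call and can throw a `StackOverflowError` for large inputs.

```scala
import scala.annotation.tailrec

object TailCalls {
  // Self-recursive call in tail position: scalac compiles this into a loop.
  // @tailrec makes compilation fail if the call is ever not a tail call.
  @tailrec
  def sumTo(n: Long, acc: Long = 0L): Long =
    if (n == 0L) acc else sumTo(n - 1, acc + n)

  // Mutual recursion: these are tail calls too, but neither scalac nor
  // the JVM eliminates them, so each call still consumes a stack frame.
  def isEven(n: Int): Boolean = if (n == 0) true else isOdd(n - 1)
  def isOdd(n: Int): Boolean = if (n == 0) false else isEven(n - 1)
}
```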

Scala, unlike OCaml and Haskell/Clean, satisfies criterion #7: the extensive set of Java libraries can be put to use.

Scala, unlike OCaml and Haskell/Clean, also satisfies criterion #8. OCaml and Haskell/Clean have a much steeper learning curve than Scala. Haskell is a pure functional language, so you have to use something called monads for side effects. Most programmers have been programming in the imperative style, and they will find Scala much easier to learn.
With Scala, you can start with an imperative or object-oriented style of programming (think of it as Java without the verbosity) and migrate slowly to the functional features. Martin Odersky’s attempt (an experiment, as he puts it) to unify functional and object-oriented programming may not pan out. Even so, Scala can still be a successful language.
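As a small illustration of that migration path, here are two versions of the same hypothetical computation: the sum of the squares of the even numbers in a collection. The first is written in Java-like imperative style and the second in functional style.

```scala
object Styles {
  // Imperative style, much as you would write it in Java:
  // a mutable accumulator and an index-based loop
  def sumOfSquaresOfEvensImperative(xs: Array[Int]): Int = {
    var total = 0
    var i = 0
    while (i < xs.length) {
      if (xs(i) % 2 == 0) total += xs(i) * xs(i)
      i += 1
    }
    total
  }

  // Functional style: no mutation, just a pipeline of higher-order functions
  def sumOfSquaresOfEvensFunctional(xs: Seq[Int]): Int =
    xs.filter(_ % 2 == 0).map(x => x * x).sum
}
```

For the numbers 1 to 10, both return 220 (4 + 16 + 36 + 64 + 100).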

Lift is a web framework written in Scala. You can create web apps as easily as with Rails or Django, but they will typically run 4 to 6 times faster, use less CPU and be a lot more scalable.

So, for the reasons mentioned above, I will remove OCaml, Haskell and Clean from the list and declare Scala the winner.

To give a flavor of a Scala program written in imperative style, here is a script I wrote to answer a question posed by Cedric Beust on his blog, Otaku. You don’t need to write a program to answer his question, but you can verify the answer with a simulation. Cedric wrote his version in Ruby; my Scala version is equally concise.

Here is the program, saved as game.scala:

/*
 * Question: Do you want to play this game?
 * You put down some money and roll a die between 1 and 100.
 * Depending on your roll:
 *    1-50: I keep your money.
 *   51-65: You get your money back.
 *   66-75: You get 1.5 times your money.
 *   76-99: You get 2 times your money.
 *     100: You get 3 times your money.
 * Do you want to play?
 */
object game { // object is used to define a singleton class
  // alas, type inference doesn't work for function args,
  // so bet and numPlays must be given explicit types.
  def play(bet: Double, numPlays: Int): Unit = {
    // parameter values can't be changed
    import java.util.Random // imports can be inside a function
    // val is like final in Java or const in C; var can be reassigned
    val r = new Random()
    val moneyStart = numPlays * bet
    var money = moneyStart
    // The types of moneyStart and money need not be specified.
    // However, unlike Python or Ruby, they are inferred as
    // Double at compile time. Similarly, r is inferred as
    // Random and dice as Int.
    for (_ <- 1 to numPlays) {
      val dice = r.nextInt(100) + 1 // roll the die: 1 to 100
      if (dice <= 50)
        money -= bet               // 1-50: lose the stake
      else if (dice >= 66 && dice <= 75)
        money += 0.5 * bet         // 66-75: stake back plus 0.5x
      else if (dice >= 76 && dice <= 99)
        money += bet               // 76-99: stake back plus 1x
      else if (dice == 100)
        money += 2 * bet           // 100: stake back plus 2x
      // 51-65: stake returned, so money is unchanged
    }
    printf("%6.0f$ became %6.0f$ after %6d plays: " +
           "You get %.2f for a dollar\n", moneyStart,
           money, numPlays, money / moneyStart)
  }
}

game.play(1.0, 100000) // Each bet is $1, play 100000 times
game.play(1.5, 5000)
game.play(4.0, 10000)

Run the script (your numbers will differ slightly, since the rolls are random):
scala game.scala
100000$ became 80426$ after 100000 plays: You get 0.80 for a dollar
7500$ became 6134$ after 5000 plays: You get 0.82 for a dollar
40000$ became 31922$ after 10000 plays: You get 0.80 for a dollar
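You don’t strictly need the simulation to answer the question. Each of the 100 rolls is equally likely, so the ranges 1-50, 51-65, 66-75, 76-99 and 100 occur with probabilities 0.50, 0.15, 0.10, 0.24 and 0.01. The expected amount returned for each dollar bet follows directly from the rules:

$$
\mathbb{E}[\text{return per dollar}] = 0.50 \cdot 0 + 0.15 \cdot 1 + 0.10 \cdot 1.5 + 0.24 \cdot 2 + 0.01 \cdot 3 = 0.15 + 0.15 + 0.48 + 0.03 = 0.81
$$

That is 81 cents back for every dollar, which agrees with the 0.80 to 0.82 produced by the simulation runs. The game is a losing proposition.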
