[GAS] Explains: What is Metaprogramming?

By Sterling “Chip” Camden
Contributing Writer, [GAS]

In recent years, the Greek root meta has perhaps become overused.  Originally, it was just a lowly preposition meaning “after”, “beyond”, or simply “with” – but especially since the writings of Douglas Hofstadter it has taken on the meaning of a higher level of abstraction, especially a self-referential abstraction.  That’s the sense in which it is used in the term metaprogramming – modifying programs programmatically, or modifying the programming language itself.

As with many terms that describe programming, metaprogramming admits of many different incarnations, shades of meaning, and degrees of support.  In the broadest sense, the simple act of creating generalized functions or classes represents an extension and abstraction of the “language” used for programming – but the term “metaprogramming” is usually reserved for more radical modifications.  Languages that provide features for those types of operations are often called dynamic languages.

Generative programming

One use of the term “metaprogramming” refers to programs that generate or manipulate their own code.  Languages that provide the best support for this are those that easily overcome the distinction between code and data.  In the more than fifty years since the introduction of Lisp, no other language has devised a more radical yet natural representation of that interchangeability.  Code and data are both represented in Lisp as lists, so any list can easily be treated as either code or data.  It’s simple, therefore, to manipulate code as data, and then execute it – either via EVAL or by returning it as the result of a macro expansion.
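Python is not Lisp, and its code is not stored as plain lists, but its standard ast module gives a rough feel for the same idea: parse code into a data structure, manipulate that structure, then compile and run the result.  The sketch below is only an illustration; the names AddToMul and rewrite_and_eval are made up for this example.

```python
import ast


class AddToMul(ast.NodeTransformer):
    """Rewrite every `+` in an expression tree into `*`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


def rewrite_and_eval(source, variables):
    """Parse an expression, transform its tree, then compile and evaluate it."""
    tree = ast.parse(source, mode="eval")        # code becomes data
    tree = AddToMul().visit(tree)                # the data is manipulated
    ast.fix_missing_locations(tree)
    code = compile(tree, "<generated>", "eval")  # data becomes code again
    # An empty __builtins__ keeps the demo expression from calling anything.
    return eval(code, {"__builtins__": {}}, dict(variables))


# "a + b" is evaluated as if it had been written "a * b".
print(rewrite_and_eval("a + b", {"a": 3, "b": 4}))
```

In Lisp the parse and compile steps largely disappear, because the program already is the data structure.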

By comparison, COBOL’s single contribution to metaprogramming, the odious ALTER statement, seems laughable.  If GOTO is deemed harmful, then ALTER was pure evil.  It allowed you to change the destination of a GOTO statement at runtime, producing stealth spaghetti.  It had all the pitfalls of metaprogramming and none of the benefits, since it reduced readability without improving abstraction.

Some versions of the early line-numbered BASIC language allowed you to “include” code from a file at runtime.  You could, therefore, write code to a file from within your program and then include that file, generating and executing code at runtime.
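The same write-then-include trick can be imitated in a modern language.  Here is a minimal Python sketch, not a reconstruction of any particular BASIC dialect; generate_and_run and the generated file name are hypothetical.

```python
import tempfile
from pathlib import Path


def generate_and_run(function_name, factor):
    """Write a tiny source file from the running program, then load it."""
    source = (
        f"def {function_name}(x):\n"
        f"    return x * {factor}\n"
    )
    namespace = {}
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "generated.py"
        path.write_text(source)                 # the program writes code...
        text = path.read_text()                 # ...then reads it back in
        exec(compile(text, str(path), "exec"), namespace)
    return namespace[function_name]


triple = generate_and_run("triple", 3)
print(triple(5))
```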

At the machine level, code is of course a form of data, so it’s always been possible to modify code at runtime in assembly languages.  I’ve seen that used intentionally and unintentionally, to impressive and devastating effect, respectively.  Buffer overrun vulnerabilities represent a malicious form of this type of metaprogramming, when code is overwritten by data that exceeds the bounds of an unchecked buffer.

The Bourne shell and many later scripting languages (including Perl, Ruby, Python, PHP, and JavaScript) borrowed the verb “eval” from Lisp – but because they typically take its argument as a string, the code must go through a text phase before being processed by the language’s interpreter.  That text-to-code translation opens the door to vulnerabilities as well – if any of that text comes from a user, then it must be fully sanitized before it is evaluated.
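In Python, one common way to sidestep the problem when the text is only supposed to be data is to avoid eval altogether and use ast.literal_eval, which accepts literals (numbers, strings, lists, dictionaries, and so on) and rejects anything that would execute.  A small sketch, with parse_user_literal as a made-up helper name:

```python
import ast


def parse_user_literal(text):
    """Return (True, value) for a plain literal, or (False, None) otherwise."""
    try:
        return True, ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError):
        # Function calls, attribute access, and malformed input all land here.
        return False, None


print(parse_user_literal("[1, 2, 3]"))
print(parse_user_literal("__import__('os').getcwd()"))
```

This does not make arbitrary user code safe to run; it simply refuses to run code at all.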

Some languages that support reflection also allow for dynamic code generation.  For instance, the Microsoft .NET Framework includes the System.Reflection.Emit namespace that can be used to generate types and methods at runtime.  But it seems intentionally engineered to be difficult to use.
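For comparison (this is Python, not .NET, and it says nothing about how System.Reflection.Emit itself is used), a dynamic language can build a new type at runtime with very little ceremony.  The built-in type() accepts a name, a tuple of base classes, and a dictionary of members.  The helper make_record_class below is hypothetical.

```python
def make_record_class(class_name, field_names):
    """Build a simple class at runtime from a name and a list of fields."""

    def __init__(self, **values):
        for name in field_names:
            setattr(self, name, values.get(name))

    def describe(self):
        pairs = ", ".join(f"{name}={getattr(self, name)!r}" for name in field_names)
        return f"{class_name}({pairs})"

    # type(name, bases, members) creates the class object itself.
    return type(class_name, (object,), {"__init__": __init__, "describe": describe})


Point = make_record_class("Point", ["x", "y"])
print(Point(x=1, y=2).describe())
```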

Reflection

Reflection constitutes another domain of metaprogramming.  It’s the ability of a language to inspect its own code – most commonly, to determine what members a given class provides.  It can therefore be used to extend the programming language beyond its usual capabilities.  For instance, it’s possible to implement a form of duck typing in C# by using reflection to look for a desired member function in an object’s type, regardless of its inheritance hierarchy – and then execute it dynamically.
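The same look-it-up-then-call pattern is easy to sketch in Python, where duck typing is already the norm.  This is an illustration of the idea rather than the C# technique itself, and try_call, public_members, and the sample classes are invented for the example.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


class Rock:
    pass


def try_call(obj, method_name, *args):
    """Call obj.method_name(*args) if such a callable member exists."""
    method = getattr(obj, method_name, None)   # reflection: look up by name
    if callable(method):
        return method(*args)
    return None


def public_members(obj):
    """List the member names an object exposes, skipping underscore names."""
    return [name for name in dir(obj) if not name.startswith("_")]


for thing in (Duck(), Robot(), Rock()):
    print(type(thing).__name__, try_call(thing, "speak"))
```

Duck and Robot share no base class other than object, yet both respond to the call; Rock is simply skipped.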

Altering language behavior

Perhaps the most radical form of metaprogramming involves changing what a given statement means.  Traditional object-oriented languages allow a limited form of verb redefinition by overriding virtual methods on derived classes – but more dynamic languages provide access to basic components of the language itself.

In Ruby, for instance, a script can modify any class, even the core classes of the language, to add or replace functionality as desired.  Python allows the same for classes written in Python, although its built-in types such as int and str cannot be modified this way.  This is sometimes called monkey patching or duck punching.  While its indiscriminate use can cause rampant confusion, when applied thoughtfully it can be exceptionally powerful.
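A minimal Python sketch of monkey patching follows.  Greeter and shout are hypothetical names, and the last function reflects CPython behavior, where built-in types such as int refuse new attributes.

```python
class Greeter:
    def greet(self, name):
        return f"Hello, {name}"


def shout(self, name):
    return f"HELLO, {name.upper()}!"


greeter = Greeter()
original_greet = Greeter.greet

Greeter.greet = shout                 # replace the method on the class
patched = greeter.greet("world")      # existing instances see the change

Greeter.greet = original_greet        # undo the patch
restored = greeter.greet("world")


def can_patch_builtin():
    """Try to add a method to int; CPython raises TypeError."""
    try:
        int.double = lambda self: self * 2
    except TypeError:
        return False
    return True


print(patched, restored, can_patch_builtin())
```

Keeping a reference to the original method, as done here, is what makes the patch reversible.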

Extending the language

Languages that provide a mechanism for macro expansion allow programmers to extend the syntax of the language.  In C, C++, and Synergy/DE this is limited to parameterized replacement of identifiers at compile-time, but it can still be a powerful tool for adding domain-specific syntax.

Lisp macros, however, take this capability to a far greater level – because they are essentially generative:  compile-time code can use the full power of the Lisp language itself to determine what runtime code gets generated.  That’s perhaps the main reason why new Lispers find Lisp macros so hard to decipher:  the compile-time macro code is in the same language as the run-time code it generates, rather than using a completely different syntax for text replacement like C’s #define.

In The Art of the Metaobject Protocol, Gregor Kiczales et al. describe modifications to the Common Lisp Object System (CLOS) to allow Lisp developers to alter and extend the behavior of the class mechanism itself.  For instance, when porting applications to CLOS from a different object system, it might be useful to override CLOS’s class precedence ordering for multiple inheritance.  The Metaobject Protocol provides access to the class that represents classes themselves (standard-class), as well as other classes that correspond to other components of the object system.  A developer can therefore extend one of these metaobject classes with a derived class that implements the behavior they desire, tweaking the rules of object-orientation itself without affecting the behavior of the default cases.
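Python’s metaclasses offer a loosely comparable hook, which makes for a convenient illustration (this is an analogy, not the CLOS protocol).  In Python the class that represents classes is type, and a class derived from it can override mro() to change the precedence order used for multiple inheritance.  ReversedBases and the sample classes are invented for this sketch.

```python
class ReversedBases(type):
    """Metaclass that reverses the precedence of a class's ancestors."""

    def mro(cls):
        default = super().mro()            # e.g. [cls, A, B, object]
        # Keep the class itself first and object last; reverse the middle.
        return [default[0]] + default[-2:0:-1] + [default[-1]]


class A:
    def who(self):
        return "A"


class B:
    def who(self):
        return "B"


class Normal(A, B):                          # default rules: A wins
    pass


class Flipped(A, B, metaclass=ReversedBases):  # customized rules: B wins
    pass


print(Normal().who(), Flipped().who())
```

Only classes that opt in to the new metaclass are affected; Normal, A, and B keep the default behavior, which mirrors the point about leaving the default cases alone.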

Domain-specific languages

Why do you need to be able to modify or extend a general-purpose programming language?  When working in a specific problem domain, the ability to write programs in a language that provides the same terminology and abstractions that domain experts use to describe the problem can improve both productivity and the quality of the end product.  But rather than creating a whole new language from scratch and writing a compiler or interpreter for it, why not extend an existing language?  Much of the syntax that a DSL will require is general in nature – why reinvent that part?  Dynamic programming languages allow you to add the syntax and features you need while still being able to take advantage of the generalized syntax and capabilities they already provide.
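As a small taste of the idea, here is a sketch of an embedded DSL in Python that reuses the host language’s operators to build query conditions.  Python cannot add new syntax the way Lisp macros can, so this only borrows existing operators; Field and Condition are hypothetical classes, and the output format is made up for the example.

```python
class Condition:
    """A piece of a query, held as text."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Condition(f"({self.text}) AND ({other.text})")

    def __or__(self, other):
        return Condition(f"({self.text}) OR ({other.text})")


class Field:
    """A named field whose comparisons build Condition objects."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return Condition(f"{self.name} > {value!r}")

    def __lt__(self, value):
        return Condition(f"{self.name} < {value!r}")

    def __eq__(self, value):
        return Condition(f"{self.name} == {value!r}")


age = Field("age")
country = Field("country")

# Reads close to how a domain expert might state the rule.
rule = (age > 18) & (country == "NZ")
print(rule.text)
```

The parentheses, names, and operator precedence all come from Python for free; only the domain vocabulary had to be written.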

This post is part six of a series on the history of programming languages.
