
When and How to Develop Domain-Specific Languages

MARJAN MERNIK

University of Maribor

JAN HEERING

CWI

AND

ANTHONY M. SLOANE

Macquarie University

Domain-specific languages (DSLs) are languages tailored to a specific application domain. They offer substantial gains in expressiveness and ease of use compared with general-purpose programming languages in their domain of application. DSL development is hard, requiring both domain knowledge and language development expertise. Few people have both. Not surprisingly, the decision to develop a DSL is often postponed indefinitely, if considered at all, and most DSLs never get beyond the application library stage.

Although many articles have been written on the development of particular DSLs, there is very limited literature on DSL development methodologies and many questions remain regarding when and how to develop a DSL. To aid the DSL developer, we identify patterns in the decision, analysis, design, and implementation phases of DSL development. Our patterns improve and extend earlier work on DSL design patterns. We also discuss domain analysis tools and language development systems that may help to speed up DSL development. Finally, we present a number of open problems.

Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications—Specialized Application Languages

General Terms: Design, Languages, Performance

Additional Key Words and Phrases: Domain-specific language, application language, domain analysis, language development system

Authors’ addresses: M. Mernik, Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor, Slovenia; email: marjan.mernik@uni-mb.si; J. Heering, Department of Software Engineering, CWI, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands; email: Jan.Heering@cwi.nl; A.M. Sloane, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia; email: asloane@ics.mq.edu.au.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or permissions@acm.org.

© 2005 ACM 0360-0300/05/1200-0316 $5.00

ACM Computing Surveys, Vol. 37, No. 4, December 2005, pp. 316–344.


1. INTRODUCTION

1.1. General

Many computer languages are domain-specific rather than general purpose. Domain-specific languages (DSLs) are also called application-oriented [Sammet 1969], special purpose [Wexelblat 1981, p. xix], specialized [Bergin and Gibson 1996, p. 17], task-specific [Nardi 1993], or application [Martin 1985] languages. So-called fourth-generation languages (4GLs) [Martin 1985] are usually DSLs for database applications. Little languages are small DSLs that do not include many features found in general-purpose programming languages (GPLs) [Bentley 1986, p. 715].

DSLs trade generality for expressiveness in a limited domain. By providing notations and constructs tailored toward a particular application domain, they offer substantial gains in expressiveness and ease of use compared with GPLs for the domain in question, with corresponding gains in productivity and reduced maintenance costs. Also, by reducing the amount of domain and programming expertise needed, DSLs open up their application domain to a larger group of software developers compared to GPLs. Some widely used DSLs with their application domains are listed in Table I. The third column gives the language level of each DSL as given in Jones [1996]. Language level is related to productivity as shown in Table II, also from Jones [1996]. Apart from these examples, the benefits of DSLs have often been observed in practice and are supported by quantitative results such as those reported in Herndon and Berzins [1988]; Batory et al. [1994]; Jones [1996]; Kieburtz et al. [1996]; and Gray and Karsai [2003], but their quantitative validation, in general as well as in particular cases, is hard and an important open problem. Therefore, the treatment of DSL development in this article will be largely qualitative.

Table I. Some Widely Used Domain-Specific Languages

DSL       Application Domain      Level
BNF       Syntax specification    n.a.
Excel     Spreadsheets            (version 5)
HTML      Hypertext web pages     (version 3.0)
LaTeX     Typesetting             n.a.
Make      Software building
MATLAB    Technical computing     n.a.
SQL       Database queries
VHDL      Hardware design
Java      General-purpose         (comparison only)

Table II. Language Level vs. Productivity as Measured in Function Points (FP)

Level     Productivity Average per Staff Month (FP)
1–3       5–10
4–8       10–20
9–15      16–23
16–23     15–30
24–55     30–50
>55       40–100

The use of DSLs is by no means new. APT, a DSL for programming numerically-controlled machine tools, was developed in 1957–1958 [Ross 1981]. BNF, the well-known syntax specification formalism, dates back to 1959 [Backus 1960]. Domain-specific visual languages (DSVLs), such as visual languages for hardware description and protocol specification, are important but beyond the scope of this survey.

We will not give a definition of what constitutes an application domain and what does not. Some consider Cobol to be a DSL for business applications, but others would argue this is pushing the notion of application domain too far. Leaving matters of definition aside, it is natural to think of DSLs in terms of a gradual scale with very specialized DSLs such as BNF on the left and GPLs such as C++ on the right. (The language level measure of Jones [1996] is one attempt to quantify this scale.) On this scale, Cobol would be somewhere between BNF and C++ but much closer to the latter. Similarly, it is hard to tell if command languages like the Unix shell or scripting languages like Tcl are DSLs. Clearly, domain-specificity is a matter of degree.

In combination with an application library, any GPL can act as a DSL. The library’s Application Programmers Interface (API) constitutes a domain-specific vocabulary of class, method, and function names that becomes available by object creation and method invocation to any GPL program using the library. This being the case, why were DSLs developed in the first place? Simply because they can offer domain-specificity in better ways.

—Appropriate or established domain-specific notations are usually beyond the limited user-definable operator notation offered by GPLs. A DSL offers appropriate domain-specific notations from the start. Their importance should not be underestimated as they are directly related to the productivity improvement associated with the use of DSLs.

—Appropriate domain-specific constructs and abstractions cannot always be mapped in a straightforward way to functions or objects that can be put in a library. Traversals and error handling are typical examples [Bonachea et al. 1999; Gray and Karsai 2003; Bruntink et al. 2005]. A GPL in combination with an application library can only express these constructs indirectly or in an awkward way. Again, a DSL would incorporate domain-specific constructs from the start.

—Use of a DSL offers possibilities for analysis, verification, optimization, parallelization, and transformation in terms of DSL constructs that would be much harder or infeasible if a GPL had been used because the GPL source code patterns involved are too complex or not well defined.

—Unlike GPLs, DSLs need not be executable. There is no agreement on this in the DSL literature. For instance, the importance of nonexecutable DSLs is emphasized in Wile [2001], but DSLs are required to be executable in van Deursen et al. [2000]. We discuss DSL executability in Section 1.2.

Despite their shortcomings, application libraries are formidable competitors to DSLs. It is probably fair to say that most DSLs never get beyond the application library stage. Such library-based DSLs are sometimes called domain-specific embedded languages (DSELs) [Hudak 1996]. Even with improved DSL development tools, application libraries will remain the most cost-effective solution in many cases, the more so since the advent of component technologies such as COM and CORBA [Szyperski 2002] has further complicated the relative merits of DSLs and application libraries. For instance, Microsoft Excel’s macro language is a DSL for spreadsheet applications which adds programmability to Excel’s fundamental interactive mode. Using COM, Excel’s implementation has been restructured into an application library of COM components, thereby opening it up to GPLs such as C++, Java, and Basic, which can access it through its COM interfaces. This process of componentization is called automation [Chappell 1996]. Unlike the Excel macro language, which by its very nature is limited to Excel functionality, GPLs are not. They can be used to write applications transcending Excel’s boundaries by using components from other automated programs and COM libraries in addition to components from Excel itself.

In the remainder of this section, we discuss DSL executability (Section 1.2), DSLs as enablers of reuse (Section 1.3), the scope of this article (Section 1.4), and DSL literature (Section 1.5).

1.2. Executability of DSLs

DSLs are executable in various ways and to various degrees even to the point of being nonexecutable. Accordingly, depending on the character of the DSL in question, the corresponding programs are often more properly called specifications, definitions, or descriptions. We identify some points on the DSL executability scale.

—DSL with well-defined execution semantics (e.g., Excel macro language, HTML).


—Input language of an application generator [Cleaveland 1988; Smaragdakis and Batory 2000]. Examples are ATMOL [van Engelen 2001], a DSL for atmospheric modeling, and Hancock [Bonachea et al. 1999], a DSL for customer profiling. Such languages are also executable, but they usually have a more declarative character and less well-defined execution semantics as far as the details of the generated applications are concerned. The application generator is a compiler for the DSL in question.

—DSL not primarily meant to be executable but nevertheless useful for application generation. The syntax specification formalism BNF is an example of a DSL with a purely declarative character that can also act as an input language for a parser generator.

—DSL not meant to be executable. Examples are domain-specific data structure representations [Wile 2001]. Just like their executable relatives, such nonexecutable DSLs may benefit from various kinds of tool support such as specialized editors, prettyprinters, consistency checkers, analyzers, and visualizers.
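To make the BNF point on this scale concrete, the following minimal Python sketch treats a BNF-like grammar purely as declarative data and then derives a recognizer from it. The toy grammar, the helper names parse_bnf, match, and recognizes, and the simplified BNF dialect (one rule per line, terminals in double quotes, no "|" inside a terminal, no left recursion) are assumptions made for illustration and do not come from the article. The sketch also interprets the grammar directly, whereas a real parser generator would typically emit parser source code.

```python
# A toy grammar in a simplified BNF-like notation (hypothetical example).
# On its own this text is only a description of a syntax; nothing executes.
GRAMMAR = """
<expr>  ::= <term> "+" <expr> | <term>
<term>  ::= <digit> <term> | <digit>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
"""


def parse_bnf(text):
    """Turn the grammar text into {nonterminal: [alternative, ...]},
    where each alternative is a list of symbols."""
    rules = {}
    for line in text.strip().splitlines():
        head, body = line.split("::=")
        rules[head.strip()] = [alt.split() for alt in body.split("|")]
    return rules


def match(rules, symbol, s, pos):
    """Return the set of positions reachable after matching `symbol`
    against `s` starting at index `pos` (empty set means no match)."""
    if symbol.startswith('"'):          # terminal: a double-quoted literal
        literal = symbol.strip('"')
        return {pos + len(literal)} if s.startswith(literal, pos) else set()
    ends = set()
    for alternative in rules[symbol]:   # nonterminal: try every alternative
        positions = {pos}
        for sym in alternative:
            positions = {e for p in positions for e in match(rules, sym, s, p)}
        ends |= positions
    return ends


def recognizes(rules, start, s):
    """True if the whole string `s` can be derived from `start`."""
    return len(s) in match(rules, start, s, 0)


RULES = parse_bnf(GRAMMAR)
print(recognizes(RULES, "<expr>", "12+3"))  # True
print(recognizes(RULES, "<expr>", "12+"))   # False
```

The grammar text stays a pure description of the syntax; only when it is handed to a tool does it yield something executable, which is the sense in which such a DSL is "not primarily meant to be executable but nevertheless useful for application generation".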

1.3. DSLs as Enablers of Reuse

The importance of DSLs can also be appreciated from the wider perspective of the construction of large software systems. In this context, the primary contribution of DSLs is to enable reuse of software artifacts [Biggerstaff 1998]. Among the types of artifacts that can be reused via DSLs are language grammars, source code, software designs, and domain abstractions. Later sections provide many examples of DSLs; here we mention a few from the perspective of reuse.

In his definitive survey of reuse, Krueger [1992] categorizes reuse approaches along the following dimensions: abstracting, selecting, specializing, and integrating. In particular, he identifies application generators as an important reuse category. As already noted, application generators often use a DSL as their input language, thereby enabling programmers to reuse semantic notions embodied in the DSL without having to perform a detailed domain analysis themselves. Examples include BDL [Bertrand and Augeraud 1999], which generates software to control concurrent objects, and Teapot [Chandra et al. 1999], which produces implementations of cache coherence protocols. Krueger identifies definition of domain coverage and concepts as a difficult challenge for implementors of application generators. We identify patterns for domain analysis in this article.

DSLs also play a role in other reuse categories identified by Krueger [1992]. For example, software architectures are commonly reused when DSLs are employed because the application generator or compiler follows a standard design when producing code from a DSL input. For instance, GAL [Thibault et al. 1999] enables reuse of a standard architecture for video device drivers. DSLs implemented as application libraries clearly enable reuse of source code. Prominent examples are Haskell-based DSLs such as Fran [Elliott 1999]. DSLs can also be used for formal specification of software schemas. For example, Nowra [Sloane 2002] specifies software manufacturing processes and SSC [Buffenbarger and Gruell 2001] deals with subsystem composition.

Reuse may involve exploitation of an existing language grammar. For example, Hancock [Bonachea et al. 1999] piggybacks on C, while SWUL [Bravenboer and Visser 2004] extends Java. Moreover, the success of XML for DSLs is largely based on reuse of its grammar for specific domains. Less formal language grammars may also be reused when notations used by domain experts, but not yet available in a computer language, are realized in a DSL. For example, Hawk [Launchbury et al. 1999] uses a textual form of an existing visual notation.

1.4. Scope of This Article

There are no easy answers to the “when and how” question in the title of this article. The previously mentioned benefits of DSLs do not come free.


—DSL development is hard, requiring both domain and language development expertise. Few people have both.

—DSL development techniques are more varied than those for GPLs, requiring careful consideration of the factors involved.

—Depending on the size of the user community, development of training material, language support, standardization, and maintenance may become serious and time-consuming issues.

These are not the only factors complicating the decision to develop a new DSL. Initially, it is often far from evident that a DSL might be useful or that developing a new one might be worthwhile. This may become clear only after a sizable investment in domain-specific software development using a GPL has been made. The concepts underlying a suitable DSL may emerge only after a lot of GPL programming has been done. In such cases, DSL development may be a key step in software reengineering or software evolution [Bennett and Rajlich 2000].

To aid the DSL developer, we provide a systematic survey of the many factors involved by identifying patterns in the decision, analysis, design, and implementation phases of DSL development (Section 2). Our patterns improve and extend earlier work on DSL design patterns, in particular [Spinellis 2001]. This is discussed in Section 2.6. The DSL development process can be facilitated by using domain analysis tools and language development systems. These are surveyed in Section 3. Finally, conclusions and open problems are presented in Section 4.

1.5. Literature

We give some general pointers to the DSL literature; more specific references are given at appropriate points throughout this article rather than in this section. Until recently, DSLs received relatively little attention in the computer science research community, and there are few books on the subject. We mention Martin [1985], an exhaustive account of 4GLs; Biggerstaff and Perlis [1989], a two-volume collection of articles on software reuse including DSL development and program generation; Nardi [1993], which focuses on the role of DSLs in end-user programming; Salus [1998], a collection of articles on little languages (not all of them DSLs); and Barron [2000], which treats scripting languages (again, not all of them DSLs). Domain analysis, program generators, generative programming techniques, and intentional programming (IP) are treated in Czarnecki and Eisenecker [2000]. Domain analysis and the use of XML, DOM, XSLT, and related languages and tools to generate programs are discussed in Cleaveland [2001]. Domain-specific language development is an important element of the software factories method [Greenfield et al. 2004].

Proceedings of recent workshops and conferences partly or exclusively devoted to DSLs are Kamin [1997]; USENIX [1997, 1999]; HICSS [2001, 2002, 2003]; Lengauer et al. [2004]. Several journals have published special issues on DSLs [Wile and Ramming 1999; Mernik and Lämmel 2001, 2002]. Many of the DSLs used as examples in this article were taken from these sources. A special issue on end-user development is the subject of Sutcliffe and Mehandjiev [2004]. A special issue on program generation, optimization, and platform adaptation is authored by Moura et al. [2005]. There are many workshops and conferences at least partly devoted to DSLs for a particular domain, for example, description of features of telecommunications and other software systems [Gilmore and Ryan 2001]. The annotated DSL bibliography [van Deursen et al. 2000] (78 items) has limited overlap with the references in this article because of our emphasis on general DSL development issues.

2. DSL PATTERNS

2.1. Pattern classification

The following are DSL development phases: decision, analysis, design, implementation, and deployment. In practice,
