When and How to Develop Domain-Specific Languages
MARJAN MERNIK
University of Maribor
JAN HEERING
CWI
AND
ANTHONY M. SLOANE
Macquarie University
Domain-specific languages (DSLs) are languages tailored to a specific application
domain. They offer substantial gains in expressiveness and ease of use compared with
general-purpose programming languages in their domain of application. DSL
development is hard, requiring both domain knowledge and language development
expertise. Few people have both. Not surprisingly, the decision to develop a DSL is often
postponed indefinitely, if considered at all, and most DSLs never get beyond the
application library stage.
Although many articles have been written on the development of particular DSLs,
there is very limited literature on DSL development methodologies and many questions
remain regarding when and how to develop a DSL. To aid the DSL developer, we
identify patterns in the decision, analysis, design, and implementation phases of DSL
development. Our patterns improve and extend earlier work on DSL design patterns.
We also discuss domain analysis tools and language development systems that may
help to speed up DSL development. Finally, we present a number of open problems.
Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications—Specialized application languages
General Terms: Design, Languages, Performance
Additional Key Words and Phrases: Domain-specific language, application language,
domain analysis, language development system
Authors’ addresses: M. Mernik, Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor, Slovenia; email: marjan.mernik@uni-mb.si; J. Heering, Department of Software Engineering, CWI, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands; email: Jan.Heering@cwi.nl; A.M. Sloane, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia; email: asloane@ics.mq.edu.au.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or direct commercial advantage and
that copies show this notice on the first page or initial screen of a display along with the full citation.
Copyrights for components of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any
component of this work in other works requires prior specific permission and/or a fee. Permissions may be
requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or permissions@acm.org.
© 2005 ACM 0360-0300/05/1200-0316 $5.00
ACM Computing Surveys, Vol. 37, No. 4, December 2005, pp. 316–344.
1. INTRODUCTION

1.1. General

Many computer languages are domain-specific rather than general purpose. Domain-specific languages (DSLs) are also called application-oriented [Sammet 1969], special purpose [Wexelblat 1981, p. xix], specialized [Bergin and Gibson 1996, p. 17], task-specific [Nardi 1993], or application [Martin 1985] languages. So-called fourth-generation languages (4GLs) [Martin 1985] are usually DSLs for database applications. Little languages are small DSLs that do not include many features found in general-purpose programming languages (GPLs) [Bentley 1986, p. 715].

DSLs trade generality for expressiveness in a limited domain. By providing notations and constructs tailored toward a particular application domain, they offer substantial gains in expressiveness and ease of use compared with GPLs for the domain in question, with corresponding gains in productivity and reduced maintenance costs. Also, by reducing the amount of domain and programming expertise needed, DSLs open up their application domain to a larger group of software developers compared to GPLs. Some widely used DSLs with their application domains are listed in Table I. The third column gives the language level of each DSL as given in Jones [1996]. Language level is related to productivity as shown in Table II, also from Jones [1996]. Apart from these examples, the benefits of DSLs have often been observed in practice and are supported by quantitative results such as those reported in Herndon and Berzins [1988]; Batory et al. [1994]; Jones [1996]; Kieburtz et al. [1996]; and Gray and Karsai [2003], but their quantitative validation in general as well as in particular cases, is hard and an important open problem. Therefore, the treatment of DSL development in this article will be largely qualitative.

Table I. Some Widely Used Domain-Specific Languages
DSL       Application Domain      Level
BNF       Syntax specification    n.a.
Excel     Spreadsheets            57 (version 5)
HTML      Hypertext web pages     22 (version 3.0)
LaTeX     Typesetting             n.a.
Make      Software building       15
MATLAB    Technical computing     n.a.
SQL       Database queries        25
VHDL      Hardware design         17
Java      General-purpose         6 (comparison only)

Table II. Language Level vs. Productivity as Measured in Function Points (FP)
Level     Productivity Average per Staff Month (FP)
1–3       5–10
4–8       10–20
9–15      16–23
16–23     15–30
24–55     30–50
>55       40–100

The use of DSLs is by no means new. APT, a DSL for programming numerically-controlled machine tools, was developed in 1957–1958 [Ross 1981]. BNF, the well-known syntax specification formalism, dates back to 1959 [Backus 1960]. Domain-specific visual languages (DSVLs), such as visual languages for hardware description and protocol specification, are important but beyond the scope of this survey.

We will not give a definition of what constitutes an application domain and what does not. Some consider Cobol to be a DSL for business applications, but others would argue this is pushing the notion of application domain too far. Leaving matters of definition aside, it is natural to think of DSLs in terms of a gradual scale with very specialized DSLs such as BNF on the left and GPLs such as C++ on the right. (The language level measure of Jones [1996] is one attempt to quantify this scale.) On this scale, Cobol would be somewhere between BNF and C++ but much closer to the latter. Similarly, it is hard to tell if command languages like the Unix shell or scripting languages like Tcl are DSLs. Clearly, domain-specificity is a matter of degree.

In combination with an application library, any GPL can act as a DSL. The library’s Application Programmers Interface (API) constitutes a domain-specific vocabulary of class, method, and function names that becomes available by object creation and method invocation to any GPL program using the library. This being the case, why were DSLs developed in the first place? Simply because they can offer domain-specificity in better ways.

—Appropriate or established domain-specific notations are usually beyond the limited user-definable operator notation offered by GPLs. A DSL offers appropriate domain-specific notations from the start. Their importance should not be underestimated as they are directly related to the productivity improvement associated with the use of DSLs.
—Appropriate domain-specific constructs and abstractions cannot always be mapped in a straightforward way to functions or objects that can be put in a library. Traversals and error handling are typical examples [Bonachea et al. 1999; Gray and Karsai 2003; Bruntink et al. 2005]. A GPL in combination with an application library can only express these constructs indirectly or in an awkward way. Again, a DSL would incorporate domain-specific constructs from the start.
—Use of a DSL offers possibilities for analysis, verification, optimization, parallelization, and transformation in terms of DSL constructs that would be much harder or unfeasible if a GPL had been used because the GPL source code patterns involved are too complex or not well defined.
—Unlike GPLs, DSLs need not be executable. There is no agreement on this in the DSL literature. For instance, the importance of nonexecutable DSLs is emphasized in Wile [2001], but DSLs are required to be executable in van Deursen et al. [2000]. We discuss DSL executability in Section 1.2.

Despite their shortcomings, application libraries are formidable competitors to DSLs. It is probably fair to say that most DSLs never get beyond the application library stage. These are sometimes called domain-specific embedded languages (DSELs) [Hudak 1996]. Even with improved DSL development tools, application libraries will remain the most cost effective solution in many cases, the more so since the advent of component technologies such as COM and CORBA [Szyperski 2002] has further complicated the relative merits of DSLs and application libraries. For instance, Microsoft Excel’s macro language is a DSL for spreadsheet applications which adds programmability to Excel’s fundamental interactive mode. Using COM, Excel’s implementation has been restructured into an application library of COM components, thereby opening it up to GPLs such as C++, Java, and Basic which can access it through its COM interfaces. This process of componentization is called automation [Chappell 1996]. Unlike the Excel macro language, which by its very nature is limited to Excel functionality, GPLs are not. They can be used to write applications transcending Excel’s boundaries by using components from other automated programs and COM libraries in addition to components from Excel itself.

In the remainder of this section, we discuss DSL executability (Section 1.2), DSLs as enablers of reuse (Section 1.3), the scope of this article (Section 1.4), and DSL literature (Section 1.5).

1.2. Executability of DSLs

DSLs are executable in various ways and to various degrees even to the point of being nonexecutable. Accordingly, depending on the character of the DSL in question, the corresponding programs are often more properly called specifications, definitions, or descriptions. We identify some points on the DSL executability scale.

—DSL with well-defined execution semantics (e.g., Excel macro language, HTML).
—Input language of an application generator [Cleaveland 1988; Smaragdakis and Batory 2000]. Examples are ATMOL [van Engelen 2001], a DSL for atmospheric modeling, and Hancock [Bonachea et al. 1999], a DSL for customer profiling. Such languages are also executable, but they usually have a more declarative character and less well-defined execution semantics as far as the details of the generated applications are concerned. The application generator is a compiler for the DSL in question.
—DSL not primarily meant to be executable but nevertheless useful for application generation. The syntax specification formalism BNF is an example of a DSL with a purely declarative character that can also act as an input language for a parser generator.
—DSL not meant to be executable. Examples are domain-specific data structure representations [Wile 2001]. Just like their executable relatives, such nonexecutable DSLs may benefit from various kinds of tool support such as specialized editors, prettyprinters, consistency checkers, analyzers, and visualizers.
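
The third point on this scale, a purely declarative notation that still drives a tool, can be illustrated with a minimal sketch. It assumes a toy BNF-style grammar stored as plain data and a generic recognizer that interprets it. The grammar, the rule names, and the helpers `match` and `recognize` are hypothetical and not taken from any system cited in this article. A real parser generator would emit parser code rather than interpret the grammar directly.

```python
# A toy BNF-style grammar held as plain data (hypothetical example).
# It executes nothing by itself; it only states which token
# sequences are well formed:
#   expr ::= term "+" expr | term
#   term ::= "n" | "(" expr ")"
GRAMMAR = {
    "expr": [["term", "+", "expr"], ["term"]],
    "term": [["n"], ["(", "expr", ")"]],
}


def match(symbol, tokens, pos):
    """Yield every position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must equal the next token.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:
        # Thread the set of reachable positions through the alternative.
        positions = [pos]
        for part in alternative:
            positions = [end
                         for p in positions
                         for end in match(part, tokens, p)]
        yield from positions


def recognize(tokens, start="expr"):
    """Return True if the whole token list can be derived from `start`."""
    return any(end == len(tokens) for end in match(start, tokens, 0))
```

With these definitions, `recognize(["n", "+", "n"])` returns `True` and `recognize(["n", "+"])` returns `False`. Accepting a different language only requires editing the `GRAMMAR` data, which is the sense in which the declarative description acts as the tool's input language. The sketch relies on the grammar having no left recursion. A left-recursive rule would make `match` recurse without consuming input.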
1.3. DSLs as Enablers of Reuse

The importance of DSLs can also be appreciated from the wider perspective of the construction of large software systems. In this context, the primary contribution of DSLs is to enable reuse of software artifacts [Biggerstaff 1998]. Among the types of artifacts that can be reused via DSLs are language grammars, source code, software designs, and domain abstractions. Later sections provide many examples of DSLs; here we mention a few from the perspective of reuse.

In his definitive survey of reuse Krueger [1992] categorizes reuse approaches along the following dimensions: abstracting, selecting, specializing, and integrating. In particular, he identifies application generators as an important reuse category. As already noted, application generators often use a DSL as their input language, thereby enabling programmers to reuse semantic notions embodied in the DSL without having to perform a detailed domain analysis themselves. Examples include BDL [Bertrand and Augeraud 1999] that generates software to control concurrent objects and Teapot [Chandra et al. 1999] that produces implementations of cache coherence protocols. Krueger identifies definition of domain coverage and concepts as a difficult challenge for implementors of application generators. We identify patterns for domain analysis in this article.

DSLs also play a role in other reuse categories identified by Krueger [1992]. For example, software architectures are commonly reused when DSLs are employed because the application generator or compiler follows a standard design when producing code from a DSL input. For example, GAL [Thibault et al. 1999] enables reuse of a standard architecture for video device drivers. DSLs implemented as application libraries clearly enable reuse of source code. Prominent examples are Haskell-based DSLs such as Fran [Elliott 1999]. DSLs can also be used for formal specification of software schemas. For example, Nowra [Sloane 2002] specifies software manufacturing processes and SSC [Buffenbarger and Gruell 2001] deals with subsystem composition.

Reuse may involve exploitation of an existing language grammar. For example, Hancock [Bonachea et al. 1999] piggybacks on C, while SWUL [Bravenboer and Visser 2004] extends Java. Moreover, the success of XML for DSLs is largely based on reuse of its grammar for specific domains. Less formal language grammars may also be reused when notations used by domain experts, but not yet available in a computer language, are realized in a DSL. For example, Hawk [Launchbury et al. 1999] uses a textual form of an existing visual notation.
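
The remark that DSLs implemented as application libraries enable reuse of source code can be made concrete with a minimal sketch of an embedded DSL for time-varying values. It is only loosely inspired by the idea behind Haskell-embedded DSLs and does not reproduce Fran's API. The class `Behavior`, the helper `lift`, and the names `clock`, `wiggle`, and `position` are assumptions introduced for illustration.

```python
import math


class Behavior:
    """A value that varies with time, wrapped so it can be combined."""

    def __init__(self, fn):
        self.fn = fn  # fn maps a time (a float) to a value

    def at(self, t):
        """Sample the behavior at time t."""
        return self.fn(t)

    def __add__(self, other):
        other = lift(other)
        return Behavior(lambda t: self.fn(t) + other.fn(t))

    def __mul__(self, other):
        other = lift(other)
        return Behavior(lambda t: self.fn(t) * other.fn(t))

    # Addition and multiplication are commutative here, so the
    # reflected forms (constant on the left) can reuse the same code.
    __radd__ = __add__
    __rmul__ = __mul__


def lift(value):
    """Wrap a plain constant as a constant Behavior; pass Behaviors through."""
    if isinstance(value, Behavior):
        return value
    return Behavior(lambda t: value)


clock = Behavior(lambda t: t)          # the identity behavior
wiggle = Behavior(math.sin) * 0.5      # oscillates between -0.5 and 0.5
position = 10 + 2 * clock + wiggle     # reads like a formula, runs as host code
```

Sampling is an ordinary method call, for example `position.at(0.0)`. The definition of `position` reuses the host language's parser, arithmetic operators, and tooling unchanged, which is the source-code reuse referred to above. The notation, however, is limited to the operators the host language allows a library to overload.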
1.4. Scope of This Article
There are no easy answers to the “when and how” question in the title of this article. The previously mentioned benefits of DSLs do not come free.
—DSL development is hard, requiring both domain and language development expertise. Few people have both.
—DSL development techniques are more varied than those for GPLs, requiring careful consideration of the factors involved.
—Depending on the size of the user community, development of training material, language support, standardization, and maintenance may become serious and time-consuming issues.
These are not the only factors complicating the decision to develop a new DSL. Initially, it is often far from evident that a DSL might be useful or that developing a new one might be worthwhile. This may become clear only after a sizable investment in domain-specific software development using a GPL has been made. The concepts underlying a suitable DSL may emerge only after a lot of GPL programming has been done. In such cases, DSL development may be a key step in software reengineering or software evolution [Bennett and Rajlich 2000].

To aid the DSL developer, we provide a systematic survey of the many factors involved by identifying patterns in the decision, analysis, design, and implementation phases of DSL development (Section 2). Our patterns improve and extend earlier work on DSL design patterns, in particular [Spinellis 2001]. This is discussed in Section 2.6. The DSL development process can be facilitated by using domain analysis tools and language development systems. These are surveyed in Section 3. Finally, conclusions and open problems are presented in Section 4.

1.5. Literature

We give some general pointers to the DSL literature; more specific references are given at appropriate points throughout this article rather than in this section. Until recently, DSLs received relatively little attention in the computer science research community, and there are few books on the subject. We mention Martin [1985], an exhaustive account of 4GLs; Biggerstaff and Perlis [1989], a two-volume collection of articles on software reuse including DSL development and program generation; Nardi [1993], which focuses on the role of DSLs in end-user programming; Salus [1998], a collection of articles on little languages (not all of them DSLs); and Barron [2000], which treats scripting languages (again, not all of them DSLs). Domain analysis, program generators, generative programming techniques, and intentional programming (IP) are treated in Czarnecki and Eisenecker [2000]. Domain analysis and the use of XML, DOM, XSLT, and related languages and tools to generate programs are discussed in Cleaveland [2001]. Domain-specific language development is an important element of the software factories method [Greenfield et al. 2004].

Proceedings of recent workshops and conferences partly or exclusively devoted to DSLs are Kamin [1997]; USENIX [1997, 1999]; HICSS [2001, 2002, 2003]; Lengauer et al. [2004]. Several journals have published special issues on DSLs [Wile and Ramming 1999; Mernik and Lämmel 2001, 2002]. Many of the DSLs used as examples in this article were taken from these sources. A special issue on end-user development is the subject of Sutcliffe and Mehandjiev [2004]. A special issue on program generation, optimization, and platform adaptation is authored by Moura et al. [2005]. There are many workshops and conferences at least partly devoted to DSLs for a particular domain, for example, description of features of telecommunications and other software systems [Gilmore and Ryan 2001]. The annotated DSL bibliography [van Deursen et al. 2000] (78 items) has limited overlap with the references in this article because of our emphasis on general DSL development issues.

2. DSL PATTERNS
2.1. Pattern Classification
The following are DSL development phases: decision, analysis, design, implementation, and deployment. In practice,