Mixed Language Programming

Chapter Twelve

12.1 Chapter Overview

Most assembly language code doesn’t appear in a stand-alone assembly language program. Instead, most assembly code is actually part of a library package that programs written in a high level language wind up calling. Although HLA makes it really easy to write standalone assembly applications, at one point or another you’ll probably want to call an HLA procedure from some code written in another language, or you may want to call code written in another language from HLA. This chapter discusses the mechanisms for doing this in three languages: low-level assembly (i.e., MASM or Gas), C/C++, and Delphi/Kylix. The mechanisms for other languages are usually similar to one of these three, so the material in this chapter will still apply even if you’re using some other high level language.

12.2 Mixing HLA and MASM/Gas Code in the Same Program

It may seem kind of weird to mix MASM or Gas and HLA code in the same program. After all, they’re both assembly languages and almost anything you can do with MASM or Gas can be done in HLA. So why bother trying to mix the two in the same program? Well, there are three reasons:

• You’ve already got a lot of code written in MASM or Gas and you don’t want to convert it to HLA’s syntax.

• There are a few things MASM and Gas do that HLA cannot, and you happen to need to do one of those things.

• Someone else has written some MASM or Gas code and they want to be able to call code you’ve written using HLA.

In this section, we’ll discuss two ways to merge MASM/Gas and HLA code in the same program: via in-line assembly code and through linking object files.

12.2.1 In-Line (MASM/Gas) Assembly Code in Your HLA Programs

As you’re probably aware, the HLA compiler doesn’t actually produce machine code directly from your HLA source files. Instead, it first compiles the code to a MASM or Gas-compatible assembly language source file and then it calls MASM or Gas to assemble this code to object code. If you’re interested in seeing the MASM or Gas output HLA produces, just open, in a text editor, the filename.ASM file that HLA creates after compiling your filename.HLA source file. The output assembly file isn’t amazingly readable, but it is fairly easy to correlate the assembly output with the HLA source file.

HLA provides two mechanisms that let you inject raw MASM or Gas code directly into the output file it produces: the #ASM..#ENDASM sequence and the #EMIT statement. The #ASM..#ENDASM sequence copies all text between these two clauses directly to the assembly output file, e.g.,

    #asm

        mov eax, 0      ;MASM/Gas syntax for MOV( 0, EAX );
        add eax, ebx    ;   "        "    "  ADD( ebx, eax );

    #endasm

The #ASM..#ENDASM sequence is how you inject in-line (MASM or Gas) assembly code into your HLA programs. For the most part there is very little need to use this feature, but in a few instances it is valuable.

Beta Draft - Do not distribute

Volume Four

Note, when using Gas, that HLA specifies the “.intel_syntax” directive, so you should use Intel syntax when supplying Gas code between #asm and #endasm.

For example, if you’re writing structured exception handling code under Windows, you’ll need to access the double word at address FS:[0] (offset zero in the segment pointed at by the 80x86’s FS segment register). Unfortunately, HLA does not support segmentation and the use of segment registers. However, you can drop into MASM for a statement or two in order to access this value:

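    #asm

        mov ebx, fs:[0]     ; Loads the pointer to the current SEH record into EBX

    #endasm

At the end of this instruction sequence, EBX will contain the pointer to the most recent structured exception handling record for the current thread, which Windows keeps at offset zero of the thread information block that FS addresses.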

HLA blindly copies all text between the #ASM and #ENDASM clauses directly to the assembly output file. HLA does not check the syntax of this code or otherwise verify its correctness. If you introduce an error within this section of your program, the assembler will report the error when HLA assembles your code by calling MASM or Gas.

The #EMIT statement also writes text directly to the assembly output file. However, this statement does not simply copy the text from your source file to the output file; instead, this statement copies the value of a string (constant) expression to the output file. The syntax for this statement is as follows:

#emit( string_expression );

This statement evaluates the expression and verifies that it’s a string expression. Then it copies the string data to the output file. Like the #ASM/#ENDASM statement, the #EMIT statement does not check the syntax of the MASM or Gas statement it writes to the assembly file. If there is a syntax error, MASM or Gas will catch it later on when HLA assembles the output file.

When HLA compiles your programs into assembly language, it does not use the same symbols in the assembly language output file that you use in the HLA source files. There are several technical reasons for this, but the bottom line is this: you cannot easily reference your HLA identifiers in your in-line assembly code. The only exceptions to this rule are external identifiers. HLA external identifiers use the same name in the assembly file as in the HLA source file. Therefore, you can refer to external objects within your in-line assembly sequences or in the strings you output via #EMIT.

One advantage of the #EMIT statement is that it lets you construct MASM or Gas statements under (compile-time) program control. You can write an HLA compile-time program that generates a sequence of strings and emits them to the assembly file via the #EMIT statement. The compile-time program has access to the HLA symbol table; this means that you can extract the identifiers that HLA emits to the assembly file and use these directly, even if they aren’t external objects.

The @StaticName compile-time function returns the name that HLA uses to refer to most static objects in your program. The following program demonstrates a simple use of this compile-time function to obtain the assembly name of an HLA procedure:

    program emitDemo;
    #include( "stdlib.hhf" )

        procedure myProc;
        begin myProc;

            stdout.put( "Inside MyProc" nl );

        end myProc;

    begin emitDemo;

        ?stmt:string := "call " + @StaticName( myProc );
        #emit( stmt );

    end emitDemo;

Program 12.1 Using the @StaticName Function

This example creates a string value (stmt) that contains something like “call ?741_myProc” and emits this assembly instruction directly to the assembly source file (“?741_myProc” is typical of the type of name mangling that HLA does to static names it writes to the output file). If you compile and run this program, it should display “Inside MyProc” and then quit. If you look at the assembly file that HLA emits, you will see that it has given the myProc procedure the same name it appends to the CALL instruction (see footnote 1).

The @StaticName function is only valid for static symbols. This includes STATIC, READONLY, and STORAGE variables, procedures, and iterators. It does not include VAR objects, constants, macros, class iterators, or methods.

You can access VAR variables by using the [EBP+offset] addressing mode, specifying the offset of the desired local variable. You can use the @offset compile-time function to obtain the offset of a VAR object or a parameter. The following program demonstrates how to do this:

    program offsetDemo;
    #include( "stdlib.hhf" )

    var
        i:int32;

    begin offsetDemo;

        mov( -255, i );
        ?stmt := "mov eax, [ebp+(" + string( @offset( i )) + ")]";
        #print( "Emitting '", stmt, "'" )
        #emit( stmt );
        stdout.put( "eax = ", (type int32 eax), nl );

    end offsetDemo;

Program 12.2 Using the @Offset Compile-Time Function

This example emits the statement “mov eax, [ebp+(-8)]” to the assembly language source file. It turns out that -8 is the offset of the i variable in the offsetDemo program’s activation record.

Of course, the examples of #EMIT up to this point have been somewhat ridiculous since you can achieve the same results by using HLA statements. One very useful purpose for the #EMIT statement, however, is to create some instructions that HLA does not support. For example, as of this writing HLA does not support the LES instruction because you can’t really use it under most 32-bit operating systems. However, if you found a need for this instruction, you could easily write a macro to emit this instruction and appropriate operands to the assembly source file. Using the #EMIT statement gives you the ability to reference HLA objects, something you cannot do with the #ASM..#ENDASM sequence.

1. HLA may assign a different name than “?741_myProc” when you compile the program. The exact symbol HLA chooses varies from version to version of the assembler (it depends on the number of symbols defined prior to the definition of myProc). In this example, there were 741 static symbols defined in the HLA Standard Library before the definition of myProc.

12.2.2 Linking MASM/Gas-Assembled Modules with HLA Modules

Although you can do some interesting things with HLA’s in-line assembly statements, you’ll probably never use them. Further, future versions of HLA may not even support these statements, so you should avoid them as much as possible even if you see a need for them. Of course, HLA does most of the stuff you’d want to do with the #ASM/#ENDASM and #EMIT statements anyway, so there is very little reason to use them at all. If you’re going to combine MASM/Gas (or other assembler) code and HLA code together in a program, most of the time this will occur because you’ve got a module or library routine written in some other assembly language and you would like to take advantage of that code in your HLA programs. Rather than convert the other assembler’s code to HLA, the easy solution is to simply assemble that other code to an object file and link it with your HLA programs.

Once you’ve compiled or assembled a source file to an object file, the routines in that module are callable from almost any machine code that can handle the routines’ calling sequences. If you have an object file that contains a SQRT function, for example, it doesn’t matter whether you compiled that function with HLA, MASM, TASM, NASM, Gas, or even a high level language; if it’s object code and it exports the proper symbols, you can call it from your HLA program.

Compiling a module in MASM or Gas and linking that with your HLA program is little different than linking other HLA modules with your main HLA program. In the assembly source file you will have to export some symbols (using the PUBLIC directive in MASM or the .GLOBAL directive in Gas) and in your HLA program you’ve got to tell HLA that those symbols appear in a separate module (using the EXTERNAL option).

Since the two modules are written in assembly language, there is very little language-imposed structure on the calling sequence and parameter passing mechanisms. If you’re calling a function written in MASM or Gas from your HLA program, then all you’ve got to do is to make sure that your HLA program passes parameters in the same locations where the MASM/Gas function is expecting them.

About the only issue you’ve got to deal with is the case of identifiers in the two programs. By default, MASM is case insensitive, while Gas is case sensitive. HLA, on the other hand, enforces case neutrality (which, essentially, means that it is case sensitive). If you’re using MASM, there is a MASM command line option (“/Cp”) that tells MASM to preserve case in all public symbols. It’s a really good idea to use this option when assembling modules you’re going to link with HLA so that MASM doesn’t mess with the case of your identifiers during assembly.

Of course, since MASM (when told to preserve case) and Gas process symbols in a case sensitive manner, it’s possible to create two separate identifiers that are the same except for alphabetic case. HLA enforces case neutrality so it won’t let you (directly) create two different identifiers that differ only in case. In general, this is such a bad programming practice that one would hope you never encounter it (and God forbid you actually do this yourself). However, if you inherit some MASM or Gas code written by a C hacker, it’s quite possible the code uses this technique. The way around this problem is to use two separate identifiers in your HLA program and use the extended form of the EXTERNAL directive to provide the external names. For example, suppose that in MASM you have the following declarations:

                public  AVariable
                public  avariable

                .data
    AVariable   dword   ?
    avariable   byte    ?

If you assemble this code with the “/Cp” or “/Cx” (total case sensitivity) command line options, MASM will emit these two external symbols for use by other modules. Of course, were you to attempt to define variables by these two names in an HLA program, HLA would complain about a duplicate symbol definition. However, you can connect two different HLA variables to these two identifiers using code like the following:

    static
        AVariable:  dword; external( "AVariable" );
        AnotherVar: byte;  external( "avariable" );

HLA does not check the strings you supply as parameters to the EXTERNAL clause. Therefore, you can supply two names that are the same except for case and HLA will not complain. Note that when HLA calls MASM to assemble its output file, HLA specifies the “/Cp” option that tells MASM to preserve case in public and global symbols. Of course, you would use this same technique with Gas if the Gas programmer has exported two symbols that are identical except for case.

The following program demonstrates how to call a MASM subroutine from an HLA main program:

    // To compile this module and the attendant MASM file, use the following
    // command line:
    //
    //      ml -c masmupper.masm
    //      hla masmdemo1.hla masmupper.obj
    //
    // Sorry about no make file for this code, but these two files are in
    // the HLA Vol4/Ch12 subdirectory that has its own makefile for building
    // all the source files in the directory and I wanted to avoid confusion.

    program MasmDemo1;
    #include( "stdlib.hhf" )

        // The following external declaration defines a function that
        // is written in MASM to convert the character in AL from
        // lower case to upper case.

        procedure masmUpperCase( c:char in al ); external( "masmUpperCase" );

    static
        s: string := "Hello World!";

    begin MasmDemo1;

        stdout.put( "String converted to uppercase: '" );
        mov( s, edi );
        while( mov( [edi], al ) <> #0 ) do

            masmUpperCase( al );
            stdout.putc( al );
            inc( edi );

        endwhile;
        stdout.put( "'" nl );

    end MasmDemo1;
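The MASM half of this demonstration is not reproduced in this section. As a rough sketch only, a module satisfying the external declaration above might look like the following. It assumes a 32-bit flat memory model and the register convention the HLA declaration implies (character passed in AL, result returned in AL). The label notLower, the .386 directive, and the overall layout are illustrative assumptions, not the contents of the actual masmupper.masm file.

```masm
; Hypothetical sketch of masmupper.masm (not the original file).
; Converts the character in AL to upper case if it is a lower
; case letter; any other character is returned unchanged.

                .386
                .model  flat            ; no language type specified, so the
                                        ; public name is expected to be
                                        ; emitted as written (assumption)
                .code

                public  masmUpperCase   ; export the symbol named in the
                                        ; HLA external( ... ) clause

masmUpperCase   proc    near
                cmp     al, 'a'         ; below 'a': not a lower case letter
                jb      notLower
                cmp     al, 'z'         ; above 'z': not a lower case letter
                ja      notLower
                and     al, 5fh         ; clear bit 5: 'a'..'z' -> 'A'..'Z'
notLower:
                ret
masmUpperCase   endp

                end
```

Because the HLA side names the symbol "masmUpperCase" with mixed case, it is worth assembling a module like this with the “/Cp” option discussed earlier so that the public name keeps its case.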
