deep

a Cross Development Platform for Java

User Tools

Site Tools


dev:crosscompiler:strings

Implementation of Strings

In Java a string like

String str = new String("abc");

causes the allocation of two objects. Strings in Java are immutable. With the field offset it is possible to form substrings in the constant string.

A string in Java

Creating two objects and managing them leads to unfavorable behaviour in terms of memory and runtime. A more elegant solution would create a single object. What to do?

  1. Change of the class file and replacement of the class String (relatively costly)
  2. Change of code generator

We choose the second alternative. When a new is encountered we must check if a string is to be created. If this is the case a special string allocation method is to be called, see Objects. The translation of the SSA instructions sCloadFromArray and sCstoreToArray nedd to be adapted as well.

Optimized String

The creation and manipulation of strings is done solely in java/lang/String. This class extends BString (this stands for Base String). BString has the single field count, which is the lenght of the string. This ensures that the field count will reside at index 0. String has another field value of type char[]. Instead of containing a reference to a character array, this field contains the characters directly.

Optimized String

The field size holds the number of bytes of the whole object. This is analogous to other objects (see Heap Manager and Garbage Collection). It is used for an efficient sweep phase.

Creation

The Bytecode for the creation of a string is

new java/lang/String
invokespecial constructor

new must not yet be executed, because the size of the character array cannot be fetched from the type descriptor. The character array will be inserted directly into the object and this size varies from string to string. The code generator has to check if new creates something of type String. If true, nothing will be done except putting the reference to the type descriptor of String into the result register of new.
The next Bytecode instruction is invokespecial. In our case, it must not call the constructor, because the object does not yet exist. The linker will insert a call to a special factory method with similar parameters.

public static String allocateString(int ref, char value[]) {
	int len = value.length;
	String str = newstring(ref, len * 2);
	str.count = len;
	for (int i = 0; i < len; i++) str.value[i] = value[i];
	return str;
}

Foremost, the necessary size must be calculated and the object must be allocated on the heap. The linker must replace the method address of the stub newstring with the method newstring in the class Heap. newstring does:

  • set the tag of the string object to the type descriptor of String (similar to newobject).
  • create the necessary number of array fields (similar to newarray).
  • set the field count.
  • call the constructor of the super class. This call has to be inserted in the code generator.
  • initialize the string.

Finally, the reference to the newly created string has to be put into the result register of the sCnew instruction.

Access to Fields

When accessing fields of a string, the code generator has to modify the code. Such an access shows up in the SSA as follows

1: MonadicRef[sCloadFromField] {0} <value([C)> (Char-Array)
2: NoOpnd[sCloadLocal] (Integer)
3: Dyadic[sCloadFromArray] {1, 2} (Char)

The field value is loaded. value is a reference. With this reference and an index into the array the proper character can be accessed. The code generator has to check for the instruction sCloadFromField if the field is String.value. If true, the fetching is ommited and the offset of value is stored. With the subsequent instruction sCloadFromArray, the offset must be added to the index. At the same time one cannot use value as the array reference but must use the reference to the string.
Similarly, writing to a string must be customized. However, strings in java are immutable. Writing to strings happens only in the factory methods during initialization. Checking for array index out of bounds may be ommited as well.

Constant String

Constant strings are stored in the constant block of a class, see Linker32. The layout there must be identical to the layout on the heap as given above.

Special Linking

The type descriptor of the class String must have its field nofInstPtr set to 0. Though there is a field char[] values we access this field in an omtimized way as decribed above. The garbage collector must not follow this field!

dev/crosscompiler/strings.txt · Last modified: 2016/02/25 13:33 by 127.0.0.1