deep

a Cross Development Platform for Java

User Tools

Site Tools


dev:crosscompiler:strings

This is an old revision of the document!


Implementation of Strings

In Java a string like

String str = new String("abc");

causes the allocation of two objects. Strings in Java are immutable. With the field offset it is possible to form substrings in the constant string.

A string in Java

Creating two objects and managing them leads to unfavorable behaviour in terms of memory and runtime. A more elegant solution would create a single object. What to do?

  1. Change of the class file and replacement of the class String (relatively costly)
  2. Change of code generator

We choose the second alternative. When a new is encountered we must check if a string is to be created. If this is the case a special string allocation method is to be called, see Objects. The translation of the SSA instructions sCloadFromArray and sCstoreToArray nedd to be adapted as well.

Optimized String

The creation and manipulation of strings is done solely in java/lang/String. This class extends BString (this stands for Base String). BString has the single field count, which is the lenght of the string. This ensures that the field count will reside at index 0. String has another field value of type char[]. Instead of containing a reference to a character array, this field contains the characters directly.

Optimized String

The field size holds the number of bytes of the whole object. This is analogous to other objects (see Heap Manager and Garbage Collection). It is used for an efficient sweep phase.

Erzeugung

The Bytecode for the creation of a string is

new java/lang/String
invokespecial constructor

new must not yet be executed, because the size of the character array cannot be fetched from the type descriptor. The character array will be inserted directly into the object and this size varies from string to string. The code generator has to check if new creates something of type String. If true, nothing will be done except putting the reference to the type descriptor of String into the result register of new.
The next Bytecode instruction is invokespecial. In our case, it must not call the constructor, because the object does not yet exist. The linker will insert a call to a special factory method with similar parameters.

public static String allocateString(int ref, char value[]) {
	int len = value.length;
	String str = newstring(ref, len * 2);
	str.count = len;
	for (int i = 0; i < len; i++) str.value[i] = value[i];
	return str;
}

Foremost, the necessary size must be calculated and the object must be allocated on the heap. The linker must replace the method address of the stub newstring with the method newstring in the class Heap. newstring does:

  • set the tag of the string object to the type descriptor of String (similar to newobject).
  • create the necessary number of array fields (similar to newarray).
  • set the field count.
  • call the constructor of the super class. This call has to be inserted in the code generator.
  • initialize the string.

Finally, the reference to the newly created string has to be put into the result register of the sCnew instruction.

Feldzugriffe

Sobald auf Felder des Stringarrays zugegriffen werden soll, muss das im Codegenerator speziell berücksichtigt werden. Ein solcher Zugriff hat in der SSA die Form

1: MonadicRef[sCloadFromField] {0} <value([C)> (Char-Array)
2: NoOpnd[sCloadLocal] (Integer)
3: Dyadic[sCloadFromArray] {1, 2} (Char)

Zuerst wird also das Feld value geladen. value ist eine Referenz. Anschliessend wird über die Referenz und ein Index auf das Array zugegriffen. Im Codegenerator muss bei einer Instruktion sCloadFromField geprüft werden, ob auf das Feld String.value zugegriffen wird. Falls ja, wird dieser Zugriff weggelassen und der Offset von value gemerkt. Bei der darauffolgenden Instruktion sCloadFromArray muss dieser Offset zum Index dazuaddiert werden. Ebenfalls darf als Arrayreferenz nicht das value dienen, sondern die Referenz auf den String.
Analog muss auch das Schreiben auf die Stringzeichen angepasst werden. Hier gibt es aber noch eine Besonderheit. Weil Strings immutable sind, werden nur beim Initialisieren in den Fabrikmethoden die Arrayfelder geschrieben. Darum kann die Prüfung auf Arrayüberschreitung weggelassen werden.

Constant String

Constant strings are stored in the constant block of a class, see Linker32. The layout there must be identical to the layout on the heap as given above.

Special Linking

The type descriptor of the class String must have its field nofInstPtr set to 0. Though there is a field char[] values we access this field in an omtimized way as decribed above. The garbage collector must not follow this field!

dev/crosscompiler/strings.1428329647.txt.gz · Last modified: 2016/02/25 13:33 (external edit)