Friday, June 24, 2011

Converting a Collection to an array

If you have been using Collections in Java 5 you probably have stumbled upon a problem of converting a given Collection<T> into an array of type T. In the Collection interface, there is a method called toArray() which returns an array of type Object[]. But then, if since Java 5 Collections are using generics, shouldn’t it be possible to get an array T[] of the generic type we used in the declaration of our Collection<T>? Well, there is a method T[] toArray(T[] a). But what is that strange “T[] a” doing there? Why is there no simple method T[] toArray() without any strange parameters? It seams like it shouldn’t be a big problem to implement such a method… but actually it is not possible to do so in Java 5, and we will show here why.
Let’s start from trying to implement that method by ourselves. As an example, we will extend the ArrayList class and implement there a method T[] toArrayT(). The method simply uses the standard Object[] toArray() method and casts the result to T[].

import java.util.*;

/**
 * Implementation of the List interface with "T[] toArrayT()" method.
 * In a normal situation you should never extend ArrayList like this !
 * We are doing it here only for the sake of simplicity.
 * @param <T> Type of stored data
 */
class HasToArrayT<T> extends ArrayList<T> {
    public T[] toArrayT() {
        return (T[]) toArray();
    }
}

public class Main {
    public static void main(String[] args) {
        // We create an instance of our class and add an Integer
        HasToArrayT<Integer> list = new HasToArrayT<Integer>();
        list.add(new Integer(4));

        // We have no problems using the Object[] toArray method
        Object[] array = list.toArray();
        System.out.println(array[0]);

        // Here we try to use our new method... but fail on runtime
        Integer[] arrayT = list.toArrayT(); // Exception is thrown :
        // "Ljava.lang.Object; cannot be cast to [Ljava.lang.Integer"
        System.out.println(arrayT[0]);
    }
}

If you compile it and run, it will throw an exception when trying to use our new method. But why? Didn’t we do everything right? Objects stored in the array ARE Integers and toArray method returns an array of those objects, so what is the problem?

Java compiler gives us a hint to understand what is wrong. During compilation, you should see a warning at line 12, saying something like “Type safety: Unchecked cast from Object[] to T[]“. Let’s follow this trace and check if the array that comes out of our toArrayT method is really of type Integer[]. Put this code somewhere in main() method before the Integer[] arrayT = list.toArrayT() line :

System.out.println("toArrayT() returns Integer[] : "
    + (list.toArrayT() instanceof Integer[]));

When you run the program, you should get “false”, which means that toArrayT() DID NOT return Integer[]. Why? Well, many of you probably know the answer – it is Type Erasure. Basically, during compilation the compiler removes every information about the generic types. Effectively, when the program runs, all the appearances of T in our HasToArrayT class are nothing more than simple Object’s. It means, that the cast in the function toArrayT does not cast to Integer[] any more, even if we are using Integer in the declaration of the generic type. Actually, there is also a second problem here, coming from the way Java handles casting of arrays. This time however we would like to concentrate just on the first one – usage of generics.
So, if the information that T is an Integer is no longer there at runtime, is there any way to really return an array of type Integer[] ? Yes, there is, and it is exactly what the method T[] toArray(T[] a) in the Collections framework is doing. But the price you have to pay for it is the additional argument a, whose instance you have to create first by yourself. How do they use it there? Let’s look at how it is really implemented in, for example, ArrayList :

public <T> T[] toArray(T[] a) {
    if (a.length < size)
        // Make a new array of a's runtime type, but my contents:
        return (T[]) Arrays.copyOf(elementData, size, a.getClass());
    System.arraycopy(elementData, 0, a, 0, size);
    if (a.length > size)
        a[size] = null;
    return a;
}

As it is explained in the method’s Javadoc, if the specified array a is big enough it just copies elements into this array. If not, it creates a new array. What is important for us, is that when creating a new array it uses Arrays.copyOf method, specifying there a.getClass() as the type of array to create. So that’s where it needs that additional argument – to know what type of array to create. But then you might say, would it not be easier to fit there T.class instead of a.getClass()? No. Because of type erasure, T is no longer the type we specified. Moreover, we would get a compile time error if we wrote T.class! Some of you may be thinking about creating a new array the simple way – new T[], but it is also not possible because of the same reason. We have already discussed how to get Generic Arrays in Java here.
To put it straight, in order to return a new array of T, the type that we really specified, we MUST give something of that type to the function toArray. There is no other way the function could now what type of array it should create, because the generic type T supplied when creating that Collection is erased every time during compilation.
One other way to implement such a method is by using the Class<T[]> type as an input parameter. This way, instead of having to create a new instance a of array T[] and then using it in the function toArray(a), we could use the .class field – toArrayT(TYPE[].class) :

public T[] toArrayT(Class<T[]> c) {
        return (T[]) Arrays.copyOf(toArray(), size(), c);
 }

It still takes an argument, but it has two advantages over the T[] toArray(T[] a) function. First, it does not require you to create an instance of an array just to pass it to the function. Second, it is simpler. Of course, it has still the disadvantage of having to pass some argument.

After all, there is no perfect way to perform such a conversion. The best solution would be not to use arrays at all. Code with arrays is difficult to maintain and invites people to make mistakes that often come out not earlier than on runtime. Using collections is generally much preferred and if you use them all the time, you will not be forced to deal with the toArray method.

No comments:

Post a Comment

Chitika