且构网

Is there any way to create a primitive array without initialization?

Updated: 2023-02-25 18:57:18

I've done some investigation. There is no legal way to create an uninitialized array in Java; even JNI's NewXxxArray functions create initialized arrays. So it is impossible to know the cost of array zeroing exactly. Nevertheless, I've done some measurements:

1) Creating 1000 byte arrays of different sizes

    long t0 = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
        // byte[] a1 = new byte[1];
        byte[] a1 = new byte[1000000];
    }
    System.out.println(System.currentTimeMillis() - t0);

On my PC it gives < 1 ms for byte[1] and ~500 ms for byte[1000000]. Sounds impressive to me.
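A loop whose arrays are never used leaves the JIT free to optimize the allocation away (HotSpot's escape analysis can do this for small arrays), so the byte[1] figure in particular may not measure what it seems to. A slightly more careful sketch, using only the standard library, stores every array in a static field so it escapes, runs a warm-up pass first, and times with System.nanoTime. The class and method names are illustrative, and the timings will vary by JVM and machine.

```java
public class AllocTiming {
    // Each new array is stored here so it escapes the loop and the JIT cannot drop the allocation.
    static byte[] sink;

    /** Allocates {@code count} arrays of {@code size} bytes and returns the elapsed time in nanoseconds. */
    static long timeAllocations(int count, int size) {
        long t0 = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink = new byte[size];
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int[] sizes = {1, 1_000, 1_000_000};
        // Warm-up pass, so the measured pass is more likely to run compiled code.
        for (int size : sizes) {
            timeAllocations(1000, size);
        }
        for (int size : sizes) {
            long ns = timeAllocations(1000, size);
            System.out.printf("byte[%d] x 1000: %.2f ms%n", size, ns / 1e6);
        }
    }
}
```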

2) The JDK has no fast (native) method for filling arrays, and Arrays.fill is too slow, so let's at least see how long 1000 copies of a 1,000,000-element array take with the native System.arraycopy:

    byte[] a1 = new byte[1000000];
    byte[] a2 = new byte[1000000];
    long t0 = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
        System.arraycopy(a1, 0, a2, 0, 1000000);
    }
    System.out.println(System.currentTimeMillis() - t0);

It takes 700 ms.

This gives me reason to believe that a) creating long arrays is expensive, and b) the expense seems to come from useless initialization.
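The initialization in question is mandated by the language: JLS §4.12.5 requires every array component to start at its default value, which is 0 for numeric types. To check the "Arrays.fill is too slow" premise on a given JVM, the sketch below times Arrays.fill against System.arraycopy over the same 1,000,000-byte arrays, 1000 times each, and confirms that a fresh array reads as all zeros. Class and method names are illustrative; results depend heavily on the JVM version and JIT, so treat them as indicative only.

```java
import java.util.Arrays;

public class FillVsCopy {
    static final int SIZE = 1_000_000;
    static final int REPS = 1000;

    /** Returns true if every element of a freshly created array is 0, as the JLS requires. */
    static boolean newArrayIsZeroed(int size) {
        byte[] a = new byte[size];
        for (byte b : a) {
            if (b != 0) return false;
        }
        return true;
    }

    /** Fills dst {@code reps} times with Arrays.fill; returns elapsed nanoseconds. */
    static long timeFill(byte[] dst, int reps) {
        long t0 = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            Arrays.fill(dst, (byte) i); // value varies per pass
        }
        return System.nanoTime() - t0;
    }

    /** Copies src into dst {@code reps} times with System.arraycopy; returns elapsed nanoseconds. */
    static long timeCopy(byte[] src, byte[] dst, int reps) {
        long t0 = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            System.arraycopy(src, 0, dst, 0, src.length);
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        byte[] src = new byte[SIZE];
        byte[] dst = new byte[SIZE];
        System.out.println("new array zeroed: " + newArrayIsZeroed(SIZE));
        // Warm up both paths before measuring.
        timeFill(dst, REPS);
        timeCopy(src, dst, REPS);
        System.out.printf("Arrays.fill:      %.1f ms%n", timeFill(dst, REPS) / 1e6);
        System.out.printf("System.arraycopy: %.1f ms%n", timeCopy(src, dst, REPS) / 1e6);
    }
}
```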

3) Let's take sun.misc.Unsafe (http://www.javasourcecode.org/html/open-source/jdk/jdk-6u23/sun/misc/Unsafe.html). It is protected against external use, but not very well:

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    Field f = Unsafe.class.getDeclaredField("theUnsafe"); // private static singleton field
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe) f.get(null);

Here is the memory allocation cost test:

    for (int i = 0; i < 1000; i++) {
        long m = unsafe.allocateMemory(1000000); // never freed: about 1 GB in total
    }

It takes < 1 ms; recall that new byte[1000000] took 500 ms.
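The Unsafe.allocateMemory documentation describes the returned block as uninitialized, which is consistent with the loop above being so cheap; the loop also never releases those blocks. If uninitialized memory is what you want, one option is to read and write it through the raw address, with explicit bounds checks and an explicit freeMemory. The following is a minimal sketch, assuming a HotSpot-based JDK where sun.misc.Unsafe is reachable through theUnsafe as shown above (recent JDKs deprecate these memory-access methods for removal and may print warnings). The class OffHeapBytes is hypothetical, not a JDK API.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical wrapper: an off-heap byte buffer whose contents start uninitialized. */
public class OffHeapBytes implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long address;
    private final long size;
    private boolean freed;

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not available", e);
        }
    }

    public OffHeapBytes(long size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.size = size;
        this.address = UNSAFE.allocateMemory(size); // contents are NOT zeroed
    }

    // Rejects use after free and out-of-range indices before touching raw memory.
    private void check(long index) {
        if (freed) throw new IllegalStateException("already freed");
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
    }

    public byte get(long index) {
        check(index);
        return UNSAFE.getByte(address + index);
    }

    public void put(long index, byte value) {
        check(index);
        UNSAFE.putByte(address + index, value);
    }

    public long size() {
        return size;
    }

    @Override
    public void close() {
        if (!freed) {
            UNSAFE.freeMemory(address);
            freed = true;
        }
    }

    public static void main(String[] args) {
        try (OffHeapBytes buf = new OffHeapBytes(1_000_000)) {
            buf.put(0, (byte) 42);
            System.out.println(buf.get(0)); // prints 42
        }
    }
}
```

Using try-with-resources keeps the free paired with the allocation; only bytes you have written hold defined values.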

4) Unsafe has no direct methods for working with arrays. It needs to know field offsets, but reflection shows no fields in an array. There is not much information about array internals; I guess they are JVM/platform specific. Nevertheless, an array is, like any other Java object, a header plus fields. On my PC/JVM it looks like this:

header - 8 bytes
int length - 4 bytes
long bufferAddress - 8 bytes

Now, using Unsafe, I will create a byte[10], allocate a 10-byte memory buffer, and use it as my array's elements:

    byte[] a = new byte[10];
    System.out.println(Arrays.toString(a));
    long mem = unsafe.allocateMemory(10);
    unsafe.putLong(a, 12, mem);
    System.out.println(Arrays.toString(a));

It prints:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[8, 15, -114, 24, 0, 0, 0, 0, 0, 0]

You can see that the array's data are not initialized.
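sun.misc.Unsafe does in fact provide arrayBaseOffset and arrayIndexScale, which report where an array type's elements start and how far apart they are. On HotSpot, array elements are stored inline after the header and length field rather than behind a separate buffer pointer, so it is worth checking what offset 12 refers to on a given JVM. The sketch below (class and method names are illustrative; it assumes sun.misc.Unsafe is reachable via theUnsafe as in step 3) writes a long at byte[]'s base offset. Called with 0x188E0F08 on a little-endian JVM, it produces the same second line as the output above, which is consistent with the earlier putLong having stored the eight bytes of the address mem in a[0] to a[7] (that reading assumes the base offset on the JVM used above was 12).

```java
import java.lang.reflect.Field;
import java.nio.ByteOrder;
import java.util.Arrays;
import sun.misc.Unsafe;

public class ArrayLayout {
    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not available", e);
        }
    }

    /** Creates a byte[10] and writes value with putLong at the array's base offset. */
    static byte[] putLongAtBase(long value) {
        byte[] a = new byte[10];
        long base = UNSAFE.arrayBaseOffset(byte[].class);
        // The 8 bytes written land in a[0]..a[7] (byte[] index scale is 1),
        // so the write stays inside this 10-element array.
        UNSAFE.putLong(a, base, value);
        return a;
    }

    public static void main(String[] args) {
        System.out.println("byte[] base offset: " + UNSAFE.arrayBaseOffset(byte[].class));
        System.out.println("byte[] index scale: " + UNSAFE.arrayIndexScale(byte[].class));
        System.out.println("native byte order:  " + ByteOrder.nativeOrder());
        // On a little-endian machine this prints [8, 15, -114, 24, 0, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(putLongAtBase(0x188E0F08L)));
    }
}
```

On a typical 64-bit HotSpot with compressed class pointers the byte[] base offset is 16 rather than 12, which is one more reason to query it instead of hard-coding it.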

Now I'll change the array's length (though it still points to 10 bytes of memory):

    unsafe.putInt(a, 8, 1000000);
    System.out.println(a.length);

It shows 1000000. This was just to prove that the idea works.

Now the performance test. I will create an empty byte array a1, allocate a buffer of 1000000 bytes, assign this buffer to a1, and set a1.length = 1000000:

    long t0 = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
        byte[] a1 = new byte[0];
        long mem1 = unsafe.allocateMemory(1000000);
        // 8-byte write at offset 12 into a zero-length array:
        // may corrupt the array header or the next object on the heap
        unsafe.putLong(a1, 12, mem1);
        unsafe.putInt(a1, 8, 1000000);
    }
    System.out.println(System.currentTimeMillis() - t0);

It takes 10 ms.

5) C and C++ have malloc and calloc: malloc just allocates a memory block, while calloc also initializes it with zeroes.

C++:

...
JNIEXPORT void JNICALL Java_Test_malloc(JNIEnv *env, jclass cls, jint n) {
    malloc(n); // result discarded: the block is never freed
}

Java:

private static native void malloc(int n);

for (int i = 0; i < 500; i++) {
    malloc(1000000);
}

Results: malloc - 78 ms; calloc - 468 ms.

Conclusion

  1. It seems that Java array creation is slow because of useless element zeroing.
  2. We cannot change this, but Oracle can. Nothing in the JLS would need to change; it would be enough to add native methods to java.lang.reflect.Array like

public static native xxx[] newUninitializedXxxArray(int size);

for all primitive numeric types (byte to double) and for char. These methods could be used throughout the JDK, for example in java.util.Arrays:

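    public static int[] copyOf(int[] original, int newLength) {
        int[] copy = Array.newUninitializedIntArray(newLength); // proposed method, not in the JDK
        int n = Math.min(original.length, newLength);
        System.arraycopy(original, 0, copy, 0, n);
        fill(copy, n, newLength, 0); // copyOf must still zero-pad the tail when growing
        ...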

or in java.lang.String:

    public String concat(String str) {
        ...
        char[] buf = Array.newUninitializedCharArray(count + otherLen); // proposed method
        getChars(0, count, buf, 0);
        ...