Why are we allowed to create sparse arrays in JavaScript?

Updated: 2023-02-17 16:23:18


I was wondering what the use-cases for code like var foo = new Array(20), var foo = [1,2,3]; foo.length = 10 or var foo = [,,,] were (also, why would you want to use the delete operator instead of just removing the item from the array). As you may know already, all these will result in sparse arrays.

But why are we allowed to do the above things? Why would anyone want to create an array whose length is 20 by default (as in the first example)? Why would anyone want to modify and corrupt the length property of an array (as in the second example)? Why would anyone want to do something like [, , ,]? Why would you use delete instead of just removing the element from the array? Could anyone provide some use-cases for these statements?
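
For the record, every one of those statements really does leave holes behind rather than undefined values; here is a quick demonstration (standard JavaScript, runnable in any engine), with the in operator making the holes visible:

    var a = new Array(20);          // length is 20, but no index property exists yet
    console.log(a.length, 0 in a);  // 20 false -- index 0 is a hole, not undefined

    var b = [1, 2, 3];
    b.length = 10;                  // growing length adds holes at indexes 3..9
    console.log(2 in b, 5 in b);    // true false

    var c = [,,,];                  // elisions; the trailing comma is ignored
    console.log(c.length, 0 in c);  // 3 false

    var d = [1, 2, 3];
    delete d[1];                    // removes the index property; length stays 3
    console.log(d.length, 1 in d);  // 3 false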



I have been searching for some answers for ~3 hours. Nothing. The only thing most sources (the 2ality blog, JavaScript: The Definitive Guide 6th Edition, and a whole bunch of other articles that pop up in the Google search results when you search for anything like "JavaScript sparse arrays") say is that sparse arrays are weird behavior and that you should stay away from them. No source I read explained, or at least tried to explain, why we are allowed to create sparse arrays in the first place, except for You Don't Know JS: Types & Grammar. Here is what that book says about why JavaScript allows the creation of sparse arrays:

An array that has no explicit values in its slots, but has a length property that implies the slots exist, is a weird exotic type of data structure in JS with some very strange and confusing behavior. The capability to create such a value comes purely from old, deprecated, historical functionalities ("array-like objects" like the arguments object).

So, the book implies that the arguments object somehow, somewhere, uses one of the examples I listed above to create a sparse array. So, where and how does arguments use sparse arrays?
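
I have not found the book spelling that connection out either, but the "array-like object" shape it refers to is easy to show: arguments has index keys plus a length but is not an Array, and no index property exists for a parameter that was never passed (my own illustration, not an example from the book):

    function f(x, y, z) {
      console.log(arguments.length);          // 2 -- only the passed arguments exist
      console.log(Array.isArray(arguments));  // false -- array-like, not an Array
      console.log(2 in arguments);            // false -- no index property for z
    }
    f(1, 2);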



Something else that is confusing me is this part in the book "JavaScript: The Definitive Guide 6th Edition":

Arrays that are sufficiently sparse are typically implemented in a slower, more memory-efficient way than dense arrays are.

"more memory-efficient" appears like a contradiction to "slower" to me, so what is the difference between the two, in the context of sparse arrays especially ? Here is a link to that specific part of the book.

I was wondering what the use-cases for code like var foo = new Array(20), var foo = [1,2,3]; foo.length = 10 or var foo = [,,,] were

in theory, for the same reasons people usually use sparse data structures ( not necessarily in order of importance ): memory usage ( var x = []; x[0]=123; x[100000]=456; won't consume 100000 'slots' ), performance ( say, take the average of the aforementioned x, via for-in or reduce() ) and convenience ( no 'hard' out-of-bounds errors, no need to grow/shrink explicitly );
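
a minimal demo of all three points using the x above; reduce() skips holes, and for-in enumerates only the index keys that actually exist:

    var x = [];
    x[0] = 123;
    x[100000] = 456;            // no out-of-bounds error, no explicit resize

    // reduce() visits only the 2 defined elements, not 100001 slots:
    var sum = x.reduce(function (acc, v) { return acc + v; }, 0);
    console.log(sum, x.length); // 579 100001

    for (var k in x) console.log(k); // "0", "100000" -- only the existing keys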

that said, semantically, a js array is just a special associative collection with index keys and a special property 'length' satisfying the invariant of being greater than all of its index properties. While this is a pretty elegant definition, it has the drawback of rendering sparsely defined arrays somewhat confusing and error-prone, as you noticed.
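
the invariant is directly observable (a small sketch of standard behavior: 'length' always ends up at max index + 1, and shrinking it deletes index properties):

    var a = [];
    a[5] = 'x';            // defining index 5 forces length up to 6
    console.log(a.length); // 6 -- 'length' stays greater than every index key

    a.length = 2;          // shrinking 'length' deletes index properties >= 2
    console.log(5 in a);   // false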

But why are we allowed to do the above things?

even if we were not allowed to define sparse arrays, we could still put undefined elements into arrays, resulting in basically the same usability problems you see with sparse arrays. So, say, making [0,undefined,...,undefined,1,undefined] the same as [0,...,1,] would buy you nothing but more memory-consuming arrays and slower iteration.
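
to make that concrete: a hole and an explicit undefined read back identically, and only the key's existence distinguishes them, so forbidding holes would not remove the confusion:

    var holey = [0, , 2];            // index 1 is a hole
    var dense = [0, undefined, 2];   // index 1 exists and holds undefined

    console.log(holey[1], dense[1]);     // undefined undefined -- reads look identical
    console.log(1 in holey, 1 in dense); // false true -- only the key differs

    // iteration methods skip holes but do visit explicit undefined:
    holey.forEach(function (v, i) { console.log('holey', i); }); // 0, 2
    dense.forEach(function (v, i) { console.log('dense', i); }); // 0, 1, 2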

Arrays that are sufficiently sparse are typically implemented in a slower, more memory-efficient way than dense arrays are. "more memory-efficient" and "slower" appear like a contradiction to me

"dense arrays" used for general purpose data are typically implemented as a contiguous block of memory filled with elements of the same size; if you add more elements, you continue filling the memory block allocating a new block if exhausted. Given that reallocation implies moving all elements to the new memory block, said memory is typically allocated in abundance to minimize chances of reallocation ( something like the golden ratio times the last capacity ). Hence, such data structures are typically the fastest for ordered/local traversal ( being more CPU/cache friendly ), the slowest for unpredicatble insertions/deletions ( for sufficiently big N ) and have high memory overhead ~ sizeof(elem) * N + extra space for future elems.

Conversely, "sparse arrays/matrices/..." are implemented by 'linking' together smaller memory blocks spread around in memory, or by using some 'logically compressed' form of a dense data structure, or both; in either case, memory consumption is reduced for obvious reasons, but traversing them requires comparatively more work and less local memory-access patterns.
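
a sketch of the 'logically compressed' flavor, storing only the defined entries in a Map keyed by index ( again my own illustration; SparseArray is not a real API, nor how any particular engine does it ):

    function SparseArray() {
      this.entries = new Map(); // index -> value, only for slots that exist
      this.length = 0;
    }

    SparseArray.prototype.set = function (index, value) {
      this.entries.set(index, value);
      if (index >= this.length) this.length = index + 1; // keep the length invariant
    };

    SparseArray.prototype.get = function (index) {
      return this.entries.get(index); // undefined for holes, like a real array read
    };

    var s = new SparseArray();
    s.set(0, 123);
    s.set(100000, 456);
    console.log(s.length, s.entries.size); // 100001 2 -- only two slots stored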

So, compared over the same effectively traversed elements, sparse arrays consume much less memory but are much slower than dense arrays. However, given that you use sparse arrays with sparse data and algorithms that act trivially on 'zeros', sparse arrays can turn out much faster in some scenarios ( e.g. multiplying very big matrices with few non-zero elements ... ).
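
for instance, a sparse matrix-vector multiply that stores only the non-zero entries as [row, col, value] triples does work proportional to the number of non-zeros instead of rows * cols ( sparseMatVec is my own sketch, assuming a coordinate-list layout ):

    function sparseMatVec(triples, vector, rows) {
      var result = new Float64Array(rows);   // zero-initialized output
      for (var i = 0; i < triples.length; i++) {
        var t = triples[i];
        result[t[0]] += t[2] * vector[t[1]]; // zeros were never stored, so never touched
      }
      return result;
    }

    // 3x3 diagonal example: only 3 of the 9 entries are stored
    var triples = [[0, 0, 1], [1, 1, 2], [2, 2, 3]];
    console.log(sparseMatVec(triples, [10, 10, 10], 3)); // [10, 20, 30]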