[Understanding Filesystems in Depth, Part 5] From SVR3 to SVR4

Last updated: 2022-09-30 12:12:43

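Why was the move from SVR3 to SVR4 necessary? To absorb features from the operating systems AT&T was developing at the time, and to integrate the updated VM (virtual memory) subsystem.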


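  • The main changes from SVR3 to SVR4 include:

1. Reader/writer locks replaced the previous locking scheme;

2. On the I/O path, the page cache replaced the buffer cache in order to improve the throughput and efficiency of file data transfer.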


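  • The specific changes are as follows:


  • File descriptors:

SVR3: a file descriptor is simply an index into the u_ofile[] array.

SVR4: file descriptors are allocated dynamically and their number is tunable. The u_ofile[] array was removed and replaced by two fields: u_nofiles, the number of file descriptors the process can currently access, and u_flist, a structure of type ufchunk that contains an array of NFPCHUNK (which is 24) pointers to file table entries. The maximum number of file descriptors per process is bounded by the rlimit structure.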

There are a number of per-process limits within the u_rlimit[] array. The u_rlimit[RLIMIT_NOFILE] entry defines both a soft and a hard file descriptor limit. Allocation of file descriptors will fail once the soft limit is reached. The setrlimit() system call can be invoked to increase the soft limit up to that of the hard limit, but not beyond. The hard limit can be raised, but only by root.


[Figure: file descriptor allocation in SVR4]


  • Changes to the Virtual Filesystem Switch Table

The filesystem switch table, held in the vfssw[] array, is built dynamically when the kernel is compiled. Each entry has the following structure:

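struct vfssw {
    char          *vsw_name;      /* filesystem type name */
    int          (*vsw_init)();   /* per-filesystem initialization routine */
    struct vfsops *vsw_vfsops;    /* filesystem-level (VFS) operations */
};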

The vfs structure with SVR4 contained all of the original Sun vfs fields and introduced a few others, including vfs_dev, which allowed a quick and easy scan to see if a filesystem was already mounted, and the vfs_fstype field, which is used to index the vfssw[] array to specify the filesystem type.


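  • Changes to the vnode and VOP layer

The v_shlockc and v_exlockc fields were removed from the vnode structure.

The following fields were added:

v_stream: points to the STREAMS device associated with the vnode;

v_filocks: references the file and record locks held on the file;

v_pages: references the in-core pages that belong to the file. From SVR4 onward, all reads and writes go through the page cache rather than the buffer cache (the buffer cache is now used only for metadata such as inodes and directories).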


The corresponding vnode operations vector went through even more changes.

Functions removed from it include: vop_bmap(), vop_bread(), vop_brelse(), vop_strategy(), vop_rdwr() and vop_select().


Newly introduced functions include:

vop_read() / vop_write(): separate read and write operations, taking the place of the removed vop_rdwr()

vop_setfl(): called in response to an fcntl() system call where the F_SETFL (set file status flags) flag is specified. This allows the filesystem to validate any flags passed.

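vop_fid(): generates a unique file handle

vop_rwlock(): supports a single-writer, multiple-reader model through the LOCK_SHARED and LOCK_EXCL flags

vop_rwunlock(): releases the lock acquired through vop_rwlock()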

vop_seek(): When specifying an offset to lseek(), this function is called to determine whether the filesystem deems the offset to be appropriate.

vop_cmp(): compares two given vnodes

vop_frlock(): implements file and record locking

vop_space(): the fcntl() system call has an option, F_FREESP, which allows the caller to free space within a file; this operation implements it

vop_realvp(): a call to VOP_REALVP() is made by filesystems when performing a link() system call, to ensure that the link goes to the underlying file and not to the specfs file, which has no physical representation on disk

vop_getpage(): reads pages of data from the file in response to a page fault

vop_putpage(): flushes a modified page of file data to disk

vop_map(): implements memory-mapped files

vop_addmap(): adds a mapping

vop_delmap(): deletes a mapping

vop_poll(): implements the poll() system call

vop_pathconf(): implements the pathconf() and fpathconf() system calls. Filesystem-specific information can be returned, such as the maximum number of links to a file and the maximum file size.


In the end, every vnode operation is invoked through a macro:

#define VOP_LOOKUP(vp,cp,vpp,pnp,f,rdir,cr) \
        (*(vp)->v_op->vop_lookup)(vp,cp,vpp,pnp,f,rdir,cr)


As a result, the filesystem-independent layer of the kernel will only access the filesystem through macros. Obtaining a vnode is performed as part of an open() or creat() system call, or by the kernel invoking one of the veneer layer functions when kernel subsystems wish to access files directly.


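As a consequence of the second change listed at the start (the move to the page cache), many of the earlier operations, such as bread() and bwrite(), were removed and replaced by functions that do not go through the buffer cache.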

This article is reposted from the 存储之厨 blog on 51CTO. Original link: http://blog.51cto.com/xiamachao/1903259. To republish it, please contact the original author.