更新时间:2023-02-26 19:06:13
The relevant points you're looking for are the percentiles:
% generate sample data
data = [randn(900,1) ; randn(50,1)*3 + 5; ; randn(50,1)*3 - 5];
subplot(121), hist(data)
subplot(122), boxplot(data)
% find 5th, 95th percentiles (range that contains 90% of the data)
limits = prctile(data, [5 95])
% find data in that range
reducedData = data(limits(1) < data & data < limits(2));
存在其他方法来检测离群值,例如三个标准偏差规则,其中包括许多
Other approachs exist to detect outliers, such as the IQR outlier test and the three standard deviation rule, among many others:
%% three standard deviation rule
z = 3;
bounds = z * std(data)
reducedData = data( abs(data-mean(data)) < bounds );
和
%% IQR outlier test
Q = prctile(data, [25 75]);
IQ = Q(2)-Q(1);
%a = 1.5; % mild outlier
a = 3.0; % extreme outlier
bounds = [Q(1)-a*IQ , Q(2)+a*IQ]
reducedData = data(bounds(1) < data & data < bounds(2));
顺便说一句,如果要获取与曲线下90%面积相对应的z值(|X|<z
),请使用:
BTW if you want to get the z value (|X|<z
) that corresponds to 90% area under the curve, use:
area = 0.9; % two-tailed probability
z = norminv(1-(1-area)/2)