且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

Powershell-使用特殊字符对字符串对象进行排序

更新时间:2023-01-20 17:13:25

此行为是设计使然,但并非总是人们想要/期望的.如果您希望字符串按ASCII顺序与每个字符一起排序,请使用以下命令:

 Add-Type @"
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Globalization;

    public class SimpleStringComparer: IComparer, IComparer<string>
    {

        private static readonly CompareInfo compareInfo = CompareInfo.GetCompareInfo(CultureInfo.InvariantCulture.Name);

        public int Compare(object x, object y)
        {
            return Compare(x as string, y as string);
        }
        public int Compare(string x, string y)
        {
            return compareInfo.Compare(x, y, CompareOptions.OrdinalIgnoreCase);
        }
        public SimpleStringComparer() {}
    }
"@


[string[]]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'

[System.Collections.Generic.List[string]]$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange($myList)
[SimpleStringComparer]$comparer = [SimpleStringComparer]::new()
$list.Sort([SimpleStringComparer]::new())
$list
 

输出:

 s'a
S'a
s'a1
S'a1
s-a
S-a
s-a1
S-a1
sa
Sa
sa1
Sa1
s^a
S^a
 

更多信息

在注释中的每个 @TessellatingHeckler 中,可以通过将字符串强制转换为char数组来按字符代码(常规)顺序对字符串进行排序.但是,它仍然以潜在的意外方式处理连字符和撇号(因为这些字符将被忽略):

 $myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'
$myList | Sort-Object -Property { [char[]] $_ }
 

 s'a
S'a
s-a
S-a
s'a1
S'a1
s-a1
S-a1
s^a
S^a
sa
Sa
sa1
Sa1
 

当前的排序行为是设计使然.看来PowerShell实现了单词排序".在此处进行记录: https://msdn.microsoft.com/zh-CN/library/windows/desktop/dd318144(v=vs.85).aspx#SortingFunctions

除了忽略连字符和撇号(比较其他相同字符串时除外)之外,这种类型还将标点符号视为字母数字字符之前的字符,并与重音符号一起处理标点字母.可以看到一个简单的演示,如下所示:

32..255 | %{[string][char][byte]$_} | sort

要定义其他排序行为,当前您可能需要使用.Net,如下所示:

 Add-Type @"
 

     using System;
    using System.Runtime.InteropServices;
    using System.Collections;
    public class NumericStringComparer: IComparer
    {
        //https://msdn.microsoft.com/en-us/library/windows/desktop/bb759947%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
        [DllImport("shlwapi.dll")]
        public static extern int StrCmpLogicalW(string psz1, string psz2);
        public int Compare(object x, object y)
        {
            return Compare(x as string, y as string);
        }
        public int Compare(string x, string y)
        {
            return StrCmpLogicalW(x, y);
        }
        public NumericStringComparer() {}
    }
 

 "@

[System.Collections.ArrayList]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a', , '100a','1a','001a','2a','20a'
$myList.Sort([NumericStringComparer]::new())
$myList -join ', '
 

上面的代码按照Windows资源管理器的方式对字符串进行排序(即,将前导数字视为数字值):

 s'a, s'a1, S'a, s-a, S-a, S-a1, S'a1, s-a1, S^a, s^a, 1a, 001a, 2a, Sa, Sa1, sa, sa1, 20a, 100a
 

我已经提交了一项功能建议,以在Sort-Object上提供更多的PS友好解决方案.参见 https://github.com/PowerShell/PowerShell/issues/4098 >

I'm running

'S-tst','s-s-rst','srst2','s-zaa','s-a','s-zf' | Sort-Object

Shouldn't I have gotten a return of

s-a
S-tst
s-zaa
s-zf
srst2
s-s-rst

but instead I get the following:

s-a
srst2
s-s-rst
S-tst
s-zaa
s-zf

How is this possible ? Does sort-object only look at letters when sorting out ? Is there any way to sort it out by special characters ?

This behaviour is by design, but not always what people want/expect. If you want strings sorted with each character in ASCII order use this:

Add-Type @"
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Globalization;

    public class SimpleStringComparer: IComparer, IComparer<string>
    {

        private static readonly CompareInfo compareInfo = CompareInfo.GetCompareInfo(CultureInfo.InvariantCulture.Name);

        public int Compare(object x, object y)
        {
            return Compare(x as string, y as string);
        }
        public int Compare(string x, string y)
        {
            return compareInfo.Compare(x, y, CompareOptions.OrdinalIgnoreCase);
        }
        public SimpleStringComparer() {}
    }
"@


[string[]]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'

[System.Collections.Generic.List[string]]$list = [System.Collections.Generic.List[string]]::new()
$list.AddRange($myList)
[SimpleStringComparer]$comparer = [SimpleStringComparer]::new()
$list.Sort([SimpleStringComparer]::new())
$list

Outputs:

s'a
S'a
s'a1
S'a1
s-a
S-a
s-a1
S-a1
sa
Sa
sa1
Sa1
s^a
S^a

More Info

Per @TessellatingHeckler in the comments, you can sort strings in character code (ordinal) order by casting the string to a char array. However, that still handles hyphens and apostrophes in a potentially unexpected way (as these characters are ignored):

$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a'
$myList | Sort-Object -Property { [char[]] $_ }

s'a
S'a
s-a
S-a
s'a1
S'a1
s-a1
S-a1
s^a
S^a
sa
Sa
sa1
Sa1

The current sorting behaviour is by design. It appears that PowerShell implements a "Word Sort". This is documented here: https://msdn.microsoft.com/en-us/library/windows/desktop/dd318144(v=vs.85).aspx#SortingFunctions

In addition to ignoring hyphens and apostrophes (except when comparing otherwise identical strings), this sort also treats punctuation characters as coming before alphanumerics, and handles accented letters alongside their counterparts. A simple demo of this can be seen like so:

32..255 | %{[string][char][byte]$_} | sort

To define other sorting behaviours, currently you'd likely need to dip into .Net, like so:

Add-Type @"

    using System;
    using System.Runtime.InteropServices;
    using System.Collections;
    public class NumericStringComparer: IComparer
    {
        //https://msdn.microsoft.com/en-us/library/windows/desktop/bb759947%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
        [DllImport("shlwapi.dll")]
        public static extern int StrCmpLogicalW(string psz1, string psz2);
        public int Compare(object x, object y)
        {
            return Compare(x as string, y as string);
        }
        public int Compare(string x, string y)
        {
            return StrCmpLogicalW(x, y);
        }
        public NumericStringComparer() {}
    }

"@

[System.Collections.ArrayList]$myList = 's-a','s-a1','s''a','s''a1', 'sa','sa1','s^a','S-a','S-a1','S''a','S''a1', 'Sa','Sa1','S^a', , '100a','1a','001a','2a','20a'
$myList.Sort([NumericStringComparer]::new())
$myList -join ', '

The above sorts strings the way Windows Explorer would (i.e. treating leading digits as numeric values):

s'a, s'a1, S'a, s-a, S-a, S-a1, S'a1, s-a1, S^a, s^a, 1a, 001a, 2a, Sa, Sa1, sa, sa1, 20a, 100a

I've submitted a feature suggestion to provide more PS friendly solutions on Sort-Object. See https://github.com/PowerShell/PowerShell/issues/4098