且构网

How do I extract a specific table from an HTML file using native PowerShell commands?

Updated: 2023-10-21 12:16:46

OK, this isn't thoroughly tested, but it works with your example table in PS 2.0 with IE11:

# Parsing HTML with IE.
$oIE = New-Object -ComObject InternetExplorer.Application
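# Navigate() is asynchronous and expects a full path or URL,
# so resolve the path and wait until the page has finished loading.
$oIE.Navigate((Resolve-Path "file.html").Path)
while ($oIE.Busy -or $oIE.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$oHtmlDoc = $oIE.Document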

# Getting table by ID.
$oTable = $oHtmlDoc.getElementByID("table6")

# Extracting table rows as a collection.
$oTbody = $oTable.childNodes | Where-Object { $_.tagName -eq "tbody" }
$cTrs = $oTbody.childNodes | Where-Object { $_.tagName -eq "tr" }

# Creating a collection of table headers.
$cThs = $cTrs[0].childNodes | Where-Object { $_.tagName -eq "th" }
$cHeaders = @()
foreach ($oTh in $cThs) {
    $cHeaders += `
        ($oTh.childNodes | Where-Object { $_.tagName -eq "b" }).innerHTML
}

# Converting rows to a collection of PS objects exportable to CSV.
$cCsv = @()
foreach ($oTr in $cTrs) {
    # @() keeps a single <td> as a one-element array so indexing below still works.
    $cTds = @($oTr.childNodes | Where-Object { $_.tagName -eq "td" })
    # Skipping the first row (headers), which has no <td> cells.
    if ($cTds.Count -eq 0) { continue }
    $oRow = New-Object PSObject
    for ($i = 0; $i -lt $cHeaders.Count; $i++) {
        $oRow | Add-Member -MemberType NoteProperty -Name $cHeaders[$i] `
            -Value $cTds[$i].innerHTML
    }
    $cCsv += $oRow
}

# Closing IE.
$oIE.Quit()

# Exporting CSV.
$cCsv | Export-Csv -Path "file.csv" -NoTypeInformation

Honestly, I didn't aim for optimal code. It's just an example of how you could work with DOM objects in PS and convert them to PS objects.
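
If you want to try the object-conversion step on its own, without IE or an HTML file, here is a minimal sketch. The header names and cell values are made-up sample data standing in for the innerHTML strings the script reads from the DOM, and ConvertTo-Csv is used in place of Export-Csv only so the result can be inspected without writing a file.

```powershell
# Hypothetical sample data standing in for the innerHTML values read from the DOM.
$cHeaders = @('Name', 'Size', 'Date')
$cRows = @(
    @('alpha.txt', '12', '2014-01-05'),
    @('beta.txt', '340', '2014-02-11')
)

# Build one PS object per row, with one NoteProperty per header.
$cCsv = foreach ($cCells in $cRows) {
    $oRow = New-Object PSObject
    for ($i = 0; $i -lt $cHeaders.Count; $i++) {
        $oRow | Add-Member -MemberType NoteProperty -Name $cHeaders[$i] `
            -Value $cCells[$i]
    }
    $oRow
}

# ConvertTo-Csv produces the same CSV text that Export-Csv would write to a file.
$cLines = $cCsv | ConvertTo-Csv -NoTypeInformation
$cLines
```

Because each row is a plain PS object, you can also pipe $cCsv to Where-Object, Sort-Object or Format-Table before exporting.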