Convert a table with row-based records to column-based records (CSV)

Updated: 2023-02-04 13:34:17

Given a table in a text file that contains row based data, what method would you recommend for converting to a column based table? (e.g. CSV).

Input_data.txt:

Source =         X:\folder_abc
Destination =    Y:\Abc_folder
Total bytes =    208,731,021
MB per min =     256.5
Source =         X:\folder_def
Destination =    Y:\xyz_folder
Total bytes =    123,134,545
MB per min =     326
Source =         X:\folder_foo
Destination =    Y:\Baz_folder
Total bytes =    24,344
MB per min =     532
...etc.

Desired Result (only formatted with tabs here for legibility):

Source,             Destination,        Total bytes,    MB per min
"X:\folder_abc",    "Y:\Abc_folder",    "208,731,021",  "256.5"
"X:\folder_def",    "Y:\xyz_folder",    "123,134,545",  "326"
"X:\folder_foo",    "Y:\Baz_folder",    "24,344",       "532"
...

Tools at my disposal are Windows batch files and PowerShell. I'd prefer a .bat solution because I'm more comfortable there, but if that is too circuitous or opaque we can punt on it.

UPDATE, as per comments

I've figured out how to turn the records into name & value variables, but don't know how to manipulate them from that point to transpose into columns.

for /f "tokens=1,2 delims==" %%a in ('findstr /c:"=" "%logfile%"') do (
  @echo %%a %%b
  )

It just occurred to me that I can write one column per text file and then append them all in Excel. Crude but workable perhaps(?)

for /f "tokens=1,2 delims==" %%a in ('findstr /c:"=" "%logfile%"') do (
  @echo %%b >>  %%a.csv
  )

UPDATE-2: quote all values in the desired result. As dbenham pointed out, not doing so will cause problems, because values such as 208,731,021 contain commas.
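
To make the quoting problem concrete, here is a small JavaScript sketch. The helper name `quoteCsvField` is hypothetical, and doubling embedded quotes is the usual CSV convention rather than something the scripts in this thread do (they assume the values contain no double quotes).

```javascript
// Hypothetical helper: wrap a field in double quotes and double any
// embedded quote characters (the usual CSV escaping convention).
function quoteCsvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

var fields = ['X:\\folder_abc', 'Y:\\Abc_folder', '208,731,021', '256.5'];

// Without quotes, the commas inside "208,731,021" look like field separators.
var unquoted = fields.join(',');
console.log(unquoted.split(',').length); // 6 fields instead of 4

// With quotes, a CSV reader can tell separators from data.
var quoted = fields.map(quoteCsvField).join(',');
console.log(quoted); // "X:\folder_abc","Y:\Abc_folder","208,731,021","256.5"
```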

This is similar to Aacini's original answer, except I never store more than one line in memory. A large input file would consume a lot of memory, which would slow down the script. Storing only one line avoids that problem.

Another major difference is I let the code discover the column name that starts a new row, rather than hard coding the value.

I also use a different method to strip off the trailing space(s) from each column name in the header. I assume that the column names do not contain any of the following characters: :, ., \, or /. I rely on the fact that file names cannot end with a space, so the ~n modifier normalizes the "name" to remove any trailing space(s).

I also use "tokens=*" when stripping leading spaces from the values, just in case a value contains spaces.

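@echo OFF
setlocal enableDelayedExpansion

set "input=test.txt"
set "output=result.csv"

rem row    holds the CSV values of the record currently being read
rem header holds the CSV column names, built during the first record only
rem begin  holds the name that starts each record, taken from the first line
rem first  is defined once the header has been written
set "row="
set "header="
set "begin="
set "first="
(
  rem The outer loop splits each line at the first equals sign
  rem The inner loop strips leading spaces from the value
  for /f "usebackq tokens=1* delims==" %%A in ("%input%") do for /f "tokens=*" %%C in ("%%B") do (
    rem Seeing the begin name again means the previous record is complete
    if "!begin!" equ "%%A" (
      if not defined first (
        set first=1
        echo !header:~1!
      )
      echo !row:~1!
      set "row="
    )
    set "row=!row!,"%%C""
    rem While reading the first record, collect the column names
    rem The ~n modifier drops the trailing spaces from each name
    if not defined first for /f "delims=" %%H in ("%%A") do (
      if not defined begin set "begin=%%A"
      set "header=!header!,"%%~nH""
    )
  )
  rem Write the final record, which has no following record to trigger it
  echo !row:~1!
)>"%output%"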


EDIT 2014-12-05

The same algorithm could be implemented more robustly in VBS or JScript, and it would be faster.
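
As a rough illustration of that idea, here is a minimal sketch in modern JavaScript (Node.js style, not Windows Script Host JScript). The names `rowsToCsv` and `quote` are hypothetical, the input is assumed to arrive as an iterable of lines, and the regular expression is the same `name = value` pattern used with JREPL in this answer. Like the batch script, it holds only the current record in memory and discovers the name that starts each record from the first line.

```javascript
// Same simple quoting as the batch script: no escaping of embedded quotes.
function quote(s) { return '"' + s + '"'; }

// Hypothetical generator: takes an iterable of "name = value" lines and
// yields CSV lines, keeping only the current record in memory.
function* rowsToCsv(lines) {
  var begin = null;        // name that starts each record, discovered from line 1
  var header = [];         // column names, collected during the first record only
  var row = [];            // values of the record currently being read
  var headerDone = false;

  for (var line of lines) {
    var m = /^(.+?) *= *(.*)/.exec(line);
    if (!m) continue;                  // skip lines that are not "name = value"
    var name = m[1], value = m[2];
    if (name === begin) {              // first name seen again: record complete
      if (!headerDone) { yield header.join(','); headerDone = true; }
      yield row.join(',');
      row = [];
    }
    if (begin === null) begin = name;
    if (!headerDone) header.push(quote(name));
    row.push(quote(value));
  }
  // Flush the header (if the input held a single record) and the last record.
  if (!headerDone && header.length) yield header.join(',');
  if (row.length) yield row.join(',');
}

var input = [
  'Source =         X:\\folder_abc',
  'Destination =    Y:\\Abc_folder',
  'Total bytes =    208,731,021',
  'MB per min =     256.5',
  'Source =         X:\\folder_def',
  'Destination =    Y:\\xyz_folder',
  'Total bytes =    123,134,545',
  'MB per min =     326'
];
var csvLines = Array.from(rowsToCsv(input));
console.log(csvLines.join('\n'));
// "Source","Destination","Total bytes","MB per min"
// "X:\folder_abc","Y:\Abc_folder","208,731,021","256.5"
// "X:\folder_def","Y:\xyz_folder","123,134,545","326"
```

Because the function is a generator, a caller can feed it lines from a file reader and write each CSV line as it is produced, so memory use stays flat regardless of input size.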

Or you could get a bit of a jump start and use JREPL.BAT - a hybrid JScript/batch utility that performs regular expression search and replace on text. It allows user defined JScript code snippets to be incorporated into the process, but is executed within a batch context.

The entire command could be put on one lonnnnnggggggg line, but that would be really ugly. Instead I use batch line continuation to define a variable with most of the user defined JScript code and pass that in using /JBEG. It is impossible to pass a double quote literal to CSCRIPT, so I use '\x22' instead.
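
As a quick check of the `'\x22'` trick, this small JavaScript snippet shows that the hex escape produces the same one-character string as a literal double quote, so the code never needs a `"` character on the command line:

```javascript
var q = '\x22';                         // hex escape for U+0022, the double quote
console.log(q === '"');                 // true
console.log(q + 'X:\\folder_abc' + q);  // "X:\folder_abc"
```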

The script expects the source file to be passed as the first and only argument, and the output is written to the same location using the same base name with a .csv extension.

@echo off
setlocal
set beg=^
var begin, header='.', line='', q='\x22';^
function writeLn(){^
  if (header) output.WriteLine(header.substr(2));^
  header='';^
  if (line) output.WriteLine(line.substr(1));^
  line='';^
}^
function repl($1,$2){^
  if ($1==begin) writeLn();^
  if (!begin) begin=$1;^
  if (header) header+=','+q+$1+q;^
  line+=','+q+$2+q;^
  return false;^
}
call jrepl "^(.+?) *= *(.*)" "repl($1,$2);" /jmatch /jbeg "%beg%" /jend "writeLn();" /f %1 /o "%~dpn1.csv"
exit /b
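
To see what the search pattern in the JREPL call captures, the following JavaScript sketch applies the same regular expression to a few sample lines. The helper name `splitLine` is hypothetical; the third sample line is made up to show what happens when a value itself contains an equals sign.

```javascript
// Same pattern as in the JREPL call: a lazy name, optional spaces around
// the first "=", then the rest of the line as the value.
var pattern = /^(.+?) *= *(.*)/;

// Hypothetical helper: returns [name, value], or null if the line has no "=".
function splitLine(line) {
  var m = pattern.exec(line);
  return m ? [m[1], m[2]] : null;
}

console.log(splitLine('Total bytes =    208,731,021')); // [ 'Total bytes', '208,731,021' ]
console.log(splitLine('MB per min =     256.5'));       // [ 'MB per min', '256.5' ]
console.log(splitLine('Key = a=b'));                    // [ 'Key', 'a=b' ]
console.log(splitLine('...etc.'));                      // null
```

The lazy `.+?` stops at the first equals sign and leaves the spaces before it to ` *`, so the name comes back without trailing spaces and `$2` keeps any later equals signs.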

The script below uses the exact same JScript code, but loads it directly from the script file with the /JLIB option instead of passing it in a variable. The file is a standard hybrid JScript/batch script. Because the code no longer travels through the command line, I can use a double quote literal in it.

@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScript comment

::**** Batch code ********
@echo off
call jrepl "^(.+?) *= *(.*)" "repl($1,$2);" /jmatch /jlib "%~f0" /jend "writeLn();" /f %1 /o "%~dpn1.csv"
exit /b

****** Jscript code ******/

var begin, header='.', line='', q='"';

function writeLn(){
  if (header) output.WriteLine(header.substr(2));
  header='';
  if (line) output.WriteLine(line.substr(1));
  line='';
}

function repl($1,$2){
  if ($1==begin) writeLn();
  if (!begin) begin=$1;
  if (header) header+=','+q+$1+q;
  line+=','+q+$2+q;
  return false;
}