且构网


Unable to remove carriage returns and newlines enclosed in double quotes

Updated: 2023-02-09 10:39:58


I want to remove any non-printable newline characters from the column data.

I have enclosed every column in double quotes, so that newline characters inside a column are easy to remove while the record delimiter at the end of each line is left alone.

Say I have a text file with 4 columns, separated by commas and enclosed in quotes. I'm trying to remove \n and \r characters only when they appear between double quotes.

Currently I'm using tr, but it deletes every line break and turns the file into one continuous sequence with no record separator.

tr -d '\n\r' < in.txt > out.txt

Sample data:

"1","test\n
Sample","data","col4"\n
"2\n
","Test","Sample","data" \n
"3","Sam\n
ple","te\n
st","data"\n

Expected output:

"1","testSample","data","col4"\n
"2","Test","Sample","data" \n
"3","Sample","test","data"\n

Any suggestions? Thanks in advance.

Here's a possible solution:

perl -pe 'if (tr/"// % 2) { chomp; $_ .= <>; redo; }'

If the current line has unbalanced quotes (i.e. an odd number of "), it must end in the middle of a field, so we chomp out the newline, append the next input line, and restart the loop.
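For comparison, the same quote-counting idea can be sketched in POSIX awk, for example on a system without Perl. This is a sketch under stated assumptions, not part of the original answer. The function name join_quoted_lines is hypothetical. Quotes inside a field are assumed to be escaped by doubling them (""), which leaves the parity of the count unchanged. A carriage return is also dropped before a line is joined, because the one-liner's chomp removes only the \n and would leave the \r from CRLF line endings in place. Like the Perl version, it assumes every quoted field is eventually closed.

```shell
# join_quoted_lines (hypothetical name): joins physical lines while the
# running count of double quotes is odd, i.e. while we are still inside a
# quoted field. Reads stdin, writes stdout.
join_quoted_lines() {
  awk '
  {
    buf = buf $0                    # append this line; awk has already stripped its \n
    tmp = buf
    if (gsub(/"/, "", tmp) % 2) {   # odd quote count: the field continues
      sub(/\r$/, "", buf)           # drop a CR left over from CRLF line endings
      next                          # read the next line without printing
    }
    print buf                       # balanced quotes: emit the whole record
    buf = ""
  }
  END { if (buf != "") print buf }  # assumption: flush an unterminated record as-is
  '
}
```

Usage, with the file names from the question: join_quoted_lines < in.txt > out.txt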