How to correctly read certain strings from a file in C?

Updated: 2023-11-13 16:13:04

The read you are attempting from the file is non-trivial, but it can be handled fairly simply: set a flag telling you whether you have already seen an 'a' or 'b', skip all whitespace and ':' characters, store all other characters in your buffer (reallocating as needed), and then, when the second 'a' or 'b' is found, put that character back in the FILE* stream with ungetc, nul-terminate and return your buffer.
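The "put that character back" step can be looked at in isolation. The following is only a minimal sketch: peek_char is a hypothetical helper name that is not part of the answer's code, and it relies on the standard guarantee that at least one character of pushback is available after a read.

```c
#include <stdio.h>

/* peek_char is a hypothetical helper (not in the original answer):
 * it reads one character and pushes it straight back, so the next
 * fgetc() on the same stream returns that same character again. */
int peek_char (FILE *fp)
{
    int c = fgetc (fp);         /* read the next character (or EOF) */

    if (c != EOF)               /* never try to push EOF back */
        ungetc (c, fp);         /* return the char to the stream */

    return c;
}
```

This is the same mechanism readword() depends on: the 'a' or 'b' that ends one word is returned to the stream so the next call sees it as the start of the following word.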

Sounds easy enough -- right? Well, that's pretty much it. Let's look at what would be needed in your readword() function.

First, since you are allocating for buffer in readword(), there is no need to pass char **buffer as a parameter. You have already declared readword as char *readword(...) so just pass the FILE* pointer as a parameter and return a pointer to your allocated, filled and nul-terminated buffer.

You can handle the reallocation scheme any way you like. You can either start with some reasonable number of characters allocated and then double (or otherwise multiply) the current size, or just add a fixed amount each time you run out. The example below simply starts with a 32-char buffer and adds another 32 chars each time reallocation is needed. (If the data size were truly unknown, I would probably start with 32 chars and double each time I ran out; it is completely up to you.)
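For comparison with the fixed-increment approach, a doubling scheme might look roughly like the sketch below. grow_buffer is a hypothetical name chosen for illustration, and the starting size of 32 simply mirrors the number mentioned above.

```c
#include <stdint.h>
#include <stdlib.h>

/* grow_buffer is a hypothetical helper (not in the original answer).
 * It doubles *cap (starting from 32 when *cap is 0) and reallocates *buf.
 * Returns 1 on success; returns 0 on failure, in which case *buf and
 * *cap are left untouched and the old block is still valid. */
int grow_buffer (char **buf, size_t *cap)
{
    size_t newcap;
    void *tmp;

    if (*cap > SIZE_MAX / 2)            /* doubling would overflow */
        return 0;

    newcap = *cap ? *cap * 2 : 32;      /* double, or start at 32 */
    tmp = realloc (*buf, newcap);       /* always realloc to a temporary */
    if (!tmp)
        return 0;

    *buf = tmp;                         /* commit only after success */
    *cap = newcap;

    return 1;
}
```

Doubling keeps the number of realloc calls logarithmic in the final size, at the cost of possibly over-allocating; adding a fixed amount wastes less memory but reallocates more often on long input.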

Using the isspace() function found in ctype.h ensures all whitespace is handled correctly.
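As a small illustration of that filtering rule on an in-memory string, here is a sketch. strip_separators is a hypothetical helper that is not in the answer's code. One detail differs from the fgetc() case: a plain char should be converted to unsigned char before being passed to isspace(), whereas the int returned by fgetc() can be passed directly.

```c
#include <ctype.h>

/* strip_separators is a hypothetical helper (not in the original answer).
 * It copies src to dst, dropping every whitespace character and ':'.
 * dst must have room for at least strlen(src) + 1 bytes. */
void strip_separators (const char *src, char *dst)
{
    for (; *src; src++) {
        unsigned char uc = (unsigned char)*src;  /* safe for isspace() */

        if (isspace (uc) || uc == ':')  /* space, \t, \n, \v, \f, \r or ':' */
            continue;

        *dst++ = *src;                  /* keep everything else */
    }
    *dst = 0;                           /* nul-terminate the result */
}
```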

The last few issues are simply ensuring you return a nul-terminated string in buffer and making sure you re-initialize your pointer to the end of your buffer in each new block of memory when realloc is called.
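One way to sidestep the stale-pointer problem altogether is to track an index rather than a pointer, since an index stays meaningful after realloc moves the block. The sketch below is an alternative to the pointer-based code in this answer, not a copy of it; append_char and CHUNK are hypothetical names.

```c
#include <stdlib.h>

#define CHUNK 32    /* hypothetical growth increment, mirrors the 32 above */

/* append_char is a hypothetical helper (not in the original answer).
 * It appends c at index *n, growing *buf by CHUNK bytes when needed,
 * and keeps the buffer nul-terminated after every call.
 * Returns 1 on success, 0 if realloc fails (old block left intact). */
int append_char (char **buf, size_t *n, size_t *cap, char c)
{
    if (*n + 2 > *cap) {                /* need room for c and the nul */
        void *tmp = realloc (*buf, *cap + CHUNK);
        if (!tmp)
            return 0;
        *buf = tmp;                     /* block may have moved */
        *cap += CHUNK;
    }
    (*buf)[(*n)++] = c;                 /* index is valid in the new block */
    (*buf)[*n] = 0;                     /* always nul-terminated */

    return 1;
}
```

A caller would start with a NULL buffer and zero counters, call append_char once per kept character, and free the buffer when done.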

Putting it all together, you could do something similar to the following. A simple example program is added after the readword() function to read your example file and output the combined strings read from the file:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define NCHR  32

char *readword (FILE *fp)
{
    int c,                      /* current character */
        firstline = 0;          /* flag for 'a' or 'b' found at 1st char */
    size_t n = 0, nchr = NCHR;  /* chars read, number of chars allocated */
    char *buffer = NULL, *p;    /* buffer to fill, pointer to buffer */

    buffer = malloc (nchr);             /* allocate initial NCHR */
    if (!buffer) {                      /* validate */
        perror ("malloc-buffer");
        return NULL;
    }
    p = buffer;                         /* set pointer to buffer */

    while ((c = fgetc (fp)) != EOF) {   /* read each char */
        if (isspace (c) || c == ':')    /* skip all whitespace and ':' */
            continue;
        if (c == 'a' || c == 'b') {     /* begins with 'a' or 'b' */
            if (firstline) {            /* already had a/b line */
                ungetc (c, fp);         /* put the char back */
                *p = 0;                 /* nul-terminate */
                return buffer;          /* return filled buffer */
            }
            firstline = 1;              /* set firstline flag */
            continue;
        }
        else {
            if (n == nchr - 2) {        /* check if realloc needed */
                void *tmp = realloc (buffer, nchr + NCHR);
                if (!tmp) {             /* validate */
                    perror ("realloc-buffer");
                    free (buffer);      /* old block is still valid, free it */
                    return NULL;
                }
                buffer = tmp;           /* assign new block to buffer */
                p = buffer + n;         /* set p at buffer end */
                nchr += NCHR;           /* update no. chars allocated */
            }
            *p++ = c;       /* assign the current char and advance p */
            n++;            /* increment your character count */
        }
    }
    *p = 0;         /* nul-terminate */

    return buffer;
}

int main (int argc, char **argv) {

    char buf[NCHR], *word;
    int nwords, toggle = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    if (!fgets (buf, NCHR, fp)) {
        fputs ("error: read of line 1 failed.\n", stderr);
        return 1;
    }
    if (sscanf (buf, "%d", &nwords) != 1) {
        fputs ("error: invalid file format.\n", stderr);
        return 1;
    }
    nwords *= 2;   /* actual number of words is twice the number of pairs */

    while (nwords-- && (word = readword (fp))) {
        printf ("%c: %s\n", toggle ? 'b' : 'a', word);
        free (word);
        if (toggle) {
            putchar ('\n');
            toggle = 0;
        }
        else
            toggle = 1;
    }

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    return 0;
}

(note: toggle above is simply a 1 or 0 flag used to output either "a:" or "b:" at the beginning of the appropriate line and to add a '\n' between the pairs of lines read.)

Example Use/Output

$ ./bin/read_multiline_pairs dat/pairsbinline.txt
a: 010101000001010111110100101010000000111100000000000011110000
b: 0000011111000001000110101010100111110001

a: 0000001111111111110000111111111111000
b: 00000001111001010101

Memory Use/Error Check

Always verify your memory use when you dynamically allocate storage and ensure you have freed all the memory you allocate.

$ valgrind ./bin/read_multiline_pairs dat/pairsbinline.txt
==14257== Memcheck, a memory error detector
==14257== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==14257== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==14257== Command: ./bin/read_multiline_pairs dat/pairsbinline.txt
==14257==
a: 010101000001010111110100101010000000111100000000000011110000
b: 0000011111000001000110101010100111110001

a: 0000001111111111110000111111111111000
b: 00000001111001010101

==14257==
==14257== HEAP SUMMARY:
==14257==     in use at exit: 0 bytes in 0 blocks
==14257==   total heap usage: 8 allocs, 8 frees, 872 bytes allocated
==14257==
==14257== All heap blocks were freed -- no leaks are possible
==14257==
==14257== For counts of detected and suppressed errors, rerun with: -v
==14257== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Look things over and let me know if you have questions. The largest part of the problem was handling the read and concatenation of all the lines for each pair. The rest of the coding is left to you.