且构网

Streaming CSV data in Kafka-Python

Updated: 2023-02-16 12:04:27

Kafka Connect (part of Apache Kafka) is a good way to move data between Kafka and other systems, including flat files, in both directions (ingest and egress).

You can use the Kafka Connect SpoolDir connector to stream CSV files into Kafka. Install it from Confluent Hub, and then provide it with configuration for your source file:

curl -i -X PUT -H "Accept:application/json" \
    -H "Content-Type:application/json" http://localhost:8083/connectors/source-csv-spooldir-00/config \
    -d '{
        "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
        "topic": "orders_spooldir_00",
        "input.path": "/data/unprocessed",
        "finished.path": "/data/processed",
        "error.path": "/data/error",
        "input.file.pattern": ".*\\.csv",
        "schema.generation.enabled":"true",
        "csv.first.row.as.header":"true"
        }'
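
Since the question is about Python, the same request can also be sent from a script. The following is only a minimal sketch using the standard library's urllib: the function names (build_spooldir_config, build_connector_request, send) are hypothetical helpers made up for illustration, and it assumes the Kafka Connect worker's REST API is reachable at http://localhost:8083, as in the curl command above.

```python
import json
import urllib.request


def build_spooldir_config(topic, input_path, finished_path, error_path,
                          file_pattern=r".*\.csv"):
    """Return the same connector settings used in the curl example.

    Hypothetical helper: the keys are copied from the curl payload above.
    """
    return {
        "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
        "topic": topic,
        "input.path": input_path,
        "finished.path": finished_path,
        "error.path": error_path,
        "input.file.pattern": file_pattern,
        "schema.generation.enabled": "true",
        "csv.first.row.as.header": "true",
    }


def build_connector_request(connect_url, connector_name, config):
    """Build (but do not send) a PUT request to /connectors/<name>/config."""
    url = f"{connect_url.rstrip('/')}/connectors/{connector_name}/config"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request; needs a running Kafka Connect worker (assumption)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


# Build the request with the values from the curl example.
config = build_spooldir_config(
    topic="orders_spooldir_00",
    input_path="/data/unprocessed",
    finished_path="/data/processed",
    error_path="/data/error",
)
request = build_connector_request(
    "http://localhost:8083", "source-csv-spooldir-00", config
)
```

Calling send(request) would submit the configuration. It is kept as a separate step so the payload can be inspected first without a Connect worker running.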

See this blog for more examples and details.