터칭 데이터

Kafka and Spark Streaming

Kafka Basic Concepts Summary

터칭 데이터 2024. 1. 26. 10:40

Apache Kafka has many moving parts. Here's a brief overview of its main components and how they relate to each other:

  1. Producer: Producers are the source of data in Kafka. They send records to topics.
  2. Consumer: Consumers read from topics and process the records.
  3. Topic: A Topic is a category or feed name to which records are published. Topics are split into one or more partitions.
  4. Partition: Partitions allow a topic to be parallelized by splitting its data across multiple brokers.
  5. Segment: Each partition is further divided into segments. A segment is a collection of messages, stored in one or more log files.
  6. Message/Record: The actual data that is stored in Kafka. Each message consists of a key, a value, and a timestamp.
  7. Broker: A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk. It also services consumers, responding to fetch requests for partitions and responding with the messages that have been committed to disk.
  8. Kafka Cluster: A Kafka cluster comprises multiple brokers. Producers send data to the cluster, where it is stored and served by one or more brokers.
  9. Replication: Replication is a feature that ensures that published messages are copied to multiple brokers. This provides fault tolerance in case a broker fails.
  10. Controller: In a Kafka cluster, one of the brokers serves as the controller, which is responsible for maintaining the list of partition replicas and monitoring for failed replicas. The controller is also responsible for performing administrative tasks like reassigning partitions.
  11. Kafka Connect: Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other data systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka.
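
The key-to-partition mapping from items 3 and 4 can be sketched in a few lines. Note that Kafka's real default partitioner hashes keys with murmur2; the md5-based `partition_for` below is a simplified stand-in, used here only because it is deterministic and available in the standard library.

```python
# Sketch of keyed partition assignment. Kafka's default partitioner
# uses murmur2 hashing; md5 here is a hypothetical stand-in for
# illustration only.
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministically map a record key to a partition index."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Records sharing a key always land in the same partition, which is
# what preserves per-key ordering across parallel consumers.
assert partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
```

Because the mapping depends on `num_partitions`, adding partitions to an existing topic changes where new keyed records land, which is one reason partition counts are usually planned up front.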

These components all work together to ensure that Kafka is fast, scalable, and fault-tolerant. Producers send data to brokers, which store the data in topics. The topics are divided into partitions and segments for parallelism and efficiency. Consumers read the data from the topics. Replication ensures that the data is safe in case of a broker failure, and the controller broker oversees the whole process.
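
The flow described above — a producer appending records, the broker assigning offsets and committing them to an append-only log, a consumer fetching from an offset — can be modeled with a small in-memory sketch. This is an illustration of the concepts only, not a Kafka client; `Partition`, `Topic`, and their methods are invented for this example.

```python
# In-memory sketch of the topic/partition/offset model described above.
# Not a Kafka client: these classes exist only to illustrate the flow.

class Partition:
    def __init__(self):
        self.log = []  # append-only, ordered record log

    def append(self, record):
        """The broker assigns the next offset and commits the record."""
        offset = len(self.log)
        self.log.append(record)
        return offset

    def fetch(self, offset):
        """Serve a consumer fetch request starting at `offset`."""
        return self.log[offset:]


class Topic:
    def __init__(self, name, num_partitions=2):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def produce(self, key, value):
        # Same key -> same partition, so per-key order is preserved.
        idx = hash(key) % len(self.partitions)
        return idx, self.partitions[idx].append((key, value))


topic = Topic("clicks")
p1, off1 = topic.produce("user-1", "page-a")
p2, off2 = topic.produce("user-1", "page-b")
# Same key lands in the same partition, with offsets 0 then 1.
records = topic.partitions[p1].fetch(0)
```

A real broker adds durability (segments flushed to disk), replication across brokers, and consumer group coordination on top of this basic log-with-offsets structure.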
