Pulsar is a next-generation messaging platform. Separation of compute from storage, multi-tenancy, and tiered storage.
Pulsar vs Kafka¶
The key difference: stateless brokers + Apache BookKeeper for storage.
Advantages¶
- Multi-tenancy — native isolation
- Tiered storage — offload to S3
- Geo-replication — built-in
- Pulsar Functions — serverless processing
import pulsar, json
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('persistent://t/ns/orders')
producer.send(json.dumps(order).encode())
consumer = client.subscribe('persistent://t/ns/orders',
subscription_name='proc', consumer_type=pulsar.ConsumerType.Shared)
while True:
msg = consumer.receive()
process(json.loads(msg.data()))
consumer.acknowledge(msg)
Architecture and Practical Deployment¶
Pulsar’s key architectural advantage lies in the separation of brokers from the storage layer (Apache BookKeeper). Brokers are stateless and can be horizontally scaled independently of data. This simplifies operations like rolling upgrades or adding capacity without moving data.
Pulsar Functions allow message processing directly inside the platform without the need for external stream processing like Flink or Spark. For simple transformations, routing, or enrichment, you can deploy a Python or Java function directly into Pulsar. In practice, Pulsar excels particularly in multi-tenant environments where different teams need isolated namespaces with their own retention policies and rate limits. Tiered storage automatically moves older data to cheap object storage (S3, GCS), reducing hot storage costs.
Summary¶
Pulsar is an alternative to Kafka for multi-tenancy and geo-replication. Separating compute from storage means better scalability.