
Streaming Data Architecture

Design real-time data pipelines that process events as they occur. Covers stream processing frameworks, exactly-once semantics, windowing, stateful processing, and the patterns that make streaming architecture production-ready.

Batch processing answers questions about the past. Streaming answers questions about the present. When your business needs to detect fraud as it happens, update dashboards in real-time, or trigger alerts within seconds, batch processing is too slow. Streaming architecture processes events as they arrive, enabling sub-second response times.


Streaming vs Batch

Batch Processing:
  Collect all orders from today → Process overnight → Dashboard updated at 6 AM
  Latency: Hours
  Use case: Reporting, analytics, ML training

Stream Processing:
  Each order processed immediately → Dashboard updated in real-time
  Latency: Milliseconds to seconds
  Use case: Fraud detection, real-time analytics, alerting

Lambda Architecture (both):
  Batch layer: Historical accuracy (reprocessable)
  Speed layer: Real-time approximation
  Serving layer: Merge batch + speed

Kappa Architecture (streaming only):
  Everything is a stream
  Historical data = replay the stream
  One pipeline to maintain, not two
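The Kappa idea — current state is just a fold over the event log, and "historical data" means replaying that log from the beginning — can be sketched without any framework. This is a minimal illustration with a hypothetical OrderEvent type, not a production pipeline:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KappaReplay {
    record OrderEvent(String customerId, double amount) {}

    // Rebuilding state = replaying the stream from offset 0 and
    // folding every event into the aggregate.
    static Map<String, Double> replay(List<OrderEvent> log) {
        Map<String, Double> totals = new HashMap<>();
        for (OrderEvent e : log) {
            totals.merge(e.customerId(), e.amount(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<OrderEvent> log = List.of(
            new OrderEvent("alice", 20.0),
            new OrderEvent("bob", 5.0),
            new OrderEvent("alice", 10.0));
        // Replaying the full log reconstructs alice's lifetime total
        System.out.println(replay(log).get("alice")); // 30.0
    }
}
```

In a real Kappa deployment the "log" is a durable topic with long retention (or tiered storage), and replay means resetting consumer offsets — but the principle is the same fold.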

Stream Processing Frameworks

Framework | Best For | Latency | Complexity
Apache Kafka Streams | JVM-native, stateful processing | Milliseconds | Medium
Apache Flink | Complex event processing, large scale | Milliseconds | High
Apache Spark Structured Streaming | Unified batch + stream | Sub-second | Medium
Amazon Kinesis Data Analytics | AWS-native, managed | Sub-second | Low
Google Cloud Dataflow | GCP-native, auto-scaling | Sub-second | Medium

Windowing

Window Types

Tumbling Window (fixed, non-overlapping):
  [0:00 - 0:05] [0:05 - 0:10] [0:10 - 0:15]
  Use: 5-minute aggregations

Sliding Window (fixed, overlapping):
  [0:00 - 0:05] [0:01 - 0:06] [0:02 - 0:07]
  Use: Moving averages

Session Window (gap-based):
  [login...click...click...[5 min gap]...click...logout]
  Use: User session analytics

Hopping Window:
  Size: 10 min, hop: 5 min → [0:00-0:10] [0:05-0:15]
  Use: Overlapping aggregations
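The assignment rules above reduce to simple timestamp arithmetic: a tumbling window maps each event to exactly one bucket, while a sliding/hopping window maps it to every window that contains it. A minimal sketch in plain Java (hypothetical helper names, timestamps as plain longs):

```java
import java.util.ArrayList;
import java.util.List;

public class WindowAssign {
    // Tumbling: one window per event, start = floor(ts / size) * size
    static long tumblingStart(long ts, long size) {
        return (ts / size) * size;
    }

    // Sliding/hopping: every window of length `size`, starting on each
    // multiple of `slide`, that contains ts
    static List<Long> slidingStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long last = (ts / slide) * slide; // latest window start <= ts
        for (long s = last; s > ts - size; s -= slide) {
            if (s >= 0) starts.add(s);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Event at t=7 with 5-unit tumbling windows → window [5, 10)
        System.out.println(tumblingStart(7, 5)); // 5
        // Same event, size 10 hopping every 5 → windows [5, 15) and [0, 10)
        System.out.println(slidingStarts(7, 10, 5)); // [5, 0]
    }
}
```

Frameworks like Flink implement exactly this bucketing inside their window assigners, plus the state and timer management around it.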
Flink example — aggregating orders per customer in 5-minute tumbling event-time windows:

DataStream<OrderEvent> orders = env.addSource(kafkaConsumer);

orders
    .keyBy(OrderEvent::getCustomerId)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .aggregate(new OrderAggregator())
    .addSink(dashboardSink);

// For late events
orders
    .keyBy(OrderEvent::getCustomerId)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .allowedLateness(Time.minutes(1))  // Accept events up to 1 min late
    .sideOutputLateData(lateOutputTag)  // Route late data separately
    .aggregate(new OrderAggregator());
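Session windows, by contrast, have no fixed boundaries: a session stays open as long as events keep arriving, and closes once the gap since the last event reaches the threshold. The gap rule itself can be sketched in plain Java (hypothetical helper, sorted timestamps in milliseconds):

```java
import java.util.ArrayList;
import java.util.List;

public class SessionSplit {
    // Split sorted event timestamps into sessions: a gap of at least
    // `gapMs` since the previous event starts a new session.
    static List<List<Long>> sessions(List<Long> timestamps, long gapMs) {
        List<List<Long>> out = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long t : timestamps) {
            if (!current.isEmpty() && t - current.get(current.size() - 1) >= gapMs) {
                out.add(current);               // gap exceeded: close session
                current = new ArrayList<>();
            }
            current.add(t);
        }
        if (!current.isEmpty()) out.add(current);
        return out;
    }

    public static void main(String[] args) {
        // Three clicks in the first two minutes, then silence past the
        // 5-minute (300_000 ms) gap, then two more clicks → 2 sessions
        List<Long> clicks = List.of(0L, 60_000L, 120_000L, 600_000L, 630_000L);
        System.out.println(sessions(clicks, 300_000).size()); // 2
    }
}
```

In Flink the equivalent is `.window(EventTimeSessionWindows.withGap(Time.minutes(5)))`, with the framework handling out-of-order events and session merging.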

Stateful Processing

// Flink: fraud detection with keyed state
public class FraudDetector extends KeyedProcessFunction<String, Transaction, Alert> {

    private transient ValueState<Boolean> flagState;
    private transient ValueState<Long> timerState;

    @Override
    public void open(Configuration parameters) {
        // Register state handles; Flink scopes them per key and
        // includes them in checkpoints
        flagState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("flag", Types.BOOLEAN));
        timerState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("timer", Types.LONG));
    }

    @Override
    public void processElement(Transaction tx, Context ctx, Collector<Alert> out)
            throws Exception {
        Boolean previouslyFlagged = flagState.value();

        if (previouslyFlagged != null && tx.getAmount() > 500) {
            // Small transaction followed by large transaction → fraud alert
            out.collect(new Alert(tx.getAccountId(), tx));
            cleanup(ctx);
        }

        if (tx.getAmount() < 1.00) {
            // Flag small "test" transactions
            flagState.update(true);

            // Set timer: if no large transaction in 1 minute, clear flag
            long timer = ctx.timerService().currentProcessingTime() + 60_000;
            ctx.timerService().registerProcessingTimeTimer(timer);
            timerState.update(timer);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Alert> out)
            throws Exception {
        cleanup(ctx);  // No large transaction within 1 minute
    }

    private void cleanup(Context ctx) throws Exception {
        // Cancel any pending timer and clear per-key state
        Long timer = timerState.value();
        if (timer != null) {
            ctx.timerService().deleteProcessingTimeTimer(timer);
        }
        flagState.clear();
        timerState.clear();
    }
}

Exactly-Once Semantics

At-Most-Once:
  Process message → don't track → if crash, message lost
  Use: Metrics where occasional loss is acceptable

At-Least-Once:
  Process message → crash before commit → replay → process again (duplicate)
  Use: With idempotent operations (upsert)

Exactly-Once:
  Process message + update state + commit offset atomically
  Use: Financial transactions, critical business events
  
Kafka implementation:
  Transactional producer + consumer isolation.level=read_committed
  Flink checkpointing with Kafka offsets
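The at-least-once row depends on idempotency: if reprocessing a duplicate cannot change the result, at-least-once delivery becomes effectively-once for the consumer. A minimal keyed-upsert sketch (hypothetical event IDs, in-memory map standing in for a database):

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentSink {
    // Keyed upsert: replaying the same event ID overwrites the prior
    // value instead of accumulating, so duplicates are harmless.
    private final Map<String, Double> byEventId = new HashMap<>();

    void upsert(String eventId, double amount) {
        byEventId.put(eventId, amount); // replace, never add
    }

    double total() {
        return byEventId.values().stream().mapToDouble(Double::doubleValue).sum();
    }

    public static void main(String[] args) {
        IdempotentSink sink = new IdempotentSink();
        sink.upsert("tx-1", 100.0);
        sink.upsert("tx-2", 50.0);
        sink.upsert("tx-1", 100.0); // duplicate replay after a crash
        System.out.println(sink.total()); // 150.0 — not double-counted
    }
}
```

True exactly-once, as in the Kafka line above, instead makes process + state update + offset commit atomic (transactional producer plus consumers reading with isolation.level=read_committed), which is needed when the operation cannot be made idempotent.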

Anti-Patterns

Anti-Pattern | Consequence | Fix
Ignoring late events | Data loss, inaccurate windows | Allowed lateness + late data handling
No backpressure handling | OOM, pipeline crash under load | Backpressure propagation, rate limiting
State without checkpointing | State lost on failure | Regular checkpointing to durable storage
Streaming everything | Unnecessary complexity for batch use cases | Stream what needs real-time, batch the rest
No dead letter queue | Malformed events block pipeline | Route failures to DLQ for investigation
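The dead-letter-queue fix in the last row amounts to a parse step that never lets a bad record crash the pipeline: failures are diverted to a side channel for later inspection. A minimal sketch (in-memory lists standing in for the main output and the DLQ topic):

```java
import java.util.ArrayList;
import java.util.List;

public class DlqRouter {
    final List<Integer> parsed = new ArrayList<>();      // main output
    final List<String> deadLetters = new ArrayList<>();  // DLQ for inspection

    // A malformed event is routed to the DLQ instead of throwing
    // and blocking everything behind it.
    void process(String raw) {
        try {
            parsed.add(Integer.parseInt(raw));
        } catch (NumberFormatException e) {
            deadLetters.add(raw);
        }
    }

    public static void main(String[] args) {
        DlqRouter router = new DlqRouter();
        for (String msg : List.of("42", "oops", "7")) router.process(msg);
        System.out.println(router.parsed);      // [42, 7]
        System.out.println(router.deadLetters); // [oops]
    }
}
```

In Flink the same routing is typically done with a side output (`OutputTag`); with Kafka, failures are produced to a dedicated dead-letter topic.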

Streaming architecture is about matching the speed of your data processing to the speed your business needs. Not everything needs to be real-time — but the things that do need to be reliably real-time.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
