You are building a real-time document enrichment aggregator for a Kafka-like event stream.
Each incoming event has the following fields:
- document_id: string
- enricher_id: string
- enrichment: any JSON-compatible value
Example events:
{"document_id": "doc1", "enricher_id": "Sentiment Analysis", "enrichment": "positive"}
{"document_id": "doc2", "enricher_id": "Entity Linking", "enrichment": ["AAPL"]}
{"document_id": "doc1", "enricher_id": "Topic Classification", "enrichment": ["AUTO", "ANALYST_CHANGE"]}
{"document_id": "doc1", "enricher_id": "Entity Linking", "enrichment": ["TSLA"]}
{"document_id": "doc3", "enricher_id": "Summarization", "enrichment": "Yesterday the Yankees won against the Mets..."}
The aggregator is configured with a required set of enrichers, for example:
["Sentiment Analysis", "Entity Linking"]
Implement an in-memory Python aggregator with the following behavior:
- Consume events one at a time.
- Ignore events whose enricher_id is not in the configured required set.
- For each document_id, keep the latest enrichment value for each required enricher.
- As soon as a document has received all required enrichments, publish exactly one aggregated output for that document.
- If 10 seconds have passed since the first relevant event for a document and not all required enrichments have arrived, publish the partial result exactly once.
- After a document has been published, ignore any later events for that same document.
- After publishing, clean up any in-memory state for that document.
- Assume there is no background thread; handle timeouts lazily when new events arrive, or through an explicit timeout-check method.
The published output should have this form:
{
  "document_id": "doc1",
  "enricher_ids": ["Sentiment Analysis", "Entity Linking"],
  "enrichments": {
    "Sentiment Analysis": "positive",
    "Entity Linking": ["TSLA"]
  }
}
Write the Python implementation and explain any important edge cases you would consider.
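As a reference point, the required behavior could be sketched roughly as below. The class name `EnrichmentAggregator`, the `publish` callback, and the injectable `clock` parameter are illustrative assumptions, not part of the task statement; the clock is injected so timeouts can be tested deterministically.

```python
import time


class EnrichmentAggregator:
    """In-memory aggregator: publishes once per document, on completion or timeout."""

    def __init__(self, required_enrichers, publish, timeout_seconds=10.0,
                 clock=time.monotonic):
        self.required_order = list(required_enrichers)  # preserves configured order
        self.required = set(required_enrichers)
        self.publish = publish          # callback receiving the aggregated dict
        self.timeout = timeout_seconds
        self.clock = clock              # injectable for deterministic tests
        self.pending = {}               # document_id -> {"first_seen": t, "enrichments": {...}}
        self.published = set()          # document_ids already published

    def consume(self, event):
        self.check_timeouts()           # lazy timeout handling on each arrival
        doc_id = event["document_id"]
        enricher = event["enricher_id"]
        if enricher not in self.required or doc_id in self.published:
            return                      # irrelevant enricher, or already published
        state = self.pending.setdefault(
            doc_id, {"first_seen": self.clock(), "enrichments": {}})
        state["enrichments"][enricher] = event["enrichment"]  # latest value wins
        if set(state["enrichments"]) == self.required:
            self._publish(doc_id)       # complete: publish immediately

    def check_timeouts(self):
        """Publish partial results for documents past the timeout."""
        now = self.clock()
        expired = [doc_id for doc_id, state in self.pending.items()
                   if now - state["first_seen"] >= self.timeout]
        for doc_id in expired:
            self._publish(doc_id)

    def _publish(self, doc_id):
        state = self.pending.pop(doc_id)  # clean up per-document state
        self.published.add(doc_id)        # remember so late events are ignored
        self.publish({
            "document_id": doc_id,
            "enricher_ids": [e for e in self.required_order
                             if e in state["enrichments"]],
            "enrichments": state["enrichments"],
        })
```

One edge case worth flagging in a real answer: the `published` set grows without bound, so late-event suppression would eventually need a TTL or similar eviction strategy in a long-running process.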