
Design a high-throughput hash-based lookup to be called inside a tight kernel. Choose between open addressing and chaining, specify the load factor, probe sequence, and table layout to favor vectorization and contiguous accesses. Show how to use bitwise operations, prefetching, and alignment to reduce collisions and cache misses. Explain trade-offs and how you would profile and validate the design.