When I moved to the large financial institution, the team I joined managed the market data and trade data storage layer. The engineering problem was deceptively simple to state: store every price tick, every trade execution, and every risk calculation — billions of records per day — and answer analytical queries over them quickly.
The existing system was PostgreSQL. It worked, technically. Queries that needed to run in seconds for trading decisions took minutes. Operational costs for storage were climbing. The database team was spending more time running VACUUM than building features.
Understanding why required understanding what time-series data actually is and why it’s different.
...