Separate storage from computation
Traditional data warehouses have a storage capacity constraint by default. Over time you will accumulate terabytes of data, along with optimization issues.
They are also constrained by their batch nature: the faster data arrives, the more of the machine's resources are consumed just ingesting it.
Large datasets are created one event at a time. If you can fire each event into storage as it happens, you can get out of batch-processing hell.
Data lake: a single source of truth for storage, ideally as files in a cloud service for cheap scalability.
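The event-at-a-time idea above can be sketched in a few lines: each event is appended as a JSON line to a date-partitioned file. This is a minimal sketch, not a production pattern; a local directory stands in for a cloud object store, and the function name, field names, and `event_date=` partition layout are all assumptions for illustration.

```python
import json
import time
import uuid
from pathlib import Path

def fire_event(event: dict, root: Path) -> Path:
    """Append one event to a date-partitioned JSON Lines file.

    A local directory stands in for cloud object storage here;
    in practice you would write to something like S3 or GCS.
    """
    record = {"event_id": str(uuid.uuid4()), "ts": time.time(), **event}
    day = time.strftime("%Y-%m-%d", time.gmtime(record["ts"]))
    partition = root / f"event_date={day}"   # common, but assumed, layout
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / "events.jsonl"
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path

lake = Path("lake")
written = fire_event({"type": "page_view", "user": "u1"}, lake)
```

Because every event lands in storage immediately, downstream compute can read the files on its own schedule instead of competing with ingestion for the warehouse's resources.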