As a real estate marketplace, we have data at the core of everything we do. Property buyers, renters, and sellers rely on ZipMatch to provide trustworthy data about pricing, market trends and property features. Real estate professionals rely on our data to reach relevant buyers and renters. Developers, academics and government institutions rely on it for insights into the Philippine real estate industry. In short, data is pretty darn important to us. And where is this data typically stored? In databases, of course.
In this post, we’ll discuss the three major database technologies we use at ZipMatch and their importance to our goal of bringing reliability and transparency to the Philippine real estate industry.
1 – MySQL on RDS
MySQL is still the heart of our web application and will remain so for the foreseeable future. In 2014, we moved MySQL to Amazon RDS (Relational Database Service) for a few reasons.
For one, it’s always good practice to separate databases from web servers so that they don’t compete for resources such as CPU, memory and disk I/O. The separation also allowed us to scale our web servers horizontally.
Second, RDS allowed us to essentially stop managing MySQL ourselves. Automated backups, version updates and even master/slave replication became things RDS handled for us. This means the ZipMatch engineering team no longer spends time on operational database tasks and can focus on building a better product for our customers.
And finally, performance. As our database got busier and busier, we needed to scale vertically, which on RDS meant upgrading to more powerful instances and moving to SSD-based storage.
2 – DynamoDB
Our lead lifecycle is a long, “decision-tree” style service that routes and scores every inquiry ZipMatch receives and runs a series of fraud checks on it. This step is necessary to provide a quality, spam-free experience for our home seekers and sellers.
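To make the “decision-tree” idea more concrete, here is a minimal Python sketch of a lead passing through a chain of checks that can stop early, ending in a routing branch. Every field, rule, threshold, blocklist entry and team name below is a hypothetical stand-in; the post does not describe ZipMatch's actual checks or scoring.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Lead:
    # Hypothetical inquiry fields; the real schema is not described in the post.
    lead_id: str
    email: str
    message: str
    listing_type: str  # e.g. "sale" or "rent"
    score: int = 0
    flags: List[str] = field(default_factory=list)
    route: Optional[str] = None


# A step inspects (and may annotate) the lead; returning False stops the chain.
Step = Callable[[Lead], bool]

# Hypothetical blocklist of throwaway email domains.
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com"}


def check_email(lead: Lead) -> bool:
    domain = lead.email.rsplit("@", 1)[-1].lower()
    if domain in DISPOSABLE_DOMAINS:
        lead.flags.append("disposable_email")
        return False
    return True


def check_links(lead: Lead) -> bool:
    # Toy spam heuristic: reject messages that contain URLs.
    if "http://" in lead.message or "https://" in lead.message:
        lead.flags.append("contains_link")
        return False
    return True


def score(lead: Lead) -> bool:
    # Toy scoring rule: longer messages count as higher intent, capped at 10.
    lead.score = min(len(lead.message) // 10, 10)
    return True


def route(lead: Lead) -> bool:
    # Final branch: pick a (hypothetical) team based on listing type.
    lead.route = "sales_team" if lead.listing_type == "sale" else "rentals_team"
    return True


PIPELINE: List[Step] = [check_email, check_links, score, route]


def process(lead: Lead) -> Lead:
    # Run each step in order; the first failing check rejects the lead.
    for step in PIPELINE:
        if not step(lead):
            lead.route = "rejected"
            break
    return lead
```

Keeping each check as a small function that either passes the lead along or stops it makes it easy to add or reorder fraud checks without touching the others.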
We chose DynamoDB because we needed to store a lot of data without much relational structure. Since every entry is a simple key/value item, DynamoDB gave us a fast way to write data in a flat structure. The end result is a full audit history of each lead, available in DynamoDB.
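A minimal sketch of a flat, append-only audit trail, assuming each lifecycle step is written as one item keyed by a lead ID (partition key) and a sequence number (sort key). That key design is an assumption, not something the post specifies. To keep the example self-contained and runnable, `AuditTable` and `record_step` are hypothetical in-memory stand-ins, not the boto3 DynamoDB API; against a real table, reading a lead's history would be a Query on the partition key, which DynamoDB returns ordered by sort key.

```python
import json
import time
from collections import defaultdict
from typing import Any, Dict, List


class AuditTable:
    """Hypothetical in-memory stand-in for a DynamoDB table (not boto3).

    Items are flat key/value maps. "lead_id" models the partition key and
    "seq" models the sort key, so one query returns a lead's full history.
    """

    def __init__(self) -> None:
        self._partitions: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def put_item(self, item: Dict[str, Any]) -> None:
        # Append-only writes: nothing is updated in place, so history is kept.
        self._partitions[item["lead_id"]].append(dict(item))

    def query(self, lead_id: str) -> List[Dict[str, Any]]:
        # Return all items for one partition key, ordered by sort key.
        return sorted(self._partitions[lead_id], key=lambda item: item["seq"])


def record_step(table: AuditTable, lead_id: str, step: str, **details: Any) -> None:
    # Deriving seq from the current item count is not safe with concurrent
    # writers; it only keeps this sketch short.
    seq = len(table.query(lead_id)) + 1
    table.put_item({
        "lead_id": lead_id,
        "seq": seq,
        "step": step,
        "recorded_at": int(time.time()),
        # Keep the item flat: extra details are stored as one JSON string.
        "details": json.dumps(details, sort_keys=True),
    })
```

For example, calling `record_step(table, "lead-42", "received", source="web")` and then `record_step(table, "lead-42", "routed", agent_id="agent-9")` leaves two items under `lead-42`, and `table.query("lead-42")` returns them in order with `seq` values 1 and 2.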
The lead lifecycle uses the `CQRS` (Command Query Responsibility Segregation) and `Event Sourcing` patterns, with DynamoDB for persistence. We use Amazon SQS as a `Command Bus` and push events as rich JSON documents that carry contextual data, which keeps the command handlers decoupled from one another.
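The sketch below illustrates how the two patterns can fit together, under stated assumptions: a `queue.Queue` stands in for the SQS command bus, a dictionary of per-lead event lists stands in for the DynamoDB event store, and the command and event names (`ScoreLead`, `LeadRouted` and so on) are hypothetical. Commands travel as JSON documents, handlers only append events (the write side), and the current state of a lead is derived by replaying its events (the read side).

```python
import json
import queue
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Stand-in for an SQS queue used as a command bus (not the SQS API).
command_bus: "queue.Queue[str]" = queue.Queue()

# Write side: append-only event store, one stream per lead.
event_store: Dict[str, List[Dict[str, Any]]] = defaultdict(list)


def append_event(lead_id: str, event_type: str, data: Dict[str, Any]) -> None:
    stream = event_store[lead_id]
    stream.append({"lead_id": lead_id, "version": len(stream) + 1,
                   "type": event_type, "data": data})


# Command handlers: each one only knows about the command it receives.
def handle_score_lead(cmd: Dict[str, Any]) -> None:
    append_event(cmd["lead_id"], "LeadScored", {"score": cmd["score"]})


def handle_route_lead(cmd: Dict[str, Any]) -> None:
    append_event(cmd["lead_id"], "LeadRouted", {"agent_id": cmd["agent_id"]})


HANDLERS: Dict[str, Callable[[Dict[str, Any]], None]] = {
    "ScoreLead": handle_score_lead,
    "RouteLead": handle_route_lead,
}


def send(command_type: str, **payload: Any) -> None:
    # Each command is a self-contained JSON document with its context.
    command_bus.put(json.dumps({"type": command_type, **payload}))


def drain() -> None:
    # Consumer loop: pull each message and dispatch on its "type" field.
    while not command_bus.empty():
        cmd = json.loads(command_bus.get())
        HANDLERS[cmd["type"]](cmd)


def project(lead_id: str) -> Dict[str, Any]:
    # Read side: rebuild a lead's current state by replaying its events.
    state: Dict[str, Any] = {"lead_id": lead_id}
    for event in event_store[lead_id]:
        state.update(event["data"])
        state["last_event"] = event["type"]
    return state
```

Replaying the full stream on every read keeps the example short; a production read model would more likely be a projection that is updated as events arrive, so queries do not pay the replay cost.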
3 – Redshift
And last but not least, our data warehouse weapon of choice: Redshift. We use Amazon Redshift to pull large amounts of data from different sources into one large database for reporting, dashboards, detailed analytics and more.
We run nightly ETL (extract, transform, load) jobs that pull from the MySQL/RDS databases mentioned above, and we occasionally import data from third-party systems.
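A minimal sketch of a nightly extract-transform-load job plus an occasional third-party import, using Python's built-in `sqlite3` as a stand-in for both the MySQL source and the Redshift warehouse so the example runs anywhere. The table names, columns, cities, dates and partner CSV are all hypothetical. The final query shows why a warehouse is useful: once both datasets sit in one place, correlating them is a single join.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple


def extract(source: sqlite3.Connection, day: str) -> List[Tuple[int, str, str]]:
    # Extract: pull one day's inquiries from the (stand-in) MySQL database.
    return source.execute(
        "SELECT id, city, created_at FROM inquiries WHERE date(created_at) = ?",
        (day,),
    ).fetchall()


def transform(rows: List[Tuple[int, str, str]]) -> Dict[str, int]:
    # Transform: normalize city names and aggregate to one count per city.
    counts: Dict[str, int] = {}
    for _id, city, _created_at in rows:
        key = city.strip().title()
        counts[key] = counts.get(key, 0) + 1
    return counts


def load(warehouse: sqlite3.Connection, day: str, counts: Dict[str, int]) -> None:
    # Load: append the aggregated rows to a warehouse table.
    warehouse.executemany(
        "INSERT INTO daily_inquiries (day, city, inquiries) VALUES (?, ?, ?)",
        [(day, city, n) for city, n in counts.items()],
    )
    warehouse.commit()


def import_third_party(warehouse: sqlite3.Connection, csv_text: str) -> None:
    # Occasional import: a hypothetical partner CSV of listing counts per city.
    reader = csv.DictReader(io.StringIO(csv_text))
    warehouse.executemany(
        "INSERT INTO partner_listings (city, listings) VALUES (?, ?)",
        [(row["city"], int(row["listings"])) for row in reader],
    )
    warehouse.commit()


def demo() -> List[Tuple[str, int, int]]:
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE inquiries (id INTEGER PRIMARY KEY, city TEXT, created_at TEXT)")
    source.executemany(
        "INSERT INTO inquiries (city, created_at) VALUES (?, ?)",
        [("makati", "2024-01-15 09:00:00"), ("Makati ", "2024-01-15 13:30:00"),
         ("Taguig", "2024-01-15 18:45:00"), ("Taguig", "2024-01-14 23:59:00")],
    )
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE daily_inquiries (day TEXT, city TEXT, inquiries INTEGER)")
    warehouse.execute("CREATE TABLE partner_listings (city TEXT, listings INTEGER)")

    day = "2024-01-15"
    load(warehouse, day, transform(extract(source, day)))
    import_third_party(warehouse, "city,listings\nMakati,120\nTaguig,80\n")

    # Correlate both datasets in one query, the main payoff of a warehouse.
    return warehouse.execute(
        "SELECT d.city, d.inquiries, p.listings FROM daily_inquiries d "
        "JOIN partner_listings p ON d.city = p.city ORDER BY d.city"
    ).fetchall()
```

Row-by-row `INSERT`s are fine for a toy example; for real Redshift loads, bulk loading with the `COPY` command (typically from files staged in Amazon S3) is the usual approach.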
We see three benefits of using a data warehouse. One, it gives us a central repository of information. Ideally, a data warehouse combines multiple datasets so you can correlate information across them.
Two, it’s a separate environment for reporting and analysis. Running giant queries on production servers is bad form, even if they are read-only, because they compete with live traffic for resources. A data warehouse gives you a safe environment to run punishing queries.
Three, performance. Redshift is designed to scale with data size and easily handles datasets with millions of rows. The speed and scale of a proper data warehouse saves a lot of time and cost in the long run.
To conclude, as ZipMatch continues to innovate, we will keep using different database technologies for different problems. For a company like ours that aims to be a reliable and trustworthy real estate marketplace, it’s imperative that our data be accessible, secure and performant.