We are currently developing a fairly large application that will have to handle a very large number of records.
The idea is that e-mails are stored (with attachments) and that users can search their stored e-mails through a web API. Users should be able to search (only within the messages they have exported into the database/storage) on at least the following fields:
- from
- to
- subject
- date (range)
- attachments (names & types only)
- message contents
- (optional) mailbox / folder structure
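To make the list above concrete, here is a minimal sketch of how those fields could be pulled out of a raw message before indexing. It uses only Python's standard `email` package. The function name `extract_index_fields` and the shape of the returned dictionary are my own assumptions, not part of any existing schema, and the mailbox/folder field is left out because it is not stored in the message itself.

```python
from email import message_from_bytes, policy
from email.utils import getaddresses, parsedate_to_datetime


def extract_index_fields(raw: bytes) -> dict:
    """Return the searchable fields of one raw RFC 5322 message.

    Hypothetical helper: the field names mirror the list above.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body, fall back to HTML if there is no plain part.
    body_part = msg.get_body(preferencelist=("plain", "html"))

    # Only attachment names and MIME types are indexed, not their contents.
    attachments = [
        {"name": part.get_filename(), "type": part.get_content_type()}
        for part in msg.iter_attachments()
    ]

    date_header = msg["Date"]
    return {
        "from": [addr for _, addr in getaddresses([str(h) for h in msg.get_all("From", [])])],
        "to": [addr for _, addr in getaddresses([str(h) for h in msg.get_all("To", [])])],
        "subject": str(msg["Subject"] or ""),
        "date": parsedate_to_datetime(str(date_header)) if date_header else None,
        "attachments": attachments,
        "body": body_part.get_content() if body_part else "",
    }
```

The resulting dictionary is what would go into the search index; the raw bytes themselves would be kept elsewhere.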
The application should be able to handle a large number of users and an extreme number of e-mails (easily growing from millions to billions). Users should also be able to download the complete original message (with attachments) so they can import it into their e-mail client.
I was thinking about indexing the e-mails in a database and storing each full e-mail, with its attachments, as a single package under a unique key in separate storage. This way I should keep the database load as low as possible, and therefore the search as fast as possible.
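Roughly, the split I have in mind looks like the sketch below. It is only an illustration under assumptions: the two plain dictionaries stand in for the real object storage and the real search index, the function names are made up, and using a SHA-256 hash of the raw message as the unique key is just one possible choice.

```python
import hashlib


def archive_message(raw: bytes, user_id: str, blob_store: dict, index: dict) -> str:
    """Store the raw package once and keep only a small record in the index.

    blob_store and index are hypothetical stand-ins (plain dicts) for the
    separate storage and the searchable database.
    """
    # A content hash as key: identical messages of one user map to one blob.
    digest = hashlib.sha256(raw).hexdigest()
    blob_key = f"{user_id}/{digest}"

    # The full original (headers, body, attachments) goes to bulk storage.
    blob_store[blob_key] = raw

    # The index only holds what is needed to search and to find the blob.
    index[blob_key] = {"user_id": user_id, "size": len(raw)}
    return blob_key


def download_original(blob_key: str, blob_store: dict) -> bytes:
    """Return the untouched original so it can be imported into a mail client."""
    return blob_store[blob_key]
```

A search would then hit only the index, and the (much larger) package would be fetched by key only when a user actually downloads a message.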
I have found several database schemas for handling e-mail like this, but I couldn't find any database that is able to handle hundreds of millions, and maybe even billions, of records (e-mails).
Is this the best way to keep it simple, efficient and fast, or am I overlooking something?
// edit: The idea is to run this on the Amazon cloud (perhaps there are suggestions specific to that?)