Michael's Interest in Database Management Systems
A database is just a file (or a set of files) that stores data, and a database
management system, commonly known as a DBMS, is a software application that sits
between databases and their users. Essentially you can read and write data in
the database, and put this way it seems so simple. Each transaction consists of
some number of read operations and some number of write operations, and then it
is done. Simple, huh? Not quite.
If every transaction were executed sequentially, it would be as simple as I said.
However, we are humans, and as such we are greedy!
We want more than performing a simple sequence of transactions; we want them to
be completed more quickly. That's right; we want SPEED. So somebody figured out
that each transaction can be on its own thread, or better yet, on a remote host.
When transactions are executed in parallel, they tend to finish more quickly
than when they are executed in serial order. That's what we want. Unfortunately,
if we pay no attention to the order in which the operations are executed, we are
in danger of causing data inconsistency, a terrible disaster in, for example, a
financial institution.
For instance, suppose transaction t1 wants to execute r1(x)w1(x) and t2 wants to
execute w2(x)r2(x), where ri(x) and wi(x) denote a read and a write of item x by
transaction ti. If the execution order is t1t2 or t2t1, it is fine. However, if
they interleave like r1(x)w2(x)w1(x)r2(x), then this execution must be rolled
back. As you can see, t1 reads the value of x and then decides to write to it.
But t2 butts in and writes some value to x, so by the time t1 writes to x, x no
longer holds the value t1 last observed. No serial order of t1 and t2 could
produce that result.
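
One standard way to check a schedule like this mechanically is to build a
precedence graph and look for a cycle. The sketch below is only an illustration
under my own assumptions: the function names and the string format for schedules
are made up for this example and do not come from any particular DBMS.

```python
import re
from itertools import combinations


def parse(schedule):
    """Turn 'r1(x)w2(x)' into [('r', 1, 'x'), ('w', 2, 'x')]."""
    found = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)
    return [(op, int(txn), item) for op, txn, item in found]


def precedence_edges(ops):
    """Add an edge a -> b when an operation of a conflicts with a later one of b.

    Two operations conflict if they belong to different transactions,
    touch the same item, and at least one of them is a write.
    """
    edges = set()
    for i, j in combinations(range(len(ops)), 2):
        (op_a, txn_a, item_a), (op_b, txn_b, item_b) = ops[i], ops[j]
        if txn_a != txn_b and item_a == item_b and "w" in (op_a, op_b):
            edges.add((txn_a, txn_b))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # 1 = currently on the DFS stack, 2 = fully explored

    def visit(node):
        if state.get(node) == 1:
            return True
        if state.get(node) == 2:
            return False
        state[node] = 1
        if any(visit(nxt) for nxt in graph[node]):
            return True
        state[node] = 2
        return False

    return any(visit(node) for node in graph)


def is_conflict_serializable(schedule):
    """A schedule is conflict-serializable iff its precedence graph is acyclic."""
    return not has_cycle(precedence_edges(parse(schedule)))


print(is_conflict_serializable("r1(x)w1(x)w2(x)r2(x)"))  # serial order t1t2
print(is_conflict_serializable("r1(x)w2(x)w1(x)r2(x)"))  # the interleaving above
```

For the interleaved schedule, r1(x) before w2(x) gives the edge t1 -> t2 and
w2(x) before w1(x) gives t2 -> t1, so the graph has a cycle.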
This situation is not supposed to happen. It's analogous to the following situation.
I see my balance is $100.00, so I deposit $50.00 to make it $150.00. Meanwhile someone
deposits $80.00 without my knowledge, and my write of $150.00 lands on top of theirs,
so in the end my account has $150.00 instead of $230.00.
Do you see my point? Checking transactions for data dependencies,
unfortunately, is complicated and easily gets confusing.
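
To make the arithmetic concrete, here is a small sketch that replays two
deposits on a $100 balance, once serially and once with both depositors reading
the balance before either writes. It treats each deposit as a read followed by a
write, which is my assumption for the illustration; the function names are
hypothetical.

```python
def serial_deposits(balance, amounts):
    """Each deposit reads the current balance, then writes the new one."""
    for amount in amounts:
        seen = balance           # read
        balance = seen + amount  # write
    return balance


def interleaved_deposits(balance, mine, theirs):
    """Both depositors read before either writes, so one update is lost."""
    seen_by_me = balance             # I read the balance
    seen_by_them = balance           # they read the same balance
    balance = seen_by_them + theirs  # they write first
    balance = seen_by_me + mine      # my write overwrites theirs
    return balance


print(serial_deposits(100, [50, 80]))     # 230
print(interleaved_deposits(100, 50, 80))  # 150
```

Run serially, both deposits survive. In the interleaved version the second
write is computed from a stale read, and the $80 deposit disappears.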
In addition, failures are a big problem for distributed database systems. A
sample scenario is that several databases are deployed in different geographical
regions, and a user application needs to contact different databases for
different data with the help of a system known as the global transaction
manager.
You can immediately see the problem that can occur: while a transaction
is being executed, some database may fail and come back up shortly afterward.
You need to make sure the transaction is aborted and rolled back to the original
state, and EVERY database that is involved in this transaction must act
accordingly. In our GTM project for Berkeley DB we needed to consider several
failure types and make sure none of them resulted in database inconsistency.
Hard, huh? But we pulled it off :)
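
One common way to get this all-or-nothing behavior across several sites is a
two-phase commit. The sketch below is only an illustration of that idea. I am
not claiming it is the protocol the GTM project used, and it does not touch the
Berkeley DB API. `Participant` and `run_global_transaction` are hypothetical
names, and each "database" is just an in-memory dictionary.

```python
class Participant:
    """Hypothetical in-memory stand-in for one database site."""

    def __init__(self, name, data, fail_on_prepare=False):
        self.name = name
        self.data = dict(data)
        self.fail_on_prepare = fail_on_prepare  # simulate a site that is down
        self.pending = None                     # writes staged but not applied

    def prepare(self, writes):
        """Phase 1: stage the writes and vote yes, or fail."""
        if self.fail_on_prepare:
            raise ConnectionError(f"{self.name} is down")
        self.pending = dict(writes)

    def commit(self):
        """Phase 2: apply the staged writes."""
        if self.pending is not None:
            self.data.update(self.pending)
            self.pending = None

    def rollback(self):
        """Discard the staged writes, leaving the data as it was."""
        self.pending = None


def run_global_transaction(writes_by_site):
    """Commit on every site or on none. Returns True if committed."""
    prepared = []
    try:
        for site, writes in writes_by_site:
            site.prepare(writes)
            prepared.append(site)
    except ConnectionError:
        # One site failed, so every site that already prepared must undo.
        for site in prepared:
            site.rollback()
        return False
    for site in prepared:
        site.commit()
    return True


east = Participant("east", {"x": 1})
west = Participant("west", {"y": 2}, fail_on_prepare=True)
print(run_global_transaction([(east, {"x": 10}), (west, {"y": 20})]))  # False
print(east.data)  # {'x': 1}
```

A real protocol also has to log its decisions and cope with a site failing
between the two phases, which this sketch leaves out.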
My experience in this area comes from the course I took and the project I
did in that course. Also, in the projects I did with an HSM, I gained
experience using an HSM to do field-level encryption of a database. I also
know SQL quite well.