An Introduction to Databases

At its simplest level, a database is a collection of data that is related to each other in one way or another. It is not necessary that this data be electronic; an attendance register that schools used to have before computerization were in the form of a database. They stored the information of every student, their standard, division roll numbers, attendance records, fees paid and other details. Similarly, a telephone directory (anyone remembers these big, bulky books?) used to contain formatted records of everyone who had a telephone connection. Today though, when we say “database,” we usually mean databases that run on computers. Let’s take an example of electronic database. Almost everyone today is familiar with online shopping websites, or online banking. These web applications use databases to store information. An online shopping website uses a database to store customer information like their address, credit card details, and maintains a history of their shopping. Similarly, a banking application keeps a record of each and every transaction that has occurred in a customer’s account. Payrolls, HR systems, and all such software maintain a database to store information of an employee in an organized way. And yes, this is done by using databases. To put it succinctly, any web application that stores some kind of data typically uses a database.

A few Definitions…
Before we discuss databases further, let us understand a few terms that define a database. Don’t worry if you do not understand something; these definitions will make sense when we discuss the types of databases later on.

Data Redundancy
Redundancy is when same data are duplicated in different places (files). Redundancy leads to excess storage use and may cause data inconsistency as well. There are two types of data redundancies:
i. data could be repeated over and over again in the same file
ii. when the same data is stored in different files

Data Inconsistency
Data inconsistency is a condition that occurs when the same data exists in different formats in multiple tables, or when different and conflicting versions of the same data appear in different places. Data inconsistency creates unreliable information, because it will be difficult to determine which version of the information is correct. This leads to a number of problems, including loss of information and incorrect results.

Data Integrity
Data integrity refers to the accuracy and consistency of data stored in a database over its lifecycle. Each time data is replicated or transferred, it should remain intact and unaltered between updates. Data Integrity can be maintained using constraints that define the rules according to which the operations like updation, deletion, insertions etc. have to be performed.

Database Design
Database design is part of a process of data handling and processing. Before you can enter data into the database, the data needs to be collected and cleansed. Once it is stored, you will need to know how to take advantage of it by analyzing it and assessing whether your business is meeting its goals.

Database Management software
A DBMS (also called as Database Systems) is a special software package that is designed to define and manage data within one or more databases. DBMS lets users to create a database, store, manage, update/modify and retrieve data from that database by users or application programs. A database system hides certain details about how data are actually stored and maintained. It provides users with an abstract view of the data. A database system has a set of programs through which users or other programs can access, modify and retrieve the stored data. The DBMS serves as an interface between the database and end users or application programs. Some examples of open source and commercial DBMS include MySQL, Oracle, PostgreSQL, SQL Server, Microsoft Access, and MongoDB.

Types of Databases
Since databases are a systematic collection of data, it stands to reason that there has to be a system in place to organize this data. There are various types of databases and various ways to classify them; for the purpose of this introductory article we will stick only to the most popular types of databases.

Flat File Database
The simplest form of an electronic database is the flat file database. A flat file database typically comprises text files with no markup, representing relational data by separating it with a comma or other delimiter. The end of a row is marked by the new-line character (usually invisible). It is important, if the file is to be read correctly, that each row contain the same number of delimiters. Another kind of flat data file is the fixed-width data file. In such files, all the columns share a fixed width in characters. These flat files can be read by a computer program and manipulated in various ways. Flat file databases are the most easiest to create, but they have almost no protections for data integrity, and they often contain many redundant elements.

Hierarchical Databases
A hierarchical database consists of a collection of data records that are connected to each other through links. Each record is a collection of fields (attributes), each of which contains only one data value. A link is an association between precisely two records. In other words, hierarchical databases are organized in a tree-like structure. In such a database, one parent table can have many child tables, but no child table can have more than one parent. The best way to visualize a hierarchical database model is to envisage the Windows or Linux file system. Both the systems have a directory under which there can be subdirectories, and in those subdirectories, there can be other subdirectories or files. You navigate through them by following a path. While logical, hierarchical databases have an issue about redundancy, data integrity, and comparability of data.

Relational Databases
A relation is a mathematical concept based on theory and predicate logic. Informally, a relation looks like a table of values. Briefly, in the relational model, data would be organized into tables. A relation typically contains a set of rows. The data elements in each row represent certain facts that correspond to a real-world entity or relationship. Each column has a column header that gives an indication of the meaning of the data items in that column. These repeating columns would be called “keys.” In a relational database, the redundancy is minimized. A bank would enter the customer’s data only once, in one place. Any changes would be made only in one place. The only redundancy allowed is the repetition of a key column (or columns) that is used to create relationships among the tables. This significantly reduces the chances of error and protects the integrity of the data in the database.

Distributed Databases
A distributed database is a set of databases stored on multiple computers. The data on several computers can be simultaneously accessed and modified using a network. Each database server in the distributed database is controlled by its local DBMS, and each cooperates to maintain the consistency of the global database. A distributed database consists of loosely coupled sites that share no physical component. The database systems that run on each site are independent of each other, and transactions may access data at one or more sites.

In Summary

We have barely scratched the surfaces as far as databases are concerned. Databases play an important role in web applications. It is a challenging field that requires complete understanding of database designing principles, and an understanding of the application flow. Huge applications have specialized database experts whose responsibility is to maintain databases by planning, monitoring, and improving performance. If you plan to make a career in databases, you may need expert guidance available at qualified training institutes.