Following progress in processors, computer memory, computer storage, and computer networks, the sizes, capabilities, and performance of databases and their respective DBMSs have grown by orders of magnitude. The development of database technology can be divided into three eras based on data model or structure: navigational,[5] SQL/relational, and post-relational.
The two main early navigational data models were the hierarchical model, epitomized by IBM's IMS system, and the CODASYL model (network model), implemented in a number of products such as IDMS.
The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content rather than by following links. The relational model employs sets of ledger-style tables, each used for a different type of entity. Only in the mid-1980s did computing hardware become powerful enough to allow the wide deployment of relational systems (DBMSs plus applications). By the early 1990s, however, relational systems dominated in all large-scale data processing applications, and as of 2014 they remain dominant except in niche areas.
The dominant database language, standardised SQL for the relational model, has influenced database languages for other data models.
Object databases were developed in the 1980s to overcome the inconvenience of object-relational impedance mismatch, which led to the coining of the term "post-relational" and also to the development of hybrid object-relational databases.
The next generation of post-relational databases in the late 2000s became known as NoSQL databases, introducing fast key-value stores and document-oriented databases. A competing "next generation" known as NewSQL databases attempted new implementations that retained the relational/SQL model while aiming to match the high performance of NoSQL systems relative to commercially available relational DBMSs.
1960s, navigational DBMS
[Figure: Basic structure of the navigational CODASYL database model.]
Further information: Navigational database
The introduction of the term database coincided
with the availability of direct-access storage
(disks and drums) from the mid-1960s
onwards. The term represented a contrast with
the tape-based systems of the past, allowing
shared interactive use rather than daily batch
processing. The Oxford English Dictionary cites a 1962 report by the System Development Corporation of California as the first to use the term "data-base" in a specific technical sense.[6]
As computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s, several such systems had come into commercial use.
Interest in a standard began to grow, and Charles Bachman, author of one such product, the Integrated Data Store (IDS), founded the Database Task Group within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971, the Database Task Group delivered its standard, which generally became known as the "CODASYL approach", and soon a number of commercial products based on this approach entered the market.
The CODASYL approach relied on the "manual" navigation of a linked data set which was formed into a large network. Applications could find records by one of three methods:

- use of a primary key (known as a CALC key, typically implemented by hashing)
- navigating relationships (called sets) from one record to another
- scanning all the records in sequential order
Later systems added B-trees to provide alternate access paths. Many CODASYL
databases also added a very straightforward
query language. However, in the final tally,
CODASYL was very complex and required
significant training and effort to produce useful
applications.
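To make the contrast between these access paths concrete, the following is a minimal Python sketch, not drawn from any real CODASYL system; the Record and NavigationalStore names (and the department/employee data) are hypothetical, but the three lookup styles mirror the CALC-key, set-navigation, and sequential-scan methods described above.

class Record:
    """One database record; next_in_set links it into an owner's 'set'."""
    def __init__(self, key, data):
        self.key = key            # CALC key (primary key)
        self.data = data
        self.next_in_set = None   # link to the next member of the set

class NavigationalStore:
    def __init__(self):
        self.by_key = {}          # hash table, for CALC-key access
        self.sequence = []        # insertion order, for sequential scans

    def insert(self, record, owner=None):
        self.by_key[record.key] = record
        self.sequence.append(record)
        if owner is not None:     # chain the record into the owner's set
            record.next_in_set = owner.next_in_set
            owner.next_in_set = record

    def find_by_calc_key(self, key):   # method 1: hashed primary-key lookup
        return self.by_key.get(key)

    def walk_set(self, owner):         # method 2: navigate a set, link by link
        member = owner.next_in_set
        while member is not None:
            yield member
            member = member.next_in_set

    def scan(self):                    # method 3: sequential scan of all records
        yield from self.sequence

store = NavigationalStore()
dept = Record("DEPT-01", {"name": "Sales"})
store.insert(dept)
store.insert(Record("EMP-17", {"name": "Ada"}), owner=dept)
store.insert(Record("EMP-23", {"name": "Grace"}), owner=dept)

print(store.find_by_calc_key("EMP-17").data)   # {'name': 'Ada'}
print([m.key for m in store.walk_set(dept)])   # ['EMP-23', 'EMP-17']
print([r.key for r in store.scan()])           # ['DEPT-01', 'EMP-17', 'EMP-23']

Here walk_set is "manual" navigation in miniature: the application itself follows the links one record at a time, rather than asking the system for all records matching some content.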
IBM also had its own DBMS in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to CODASYL, but used a strict hierarchy for its model of data navigation instead of CODASYL's network model. Both concepts later became known as navigational databases due to the way data was accessed, and Bachman's 1973 Turing Award presentation was The Programmer as Navigator. IMS is classified as a hierarchical database. IDMS and Cincom Systems' TOTAL database are classified as network databases. IMS remains in use as of 2014.[7]
1970s, relational DBMS
Edgar Codd worked at IBM in San Jose, California, in one of its offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the CODASYL approach, notably the lack of a "search" facility. In 1970, he wrote a number of papers outlining a new approach to database construction, which eventually culminated in the groundbreaking A Relational Model of Data for Large Shared Data Banks.[8]
In this paper, he described a new system for storing and working with large databases. Instead of records being stored in some sort of linked list of free-form records, as in CODASYL, Codd's idea was to use a "table" of fixed-length records, with each table used for a different type of entity. A linked-list system would be very inefficient when storing "sparse" databases, where some of the data for any one record could be left empty. The relational model solved this by splitting the data into a series of normalized tables (or relations), with optional elements moved out of the main table so that they would take up room only if needed. Data may be freely inserted, deleted, and edited in these tables, with the DBMS doing whatever maintenance is needed to present a table view to the application/user.
[Figure: In the relational model, related records are linked together with a "key".]
The relational model also allowed the content of the database to evolve without constant rewriting of links and pointers. The relational part comes from entities referencing other entities in what is known as a one-to-many relationship, like a traditional hierarchical model, and a many-to-many relationship, like a navigational (network) model. Thus, a relational model can express both hierarchical and navigational models, as well as its native tabular model, allowing for pure or combined modeling in terms of these three models, as the application requires.
For instance, a common use of a database system is to track information about users: their names, login information, various addresses, and phone numbers. In the navigational approach, all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table, and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.
Linking the information back together is the key to this system. In the relational model, some bit of information is used as a "key", uniquely defining a particular record. When information is collected about a user, information stored in the optional tables is found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as their key. This simple "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.
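As an illustration, here is a minimal Python sketch of that example; the table layout and the collect_user helper are hypothetical, but they show data normalized into three tables and the unique login name acting as the key that re-links them.

# Main table, keyed by the unique login name.
users = {"jdoe": {"name": "J. Doe"}}

# Optional tables hold rows only when the data was actually provided,
# each row carrying the login name as its key.
addresses = [
    {"login": "jdoe", "address": "1 Main St"},
]
phone_numbers = [
    {"login": "jdoe", "phone": "555-0100"},
]

def collect_user(login):
    """Re-link the normalized rows for one user into a single record."""
    record = dict(users[login])
    record["addresses"] = [a["address"] for a in addresses
                           if a["login"] == login]
    record["phones"] = [p["phone"] for p in phone_numbers
                        if p["login"] == login]
    return record

print(collect_user("jdoe"))
# {'name': 'J. Doe', 'addresses': ['1 Main St'], 'phones': ['555-0100']}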
Just as the navigational approach would require
programs to loop in order to collect records, the
relational approach would require loops to
collect information about any one record.
Codd's solution to the necessary looping was a
set-oriented language, a suggestion that would
later spawn the ubiquitous SQL. Using a branch
of mathematics known as tuple calculus, he
demonstrated that such a system could support
all the operations of normal databases
(inserting, updating, etc.) as well as providing a
simple system for finding and returning sets of
data in a single operation.
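For a flavour of the set-oriented style, a tuple-calculus query for "all phone rows belonging to the user jdoe" might be written as follows (the Phone relation and login attribute are hypothetical, echoing the sketch above); the whole answer set is specified in a single expression, with no explicit loop:

\{\, t \mid \mathrm{Phone}(t) \land t[\mathrm{login}] = \text{'jdoe'} \,\}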
Codd's paper was picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES, using funding that had already been allocated for a geographical database project and student programmers to produce code. Beginning in 1973, INGRES delivered its first test products, which were generally ready for widespread use in 1979. INGRES was similar to System R in a number of ways, including the use of a query language for data access, known as QUEL. Over time, INGRES moved to the emerging SQL standard.
IBM itself did one test implementation of the relational model, PRTV, and a production one, Business System 12, both now discontinued. Honeywell wrote MRDS for Multics, and now there are two new implementations: Alphora Dataphor and Rel. Most other DBMS implementations usually called relational are actually SQL DBMSs.
In 1970, the University of Michigan began development of the MICRO Information Management System,[9] based on D.L. Childs' Set-Theoretic Data model.[10][11][12] MICRO was used to manage very large data sets by the US Department of Labor, the U.S. Environmental Protection Agency, and researchers from the University of Alberta, the University of Michigan, and Wayne State University. It ran on IBM mainframe computers using the Michigan Terminal System.[13] The system remained in production until 1998.