C175 Ch.1

verickle's version from 2017-08-21 19:27

ETL Process

Question Answer
1Get the data from the source location.
2Map the data from its original form into a data model that is suitable for manipulation at the staging area.
3Validate and clean the data.
4Apply any transformations to the data that are required before the data sets are loaded into the repository.
5Map the data from its staging area model to its loading model.
6Move the data set to the repository.
7Load the data into the warehouse.

File Types

Question Answer
Flat filesFiles having no internal hierarchy
Heap filesFiles containing an unsorted set of records that are uniquely identified by a record id which allows them to be inserted or deleted using that id.
Index filesFiles that store a list of lookup field values from a data file – along with the location (address) in the data file of the corresponding record. Because the lookup field is much smaller than the entire record, the entire index will usually fit in main memory for quick look up. Once the address of the record is obtained from the index, the entire record can then be directly accessed from the data file instead of reading in the entire data file – record by record, in order to locate the desired one.
Hashed filesFiles are encrypted using hash functions that convert data consisting of various formats into numeric values. This allows for faster data lookup without the use of an index file.

Database Types

Question Answer
Operational databaseContains data about the things used to run the day-to-day operations of the business. It is designed mainly to support a company's daily operations.
Centralized databaseStores the entire information and application programs at a single site. The users at different locations access the central database to make processing.
Workgroup databaseSupports fewer than 50 users or a specific department in an organization. It has proprietary data formats and features that are specific to the version to the software.
Analytical databaseStores business, market or project data used in business analysis, and forecasting processes. It is used for tactical or strategic decision making.
Enterprise databaseSupports the entire organization and many users (more than 50, usually hundreds) across different departments. It provides support for present and expected future needs.

Basic File Terminology

Question Answer
DataRaw facts, such as a telephone number, a birth date, a customer name, and a year-to-date (YTD) sales value. Has little meaning unless it has been organized in some logical manner.
FieldA character or group of characters (alphabetic or numeric) that has a specific meaning. Used to define and store data.
RecordA logically connected set of one or more fields that describes a person, place, or thing. For example, the customer's name, address, phone number, date of birth, credit limit, and unpaid balance.
FileA collection of related records. For example, data about the students currently enrolled at Gigantic University.


Question Answer
HardwareAll of the system's physical devices.
Operating systemSoftware manages all hardware components and makes it possible for all other software to run on the computers.
DBMS softwareManages the database within the database system.
Application programs and utility softwareUsed to access and manipulate data in the DBMS and to manage the computer environment in which data access and manipulation take place.
System administratorsOversee the database system's general operations.
Database administratorsManage the DBMS and ensure that the database is functioning properly.
Database designersDesign the database structure.
System analysts and programmersDesign and implement the application programs.
End usersThe people who use the application programs to run the organization's daily operations.
ProceduresThe instructions and rules that govern the design and use of the database system.
DataThe collection of facts stored in the database.

DBMS Functions

Question Answer
Data dictionary managementThe DBMS stores definitions of the data elements and their relationships (metadata) in a data dictionary. In turn, all programs that access the data in the database work through the DBMS.
Data storage managementThe DBMS creates and manages the complex structures required for data storage, thus relieving you from the difficult task of defining and programming the physical data characteristics.
Data transformation and presentationThe DBMS transforms entered data to conform to required data structures. The DBMS relieves you of the chore of distinguishing between the logical data format and the physical data format.
Security managementThe DBMS creates a security system that enforces user security and data privacy.
Multiuser access controlTo provide data integrity and data consistency, the DBMS uses sophisticated algorithms to ensure that multiple users can access the database concurrently without compromising its integrity.
Backup and recovery managementThe DBMS provides backup and data recovery to ensure data safety and integrity.
Data integrity managementThe DBMS promotes and enforces integrity rules, thus minimizing data redundancy and maximizing data consistency.
Database access languages and application programming interfacesThe DBMS provides data access through a query language.
Database communication interfacesA current-generation DBMS accepts end-user requests via multiple, different network environments.