Teradata
👉 RDBMS is a database system that stores structured data in tables and maintains relationships between them using keys.
Teradata is a massively parallel processing (MPP) relational database management system (RDBMS) designed for large-scale data warehousing and analytics.
It is mainly used by enterprises to store and analyze very large volumes of structured data.
=====================================================================
🔹 Data Warehouse – Definition
A Data Warehouse (DW) is a centralized system used to store large volumes of historical, structured data from multiple sources for reporting and analytics.
👉 It is mainly used for business intelligence (BI) and decision-making.
👉 It is optimized for OLAP (Online Analytical Processing).
🔹 Popular Data Warehouse Platforms
| Teradata |
| Amazon Redshift |
| Snowflake |
| Google BigQuery |
| Azure Synapse Analytics |
=====================================================================
🔹 Data Mart – Definition
A Data Mart is a subset of a Data Warehouse designed for a specific department or business function (like Sales, Finance, HR).
👉 It contains focused, subject-specific data.
👉 Used for faster reporting for a particular team.
🔹 Example
If a company has a Data Warehouse containing all data:
- The Sales team needs only sales data
- The Finance team needs only revenue & expense data
👉 Separate Data Marts are created for each team.
🔹 Types of Data Mart
There are 3 main types of Data Marts:
📌 1️⃣ Dependent Data Mart
| Feature | Description |
|---|---|
| Source | Created from Data Warehouse |
| Data Quality | High (already cleaned & integrated) |
| Control | Centralized |
| Example | Sales Data Mart from Enterprise DW |
👉 Most common in large enterprises
👉 Ensures consistency across departments
📌 2️⃣ Independent Data Mart
| Feature | Description |
|---|---|
| Source | Directly from operational systems |
| Data Quality | May vary |
| Control | Decentralized |
| Example | HR team creates own reporting DB |
👉 Faster to implement
👉 Risk of data inconsistency
📌 3️⃣ Hybrid Data Mart
| Feature | Description |
|---|---|
| Source | Combination of DW + operational systems |
| Flexibility | High |
| Use Case | When some data is in DW and some external |
👉 Balanced approach
🔹 Quick Comparison Table
| Feature | Dependent | Independent | Hybrid |
|---|---|---|---|
| Source | Data Warehouse | Operational systems | Both |
| Data Consistency | High | Medium | High |
| Implementation Speed | Medium | Fast | Medium |
| Best For | Large organizations | Small teams | Mixed requirements |
Relational Database Terminology
🔷 1. Tables
Core structure for storing data.
Consist of rows (records) and columns (fields).
Supports indexing, primary keys, foreign keys, etc.
Column - A column always contains the same kind of information; every value in it describes the same attribute.
Row - A row is one instance of all the columns in the table.
Database - The database is a collection of logically related data. Many users access them for different purposes.
Primary Key - The primary key is used to identify a row in a table uniquely. No duplicate values are allowed in a primary key column, and they cannot accept NULL values. It is a mandatory field in a table.
Foreign Key - Foreign keys are used to build a relationship between the tables. A foreign key in a child table is defined as the primary key in the parent table.
A table can have more than one foreign key. It can accept duplicate values and also null values. Foreign keys are optional in a table.
🔷 2. Views
Virtual tables based on one or more tables or views.
Do not store data physically—just SQL definitions.
Use cases:
Restrict access to certain columns/rows.
Simplify complex joins/queries.
Standardize reporting logic.
X-Views: Special views that only show data specific to the accessing user.
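As a sketch (the table and column names are hypothetical), a view that restricts both columns and rows might look like:

```sql
-- Hypothetical base table: employees(emp_id, emp_name, salary, dept_id)
-- The view hides the salary column and exposes only one department's rows.
CREATE VIEW v_sales_employees AS
SELECT emp_id, emp_name, dept_id
FROM employees
WHERE dept_id = 10;
```

Users granted access to the view can query it without having any rights on the underlying table's sensitive columns.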
🔷 3. Macros
Store predefined SQL commands under a name.
Can accept parameters.
Useful for repeating business logic or securing access.
Example:
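A minimal sketch of a parameterized macro (the emp table and its columns are assumptions for illustration):

```sql
-- Define a macro that accepts a department number as a parameter.
CREATE MACRO get_dept_emps (dept_no INTEGER) AS (
  SELECT emp_id, emp_name
  FROM emp
  WHERE dept_id = :dept_no;   -- :dept_no references the macro parameter
);

-- Execute it:
EXEC get_dept_emps(10);
```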
🔷 4. Triggers
A set of SQL statements that automatically executes in response to data changes (INSERT, UPDATE, DELETE).
Attached to a table.
Used for enforcing rules or auditing changes.
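A hedged sketch of an auditing trigger (both tables and their columns are hypothetical):

```sql
-- Hypothetical tables:
--   employees(emp_id, salary)
--   salary_audit(emp_id, old_salary, new_salary, changed_at)
CREATE TRIGGER trg_salary_audit
AFTER UPDATE OF (salary) ON employees
REFERENCING OLD AS o NEW AS n
FOR EACH ROW
(
  -- Record the before and after salary whenever a row is updated.
  INSERT INTO salary_audit
  VALUES (n.emp_id, o.salary, n.salary, CURRENT_TIMESTAMP);
);
```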
🔷 5. Stored Procedures
A program written in SQL and stored on the database.
Can contain control-of-flow logic (IF, LOOP, etc.).
Can be scheduled, parameterized, or reused.
Example use case: ETL workflows, validations, auditing.
Supports error handling and transaction control.
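A sketch of a stored procedure with a parameter and simple flow control (table and column names are assumptions):

```sql
CREATE PROCEDURE update_bonus (IN p_dept INTEGER, OUT p_rows INTEGER)
BEGIN
  -- Give a 10% bonus to everyone in the requested department.
  UPDATE employees
  SET bonus = salary * 0.10
  WHERE dept_id = :p_dept;

  -- ACTIVITY_COUNT holds the number of rows affected by the last statement.
  SET p_rows = ACTIVITY_COUNT;
END;
```

It would be invoked with `CALL update_bonus(10, cnt);`.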
🔷 6. User-Defined Functions (UDFs)
Custom functions written in C/C++ or SQL.
Types:
Scalar UDFs – return a single value.
Aggregate UDFs – return a summarized value.
Table UDFs – return a table.
Used for:
Complex calculations.
Reusability.
Protecting PII (Personally Identifiable Information).
🔷 7. Indexes
Used to optimize data access.
Types:
Primary Index (UPI/NUPI) – determines data distribution.
Secondary Index – improves access for non-primary keys.
Join Index – speeds up joins.
Hash Index – supports fast lookups (less common now).
🔹 Teradata Components
Teradata Corporation built Teradata using MPP (Massively Parallel Processing) architecture.
Teradata works using multiple parallel components.

🔹 1️⃣ Parsing Engine (PE)
The Parsing Engine is a virtual processor (vproc) that interprets SQL requests, builds an execution plan, and dispatches the work. It sends its messages over the BYNET to the AMPs (Access Module Processors).
| Function | Explanation |
|---|---|
| SQL Parsing | Checks syntax |
| Optimization | Creates execution plan |
| Session Management | Handles user sessions |
| Request Distribution | Sends work to AMPs |
👉 PE does NOT store data.
🔹 2️⃣ BYNET
This is the message-passing layer or simply the networking layer in Teradata. It receives the execution plan from the parsing engine and passes it to AMPs and the nodes. After that, it gets the processed output from the AMPs and sends it back to the parsing engine.
BYNET is Teradata’s high-speed interconnect network that allows nodes, PEs, and AMPs to communicate internally in a Massively Parallel Processing (MPP) environment.
To maintain availability, two BYNETs (BYNET 0 and BYNET 1) run in parallel, so a secondary BYNET is always available if the primary one fails.
| Feature | Explanation |
|---|---|
| High-speed network | Connects nodes |
| Message passing | Transfers data between AMPs |
| Parallel communication | Enables MPP processing |
👉 Acts like backbone network.
🔹 3️⃣ AMP (Access Module Processor)
These are the virtual processors of Teradata. They receive the execution plan and the data from the parsing engine. The data will undergo any required conversion, filtering, aggregation, sorting, etc., and will be further sent to the corresponding disks for storage.
The AMP is a virtual processor (vproc) designed for and dedicated to managing a portion of the entire database. It performs all database management functions such as sorting, aggregating, and formatting data. The AMP receives data from the PE, formats rows, and distributes them to the disk storage units it controls. The AMP also retrieves the rows requested by the Parsing Engine.
Table rows are distributed across the AMPs for storage. Each AMP can read and write only the disks it owns (a shared-nothing design).
| Function | Explanation |
|---|---|
| Data Storage | Stores actual table data |
| Data Retrieval | Fetches rows |
| Index Handling | Manages primary index |
| Parallel Processing | Each AMP works independently |
👉 Data is distributed across AMPs.
🔹 4️⃣ Node
A Node is a physical server in the Teradata system. It includes both hardware and software components that support the execution of Teradata’s parallel database processing.
| Feature | Description |
|---|---|
| Physical Server | Runs Teradata software |
| Contains | Multiple AMPs + PE |
| Scalable | Add nodes to scale system |
🔹 Parallel Database Extensions (PDE)
The Parallel Database Extensions is a software interface layer that lies between the operating system and database. It controls the virtual processors (vprocs).
🔹 Disks
Disks are disk drives associated with an AMP that store the data rows. On current systems, they are implemented using a disk array.
🔄 Query Execution Flow in Teradata
Query Submission
A user or application submits an SQL query.
Parsing Engine (PE)
Parser: Checks SQL syntax and semantics.
Security: Validates user access.
Optimizer: Generates the most cost-effective execution plan.
Dispatcher: Sends the execution steps to BYNET.
BYNET
Routes the steps from the PE to the appropriate AMPs.
AMPs (Access Module Processors)
Execute the steps (e.g., read/write data).
Retrieve or update data from their own disks.
Send results back to the PE.
PE (again)
Assembles the rows.
Sends the final result back to the client.
=====================================================================
🚀 How Teradata Distributes Rows Across AMPs
In Teradata, row distribution is fully automatic, even, and hash-based. The core idea is that each AMP (Access Module Processor) is responsible for a portion of each table’s data, and Teradata ensures the rows are distributed evenly using a hashing algorithm.
🔁 Hash-Based Distribution Mechanism
When a row is inserted into a table:
Teradata hashes the Primary Index column(s) value.
The resulting hash value is mapped to an AMP.
That AMP stores the row on its associated disk.
📌 Hashing is done on Primary Index (PI) — it’s how Teradata determines where to store a row.
🎯 Why Even Distribution Matters
Ensures parallelism is fully utilized.
Avoids skew (when some AMPs hold significantly more rows than others).
Prevents slowest AMP bottleneck in full table operations.
✅ Ideally:
Each table's rows are evenly distributed across all AMPs.
All AMPs perform roughly equal work on large queries.
🔄 Row Distribution Mechanism
Teradata uses a hashing algorithm on the PI column(s).
Hash result determines the AMP where the row will be stored.
Leads to parallelism across AMPs.
🔄 Row Distribution via Hashing in Teradata
📥 How Data Gets Distributed Across AMPs
When a row is inserted into a Teradata table:
The Primary Index (PI) value of the row is input into a hashing algorithm.
The algorithm outputs a 32-bit Row Hash.
The high-order 16 bits are used to find a Hash Bucket Number.
The Hash Map uses that bucket number to determine which AMP will store the row.
The AMP stores the row in its associated vDisk.
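The steps above can be observed directly: Teradata exposes hash functions that show, for any PI value, the row hash, hash bucket, and target AMP (the customer table and cust_id column here are hypothetical):

```sql
SELECT cust_id,
       HASHROW(cust_id)                       AS row_hash,    -- 32-bit row hash
       HASHBUCKET(HASHROW(cust_id))           AS hash_bucket, -- bucket number
       HASHAMP(HASHBUCKET(HASHROW(cust_id)))  AS target_amp   -- AMP that stores the row
FROM customer
SAMPLE 5;
```

Grouping by `target_amp` over a whole table is a common way to check for skew.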
=====================================================================
🔹 Index Types in Teradata
In Teradata, indexes are mainly used for data distribution and faster retrieval.
1️⃣ PI (Primary Index)
The Primary Index (PI) is the core mechanism for distributing rows across AMPs and retrieving data efficiently.
The Primary Index decides:
- Where the data is stored (which AMP)
- How data is distributed across AMPs
👉 Every table MUST have one PI (unless it is defined as a NoPI table)
👉 It does NOT have to be unique
✅ Key Characteristics
| Attribute | Description |
|---|---|
| Purpose | Determines how rows are distributed and accessed |
| When defined? | At CREATE TABLE time only – cannot be altered later |
| Max Columns | Can consist of 1 to 64 columns |
| Stored in | Data Dictionary (DD) |
🧠 Best Practices for Choosing a PI
Choose a highly unique column to avoid skew (UPI preferred).
Consider query access patterns (frequent filters, joins).
Try to align PI with foreign keys in child tables for AMP-local joins.
Avoid columns with few distinct values (e.g., Gender, Status) as PI → causes skew.
📈 Why PI Matters for Performance
Every row access by PI is a 1-AMP operation → fastest.
Good PI choice = even AMP workload → maximum parallelism.
Bad PI choice = skew → one AMP overworked, others underutilized.
🚫 Common Mistakes
Using low cardinality columns as PI (e.g., "India", "Yes/No").
Forgetting that PI ≠ PK in Teradata.
Choosing PI without analyzing access patterns.
🔹 2️⃣ UPI (Unique Primary Index)
A Primary Index where column values are unique.
👉 No duplicate values allowed
👉 Best performance
👉 Even data distribution
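A UPI is declared at CREATE TABLE time; a minimal sketch with hypothetical names:

```sql
-- UPI: unique values, best (even) distribution
CREATE TABLE employees
( emp_id   INTEGER
, emp_name VARCHAR(50)
, dept_id  INTEGER )
UNIQUE PRIMARY INDEX (emp_id);

-- A NUPI on dept_id would instead be declared as:
--   PRIMARY INDEX (dept_id)   -- duplicates allowed; skew risk if few distinct values
```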
🔹 3️⃣ NUPI (Non-Unique Primary Index)
Primary Index where duplicates are allowed.
The hashing algorithm still determines the target AMP, but:
All rows with the same NUPI value go to the same AMP.
This can lead to uneven (skewed) distribution.
Example: Skew from Poor NUPI Choice
If you choose Order_Status (values = "C", "O") as a NUPI:
- All rows hash to just 2 AMPs.
- The remaining AMPs stay idle → severe performance degradation.
👉 Most commonly used
👉 Can cause data skew if not chosen properly
🔹 4️⃣ SI (Secondary Index)
A Secondary Index (SI) is an additional index created on a column other than the Primary Index to improve query performance.
A Secondary Index (SI) in Teradata is a physical access path to the data that is separate from the Primary Index (PI). It is used to improve query performance when the query's search condition does not use the PI columns.
👉 It does NOT control data distribution
👉 It helps in faster data retrieval
👉 It uses extra storage
🔍 Why Use a Secondary Index?
When queries frequently use non-PI columns in WHERE conditions.
To avoid full table scans.
To reduce query response time.
To optimize joins or aggregations on non-PI columns.
Additional index created for faster search on non-PI columns.
- Optional (a table can have zero or more SIs)
- Used for non-PI columns
- Improves read performance
- Requires extra storage
- Increases write overhead
🚦 Performance Tips
USI: Great for point queries (e.g., WHERE emp_name = 'Alice')
NUSI: Better for range queries (e.g., WHERE salary BETWEEN 50000 AND 70000)
Too many SIs can increase storage and slow down DML operations like INSERT/UPDATE.
🛠️ How It Works
When you define a SI, Teradata creates a sub-table that holds:
The index value.
The row ID (ROWID) of the corresponding data row in the base table.
For USI, the sub-table is hash-distributed.
For NUSI, the sub-table is locally stored on the same AMP as the base table row (no redistribution).
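As a sketch (the table and column names are assumptions), secondary indexes are created with CREATE INDEX:

```sql
-- USI: enforces uniqueness; typically a 2-AMP access path
CREATE UNIQUE INDEX (email_id) ON customer;

-- NUSI: duplicates allowed; subtable rows kept AMP-locally
CREATE INDEX (status) ON orders;
```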
🔹 5️⃣ USI (Unique Secondary Index)
A Unique Secondary Index (USI) is a secondary index created on a column where duplicate values are NOT allowed.
👉 Ensures uniqueness
👉 Used for fast lookup
👉 Stored in a separate index subtable
👉 Secondary index with unique values
👉 Stored in separate subtable
👉 Very fast lookup
🔹 How USI Works Internally
1. User runs query:
SELECT * FROM customer WHERE email_id = 'rk@gmail.com';
2. Teradata hashes the USI column value
3. Goes to the specific AMP storing the USI subtable row
4. Finds the RowID
5. Fetches the actual row from the base table
👉 Very fast because the value is unique (at most one row qualifies).
🔹 6️⃣ NUSI (Non-Unique Secondary Index)
A Non-Unique Secondary Index (NUSI) is a secondary index created on a column where duplicate values are allowed.
👉 Used for filtering
👉 Improves SELECT performance
👉 Does NOT control data distribution
👉 Allows duplicates
👉 Stored on same AMP
👉 Helps filtering
🔹 How NUSI Works Internally
-
Query runs:
SELECT * FROM orders WHERE status = 'SHIPPED'; -
Teradata checks NUSI subtable
-
Finds list of RowIDs for matching rows
-
Fetches rows from base table
👉 Since duplicates exist, multiple rows are returned.
✅ What is a NoPI Table?
A NoPI (No Primary Index) Table is a table that does not have a defined Primary Index. This was introduced in Teradata 13.0 to improve bulk load performance and simplify certain workloads.
CREATE TABLE table_x
(col_x INTEGER
,col_y CHAR(10)
,col_z DATE)
NO PRIMARY INDEX;

📦 Why Use NoPI Tables?
Ideal for staging or landing tables in ETL pipelines.
Great for bulk loading large volumes of data quickly.
Often used temporarily before redistributing data into final structured tables.
🧠 Best Practices:
Use NoPI tables for staging or intermediate ETL steps.
Avoid long-term use unless you really do not need joins or indexed access.
After loading, redistribute data into base tables with a PI for performance.
=====================================================================
🔀 Partitioned Primary Index (PPI)
🔹 What is PPI?
PPI is a Teradata feature that divides rows into partitions based on a defined column or expression, while still distributing rows across AMPs based on the Primary Index (PI).
👉 Data is distributed across AMPs (like normal PI)
👉 Within each AMP, data is divided into partitions
👉 Improves range query performance
Key Benefits:
Improves query performance for range-based queries.
Allows partition elimination, where irrelevant partitions are excluded at query time.
Ideal for incremental loads, archiving, and deletions.
⚙️ How PPI Works:
Hashing for Distribution:
PI → Hash Value → Determines which AMP stores the row.
Partitioning for Storage Order:
Once inside an AMP, rows are grouped into partitions.
Within each partition, data is sorted by Row Hash (row ID).
Partition Elimination:
The optimizer analyzes query filters (e.g., WHERE order_date BETWEEN …) and skips partitions that won’t qualify. This reduces I/O and scan time dramatically.
SELECT * FROM sales WHERE order_date = DATE '2024-05-10';
👉 Only May partition scanned
👉 Faster query
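The table behind such a query could be declared with monthly partitions like this (a hypothetical sales table):

```sql
CREATE TABLE sales
( order_id   INTEGER
, order_date DATE
, amount     DECIMAL(10,2) )
PRIMARY INDEX (order_id)
PARTITION BY RANGE_N (
  order_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
  EACH INTERVAL '1' MONTH );
```

Rows are still hashed to AMPs by `order_id`; within each AMP they are grouped into one partition per month, so a filter on `order_date` scans only the qualifying partition.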
🔹 When NOT to Use PPI
❌ Small tables
❌ Queries without filter on partition column
❌ Very high insert workloads (slightly more overhead)
🌲 Multilevel Partitioned Primary Index (MLPPI)
🔹 What is MLPPI?
MLPPI allows sub-partitioning — i.e., a partition within a partition — for greater granularity and performance.
Example Use Case:
Partition level 1: by Claim_Date
Sub-partition: by State
This improves query efficiency when filters match both levels, enabling multi-level partition elimination.
In Teradata, when using Multi-Level Partitioned Primary Index (MLPPI), you can define the partitions using either:
RANGE_N — for range-based partitioning (like dates, numbers)
CASE_N — for custom logic or discrete values
🧩 Use RANGE_N When:
You are partitioning by a range of values (e.g., dates or numeric ranges).
Best for time-series data, like logs or transaction tables.
🧩 Use CASE_N When:
You want to partition using categorical values, custom conditions, or non-uniform ranges.
Good when data doesn't fit into neat, equal ranges.
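A hedged MLPPI sketch combining both forms (the claims table and its columns are assumptions):

```sql
-- Level 1: monthly ranges via RANGE_N; Level 2: state buckets via CASE_N.
CREATE TABLE claims
( claim_id   INTEGER
, claim_date DATE
, state_cd   CHAR(2)
, amount     DECIMAL(12,2) )
PRIMARY INDEX (claim_id)
PARTITION BY (
  RANGE_N (claim_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
           EACH INTERVAL '1' MONTH),
  CASE_N (state_cd = 'CA', state_cd = 'TX', state_cd = 'NY', NO CASE) );
```

A query filtering on both `claim_date` and `state_cd` can then eliminate partitions at both levels.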
=====================================================================
Column Partitioning (also called Columnar or CPPI – Column-Partitioned Primary Index) stores columns separately instead of storing full rows together.
👉 Traditional Teradata = Row-based storage
👉 Column Partitioning = Column-based storage
👉 Improves performance for analytical queries
It can be applied to:
Tables (especially NoPI tables)
Join Indexes
You can also combine Column Partitioning with Row Partitioning (MLPPI) for a multi-dimensional partitioning strategy.
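A sketch of a column-partitioned table (names are illustrative; column partitioning applies to NoPI tables):

```sql
CREATE TABLE web_clicks
( click_ts TIMESTAMP(0)
, user_id  INTEGER
, url      VARCHAR(500) )
NO PRIMARY INDEX
PARTITION BY COLUMN;
```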
🔹 Normal Table (Row Storage)
If the table has 20 columns:
SELECT col1, col2 FROM big_table;
👉 Teradata reads all 20 columns from disk
👉 Higher I/O
🔹 Column Partitioning
👉 Data stored column-wise
👉 Only required columns read
SELECT col1, col2 FROM big_table;
👉 Only col1 and col2 scanned
👉 Reduced I/O
👉 Faster query
🔹 Row vs Column Partitioning
| Feature | Row-Based | Column Partitioning |
|---|---|---|
| Storage style | Row | Column |
| Best for | OLTP | Analytics |
| I/O for few columns | High | Low |
| Insert performance | Faster | Slightly slower |
🔹 Limitations
❌ Slightly slower inserts
❌ Not ideal for transactional workloads
❌ More complex storage structure
=========================================================================
🔹 Join Index (JI) in Teradata
A Join Index is a system-maintained physical structure that stores pre-joined or pre-aggregated data from one or more tables to speed up query performance.
👉 Reduces repeated join computation
👉 Stored physically like a table
👉 Can be row-based or columnar (Columnar Join Index = CJI)
Eliminate base table joins
Reduce data redistribution
Avoid aggregate processing
Improve query response time
🔷 1. What is a Single-Table Join Index (STJI)?
A Single-Table Join Index is a special type of join index that:
Is defined on only one table.
Uses a different column as its Primary Index (PI) than the base table’s PI.
Helps optimize joins between foreign keys and primary keys—especially in star schema or snowflake models.
Avoids row redistribution and base table access during queries.
🧩 Why Use STJI?
Teradata stores rows based on the PI. If you frequently join on a non-PI column (e.g., a foreign key), Teradata may redistribute data to perform the join.
✅ STJI prevents redistribution by pre-sorting and rehashing rows using the foreign key (or any chosen column).
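A sketch of such an index (assuming an orders table with PI order_id, a customer_id foreign key, and an order_amt column):

```sql
CREATE JOIN INDEX ji_orders_by_cust AS
SELECT customer_id, order_id, order_amt
FROM orders
PRIMARY INDEX (customer_id);
```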
What This Does:
Creates a copy of the orders table, hashed on customer_id instead of order_id.
When a query joins orders.customer_id to customers.customer_id, Teradata can use this index.
No data redistribution is needed because both tables are now hashed on the same column.
🔹 2. Multi-Table Join Index (MTJI) in Teradata
📌 Definition
A Multi-Table Join Index (MTJI) is a Join Index that combines two or more base tables into a precomputed, stored join.
👉 Speeds up queries that join multiple tables frequently
👉 Can be row-based or columnar
👉 Stored physically like a table
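A hedged sketch prejoining three hypothetical tables that are frequently queried together:

```sql
CREATE JOIN INDEX mtji_order_detail AS
SELECT o.order_id, c.cust_name, p.product_name, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products  p ON o.product_id  = p.product_id
PRIMARY INDEX (order_id);
```

When a user query matches this join, the optimizer can read the index instead of re-executing the three-way join.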
🔹 Advantages
✅ Speeds up multi-table queries
✅ Reduces CPU & AMP load
✅ Precomputed join simplifies SQL
✅ Good for data warehouse reporting
🔹 Disadvantages
❌ Uses extra disk space
❌ Slower inserts/updates/deletes on base tables (index maintenance)
❌ Needs careful design to select only necessary columns
🔹 3. Aggregate Join Index (AJI) in Teradata
An Aggregate Join Index (AJI) is a precomputed, stored table that combines aggregation and joins.
👉 Optimized for analytical queries
👉 Stores aggregated values (SUM, COUNT, AVG, etc.)
👉 Can be row-based or columnar
👉 Reduces query processing time on large fact tables
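A sketch of an AJI that precomputes daily totals (the sales table is hypothetical; aggregate expressions in an AJI must be aliased):

```sql
CREATE JOIN INDEX aji_daily_sales AS
SELECT sale_date, SUM(amount) AS total_amount
FROM sales
GROUP BY sale_date
PRIMARY INDEX (sale_date);
```

Queries summing `amount` by `sale_date` can then be answered from the index without scanning the fact table.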
🔹 Advantages
✅ Improves query performance for aggregation queries
✅ Reduces CPU and I/O on large tables
✅ Simplifies complex queries for users
✅ Can be combined with columnar storage for further optimization
🔹 Disadvantages
❌ Extra storage required
❌ Insert/Update/Delete slower due to maintenance
❌ Needs careful selection of aggregation columns
🔹 4. Columnar Join Index (CJI) in Teradata
A Columnar Join Index (CJI) is a precomputed, columnar storage index based on a join of one or more tables to improve query performance for analytical workloads.
👉 Combines benefits of Join Index (prejoin data) and Columnar storage (column-based access)
👉 Speeds up queries with aggregations, filters, and joins
🔹 How Columnar Join Index Works
1️⃣ Base tables: sales and customer
2️⃣ Query often executed:
SELECT c.region, SUM(s.amount)
FROM sales s
JOIN customer c
ON s.cust_id = c.cust_id
GROUP BY c.region;
3️⃣ CJI created:
CREATE JOIN INDEX cji_sales_customer AS
SELECT s.cust_id, s.amount, c.region
FROM sales s
JOIN customer c
ON s.cust_id = c.cust_id
PRIMARY INDEX (cust_id)
PARTITION BY COLUMN;
- Stores only the needed columns (amount and region)
- Precomputed join stored in columnar format
- Queries hitting the CJI skip join computation
🔹 Advantages
✅ Reduces query I/O
✅ Speeds up aggregation & join queries
✅ Only scans required columns
✅ Reduces CPU and AMP workload
🔹 Disadvantages
❌ Requires extra disk space
❌ Insert/Update/Delete may have slight overhead
❌ Needs careful column selection
🔍 What is a Full Table Scan (FTS)?
A Full Table Scan happens when Teradata (or any database engine) reads every row in a table to find rows that match a query’s condition—because no index or partitioning strategy can narrow it down.
⚠️ This is often the least efficient way to access data, especially for large tables.
❌ What happens here?
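For example, with a hypothetical employees table whose PI is emp_id:

```sql
-- PI is on emp_id; filtering on a non-indexed column forces a full table scan.
SELECT *
FROM employees
WHERE dept_id = 10;
```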
The Primary Index (PI) is on emp_id, but the query filters on dept_id, which is not indexed.
Result: Teradata must read every row to find those where dept_id = 10 → Full Table Scan.
=========================================================================
✅ Best Practices
Choose PKs based on business rules and uniqueness.
Choose PIs based on query patterns, join usage, and data distribution.
Don't always assume PK = PI. Analyze based on performance needs.
=====================================================================
🔹 Block-Level Compression (BLC)
| Feature | Description |
|---|---|
| Works at | Data block level |
| Compresses | Entire block of rows |
| Best For | Large tables |
| Enabled | Automatically in newer versions |
=====================================================================
🔹 Space in Teradata
In Teradata, Space means the disk storage allocated to a database or user to store tables, indexes, temporary data, etc.
👉 Space is allocated at Database/User level
👉 Managed automatically across AMPs
👉 Measured in bytes (KB, MB, GB)
🔍 1. Perm Space
Permanent Space (Perm Space) is the disk space allocated to store permanent database objects like tables, indexes, views, macros, etc.
👉 Stores actual business data
👉 Not automatically cleared
👉 Allocated to Database/User
👉 Distributed across AMPs
Definition: Maximum allowed space a user/database can occupy permanently.
Not pre-allocated, only consumed as needed.
Used for:
Table data (rows)
Stored procedures
User-Defined Functions (UDFs)
Required for creating tables.
| Feature | Description |
|---|---|
| Stores | Tables, indexes, views |
| Allocated to | Database/User |
| Remains until | Object is dropped |
| Most important | Yes |
🔍 2. Spool Space
Spool Space is temporary disk space used by Teradata to store intermediate results while executing a query.
👉 Used during SELECT, JOIN, GROUP BY, ORDER BY
👉 Automatically cleared after query finishes
👉 If exceeded → query fails
Definition: Temporary space used during SQL execution.
Used for:
Sorting
Joining
Aggregation
Storing query results
Reclaimed automatically when the query ends.
Inherits limits from parent user/database unless explicitly set.
📌 Example:
You have 100 GB total disk space.
10 GB used for user data
10 GB for overhead
❓ Available spool space?
✅ Answer: 80 GB
| Feature | Description |
|---|---|
| Used for | Intermediate query results |
| Automatically freed | After query ends |
| Very important | Yes (query may fail if exceeded) |
🔍 3. Temp Space
Temp Space (Temporary Space) is the disk space used to store Global Temporary Tables (GTTs) during a user session.
👉 Used only for temporary tables
👉 Data is session-specific
👉 Automatically cleared after session ends
Tables created in Temp Space survive restarts (not the data).
Treated like Perm Space, but temporarily allocated for session data.
| Feature | Description |
|---|---|
| Used for | Global Temporary Tables |
| Released | At session end |
| Stored in | Temp Space allocation |
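A sketch of a Global Temporary Table (names are illustrative); the definition persists in the dictionary while the rows live in Temp Space for the session:

```sql
CREATE GLOBAL TEMPORARY TABLE gtt_stage
( id  INTEGER
, val VARCHAR(100) )
ON COMMIT PRESERVE ROWS;  -- keep rows until the session ends, not just the transaction
```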
🔍 4. Journal Space
Journal Space is the disk space used to store before and after images of rows for data recovery purposes.
👉 Helps in recovering data if transaction fails
👉 Used for rollback and permanent journal
| Feature | Description |
|---|---|
| Used for | Recovery |
| Stores | Before/After image of rows |
| Optional | Can be disabled |
=====================================================================
📌 What is COLLECT STATISTICS in Teradata?
COLLECT STATISTICS (a.k.a. COLLECT STATS) gathers data demographics (like row count, value distribution, uniqueness) on columns, indexes, or partitions.
Collect Statistics is a process in Teradata that gathers metadata about table columns or indexes to help the optimizer choose the best execution plan.
✅ These stats are used by the Teradata Optimizer to:
Choose optimal query plans
Decide whether to use an index or full table scan
Improve join strategies, aggregations, and access paths
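Typical COLLECT STATISTICS forms look like this (the sales table and its columns are assumptions):

```sql
-- Single column
COLLECT STATISTICS COLUMN (cust_id) ON sales;

-- Multi-column (helps filters/joins that use both columns together)
COLLECT STATISTICS COLUMN (cust_id, order_date) ON sales;

-- Index statistics
COLLECT STATISTICS INDEX (order_id) ON sales;
```

Re-running the same statement refreshes the statistics after significant data changes.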
💡 Why It Matters
Without stats, Teradata assumes uniform distribution and guesses row counts, which can result in:
Wrong join order
Full table scans
Skewed AMP workload
Longer response times
🔹 Why We Use Collect Statistics
| Scenario | Benefit |
|---|---|
| Joins on large tables | Helps optimizer choose best join method (merge/hash/nested) |
| Aggregation queries | Chooses fastest access path |
| Skewed data | Helps optimizer avoid skew |
| Where / Group By / Order By | Provides accurate row estimates |
| Indexes | Helps optimizer decide which index to use |
🔹 How It Helps Optimizer
-
Determines row count and value distribution
-
Helps decide join strategy:
-
Merge Join
-
Hash Join
-
Product Join
-
-
Helps in AMP assignment
-
Reduces spool space usage
📊 What Does It Collect?
Distinct values count
Null count
Min/Max values
Frequency histogram
Row count
Skew factor
These help the optimizer estimate:
Selectivity (how many rows a filter returns)
Join cardinality
Join cost and order
Whether to use an index, join index, or full table scan
=====================================================================
🔐 Teradata Data Protection Features
Teradata provides multi-layered protection, both at the system level (hardware & OS) and database level (AMP-level redundancy, journaling, and transaction integrity).
🔁 Real-world analogy
Imagine a shopping mall with multiple backup generators (nodes) and alternate power cables (buses). If one cable or generator fails, power (data processing) continues seamlessly.
🗃️ DATABASE-LEVEL PROTECTION
1. Fallback
Each row stored twice on different AMPs (primary + fallback copy).
Fallback is a data protection feature in Teradata that ensures AMP-level fault tolerance by keeping a duplicate copy of each row of a table on a different AMP in the same cluster.
If one AMP goes down, fallback AMP serves the data.
✅ Transparent to users, but requires:
2x disk space
Extra I/O for write operations
🔁 Real-world analogy:
Imagine every important document is kept in two different filing cabinets. If one cabinet is lost in a fire (AMP failure), the second one keeps business running.
📦 How It Works:
For every row inserted into a Fallback table, a second copy is created.
The primary row goes to one AMP (based on hashing).
The fallback row is stored on a different AMP in the same cluster.
If an AMP fails, queries will automatically read from the fallback AMP.
🧠 Example:
Suppose a table employees is defined with Fallback.
Row emp_id = 101 hashes to AMP 1 → the primary row is stored there.
Teradata automatically stores a fallback copy on AMP 2 (same cluster).
If AMP 1 fails, Teradata accesses the copy on AMP 2 seamlessly — no data loss and no user disruption.
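Fallback is declared as a table option; a minimal sketch with hypothetical names:

```sql
CREATE TABLE employees ,FALLBACK
( emp_id   INTEGER
, emp_name VARCHAR(50) )
PRIMARY INDEX (emp_id);
```

Omitting the option (or specifying NO FALLBACK) stores only the primary copy of each row.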
2. Transient Journal (TJ)
The Transient Journal (TJ) is a temporary, automatic rollback mechanism used to protect data integrity during a transaction.
Keeps before images of rows during transactions.
To ensure atomicity in transactions — either all changes are committed, or none (rollback if failure occurs).
If transaction fails → rollback to original state.
🧩 How it works:
During a transaction, before any row is changed, the "before image" of the row is saved in the Transient Journal on the same AMP.
If the transaction succeeds (reaches END TRANSACTION), the saved copies are discarded.
If the transaction fails (e.g., AMP crash, session drop), the changes are rolled back using the saved before images.
✅ Fully automatic and transparent to the user.
3. Down AMP Recovery Journal
Activated when an AMP goes down.
Tracks changes that would have affected that AMP.
Ensures no data is lost during recovery.
4. Permanent Journal (Optional)
Logs before/after images permanently for auditing or full rollback.
Useful for regulatory or historical tracking.
5. Locks
Ensure data consistency.
Prevents multiple users from changing same data simultaneously.
🔐 Types: Read lock, Write lock, Exclusive lock, etc.
🛡️ What is RAID?
RAID (Redundant Array of Independent Disks) is a hardware-level protection mechanism that improves data availability and fault tolerance by distributing or duplicating data across multiple disk drives.
RAID is different from Fallback, which is a Teradata software-level protection strategy.
🔰 Types of RAID Supported in Teradata
🔁 RAID 1 – Mirroring
How it works: Every disk has an exact mirror copy on another disk.
Redundancy: 100%
If one disk fails: The mirrored disk is used — no performance loss.
Disk Space Cost: High (needs 2× storage).
Performance: Excellent, especially for read operations.
✅ Recommended for high-performance, critical systems.
🧮 RAID 5 – Parity-based
How it works: Data and parity blocks are striped across the disks in a rank, with the parity rotating among them; the parity allows any one missing block to be reconstructed.
Redundancy: Can tolerate one disk failure per rank.
If a disk fails: Data is recalculated using parity — performance decreases slightly.
Disk Space Cost: Low (more efficient than RAID 1).
Performance: Good under normal conditions, slower during recovery.
✅ Used where cost-efficiency is more important than top-tier speed.
🧮 RAID S – (RAID 5 variant used with EMC)
Similar to RAID 5
Found in EMC systems
Parity and striping logic is optimized for EMC array design
📘 What is BTEQ in Teradata?
BTEQ (Basic Teradata Query) is a command-line utility used to submit SQL to Teradata interactively or in batch scripts, with basic import/export and report-formatting capabilities.
🚀 When to Use BTEQ
Running SQL jobs in batch mode (cron jobs or scheduled tasks)
Exporting large volumes of data from Teradata
Loading data quickly into staging tables
Writing ETL-style scripts
Running SQL scripts with conditional or iterative logic
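A minimal batch script sketch (system name, credentials, file names, and the employee table are placeholders):

```sql
.LOGON tdprod/rk_user,mypassword;      -- placeholder system/credentials

-- Export a simple report to a flat file
.EXPORT REPORT FILE = emp_report.txt;
SELECT emp_id, name FROM employee ORDER BY emp_id;
.EXPORT RESET;

-- Conditional logic: abort the job if the previous statement failed
.IF ERRORCODE <> 0 THEN .QUIT 8;

.LOGOFF;
.QUIT 0;
```

Saved as a script file, this can run in batch mode, e.g. `bteq < job.bteq > job.log`.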
==========================================================================
🔹 FastLoad – Key Characteristics
- Loads only empty tables: Target table must be empty before loading.
- Supports only INSERT operations: Cannot perform updates or deletes.
- Requires no secondary indexes or referential integrity on target table.
- Uses multiple sessions (parallelism) for speed.
- Supports error handling with error tables.
- Supports checkpoints for recovery.
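The characteristics above show up directly in a FastLoad script sketch (system name, credentials, staging table, and input file are assumptions):

```sql
LOGON tdprod/rk_user,mypassword;          -- placeholder credentials

BEGIN LOADING staging_db.sales_stg        -- target must be an empty table
   ERRORFILES staging_db.sales_err1, staging_db.sales_err2   -- error tables
   CHECKPOINT 100000;                     -- checkpoint every 100k rows

SET RECORD VARTEXT ",";                   -- comma-delimited input
DEFINE in_id  (VARCHAR(10)),
       in_amt (VARCHAR(20))
   FILE = sales.csv;

INSERT INTO staging_db.sales_stg (sale_id, amount)
VALUES (:in_id, :in_amt);                 -- INSERT only; no UPDATE/DELETE

END LOADING;
LOGOFF;
```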
🔹 Table Constraints
Constraints are rules applied on tables or columns to enforce data integrity in Teradata.
👉 Ensures accuracy, consistency, and validity of data
👉 Supported mostly for Primary Indexes, Primary Keys, Foreign Keys, and Check conditions
In Teradata, a table constraint is a rule applied at the table level (not just column level) to enforce data integrity. Table constraints are typically used for:
Composite primary keys
Composite unique constraints
Foreign key constraints referencing multiple columns
Check constraints on multiple columns
1. Composite PRIMARY KEY (Table-level)
This ensures each (order_id, product_id) pair is unique.
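A sketch of the statement described above (table and column names are illustrative):

```sql
CREATE TABLE order_items (
   order_id   INTEGER NOT NULL,
   product_id INTEGER NOT NULL,
   quantity   INTEGER,
   PRIMARY KEY (order_id, product_id)   -- table-level composite key
);
```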
2. Composite UNIQUE Constraint
Ensures no two employees have the same email + phone combo.
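A possible sketch (names illustrative; Teradata requires NOT NULL on unique-constraint columns):

```sql
CREATE TABLE employee_contacts (
   emp_id INTEGER NOT NULL,
   email  VARCHAR(100) NOT NULL,
   phone  VARCHAR(20)  NOT NULL,
   UNIQUE (email, phone)    -- table-level composite unique constraint
);
```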
3. Table-Level FOREIGN KEY Constraint
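A sketch of a multi-column foreign key (both tables are hypothetical; the referenced columns must form the parent's primary key or a unique constraint):

```sql
CREATE TABLE orders (
   order_id INTEGER     NOT NULL,
   region   VARCHAR(10) NOT NULL,
   PRIMARY KEY (order_id, region)
);

CREATE TABLE order_lines (
   order_id INTEGER     NOT NULL,
   region   VARCHAR(10) NOT NULL,
   qty      INTEGER,
   FOREIGN KEY (order_id, region) REFERENCES orders (order_id, region)
);
```

Appending WITH NO CHECK OPTION to the REFERENCES clause declares "soft" RI that the optimizer can use but the engine does not enforce.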
4. CHECK Constraint
Restricts values based on multiple column logic.
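A sketch with a check spanning two columns (names illustrative):

```sql
CREATE TABLE bookings (
   booking_id INTEGER NOT NULL,
   start_dt   DATE,
   end_dt     DATE,
   CHECK (end_dt >= start_dt)   -- table-level, multi-column condition
) PRIMARY INDEX (booking_id);
```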
🧠 Notes for Teradata:
In Teradata, CHECK constraints and standard foreign keys are enforced by the database engine; only "soft" referential integrity (REFERENCES ... WITH NO CHECK OPTION) is declarative and unenforced.
If you use soft RI, validate referential integrity manually via joins or scripts.
🔹 Types of Tables
🔹 1. Permanent Table
Definition: Stores actual business data permanently.
- Persistence: Data remains until dropped manually; survives session end.
- Data is written to disk and protected using Fallback and/or RAID.
- Uses: OLTP/OLAP workloads, fact and dimension tables.
- Space: Uses Perm Space.
🧾 Example:
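A minimal sketch (names illustrative):

```sql
CREATE TABLE employee_perm (
   emp_id  INTEGER NOT NULL,
   name    VARCHAR(50),
   dept_id INTEGER
) PRIMARY INDEX (emp_id);   -- lives in Perm Space until explicitly dropped
```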
🔹 2. Volatile Table
Definition: Temporary tables whose definition and data exist only during the session.
- Persistence: Automatically dropped at session end.
- Data is stored in Spool space and lost after logout.
- Good for staging or session-specific processing.
- Uses: Intermediate calculations, session-level processing.
- Space: Uses Spool Space (not Perm Space).
🧾 Example:
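A minimal sketch (names illustrative):

```sql
CREATE VOLATILE TABLE stg_emp (
   emp_id INTEGER,
   name   VARCHAR(50)
) ON COMMIT PRESERVE ROWS;   -- keep rows between queries within the session
```

The table and its data disappear automatically at logoff.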
🔹 3. Global Temporary Table (GTT)
Definition: Permanent table definition, temporary data for each session.
- Persistence: Table definition persists; data is session-specific and deleted at session end.
- Requires an ON COMMIT clause (DELETE ROWS or PRESERVE ROWS).
- Uses: Temporary processing across multiple queries within a session.
- Space: Uses Temp Space.
🧾 Example:
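A minimal sketch (names illustrative):

```sql
CREATE GLOBAL TEMPORARY TABLE gt_emp (
   emp_id INTEGER,
   name   VARCHAR(50)
) ON COMMIT PRESERVE ROWS;   -- definition persists; each session gets its own rows
```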
🔹 4. Derived Table
Definition: Tables created on-the-fly inside queries, no explicit storage.
- Persistence: Exists only for the duration of the query; cannot be reused.
- Virtual table created by a subquery in the FROM clause.
- Uses: Query processing, subqueries.
- Space: Uses Spool Space.
🧾 Example:
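A minimal sketch (names illustrative):

```sql
SELECT d.dept_id, d.avg_sal
FROM (
   SELECT dept_id, AVG(salary) AS avg_sal
   FROM employee
   GROUP BY dept_id
) AS d                       -- derived table, materialized in Spool
WHERE d.avg_sal > 50000;
```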
🔹 5. Queue Table
Used for asynchronous processing (message queuing).
Follows FIFO (First In First Out) principle.
Supports consume-mode reads via SELECT AND CONSUME, which returns and deletes the oldest row.
🧾 Example:
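A minimal sketch (names illustrative; the first column must be the QITS insertion timestamp):

```sql
CREATE TABLE msg_queue, QUEUE (
   qits    TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),  -- queue insertion time
   msg_id  INTEGER,
   payload VARCHAR(200)
) PRIMARY INDEX (msg_id);

-- Returns (and removes) the oldest row; waits if the queue is empty
SELECT AND CONSUME TOP 1 * FROM msg_queue;
```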
🔹 6. Multiset Table
Allows duplicate rows.
Must be explicitly specified.
🧾 Example:
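A minimal sketch (names illustrative):

```sql
CREATE MULTISET TABLE web_clicks (
   user_id  INTEGER,
   click_ts TIMESTAMP(0)
) PRIMARY INDEX (user_id);   -- duplicate rows are allowed
```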
🔹 7. Set Table
Does not allow duplicate rows.
Teradata performs a duplicate row check on insert.
🧾 Example:
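A minimal sketch (names illustrative):

```sql
CREATE SET TABLE customers (
   cust_id INTEGER NOT NULL,
   name    VARCHAR(50)
) PRIMARY INDEX (cust_id);   -- inserting an exact duplicate row is rejected
```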
🔹 8. NoPI Table (No Primary Index)
Rows are not hashed by PI, but distributed round-robin.
Good for staging and bulk loading.
Can reduce skew in large loads.
🧾 Example:
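A minimal sketch (names illustrative; NoPI tables are always MULTISET):

```sql
CREATE MULTISET TABLE stg_load (
   col1 INTEGER,
   col2 VARCHAR(100)
) NO PRIMARY INDEX;          -- rows distributed round-robin, not by hash
```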
🔹 9. Column-Partitioned Table (Columnar)
Stores data column-wise instead of row-wise.
Improves performance for analytic queries on few columns.
🧾 Example:
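A minimal sketch (names illustrative; column partitioning requires a NoPI table):

```sql
CREATE MULTISET TABLE sales_cp (
   sale_id INTEGER,
   amount  DECIMAL(12,2),
   sale_dt DATE
) NO PRIMARY INDEX
PARTITION BY COLUMN;         -- each column stored in its own containers
```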
🔹 10. Fallback Table
Any table (except volatile and queue) can be fallback.
Stores duplicate copy of each row on a different AMP for data protection.
🧾 Example:
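A minimal sketch (names illustrative):

```sql
CREATE TABLE accounts, FALLBACK (
   acct_id INTEGER NOT NULL,
   balance DECIMAL(15,2)
) PRIMARY INDEX (acct_id);   -- each row also copied to a different AMP
```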
🔹 Types of SQL in Teradata
SQL commands are categorized based on purpose and functionality.
🔹 1️⃣ DDL – Data Definition Language
- Purpose: Defines or modifies database objects like tables, indexes, databases.
- Commands: CREATE, ALTER, DROP, RENAME (note: Teradata does not support TRUNCATE; use DELETE ALL instead)
- Example:
CREATE TABLE employee (
emp_id INT,
name VARCHAR(50),
dept_id INT
);
🔹 2️⃣ DML – Data Manipulation Language
- Purpose: Manipulates data stored in tables.
- Commands: INSERT, UPDATE, DELETE, MERGE
- Example:
INSERT INTO employee (emp_id, name, dept_id)
VALUES (1, 'RK', 10);
🔹 3️⃣ DQL – Data Query Language
- Purpose: Retrieves data from database tables.
- Command: SELECT
- Example:
SELECT name, dept_id
FROM employee
WHERE emp_id = 1;
🔹 4️⃣ DCL – Data Control Language
- Purpose: Controls access and permissions in the database.
- Commands: GRANT, REVOKE
- Example:
GRANT SELECT ON employee TO analyst_role;
🔹 5️⃣ TCL – Transaction Control Language
- Purpose: Manages transactions in the database.
- Commands: COMMIT, ROLLBACK, SAVEPOINT
- Example:
BEGIN TRANSACTION;
UPDATE employee SET dept_id = 20 WHERE emp_id = 1;
ROLLBACK; -- undo changes
🔹 Summary Table
| SQL Type | Purpose | Examples |
|---|---|---|
| DDL | Define/alter objects | CREATE, ALTER, DROP |
| DML | Manipulate data | INSERT, UPDATE, DELETE, MERGE |
| DQL | Query data | SELECT |
| DCL | Control access | GRANT, REVOKE |
| TCL | Manage transactions | COMMIT, ROLLBACK, SAVEPOINT |
🔹 Set Operators in Teradata
Set operators in Teradata are used to combine results of two or more queries into a single result set.
🔹 Rules for Set Operators
- Same number of columns in each query
- Compatible data types in corresponding columns
- Result column names are taken from the first query
- UNION → removes duplicates → slower than UNION ALL
- UNION ALL → faster, preserves duplicates
- INTERSECT / MINUS → useful for comparing datasets
- ORDER BY can be applied once, at the end of the full query
- Even though UNION ALL keeps all duplicate rows, SQL tables and set operations are inherently unordered.
- The order of rows in the result set is not guaranteed unless you explicitly use an ORDER BY clause.
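A sketch of the operators above (table names are illustrative):

```sql
-- UNION: removes duplicates across the two results
SELECT emp_id FROM current_emps
UNION
SELECT emp_id FROM former_emps
ORDER BY 1;                  -- single ORDER BY at the end of the full query

-- MINUS: rows in the first query but not the second
SELECT emp_id FROM current_emps
MINUS
SELECT emp_id FROM former_emps;
```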
Teradata String Manipulation Functions
| Function | Description | Syntax | Example | Output |
|---|---|---|---|---|
| CONCAT | Concatenates two or more strings | CONCAT(str1, str2, ...) | CONCAT('Hello',' ','RK') | Hello RK |
| LENGTH / CHARACTER_LENGTH | Returns number of characters in string | LENGTH(str) or CHARACTER_LENGTH(str) | CHARACTER_LENGTH('Robert') | 6 |
| LOWER | Converts string to lowercase | LOWER(str) | LOWER('ROBERT') | robert |
| UPPER | Converts string to uppercase | UPPER(str) | UPPER('Robert') | ROBERT |
| LPAD | Pads left side of string to specified length | LPAD(str, size, padstr) | LPAD('RK',5,'*') | ***RK |
| RPAD | Pads right side of string to specified length | RPAD(str, size, padstr) | RPAD('RK',5,'*') | RK*** |
| TRIM | Removes leading/trailing characters (default: spaces) | `TRIM([LEADING | TRAILING | BOTH] [char] FROM str)` | TRIM('  RK  ') | RK |
| LTRIM | Removes leading spaces | LTRIM(str) | LTRIM(' Robert') | Robert |
| RTRIM | Removes trailing spaces | RTRIM(str) | RTRIM('Robert ') | Robert |
| REPLACE (remove) | Removes search string | REPLACE(str, search) | REPLACE('Hello RK','RK') | Hello |
| REPLACE (replace) | Replaces search string with another | REPLACE(str, search, replace) | REPLACE('Hello RK','RK','AI') | Hello AI |
| REVERSE | Reverses the string | REVERSE(str) | REVERSE('Robert') | treboR |
| STRTOK | Returns the nth token split by a delimiter | STRTOK(str, delimiter, tokennum) | STRTOK('a,b,c', ',', 2) | b |
| STRPOS / POSITION | Returns starting position of substring | STRPOS(str, substr) or POSITION(substr IN str) | POSITION('e' IN 'Robert') | 4 |
| SUBSTR / SUBSTRING | Returns substring from position with length | SUBSTRING(str FROM pos FOR len) or SUBSTR(str,pos,len) | SUBSTRING('Robert' FROM 1 FOR 3) | Rob |
| CHR | Returns character from ASCII value | CHR(n) | CHR(65) | A |
| TO_UTF8 | Encodes string to UTF-8 varbinary | TO_UTF8(str) | TO_UTF8('RK') | 0x524B (binary) |
| FROM_UTF8 | Decodes UTF-8 varbinary to string | FROM_UTF8(binary) | FROM_UTF8(0x524B) | RK |
| TRANSLATE | Replaces characters in string based on mapping | SELECT TRANSLATE(str, from, to) | TRANSLATE('abc','abc','xyz') | xyz |
| INDEX | Locates position of character (Teradata extension) | INDEX(str, char) | INDEX('Robert','b') | 3 |
🔹 Teradata Date/Time Functions
1️⃣ Built-in Current Functions
| Function | Description | Example | Output |
|---|---|---|---|
| CURRENT_DATE | Returns the current date | SELECT CURRENT_DATE; | 2024-05-22 |
| CURRENT_TIME | Returns the current time | SELECT CURRENT_TIME; | 10:01:13 |
| CURRENT_TIMESTAMP | Returns current timestamp (date + time) | SELECT CURRENT_TIMESTAMP; | 2024-05-22 10:01:13.990000+00:00 |
| DATE | Returns current date (legacy) | SELECT DATE; | 2024-05-22 |
| TIME | Returns current time (legacy) | SELECT TIME; | 10:01:13 |
| DATABASE | Returns current database name | SELECT DATABASE; | TDUSER |
2️⃣ EXTRACT Function
Extracts portions of DATE, TIME, or TIMESTAMP.
| Part | Syntax | Example | Output |
|---|---|---|---|
| Year | EXTRACT(YEAR FROM date) | EXTRACT(YEAR FROM CURRENT_DATE); | 2024 |
| Month | EXTRACT(MONTH FROM date) | EXTRACT(MONTH FROM CURRENT_DATE); | 5 |
| Day | EXTRACT(DAY FROM date) | EXTRACT(DAY FROM CURRENT_DATE); | 22 |
| Hour | EXTRACT(HOUR FROM timestamp) | EXTRACT(HOUR FROM CURRENT_TIMESTAMP); | 10 |
| Minute | EXTRACT(MINUTE FROM timestamp) | EXTRACT(MINUTE FROM CURRENT_TIMESTAMP); | 1 |
| Second | EXTRACT(SECOND FROM timestamp) | EXTRACT(SECOND FROM CURRENT_TIMESTAMP); | 13.990000 |
3️⃣ Interval Arithmetic
Teradata INTERVAL functions allow date/time arithmetic.
| Interval Type | Description | Example | Output |
|---|---|---|---|
| YEAR | Add years | CURRENT_DATE + INTERVAL '04' YEAR; | 2028-05-22 |
| YEAR TO MONTH | Add years & months | CURRENT_DATE + INTERVAL '04-03' YEAR TO MONTH; | 2028-08-22 |
| DAY TO MINUTE | Add days, hours, minutes | CURRENT_TIMESTAMP + INTERVAL '03 05:10' DAY TO MINUTE; | 2024-05-25 15:11:13 |
| DAY TO SECOND | Add days, hours, minutes, seconds | Similar syntax | - |
4️⃣ Common Date Functions
| Function | Description | Example | Output |
|---|---|---|---|
| LAST_DAY(date) | Returns last day of the month | LAST_DAY(CURRENT_DATE); | 2024-05-31 |
| NEXT_DAY(date, weekday) | Returns next specified weekday | NEXT_DAY(CURRENT_DATE, 'MONDAY'); | 2024-05-27 |
| MONTHS_BETWEEN(date1, date2) | Returns months between two dates | MONTHS_BETWEEN(DATE '2024-05-22', DATE '2024-01-22'); | 4 |
| ADD_MONTHS(date, n) | Adds n months | ADD_MONTHS(CURRENT_DATE, 3); | 2024-08-22 |
| TO_DATE(string, format) | Converts string to DATE | TO_DATE('2024-05-22','YYYY-MM-DD'); | 2024-05-22 |
| TO_TIMESTAMP(string) | Converts string to TIMESTAMP | TO_TIMESTAMP('2024-05-22 10:01:13'); | 2024-05-22 10:01:13 |
| TRUNC(date, unit) | Truncates to unit | TRUNC(CURRENT_DATE,'MONTH'); | 2024-05-01 |
| ROUND(date, unit) | Rounds to nearest unit | ROUND(CURRENT_DATE,'MONTH'); | 2024-06-01 |
| NUMTODSINTERVAL(n,'DAY') | Converts numeric to day-time INTERVAL | NUMTODSINTERVAL(3,'DAY'); | 3 00:00:00 |
| NUMTOYMINTERVAL(n,'MONTH') | Converts numeric to year-month INTERVAL | NUMTOYMINTERVAL(5,'MONTH'); | 0-05 |
5️⃣ DISTINCT Option (Aggregate on Dates or Numeric)
- DISTINCT ensures the aggregate operates on unique values only.
Example:
SELECT COUNT(DISTINCT Deduction) FROM Salary;
SELECT SUM(DISTINCT x), AVG(DISTINCT y) FROM Salary;
🔹 CASE, COALESCE, and NULLIF
🔹 1️⃣ CASE Expression
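A sketch of all three expressions against a hypothetical Salary table (column names are assumptions):

```sql
SELECT emp_id,
       CASE                                    -- conditional branching
          WHEN net_pay >= 75000 THEN 'High'
          WHEN net_pay >= 35000 THEN 'Medium'
          ELSE 'Low'
       END AS pay_band,
       COALESCE(bonus, 0)   AS bonus_or_zero,  -- first non-NULL argument
       NULLIF(deduction, 0) AS ded_or_null     -- NULL when deduction = 0
FROM Salary;
```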
🔹 Teradata Joins Overview
Joins are used to combine rows from two or more tables based on related columns.
| Join Type | Description | Syntax | Example / Output |
|---|---|---|---|
| INNER JOIN | Returns rows only when there is a match in both tables | SELECT cols FROM Table1 INNER JOIN Table2 ON Table1.col = Table2.col; | Example: Join Student and Attendance on RollNo. RollNo 1005 is excluded because it has no attendance record. Output:1001 → 90%1002 → 72%1003 → 68%1004 → 95% |
| LEFT OUTER JOIN | Returns all rows from left table + matching rows from right table | SELECT cols FROM Table1 LEFT OUTER JOIN Table2 ON Table1.col = Table2.col; | RollNo 1005 from Student is included; % is NULL because attendance is missing. Output: 1001 → 90%, …, 1005 → NULL |
| RIGHT OUTER JOIN | Returns all rows from right table + matching rows from left table | SELECT cols FROM Table1 RIGHT OUTER JOIN Table2 ON Table1.col = Table2.col; | Similar to LEFT JOIN, but ensures all right table rows appear |
| FULL OUTER JOIN | Returns all rows from both tables, matching where possible | SELECT cols FROM Table1 FULL OUTER JOIN Table2 ON Table1.col = Table2.col; | Combines LEFT and RIGHT OUTER JOIN results. Non-matches from either table show NULL. |
| CROSS JOIN | Returns all combinations of rows from both tables (Cartesian product) | SELECT cols FROM Table1 CROSS JOIN Table2; | Example: Student RollNo = 1001 CROSS JOIN Attendance Output: 1001-1001 → 90%, 1001-1002 → 72%, 1001-1003 → 68%, 1001-1004 → 95% |
| SELF JOIN | Joins a table with itself | SELECT A.col, B.col FROM Table A, Table B WHERE A.col = B.col; | Useful to compare rows in the same table (e.g., Employee-Manager relationships) |
🔹 Important Teradata Notes on Joins with NULLs
- In SQL, NULL ≠ NULL, so joins on NULL columns usually do not match.
- Teradata follows the ANSI standard: rows with NULL in the join column do not match any other row (even another NULL) for INNER, LEFT, and RIGHT joins.
- CROSS JOIN is unaffected because it does not use an ON clause.
Tables
🔹 LEFT JOIN (t1 LEFT JOIN t2 ON t1.id = t2.id)
🔹 RIGHT JOIN (t1 RIGHT JOIN t2 ON t1.id = t2.id)
🔹 INNER JOIN (t1 INNER JOIN t2 ON t1.id = t2.id)
❌ No rows match, per the ANSI standard: in Teradata, NULL does not equal NULL, so an inner join on NULL values returns no row.
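A sketch of this behavior (assume t1.id holds 1, 2, NULL and t2.id holds 1, NULL):

```sql
SELECT t1.id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;
-- Returns only id = 1; the NULL rows never match each other
```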