Teradata


👉 RDBMS is a database system that stores structured data in tables and maintains relationships between them using keys.


Teradata is a massively parallel processing (MPP) relational database management system (RDBMS) designed for large-scale data warehousing and analytics.

It is mainly used by enterprises to store and analyze very large volumes of structured data.

=====================================================================

🔹 Data Warehouse – Definition

A Data Warehouse (DW) is a centralized system used to store large volumes of historical, structured data from multiple sources for reporting and analytics.

👉 It is mainly used for business intelligence (BI) and decision-making.
👉 It is optimized for OLAP (Online Analytical Processing).

🔹 Popular Data Warehouse Platforms

Teradata
Amazon Redshift
Snowflake
Google BigQuery
Azure Synapse Analytics


=====================================================================

🔹 Data Mart – Definition

A Data Mart is a subset of a Data Warehouse designed for a specific department or business function (like Sales, Finance, HR).

👉 It contains focused, subject-specific data.
👉 Used for faster reporting for a particular team.

🔹 Example

If a company has a Data Warehouse containing all data:

  • Sales team needs only sales data

  • Finance team needs only revenue & expense data

👉 Separate Data Marts are created for each team.

🔹 Types of Data Mart

There are 3 main types of Data Marts:


📌 1️⃣ Dependent Data Mart

Feature      | Description
Source       | Created from the Data Warehouse
Data Quality | High (already cleaned & integrated)
Control      | Centralized
Example      | Sales Data Mart built from the enterprise DW

👉 Most common in large enterprises
👉 Ensures consistency across departments


📌 2️⃣ Independent Data Mart

Feature      | Description
Source       | Directly from operational systems
Data Quality | May vary
Control      | Decentralized
Example      | HR team creates its own reporting DB

👉 Faster to implement
👉 Risk of data inconsistency


📌 3️⃣ Hybrid Data Mart

Feature     | Description
Source      | Combination of DW + operational systems
Flexibility | High
Use Case    | When some data is in the DW and some is external

👉 Balanced approach

🔹 Quick Comparison Table

Feature              | Dependent           | Independent         | Hybrid
Source               | Data Warehouse      | Operational systems | Both
Data Consistency     | High                | Medium              | High
Implementation Speed | Medium              | Fast                | Medium
Best For             | Large organizations | Small teams         | Mixed requirements


=====================================================================

Relational Database Terminology

🔷 1. Tables

  • Core structure for storing data.

  • Consist of rows (records) and columns (fields).

  • Supports indexing, primary keys, foreign keys, etc.

Columns - A column always contains the same kind of information or contains similar data.

Row - A row is one instance of all the columns in the table.

Database - A database is a collection of logically related data. Many users access it for different purposes.

Primary Key - The primary key is used to identify a row in a table uniquely. No duplicate values are allowed in a primary key column, and they cannot accept NULL values. It is a mandatory field in a table.

Foreign Key - Foreign keys are used to build relationships between tables. A foreign key in a child table references the primary key of the parent table.

A table can have more than one foreign key. Foreign key columns can accept duplicate values and NULL values. Foreign keys are optional in a table.
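The key concepts above can be sketched in SQL. The table and column names here (departments, employees) are illustrative, not from any specific schema:

```sql
-- Parent table: dept_id is the primary key (unique, NOT NULL).
CREATE TABLE departments (
    dept_id   INTEGER NOT NULL PRIMARY KEY,
    dept_name VARCHAR(50)
);

-- Child table: dept_id is a foreign key referencing the parent.
-- It may repeat and may be NULL; emp_id uniquely identifies each row.
CREATE TABLE employees (
    emp_id  INTEGER NOT NULL PRIMARY KEY,
    name    VARCHAR(50),
    dept_id INTEGER,
    FOREIGN KEY (dept_id) REFERENCES departments (dept_id)
);
```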


🔷 2. Views

  • Virtual tables based on one or more tables or views.

  • Do not store data physically—just SQL definitions.

  • Use cases:

    • Restrict access to certain columns/rows.

    • Simplify complex joins/queries.

    • Standardize reporting logic.

  • X-Views: Special views that only show data specific to the accessing user.
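A minimal sketch of a view that restricts both columns and rows; emp_view and the underlying Employees columns are assumed names for illustration:

```sql
-- Virtual table: stores only the SELECT definition, no data.
-- Exposes a subset of columns, and only active employees.
CREATE VIEW emp_view AS
SELECT EmployeeID, Name, Department
FROM Employees
WHERE Status = 'ACTIVE';
```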


🔷 3. Macros

  • Store predefined SQL commands under a name.

  • Can accept parameters.

  • Useful for repeating business logic or securing access.

  • Example:


    CREATE MACRO GetEmployee (emp_id INTEGER) AS
    (
      SELECT * FROM Employees WHERE EmployeeID = :emp_id;
    );
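A macro defined this way is run with EXEC; assuming the GetEmployee macro above and an employee with ID 1001:

```sql
-- Execute the macro, passing the parameter value.
EXEC GetEmployee (1001);
```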

🔷 4. Triggers

  • A set of SQL statements that executes automatically in response to data changes (INSERT, UPDATE, DELETE).

  • Attached to a table.

  • Used for enforcing rules or auditing changes.
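A hedged sketch of an auditing trigger; the salary_log table and the Employees columns are assumptions for illustration, and exact REFERENCING syntax may vary by Teradata release:

```sql
-- Fires after each salary UPDATE and records the change for auditing.
CREATE TRIGGER salary_audit
AFTER UPDATE OF (Salary) ON Employees
REFERENCING OLD ROW AS o NEW ROW AS n
FOR EACH ROW
(
  INSERT INTO salary_log (EmployeeID, old_salary, new_salary, change_ts)
  VALUES (n.EmployeeID, o.Salary, n.Salary, CURRENT_TIMESTAMP);
);
```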


🔷 5. Stored Procedures

  • A program written in SQL and stored in the database.

  • Can contain control-of-flow logic (IF, LOOP, etc.).

  • Can be scheduled, parameterized, or reused.

  • Example use case: ETL workflows, validations, auditing.

  • Supports error handling and transaction control.
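A minimal stored procedure sketch with a parameter and control-of-flow logic; the procedure name, table, and columns are illustrative:

```sql
CREATE PROCEDURE raise_salary (IN p_emp_id INTEGER, IN p_pct DECIMAL(5,2))
BEGIN
  -- Simple validation (control-of-flow) before applying the update.
  IF p_pct > 0 THEN
    UPDATE Employees
    SET Salary = Salary * (1 + p_pct / 100)
    WHERE EmployeeID = p_emp_id;
  END IF;
END;
```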


🔷 6. User-Defined Functions (UDFs)

  • Custom functions written in C/C++ or SQL.

  • Types:

    • Scalar UDFs – return a single value.

    • Aggregate UDFs – return a summarized value.

    • Table UDFs – return a table.

  • Used for:

    • Complex calculations.

    • Reusability.

    • Protecting PII (Personally Identifiable Information).


🔷 7. Indexes

  • Used to optimize data access.

  • Types:

    • Primary Index (UPI/NUPI) – determines data distribution.

    • Secondary Index – improves access for non-primary keys.

    • Join Index – speeds up joins.

    • Hash Index – supports fast lookups (less common now).


=====================================================================

🔹 Teradata Components

Teradata Corporation built Teradata using MPP (Massively Parallel Processing) architecture.

Teradata works using multiple parallel components.

🔹 1️⃣ Parsing Engine (PE)
The Parsing Engine is a virtual processor (vproc) that interprets SQL requests, receives input records, and passes data to the AMPs. To do that, it sends messages over the BYNET to the AMPs (Access Module Processors).

Function             | Explanation
SQL Parsing          | Checks syntax
Optimization         | Creates execution plan
Session Management   | Handles user sessions
Request Distribution | Sends work to AMPs

👉 PE does NOT store data.


🔹 2️⃣ BYNET

This is the message-passing layer or simply the networking layer in Teradata. It receives the execution plan from the parsing engine and passes it to AMPs and the nodes. After that, it gets the processed output from the AMPs and sends it back to the parsing engine.

BYNET is Teradata’s high-speed interconnect network that allows nodes, PEs, and AMPs to communicate internally in a Massively Parallel Processing (MPP) environment.

To maintain adequate availability, two BYNETs are present: BYNET 0 and BYNET 1. This ensures a secondary BYNET is available if the primary BYNET fails.

Feature                | Explanation
High-speed network     | Connects nodes
Message passing        | Transfers data between AMPs
Parallel communication | Enables MPP processing

👉 Acts as the backbone network of the system.


🔹 3️⃣ AMP (Access Module Processor)

These are the virtual processors of Teradata. They receive the execution plan and the data from the parsing engine. The data will undergo any required conversion, filtering, aggregation, sorting, etc., and will be further sent to the corresponding disks for storage.

The AMP is a virtual processor (vproc) designed for and dedicated to managing a portion of the entire database. It performs all database management functions such as sorting, aggregating, and formatting data. The AMP receives data from the PE, formats rows, and distributes them to the disk storage units it controls. The AMP also retrieves the rows requested by the Parsing Engine.

Table rows are distributed across the AMPs for storage. Only the owning AMP can read or write data on the disks it controls.

Function            | Explanation
Data Storage        | Stores actual table data
Data Retrieval      | Fetches rows
Index Handling      | Manages primary index
Parallel Processing | Each AMP works independently

👉 Data is distributed across AMPs.


🔹 4️⃣ Node

The basic unit of a Teradata system is called a Node. Each node has its own operating system, CPU, memory, a copy of the RDBMS software, and some disk space. A single cabinet can have one or more nodes in it.

A node is a physical server in the Teradata system. It includes both the hardware and software components that support Teradata’s parallel database processing.

Feature         | Description
Physical Server | Runs Teradata software
Contains        | Multiple AMPs + PEs
Scalable        | Add nodes to scale the system

🔹  Parallel Database Extensions (PDE)

Parallel Database Extensions (PDE) is a software interface layer between the operating system and the database. It controls the virtual processors (vprocs).


🔹 Disks

Disks are disk drives associated with an AMP that store the data rows. On current systems, they are implemented using a disk array.

🔄 Query Execution Flow in Teradata

  1. Query Submission
    A user or application submits an SQL query.

  2. Parsing Engine (PE)

    • Parser: Checks SQL syntax and semantics.

    • Security: Validates user access.

    • Optimizer: Generates the most cost-effective execution plan.

    • Dispatcher: Sends the execution steps to BYNET.

  3. BYNET
    Routes the steps from the PE to the appropriate AMPs.

  4. AMPs (Access Module Processors)

    • Execute the steps (e.g., read/write data).

    • Retrieve or update data from their own disks.

    • Send results back to the PE.

  5. PE (again)

    • Assembles the rows.

    • Sends the final result back to the client.

=====================================================================

🚀 How Teradata Distributes Rows Across AMPs

In Teradata, row distribution is fully automatic, even, and hash-based. The core idea is that each AMP (Access Module Processor) is responsible for a portion of each table’s data, and Teradata ensures the rows are distributed evenly using a hashing algorithm.

🔁 Hash-Based Distribution Mechanism

When a row is inserted into a table:

  1. Teradata hashes the Primary Index column(s) value.

  2. The resulting hash value is mapped to an AMP.

  3. That AMP stores the row on its associated disk.

📌 Hashing is done on Primary Index (PI) — it’s how Teradata determines where to store a row.

🎯 Why Even Distribution Matters

  • Ensures parallelism is fully utilized.

  • Avoids skew (when some AMPs hold significantly more rows than others).

  • Prevents slowest AMP bottleneck in full table operations.

✅ Ideally:

  • Each table's rows are evenly distributed across all AMPs.

  • All AMPs perform roughly equal work on large queries.

🔄 Row Distribution Mechanism

  • Teradata uses a hashing algorithm on the PI column(s).

  • Hash result determines the AMP where the row will be stored.

  • Leads to parallelism across AMPs.

🔄 Row Distribution via Hashing in Teradata

📥 How Data Gets Distributed Across AMPs

When a row is inserted into a Teradata table:

  1. The Primary Index (PI) value of the row is input into a hashing algorithm.

  2. The algorithm outputs a 32-bit Row Hash.

  3. The high-order 16 bits are used to find a Hash Bucket Number.

  4. The Hash Map uses that bucket number to determine which AMP will store the row.

  5. The AMP stores the row in its associated vDisk.
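The steps above can be observed with Teradata's built-in hash functions; the customer table and cust_id column are illustrative names:

```sql
-- HASHROW    → 32-bit row hash of the PI value
-- HASHBUCKET → hash bucket number derived from the row hash
-- HASHAMP    → AMP that owns that bucket
SELECT HASHAMP(HASHBUCKET(HASHROW(cust_id))) AS owning_amp,
       COUNT(*)                              AS row_count
FROM customer
GROUP BY 1
ORDER BY 1;   -- uneven counts per AMP indicate skew
```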

=====================================================================

🔹 Index Types in Teradata

In Teradata, indexes are mainly used for data distribution and faster retrieval. 

1️⃣ PI (Primary Index) 

The Primary Index (PI) is the core mechanism for distributing rows across AMPs and retrieving data efficiently.

The Primary Index decides:

  • Where the data is stored (which AMP).

  • How data is distributed across AMPs.

👉 Every table MUST have one PI
👉 It does NOT have to be unique

✅ Key Characteristics

Attribute     | Description
Purpose       | Determines how rows are distributed and accessed
When defined? | At CREATE TABLE time only – cannot be altered later
Max Columns   | Can consist of 1 to 64 columns
Stored in     | Data Dictionary (DD)

🧠 Best Practices for Choosing a PI

  • Choose a highly unique column to avoid skew (UPI preferred).

  • Consider query access patterns (frequent filters, joins).

  • Try to align PI with foreign keys in child tables for AMP-local joins.

  • Avoid columns with few distinct values (e.g., Gender, Status) as PI → causes skew.

📈 Why PI Matters for Performance

  • Every row access by PI is a 1-AMP operation → fastest.

  • Good PI choice = even AMP workload → maximum parallelism.

  • Bad PI choice = skew → one AMP overworked, others underutilized.

🚫 Common Mistakes

  • Using low cardinality columns as PI (e.g., "India", "Yes/No").

  • Forgetting that PI ≠ PK in Teradata.

  • Choosing PI without analyzing access patterns.



🔹 2️⃣ UPI (Unique Primary Index)

A Primary Index where column values are unique.

👉 No duplicate values allowed
👉 Best performance
👉 Even data distribution
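A UPI is declared at table creation; the table and column names here are illustrative:

```sql
-- emp_id values are unique → rows spread evenly across AMPs.
CREATE TABLE employee (
    emp_id INTEGER NOT NULL,
    name   VARCHAR(50)
) UNIQUE PRIMARY INDEX (emp_id);
```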



🔹 3️⃣ NUPI (Non-Unique Primary Index)

Primary Index where duplicates are allowed.

  • The hashing algorithm still determines the target AMP, but:

    • All rows with the same NUPI value go to the same AMP.

    • This can lead to uneven (skewed) distribution.

  •  Example: Skew from Poor NUPI Choice

    If you choose Order_Status (values = "C", "O") as a NUPI:

    • All rows get hashed to just 2 AMPs.

    • Remaining AMPs stay idle → severe performance degradation.

👉 Most commonly used
👉 Can cause data skew if not chosen properly

If many rows have same cust_id → skew risk.
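A NUPI sketch using assumed names; because cust_id repeats across orders, all of a customer's rows land on the same AMP:

```sql
-- Duplicates allowed: every order for one cust_id hashes to one AMP.
CREATE TABLE orders (
    order_id INTEGER NOT NULL,
    cust_id  INTEGER,
    amount   DECIMAL(10,2)
) PRIMARY INDEX (cust_id);
```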


🔹 4️⃣ SI (Secondary Index)

A Secondary Index (SI) is an additional index created on a column other than the Primary Index to improve query performance.

Secondary Index (SI) in Teradata is a physical access path to the data that is separate from the Primary Index (PI). It is used to improve query performance when the query's search condition does not use the PI columns.

👉 It does NOT control data distribution
👉 It helps in faster data retrieval
👉 It uses extra storage

🔍 Why Use a Secondary Index?

  • When queries frequently use non-PI columns in WHERE conditions.

  • To avoid full table scans.

  • To reduce query response time.

  • To optimize joins or aggregations on non-PI columns.

An additional index created for faster searches on non-PI columns.

  • Optional (a table can have zero or more SIs)
  • Used for non-PI columns
  • Improves read performance
  • Requires extra storage
  • Increases write overhead (subtable maintenance)





🚦 Performance Tips

  • USI: Great for point queries (e.g., WHERE emp_name = 'Alice')

  • NUSI: Better for range queries (e.g., WHERE salary BETWEEN 50000 AND 70000)

  • Too many SIs can increase storage and slow down DML operations like INSERT/UPDATE.

🛠️ How It Works

  • When you define a SI, Teradata creates a sub-table that holds:

    • The index value.

    • The row ID (ROWID) of the corresponding data row in the base table.

  • For USI, the sub-table is hash-distributed.

  • For NUSI, the sub-table is locally stored on the same AMP as the base table row (no redistribution).


🔹 5️⃣ USI (Unique Secondary Index)

A Unique Secondary Index (USI) is a secondary index created on a column where duplicate values are NOT allowed.

👉 Ensures uniqueness
👉 Stored in a separate index subtable
👉 Used for very fast lookups
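A USI is created separately from the table; the customer table and email_id column follow the worked example below:

```sql
-- Enforces uniqueness on email_id and enables fast indexed lookups.
CREATE UNIQUE INDEX (email_id) ON customer;
```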


🔹 How USI Works Internally

  1. User runs query:

    SELECT * FROM customer WHERE email_id = 'rk@gmail.com';
  2. Teradata hashes the USI column

  3. Goes to specific AMP storing USI subtable

  4. Finds RowID

  5. Fetches actual row from base table

👉 Very fast because it’s unique.



🔹 6️⃣ NUSI (Non-Unique Secondary Index)

A Non-Unique Secondary Index (NUSI) is a secondary index created on a column where duplicate values are allowed.

👉 Used for filtering; improves SELECT performance
👉 Does NOT control data distribution
👉 Allows duplicates
👉 Subtable rows stored on the same AMP as the base table rows
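A NUSI is created with the same statement minus UNIQUE; the orders table and status column follow the worked example below:

```sql
-- Duplicates allowed; each AMP keeps a local subtable of status values.
CREATE INDEX (status) ON orders;
```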



🔹 How NUSI Works Internally

  1. Query runs:

    SELECT * FROM orders WHERE status = 'SHIPPED';
  2. Teradata checks NUSI subtable

  3. Finds list of RowIDs for matching rows

  4. Fetches rows from base table

👉 Since duplicates exist, multiple rows are returned.



✅ What is a NoPI Table?

NoPI (No Primary Index) Table is a table that does not have a defined Primary Index. This was introduced in Teradata 13.0 to improve bulk load performance and simplify certain workloads.

CREATE TABLE table_x
(
    col_x INTEGER,
    col_y CHAR(10),
    col_z DATE
) NO PRIMARY INDEX;

📦 Why Use NoPI Tables?

  • Ideal for staging or landing tables in ETL pipelines.

  • Great for bulk loading large volumes of data quickly.

  • Often used temporarily before redistributing data into final structured tables.

🧠 Best Practices:

  • Use NoPI tables for staging or intermediate ETL steps.

  • Avoid long-term use unless you really do not need joins or indexed access.

  • After loading, redistribute data into base tables with a PI for performance.

=====================================================================

🔀 Partitioned Primary Index (PPI)

🔹 What is PPI?

PPI is a Teradata feature that divides rows into partitions based on a defined column or expression, while still distributing rows across AMPs based on the Primary Index (PI).

👉 Data is distributed across AMPs (like normal PI)
👉 Within each AMP, data is divided into partitions
👉 Improves range query performance

Key Benefits:

  • Improves query performance for range-based queries.

  • Allows partition elimination, where irrelevant partitions are excluded at query time.

  • Ideal for incremental loads, archiving, and deletions.

⚙️ How PPI Works:

  1. Hashing for Distribution:

    • PI → Hash Value → Determines which AMP stores the row.

  2. Partitioning for Storage Order:

    • Once inside an AMP, rows are grouped into partitions.

    • Within each partition, data is sorted by Row Hash (row ID).

  3. Partition Elimination:

    • The optimizer analyzes query filters (e.g., WHERE order_date BETWEEN …) and skips partitions that won’t qualify.

    • This reduces I/O and scan time dramatically.
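A hedged sketch of a PPI definition that would support the query below; the sales table, its columns, and the monthly ranges are assumptions:

```sql
-- Rows hash to AMPs by sale_id; within each AMP they are
-- grouped into one partition per month of order_date.
CREATE TABLE sales (
    sale_id    INTEGER NOT NULL,
    order_date DATE,
    amount     DECIMAL(10,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
    order_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
    EACH INTERVAL '1' MONTH
);
```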

SELECT * FROM sales WHERE order_date = DATE '2024-05-10';

👉 Only May partition scanned
👉 Faster query

🔹 When NOT to Use PPI

❌ Small tables
❌ Queries without filter on partition column
❌ Very high insert workloads (slightly more overhead)


🌲 Multilevel Partitioned Primary Index (MLPPI)

🔹 What is MLPPI?

MLPPI allows sub-partitioning — i.e., a partition within a partition — for greater granularity and performance.

Example Use Case:

  • Partition 1: by Claim_Date

  • Sub-partition: by State

This improves query efficiency when filters match both levels, enabling multi-level partition elimination.

In Teradata, when using Multi-Level Partitioned Primary Index (MLPPI), you can define the partitions using either:

  • RANGE_N — for range-based partitioning (like dates, numbers)

  • CASE_N — for custom logic or discrete values

🧩 Use RANGE_N When:

  • You are partitioning by a range of values (e.g., dates or numeric ranges).

  • Best for time-series data, like logs or transaction tables.


🧩 Use CASE_N When:

  • You want to partition using categorical values, custom conditions, or non-uniform ranges.

  • Good when data doesn't fit into neat, equal ranges.
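An MLPPI sketch combining RANGE_N and CASE_N; the claims table, claim_date, and state values are illustrative:

```sql
CREATE TABLE claims (
    claim_id   INTEGER NOT NULL,
    claim_date DATE,
    state      CHAR(2),
    amount     DECIMAL(12,2)
)
PRIMARY INDEX (claim_id)
PARTITION BY (
    -- Level 1: monthly ranges on claim_date
    RANGE_N (claim_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
             EACH INTERVAL '1' MONTH),
    -- Level 2: discrete buckets on state
    CASE_N (state = 'CA', state = 'TX', state = 'NY', NO CASE, UNKNOWN)
);
```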


=====================================================================

✅ What is Teradata Columnar (Column Partitioning)?

Column Partitioning (also called Columnar or CPPI – Column-Partitioned Primary Index) stores columns separately instead of storing full rows together.

👉 Traditional Teradata = Row-based storage
👉 Column Partitioning = Column-based storage
👉 Improves performance for analytical queries

It can be applied to:

  • Tables (especially NoPI tables)

  • Join Indexes

You can also combine Column Partitioning with Row Partitioning (MLPPI) for a multi-dimensional partitioning strategy.

🔹 Normal Table (Row Storage)

If table has 20 columns:

SELECT col1, col2 FROM big_table;

👉 Teradata reads all 20 columns from disk
👉 Higher I/O

🔹 Column Partitioning

👉 Data stored column-wise
👉 Only required columns read

SELECT col1, col2 FROM big_table;

👉 Only col1 and col2 scanned
👉 Reduced I/O
👉 Faster query
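Column partitioning is typically declared on a NoPI table; big_table's columns are illustrative:

```sql
-- Each column is stored in its own partition; a query touching
-- only col1 and col2 reads only those column containers.
CREATE TABLE big_table (
    col1 INTEGER,
    col2 VARCHAR(20),
    col3 DATE
)
NO PRIMARY INDEX
PARTITION BY COLUMN;
```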


🔹 Row vs Column Partitioning

Feature             | Row-Based | Column Partitioning
Storage style       | Row       | Column
Best for            | OLTP      | Analytics
I/O for few columns | High      | Low
Insert performance  | Faster    | Slightly slower

🔹 Limitations

❌ Slightly slower inserts
❌ Not ideal for transactional workloads
❌ More complex storage structure

=========================================================================

🔹 Join Index (JI) in Teradata

A Join Index is a system-maintained physical structure that stores pre-joined or pre-aggregated data to speed up query performance.

👉 Reduces repeated join computation
👉 Stored physically like a table
👉 Can be row-based or columnar (Columnar Join Index = CJI)

  • Eliminate base table joins

  • Reduce data redistribution

  • Avoid aggregate processing

  • Improve query response time

🔷 1. What is a Single-Table Join Index (STJI)?

Single-Table Join Index is a special type of join index that:

  • Is defined on only one table.

  • Uses a different column as its Primary Index (PI) than the base table’s PI.

  • Helps optimize joins between foreign keys and primary keys—especially in star schema or snowflake models.

  • Avoids row redistribution and base table access during queries.

🧩 Why Use STJI?

Teradata stores rows based on the PI. If you frequently join on a non-PI column (e.g., a foreign key), Teradata may redistribute data to perform the join.

✅ STJI prevents redistribution by pre-sorting and rehashing rows using the foreign key (or any chosen column).
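A sketch of the kind of statement being described; the orders table and its columns are assumed names:

```sql
-- Re-hashes the orders rows on customer_id instead of order_id,
-- so joins to customers.customer_id become AMP-local.
CREATE JOIN INDEX ji_orders_by_cust AS
SELECT customer_id, order_id, order_date, amount
FROM orders
PRIMARY INDEX (customer_id);
```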

What This Does:

  • Creates a copy of the orders table, hashed on customer_id instead of order_id.

  • When a query joins orders.customer_id to customers.customer_id, Teradata can use this index.

  • No data redistribution needed because both tables are now hashed on the same column.


🔹 2. Multi-Table Join Index (MTJI) in Teradata

📌 Definition

A Multi-Table Join Index (MTJI) is a Join Index that combines two or more base tables into a precomputed, stored join.

👉 Speeds up queries that join multiple tables frequently
👉 Can be row-based or columnar
👉 Stored physically like a table
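A hedged MTJI sketch over three assumed tables (sales, customer, product):

```sql
-- Precomputes the 3-way join once; the optimizer can answer
-- matching queries from the join index instead of the base tables.
CREATE JOIN INDEX mtji_sales_detail AS
SELECT c.region, p.category, s.amount
FROM sales s
JOIN customer c ON s.cust_id = c.cust_id
JOIN product  p ON s.prod_id = p.prod_id;
```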

🔹 Advantages

✅ Speeds up multi-table queries
✅ Reduces CPU & AMP load
✅ Precomputed join simplifies SQL
✅ Good for data warehouse reporting

🔹 Disadvantages

❌ Uses extra disk space
❌ Slower inserts/updates/deletes on base tables (index maintenance)
❌ Needs careful design to select only necessary columns


🔹 3. Aggregate Join Index (AJI) in Teradata

An Aggregate Join Index (AJI) is a precomputed, stored table that combines aggregation and joins.

👉 Optimized for analytical queries
👉 Stores aggregated values (SUM, COUNT, AVG, etc.)
👉 Can be row-based or columnar
👉 Reduces query processing time on large fact tables
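An AJI sketch storing a precomputed aggregate; the sales table and columns are illustrative:

```sql
-- Stores daily sales totals; queries asking for SUM(amount)
-- per order_date can be answered from this small structure.
CREATE JOIN INDEX aji_daily_sales AS
SELECT order_date, SUM(amount) AS total_amount
FROM sales
GROUP BY order_date;
```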


🔹 Advantages

✅ Improves query performance for aggregation queries
✅ Reduces CPU and I/O on large tables
✅ Simplifies complex queries for users
✅ Can be combined with columnar storage for further optimization

🔹 Disadvantages

❌ Extra storage required
❌ Insert/Update/Delete slower due to maintenance
❌ Needs careful selection of aggregation columns


🔹 4. Columnar Join Index (CJI) in Teradata

A Columnar Join Index (CJI) is a precomputed, columnar storage index based on a join of one or more tables to improve query performance for analytical workloads.

👉 Combines benefits of Join Index (prejoin data) and Columnar storage (column-based access)
👉 Speeds up queries with aggregations, filters, and joins


🔹 How Columnar Join Index Works

1️⃣ Base tables: sales and customer

2️⃣ Query often executed:

SELECT c.region, SUM(s.amount)
FROM sales s
JOIN customer c
ON s.cust_id = c.cust_id
GROUP BY c.region;

3️⃣ CJI created:

CREATE JOIN INDEX cji_sales_customer AS
SELECT s.cust_id, s.amount, c.region
FROM sales s
JOIN customer c
ON s.cust_id = c.cust_id
PRIMARY INDEX (cust_id)
PARTITION BY COLUMN;

  • Stores only the needed columns (cust_id, amount, region)

  • Precomputed join stored in columnar format

  • Queries hitting CJI skip join computation

🔹 Advantages

✅ Reduces query I/O
✅ Speeds up aggregation & join queries
✅ Only scans required columns
✅ Reduces CPU and AMP workload

🔹 Disadvantages

❌ Requires extra disk space
❌ Insert/Update/Delete may have slight overhead
❌ Needs careful column selection



=========================================================================

🔍 What is a Full Table Scan (FTS)?

Full Table Scan happens when Teradata (or any database engine) reads every row in a table to find rows that match a query’s condition—because no index or partitioning strategy can narrow it down.

⚠️ This is often the least efficient way to access data, especially for large tables.
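A hedged illustration of the scenario discussed below; an employee table with its PI on emp_id is assumed:

```sql
-- PI is emp_id, but the filter is on dept_id (not indexed):
-- every AMP must scan all of its rows → full table scan.
SELECT *
FROM employee
WHERE dept_id = 10;
```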


❌ What happens here?

  • The Primary Index (PI) is on emp_id, but the query filters on dept_id, which is not indexed.

  • Result: Teradata must read every row to find those where dept_id = 10 → Full Table Scan.


=========================================================================

✅ Best Practices

  • Choose PKs based on business rules and uniqueness.

  • Choose PIs based on query patterns, join usage, and data distribution.

  • Don't always assume PK = PI. Analyze based on performance needs.


=====================================================================

🔹 Compression in Teradata

In Teradata, Compression is used to reduce storage space and improve query performance.

👉 It reduces disk usage
👉 Improves I/O performance
👉 Saves storage cost

✅ It’s especially effective in large tables with:

  • Repeating values (like status = ‘Active’)

  • Nullable columns with many nulls

  • Low-cardinality columns (few distinct values)

🧱 How COMPRESS Works

Instead of storing the full value for every row, Teradata:

  • Stores common/repeating values only once in the table header.

  • Rows with those values point to the compressed value.

  • Values not listed in COMPRESS are stored normally.


🔹 1️⃣ Multi Value Compression (MVC)

📌 What It Does

Stores frequently repeating values using internal codes.

Multi-value compression has three variations:

  • COMPRESS – ONLY Nulls are compressed.
  • COMPRESS NULL – Nulls are compressed. (COMPRESS & COMPRESS NULL mean the same thing)
  • COMPRESS <constant> – Nulls and the specified <constant> value are compressed.

📌 Example

CREATE TABLE employee (
gender CHAR(1) COMPRESS ('M','F'),
status VARCHAR(10) COMPRESS ('ACTIVE','INACTIVE')
);

🔹 Benefit

If 80% rows have ACTIVE, storage is reduced.


🔹 2️⃣ Block Level Compression (BLC)
Performs compression on whole data blocks at the file system level before they are written to storage.
👉 Transparent to users.

Feature    | Description
Works at   | Data block level
Compresses | Entire block of rows
Best For   | Large tables
Enabled    | Automatically in newer versions

🔹 3️⃣ Algorithmic Compression (ALC)

When column values are mostly unique, Algorithmic Compression can provide better compression results than Multi-value Compression. You can define your own compression and decompression algorithms and apply them to data at the column level.

A user can create their own compression algorithms as external C/C++ scalar UDFs, and then specify them in the column definition of a CREATE TABLE/ALTER TABLE statement. Teradata invokes these algorithms to compress and decompress the data when the data is inserted into or retrieved from the table.

ALC allows you to implement the compression scheme that is most suitable for data in a particular column. The cost of compression and decompression depends on the algorithm chosen.

Feature           | Description
Uses algorithm    | To compress numeric patterns
Works for         | Dates, integers
Storage reduction | Sometimes higher than MVC

=====================================================================

🔹 Space in Teradata

In Teradata, Space means the disk storage allocated to a database or user to store tables, indexes, temporary data, etc.

👉 Space is allocated at Database/User level
👉 Managed automatically across AMPs
👉 Measured in bytes (KB, MB, GB)


🔍 1. Perm Space

Permanent Space (Perm Space) is the disk space allocated to store permanent database objects like tables, indexes, views, macros, etc.

👉 Stores actual business data
👉 Not automatically cleared
👉 Allocated to Database/User
👉 Distributed across AMPs

  • Definition: Maximum allowed space a user/database can occupy permanently.

  • Not pre-allocated, only consumed as needed.

  • Used for:

    • Table data (rows)

    • Stored procedures

    • User-Defined Functions (UDFs)

  • Required for creating tables.

Feature        | Description
Stores         | Tables, indexes, views
Allocated to   | Database/User
Remains until  | Object is dropped
Most important | Yes

🔍 2. Spool Space

Spool Space is temporary disk space used by Teradata to store intermediate results while executing a query.

👉 Used during SELECT, JOIN, GROUP BY, ORDER BY
👉 Automatically cleared after query finishes
👉 If exceeded → query fails

  • Definition: Temporary space used during SQL execution.

  • Used for:

    • Sorting

    • Joining

    • Aggregation

    • Storing query results

  • Reclaimed automatically when the query ends.

  • Inherits limits from parent user/database unless explicitly set.


📌 Example:

You have 100 GB total disk space.

  • 10 GB used for user data

  • 10 GB for overhead

  • ❓ Available spool space?
    ✅ Answer: 80 GB

Feature             | Description
Used for            | Intermediate query results
Automatically freed | After query ends
Very important      | Yes (query may fail if exceeded)

🔍 3. Temp Space

Temp Space (Temporary Space) is the disk space used to store Global Temporary Tables (GTTs) during a user session.

👉 Used only for temporary tables
👉 Data is session-specific
👉 Automatically cleared after session ends

  • Tables created in Temp Space survive restarts (not the data).

  • Treated like Perm Space, but temporarily allocated for session data.


Feature   | Description
Used for  | Global Temporary Tables
Released  | At session end
Stored in | Temp Space allocation

🔍 4. Journal Space

Journal Space is the disk space used to store before and after images of rows for data recovery purposes.

👉 Helps in recovering data if transaction fails
👉 Used for rollback and permanent journal

Feature  | Description
Used for | Recovery
Stores   | Before/After images of rows
Optional | Can be disabled

=====================================================================

📌 What is COLLECT STATISTICS in Teradata?

COLLECT STATISTICS (a.k.a. COLLECT STATS) gathers data demographics (like row count, value distribution, uniqueness) on columns, indexes, or partitions.

Collect Statistics is a process in Teradata that gathers metadata about table columns or indexes to help the optimizer choose the best execution plan.

✅ These stats are used by the Teradata Optimizer to:

  • Choose optimal query plans

  • Decide whether to use an index or full table scan

  • Improve join strategies, aggregations, and access paths
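The basic syntax, using illustrative table and column names:

```sql
-- Column-level statistics
COLLECT STATISTICS ON sales COLUMN (cust_id);

-- Index-level statistics
COLLECT STATISTICS ON sales INDEX (order_id);

-- Refresh everything previously collected on the table
COLLECT STATISTICS ON sales;
```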

💡 Why It Matters

Without stats, Teradata assumes uniform distribution and guesses row counts, which can result in:

  • Wrong join order

  • Full table scans

  • Skewed AMP workload

  • Longer response times

🔹 Why We Use Collect Statistics

Scenario                    | Benefit
Joins on large tables       | Helps optimizer choose best join method (merge/hash/nested)
Aggregation queries         | Chooses fastest access path
Skewed data                 | Helps optimizer avoid skew
Where / Group By / Order By | Provides accurate row estimates
Indexes                     | Helps optimizer decide which index to use

🔹 How It Helps Optimizer

  1. Determines row count and value distribution

  2. Helps decide join strategy:

    • Merge Join

    • Hash Join

    • Product Join

  3. Helps in AMP assignment

  4. Reduces spool space usage

📊 What Does It Collect?

  • Distinct values count

  • Null count

  • Min/Max values

  • Frequency histogram

  • Row count

  • Skew factor

These help the optimizer estimate:

  • Selectivity (how many rows a filter returns)

  • Join cardinality

  • Join cost and order

  • Whether to use an index, join index, or full table scan
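As a minimal sketch, statistics are collected per column or index and can be refreshed in bulk (the table and column names here are hypothetical):

```sql
-- Collect stats on a join/filter column and on an index
COLLECT STATISTICS COLUMN (dept_id) ON employee;
COLLECT STATISTICS INDEX (emp_id) ON employee;

-- Re-collect everything previously defined on the table
COLLECT STATISTICS ON employee;

-- Inspect what has been collected
HELP STATISTICS employee;
```

Stats go stale as data changes, so re-collecting after large loads is common practice.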

=====================================================================

🔐 Teradata Data Protection Features

Teradata provides multi-layered protection, both at the system level (hardware & OS) and database level (AMP-level redundancy, journaling, and transaction integrity).


🔁 Real-world analogy

Imagine a shopping mall with multiple backup generators (nodes) and alternate power cables (buses). If one cable or generator fails, power (data processing) continues seamlessly.


🗃️ DATABASE-LEVEL PROTECTION

1. Fallback

  • Each row stored twice on different AMPs (primary + fallback copy).

  • Fallback is a data protection feature in Teradata that ensures AMP-level fault tolerance by keeping a duplicate copy of each row of a table on a different AMP in the same cluster.

  • If one AMP goes down, fallback AMP serves the data.

✅ Transparent to users, but requires:

  • 2x disk space

  • Extra I/O for write operations

🔁 Real-world analogy:

Imagine every important document is kept in two different filing cabinets. If one cabinet is lost in a fire (AMP failure), the second one keeps business running.

📦 How It Works:

  • For every row inserted into a Fallback table, a second copy is created.

  • The primary row goes to one AMP (based on hashing).

  • The fallback row is stored on a different AMP in the same cluster.

  • If an AMP fails, queries will automatically read from the fallback AMP.

🧠 Example:

Suppose a table employees is defined with Fallback.

  • Row emp_id = 101 hashes to AMP 1 → primary row is stored here.

  • Teradata automatically stores a fallback copy on AMP 2 (same cluster).

If AMP 1 fails, Teradata accesses the copy on AMP 2 seamlessly — no data loss and no user disruption.
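The behavior above is declared per table in DDL and can be changed later (table name hypothetical):

```sql
-- Each row of this table gets a fallback copy on another AMP in the cluster
CREATE TABLE employees (
  emp_id INTEGER,
  name   VARCHAR(100)
) FALLBACK;

-- Fallback can be switched off (or on) after creation
ALTER TABLE employees, NO FALLBACK;
```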


2. Transient Journal (TJ)

  • The Transient Journal (TJ) is a temporary, automatic rollback mechanism used to protect data integrity during a transaction.

  • Keeps before images of rows during transactions.

  • To ensure atomicity in transactions — either all changes are committed, or none (rollback if failure occurs).

  • If transaction fails → rollback to original state.

🧩 How it works:

  1. During a transaction, before any row is changed, the "before image" of the row is saved in the Transient Journal on the same AMP.

  2. If the transaction succeeds (gets an END TRANSACTION), the saved copies are discarded.

  3. If the transaction fails (e.g., AMP crash, session drop), the changes are rolled back using the saved before images.

✅ Fully automatic and transparent to the user.
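The rollback path can be seen with an explicit transaction in Teradata (BTET) mode; the table name is hypothetical:

```sql
BT;                                          -- begin explicit transaction
UPDATE employee SET salary = salary * 1.10;  -- before images saved to the TJ
ROLLBACK;                                    -- restores rows from the TJ
-- An ET; here instead would commit and discard the before images
```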


3. Down AMP Recovery Journal

  • Activated when an AMP goes down.

  • Tracks changes that would have affected that AMP.

  • Ensures no data is lost during recovery.


4. Permanent Journal (Optional)

  • Logs before/after images permanently for auditing or full rollback.

  • Useful for regulatory or historical tracking.
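A hedged sketch of setting one up (database, size, and table names are hypothetical):

```sql
-- Database with a default permanent journal table
CREATE DATABASE sales_db AS
  PERM = 20000000,
  DEFAULT JOURNAL TABLE = sales_db.sales_jrnl;

-- Table that writes before and after images to that journal
CREATE TABLE sales_db.orders,
  BEFORE JOURNAL,
  AFTER JOURNAL
(
  order_id INTEGER,
  amount   DECIMAL(10,2)
);
```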


5. Locks

  • Ensure data consistency.

  • Prevent multiple users from changing the same data simultaneously.

🔐 Types: Read lock, Write lock, Exclusive lock, etc.
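A common lock-related idiom is the LOCKING request modifier, e.g. an access ("dirty read") lock that does not block concurrent writers (table name hypothetical):

```sql
-- Read without waiting for write locks; may see uncommitted data
LOCKING TABLE employee FOR ACCESS
SELECT * FROM employee;
```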

==========================================================================

🛡️ What is RAID?

RAID (Redundant Array of Independent Disks) is a hardware-level protection mechanism that improves data availability and fault tolerance by distributing or duplicating data across multiple disk drives.

RAID is different from Fallback, which is a Teradata software-level protection strategy.

🔰 Types of RAID Supported in Teradata

🔁 RAID 1 – Mirroring

  • How it works: Every disk has an exact mirror copy on another disk.

  • Redundancy: 100%

  • If one disk fails: The mirrored disk is used — no performance loss.

  • Disk Space Cost: High (needs 2× storage).

  • Performance: Excellent, especially for read operations.

✅ Recommended for high-performance, critical systems.


🧮 RAID 5 – Parity-based

  • How it works: Data blocks and a parity block are striped across the disks of a rank (with 4 disks, each stripe stores data on three disks and parity on the fourth, and the parity position rotates). The parity allows any one missing block to be reconstructed.

  • Redundancy: Can tolerate one disk failure per rank.

  • If a disk fails: Data is recalculated using parity — performance decreases slightly.

  • Disk Space Cost: Low (more efficient than RAID 1).

  • Performance: Good under normal conditions, slower during recovery.

✅ Used where cost-efficiency is more important than top-tier speed.


🧮 RAID S – (RAID 5 variant used with EMC)

  • Similar to RAID 5

  • Found in EMC systems

  • Parity and striping logic is optimized for EMC array design

==========================================================================

📘 What is BTEQ in Teradata?

BTEQ (Basic Teradata Query) is a command-line utility that enables users to interact with the Teradata database. It is widely used by developers, DBAs, and data engineers to run SQL scripts, perform data exports/imports, and automate batch processing tasks.

🚀 When to Use BTEQ

  • Running SQL jobs in batch mode (cron jobs or scheduled tasks)

  • Exporting large volumes of data from Teradata

  • Loading data quickly into staging tables

  • Writing ETL-style scripts

  • Running SQL scripts with conditional or iterative logic
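A minimal BTEQ batch script covering logon, an export, and error handling (the logon string, database, and file names are hypothetical):

```sql
.LOGON tdpid/userid,password
DATABASE sales_db;

.EXPORT REPORT FILE = emp_report.txt
SELECT emp_id, name FROM employee;
.EXPORT RESET

.IF ERRORCODE <> 0 THEN .QUIT 8
.LOGOFF
.QUIT
```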

==========================================================================

FASTLOAD
Teradata FastLoad is a high-performance utility designed to quickly load large volumes of data into empty tables in Teradata. It bypasses the traditional insert path and loads data directly into the database, achieving very fast bulk loading speeds.

Key Features:
  • Loads only empty tables: Target table must be empty before loading.
  • Supports only INSERT operations: Cannot perform updates or deletes.
  • Requires no secondary indexes or referential integrity on target table.
  • Uses multiple sessions (parallelism) for speed.
  • Supports error handling with error tables.
  • Supports checkpoints for recovery.
Example FastLoad script:

SESSIONS 4;
RECORD 100 THRU 100000;
ERRLIMIT 25;
LOGON tdpid/userid,password;

DROP TABLE FastTable;
DROP TABLE Error1;
DROP TABLE Error2;

CREATE TABLE FastTable, NO FALLBACK (
  ID INTEGER,
  UFACTOR INTEGER,
  MISC CHAR(42)
) PRIMARY INDEX (ID);

DEFINE
  ID (INTEGER),
  UFACTOR (INTEGER),
  MISC (CHAR(42))
  FILE=FileName;

SHOW;

BEGIN LOADING FastTable ERRORFILES Error1, Error2 CHECKPOINT 10000;
INSERT INTO FastTable (ID, UFACTOR, MISC) VALUES (:ID, :UFACTOR, :MISC);
END LOADING;

LOGOFF;



MULTI LOAD
Teradata MultiLoad is a powerful utility designed to perform high-speed batch maintenance of multiple tables simultaneously in Teradata. Unlike FastLoad, which only inserts into empty tables, MultiLoad can insert, update, delete, and upsert rows in one or more existing tables.

Can load up to 5 tables and perform up to 20 DML operations (insert, update, delete, upsert) within one job/script
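A minimal IMPORT-task sketch showing the upsert pattern (the logon string, layout, and file names are hypothetical):

```sql
.LOGTABLE sales_db.ml_log;
.LOGON tdpid/userid,password;

.BEGIN MLOAD TABLES employee;

.LAYOUT emp_layout;
.FIELD emp_id * INTEGER;
.FIELD name   * CHAR(50);

.DML LABEL upsert_emp
DO INSERT FOR MISSING UPDATE ROWS;        -- upsert: update, else insert
UPDATE employee SET name = :name WHERE emp_id = :emp_id;
INSERT INTO employee (emp_id, name) VALUES (:emp_id, :name);

.IMPORT INFILE emp_data.dat
  LAYOUT emp_layout
  APPLY upsert_emp;

.END MLOAD;
.LOGOFF;
```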

==========================================================================
ANSI VS BTET

Teradata sessions run in one of two transaction modes:

  • BTET (Teradata) mode: each request is an implicit transaction and commits automatically; explicit transactions are bracketed with BT; ... ET;. String comparisons are not case-specific by default, and oversized character data is silently truncated on insert.

  • ANSI mode: a transaction opens implicitly with the first statement and stays open until an explicit COMMIT or ROLLBACK. Comparisons are case-specific, and truncating non-blank characters raises an error.

👉 The mode is set per session (e.g., .SET SESSION TRANSACTION BTET or ANSI in BTEQ, before logon).

=======================================================================

Table constraint

Constraints are rules applied on tables or columns to enforce data integrity in Teradata.

👉 Ensures accuracy, consistency, and validity of data

👉 Supported mostly for Primary Indexes, Primary Keys, Foreign Keys, and Check conditions

In Teradata, a table constraint is a rule applied at the table level (not just column level) to enforce data integrity. Table constraints are typically used for:

  • Composite primary keys

  • Composite unique constraints

  • Foreign key constraints referencing multiple columns

  • Check constraints on multiple columns

1. Composite PRIMARY KEY (Table-level)


CREATE TABLE orders (
  order_id INT,
  product_id INT,
  quantity INT,
  PRIMARY KEY (order_id, product_id)
);

This ensures each (order_id, product_id) pair is unique.


2. Composite UNIQUE Constraint


CREATE TABLE employee (
  emp_id INT,
  email VARCHAR(100),
  phone_number VARCHAR(15),
  UNIQUE (email, phone_number)
);

Ensures no two employees have the same email + phone combo.


3. Table-Level FOREIGN KEY Constraint


CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  quantity INT,
  FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

4. CHECK Constraint


CREATE TABLE payments (
  payment_id INT,
  amount DECIMAL(10,2),
  method VARCHAR(20),
  CHECK (amount > 0 AND method IN ('cash', 'card', 'upi'))
);

Restricts values based on multiple column logic.

🧠 Notes for Teradata:

  • CHECK constraints and standard FOREIGN KEY (referential integrity) constraints are enforced by Teradata.

  • Teradata also supports "soft" referential integrity (REFERENCES ... WITH NO CHECK OPTION), which is declarative only: it informs the optimizer (e.g., for join elimination) but is not enforced by the engine.

  • When soft RI is used, integrity must be maintained manually, e.g., via ETL validation or join-based checks.
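Teradata's "soft" referential integrity is declared with WITH NO CHECK OPTION; a hedged sketch (names hypothetical):

```sql
-- Declarative-only FK: not enforced, but usable by the optimizer
CREATE TABLE order_items (
  order_id   INTEGER,
  product_id INTEGER,
  quantity   INTEGER,
  FOREIGN KEY (order_id)
    REFERENCES WITH NO CHECK OPTION orders (order_id)
);
```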


=======================================================================

Types of Table

🔹 1. Permanent Table

  • Definition: Stores actual business data permanently.

  • Persistence: Data remains until dropped manually.

  • Stores persistent data that is not deleted after a session ends.

  • Data is written to disk and protected using fallback or RAID.

  • Uses: OLTP/OLAP, fact and dimension tables.

  • Space: Uses Perm Space.

🧾 Example:


CREATE TABLE employee (
  emp_id INTEGER,
  name VARCHAR(100)
);

🔹 2. Volatile Table

  • Definition: Temporary tables whose definition and data exist only during the session.

  • Persistence: Automatically dropped at session end.

  • Temporary table only available during the session.

  • Data is stored in spool space and lost after logout.

  • Good for staging or session-specific processing.

  • Uses: Intermediate calculations, session-level processing.

  • Space: Uses Temp Space.

🧾 Example:


CREATE VOLATILE TABLE temp_employee (
  emp_id INTEGER,
  name VARCHAR(100)
) ON COMMIT PRESERVE ROWS;

🔹 3. Global Temporary Table (GTT)

  • Definition: Permanent table definition, temporary data for each session.

  • Persistence: Structure stays; data cleared at session end.

  • Table definition persists, but data is session-specific.

  • Data is deleted at the end of each session.

  • Requires ON COMMIT clause.

  • Uses: Temporary processing across multiple queries.

  • Space: Uses Temp Space.

🧾 Example:


CREATE GLOBAL TEMPORARY TABLE temp_salary (
  emp_id INTEGER,
  salary DECIMAL(10,2)
) ON COMMIT PRESERVE ROWS;

🔹 4. Derived Table

  • Definition: Tables created on-the-fly inside queries, no explicit storage.

  • Persistence: Exists only for duration of the query.

  • Virtual table created in a subquery.

  • Exists only for the duration of the query.

  • Uses: Query processing, subqueries.

  • Space: Uses Spool Space.

  • Cannot be reused.

🧾 Example:


SELECT dept, AVG(salary)
FROM (SELECT dept, salary FROM employee) AS derived
GROUP BY dept;

🔹 5. Queue Table

  • Used for asynchronous processing (message queuing).

  • Follows FIFO (First In First Out) principle.

  • Supports consume-mode reads: SELECT AND CONSUME returns and removes the oldest row.

🧾 Example:


CREATE TABLE msg_queue, QUEUE (
  qits TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
  msg_id INTEGER,
  msg_text VARCHAR(500)
) PRIMARY INDEX (msg_id);

(The first column of a queue table must be a QITS timestamp column, defined as above.)
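Reading from a queue table normally uses consume mode, which dequeues in FIFO order:

```sql
-- Returns and deletes the oldest row; waits if the queue is empty
SELECT AND CONSUME TOP 1 * FROM msg_queue;
```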

🔹 6. Multiset Table

  • Allows duplicate rows.

  • Must be explicitly specified.

🧾 Example:


CREATE MULTISET TABLE logs (
  user_id INTEGER,
  login_time TIMESTAMP
);

🔹 7. Set Table

  • Does not allow duplicate rows.

  • Teradata performs a duplicate row check on insert.

🧾 Example:


CREATE SET TABLE departments (
  dept_id INTEGER,
  dept_name VARCHAR(100)
);

🔹 8. NoPI Table (No Primary Index)

  • Rows are not hashed by PI, but distributed round-robin.

  • Good for staging and bulk loading.

  • Can reduce skew in large loads.

🧾 Example:


CREATE MULTISET TABLE staging_table (
  emp_id INTEGER,
  dept_id INTEGER
) NO PRIMARY INDEX;

(NoPI tables must be MULTISET; SET is not allowed with NO PRIMARY INDEX.)

🔹 9. Column-Partitioned Table (Columnar)

  • Stores data column-wise instead of row-wise.

  • Improves performance for analytic queries on few columns.

🧾 Example:


CREATE MULTISET TABLE sales_data (
  sales_id INTEGER,
  product_id INTEGER,
  amount DECIMAL(10,2),
  sales_date DATE
) NO PRIMARY INDEX
PARTITION BY COLUMN;

🔹 10. Fallback Table

  • Any table (except volatile and queue) can be fallback.

  • Stores duplicate copy of each row on a different AMP for data protection.

🧾 Example:


CREATE TABLE customer (
  cust_id INTEGER,
  cust_name VARCHAR(100)
) FALLBACK;
==========================================================================

🔹 Types of SQL in Teradata

SQL commands are categorized based on purpose and functionality.

🔹 1️⃣ DDL – Data Definition Language

  • Purpose: Defines or modifies database objects like tables, indexes, databases.

  • Examples of Commands: CREATE, ALTER, DROP, RENAME (note: Teradata has no TRUNCATE statement; an unconditional DELETE serves the same purpose)

  • Example:

CREATE TABLE employee (
emp_id INT,
name VARCHAR(50),
dept_id INT
);

🔹 2️⃣ DML – Data Manipulation Language

  • Purpose: Manipulates data stored in tables.

  • Examples of Commands: INSERT, UPDATE, DELETE, MERGE

  • Example:

INSERT INTO employee (emp_id, name, dept_id)
VALUES (1, 'RK', 10);

🔹 3️⃣ DQL – Data Query Language

  • Purpose: Retrieves data from database tables.

  • Command: SELECT

  • Example:

SELECT name, dept_id
FROM employee
WHERE emp_id = 1;

🔹 4️⃣ DCL – Data Control Language

  • Purpose: Controls access and permissions in the database.

  • Examples of Commands: GRANT, REVOKE

  • Example:

GRANT SELECT ON employee TO analyst_role;

🔹 5️⃣ TCL – Transaction Control Language

  • Purpose: Manages transactions in the database.

  • Examples of Commands: COMMIT, ROLLBACK, ABORT, BT/ET (note: Teradata does not support SAVEPOINT)

  • Example:

BEGIN TRANSACTION;
UPDATE employee SET dept_id = 20 WHERE emp_id = 1;
ROLLBACK; -- undo changes

🔹 Summary Table

SQL Type   Purpose               Examples
DDL        Define/alter objects  CREATE, ALTER, DROP, RENAME
DML        Manipulate data       INSERT, UPDATE, DELETE, MERGE
DQL        Query data            SELECT
DCL        Control access        GRANT, REVOKE
TCL        Manage transactions   COMMIT, ROLLBACK, BT/ET, ABORT
==========================================================================

🔹 Set Operators in Teradata

Set operators in Teradata are used to combine results of two or more queries into a single result set.


🔹 Rules for Set Operators

  1. Same number of columns in each query

  2. Compatible data types in corresponding columns

  3. Result column names taken from first query



🔹 Notes
  • UNION → removes duplicates → slower than UNION ALL

  • UNION ALL → faster, preserves duplicates

  • INTERSECT / MINUS → useful for comparing datasets

  • Can be combined with ORDER BY at the end of the full query

  • Even though UNION ALL keeps all duplicate rows, SQL tables and set operations are inherently unordered.

    • The order of rows in the result set is not guaranteed unless you explicitly use an ORDER BY clause.
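The operators above in one sketch (table names are hypothetical):

```sql
SELECT emp_id FROM current_emps
UNION                 -- duplicates removed
SELECT emp_id FROM former_emps;

SELECT emp_id FROM current_emps
MINUS                 -- rows in the first result but not the second (ANSI: EXCEPT)
SELECT emp_id FROM managers
ORDER BY 1;           -- a single ORDER BY, at the end of the whole query
```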

==========================================================================

Teradata String Manipulation Functions

Function reference (syntax : description, example → output):

  • CONCAT(str1, str2, ...) : concatenates two or more strings. CONCAT('Hello',' ','RK') → Hello RK

  • LENGTH(str) / CHARACTER_LENGTH(str) : number of characters in the string. CHARACTER_LENGTH('Robert') → 6

  • LOWER(str) : converts to lowercase. LOWER('ROBERT') → robert

  • UPPER(str) : converts to uppercase. UPPER('Robert') → ROBERT

  • LPAD(str, size, padstr) : pads the left side to the given length. LPAD('RK',5,'*') → ***RK

  • RPAD(str, size, padstr) : pads the right side to the given length. RPAD('RK',5,'*') → RK***

  • TRIM([LEADING|TRAILING|BOTH] [char] FROM str) : removes spaces (or a given character) from string ends.

  • LTRIM(str) : removes leading spaces. LTRIM(' Robert') → Robert

  • RTRIM(str) : removes trailing spaces. RTRIM('Robert ') → Robert

  • REPLACE(str, search) : removes the search string. REPLACE('Hello RK','RK') → Hello

  • REPLACE(str, search, replace) : replaces the search string with another. REPLACE('Hello RK','RK','AI') → Hello AI

  • REVERSE(str) : reverses the string. REVERSE('Robert') → treboR

  • SPLIT(str, delimiter) : splits the string by a delimiter. SPLIT('a,b,c',',') → ['a','b','c']

  • STRPOS(str, substr) / POSITION(substr IN str) : starting position of a substring. POSITION('e' IN 'Robert') → 4

  • SUBSTRING(str FROM pos FOR len) / SUBSTR(str, pos, len) : substring from a position with a length. SUBSTRING('Robert' FROM 1 FOR 3) → Rob

  • CHR(n) : character for an ASCII value. CHR(65) → A

  • TO_UTF8(str) : encodes a string to UTF-8 varbinary. TO_UTF8('RK') → 0x524B

  • FROM_UTF8(binary) : decodes UTF-8 varbinary to a string. FROM_UTF8(0x524B) → RK

  • TRANSLATE(str, from, to) : replaces characters based on a mapping. TRANSLATE('abc','abc','xyz') → xyz

  • INDEX(str, char) : position of a character (Teradata extension). INDEX('Robert','b') → 3
==========================================================================

🔹 Teradata Date/Time Functions

1️⃣ Built-in Current Functions

  • CURRENT_DATE : current date. SELECT CURRENT_DATE; → 2024-05-22

  • CURRENT_TIME : current time. SELECT CURRENT_TIME; → 10:01:13

  • CURRENT_TIMESTAMP : current timestamp (date + time). SELECT CURRENT_TIMESTAMP; → 2024-05-22 10:01:13.990000+00:00

  • DATE : current date (legacy). SELECT DATE; → 2024-05-22

  • TIME : current time (legacy). SELECT TIME; → 10:01:13

  • DATABASE : current database name. SELECT DATABASE; → TDUSER

2️⃣ EXTRACT Function

Extracts portions of DATE, TIME, or TIMESTAMP.

  • Year : EXTRACT(YEAR FROM date). EXTRACT(YEAR FROM CURRENT_DATE); → 2024

  • Month : EXTRACT(MONTH FROM date). EXTRACT(MONTH FROM CURRENT_DATE); → 5

  • Day : EXTRACT(DAY FROM date). EXTRACT(DAY FROM CURRENT_DATE); → 22

  • Hour : EXTRACT(HOUR FROM timestamp). EXTRACT(HOUR FROM CURRENT_TIMESTAMP); → 10

  • Minute : EXTRACT(MINUTE FROM timestamp). EXTRACT(MINUTE FROM CURRENT_TIMESTAMP); → 1

  • Second : EXTRACT(SECOND FROM timestamp). EXTRACT(SECOND FROM CURRENT_TIMESTAMP); → 13.990000

3️⃣ Interval Arithmetic

Teradata INTERVAL functions allow date/time arithmetic.

  • YEAR : adds years. CURRENT_DATE + INTERVAL '04' YEAR; → 2028-05-22

  • YEAR TO MONTH : adds years and months. CURRENT_DATE + INTERVAL '04-03' YEAR TO MONTH; → 2028-08-22

  • DAY TO MINUTE : adds days, hours, minutes. CURRENT_TIMESTAMP + INTERVAL '03 05:10' DAY TO MINUTE; → 2024-05-25 15:11:13

  • DAY TO SECOND : adds days, hours, minutes, seconds (similar syntax).

4️⃣ Common Date Functions

  • LAST_DAY(date) : last day of the month. LAST_DAY(CURRENT_DATE); → 2024-05-31

  • NEXT_DAY(date, weekday) : next occurrence of the given weekday. NEXT_DAY(CURRENT_DATE, 'MONDAY'); → 2024-05-27

  • MONTHS_BETWEEN(date1, date2) : months between two dates. MONTHS_BETWEEN(DATE '2024-05-22', DATE '2024-01-22'); → 4

  • ADD_MONTHS(date, n) : adds n months. ADD_MONTHS(CURRENT_DATE, 3); → 2024-08-22

  • TO_DATE(string, format) : converts a string to DATE. TO_DATE('2024-05-22','YYYY-MM-DD'); → 2024-05-22

  • TO_TIMESTAMP(string[, format]) : converts a string to TIMESTAMP. TO_TIMESTAMP('2024-05-22 10:01:13'); → 2024-05-22 10:01:13

  • TRUNC(date, unit) : truncates to the unit. TRUNC(CURRENT_DATE,'MONTH'); → 2024-05-01

  • ROUND(date, unit) : rounds to the nearest unit. ROUND(CURRENT_DATE,'MONTH'); → 2024-06-01

  • NUMTODSINTERVAL(n,'DAY') : numeric to day-second INTERVAL. NUMTODSINTERVAL(3,'DAY'); → 3 00:00:00

  • NUMTOYMINTERVAL(n,'MONTH') : numeric to year-month INTERVAL. NUMTOYMINTERVAL(5,'MONTH'); → 0-05

5️⃣ DISTINCT Option (Aggregate on Dates or Numeric)

  • DISTINCT ensures unique values only.

Example:

SELECT COUNT(DISTINCT Deduction) FROM Salary;
SELECT SUM(DISTINCT x), AVG(DISTINCT y) FROM Salary;
==========================================================================

CASE, COALESCE, and NULLIF

🔹 1️⃣ CASE Expression

Purpose: Conditional logic in SQL. Returns a value based on matching conditions.

Syntax (Simple CASE)

CASE expression
WHEN value1 THEN result1
WHEN value2 THEN result2
...
ELSE default_result
END

Syntax (Searched CASE)

CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
ELSE default_result
END

Example

Employee Table

Employee_Id   Age   Salary   Marital_Status
202001        22    20000    1
202002        25    25000    1
202003        30    35000    2
202004        26    32000    2
202005        32    40000    2

Query:

SELECT Employee_Id, Age, Salary,
CASE Marital_Status
WHEN 1 THEN 'Single'
WHEN 2 THEN 'Married'
ELSE 'Not Sure'
END AS Marital_Status
FROM Employee;

Output:

Employee_Id   Age   Salary   Marital_Status
202001        22    20000    Single
202002        25    25000    Single
202003        30    35000    Married
202004        26    32000    Married
202005        32    40000    Married

🔹 2️⃣ COALESCE Function

Purpose: Handles NULL values. Returns the first non-NULL value from a list of expressions.

Syntax

COALESCE(expr1, expr2, ..., exprN)

Example

Student Table

Roll_No   First_Name   Last_Name   Landline_no   Mobile_no
101       RK           Sharma      NULL          9876543210
102       Amit         Kumar       0123456789    NULL
103       Priya        Singh       NULL          NULL
104       Rahul        Verma       NULL          0

Query:

SELECT Roll_No, First_Name, Last_Name,
COALESCE(Landline_no, Mobile_no, 'phone# not available') AS phone_no
FROM Student;

Output:

Roll_No   First_Name   Last_Name   phone_no
101       RK           Sharma      9876543210
102       Amit         Kumar       0123456789
103       Priya        Singh       phone# not available
104       Rahul        Verma       0

-----------------------------------------------------------------------------------------------------------------------
COALESCE(NULL, 1, 'table')

🔹 Important Note About Data Types in Teradata

  • Teradata requires that all expressions in COALESCE must be of compatible types (or can be implicitly converted).

  • In this example: 1 is an integer, 'table' is a string.

  • Teradata will promote the integer 1 to VARCHAR to match the string type (because 'table' is a string literal).

So the result type will be VARCHAR, and the integer 1 will become '1'.



🔹 3️⃣ NULLIF Function

Purpose: Returns NULL if two expressions are equal; otherwise returns the first expression.

Syntax

NULLIF(expr1, expr2)

Example

SELECT Roll_No,
NULLIF(Mobile_no, 0) AS Mobile
FROM Student;

Output:

Roll_No   Mobile
101       9876543210
102       NULL
103       NULL
104       NULL

✅ In this example, NULLIF(Mobile_no, 0) returns NULL for Roll_No 104 because Mobile_no = 0.


🔹 Quick Comparison

Function   Purpose             Key Points
CASE       Conditional logic   Simple (CASE expr WHEN ...) or searched (CASE WHEN condition ...)
COALESCE   Handle NULLs        Returns the first non-NULL value
NULLIF     NULL replacement    Returns NULL if the two expressions are equal

==========================================================================

🔹 Teradata Joins Overview

Joins are used to combine rows from two or more tables based on related columns.

  • INNER JOIN : returns rows only when there is a match in both tables.
    SELECT cols FROM Table1 INNER JOIN Table2 ON Table1.col = Table2.col;
    Example: join Student and Attendance on RollNo. RollNo 1005 is excluded because it has no attendance record.
    Output: 1001 → 90%, 1002 → 72%, 1003 → 68%, 1004 → 95%

  • LEFT OUTER JOIN : returns all rows from the left table + matching rows from the right table.
    SELECT cols FROM Table1 LEFT OUTER JOIN Table2 ON Table1.col = Table2.col;
    RollNo 1005 from Student is included; % is NULL because attendance is missing.
    Output: 1001 → 90%, …, 1005 → NULL

  • RIGHT OUTER JOIN : returns all rows from the right table + matching rows from the left table.
    SELECT cols FROM Table1 RIGHT OUTER JOIN Table2 ON Table1.col = Table2.col;
    Similar to LEFT JOIN, but ensures all right-table rows appear.

  • FULL OUTER JOIN : returns all rows from both tables, matching where possible.
    SELECT cols FROM Table1 FULL OUTER JOIN Table2 ON Table1.col = Table2.col;
    Combines LEFT and RIGHT OUTER JOIN results; non-matches from either table show NULL.

  • CROSS JOIN : returns all combinations of rows from both tables (Cartesian product).
    SELECT cols FROM Table1 CROSS JOIN Table2;
    Example: Student RollNo = 1001 CROSS JOIN Attendance → 1001-1001 → 90%, 1001-1002 → 72%, 1001-1003 → 68%, 1001-1004 → 95%

  • SELF JOIN : joins a table with itself.
    SELECT A.col, B.col FROM Table A, Table B WHERE A.col = B.col;
    Useful to compare rows in the same table (e.g., Employee-Manager relationships).

🔹 Important Teradata Notes on Joins with NULLs

  • In SQL, NULL ≠ NULL, so joins on NULL usually do not match.

  • Teradata follows the ANSI standard: rows with NULL in the join column do not match any other row (even NULL) for INNER, LEFT, RIGHT joins.

  • CROSS JOIN is unaffected because it does not use ON.

==========================================================================

Tables

NULL in t2.id does NOT match t1.id = NULL; NULL ≠ NULL in SQL.

  • INNER JOIN: NULLs never match.

  • LEFT/RIGHT OUTER JOIN: a NULL join key leaves the row unmatched, so the other table's columns come back NULL.

🔹 LEFT JOIN (t1 LEFT JOIN t2 ON t1.id = t2.id)

✅ Key: LEFT JOIN returns all t1 rows, matched t2 rows if any, otherwise NULL.

🔹 RIGHT JOIN (t1 RIGHT JOIN t2 ON t1.id = t2.id)

✅ Key: RIGHT JOIN returns all t2 rows, matched t1 rows if any, otherwise NULL.

🔹 INNER JOIN (t1 INNER JOIN t2 ON t1.id = t2.id)

❌ Per the ANSI standard, NULL does not equal NULL in Teradata, so an inner join on NULL keys returns no rows.

==========================================================================





