Execution Order

1. FROM

2. ON

3. WHERE

4. GROUP BY

5. HAVING

6. SELECT

7. QUALIFY

8. ORDER BY

=====================================================================

🔹 `BETWEEN` Clause with String in Teradata

Yes ✅ — BETWEEN works with strings in Teradata.

But important point:

👉 It works based on lexicographical (dictionary) order,
NOT numeric order.

✅ Basic Syntax


column_name BETWEEN 'start_value' AND 'end_value'

It is equivalent to:


column_name >= 'start_value'
AND column_name <= 'end_value'

🔹 Example 1 – Simple String Range


SELECT name
FROM employee
WHERE name BETWEEN 'A' AND 'M';

What it does:

Returns all names starting from:

Because alphabetically:


'A' < 'B' < 'C' < ... < 'M'

🔹 Example 2 – Full Word Comparison


WHERE name BETWEEN 'John' AND 'Mary';

This includes:


John
Johnny
Jordan
...
Mary

But NOT:


Adam   (comes before John)
Zack   (comes after Mary)

🔥 Important: It Compares Character by Character

Comparison happens like this:


'Apple' < 'Banana'

Because:

A < B

⚠️ Case Sensitivity

In Teradata:

If column is NOT CASESPECIFIC → case insensitive
If CASESPECIFIC → uppercase and lowercase are different

Example:


'A' < 'a'

ASCII values differ.

🔹 Example 3 – Date Stored as VARCHAR


WHERE date_col BETWEEN '2024-01-01' AND '2024-12-31';

Works correctly only if format is YYYY-MM-DD

❌ If format is:


'01-12-2024'

Then BETWEEN will not work logically.

🚨 Problem

SELECT *

FROM Employee

WHERE Department_Number BETWEEN 567 AND 401;

This can never be TRUE.

👉 Result = No rows returned

BETWEEN does NOT automatically swap values.

🔥 Interview Trick Question

❓ What happens here?


WHERE name BETWEEN 'A' AND 'B';

It returns:

All names starting with A
All names starting with B

Because:


'A' <= name <= 'B'

But it will NOT return:


Brian   (comes after 'B')

Because:


'Brian' > 'B'

✅ 1️⃣ If Column is NOT CASESPECIFIC (Default in Teradata)

👉 Comparison is case-insensitive

Example:


SELECT name
FROM employee
WHERE name BETWEEN 'a' AND 'm';

This will return:


Adam
alex
Brian
maria

Even though cases differ, Teradata treats 'A' and 'a' as same (unless defined otherwise).

❗ 2️⃣ If Column is CASESPECIFIC

If table is created like:


CREATE TABLE employee
(
   name VARCHAR(50) CASESPECIFIC
);

👉 Then comparison is case-sensitive

Now ASCII order matters:


'A' (65)
'Z' (90)
'a' (97)
'z' (122)

So:


WHERE name BETWEEN 'A' AND 'm';

Range becomes:


'A' (65) → 'm' (109)

This includes:


Apple
Zoo
abc
john

But may exclude unexpected values depending on ASCII order.

🔎 Important ASCII Order Impact


All UPPERCASE letters come before lowercase letters.

So:


'Z' < 'a'

Because:


90 < 97

🔥 Example Problem

Data:


Apple
banana
Mango
zebra

Query:


WHERE name BETWEEN 'A' AND 'M';

CASESPECIFIC column result:

Returns:


Apple
Mango

But NOT:


banana   (because lowercase 'b' > 'M')

✅ Safe Way to Avoid Case Issues

Use UPPER() or LOWER():


WHERE UPPER(name) BETWEEN 'A' AND 'M';

Now it becomes consistent and predictable.

⚡ Best Practice (Interview Recommended)

If working with string ranges:

✔ Normalize case first
✔ Or confirm column collation
✔ Avoid relying on ASCII behavior blindly

📌 Summary

Column Type	Case Sensitive?	BETWEEN Behavior
Default (NOT CASESPECIFIC)	❌ No	Case ignored
CASESPECIFIC	✅ Yes	ASCII order applied

====================================================================

🔹 DATE Arithmetic in Teradata

In Teradata, DATE is internally stored as an integer (number of days), which allows direct arithmetic operations.

That’s why you can:

Add days to a date
Subtract dates
Compare date ranges
Perform logical filtering

✅ 1️⃣ Add Days to Current Date


SELECT CURRENT_DATE + 40;

👉 Returns date 40 days from today
✔ Output datatype → DATE

✅ 2️⃣ Add Days to a Specific Date


SELECT DATE '1999-01-01' + 300;

👉 Always use ANSI literal format:


DATE 'YYYY-MM-DD'

✔ Returns DATE

✅ 3️⃣ Subtract Two Dates


SELECT CURRENT_DATE - DATE '2007-11-30';

👉 Returns INTEGER
(Number of days between dates)

⚠ Important:

DATE − DATE → INTEGER
DATE + INTEGER → DATE

✅ 4️⃣ Days Worked by Employee


SELECT CURRENT_DATE - Hire_Date
FROM Employee;

👉 Returns number of days worked.

If you want years:


SELECT (CURRENT_DATE - Hire_Date) / 365.25
FROM Employee;

✔ Result → Decimal years

✅ 5️⃣ Employees Working More Than 20 Years


SELECT *
FROM Employee
WHERE CURRENT_DATE - Hire_Date > 365.25 * 20;

👉 Logical comparison using integer result.

✅ 6️⃣ Employees Hired in Last 100 Days


SELECT *
FROM Employee
WHERE Hire_Date > CURRENT_DATE - 100;

👉 Very common interview question.

🔥 Important Behavior Summary

Operation	Result Type
DATE + INTEGER	DATE
DATE - INTEGER	DATE
DATE - DATE	INTEGER
INTEGER + DATE	DATE

🔎 Month / Year Arithmetic (Better Approach)

Instead of hardcoding 365.25:

Use ADD_MONTHS():


SELECT ADD_MONTHS(CURRENT_DATE, 12);

👉 Adds 12 months safely.

Find employees with > 20 years:


SELECT *
FROM Employee
WHERE Hire_Date <= ADD_MONTHS(CURRENT_DATE, -240);

✔ More accurate than 365.25 * 20
✔ Handles leap years correctly

⚠ Common Mistakes

❌ Wrong date literal:


'01-01-1999'

✔ Correct:


DATE '1999-01-01'

🎯 Interview-Level Tricky Questions

❓ What is result datatype of:


CURRENT_DATE - Hire_Date

👉 INTEGER

❓ Why use ADD_MONTHS instead of multiplying days?

👉 Because leap years make day calculation inaccurate.

💡 Real Data Engineering Use Cases

SLA calculation (within 30 days)
Late payment detection
Customer aging buckets
Retention period checks
Incremental data loads (last N days)

==================================================================

🔹 `CAST` Function in Teradata

✅ General Syntax


CAST(expression AS data_type)

Used to convert data from one datatype to another.

🔹 1️⃣ Character Conversions

Example:


SELECT CAST(Last_Name AS CHAR(10))
FROM Employee;

👉 Converts string into fixed-length CHAR(10)
👉 If shorter → padded with spaces
👉 If longer → truncated

Example:


SELECT CAST(1.777 AS CHAR(10));

👉 Result:


'1.777     '

✔ Left-justified
✔ Space padded

Example (Truncation):


SELECT CAST(1.77 AS CHAR(2));

👉 Result:


'1.'

✔ Truncated to 2 characters
✔ No rounding here — just string cut

🔥 Important: Invalid Conversion


SELECT CAST(Last_Name AS INTEGER)
FROM Employee;

👉 ❌ This will FAIL
Because non-numeric characters cannot convert to INTEGER.

Works only if column contains valid numeric strings like:


'123'

Otherwise:


Invalid character in numeric conversion

This fails in both:

ANSI mode
Teradata mode

🔹 2️⃣ Numeric to Integer


SELECT CAST(3.7777 AS INTEGER);

👉 Result:

✔ Decimal portion is truncated
✔ No rounding

🔹 3️⃣ Decimal to Decimal (Precision Change)

When reducing precision or scale:

Teradata uses:
👉 Round to Nearest (Banker's Rounding)

Meaning:

If digit is 5 → round to nearest EVEN number

Examples


SELECT CAST(6.74 AS DEC(2,1));

Result:

6.7

(Regular rounding down)

SELECT CAST(6.75 AS DEC(2,1));

Result:

6.8

Why?

7.5 → nearest even is 8

SELECT CAST(6.85 AS DEC(2,1));

Result:

6.8

Why?

8.5 → nearest even is 8 (already even)

🔥 Banker's Rounding Explained

Value	Rounded (1 decimal)	Reason
6.74	6.7	< 5
6.75	6.8	8 is even
6.85	6.8	8 is even
6.95	7.0	0 is even

This avoids statistical bias in financial systems.

🔹 4️⃣ Arithmetic + CAST


SELECT CAST(Budget_Amount * 1.375 AS DEC(15,3))
FROM Department;

👉 First multiplication
👉 Then result rounded to 3 decimal places

🔹 Important Differences

Conversion	Behavior
Decimal → Integer	Truncates
Decimal → Decimal	Rounds (Banker's rounding)
Char → Char smaller size	Truncates
Char → Numeric	Error if invalid
Numeric → Char	Left-justified

==============================================

🔹 Rounding Rules in Teradata

Teradata supports two rounding methods, controlled by:


DBSControl → RoundHalfWayMagUp

This affects rounding when reducing decimal precision using:

CAST
Implicit decimal conversion
Teradata extended numeric formatting

✅ Method 1: Traditional Rounding

🔹 `RoundHalfWayMagUp = TRUE`

Rule:

If digit to the right is ≥ 5 → round up
If digit is < 5 → round down

This is the common school method.

Examples (DEC(2,1))


SELECT CAST(1.34999 AS DEC(2,1)); → 1.3
SELECT CAST(1.35000 AS DEC(2,1)); → 1.4
SELECT CAST(1.35001 AS DEC(2,1)); → 1.4
SELECT CAST(1.35999 AS DEC(2,1)); → 1.4
SELECT CAST(1.36000 AS DEC(2,1)); → 1.4
SELECT CAST(1.45000 AS DEC(2,1)); → 1.5
SELECT CAST(1.45999 AS DEC(2,1)); → 1.5
SELECT CAST(1.46000 AS DEC(2,1)); → 1.5

✔ Always rounds up when digit is 5 or more.

✅ Method 2: Banker's Rounding (Default)

🔹 `RoundHalfWayMagUp = FALSE`

Also called:
👉 Round Half to Even
👉 Statistically unbiased rounding

Rule:

5 → round up
< 5 → round down
= 5 → round to nearest EVEN digit

This reduces cumulative rounding bias in financial systems.

Examples (DEC(2,1))


SELECT CAST(1.34999 AS DEC(2,1)); → 1.3

(4 < 5 → round down)


SELECT CAST(1.35000 AS DEC(2,1)); → 1.4

(5 → check rounding digit = 3 → 4 is even → round up)


SELECT CAST(1.45000 AS DEC(2,1)); → 1.4

Why?

Digit being rounded = 4
Next digit = 5
4 is EVEN → stay 4
Result → 1.4

SELECT CAST(1.55000 AS DEC(2,1)); → 1.6

Why?

Rounding digit = 5
5 is ODD
Next digit = 5
Round to EVEN → 6

🔥 Key Difference Example

Value	Traditional	Banker's (Default)
1.45	1.5	1.4
1.55	1.6	1.6
2.25	2.3	2.2

🔎 Why Banker's Rounding?

If you always round 5 upward:


1.5 → 2
2.5 → 3
3.5 → 4

Numbers systematically increase.

Banker's rounding balances:


1.5 → 2
2.5 → 2
3.5 → 4
4.5 → 4

✔ Statistically unbiased
✔ Preferred for financial calculations

🔹 Important Notes

Applies when reducing scale (DEC(10,4) → DEC(10,2))
Applies in CAST
Applies in implicit conversions
Does NOT apply when casting decimal → integer (that truncates)

Example:


SELECT CAST(3.777 AS INTEGER);

→ 3 (Truncation, not rounding)

💡 Real Data Engineering Impact

If you are:

Calculating financial totals
Aggregating currency data
Performing revenue adjustments
Converting precision in reporting layers

Rounding method can change totals significantly in large datasets.

=======================================================================

🔹 String Concatenation in Teradata

In Teradata, string concatenation is done using:

||

This operator joins multiple values into a single VARCHAR result.

✅ Basic Examples

1️⃣ Combine Last and First Name


SELECT Last_Name || First_Name
FROM Employee;

Result datatype → VARCHAR

2️⃣ Add Separator


SELECT Last_Name || ', ' || First_Name
FROM Employee;

Example output:


Brown, Adam
Smith, John

3️⃣ Concatenation with Expression


SELECT First_Name || ' ' || Last_Name || ' is ' ||
       (CURRENT_DATE - Birthdate) / 365
FROM Employee;

👉 Numeric values automatically converted to VARCHAR
👉 Result is always VARCHAR

4️⃣ Concatenating Numbers and Strings


SELECT 123 || 'ABC' || 456;

Result:

123ABC456

✔ Numbers are implicitly converted to character.

🔥 Resulting Data Type

All concatenation results → VARCHAR

Even if inputs are:

CHAR
INTEGER
DATE
DECIMAL

Teradata converts everything to VARCHAR.

🔹 Using CAST with Concatenation

You can explicitly define output length:


SELECT CAST(Last_Name || ', ' || First_Name AS CHAR(100))
FROM Employee;

👉 Result becomes fixed-length CHAR(100)

⚠ Truncation Example


SELECT CAST(123 || 'ABC' || 456 AS CHAR(3));

Result:

✔ Truncated to 3 characters
✔ No rounding, simple cut-off

🔹 Using SUBSTRING with Concatenation

Example:

Display first initial + last name


SELECT
   SUBSTRING(first_name FROM 1 FOR 1) || '. ' || last_name
   AS employee_name
FROM employee
WHERE department_number = 403;

Output:


A. Brown
P. Hopkins
D. Lombardo
A. Villegas
J. Charles

🔎 Important Behavior

1️⃣ NULL Handling


SELECT Last_Name || First_Name

If either column is NULL → entire result is NULL.

To avoid this:


SELECT COALESCE(Last_Name,'') || COALESCE(First_Name,'')

2️⃣ Performance Tip (Interview Level)

Heavy concatenation:

Prevents index usage
Should not be used in WHERE clause filters

❌ Avoid:


WHERE First_Name || Last_Name = 'JohnSmith'

✔ Better:


WHERE First_Name = 'John'
AND Last_Name = 'Smith'

🔥 Real Data Engineering Use Cases

Creating display columns
Generating full names
Building export files
Creating composite keys (careful!)
Formatting audit messages

==================================================================

🔹 `FORMAT` in Teradata

FORMAT is a Teradata extension (not ANSI standard).

It changes how a value is displayed,
NOT how it is stored internally.

✅ Basic Character Formatting


SELECT Last_Name (FORMAT 'X(31)')
FROM Employee;

✔ Displays value in 31-character width
✔ Maximum width = 255

Equivalent to:


FORMAT 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

But cleaner way:


FORMAT 'X(255)'

🔹 Important: FORMAT Does NOT Change Data Type

Example:


SELECT Salary (FORMAT '$999,999.99')
FROM Employee;

✔ Still numeric internally
✔ Only output appearance changes

🔹 Common FORMAT Symbols

Symbol	Meaning
`9`	Show digit (including leading zeros)
`Z`	Suppress leading zeros
`$`	Dollar sign
`%`	Percent sign
`,`	Comma separator
`.`	Decimal point
`-`	Negative sign
`X`	Character position
`B`	Blank space
`/`	Slash

🔥 Examples

1️⃣ Numeric with Leading Zeros


SELECT 123 (FORMAT '99999');

Result:

2️⃣ Suppress Leading Zeros


SELECT 123 (FORMAT 'ZZZZZ');

Result:

✔ Leading zeros suppressed
✔ Spaces instead

3️⃣ Currency Format


SELECT 12345.6 (FORMAT '$ZZZ,ZZ9.99');

Result:


$12,345.60

4️⃣ Percent Display


SELECT .85 (FORMAT 'ZZ9%');

Result:

85%

5️⃣ Dash for Negative Numbers


SELECT -123 (FORMAT 'ZZZ9-');

Result:


 123-

6️⃣ Character Formatting


SELECT 'ABC' (FORMAT 'X(10)');

Result:

ABC

(Padded with spaces to 10)

🔎 FORMAT vs CAST

FORMAT	CAST
Changes display only	Changes datatype
Teradata specific	ANSI standard
Used mostly in reports	Used in data transformation

Example difference:


SELECT CAST(123 AS CHAR(10));

→ Actually becomes CHAR datatype


SELECT 123 (FORMAT '99999');

→ Still numeric

⚠ Important Notes

1️⃣ FORMAT is mainly for:

Reports
BTEQ output
User display

2️⃣ Not recommended in:

ETL transformations
Production joins
WHERE conditions

3️⃣ FORMAT does not affect sorting logic
Sorting still uses original numeric value.

🎯 Interview Questions

Difference between FORMAT and CAST?
Does FORMAT change datatype?
What is maximum X(n) size?
What is Z vs 9 difference?
Why is FORMAT not ANSI compliant?

==========================================================================

🔹 Aggregating Groups (GROUP BY) in Teradata

GROUP BY is used to:

✔ Define logical groups
✔ Perform aggregate calculations per group
✔ Return one row per group

🔹 Key Concepts

✅ Non-Aggregate Columns

Columns not inside aggregate functions (SUM, COUNT, etc.)

✅ Aggregate Columns

Columns inside:

SUM()
COUNT()
AVG()
MIN()
MAX()

🔥 Core GROUP BY Rules

1️⃣ Every projected non-aggregate must appear in GROUP BY

✔ Correct:


SELECT department_number,
       SUM(salary_amount)
FROM employee
GROUP BY department_number;

❌ Incorrect:


SELECT department_number,
       manager_employee_number,
       SUM(salary_amount)
FROM employee
GROUP BY department_number;

Error:


3504 Selected non-aggregate values must be part of the associated group.

2️⃣ GROUP BY can use column name or position

By name:


GROUP BY department_number;

By position:


GROUP BY 1;

(1 = first column in SELECT list)

3️⃣ NULL Forms a Group

Even though NULL values are ignored inside SUM/AVG,
they still create a group:

Example result:

department_number	Sum(salary_amount)
403	233000.00
402	77000.00
301	116400.00
999	100000.00
NULL	129950.00
401	245575.00
501	200125.00

👉 NULL department employees are grouped together.

4️⃣ GROUP BY Does NOT Sort

If you need sorted result:


ORDER BY department_number;

Grouping ≠ Ordering.

5️⃣ You Can Group By Columns Not Projected

Example:


SELECT SUM(salary_amount)
FROM employee
GROUP BY department_number,
         manager_employee_number;

Here:

We do NOT display department or manager
But grouping is based on both

👉 More grouping columns = more granular groups
👉 More groups = more rows returned

🔎 Understanding Granularity

If grouped by:


GROUP BY department_number;

→ One row per department

If grouped by:


GROUP BY department_number,
         manager_employee_number;

→ One row per department-manager combination

Granularity increases.

🔹 Logical Execution Order

FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY

Aggregation happens before SELECT output.

🔹 HAVING (Filtering Groups)


SELECT department_number,
       SUM(salary_amount) AS total_sal
FROM employee
GROUP BY department_number
HAVING SUM(salary_amount) > 200000;

✔ WHERE filters rows
✔ HAVING filters groups

🔥 Why Aggregation Removes Detail Rows

Original table:

Many rows per department

After GROUP BY:

One summary row per department

Detail rows are collapsed into summary rows.

==========================================================================

🔹 Adding Grouping Columns in Teradata

When you add more columns to a GROUP BY, you:

✔ Increase granularity
✔ Increase number of groups
✔ Increase number of result rows

At the extreme:
If grouping columns form a unique combination,
then aggregation = detail rows (no real collapsing happens).

SELECT
   Manager_Employee_Number AS Mgr,
   Department_Number AS Dept,
   Job_Code AS JCd,
   SUM(Salary_Amount) AS SumSal
FROM employee_sales.employee
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;

What’s happening?

Grouping is done on:

Manager
Department
Job Code

So each unique combination gets its own group.

🔹 Why Rows Increase?

If you group only by:


GROUP BY Department_Number;

→ Few rows (1 per department)

If you group by:


GROUP BY Manager_Employee_Number,
         Department_Number,
         Job_Code;

→ Many more rows
Because combination is more specific.

🔥 When GROUP BY Becomes Unique

If this combination:


Manager + Department + Job_Code

is unique per employee,

Then:


SUM(Salary_Amount)

is simply the employee's salary.

👉 No real aggregation effect
👉 Detail = Aggregated result

🔹 Valid GROUP BY Syntax Variations

For this query, all are valid:

By Position


GROUP BY 1, 2, 3


GROUP BY 3, 1, 2

(Order doesn’t matter logically)

By Alias Name


GROUP BY Mgr, Dept, JCd

Mixed Form


GROUP BY 1, Department_Number, JCd

Teradata allows:

Column names
Aliases
Positional references
Mixture of them

🔎 Important Concept: Granularity

GROUP BY Columns	Result
Dept	Dept-level totals
Dept, Manager	Dept-manager totals
Dept, Manager, Job	Very detailed totals
Unique Key Columns	Same as detail rows

🔥 NULL Behavior

NULL values still create groups:

Example:


Dept = NULL

All NULL dept rows grouped together.

🔹 GROUP BY vs DISTINCT

If grouping columns form a unique combination:


SELECT DISTINCT Manager,
                Department,
                Job_Code
FROM employee;

May produce similar row count as:


GROUP BY Manager,
         Department,
         Job_Code;

But:

DISTINCT removes duplicates
GROUP BY aggregates

==========================================================================

🔹 CASE Overview in Teradata

The CASE expression is used to evaluate conditions and return derived values in the result set.

It is:

ANSI compliant
Used in SELECT, WHERE, ORDER BY, HAVING
Very common in reporting & analytics
Helpful for data enrichment and derived columns

🔥 Two Forms of CASE

1️⃣ Valued Form (Simple CASE)

Based on equality
Compares one expression against multiple values
Cannot test ranges or complex conditions
Cannot test NULL using =

2️⃣ Searched Form

Uses full boolean conditions
Can use:
- =
- <>
- >, <
- BETWEEN
- IN
- IS NULL
More flexible
Most commonly used in real projects

🔹 Valued Form Explained

Example:


SELECT Last_Name,
       First_Name,
       CASE USER
            WHEN 'sql01' THEN Department_Number
            WHEN 'sql02' THEN Manager_Employee_Number
       END
FROM Employee
WHERE Job_Code = 512101;

What this means internally:


CASE
   WHEN USER = 'sql01' THEN Department_Number
   WHEN USER = 'sql02' THEN Manager_Employee_Number
END

So:

USER is evaluated
Compared using equality
First match stops evaluation
If nothing matches → returns NULL (default)

🔎 Important Rules

✔ Each WHEN tested sequentially
✔ First TRUE stops evaluation
✔ ELSE executes if nothing matches
✔ ELSE NULL is default
✔ Must end with END

⚠ NULL Important Rule

This is invalid in valued form:


WHEN NULL THEN ...

Because:


USER = NULL

is never TRUE (NULL comparison is unknown)

To check NULL → use searched CASE:


CASE
   WHEN USER IS NULL THEN 'Unknown'
END

🔹 With Explicit ELSE


SELECT Last_Name,
       First_Name,
       CASE USER
            WHEN 'sql01' THEN Department_Number
            WHEN 'sql02' THEN Manager_Employee_Number
            ELSE NULL
       END
FROM Employee
WHERE Job_Code = 512101;

ELSE NULL is optional because it’s default.

🔹 Why Output Shows NULL ( ? )

Example result:

Last_Name	First_Name	CASE Result
Ratzlaff	Larry	NULL
Wilson	Edward	NULL
Rabbit	Peter	NULL

Why?

Because:

Current USER is neither 'sql01' nor 'sql02'
So CASE returned NULL

🔥 Default Column Heading

If you don’t alias:


CASE USER
...
END

Teradata gives default column name like:


CASE_Expression

Better practice:


CASE USER
...
END AS Derived_Value

🆚 Valued vs Searched Quick Comparison

Feature	Valued	Searched
Equality only	✅	❌
Multiple columns	❌	✅
Range checks	❌	✅
IS NULL	❌	✅
Most used in projects	⚠	✅

🔥 Real Data Engineering Use Cases

You’ll use CASE to:

Create flags (High Salary / Low Salary)
Categorize departments
Derive KPI bands
Handle NULL values
Build reporting dimensions
Conditional aggregation

Example:


SUM(
   CASE
      WHEN Salary_Amount > 100000 THEN Salary_Amount
      ELSE 0
   END
)

==========================================================================

🔹 Valued CASE and NULL in Teradata

This is a very important concept.

In the valued (simple) CASE, comparisons are based only on equality (=).

🔥 Why `WHEN NULL` is Illegal

Example (Invalid):


SELECT Last_Name,
       Department_Number,
       Salary_Amount,
       CASE Department_Number
            WHEN 401 THEN Salary_Amount * 1.1
            WHEN NULL THEN Salary_Amount * 0.85   -- ❌ Invalid
            ELSE NULL
       END
FROM Employee;

❗ How Teradata Interprets This

Valued CASE works like this:


CASE
   WHEN Department_Number = 401 THEN ...
   WHEN Department_Number = NULL THEN ...
END

But:


Department_Number = NULL

is never TRUE.

In SQL:

NULL is unknown
Any comparison with NULL returns UNKNOWN
UNKNOWN ≠ TRUE

So Teradata throws:


3731: The user must use IS NULL or IS NOT NULL to test for NULL values.

🔎 Important Rule

Valued CASE:

✔ Uses equality only
✔ Cannot test NULL
✔ Cannot use IS NULL
✔ Single expression comparison

🔥 Why ELSE Works


ELSE Salary_Amount * 0.85

ELSE is not a condition.

It is simply:

👉 “Return this value if no WHEN matched.”

It does not evaluate TRUE/FALSE.

✅ Correct Way to Handle NULL

You must use searched CASE instead.

Correct Version:


SELECT Last_Name,
       Department_Number,
       Salary_Amount,
       CASE
            WHEN Department_Number = 401 THEN Salary_Amount * 1.1
            WHEN Department_Number IS NULL THEN Salary_Amount * 0.85
            ELSE NULL
       END
FROM Employee;

Now:

NULL is tested properly
No error
Works correctly

🧠 Why SQL Works This Way

Because SQL uses 3-valued logic:

Expression	Result
5 = 5	TRUE
5 = 6	FALSE
5 = NULL	UNKNOWN
NULL = NULL	UNKNOWN

Only TRUE triggers a WHEN clause.

UNKNOWN does not.

🔥 Quick Comparison

Case Type	Can test NULL?	How?
Valued CASE	❌ No	Not possible
Searched CASE	✅ Yes	`IS NULL`

==========================================================================

🔹 `NULLIF` in Teradata

NULLIF is a simple but powerful function used to return NULL when two expressions are equal.

==========================================================================

🔹 `COALESCE` in Teradata

COALESCE returns the first non-NULL value from a list of expressions.

COALESCE(null, 1, 4.090 , 'text') = 1.000

COALESCE(null, 'string', 4.00 , 'text') = string

==========================================================================

🔹 `LIKE` with CHAR and `ESCAPE` in Teradata

LIKE is used for pattern matching in character columns.

It works with:

CHAR
VARCHAR
Any string type

🔥 Basic LIKE Syntax


WHERE column LIKE 'pattern'

🎯 Wildcards

Symbol	Meaning
`%`	Any number of characters (0 or more)
`_`	Exactly one character

✅ Examples

Starts with 'A'


WHERE last_name LIKE 'A%'

Ends with 'son'


WHERE last_name LIKE '%son'

Exactly 5 characters starting with A


WHERE last_name LIKE 'A____'

(1 A + 4 underscores = 5 total)

Safe Practice

Trim if needed:


WHERE TRIM(col) LIKE 'ABC%'

🔥 ESCAPE Clause (Very Important)

Sometimes you want to search for actual % or _ characters.

Example:
Find rows containing 50%.

This will not work correctly:


WHERE discount LIKE '50%'

Because % is wildcard.

✅ Correct Way Using ESCAPE


WHERE discount LIKE '50\%' ESCAPE '\'

Explanation:

\ is escape character
\% means literal percent sign

🔎 Another Example

Search for underscore _:


WHERE code LIKE 'AB\_12' ESCAPE '\'

This matches:


AB_12

Not:


ABX12

🔥 Full Syntax


WHERE column LIKE 'pattern' ESCAPE 'escape_character'

You can use any single character as escape:


WHERE col LIKE 'A#_%' ESCAPE '#'

🚀 Advanced Example


SELECT *
FROM Employee
WHERE last_name LIKE '%\_%' ESCAPE '\';

Finds names containing underscore.

🔥 Performance Note (Teradata)

LIKE 'ABC%' → Can use index
LIKE '%ABC' → Full table scan
LIKE '%ABC%' → Full table scan

Leading wildcard kills index usage.

==========================================================================

🔹 `EXTRACT` Function in Teradata

EXTRACT is used to return specific parts of a DATE or TIME value.

It returns an INTEGER.

It is partially ANSI compliant.

🔥 Syntax


EXTRACT(part FROM date_or_time_value)

Supported Parts

For DATE:

YEAR
MONTH
DAY

For TIME:

HOUR
MINUTE
SECOND

🎯 Example with DATE


SELECT
   EXTRACT(YEAR  FROM DATE '2022-12-20' + 30) AS Yr,
   EXTRACT(MONTH FROM DATE '2022-12-20' - 30) AS Mth,
   EXTRACT(DAY   FROM DATE '2022-12-20' + 30) AS Dy;

Explanation

DATE '2022-12-20' + 30 → Adds 30 days → 2023-01-19
DATE '2022-12-20' - 30 → Subtracts 30 days → 2022-11-20

Result:

Yr	Mth	Dy
2023	11	19

🔥 Example with TIME


SELECT
   EXTRACT(HOUR   FROM TIME '10:20:30') AS Hr,
   EXTRACT(MINUTE FROM TIME '10:20:30') AS Mn,
   EXTRACT(SECOND FROM TIME '10:20:30') AS Scd;

Result:

Hr	Mn	Scd
10	20	30

🔎 Important Notes

1️⃣ Returns Integer


EXTRACT(YEAR FROM CURRENT_DATE)

Returns something like:

(Not a date — just a number)

2️⃣ Cannot Extract Combined Parts

❌ Not allowed:


EXTRACT(YEAR-MONTH FROM date_col)

If you want year-month:


CAST(EXTRACT(YEAR FROM date_col) AS CHAR(4))
|| '-' ||
CAST(EXTRACT(MONTH FROM date_col) AS CHAR(2))

3️⃣ Works with TIMESTAMP Also


EXTRACT(HOUR FROM CURRENT_TIMESTAMP)

🔥 Very Common Real Use Cases

📊 Grouping by Year


SELECT
   EXTRACT(YEAR FROM order_date) AS Order_Year,
   COUNT(*)
FROM orders
GROUP BY 1;

📅 Monthly Reports


SELECT
   EXTRACT(MONTH FROM order_date) AS Order_Month,
   SUM(amount)
FROM orders
GROUP BY 1;

🧠 Filtering by Year


WHERE EXTRACT(YEAR FROM order_date) = 2025

⚠ Performance Warning (Important in Teradata)

This:


WHERE EXTRACT(YEAR FROM order_date) = 2025

May prevent index usage because function is applied on column.

Better version:


WHERE order_date
BETWEEN DATE '2025-01-01'
    AND DATE '2025-12-31'

Much more efficient.

📌 Quick Summary

Feature	Value
Works on	DATE, TIME, TIMESTAMP
Returns	INTEGER
ANSI compliant	Partially
Combined extraction	❌ Not allowed
Performance impact	Yes (if used in WHERE)

==========================================================================

🔹 `ADD_MONTHS` in Teradata

ADD_MONTHS is used to add or subtract months from a date.

It correctly follows the Gregorian calendar, handling:

28, 29, 30, 31 days
Leap years
Month-end adjustments

🔥 Syntax


ADD_MONTHS(date_expression, integer_expression)

date_expression → DATE value
integer_expression → Number of months to add
- Positive → Future
- Negative → Past

🎯 Basic Examples

Current Date


SELECT CURRENT_DATE;

Example output:


2023-01-15

Add 2 Months


SELECT ADD_MONTHS(CURRENT_DATE, 2);


2023-03-15

Add 14 Years


SELECT ADD_MONTHS(CURRENT_DATE, 12*14);


2037-01-15

Subtract 11 Months


SELECT ADD_MONTHS(CURRENT_DATE, -11);


2022-02-15

🔥 Month-End Handling (Very Important)

This is where ADD_MONTHS becomes powerful.

Case 1: 31st July + 2 Months


SELECT ADD_MONTHS('2001-07-31', 2);

Result:


2001-09-30

Why?

September has only 30 days.
Teradata adjusts to last valid day.

Case 2: 31st December + 2 Months


SELECT ADD_MONTHS('2003-12-31', 2);

Result:


2004-02-29

Why?

2004 is a leap year.
February has 29 days.
Adjusted automatically.

Case 3: 31st December + 14 Months


SELECT ADD_MONTHS('2003-12-31', 14);

Calculation:

12 months → 2004-12-31
+2 months → 2005-02-28

Result:


2005-02-28

==========================================================================

When to use join n cte over subquery

Use JOIN when:

You're combining related rows from two or more tables
The logic is simple and fits naturally as a relation
You need to apply filters, aggregations, or transformations across tables

Use CTE (WITH ...) when:

Query is long, nested, or has repeated logic
You need to modularize parts of the logic (like temp views)
You need recursive queries (like hierarchies)
You want reusability and cleaner code

Use subqueries when:

You need a quick, simple filter or scalar value
executed once per rownot good for big data
You don't need to reuse the result
Nesting logic makes sense (e.g., top-N, filtering based on aggregate)

=========================

Why we take all col in staging table as varchar

In data warehousing or ETL (Extract, Transform, Load) processes, staging tables are often used as a temporary storage area to hold raw data from source systems. A common practice is to define all columns in a staging table as VARCHAR (or TEXT / STRING), regardless of their actual data types in the source.

✅ Why We Load Invalid Data into Staging Instead of Failing Immediately

1. 🔍 Audit and Data Traceability

Staging = raw data snapshot — it's meant to capture everything from the source exactly as it is.
If you fail the load, you lose visibility into what failed and why.
Keeping bad records helps analysts and developers trace and correct data issues.

🧠 Think of staging as a black box recorder—you don’t want to delete parts of the crash before the investigation.

2. 🛠 Error Handling During Transformation

Transformation logic (after staging) is where rules and validation are applied.
This is where you:
- Filter out invalid rows
- Log them separately
- Fix them (if possible)
- Notify business/data teams

This gives you control over bad data, rather than letting it crash your entire ETL job.

3. 🚫 Avoid Pipeline Failure

If you stop the load every time there’s a single bad value:
- You risk missing the rest of the valid data
- It leads to operational overhead (manual restarts, reprocessing)
- It’s not scalable for large or dirty datasets

4. ✅ Selective Processing

You can mark bad rows with a flag like is_valid = false or move them to an error table.
Business can later fix the source or send corrected data.

============================================================

GROUP BY AND NULL

When using GROUP BY, NULL values in the group-by column create their own group. Aggregate functions like COUNT(column) will ignore NULL values in that column.

Columns not listed in GROUP BY must be wrapped in aggregate functions.

dense rank will rank with ties no skipping

==============================

Purpose of merge in sql

The purpose of the MERGE statement in SQL is to combine multiple data manipulation operations—INSERT, UPDATE, and DELETE—into a single statement to synchronize a target table with a source table based on matching conditions. This allows you to efficiently modify data by:

Updating rows in the target table that match rows in the source table.
Inserting rows into the target table that exist in the source but not in the target.
Deleting rows from the target table that do not exist in the source.

This makes MERGE particularly useful for tasks like data synchronization, incremental updates, and managing Slowly Changing Dimensions (SCD) in data warehouses .

MERGE is used to synchronize two tables by performing:

UPDATE → when a match is found
INSERT → when no match is found
(DELETE is also possible)

It combines INSERT + UPDATE logic in one statement, which is faster and easier to manage than separate queries.

View and usage?
A view in DBMS (Database Management System) is a virtual table created from one or more underlying tables based on a query. It does not store data physically but contains a definition or projection of data from the original tables. When a view is accessed, the DBMS executes the underlying query to present the data dynamically.
Why use views in DBMS?

Simplify complex queries: Views can encapsulate complex joins, filters, and aggregations, making it easier for users to query data without writing complicated SQL each time .
Enhance security: Views restrict user access by exposing only specific columns or rows of data, preventing users from seeing sensitive information in the base tables .
Data abstraction: Views hide the complexity of the database schema and present data in a user-friendly format, improving usability.
Reuse and consistency: Views save frequently used queries as named objects, promoting query reuse and consistent data presentation

recursive CTE

A recursive CTE (Common Table Expression) in SQL is a query construct that references itself repeatedly to process hierarchical or iterative data until a termination condition is met. It is defined using a WITH clause that includes:

An anchor member: the base query that provides the initial result set.
A recursive member: a query that references the CTE itself to build upon the previous result.

The recursion continues by repeatedly executing the recursive member using the prior iteration's results until no new rows are returned, at which point the recursion stops and all intermediate results are combined.

How recursive CTE works (simplified):

The anchor query runs first, producing the initial rows.
The recursive query runs next, using the results of the anchor query.
The recursive query references the CTE itself, using the previous iteration's output.
Steps 2 and 3 repeat, each time using the last result set, until the recursive query returns no rows.
The final result is the union of all these intermediate result sets.

rollback commit savepoint

they’re core concepts in transaction control in SQL. These are especially important for data integrity in OLTP systems and during complex ETL processes.

COMMIT

Purpose: Permanently saves all changes made during the current transaction to the database.
Once a transaction is committed, the changes become visible to other users and cannot be undone by rollback.
It marks the successful end of a transaction.

ROLLBACK

Purpose: Undoes all changes made in the current transaction, reverting the database to the state before the transaction began.
It is used to cancel a transaction when an error occurs or when the changes are not desired.
After rollback, none of the changes made in the transaction are saved.

SAVEPOINT

Purpose: Creates a named intermediate point within a transaction to which you can later roll back without undoing the entire transaction. SAVEPOINT allows you to mark a point within a transaction, so you can roll back partially to that point instead of undoing the whole transaction.
Useful for complex transactions where you might want to partially undo some operations but keep others.
Multiple savepoints can be defined in a single transaction.

Partitioned Table?

A partitioned table in a database is a large table that is logically divided into smaller, more manageable pieces called partitions, each of which stores a subset of the data based on a partitioning key or criteria. Although it appears as a single table to users and applications, the data physically resides in multiple partitions, which can be managed and accessed independently.
A partitioned table is a large table that is divided into smaller, more manageable pieces called partitions based on the values of one or more columns.
Each partition behaves like a mini-table, but queries still treat the whole table as one.

What is Database Sharding
Database sharding is a technique of horizontally partitioning a large database into smaller, more manageable pieces called shards, where each shard holds a subset of the data and is stored on a separate server or node. This allows the database workload to be distributed across multiple machines, enabling better scalability and performance.
Each shard is a separate database instance that contains a subset of the data — usually split by a key like customer_id, region, or tenant_id. 💡 Think of it like splitting a large book into chapters and storing them across multiple shelves (servers).

Each shard is a separate database instance that contains a subset of the data — usually split by a key like customer_id, region, or tenant_id.

💡 Think of it like splitting a large book into chapters and storing them across multiple shelves (servers).UPDATE table_name
SET column_name = value
WHERE condition;

Window function

A Window Function performs a calculation across a set of rows (the “window”) that are related to the current row — without aggregating the result into one row.

📌 Unlike GROUP BY, window functions preserve each row in the result.

Common Table Expression (CTE)

A CTE (Common Table Expression) is a temporary, named result set you define using WITH, which you can reference like a table in your main query.

running total

Indexing

Indexing in SQL is a technique used to speed up the retrieval of rows from a database table by creating a data structure that allows the database engine to find data quickly without scanning the entire table. It works like an index in a book, pointing directly to the location of the data.

How indexing works:

An index is created on one or more columns of a table.
The database builds a separate data structure (often a B-tree or similar) that stores the indexed column values and pointers to the corresponding rows.
When a query searches for values in the indexed column(s), the database uses the index to quickly locate the rows instead of scanning the whole table.

Types of indexes:

Clustered index: Physically sorts and stores data rows in order based on the indexed column. Only one per table.
Non-clustered index: Creates a separate structure to point to data rows without altering physical order. Multiple allowed.
Unique index: Ensures indexed columns have unique values.
Composite (multi-column) index: Indexes multiple columns together to optimize queries filtering on those columns.

An index is a data structure (like a sorted list or tree) that allows the database to quickly locate and retrieve rows from a table without scanning the entire table.

🔍 Think of an index like the index of a book — it helps you find topics quickly instead of reading every page.

==========================================================================

asked ones

-->if
a table has 1,1,1,1

and b has 1,1,1
tell count of all join, all are 12

==========================================================================

-->dense rank , rank , row number take no arguent

==========================================================================

-->PIVOT AS CASE

Both PIVOT and CASE WHEN + SUM() are used to transform rows into columns or summarize data by category, but they differ in syntax, flexibility, and use cases.

SELECT

Product,

SUM(CASE WHEN Region = 'East' THEN Sales ELSE 0 END) AS East_Sales,

SUM(CASE WHEN Region = 'West' THEN Sales ELSE 0 END) AS West_Sales

FROM SalesTable

GROUP BY Product;

==========================================================================

There are 9 sample records in the table.

======================================================================

What you're trying to write (sum two pivot columns [A] + [B] into a new column) is not part of the PIVOT syntax itself.
You need to pivot first, and then sum the resulting columns in a SELECT statement outside the pivot.

======================================================================

Employee visiting office floor

======================================================================

"new vs. repeat customer" problem

======================================================================

calculating matches played, wins, and losses per team

======================================================================

second highest salary

Cumulative Sum of Sales per Month

SELECT

order_month,

SUM(sales_amount) OVER (ORDER BY order_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total

FROM sales_data;

======================================================================

======================================================================

Specific to Teradata optimization (e.g., primary index, collect stats)

======================================================================

Recursive CTE

Work with hierarchical data (e.g., employee-manager, category-parent category).
Example: Find all employees reporting (directly or indirectly) to a specific manager.

https://www.youtube.com/watch?v=LZGaRcDxj8I

1st tym , only base exec then for all recursive part

here sec cte is curr but v need prev run cte emp id

========================================================================

Window

The RANK function is also a sub-part of window functions. The following points should be remembered while using this function: It always works with the OVER() clause. It assigns a rank to each row based on the ORDER BY clause.

In the OVER() clause, you specify the PARTITION BY and ORDER BY clauses. For ranking functions, the ORDER BY clause, including the name(s) of the column(s) or an expression, is mandatory. Before the ORDER BY clause is the optional PARTITION BY clause, which includes the name(s) of the column(s) or an expression.

ALL WINDOW NEED OVER

rank() over (order by marks desc) – skip number

NTILE() function in SQL Server is a window function that distributes rows of an ordered partition into a pre-defined number of roughly equal groups.

What NTILE() Does:

Splits ordered rows into N approximately equal-sized buckets.

Rows are assigned bucket numbers starting from 1 to N.

Useful for creating quartiles, deciles, percentiles, or balanced groups.

LAG , LEAD

ParameterDescriptions
expressionIt is a column name or any built-in function whose value return by the function.
offsetIt contains the number of rows succeeding from the current row. It should be a positive integer value. If it is zero, the function evaluates the result for the current row. If we omit this, the function uses 1 by default.=next
default_valueIt is a value that will return when we have no subsequent row from the current row. If we omit this, the function returns the null value.
OVEROVER It is responsible for partitioning rows into groups. If it is empty, the function performs an operation using all rows.
PARTITION BYIt split the rows in the result set into partitions to which a function is applied. If we have not specified this clause, all rows treated as a single row in the result set.
ORDER BYIt determines the sequence of rows in the partitions before the function is applied.

Parameter	Descriptions
expression	It is a column name or any built-in function whose value return by the function.
offset	It contains the number of rows succeeding from the current row. It should be a positive integer value. If it is zero, the function evaluates the result for the current row. If we omit this, the function uses 1 by default.=next
default_value	It is a value that will return when we have no subsequent row from the current row. If we omit this, the function returns the null value.
OVER	OVER It is responsible for partitioning rows into groups. If it is empty, the function performs an operation using all rows.
PARTITION BY	It split the rows in the result set into partitions to which a function is applied. If we have not specified this clause, all rows treated as a single row in the result set.
ORDER BY	It determines the sequence of rows in the partitions before the function is applied.

 
partition by will reset, null if no next entry 

Q1 Number of row in join

--àNull wont join with another null in tables

INNER JOIN -

FULL OUTER - 

Q2.

if multiple and want one use row number
join and sub query both r interchangeable 

dense rank , rank row number - careful wen to use wat
for like,or str function us lower / upper

IF U USE ONE UPDATE AT A TYM , AL BECOMES SAME GENDER N THEN U WONT B ABLE TO CHG

Q3.

new ve repeat cust