SQL and QUERY

 Execution Order

1. FROM

2. ON

3. WHERE

4. GROUP BY

5. HAVING

6. SELECT

7. QUALIFY

8. ORDER BY

=====================================================================

🔹 BETWEEN Clause with String in Teradata

Yes ✅ — BETWEEN works with strings in Teradata.

But important point:

👉 It works based on lexicographical (dictionary) order,
NOT numeric order.

✅ Basic Syntax

column_name BETWEEN 'start_value' AND 'end_value'

It is equivalent to:

column_name >= 'start_value'
AND column_name <= 'end_value'

🔹 Example 1 – Simple String Range

SELECT name
FROM employee
WHERE name BETWEEN 'A' AND 'M';

What it does:

Returns all names starting from:

  • A

  • B

  • C

  • M

Because alphabetically:

'A' < 'B' < 'C' < ... < 'M'

🔹 Example 2 – Full Word Comparison

WHERE name BETWEEN 'John' AND 'Mary';

This includes:

John
Johnny
Jordan
...
Mary

But NOT:

Adam (comes before John)
Zack (comes after Mary)

🔥 Important: It Compares Character by Character

Comparison happens like this:

'Apple' < 'Banana'

Because:

  • A < B

⚠️ Case Sensitivity

In Teradata:

  • If column is NOT CASESPECIFIC → case insensitive

  • If CASESPECIFIC → uppercase and lowercase are different

Example:

'A' < 'a'

ASCII values differ.

🔹 Example 3 – Date Stored as VARCHAR

WHERE date_col BETWEEN '2024-01-01' AND '2024-12-31';

Works correctly only if format is YYYY-MM-DD

❌ If format is:

'01-12-2024'

Then BETWEEN will not work logically.


🚨 Problem

SELECT *

FROM Employee

WHERE Department_Number BETWEEN 567 AND 401;

This can never be TRUE.

👉 Result = No rows returned

BETWEEN does NOT automatically swap values.


🔥 Interview Trick Question

❓ What happens here?

WHERE name BETWEEN 'A' AND 'B';

It returns:

  • All names starting with A

  • All names starting with B

Because:

'A' <= name <= 'B'

But it will NOT return:

Brian (comes after 'B')

Because:

'Brian' > 'B'

✅ 1️⃣ If Column is NOT CASESPECIFIC (Default in Teradata)

👉 Comparison is case-insensitive

Example:

SELECT name
FROM employee
WHERE name BETWEEN 'a' AND 'm';

This will return:

Adam
alex
Brian
maria

Even though cases differ, Teradata treats 'A' and 'a' as same (unless defined otherwise).

❗ 2️⃣ If Column is CASESPECIFIC

If table is created like:

CREATE TABLE employee
(
name VARCHAR(50) CASESPECIFIC
);

👉 Then comparison is case-sensitive

Now ASCII order matters:

'A' (65)
'Z' (90)
'a' (97)
'z' (122)

So:

WHERE name BETWEEN 'A' AND 'm';

Range becomes:

'A' (65) → 'm' (109)

This includes:

Apple
Zoo
abc
john

But may exclude unexpected values depending on ASCII order.

🔎 Important ASCII Order Impact

All UPPERCASE letters come before lowercase letters.

So:

'Z' < 'a'

Because:

90 < 97

🔥 Example Problem

Data:

Apple
banana
Mango
zebra

Query:

WHERE name BETWEEN 'A' AND 'M';

CASESPECIFIC column result:

Returns:

Apple
Mango

But NOT:

banana (because lowercase 'b' > 'M')

✅ Safe Way to Avoid Case Issues

Use UPPER() or LOWER():

WHERE UPPER(name) BETWEEN 'A' AND 'M';

Now it becomes consistent and predictable.

⚡ Best Practice (Interview Recommended)

If working with string ranges:

✔ Normalize case first
✔ Or confirm column collation
✔ Avoid relying on ASCII behavior blindly

📌 Summary

Column TypeCase Sensitive?BETWEEN Behavior
Default (NOT CASESPECIFIC)❌ NoCase ignored
CASESPECIFIC✅ YesASCII order applied

====================================================================

🔹 DATE Arithmetic in Teradata

In Teradata, DATE is internally stored as an integer (number of days), which allows direct arithmetic operations.

That’s why you can:

  • Add days to a date

  • Subtract dates

  • Compare date ranges

  • Perform logical filtering

✅ 1️⃣ Add Days to Current Date

SELECT CURRENT_DATE + 40;

👉 Returns date 40 days from today
✔ Output datatype → DATE

✅ 2️⃣ Add Days to a Specific Date

SELECT DATE '1999-01-01' + 300;

👉 Always use ANSI literal format:

DATE 'YYYY-MM-DD'

✔ Returns DATE

✅ 3️⃣ Subtract Two Dates

SELECT CURRENT_DATE - DATE '2007-11-30';

👉 Returns INTEGER
(Number of days between dates)

⚠ Important:

  • DATE − DATE → INTEGER

  • DATE + INTEGER → DATE

✅ 4️⃣ Days Worked by Employee

SELECT CURRENT_DATE - Hire_Date
FROM Employee;

👉 Returns number of days worked.

If you want years:

SELECT (CURRENT_DATE - Hire_Date) / 365.25
FROM Employee;

✔ Result → Decimal years

✅ 5️⃣ Employees Working More Than 20 Years

SELECT *
FROM Employee
WHERE CURRENT_DATE - Hire_Date > 365.25 * 20;

👉 Logical comparison using integer result.

✅ 6️⃣ Employees Hired in Last 100 Days

SELECT *
FROM Employee
WHERE Hire_Date > CURRENT_DATE - 100;

👉 Very common interview question.

🔥 Important Behavior Summary

OperationResult Type
DATE + INTEGERDATE
DATE - INTEGERDATE
DATE - DATEINTEGER
INTEGER + DATEDATE

🔎 Month / Year Arithmetic (Better Approach)

Instead of hardcoding 365.25:

Use ADD_MONTHS():

SELECT ADD_MONTHS(CURRENT_DATE, 12);

👉 Adds 12 months safely.

Find employees with > 20 years:

SELECT *
FROM Employee
WHERE Hire_Date <= ADD_MONTHS(CURRENT_DATE, -240);

✔ More accurate than 365.25 * 20
✔ Handles leap years correctly

⚠ Common Mistakes

❌ Wrong date literal:

'01-01-1999'

✔ Correct:

DATE '1999-01-01'

🎯 Interview-Level Tricky Questions

❓ What is result datatype of:

CURRENT_DATE - Hire_Date

👉 INTEGER

❓ Why use ADD_MONTHS instead of multiplying days?

👉 Because leap years make day calculation inaccurate.

💡 Real Data Engineering Use Cases

  • SLA calculation (within 30 days)

  • Late payment detection

  • Customer aging buckets

  • Retention period checks

  • Incremental data loads (last N days)

==================================================================

🔹 CAST Function in Teradata

✅ General Syntax

CAST(expression AS data_type)

Used to convert data from one datatype to another.

🔹 1️⃣ Character Conversions

Example:

SELECT CAST(Last_Name AS CHAR(10))
FROM Employee;

👉 Converts string into fixed-length CHAR(10)
👉 If shorter → padded with spaces
👉 If longer → truncated

Example:

SELECT CAST(1.777 AS CHAR(10));

👉 Result:

'1.777 '

✔ Left-justified
✔ Space padded

Example (Truncation):

SELECT CAST(1.77 AS CHAR(2));

👉 Result:

'1.'

✔ Truncated to 2 characters
✔ No rounding here — just string cut

🔥 Important: Invalid Conversion

SELECT CAST(Last_Name AS INTEGER)
FROM Employee;

👉 ❌ This will FAIL
Because non-numeric characters cannot convert to INTEGER.

Works only if column contains valid numeric strings like:

'123'

Otherwise:

Invalid character in numeric conversion

This fails in both:

  • ANSI mode

  • Teradata mode

🔹 2️⃣ Numeric to Integer

SELECT CAST(3.7777 AS INTEGER);

👉 Result:

3

✔ Decimal portion is truncated
✔ No rounding

🔹 3️⃣ Decimal to Decimal (Precision Change)

When reducing precision or scale:

Teradata uses:
👉 Round to Nearest (Banker's Rounding)

Meaning:

  • If digit is 5 → round to nearest EVEN number

Examples

SELECT CAST(6.74 AS DEC(2,1));

Result:

6.7

(Regular rounding down)

SELECT CAST(6.75 AS DEC(2,1));

Result:

6.8

Why?

  • 7.5 → nearest even is 8

SELECT CAST(6.85 AS DEC(2,1));

Result:

6.8

Why?

  • 8.5 → nearest even is 8 (already even)

🔥 Banker's Rounding Explained

ValueRounded (1 decimal)Reason
6.746.7< 5
6.756.88 is even
6.856.88 is even
6.957.00 is even

This avoids statistical bias in financial systems.

🔹 4️⃣ Arithmetic + CAST

SELECT CAST(Budget_Amount * 1.375 AS DEC(15,3))
FROM Department;

👉 First multiplication
👉 Then result rounded to 3 decimal places

🔹 Important Differences

ConversionBehavior
Decimal → IntegerTruncates
Decimal → DecimalRounds (Banker's rounding)
Char → Char smaller sizeTruncates
Char → NumericError if invalid
Numeric → CharLeft-justified

==============================================

🔹 Rounding Rules in Teradata

Teradata supports two rounding methods, controlled by:

DBSControl → RoundHalfWayMagUp

This affects rounding when reducing decimal precision using:

  • CAST

  • Implicit decimal conversion

  • Teradata extended numeric formatting

✅ Method 1: Traditional Rounding

🔹 RoundHalfWayMagUp = TRUE

Rule:

If digit to the right is ≥ 5 → round up
If digit is < 5 → round down

This is the common school method.

Examples (DEC(2,1))

SELECT CAST(1.34999 AS DEC(2,1)); → 1.3
SELECT CAST(1.35000 AS DEC(2,1)); → 1.4
SELECT CAST(1.35001 AS DEC(2,1)); → 1.4
SELECT CAST(1.35999 AS DEC(2,1)); → 1.4
SELECT CAST(1.36000 AS DEC(2,1)); → 1.4
SELECT CAST(1.45000 AS DEC(2,1)); → 1.5
SELECT CAST(1.45999 AS DEC(2,1)); → 1.5
SELECT CAST(1.46000 AS DEC(2,1)); → 1.5

✔ Always rounds up when digit is 5 or more.

✅ Method 2: Banker's Rounding (Default)

🔹 RoundHalfWayMagUp = FALSE

Also called:
👉 Round Half to Even
👉 Statistically unbiased rounding

Rule:

  • 5 → round up

  • < 5 → round down

  • = 5 → round to nearest EVEN digit

This reduces cumulative rounding bias in financial systems.

Examples (DEC(2,1))

SELECT CAST(1.34999 AS DEC(2,1)); → 1.3

(4 < 5 → round down)

SELECT CAST(1.35000 AS DEC(2,1)); → 1.4

(5 → check rounding digit = 3 → 4 is even → round up)

SELECT CAST(1.45000 AS DEC(2,1)); → 1.4

Why?

  • Digit being rounded = 4

  • Next digit = 5

  • 4 is EVEN → stay 4

  • Result → 1.4

SELECT CAST(1.55000 AS DEC(2,1)); → 1.6

Why?

  • Rounding digit = 5

  • 5 is ODD

  • Next digit = 5

  • Round to EVEN → 6

🔥 Key Difference Example

ValueTraditionalBanker's (Default)
1.451.51.4
1.551.61.6
2.252.32.2

🔎 Why Banker's Rounding?

If you always round 5 upward:

1.5 → 2
2.5 → 3
3.5 → 4

Numbers systematically increase.

Banker's rounding balances:

1.5 → 2
2.5 → 2
3.5 → 4
4.5 → 4

✔ Statistically unbiased
✔ Preferred for financial calculations

🔹 Important Notes

  • Applies when reducing scale (DEC(10,4) → DEC(10,2))

  • Applies in CAST

  • Applies in implicit conversions

  • Does NOT apply when casting decimal → integer (that truncates)

Example:

SELECT CAST(3.777 AS INTEGER);

→ 3 (Truncation, not rounding)

💡 Real Data Engineering Impact

If you are:

  • Calculating financial totals

  • Aggregating currency data

  • Performing revenue adjustments

  • Converting precision in reporting layers

Rounding method can change totals significantly in large datasets.

=======================================================================

🔹 String Concatenation in Teradata

In Teradata, string concatenation is done using:

||

This operator joins multiple values into a single VARCHAR result.

✅ Basic Examples

1️⃣ Combine Last and First Name

SELECT Last_Name || First_Name
FROM Employee;

Result datatype → VARCHAR

2️⃣ Add Separator

SELECT Last_Name || ', ' || First_Name
FROM Employee;

Example output:

Brown, Adam
Smith, John

3️⃣ Concatenation with Expression

SELECT First_Name || ' ' || Last_Name || ' is ' ||
(CURRENT_DATE - Birthdate) / 365
FROM Employee;

👉 Numeric values automatically converted to VARCHAR
👉 Result is always VARCHAR

4️⃣ Concatenating Numbers and Strings

SELECT 123 || 'ABC' || 456;

Result:

123ABC456

✔ Numbers are implicitly converted to character.

🔥 Resulting Data Type

All concatenation results → VARCHAR

Even if inputs are:

  • CHAR

  • INTEGER

  • DATE

  • DECIMAL

Teradata converts everything to VARCHAR.

🔹 Using CAST with Concatenation

You can explicitly define output length:

SELECT CAST(Last_Name || ', ' || First_Name AS CHAR(100))
FROM Employee;

👉 Result becomes fixed-length CHAR(100)

⚠ Truncation Example

SELECT CAST(123 || 'ABC' || 456 AS CHAR(3));

Result:

123

✔ Truncated to 3 characters
✔ No rounding, simple cut-off

🔹 Using SUBSTRING with Concatenation

Example:

Display first initial + last name

SELECT
SUBSTRING(first_name FROM 1 FOR 1) || '. ' || last_name
AS employee_name
FROM employee
WHERE department_number = 403;

Output:

A. Brown
P. Hopkins
D. Lombardo
A. Villegas
J. Charles

🔎 Important Behavior

1️⃣ NULL Handling

SELECT Last_Name || First_Name

If either column is NULL → entire result is NULL.

To avoid this:

SELECT COALESCE(Last_Name,'') || COALESCE(First_Name,'')

2️⃣ Performance Tip (Interview Level)

Heavy concatenation:

  • Prevents index usage

  • Should not be used in WHERE clause filters

❌ Avoid:

WHERE First_Name || Last_Name = 'JohnSmith'

✔ Better:

WHERE First_Name = 'John'
AND Last_Name = 'Smith'

🔥 Real Data Engineering Use Cases

  • Creating display columns

  • Generating full names

  • Building export files

  • Creating composite keys (careful!)

  • Formatting audit messages

==================================================================

🔹 FORMAT in Teradata

FORMAT is a Teradata extension (not ANSI standard).

It changes how a value is displayed,
NOT how it is stored internally.

✅ Basic Character Formatting

SELECT Last_Name (FORMAT 'X(31)')
FROM Employee;

✔ Displays value in 31-character width
✔ Maximum width = 255

Equivalent to:

FORMAT 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

But cleaner way:

FORMAT 'X(255)'

🔹 Important: FORMAT Does NOT Change Data Type

Example:

SELECT Salary (FORMAT '$999,999.99')
FROM Employee;

✔ Still numeric internally
✔ Only output appearance changes

🔹 Common FORMAT Symbols

SymbolMeaning
9Show digit (including leading zeros)
ZSuppress leading zeros
$Dollar sign
%Percent sign
,Comma separator
.Decimal point
-Negative sign
XCharacter position
BBlank space
/Slash

🔥 Examples

1️⃣ Numeric with Leading Zeros

SELECT 123 (FORMAT '99999');

Result:

00123

2️⃣ Suppress Leading Zeros

SELECT 123 (FORMAT 'ZZZZZ');

Result:

123

✔ Leading zeros suppressed
✔ Spaces instead

3️⃣ Currency Format

SELECT 12345.6 (FORMAT '$ZZZ,ZZ9.99');

Result:

$12,345.60

4️⃣ Percent Display

SELECT .85 (FORMAT 'ZZ9%');

Result:

85%

5️⃣ Dash for Negative Numbers

SELECT -123 (FORMAT 'ZZZ9-');

Result:

123-

6️⃣ Character Formatting

SELECT 'ABC' (FORMAT 'X(10)');

Result:

ABC

(Padded with spaces to 10)

🔎 FORMAT vs CAST

FORMATCAST
Changes display onlyChanges datatype
Teradata specificANSI standard
Used mostly in reportsUsed in data transformation

Example difference:

SELECT CAST(123 AS CHAR(10));

→ Actually becomes CHAR datatype

SELECT 123 (FORMAT '99999');

→ Still numeric

⚠ Important Notes

1️⃣ FORMAT is mainly for:

  • Reports

  • BTEQ output

  • User display

2️⃣ Not recommended in:

  • ETL transformations

  • Production joins

  • WHERE conditions

3️⃣ FORMAT does not affect sorting logic
Sorting still uses original numeric value.

🎯 Interview Questions

  1. Difference between FORMAT and CAST?

  2. Does FORMAT change datatype?

  3. What is maximum X(n) size?

  4. What is Z vs 9 difference?

  5. Why is FORMAT not ANSI compliant?

==========================================================================

🔹 Aggregating Groups (GROUP BY) in Teradata

GROUP BY is used to:

✔ Define logical groups
✔ Perform aggregate calculations per group
✔ Return one row per group

🔹 Key Concepts

✅ Non-Aggregate Columns

Columns not inside aggregate functions (SUM, COUNT, etc.)

✅ Aggregate Columns

Columns inside:

  • SUM()

  • COUNT()

  • AVG()

  • MIN()

  • MAX()

🔥 Core GROUP BY Rules

1️⃣ Every projected non-aggregate must appear in GROUP BY

✔ Correct:

SELECT department_number,
SUM(salary_amount)
FROM employee
GROUP BY department_number;

❌ Incorrect:

SELECT department_number,
manager_employee_number,
SUM(salary_amount)
FROM employee
GROUP BY department_number;

Error:

3504 Selected non-aggregate values must be part of the associated group.

2️⃣ GROUP BY can use column name or position

By name:

GROUP BY department_number;

By position:

GROUP BY 1;

(1 = first column in SELECT list)

3️⃣ NULL Forms a Group

Even though NULL values are ignored inside SUM/AVG,
they still create a group:

Example result:

department_numberSum(salary_amount)
403233000.00
40277000.00
301116400.00
999100000.00
NULL129950.00
401245575.00
501200125.00

👉 NULL department employees are grouped together.

4️⃣ GROUP BY Does NOT Sort

If you need sorted result:

ORDER BY department_number;

Grouping ≠ Ordering.

5️⃣ You Can Group By Columns Not Projected

Example:

SELECT SUM(salary_amount)
FROM employee
GROUP BY department_number,
manager_employee_number;

Here:

  • We do NOT display department or manager

  • But grouping is based on both

👉 More grouping columns = more granular groups
👉 More groups = more rows returned

🔎 Understanding Granularity

If grouped by:

GROUP BY department_number;

→ One row per department

If grouped by:

GROUP BY department_number,
manager_employee_number;

→ One row per department-manager combination

Granularity increases.

🔹 Logical Execution Order

  1. FROM

  2. WHERE

  3. GROUP BY

  4. HAVING

  5. SELECT

  6. ORDER BY

Aggregation happens before SELECT output.

🔹 HAVING (Filtering Groups)

SELECT department_number,
SUM(salary_amount) AS total_sal
FROM employee
GROUP BY department_number
HAVING SUM(salary_amount) > 200000;

✔ WHERE filters rows
✔ HAVING filters groups

🔥 Why Aggregation Removes Detail Rows

Original table:

  • Many rows per department

After GROUP BY:

  • One summary row per department

Detail rows are collapsed into summary rows.

==========================================================================

🔹 Adding Grouping Columns in Teradata

When you add more columns to a GROUP BY, you:

✔ Increase granularity
✔ Increase number of groups
✔ Increase number of result rows

At the extreme:
If grouping columns form a unique combination,
then aggregation = detail rows (no real collapsing happens).

SELECT
Manager_Employee_Number AS Mgr,
Department_Number AS Dept,
Job_Code AS JCd,
SUM(Salary_Amount) AS SumSal
FROM employee_sales.employee
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;

What’s happening?

Grouping is done on:

  • Manager

  • Department

  • Job Code

So each unique combination gets its own group.

🔹 Why Rows Increase?

If you group only by:

GROUP BY Department_Number;

→ Few rows (1 per department)

If you group by:

GROUP BY Manager_Employee_Number,
Department_Number,
Job_Code;

→ Many more rows
Because combination is more specific.

🔥 When GROUP BY Becomes Unique

If this combination:

Manager + Department + Job_Code

is unique per employee,

Then:

SUM(Salary_Amount)

is simply the employee's salary.

👉 No real aggregation effect
👉 Detail = Aggregated result

🔹 Valid GROUP BY Syntax Variations

For this query, all are valid:

By Position

GROUP BY 1, 2, 3
GROUP BY 3, 1, 2

(Order doesn’t matter logically)

By Alias Name

GROUP BY Mgr, Dept, JCd

Mixed Form

GROUP BY 1, Department_Number, JCd

Teradata allows:

  • Column names

  • Aliases

  • Positional references

  • Mixture of them

🔎 Important Concept: Granularity

GROUP BY ColumnsResult
DeptDept-level totals
Dept, ManagerDept-manager totals
Dept, Manager, JobVery detailed totals
Unique Key ColumnsSame as detail rows

🔥 NULL Behavior

NULL values still create groups:

Example:

Dept = NULL

All NULL dept rows grouped together.

🔹 GROUP BY vs DISTINCT

If grouping columns form a unique combination:

SELECT DISTINCT Manager,
Department,
Job_Code
FROM employee;

May produce similar row count as:

GROUP BY Manager,
Department,
Job_Code;

But:

  • DISTINCT removes duplicates

  • GROUP BY aggregates

==========================================================================

🔹 CASE Overview in Teradata

The CASE expression is used to evaluate conditions and return derived values in the result set.

It is:

  • ANSI compliant

  • Used in SELECT, WHERE, ORDER BY, HAVING

  • Very common in reporting & analytics

  • Helpful for data enrichment and derived columns

🔥 Two Forms of CASE

1️⃣ Valued Form (Simple CASE)

  • Based on equality

  • Compares one expression against multiple values

  • Cannot test ranges or complex conditions

  • Cannot test NULL using =

2️⃣ Searched Form

  • Uses full boolean conditions

  • Can use:

    • =

    • <>

    • >, <

    • BETWEEN

    • IN

    • IS NULL

  • More flexible

  • Most commonly used in real projects

🔹 Valued Form Explained

Example:

SELECT Last_Name,
First_Name,
CASE USER
WHEN 'sql01' THEN Department_Number
WHEN 'sql02' THEN Manager_Employee_Number
END
FROM Employee
WHERE Job_Code = 512101;

What this means internally:

CASE
WHEN USER = 'sql01' THEN Department_Number
WHEN USER = 'sql02' THEN Manager_Employee_Number
END

So:

  • USER is evaluated

  • Compared using equality

  • First match stops evaluation

  • If nothing matches → returns NULL (default)

🔎 Important Rules

✔ Each WHEN tested sequentially
✔ First TRUE stops evaluation
✔ ELSE executes if nothing matches
✔ ELSE NULL is default
✔ Must end with END

⚠ NULL Important Rule

This is invalid in valued form:

WHEN NULL THEN ...

Because:

USER = NULL

is never TRUE (NULL comparison is unknown)

To check NULL → use searched CASE:

CASE
WHEN USER IS NULL THEN 'Unknown'
END

🔹 With Explicit ELSE

SELECT Last_Name,
First_Name,
CASE USER
WHEN 'sql01' THEN Department_Number
WHEN 'sql02' THEN Manager_Employee_Number
ELSE NULL
END
FROM Employee
WHERE Job_Code = 512101;

ELSE NULL is optional because it’s default.

🔹 Why Output Shows NULL ( ? )

Example result:

Last_NameFirst_NameCASE Result
RatzlaffLarryNULL
WilsonEdwardNULL
RabbitPeterNULL

Why?

Because:

  • Current USER is neither 'sql01' nor 'sql02'

  • So CASE returned NULL

🔥 Default Column Heading

If you don’t alias:

CASE USER
...
END

Teradata gives default column name like:

CASE_Expression

Better practice:

CASE USER
...
END AS Derived_Value

🆚 Valued vs Searched Quick Comparison

FeatureValuedSearched
Equality only
Multiple columns
Range checks
IS NULL
Most used in projects

🔥 Real Data Engineering Use Cases

You’ll use CASE to:

  • Create flags (High Salary / Low Salary)

  • Categorize departments

  • Derive KPI bands

  • Handle NULL values

  • Build reporting dimensions

  • Conditional aggregation

Example:

SUM(
CASE
WHEN Salary_Amount > 100000 THEN Salary_Amount
ELSE 0
END
)

==========================================================================

🔹 Valued CASE and NULL in Teradata

This is a very important concept.

In the valued (simple) CASE, comparisons are based only on equality (=).

🔥 Why WHEN NULL is Illegal

Example (Invalid):

SELECT Last_Name,
Department_Number,
Salary_Amount,
CASE Department_Number
WHEN 401 THEN Salary_Amount * 1.1
WHEN NULL THEN Salary_Amount * 0.85 -- ❌ Invalid
ELSE NULL
END
FROM Employee;

❗ How Teradata Interprets This

Valued CASE works like this:

CASE
WHEN Department_Number = 401 THEN ...
WHEN Department_Number = NULL THEN ...
END

But:

Department_Number = NULL

is never TRUE.

In SQL:

  • NULL is unknown

  • Any comparison with NULL returns UNKNOWN

  • UNKNOWN ≠ TRUE

So Teradata throws:

3731: The user must use IS NULL or IS NOT NULL to test for NULL values.

🔎 Important Rule

Valued CASE:

✔ Uses equality only
✔ Cannot test NULL
✔ Cannot use IS NULL
✔ Single expression comparison

🔥 Why ELSE Works

ELSE Salary_Amount * 0.85

ELSE is not a condition.

It is simply:

👉 “Return this value if no WHEN matched.”

It does not evaluate TRUE/FALSE.

✅ Correct Way to Handle NULL

You must use searched CASE instead.

Correct Version:

SELECT Last_Name,
Department_Number,
Salary_Amount,
CASE
WHEN Department_Number = 401 THEN Salary_Amount * 1.1
WHEN Department_Number IS NULL THEN Salary_Amount * 0.85
ELSE NULL
END
FROM Employee;

Now:

  • NULL is tested properly

  • No error

  • Works correctly

🧠 Why SQL Works This Way

Because SQL uses 3-valued logic:

ExpressionResult
5 = 5TRUE
5 = 6FALSE
5 = NULLUNKNOWN
NULL = NULLUNKNOWN

Only TRUE triggers a WHEN clause.

UNKNOWN does not.


🔥 Quick Comparison

Case TypeCan test NULL?How?
Valued CASE❌ NoNot possible
Searched CASE✅ YesIS NULL

==========================================================================

🔹 NULLIF in Teradata

NULLIF is a simple but powerful function used to return NULL when two expressions are equal.

==========================================================================

🔹 COALESCE in Teradata

COALESCE returns the first non-NULL value from a list of expressions.

COALESCE(null, 1, 4.090 , 'text')   = 1.000

COALESCE(null, 'string', 4.00 , 'text')   = string

==========================================================================

🔹 LIKE with CHAR and ESCAPE in Teradata

LIKE is used for pattern matching in character columns.

It works with:

  • CHAR

  • VARCHAR

  • Any string type

🔥 Basic LIKE Syntax

WHERE column LIKE 'pattern'

🎯 Wildcards

SymbolMeaning
%Any number of characters (0 or more)
_Exactly one character

✅ Examples

Starts with 'A'

WHERE last_name LIKE 'A%'

Ends with 'son'

WHERE last_name LIKE '%son'

Exactly 5 characters starting with A

WHERE last_name LIKE 'A____'

(1 A + 4 underscores = 5 total)


Safe Practice

Trim if needed:

WHERE TRIM(col) LIKE 'ABC%'


🔥 ESCAPE Clause (Very Important)

Sometimes you want to search for actual % or _ characters.

Example:
Find rows containing 50%.

This will not work correctly:

WHERE discount LIKE '50%'

Because % is wildcard.

✅ Correct Way Using ESCAPE

WHERE discount LIKE '50\%' ESCAPE '\'

Explanation:

  • \ is escape character

  • \% means literal percent sign

🔎 Another Example

Search for underscore _:

WHERE code LIKE 'AB\_12' ESCAPE '\'

This matches:

AB_12

Not:

ABX12

🔥 Full Syntax

WHERE column LIKE 'pattern' ESCAPE 'escape_character'

You can use any single character as escape:

WHERE col LIKE 'A#_%' ESCAPE '#'

🚀 Advanced Example
SELECT *
FROM Employee
WHERE last_name LIKE '%\_%' ESCAPE '\';

Finds names containing underscore.

🔥 Performance Note (Teradata)

  • LIKE 'ABC%' → Can use index

  • LIKE '%ABC' → Full table scan

  • LIKE '%ABC%' → Full table scan

Leading wildcard kills index usage.

==========================================================================

🔹 EXTRACT Function in Teradata

EXTRACT is used to return specific parts of a DATE or TIME value.

It returns an INTEGER.

It is partially ANSI compliant.

🔥 Syntax

EXTRACT(part FROM date_or_time_value)

Supported Parts

For DATE:

  • YEAR

  • MONTH

  • DAY

For TIME:

  • HOUR

  • MINUTE

  • SECOND

🎯 Example with DATE

SELECT
EXTRACT(YEAR FROM DATE '2022-12-20' + 30) AS Yr,
EXTRACT(MONTH FROM DATE '2022-12-20' - 30) AS Mth,
EXTRACT(DAY FROM DATE '2022-12-20' + 30) AS Dy;

Explanation

  • DATE '2022-12-20' + 30 → Adds 30 days → 2023-01-19

  • DATE '2022-12-20' - 30 → Subtracts 30 days → 2022-11-20

Result:

YrMthDy
20231119


🔥 Example with TIME

SELECT
EXTRACT(HOUR FROM TIME '10:20:30') AS Hr,
EXTRACT(MINUTE FROM TIME '10:20:30') AS Mn,
EXTRACT(SECOND FROM TIME '10:20:30') AS Scd;

Result:

HrMnScd
102030


🔎 Important Notes

1️⃣ Returns Integer

EXTRACT(YEAR FROM CURRENT_DATE)

Returns something like:

2026

(Not a date — just a number)

2️⃣ Cannot Extract Combined Parts

❌ Not allowed:

EXTRACT(YEAR-MONTH FROM date_col)

If you want year-month:

CAST(EXTRACT(YEAR FROM date_col) AS CHAR(4))
|| '-' ||
CAST(EXTRACT(MONTH FROM date_col) AS CHAR(2))

3️⃣ Works with TIMESTAMP Also

EXTRACT(HOUR FROM CURRENT_TIMESTAMP)

🔥 Very Common Real Use Cases

📊 Grouping by Year

SELECT
EXTRACT(YEAR FROM order_date) AS Order_Year,
COUNT(*)
FROM orders
GROUP BY 1;

📅 Monthly Reports

SELECT
EXTRACT(MONTH FROM order_date) AS Order_Month,
SUM(amount)
FROM orders
GROUP BY 1;

🧠 Filtering by Year

WHERE EXTRACT(YEAR FROM order_date) = 2025

⚠ Performance Warning (Important in Teradata)

This:

WHERE EXTRACT(YEAR FROM order_date) = 2025

May prevent index usage because function is applied on column.

Better version:

WHERE order_date
BETWEEN DATE '2025-01-01'
AND DATE '2025-12-31'

Much more efficient.


📌 Quick Summary

FeatureValue
Works onDATE, TIME, TIMESTAMP
ReturnsINTEGER
ANSI compliantPartially
Combined extraction❌ Not allowed
Performance impactYes (if used in WHERE)

==========================================================================

🔹 ADD_MONTHS in Teradata

ADD_MONTHS is used to add or subtract months from a date.

It correctly follows the Gregorian calendar, handling:

  • 28, 29, 30, 31 days

  • Leap years

  • Month-end adjustments

🔥 Syntax

ADD_MONTHS(date_expression, integer_expression)
  • date_expression → DATE value

  • integer_expression → Number of months to add

    • Positive → Future

    • Negative → Past

🎯 Basic Examples

Current Date

SELECT CURRENT_DATE;

Example output:

2023-01-15

Add 2 Months

SELECT ADD_MONTHS(CURRENT_DATE, 2);
2023-03-15

Add 14 Years

SELECT ADD_MONTHS(CURRENT_DATE, 12*14);
2037-01-15

Subtract 11 Months

SELECT ADD_MONTHS(CURRENT_DATE, -11);
2022-02-15

🔥 Month-End Handling (Very Important)

This is where ADD_MONTHS becomes powerful.

Case 1: 31st July + 2 Months

SELECT ADD_MONTHS('2001-07-31', 2);

Result:

2001-09-30

Why?

  • September has only 30 days.

  • Teradata adjusts to last valid day.

Case 2: 31st December + 2 Months

SELECT ADD_MONTHS('2003-12-31', 2);

Result:

2004-02-29

Why?

  • 2004 is a leap year.

  • February has 29 days.

  • Adjusted automatically.

Case 3: 31st December + 14 Months

SELECT ADD_MONTHS('2003-12-31', 14);

Calculation:

  • 12 months → 2004-12-31

  • +2 months → 2005-02-28

Result:

2005-02-28

==========================================================================


==========================================================================


==========================================================================




==========================================================================







When to use join n cte over subquery


Use JOIN when:

  • You're combining related rows from two or more tables

  • The logic is simple and fits naturally as a relation

  • You need to apply filters, aggregations, or transformations across tables

Use CTE (WITH ...) when:

  • Query is long, nested, or has repeated logic

  • You need to modularize parts of the logic (like temp views)

  • You need recursive queries (like hierarchies)

  • You want reusability and cleaner code

Use subqueries when:

  • You need a quick, simple filter or scalar value

  • executed once per rownot good for big data

  • You don't need to reuse the result

  • Nesting logic makes sense (e.g., top-N, filtering based on aggregate)


=========================
Why we take all col in staging table as varchar

In data warehousing or ETL (Extract, Transform, Load) processes, staging tables are often used as a temporary storage area to hold raw data from source systems. A common practice is to define all columns in a staging table as VARCHAR (or TEXT / STRING), regardless of their actual data types in the source.

✅ Why We Load Invalid Data into Staging Instead of Failing Immediately

1. 🔍 Audit and Data Traceability

  • Staging = raw data snapshot — it's meant to capture everything from the source exactly as it is.

  • If you fail the load, you lose visibility into what failed and why.

  • Keeping bad records helps analysts and developers trace and correct data issues.

🧠 Think of staging as a black box recorder—you don’t want to delete parts of the crash before the investigation.


2. 🛠 Error Handling During Transformation

  • Transformation logic (after staging) is where rules and validation are applied.

  • This is where you:

    • Filter out invalid rows

    • Log them separately

    • Fix them (if possible)

    • Notify business/data teams

This gives you control over bad data, rather than letting it crash your entire ETL job.


3. 🚫 Avoid Pipeline Failure

  • If you stop the load every time there’s a single bad value:

    • You risk missing the rest of the valid data

    • It leads to operational overhead (manual restarts, reprocessing)

    • It’s not scalable for large or dirty datasets


4. ✅ Selective Processing

  • You can mark bad rows with a flag like is_valid = false or move them to an error table.

  • Business can later fix the source or send corrected data.

============================================================

GROUP BY AND NULL

When using GROUP BYNULL values in the group-by column create their own group. Aggregate functions like COUNT(column) will ignore NULL values in that column.

Columns not listed in GROUP BY must be wrapped in aggregate functions.

dense rank will rank with ties no skipping



==============================

Purpose of merge in sql

The purpose of the MERGE statement in SQL is to combine multiple data manipulation operations—INSERT, UPDATE, and DELETE—into a single statement to synchronize a target table with a source table based on matching conditions. This allows you to efficiently modify data by:

  • Updating rows in the target table that match rows in the source table.

  • Inserting rows into the target table that exist in the source but not in the target.

  • Deleting rows from the target table that do not exist in the source.

This makes MERGE particularly useful for tasks like data synchronization, incremental updates, and managing Slowly Changing Dimensions (SCD) in data warehouses.


MERGE is used to synchronize two tables by performing:

  • UPDATE → when a match is found

  • INSERT → when no match is found

  • (DELETE is also possible)

It combines INSERT + UPDATE logic in one statement, which is faster and easier to manage than separate queries.




View and usage?
view in DBMS (Database Management System) is a virtual table created from one or more underlying tables based on a query. It does not store data physically but contains a definition or projection of data from the original tables. When a view is accessed, the DBMS executes the underlying query to present the data dynamically.
Why use views in DBMS?

  • Simplify complex queries: Views can encapsulate complex joins, filters, and aggregations, making it easier for users to query data without writing complicated SQL each time.

  • Enhance security: Views restrict user access by exposing only specific columns or rows of data, preventing users from seeing sensitive information in the base tables.

  • Data abstraction: Views hide the complexity of the database schema and present data in a user-friendly format, improving usability.

  • Reuse and consistency: Views save frequently used queries as named objects, promoting query reuse and consistent data presentation

recursive CTE

recursive CTE (Common Table Expression) in SQL is a query construct that references itself repeatedly to process hierarchical or iterative data until a termination condition is met. It is defined using a WITH clause that includes:

  • An anchor member: the base query that provides the initial result set.

  • recursive member: a query that references the CTE itself to build upon the previous result.

The recursion continues by repeatedly executing the recursive member using the prior iteration's results until no new rows are returned, at which point the recursion stops and all intermediate results are combined.

How recursive CTE works (simplified):

  1. The anchor query runs first, producing the initial rows.

  2. The recursive query runs next, using the results of the anchor query.

  3. The recursive query references the CTE itself, using the previous iteration's output.

  4. Steps 2 and 3 repeat, each time using the last result set, until the recursive query returns no rows.

  5. The final result is the union of all these intermediate result sets.





rollback  commit savepoint

 they’re core concepts in transaction control in SQL. These are especially important for data integrity in OLTP systems and during complex ETL processes.

COMMIT

  • Purpose: Permanently saves all changes made during the current transaction to the database.

  • Once a transaction is committed, the changes become visible to other users and cannot be undone by rollback.

  • It marks the successful end of a transaction.



ROLLBACK

  • Purpose: Undoes all changes made in the current transaction, reverting the database to the state before the transaction began.

  • It is used to cancel a transaction when an error occurs or when the changes are not desired.

  • After rollback, none of the changes made in the transaction are saved.



SAVEPOINT

  • Purpose: Creates a named intermediate point within a transaction to which you can later roll back without undoing the entire transactionSAVEPOINT allows you to mark a point within a transaction, so you can roll back partially to that point instead of undoing the whole transaction.

  • Useful for complex transactions where you might want to partially undo some operations but keep others.

  • Multiple savepoints can be defined in a single transaction.



Partitioned Table?

partitioned table in a database is a large table that is logically divided into smaller, more manageable pieces called partitions, each of which stores a subset of the data based on a partitioning key or criteria. Although it appears as a single table to users and applications, the data physically resides in multiple partitions, which can be managed and accessed independently.

partitioned table is a large table that is divided into smaller, more manageable pieces called partitions based on the values of one or more columns.

Each partition behaves like a mini-table, but queries still treat the whole table as one.




What is Database Sharding

Database sharding is a technique of horizontally partitioning a large database into smaller, more manageable pieces called shards, where each shard holds a subset of the data and is stored on a separate server or node. This allows the database workload to be distributed across multiple machines, enabling better scalability and performance.
Each shard is a separate database instance that contains a subset of the data — usually split by a key like customer_id, region, or tenant_id. 💡 Think of it like splitting a large book into chapters and storing them across multiple shelves (servers).

Each shard is a separate database instance that contains a subset of the data — usually split by a key like customer_idregion, or tenant_id.

💡 Think of it like splitting a large book into chapters and storing them across multiple shelves (servers).UPDATE table_name 
SET column_name = value
WHERE condition;





Window function

Window Function performs a calculation across a set of rows (the “window”) that are related to the current row — without aggregating the result into one row.

📌 Unlike GROUP BY, window functions preserve each row in the result.














Common Table Expression (CTE)

CTE (Common Table Expression) is a temporary, named result set you define using WITH, which you can reference like a table in your main query.

running total


















 






Indexing


Indexing in SQL is a technique used to speed up the retrieval of rows from a database table by creating a data structure that allows the database engine to find data quickly without scanning the entire table. It works like an index in a book, pointing directly to the location of the data.

How indexing works:

  • An index is created on one or more columns of a table.

  • The database builds a separate data structure (often a B-tree or similar) that stores the indexed column values and pointers to the corresponding rows.

  • When a query searches for values in the indexed column(s), the database uses the index to quickly locate the rows instead of scanning the whole table.

Types of indexes:

  • Clustered index: Physically sorts and stores data rows in order based on the indexed column. Only one per table.

  • Non-clustered index: Creates a separate structure to point to data rows without altering physical order. Multiple allowed.

  • Unique index: Ensures indexed columns have unique values.

  • Composite (multi-column) index: Indexes multiple columns together to optimize queries filtering on those columns.


An index is a data structure (like a sorted list or tree) that allows the database to quickly locate and retrieve rows from a table without scanning the entire table.

🔍 Think of an index like the index of a book — it helps you find topics quickly instead of reading every page.








 













==========================================================================
asked ones
-->if
a table has 1,1,1,1 
and b has 1,1,1
tell count of all join, all are 12
==========================================================================

-->dense rank , rank , row number take no arguent


==========================================================================

-->PIVOT AS CASE

Both PIVOT and CASE WHEN + SUM() are used to transform rows into columns or summarize data by category, but they differ in syntax, flexibility, and use cases.

SELECT

    Product,

    SUM(CASE WHEN Region = 'East' THEN Sales ELSE 0 END) AS East_Sales,

    SUM(CASE WHEN Region = 'West' THEN Sales ELSE 0 END) AS West_Sales

FROM SalesTable

GROUP BY Product;

==========================================================================

There are 9 sample records in the table.



======================================================================

What you're trying to write (sum two pivot columns [A] + [B] into a new columnis not part of the PIVOT syntax itself.
You need to pivot first, and then sum the resulting columns in a SELECT statement outside the pivot.



======================================================================

Employee visiting office floor




======================================================================

"new vs. repeat customer" problem




======================================================================
calculating matches played, wins, and losses per team




======================================================================
 second highest salary





Cumulative Sum of Sales per Month
SELECT
    order_month,
    SUM(sales_amount) OVER (ORDER BY order_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales_data;


======================================================================



======================================================================

Specific to Teradata optimization (e.g., primary index, collect stats)


======================================================================

Recursive CTE

Work with hierarchical data (e.g., employee-manager, category-parent category).
Example: Find all employees reporting (directly or indirectly) to a specific manager.




https://www.youtube.com/watch?v=LZGaRcDxj8I



1st tym , only base exec then for all recursive part







here sec cte is curr but v need prev run cte emp id

========================================================================

Window

The RANK function is also a sub-part of window functions. The following points should be remembered while using this function: It always works with the OVER() clause. It assigns a rank to each row based on the ORDER BY clause.

In the OVER() clause, you specify the PARTITION BY and ORDER BY clauses. For ranking functions, the ORDER BY clause, including the name(s) of the column(s) or an expression, is mandatory. Before the ORDER BY clause is the optional PARTITION BY clause, which includes the name(s) of the column(s) or an expression.

ALL WINDOW NEED OVER


rank() over (order by marks desc) – skip number

NTILE() function in SQL Server is a window function that distributes rows of an ordered partition into a pre-defined number of roughly equal groups.

What NTILE() Does:

Splits ordered rows into N approximately equal-sized buckets.

Rows are assigned bucket numbers starting from 1 to N.

Useful for creating quartiles, deciles, percentiles, or balanced groups.



LAG , LEAD

ParameterDescriptions
expressionIt is a column name or any built-in function whose value return by the function.
offsetIt contains the number of rows succeeding from the current row. It should be a positive integer value. If it is zero, the function evaluates the result for the current row. If we omit this, the function uses 1 by default.=next
default_valueIt is a value that will return when we have no subsequent row from the current row. If we omit this, the function returns the null value.
OVEROVER It is responsible for partitioning rows into groups. If it is empty, the function performs an operation using all rows.
PARTITION BYIt split the rows in the result set into partitions to which a function is applied. If we have not specified this clause, all rows treated as a single row in the result set.
ORDER BYIt determines the sequence of rows in the partitions before the function is applied.

 
partition by will reset, null if no next entry
 







Q1 Number of row in join

--àNull wont join with another null in tables

INNER JOIN  -


FULL OUTER - 


Q2.


if multiple and want one use row number
join and sub query both r interchangeable 


dense rank , rank row number - careful wen to use wat
for like,or str function us lower / upper

IF U USE ONE UPDATE AT A TYM , AL BECOMES SAME GENDER N THEN U WONT B ABLE TO CHG


Q3.





new ve repeat cust









or



loop hoal with fake id





total mark of frd


cancellation rate

more than role
apart from wrk







tournament winner








seller







--do left join becoz  user which r not seller shd be checked



Comments

Popular posts from this blog

Git

work

Airflow