SQL and QUERY
Execution Order
1. FROM
2. ON
3. WHERE
4. GROUP BY
5. HAVING
6. SELECT
7. QUALIFY
8. ORDER BY
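QUALIFY (step 7) is Teradata's filter on window-function results; it runs after SELECT, which is why a window function cannot be filtered in WHERE. A minimal sketch, assuming the employee table used elsewhere in these notes:

```sql
-- Top earner per department: QUALIFY filters on the window result,
-- which WHERE cannot do because WHERE runs before windowing
SELECT department_number,
       employee_number,
       salary_amount
FROM employee
QUALIFY ROW_NUMBER() OVER (
          PARTITION BY department_number
          ORDER BY salary_amount DESC
        ) = 1;
```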
=====================================================================
🔹 BETWEEN Clause with String in Teradata
Yes ✅ — BETWEEN works with strings in Teradata.
But important point:
👉 It works based on lexicographical (dictionary) order,
NOT numeric order.
✅ Basic Syntax
column_name BETWEEN 'start_value' AND 'end_value'
It is equivalent to:
column_name >= 'start_value'
AND column_name <= 'end_value'
🔹 Example 1 – Simple String Range
SELECT name
FROM employee
WHERE name BETWEEN 'A' AND 'M';
What it does:
Returns all names from 'A' up to 'M' in dictionary order: names starting with A through L, plus the exact value 'M'.
(A name like 'Mark' sorts after 'M', so it is excluded.)
Because alphabetically:
'A' < 'B' < 'C' < ... < 'M'
🔹 Example 2 – Full Word Comparison
WHERE name BETWEEN 'John' AND 'Mary';
This includes:
- John
- Johnny
- Jordan
- ...
- Mary
But NOT:
- Adam (sorts before 'John')
- Zack (sorts after 'Mary')
🔥 Important: It Compares Character by Character
Comparison happens like this:
'Apple' < 'Banana'
Because:
- A < B
⚠️ Case Sensitivity
In Teradata:
- If the column is NOT CASESPECIFIC → case insensitive
- If CASESPECIFIC → uppercase and lowercase are different
Example:
'A' < 'a'
ASCII values differ.
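When case matters for one comparison only, Teradata lets you override the column's default in the expression itself. A sketch with illustrative table and column names:

```sql
-- Force a case-sensitive match on a column that is
-- NOT CASESPECIFIC by default (Teradata column attribute syntax)
SELECT name
FROM employee
WHERE name (CASESPECIFIC) = 'Smith';

-- The reverse also works on a CASESPECIFIC column:
-- WHERE name (NOT CASESPECIFIC) = 'SMITH';
```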
🔹 Example 3 – Date Stored as VARCHAR
WHERE date_col BETWEEN '2024-01-01' AND '2024-12-31';
Works correctly only if format is YYYY-MM-DD
❌ If format is:
'01-12-2024'
Then BETWEEN will not work logically.
🚨 Problem
SELECT *
FROM Employee
WHERE Department_Number BETWEEN 567 AND 401;
This can never be TRUE.
👉 Result = No rows returned
BETWEEN does NOT automatically swap values.
🔥 Interview Trick Question
❓ What happens here?
WHERE name BETWEEN 'A' AND 'B';
It returns:
- All names starting with A
- The exact value 'B' (if such a name exists)
Because:
'A' <= name <= 'B'
But it will NOT return names that start with B followed by more characters:
Brian (comes after 'B')
Because:
'Brian' > 'B'
====================================================================
🔹 DATE Arithmetic in Teradata
In Teradata, DATE is internally stored as an integer (number of days), which allows direct arithmetic operations.
That’s why you can:
- Add days to a date
- Subtract dates
- Compare date ranges
- Perform logical filtering
✅ 1️⃣ Add Days to Current Date
SELECT CURRENT_DATE + 40;
👉 Returns date 40 days from today
✔ Output datatype → DATE
✅ 2️⃣ Add Days to a Specific Date
SELECT DATE '1999-01-01' + 300;
👉 Always use ANSI literal format:
DATE 'YYYY-MM-DD'
✔ Returns DATE
✅ 3️⃣ Subtract Two Dates
SELECT CURRENT_DATE - DATE '2007-11-30';
👉 Returns INTEGER
(Number of days between dates)
⚠ Important:
- DATE − DATE → INTEGER
- DATE + INTEGER → DATE
✅ 4️⃣ Days Worked by Employee
SELECT CURRENT_DATE - Hire_Date
FROM Employee;
👉 Returns number of days worked.
If you want years:
SELECT (CURRENT_DATE - Hire_Date) / 365.25
FROM Employee;
✔ Result → Decimal years
✅ 5️⃣ Employees Working More Than 20 Years
SELECT *
FROM Employee
WHERE CURRENT_DATE - Hire_Date > 365.25 * 20;
👉 Logical comparison using integer result.
✅ 6️⃣ Employees Hired in Last 100 Days
SELECT *
FROM Employee
WHERE Hire_Date > CURRENT_DATE - 100;
👉 Very common interview question.
🔥 Important Behavior Summary
| Operation | Result Type |
|---|---|
| DATE + INTEGER | DATE |
| DATE - INTEGER | DATE |
| DATE - DATE | INTEGER |
| INTEGER + DATE | DATE |
🔎 Month / Year Arithmetic (Better Approach)
Instead of hardcoding 365.25:
Use ADD_MONTHS():
SELECT ADD_MONTHS(CURRENT_DATE, 12);
👉 Adds 12 months safely.
Find employees with > 20 years:
SELECT *
FROM Employee
WHERE Hire_Date <= ADD_MONTHS(CURRENT_DATE, -240);
✔ More accurate than 365.25 * 20
✔ Handles leap years correctly
⚠ Common Mistakes
❌ Wrong date literal:
'01-01-1999'
✔ Correct:
DATE '1999-01-01'
🎯 Interview-Level Tricky Questions
❓ What is result datatype of:
CURRENT_DATE - Hire_Date
👉 INTEGER
❓ Why use ADD_MONTHS instead of multiplying days?
👉 Because leap years make day calculation inaccurate.
💡 Real Data Engineering Use Cases
- SLA calculation (within 30 days)
- Late payment detection
- Customer aging buckets
- Retention period checks
- Incremental data loads (last N days)
==================================================================
🔹 String Concatenation in Teradata
In Teradata, string concatenation is done using:
||
This operator joins multiple values into a single VARCHAR result.
✅ Basic Examples
1️⃣ Combine Last and First Name
SELECT Last_Name || First_Name
FROM Employee;
Result datatype → VARCHAR
2️⃣ Add Separator
SELECT Last_Name || ', ' || First_Name
FROM Employee;
Example output:
Brown, Adam
Smith, John
3️⃣ Concatenation with Expression
SELECT First_Name || ' ' || Last_Name || ' is ' ||
(CURRENT_DATE - Birthdate) / 365
FROM Employee;
👉 Numeric values automatically converted to VARCHAR
👉 Result is always VARCHAR
4️⃣ Concatenating Numbers and Strings
SELECT 123 || 'ABC' || 456;
Result:
123ABC456
✔ Numbers are implicitly converted to character.
🔥 Resulting Data Type
All concatenation results → VARCHAR
Even if inputs are:
- CHAR
- INTEGER
- DATE
- DECIMAL
Teradata converts everything to VARCHAR.
🔹 Using CAST with Concatenation
You can explicitly define output length:
SELECT CAST(Last_Name || ', ' || First_Name AS CHAR(100))
FROM Employee;
👉 Result becomes fixed-length CHAR(100)
⚠ Truncation Example
SELECT CAST(123 || 'ABC' || 456 AS CHAR(3));
Result:
123
✔ Truncated to 3 characters
✔ No rounding, simple cut-off
🔹 Using SUBSTRING with Concatenation
Example:
Display first initial + last name
SELECT
SUBSTRING(first_name FROM 1 FOR 1) || '. ' || last_name
AS employee_name
FROM employee
WHERE department_number = 403;
Output:
A. Brown
P. Hopkins
D. Lombardo
A. Villegas
J. Charles
🔎 Important Behavior
1️⃣ NULL Handling
SELECT Last_Name || First_Name
If either column is NULL → entire result is NULL.
To avoid this:
SELECT COALESCE(Last_Name,'') || COALESCE(First_Name,'')
2️⃣ Performance Tip (Interview Level)
Heavy concatenation:
- Prevents index usage
- Should not be used in WHERE clause filters
❌ Avoid:
WHERE First_Name || Last_Name = 'JohnSmith'
✔ Better:
WHERE First_Name = 'John'
AND Last_Name = 'Smith'
🔥 Real Data Engineering Use Cases
- Creating display columns
- Generating full names
- Building export files
- Creating composite keys (careful!)
- Formatting audit messages
==================================================================
==========================================================================
🔹 Aggregating Groups (GROUP BY) in Teradata
GROUP BY is used to:
✔ Define logical groups
✔ Perform aggregate calculations per group
✔ Return one row per group
🔹 Key Concepts
✅ Non-Aggregate Columns
Columns not inside aggregate functions (SUM, COUNT, etc.)
✅ Aggregate Columns
Columns inside:
- SUM()
- COUNT()
- AVG()
- MIN()
- MAX()
🔥 Core GROUP BY Rules
1️⃣ Every projected non-aggregate must appear in GROUP BY
✔ Correct:
SELECT department_number,
SUM(salary_amount)
FROM employee
GROUP BY department_number;
❌ Incorrect:
SELECT department_number,
manager_employee_number,
SUM(salary_amount)
FROM employee
GROUP BY department_number;
Error:
3504 Selected non-aggregate values must be part of the associated group.
2️⃣ GROUP BY can use column name or position
By name:
GROUP BY department_number;
By position:
GROUP BY 1;
(1 = first column in SELECT list)
3️⃣ NULL Forms a Group
Even though NULL values are ignored inside SUM/AVG,
they still create a group:
Example result:
| department_number | Sum(salary_amount) |
|---|---|
| 403 | 233000.00 |
| 402 | 77000.00 |
| 301 | 116400.00 |
| 999 | 100000.00 |
| NULL | 129950.00 |
| 401 | 245575.00 |
| 501 | 200125.00 |
👉 NULL department employees are grouped together.
4️⃣ GROUP BY Does NOT Sort
If you need sorted result:
ORDER BY department_number;
Grouping ≠ Ordering.
5️⃣ You Can Group By Columns Not Projected
Example:
SELECT SUM(salary_amount)
FROM employee
GROUP BY department_number,
manager_employee_number;
Here:
- We do NOT display department or manager
- But grouping is based on both
👉 More grouping columns = more granular groups
👉 More groups = more rows returned
🔎 Understanding Granularity
If grouped by:
GROUP BY department_number;
→ One row per department
If grouped by:
GROUP BY department_number,
manager_employee_number;
→ One row per department-manager combination
Granularity increases.
🔹 Logical Execution Order
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
Aggregation happens before SELECT output.
🔹 HAVING (Filtering Groups)
SELECT department_number,
SUM(salary_amount) AS total_sal
FROM employee
GROUP BY department_number
HAVING SUM(salary_amount) > 200000;
✔ WHERE filters rows
✔ HAVING filters groups
🔥 Why Aggregation Removes Detail Rows
Original table:
-
Many rows per department
After GROUP BY:
-
One summary row per department
Detail rows are collapsed into summary rows.
==========================================================================
🔹 Adding Grouping Columns in Teradata
When you add more columns to a GROUP BY, you:
✔ Increase granularity
✔ Increase number of groups
✔ Increase number of result rows
At the extreme:
If grouping columns form a unique combination,
then aggregation = detail rows (no real collapsing happens).
SELECT
Manager_Employee_Number AS Mgr,
Department_Number AS Dept,
Job_Code AS JCd,
SUM(Salary_Amount) AS SumSal
FROM employee_sales.employee
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
What’s happening?
Grouping is done on:
-
Manager
-
Department
-
Job Code
So each unique combination gets its own group.
🔹 Why Rows Increase?
If you group only by:
GROUP BY Department_Number;
→ Few rows (1 per department)
If you group by:
GROUP BY Manager_Employee_Number,
Department_Number,
Job_Code;
→ Many more rows
Because combination is more specific.
🔥 When GROUP BY Becomes Unique
If this combination:
Manager + Department + Job_Code
is unique per employee,
Then:
SUM(Salary_Amount)
is simply the employee's salary.
👉 No real aggregation effect
👉 Detail = Aggregated result
🔹 Valid GROUP BY Syntax Variations
For this query, all are valid:
By Position
GROUP BY 1, 2, 3
GROUP BY 3, 1, 2
(Order doesn’t matter logically)
By Alias Name
GROUP BY Mgr, Dept, JCd
Mixed Form
GROUP BY 1, Department_Number, JCd
Teradata allows:
- Column names
- Aliases
- Positional references
- A mixture of them
🔎 Important Concept: Granularity
| GROUP BY Columns | Result |
|---|---|
| Dept | Dept-level totals |
| Dept, Manager | Dept-manager totals |
| Dept, Manager, Job | Very detailed totals |
| Unique Key Columns | Same as detail rows |
🔥 NULL Behavior
NULL values still create groups:
Example:
Dept = NULL
All NULL dept rows grouped together.
🔹 GROUP BY vs DISTINCT
If grouping columns form a unique combination:
SELECT DISTINCT Manager,
Department,
Job_Code
FROM employee;
May produce similar row count as:
GROUP BY Manager,
Department,
Job_Code;
But:
- DISTINCT removes duplicates
- GROUP BY aggregates
==========================================================================
🔹 CASE Overview in Teradata
The CASE expression is used to evaluate conditions and return derived values in the result set.
It is:
- ANSI compliant
- Used in SELECT, WHERE, ORDER BY, HAVING
- Very common in reporting & analytics
- Helpful for data enrichment and derived columns
🔥 Two Forms of CASE
1️⃣ Valued Form (Simple CASE)
- Based on equality
- Compares one expression against multiple values
- Cannot test ranges or complex conditions
- Cannot test NULL using =
2️⃣ Searched Form
- Uses full boolean conditions
- Can use: =, <>, >, <, BETWEEN, IN, IS NULL
- More flexible
- Most commonly used in real projects
🔹 Valued Form Explained
Example:
SELECT Last_Name,
First_Name,
CASE USER
WHEN 'sql01' THEN Department_Number
WHEN 'sql02' THEN Manager_Employee_Number
END
FROM Employee
WHERE Job_Code = 512101;
What this means internally:
CASE
WHEN USER = 'sql01' THEN Department_Number
WHEN USER = 'sql02' THEN Manager_Employee_Number
END
So:
- USER is evaluated
- Compared using equality
- First match stops evaluation
- If nothing matches → returns NULL (default)
🔎 Important Rules
✔ Each WHEN tested sequentially
✔ First TRUE stops evaluation
✔ ELSE executes if nothing matches
✔ ELSE NULL is default
✔ Must end with END
⚠ NULL Important Rule
This is invalid in valued form:
WHEN NULL THEN ...
Because:
USER = NULL
is never TRUE (NULL comparison is unknown)
To check NULL → use searched CASE:
CASE
WHEN USER IS NULL THEN 'Unknown'
END
🔹 With Explicit ELSE
SELECT Last_Name,
First_Name,
CASE USER
WHEN 'sql01' THEN Department_Number
WHEN 'sql02' THEN Manager_Employee_Number
ELSE NULL
END
FROM Employee
WHERE Job_Code = 512101;
ELSE NULL is optional because it’s default.
🔹 Why Output Shows NULL ( ? )
Example result:
| Last_Name | First_Name | CASE Result |
|---|---|---|
| Ratzlaff | Larry | NULL |
| Wilson | Edward | NULL |
| Rabbit | Peter | NULL |
Why?
Because:
- The current USER is neither 'sql01' nor 'sql02'
- So CASE returned NULL
🔥 Default Column Heading
If you don’t alias:
CASE USER
...
END
Teradata generates a default column heading (shown as something like <CASE expression>).
Better practice:
CASE USER
...
END AS Derived_Value
🆚 Valued vs Searched Quick Comparison
| Feature | Valued | Searched |
|---|---|---|
| Equality only | ✅ | ❌ |
| Multiple columns | ❌ | ✅ |
| Range checks | ❌ | ✅ |
| IS NULL | ❌ | ✅ |
| Most used in projects | ⚠ | ✅ |
🔥 Real Data Engineering Use Cases
You’ll use CASE to:
- Create flags (High Salary / Low Salary)
- Categorize departments
- Derive KPI bands
- Handle NULL values
- Build reporting dimensions
- Perform conditional aggregation
Example:
SUM(
CASE
WHEN Salary_Amount > 100000 THEN Salary_Amount
ELSE 0
END
)
==========================================================================
🔹 Valued CASE and NULL in Teradata
This is a very important concept.
In the valued (simple) CASE, comparisons are based only on equality (=).
🔥 Why WHEN NULL is Illegal
Example (Invalid):
SELECT Last_Name,
Department_Number,
Salary_Amount,
CASE Department_Number
WHEN 401 THEN Salary_Amount * 1.1
WHEN NULL THEN Salary_Amount * 0.85 -- ❌ Invalid
ELSE NULL
END
FROM Employee;
❗ How Teradata Interprets This
Valued CASE works like this:
CASE
WHEN Department_Number = 401 THEN ...
WHEN Department_Number = NULL THEN ...
END
But:
Department_Number = NULL
is never TRUE.
In SQL:
- NULL is unknown
- Any comparison with NULL returns UNKNOWN
- UNKNOWN ≠ TRUE
So Teradata throws:
3731: The user must use IS NULL or IS NOT NULL to test for NULL values.
🔎 Important Rule
Valued CASE:
✔ Uses equality only
✔ Cannot test NULL
✔ Cannot use IS NULL
✔ Single expression comparison
🔥 Why ELSE Works
ELSE Salary_Amount * 0.85
ELSE is not a condition.
It is simply:
👉 “Return this value if no WHEN matched.”
It does not evaluate TRUE/FALSE.
✅ Correct Way to Handle NULL
You must use searched CASE instead.
Correct Version:
SELECT Last_Name,
Department_Number,
Salary_Amount,
CASE
WHEN Department_Number = 401 THEN Salary_Amount * 1.1
WHEN Department_Number IS NULL THEN Salary_Amount * 0.85
ELSE NULL
END
FROM Employee;
Now:
- NULL is tested properly
- No error
- Works correctly
🧠 Why SQL Works This Way
Because SQL uses 3-valued logic:
| Expression | Result |
|---|---|
| 5 = 5 | TRUE |
| 5 = 6 | FALSE |
| 5 = NULL | UNKNOWN |
| NULL = NULL | UNKNOWN |
Only TRUE triggers a WHEN clause.
UNKNOWN does not.
🔥 Quick Comparison
| Case Type | Can test NULL? | How? |
|---|---|---|
| Valued CASE | ❌ No | Not possible |
| Searched CASE | ✅ Yes | IS NULL |
==========================================================================
🔹 NULLIF in Teradata
NULLIF is a simple but powerful function used to return NULL when two expressions are equal.
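A classic use, sketched with illustrative table and column names, is avoiding division by zero:

```sql
-- NULLIF(a, b) returns NULL when a = b, otherwise a.
-- Dividing by NULLIF(x, 0) yields NULL instead of a divide-by-zero error.
SELECT order_count,
       total_amount / NULLIF(order_count, 0) AS avg_amount
FROM daily_orders;
```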
==========================================================================
🔹 COALESCE in Teradata
COALESCE returns the first non-NULL value from a list of expressions.
COALESCE(NULL, 1, 4.090, 'text')        → 1.000
COALESCE(NULL, 'string', 4.00, 'text')  → 'string'
==========================================================================
🔹 LIKE with CHAR and ESCAPE in Teradata
LIKE is used for pattern matching in character columns.
It works with:
- CHAR
- VARCHAR
- Any string type
🔥 Basic LIKE Syntax
WHERE column LIKE 'pattern'
🎯 Wildcards
| Symbol | Meaning |
|---|---|
| % | Any number of characters (0 or more) |
| _ | Exactly one character |
✅ Examples
Starts with 'A'
WHERE last_name LIKE 'A%'
Ends with 'son'
WHERE last_name LIKE '%son'
Exactly 5 characters starting with A
WHERE last_name LIKE 'A____'
(1 A + 4 underscores = 5 total)
Safe Practice
Trim if needed:
WHERE TRIM(col) LIKE 'ABC%'
🔥 ESCAPE Clause (Very Important)
Sometimes you want to search for actual % or _ characters.
Example:
Find rows containing 50%.
This will not work correctly:
WHERE discount LIKE '50%'
Because % is wildcard.
✅ Correct Way Using ESCAPE
WHERE discount LIKE '50\%' ESCAPE '\'
Explanation:
- \ is the escape character
- \% means a literal percent sign
🔎 Another Example
Search for underscore _:
WHERE code LIKE 'AB\_12' ESCAPE '\'
This matches:
AB_12
Not:
ABX12
🔥 Full Syntax
WHERE column LIKE 'pattern' ESCAPE 'escape_character'
You can use any single character as escape:
WHERE col LIKE 'A#_%' ESCAPE '#'
🚀 Advanced Example
SELECT *
FROM Employee
WHERE last_name LIKE '%\_%' ESCAPE '\';
Finds names containing underscore.
🔥 Performance Note (Teradata)
- LIKE 'ABC%' → Can use index
- LIKE '%ABC' → Full table scan
- LIKE '%ABC%' → Full table scan
Leading wildcard kills index usage.
==========================================================================
🔹 EXTRACT Function in Teradata
EXTRACT is used to return specific parts of a DATE or TIME value.
It returns an INTEGER.
It is partially ANSI compliant.
🔥 Syntax
EXTRACT(part FROM date_or_time_value)
Supported Parts
For DATE:
- YEAR
- MONTH
- DAY
For TIME:
- HOUR
- MINUTE
- SECOND
🎯 Example with DATE
SELECT
EXTRACT(YEAR FROM DATE '2022-12-20' + 30) AS Yr,
EXTRACT(MONTH FROM DATE '2022-12-20' - 30) AS Mth,
EXTRACT(DAY FROM DATE '2022-12-20' + 30) AS Dy;
Explanation
- DATE '2022-12-20' + 30 → adds 30 days → 2023-01-19
- DATE '2022-12-20' - 30 → subtracts 30 days → 2022-11-20
Result:
| Yr | Mth | Dy |
|---|---|---|
| 2023 | 11 | 19 |
🔥 Example with TIME
SELECT
EXTRACT(HOUR FROM TIME '10:20:30') AS Hr,
EXTRACT(MINUTE FROM TIME '10:20:30') AS Mn,
EXTRACT(SECOND FROM TIME '10:20:30') AS Scd;
Result:
| Hr | Mn | Scd |
|---|---|---|
| 10 | 20 | 30 |
🔎 Important Notes
1️⃣ Returns Integer
EXTRACT(YEAR FROM CURRENT_DATE)
Returns something like:
2026
(Not a date — just a number)
2️⃣ Cannot Extract Combined Parts
❌ Not allowed:
EXTRACT(YEAR-MONTH FROM date_col)
If you want year-month:
CAST(EXTRACT(YEAR FROM date_col) AS CHAR(4))
|| '-' ||
CAST(EXTRACT(MONTH FROM date_col) AS CHAR(2))
3️⃣ Works with TIMESTAMP Also
EXTRACT(HOUR FROM CURRENT_TIMESTAMP)
🔥 Very Common Real Use Cases
📊 Grouping by Year
SELECT
EXTRACT(YEAR FROM order_date) AS Order_Year,
COUNT(*)
FROM orders
GROUP BY 1;
📅 Monthly Reports
SELECT
EXTRACT(MONTH FROM order_date) AS Order_Month,
SUM(amount)
FROM orders
GROUP BY 1;
🧠 Filtering by Year
WHERE EXTRACT(YEAR FROM order_date) = 2025
⚠ Performance Warning (Important in Teradata)
This:
WHERE EXTRACT(YEAR FROM order_date) = 2025
May prevent index usage because function is applied on column.
Better version:
WHERE order_date
BETWEEN DATE '2025-01-01'
AND DATE '2025-12-31'
Much more efficient.
📌 Quick Summary
| Feature | Value |
|---|---|
| Works on | DATE, TIME, TIMESTAMP |
| Returns | INTEGER |
| ANSI compliant | Partially |
| Combined extraction | ❌ Not allowed |
| Performance impact | Yes (if used in WHERE) |
==========================================================================
🔹 ADD_MONTHS in Teradata
ADD_MONTHS is used to add or subtract months from a date.
It correctly follows the Gregorian calendar, handling:
- 28-, 29-, 30-, and 31-day months
- Leap years
- Month-end adjustments
🔥 Syntax
ADD_MONTHS(date_expression, integer_expression)
-
date_expression→ DATE value -
integer_expression→ Number of months to add-
Positive → Future
-
Negative → Past
-
🎯 Basic Examples
Current Date
SELECT CURRENT_DATE;
Example output:
2023-01-15
Add 2 Months
SELECT ADD_MONTHS(CURRENT_DATE, 2);
2023-03-15
Add 14 Years
SELECT ADD_MONTHS(CURRENT_DATE, 12*14);
2037-01-15
Subtract 11 Months
SELECT ADD_MONTHS(CURRENT_DATE, -11);
2022-02-15
🔥 Month-End Handling (Very Important)
This is where ADD_MONTHS becomes powerful.
Case 1: 31st July + 2 Months
SELECT ADD_MONTHS('2001-07-31', 2);
Result:
2001-09-30
Why?
- September has only 30 days.
- Teradata adjusts to the last valid day.
Case 2: 31st December + 2 Months
SELECT ADD_MONTHS('2003-12-31', 2);
Result:
2004-02-29
Why?
- 2004 is a leap year.
- February has 29 days.
- Adjusted automatically.
Case 3: 31st December + 14 Months
SELECT ADD_MONTHS('2003-12-31', 14);
Calculation:
- 12 months → 2004-12-31
- +2 months → 2005-02-28
Result:
2005-02-28
==========================================================================
When to Use JOIN or a CTE over a Subquery
Use JOIN when:
You're combining related rows from two or more tables
The logic is simple and fits naturally as a relation
You need to apply filters, aggregations, or transformations across tables
Use CTE (WITH ...) when:
Query is long, nested, or has repeated logic
You need to modularize parts of the logic (like temp views)
You need recursive queries (like hierarchies)
You want reusability and cleaner code
Use subqueries when:
You need a quick, simple filter or scalar value
You don't need to reuse the result
Nesting logic makes sense (e.g., top-N, filtering based on an aggregate)
⚠ A correlated subquery can execute once per outer row, which does not scale well on big data.
Staging tables often define every column as VARCHAR (or TEXT / STRING), regardless of the actual data types in the source.
✅ Why We Load Invalid Data into Staging Instead of Failing Immediately
1. 🔍 Audit and Data Traceability
Staging = raw data snapshot — it's meant to capture everything from the source exactly as it is.
If you fail the load, you lose visibility into what failed and why.
Keeping bad records helps analysts and developers trace and correct data issues.
🧠 Think of staging as a black box recorder—you don’t want to delete parts of the crash before the investigation.
2. 🛠 Error Handling During Transformation
Transformation logic (after staging) is where rules and validation are applied.
This is where you:
Filter out invalid rows
Log them separately
Fix them (if possible)
Notify business/data teams
This gives you control over bad data, rather than letting it crash your entire ETL job.
3. 🚫 Avoid Pipeline Failure
If you stop the load every time there’s a single bad value:
You risk missing the rest of the valid data
It leads to operational overhead (manual restarts, reprocessing)
It’s not scalable for large or dirty datasets
4. ✅ Selective Processing
You can mark bad rows with a flag like is_valid = false, or move them to an error table. The business can later fix the source or send corrected data.
============================================================
GROUP BY AND NULL
When using GROUP BY, NULL values in the group-by column create their own group. Aggregate functions like COUNT(column) will ignore NULL values in that column.
Columns not listed in GROUP BY must be wrapped in aggregate functions.
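The COUNT(*) vs COUNT(column) distinction above can be seen directly; a sketch assuming the employee table from earlier sections:

```sql
-- COUNT(*) counts every row in the group;
-- COUNT(department_number) skips rows where that column is NULL
SELECT department_number,
       COUNT(*)                 AS all_rows,
       COUNT(department_number) AS non_null_values
FROM employee
GROUP BY department_number;
-- In the NULL-department group: all_rows > 0 but non_null_values = 0
```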
The purpose of the MERGE statement in SQL is to combine multiple data manipulation operations—INSERT, UPDATE, and DELETE—into a single statement to synchronize a target table with a source table based on matching conditions. This allows you to efficiently modify data by:
Updating rows in the target table that match rows in the source table.
Inserting rows into the target table that exist in the source but not in the target.
Deleting rows from the target table that do not exist in the source.
This makes MERGE particularly useful for tasks like data synchronization, incremental updates, and managing Slowly Changing Dimensions (SCD) in data warehouses.
MERGE is used to synchronize two tables by performing:
- UPDATE → when a match is found
- INSERT → when no match is found
- (DELETE is also possible)
It combines INSERT + UPDATE logic in one statement, which is faster and easier to manage than separate queries.
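A minimal upsert sketch (table and column names are illustrative; exact MERGE clauses supported vary slightly by DBMS):

```sql
-- Synchronize a dimension table from a staging table
MERGE INTO dim_customer AS tgt
USING stg_customer AS src
  ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
  UPDATE SET customer_name = src.customer_name,
             updated_date  = CURRENT_DATE
WHEN NOT MATCHED THEN
  INSERT (customer_id, customer_name, updated_date)
  VALUES (src.customer_id, src.customer_name, CURRENT_DATE);
```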
View and usage?
A view in DBMS (Database Management System) is a virtual table created from one or more underlying tables based on a query. It does not store data physically but contains a definition or projection of data from the original tables. When a view is accessed, the DBMS executes the underlying query to present the data dynamically.
Why use views in DBMS?
Simplify complex queries: Views can encapsulate complex joins, filters, and aggregations, making it easier for users to query data without writing complicated SQL each time.
Enhance security: Views restrict user access by exposing only specific columns or rows of data, preventing users from seeing sensitive information in the base tables.
Data abstraction: Views hide the complexity of the database schema and present data in a user-friendly format, improving usability.
Reuse and consistency: Views save frequently used queries as named objects, promoting query reuse and consistent data presentation
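A small sketch of the idea, with illustrative names: the view hides a join and limits the exposed columns.

```sql
-- Virtual table: no data stored, the query runs on each access
CREATE VIEW v_employee_dept AS
SELECT e.employee_number,
       e.last_name,
       d.department_name      -- salary columns deliberately not exposed
FROM employee e
JOIN department d
  ON e.department_number = d.department_number;

-- Users query it like a table
SELECT * FROM v_employee_dept;
```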
recursive CTE
A recursive CTE (Common Table Expression) in SQL is a query construct that references itself repeatedly to process hierarchical or iterative data until a termination condition is met. It is defined using a WITH clause that includes:
An anchor member: the base query that provides the initial result set.
A recursive member: a query that references the CTE itself to build upon the previous result.
The recursion continues by repeatedly executing the recursive member using the prior iteration's results until no new rows are returned, at which point the recursion stops and all intermediate results are combined.
How recursive CTE works (simplified):
The anchor query runs first, producing the initial rows.
The recursive query runs next, using the results of the anchor query.
The recursive query references the CTE itself, using the previous iteration's output.
Steps 2 and 3 repeat, each time using the last result set, until the recursive query returns no rows.
The final result is the union of all these intermediate result sets.
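The steps above can be sketched with the classic employee-hierarchy example (manager number 801 is illustrative; Teradata requires the RECURSIVE keyword):

```sql
-- All employees reporting, directly or indirectly, to manager 801
WITH RECURSIVE emp_hierarchy (employee_number, manager_employee_number, lvl) AS (
  -- Anchor member: direct reports
  SELECT employee_number, manager_employee_number, 1
  FROM employee
  WHERE manager_employee_number = 801
  UNION ALL
  -- Recursive member: reports of the previous level
  SELECT e.employee_number, e.manager_employee_number, h.lvl + 1
  FROM employee e
  JOIN emp_hierarchy h
    ON e.manager_employee_number = h.employee_number
)
SELECT * FROM emp_hierarchy;
```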
ROLLBACK, COMMIT, SAVEPOINT
These are core concepts in transaction control in SQL. They are especially important for data integrity in OLTP systems and during complex ETL processes.
COMMIT
Purpose: Permanently saves all changes made during the current transaction to the database.
Once a transaction is committed, the changes become visible to other users and cannot be undone by rollback.
It marks the successful end of a transaction.
ROLLBACK
Purpose: Undoes all changes made in the current transaction, reverting the database to the state before the transaction began.
It is used to cancel a transaction when an error occurs or when the changes are not desired.
After rollback, none of the changes made in the transaction are saved.
SAVEPOINT
Purpose: Creates a named intermediate point within a transaction to which you can later roll back without undoing the entire transaction.
SAVEPOINT allows you to mark a point within a transaction, so you can roll back partially to that point instead of undoing the whole transaction. Useful for complex transactions where you might want to partially undo some operations but keep others.
Multiple savepoints can be defined in a single transaction.
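A generic sketch of the three statements together. SAVEPOINT syntax here follows the ANSI/Oracle style; Teradata itself manages transactions with BT/ET or ANSI COMMIT/ROLLBACK and does not support savepoints, so treat this as cross-DBMS illustration:

```sql
BEGIN TRANSACTION;

UPDATE account SET balance = balance - 100 WHERE account_id = 1;
SAVEPOINT after_debit;                 -- named intermediate point

UPDATE account SET balance = balance + 100 WHERE account_id = 2;
ROLLBACK TO SAVEPOINT after_debit;     -- undo only the credit

COMMIT;                                -- the debit is kept
```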
Partitioned Table?
A partitioned table in a database is a large table that is logically divided into smaller, more manageable pieces called partitions, each of which stores a subset of the data based on a partitioning key or criteria. Although it appears as a single table to users and applications, the data physically resides in multiple partitions, which can be managed and accessed independently.
A partitioned table is a large table that is divided into smaller, more manageable pieces called partitions based on the values of one or more columns.
Each partition behaves like a mini-table, but queries still treat the whole table as one.
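In Teradata this is done with a partitioned primary index (PPI); a sketch with illustrative names:

```sql
-- Rows are placed in monthly partitions by sale_date;
-- queries filtering on sale_date can skip whole partitions
CREATE TABLE sales_part (
  sale_id   INTEGER,
  sale_date DATE,
  amount    DECIMAL(10,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
  sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
            EACH INTERVAL '1' MONTH
);
```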
What is Database Sharding?
Database sharding is a technique of horizontally partitioning a large database into smaller, more manageable pieces called shards, where each shard holds a subset of the data and is stored on a separate server or node. This allows the database workload to be distributed across multiple machines, enabling better scalability and performance.
Each shard is a separate database instance that contains a subset of the data — usually split by a key like customer_id, region, or tenant_id.
💡 Think of it like splitting a large book into chapters and storing them across multiple shelves (servers).
UPDATE syntax reminder:
UPDATE table_name
SET column_name = value
WHERE condition;
Window function
A Window Function performs a calculation across a set of rows (the “window”) that are related to the current row — without aggregating the result into one row.
📌 Unlike
GROUP BY, window functions preserve each row in the result.
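A running total illustrates this row-preserving behavior; a sketch assuming the employee table used throughout:

```sql
-- Every detail row survives, with a cumulative sum per department
SELECT employee_number,
       department_number,
       salary_amount,
       SUM(salary_amount) OVER (
         PARTITION BY department_number
         ORDER BY employee_number
         ROWS UNBOUNDED PRECEDING
       ) AS running_total
FROM employee;
```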
Common Table Expression (CTE)
A CTE is a named temporary result set defined with WITH, which you can reference like a table in your main query.
Indexing
Indexing in SQL is a technique used to speed up the retrieval of rows from a database table by creating a data structure that allows the database engine to find data quickly without scanning the entire table. It works like an index in a book, pointing directly to the location of the data.
How indexing works:
An index is created on one or more columns of a table.
The database builds a separate data structure (often a B-tree or similar) that stores the indexed column values and pointers to the corresponding rows.
When a query searches for values in the indexed column(s), the database uses the index to quickly locate the rows instead of scanning the whole table.
Types of indexes:
Clustered index: Physically sorts and stores data rows in order based on the indexed column. Only one per table.
Non-clustered index: Creates a separate structure to point to data rows without altering physical order. Multiple allowed.
Unique index: Ensures indexed columns have unique values.
Composite (multi-column) index: Indexes multiple columns together to optimize queries filtering on those columns.
An index is a data structure (like a sorted list or tree) that allows the database to quickly locate and retrieve rows from a table without scanning the entire table.
🔍 Think of an index like the index of a book — it helps you find topics quickly instead of reading every page.
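Index DDL differs by DBMS; a sketch (index and column names illustrative):

```sql
-- Teradata secondary index: column list comes before ON
CREATE INDEX idx_emp_lastname (last_name) ON employee;

-- ANSI / most other DBMSs put the column list after the table:
-- CREATE INDEX idx_emp_lastname ON employee (last_name);
```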
Join row counts with duplicate keys
If one table has the values 1,1,1,1 in the join column and the other has 1,1,1, then every join type (INNER, LEFT, RIGHT, FULL) returns 12 rows, because each of the 4 rows matches each of the 3 rows (4 × 3 = 12).
--> DENSE_RANK, RANK, and ROW_NUMBER take no argument
-->PIVOT AS CASE
PIVOT and CASE WHEN + SUM() are used to transform rows into columns or summarize data by category, but they differ in syntax, flexibility, and use cases.
SELECT
Product,
SUM(CASE WHEN Region = 'East' THEN Sales ELSE 0 END) AS East_Sales,
SUM(CASE WHEN Region = 'West' THEN Sales ELSE 0 END) AS West_Sales
FROM SalesTable
GROUP BY Product;
What you're trying to write (sum two pivot columns [A] + [B] into a new column) is not part of the PIVOT syntax itself.
You need to pivot first, and then sum the resulting columns in a SELECT statement outside the pivot.
Common interview problems:
- Employee visiting office floor
- "New vs. repeat customer" problem
- Teradata-specific optimization (e.g., primary index, COLLECT STATISTICS)
- Recursive CTE: e.g., find all employees reporting (directly or indirectly) to a specific manager
Window
The RANK function is also a sub-part of window functions. The following points should be remembered while using this function: It always works with the OVER() clause. It assigns a rank to each row based on the ORDER BY clause.
In the OVER() clause, you specify the PARTITION BY and ORDER BY clauses. For ranking functions, the ORDER BY clause, including the name(s) of the column(s) or an expression, is mandatory. Before the ORDER BY clause is the optional PARTITION BY clause, which includes the name(s) of the column(s) or an expression.
All window functions require an OVER() clause.
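The three ranking functions side by side, sketched on the employee table; the difference only shows up when salaries tie:

```sql
-- RANK leaves gaps after ties, DENSE_RANK does not,
-- ROW_NUMBER is always unique within the partition
SELECT employee_number,
       salary_amount,
       RANK()       OVER (ORDER BY salary_amount DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary_amount DESC) AS drnk,
       ROW_NUMBER() OVER (ORDER BY salary_amount DESC) AS rn
FROM employee;
```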
NTILE() function in SQL Server is a window function that distributes rows of an ordered partition into a pre-defined number of roughly equal groups.
What NTILE() Does:
Splits ordered rows into N approximately equal-sized buckets.
Rows are assigned bucket numbers starting from 1 to N.
Useful for creating quartiles, deciles, percentiles, or balanced groups.
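A quartile sketch in SQL Server/ANSI style (Teradata offers similar bucketing via its QUANTILE function):

```sql
-- NTILE(4) assigns bucket numbers 1..4, top salaries in bucket 1
SELECT employee_number,
       salary_amount,
       NTILE(4) OVER (ORDER BY salary_amount DESC) AS salary_quartile
FROM employee;
```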
LAG , LEAD
| Parameter | Descriptions |
|---|---|
| expression | A column name or expression whose value the function returns. |
| offset | The number of rows ahead (LEAD) or behind (LAG) of the current row. Must be a non-negative integer; 0 evaluates the current row. Defaults to 1 (the adjacent row). |
| default_value | The value returned when no row exists at the given offset. If omitted, the function returns NULL. |
| OVER | Defines the window of rows the function operates on. If empty, the function uses all rows. |
| PARTITION BY | Splits the rows in the result set into partitions to which the function is applied. If not specified, all rows are treated as one partition. |
| ORDER BY | Determines the sequence of rows within each partition before the function is applied. |
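Both functions together, sketched on the employee table with an explicit default of 0 at the edges of the window:

```sql
-- Compare each salary with its neighbors in salary order
SELECT employee_number,
       salary_amount,
       LAG(salary_amount, 1, 0)  OVER (ORDER BY salary_amount) AS prev_salary,
       LEAD(salary_amount, 1, 0) OVER (ORDER BY salary_amount) AS next_salary
FROM employee;
```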
Q1: Number of rows in a join
--> NULL does not join with NULL in another table (NULL = NULL is UNKNOWN), so rows with NULL join keys are dropped by an INNER JOIN.
Gender-swap UPDATE trick: if you run two separate UPDATEs (M→F, then F→M), the first one makes every row the same gender, and the second can no longer tell them apart. Use a single UPDATE with CASE.
Other classic practice problems: new vs. repeat customers; total marks of a friend (a self-join problem).
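The single-statement fix for the gender-swap puzzle mentioned above, sketched with illustrative table and column names:

```sql
-- Both directions swap in one pass; two sequential UPDATEs
-- would first collapse everyone into one gender
UPDATE employee
SET gender = CASE WHEN gender = 'M' THEN 'F' ELSE 'M' END;
```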