work
dbt uses ur keys from delta n definition n then identify what chged and adds processing col
23/24-scd2
config(
materialized='delta', --table loading mechanism
unique_key='id', --PK of table
stratergy='check', --mechanism how to handle new incoming data,always check .data need to be checked in existing table with new incoming data and compared.
properties={"partitioned_by":"ARRAY['dbt_valid_from']"}, --any specific properties that the table shd have
target_schema='pcc', --schema where table will deploy
check_cols="all", --which col need to tested against for scd2 compare
invalid_hard_deletes=True, --mechanism to tell type 23 ,24, by default value is false and will type 24. if made true it is 23.
updated_at=var('BUSINESS_DATE') --this config set 'dbt_valid_from' date (=start date) when no value is set default to 'CURRENT_DATE' this can be passed via dt config or while falling dt run or via airflow.
)
25 - transactional
23/24-scd2
config(
materialized='incremental', --table loading mechanism
incremental_stratergy='append', --mechanism how to handle new incoming data,append ; append new data with existing data.
'Truncate+Insert' ; delete all n put
properties={"partitioned_by":"ARRAY['dbt_valid_from']"}, --any specific properties that the table shd have
target='pcc', --logical tag to grp obj together
updated_at=var('BUSINESS_DATE') --this config set 'dbt_valid_from' date (=start date) when no value is set default to 'CURRENT_DATE' this can be passed via dt config or while falling dt run or via airflow.
)
================================================
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS UNBOUNDED PRECEDING ) BRKR_ID
select
max_id.MAX_BRKR_ID + SUM (CAST (1 AS INTEGER)) OVER (ROWS UNBOUNDED PRECEDING ) BRKR_ID,
--ASSUMPTION IS HUB HAS RECORDS, WHENMAX_ID.MAX_BRKR_ID is null or negative , id need to be default to 0
complilment_key,
tpmi_sao_key
from
(
select distinct a.compliment_key , a.tpmi_sao_key
from acq va
left join Vr_h_brkr
on A.compliment_key=B.compliment_key
AND A.tpmi_sao_key=B.tpmi_sao_key
WHERE A.DUP_SQN=1
AND A.SRC_DLET_FLAG=0
AND B.BRKR IS NULL
==this is to identify brkr that are present in src but not in hub , so hub would do insert for these recors.
)
and
(
select bussiness_date
from gcfr process where prc name=tx h brkr
) between a.src-strt ts and a.src end ts
)src
cross join
(
select coalesce(max(brkr_id) , 0 ) as max_brkr_id from vr_h_brkr
) max_id;
HUB-transactional data-25. Satellite-scd2 -23-24 (based on data in src not dates)
23--> full
chg prev if exists and date to 9999 else nothing
full snapshot is the source table data- means everyday snap is coming in , and if some data is missing for the day it means that the data is deleted in src n marked with flag del =1 in target
we have like act no,name,address , today one row came with some values for these attribute, tomorrow address chg , then later some thg else chg . when new record comes the existing ki end date shd chg , new record shd be inserted.
=>input - src data for gcfr full transform process is full snapshot of current state from source
=>output - target table will have data stored with start date n end date that represent valid period of data.
=>process - this input data will be compared with open record (end date = high date ) in the target and changes will be captured and update into target table.
24-->delta
means src has only delta data- it has only new and changed data, --same as full but for missing data in src it wont mark del flag as 1 I target
=>input - src data for gcfr full transform process is changed data from src with chg flag (gcfr action flag). change flag are
0-new record
1-update
2-delete
=>output - same as full, target will have data stored with start date n end date that represent valid period of data
=>process - gcfr prc will apply chg following chg flag in target table
25-->transactional
prev never chg and end date is always 9999
once data flows in , it cant change .example like if v do payment to vendor now transaction time amount and details cant change ever.all these details can never chg . tomorrow record cant chg telling somethg got updated in prev record beoz that's not possible..
so any real life event , transaction , activity happen on acct r all 'transaction data'.
=>input - src data for gcfr full transform process is transaction or event from src
=>output - target table is transaction type table where start date will be set with business date and end date with high_date
=>process - gcfr prc will insert only new transaction into target table

===============================================
================================================
SNAPSHOT TO SCD2
we had full snapshot data loaded from src to target , everyday full data loaded. so storage is up.
we will convert it to scd2.
gcfr-tool does the conversion to scd2 auto , we need to mention tx ( which tell how to select the data from src to loaded in target) , what kinda load 23/24 and 25(full/delta , transaction)
we created new table with opt and 2 tx.
1 taking data from _d for daily load and 2 to load history data we used fulsnap table in tx =fl dt which is snapshot date from src.
CONTROLm - we created only 1 stream - using 1 stream v use hist tx and _d tx to load in same table because all details r same , also v shd have same prcnm in table by gcfr.
cross joi with cal to get prev data becoz down v r using fldt , n our purpose is to remove fldt because table shd be scd2 but fldt chges everyday so table is considering it as transaction n inserting all data everyday
CONTINUOS IMPROVEMENT
- reduce join as much as possible with cte n seperate sel stmt
- add filter on rec del n end date for active , before joins on base tabel
- collect stat on join n where n pi col , partitioned by , compress , pi
Comments
Post a Comment