GCFR - my KT notes
1. WHAT IS GCFR
> The GCFR (Global Control Framework) is an ETL framework developed by Teradata.
> The GCFR is a set of architectural standards, development methods, processing patterns and a code library that provides the basis for implementing a true time variant (history) data warehouse.
> GCFR defines how the data is loaded (the TX type) and what data is loaded (the TX) into the target table; it acts as a filter between source and target.
Purpose
The Control Framework enforces an audit trail and reduces programmer effort. It provides:
- Full data and process lineage
- Ability to prove data from source to end use
- Automated audit functions
1.1 GCFR Processing Patterns
IPO (input-process-output) model:
> Each processing pattern in the data warehouse adheres to the IPO model.
> Data flow is from Input to Process to Output.
> This simple model applies to all data flows within the GCFR.
INPUT - A single input to the process: the view that presents the input data set to the process, or the source table.
PROCESS - The function or program that operates against the input data set to produce a result.
OUTPUT - The result of the Process operating on the input data set.
Figure 1 - IPO Model
The processing patterns interact in a set sequence. It is possible to develop each group in isolation due to the abstraction achieved through the use of views over tables. We only need to create the GCFR TX (transform) for the table and its view.
1.2 The Global Control Framework consists of the following components:
1. System:
→ All inbound and outbound systems/platforms of the data warehouse must be identified and allocated a unique identifier (Ctl_Id) and name (Sys_Name) according to the agreed naming standard.
→ System is not a server, but a logical software application, like a Customer System or Accounts System. (CAP, MAUDE, GCU, File)
--> Register the data source system with GCFR so that both we and GCFR are aware of it.
--> All system information is stored in the GCFR view GCFR_System (see the example queries after this component list).
2. Process/transformation:
> The job that transforms data and merges data from the source table into the target table.
> Tells us what data we have to load into the target table (all logic and joins go here).
> Defines the transformation on source data from various sources as per the requirement.
> All process information is stored in the GCFR view GCFR_Process.
3. Stream:
> A Stream is a collection of processes that must be completed as a whole to complete a single data pipeline into the data warehouse.
> A Stream is a parent box holding many processes: a collection of dependent processes that have to be run together.
> Each stream must be identified and allocated a unique identifier (Stream_Key) and unique name (Stream_Name), and each collection of dependent processes in the stream is identified with the same Stream_Key.
> The processes within one stream are executed in sequence; however, processes in different streams can run in parallel (note that we can also set dependencies between streams with Control-M).
> All stream information is stored in the GCFR view GCFR_Stream.
4. Key column:
> The target table key column, used while merging data to identify whether the record is already present in the target table.
> GCFR uses this column to determine the existence of a source row in the target and perform the appropriate operation on it.
> All key information is stored in the GCFR view GCFR_Transform_KeyCol.
5. Business Date Definition
> The Business Date represents the date for which the stream run is happening.
> All business date information is stored in the GCFR view GCFR_Stream_BusDate (see the example queries after this component list).
> Full Impact analysis is very easy due to the meta data captured by the control framework.
> Each domain has only one BKEY table, but many BKEY processes load into it.
>A surrogate key is any unique identifier required to be generated for the sake of representing some base object (e.g. a Customer, an Event etc).
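The component registrations above can be inspected directly from the GCFR views. A minimal sketch, assuming the views live in a PRD_GCFR_V database (that database name and the Stream_Key value 5 simply follow the conventions used later in these notes; adjust to your environment):
select * from PRD_GCFR_V.GCFR_System order by 1;                    -- registered source/target systems (Ctl_Id, Sys_Name, ...)
select * from PRD_GCFR_V.GCFR_Stream where Stream_Key = 5;          -- stream registration
select * from PRD_GCFR_V.GCFR_Stream_BusDate where Stream_Key = 5;  -- previous/current/next business date for the stream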
0.0 TERMS
Corporate Entity - More than one logical, corporate or legal entity having access to and some kind of proprietary ownership of data stored in the data warehouse
System - Logical systems that send data to or receive data from the data warehouse
File - 'Files' includes files, tables or any other data format or transmission method
Process - A function that performs work on one or more input data sets to produce an output data set
Stream - A dependent set of processes
1.1 WHAT IS CF?
The Control Framework is a set of architectural standards, development methods, processing patterns and a code library that provides the basis for implementing a true time variant data warehouse.
1.1.1 WHAT IS GCFR?
GCFR (Global Control Framework) is a set of design patterns/processes and a metadata repository for managing ETL.
The GCFR (Global Control Framework) is an ETL framework developed by Teradata.
It includes a set of architectural standards, development methods, processing patterns and a code library that provides the basis for implementing a true time variant data warehouse.
It also provides process control, logging and auditing capability. The framework resides mostly in the database, allowing in-database ETL logic to run and thus fully utilising the parallel processing power of Teradata. GCFR also has a data ingestion pattern using TPT; however, this task is normally performed by a dedicated ETL tool. In the CAA environment the data is ingested into the Teradata ecosystem through the RSDS team (BIH) and into URSDS using TTF (and KNIME in the future).
It is an automated framework to manage the ELT processes of the Teradata Global Reference Information Architecture to ensure accuracy and efficiency. The GCFR is used in conjunction with an ELT tool - in itself it is not a tool or a replacement for a tool. The GCFR enforces audit trail and reduces programmer effort.
a . Data Auditing Features
The GCFR processes and columns provide full end-to-end lineage and audit of the data:
* The date and time of every process run is recorded for every table in the data warehouse (we can know when the last process was run)
* The process which created or updated a row is recorded for every row in every table (we can know which process created every row and when)
* Every row received from any of the source systems (including any rejected ones) can be accounted for
* The process always checks that the file being loaded is the correct one in the sequence and has not been loaded previously (all data landed on the data warehouse is accounted for)
* The source and the target of the data are recorded for every process (we have a complete audit trail no matter how many times the data is landed on the data warehouse)
b. Overview of the GCFR
* The Teradata Global Control Framework (GCFR) consists of:-
> A base data model that allows registration and control of all data flow processes, input and output, to and from the EDW
> A set of standard database structures to facilitate processing, security and abstraction concepts
> Processing Pattern code modules that:
- Maintain the base framework control
-Automate meta data capture from all processes
- Maintain row level time variance in the EDW
- Maintain row level process metadata in the EDW
>This provides Full Data & Process Lineage from source to sink
-At the ROW level
>The GCFR is a key support component of a true Time Variant EDW
-Maintaining row and record lifecycle
-Guaranteeing true repeatability of reporting (as is, as was)
c. Overview of the GCFR
The Control Framework has a set of standardized Processing Patterns that perform the data movement actions of the ELT processes
>Implemented as Stored Procedures
> Facilitation of source data through a standard EDW life cycle: extract, load, consume, archive
The Control Framework is independent of DI tools
> It is available for use natively in conjunction with Windows/Linux/Unix operating systems
> It can be integrated with any DI Tool that is certified to work with Teradata
The Control Framework is a significant kick start to any IDW in terms of:
>Definition and control structures
> Enforcing standards
> Implementing abstraction concepts
> Reducing programmer error dramatically
d. 'Processing Pattern' Concept
* What is a Processing Pattern?
> A Processing Pattern is an architectural model to simplify and control the ELT processes that govern the data lifecycle from source system to data warehouse and on to end users
> The Control Framework contains implemented processing patterns
> The Control Framework processing patterns update the operational meta data contained within the data model
> The Control Framework processing patterns perform the mechanics of applying the input data set to the target table whilst maintaining data lineage
* Why are Processing Patterns required?
> To reduce programmer error
- Complex code is written once, tested once and then re-used
> To break the work effort into smaller, simpler pieces
- Easier to develop
- Easier to generate and test
- Many aspects of development can be automated
- Many aspects of development can be isolated and shipped to alternative development locations
- Facilitates configuration management and change control on complex projects
- Supports both agile and waterfall methodologies
> Operational support becomes much simpler and therefore more cost effective
- Small number of ELT patterns need to be supported rather than a vast number of individual ELT components
e. The 'IPO' Model
The IPO model has three base components
INPUT
> A single input to the process
> The source data set for the process
PROCESS
> The function or program that operates against the input to produce a result
> The 'Processing Pattern' executable
OUTPUT
> The result of the process operating on the input
> The target data set result for the process
Flow is from Input to Process to Output
This model is used in the standard processing patterns
All processes in the Control Framework comply with this simple definition
f. GCFR Implementation Alternatives
* Data integration processes can be implemented in a variety of ways
> Toolset and technology can impact choice of implementation approach
* Data integration approaches commonly associated with data warehousing
> ELT - extract, load the data to the target platform then transform the data according to the business rules.
> ETL - extract, transform the data according to the business rules then load the data to the target platform
> Hybrid - business requirement driven choice to make best use of the toolset and technology
– ETL and ELT and ETLT and ...........
* The GCFR provides a flexible framework to allow your choice of data integration approach
g. Corporate Entity
More than one logical, corporate or legal entity having access to and some kind of proprietary ownership of data stored in the data warehouse
* Holds a record defining each corporate entity
> Surrogate key CE_ID assigned by the Data Warehouse Data Governance Steward during development
> Examples: -Holding Company -Country A -Cards Group
CE_System Table
> Relationship that defines what systems are owned or accessible to the corporate entity
> Each corporate entity can own or access many logical source data systems
> Primary key is CE_Id and CTL_Id
* These tables support an ability to recognize and isolate multiple corporate entities in a single database schema
>They allow definition to be logical
> They eliminate the need for multiple physical databases with duplicated data
> They allow data access to be restricted based on CE_Id and CTL_Id
h. Parameter
1.1.2 The benefits of using an ETL framework such as GCFR include:
- Reduced development effort by automating common processing patterns. This also ensures consistency and accuracy for these standard ETL tasks and allows developers to focus on business specific logic.
- A common interface to control and monitor processes, with full auditability of process execution; this information is also embedded into row level data.
- A standardised loading and transform model, helping to organise and centralise ETL metadata and transformation logic to improve support and code management.
- The framework fully utilises Teradata's power by doing in-database ETL logic.
- The framework is customisable and extendable.
Terms:
Business Date definition: the date of the source data within a given stream. The source system is expected to provide the business date as per the GCFR source extract file guide.
1.1.3 GCFR view and use:
1.1.4 The Global Control Framework consists of the following components:
* Process Control Data Model and standard processing columns for all cross functional model tables
* Standard code modules for ELT processes, surrogate keys and reference code data management
* Housekeeping utilities for the daily/intraday control of the ELT processes
Control Framework Components:
Data Model that holds processing metadata
>When, what program, what source, what target, update metrics
>Static metadata, eg System, Files, Paths
>Operational metadata collected by the Standard CTLFW code at run time, e.g. metrics…
1.1.5 What happens when we run a single business-date run of a stream
Business Date
In GCFR, each stream has a Business Date. This Business Date has no dependency on the 'real' date: it is an internal GCFR value. All the processes within the stream run for that Business Date and are expected to complete successfully before the Business Date can be rolled forward.
This allows GCFR to load historic data, load multiple dates' data on one day, continue running over midnight, etc.
Control Framework Control Patterns:
•Start of Day >Opens the business date for a given stream of ELT
•Start of Stream >Opens the stream instance within a business date for a specific ELT batch
•End of Stream >Completes the stream instance
•End of Day >Completes the business date for a given stream
Business Date Start
>This pattern is used to set the current, previous and next business dates for the relevant stream.
> Processing for the previous business date must be completed, otherwise this process will fail; this ensures that we do not load wrong data.
> All streams must be started via the Start of Day process.
> ELT processes cannot proceed until the stream has been started.
> If the process fails, the stored procedure will send a Return Code of 1 indicating the process has failed.
Some common errors are as following:
--Prev stream not completed
--Stream does not exist
--Business Date is already started
show procedure PRD_GCFR_P_API_CP.GCFR_CP_StreamBusDate_Start
1. Validate Stream Registration
2. Validate Stream execution, i.e. check if this stream business date is already active
3. Update the GCFR_Stream_BusDate table, increment the date and set Processing_Flag to indicate processing is ready for the specified Stream_Key
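A minimal sketch of invoking this control pattern from SQL. The parameter list is an assumption (the notes only guarantee a Stream_Key input and a return code output); use the show procedure statement above to confirm the real signature in your environment:
-- assumed two-parameter signature (in Stream_Key, out Return_Code); parameter names/count are hypothetical
CALL PRD_GCFR_P_API_CP.GCFR_CP_StreamBusDate_Start(5, oReturn_Code);
-- a returned code of 1 indicates failure (e.g. the previous business date was not completed)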
Stream Start
>This pattern is used to instantiate a Stream within the current business date.
>Only one instance of any given Stream_Key can be active at any one time.
> If the process fails, the stored procedure will send a Return Code of 1 indicating the process has failed.
Some common errors are as following:
-- A previous instance of this Stream is in an incomplete state because the End of Stream process has not been run. Check the Processing_Flag for all instances of this Stream_Key in the GCFR_Stream_Id table
--Stream does not exist
--Business Date for this Stream has not been started
--Cycle Start Time Stamp is not later than that for the previous Stream.(Intraday only)
--Stream has already been started
show procedure PRD_GCFR_P_API_CP.GCFR_CP_Stream_Start
1. Check for Stream_Key in GCFR_Stream.
2. Check whether the Stream Business Date has been started, i.e. GCFR_Stream_BusDate.Processing_Flag = 1 for the given Stream_Key
3. Check whether a Stream instance is already active, i.e. GCFR_Stream_Id.Processing_Flag = 0 for the given Stream_Key
4. Check if GCFR_Stream_Id.Business_Date <> CAST(iBusiness_Date_Cycle_Start_Ts AS FORMAT 'YYYY-MM-DD') for the given Stream_Key
5. Check if GCFR_Stream_Id.Processing_Flag = 1 AND GCFR_Stream_Id.Business_Date_Cycle_Start_Ts >= iBusiness_Date_Cycle_Start_Ts for the given Stream_Key
6. Update the GCFR_Stream_Id table and set Processing_Flag to indicate processing is ready for the specified Stream_Key
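To investigate the "previous instance of this Stream is in an incomplete state" error, the stream instance records can be inspected directly. A sketch, assuming the PRD_GCFR_V view database and the Stream_Key 5 used elsewhere in these notes:
select * from PRD_GCFR_V.GCFR_Stream_Id
where Stream_Key = 5
order by Business_Date_Cycle_Start_Ts desc;   -- Processing_Flag = 0 here indicates an instance still open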
GCFR Process runs
> Based on your TX definition, GCFR loads the target table.
> Run BKEY, BMAP, TX.
> The GCFR transform process state will be set to 99 if it completed successfully.
Stream End
> This pattern is used to close an instantiated/open Stream within the current business date.
> Only one instance of any given Stream_Key can be active at any one time.
> If the process fails, the stored procedure will send a Return Code of 1 indicating the process has failed.
Check the GCFR Execution Log and GCFR Error Log tables for details relating to the process and instance causing the error.
Some common errors are as follows:
- A Process instance of this Stream is in an incomplete state
Check for Process Instances with an active Process State (i.e. not 99) for all instances of this Stream_Key
-Stream does not exist
-Stream has already ended
show procedure PRD_GCFR_P_API_CP.GCFR_CP_Stream_End
1. Check for Stream_Key in GCFR_Stream.
2. Get the count of active processes associated with the current stream from GCFR_Process_Id.
3. Update the GCFR_Stream_Id table and set Processing_Flag to indicate the stream instance is complete for the specified Stream_Key.
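If the Stream End pattern reports an active process instance, a sketch of how to look for it, assuming the Process_State column referenced in the error note above and the PRD_GCFR_V view database (how GCFR_Process_Id links back to a particular stream instance is not stated here, so no stream filter is shown):
select * from PRD_GCFR_V.GCFR_Process_Id
where Process_State <> 99;   -- 99 = completed successfully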
Business Date End
>This pattern is used to close the current Business Date for the given Stream.
>This pattern does NOT check for active stream instances as the batch schedule should ensure all processing is complete prior to closing the current business date.
>This pattern additionally sets the status of all source system extracts for the given Stream and Business Date to consumed.
>This allows automated management of consumed source system extracts (e.g. such as moving to an archive directory for archiving).
> If the process fails, the stored procedure will send a Return Code of 1 indicating the process has failed.
Some common errors are as follows:
-Stream does not exist
-Business Date for this Stream is already closed
show procedure PRD_GCFR_P_API_CP.GCFR_CP_StreamBusDate_End
1. Stream Validation
2. Validate active stream instance for the given stream_key in GCFR_Stream_Id
3. Update GCFR_Stream_BusDate to set the stream business date end
4. End stream business date for the given stream_key
5. Set all extracts for this Stream Instance to consumed
1.2 Detail concept
1.2.1 STREAM Concept
•A Stream is a collection of processes that must be completed as a whole to meaningfully transform a set of input data (files or tables) into a coherent output set of tables within the data warehouse. Each collection of dependent processes is identified by a record in this table.
•A Stream is a unit of schedulable work. Once completed a Stream may not be re-run without intervention to revert output data and CTLFW metadata to the pre-run state.
•There is no limit to the number of times a Stream may be run on any given date. Only one instance of any Stream is allowed to be run at a time.
A Stream is a collection of processes that must be completed as a whole to meaningfully transform a set of input data (files or tables) into a coherent output set of tables within the data warehouse. Each collection of dependent processes is identified with the same Stream_Key.
SNBC Implementation
* Stream_Key is a running number starting from 1
* For data loading processes into the Intermediate Layer
* Streams are grouped by source system and by cycle frequency, e.g. Stream 1 is the daily process to load data into the Intermediate Layer from every sub-system of CAP
Dependent process in a stream
1.2.1 GCFR System
All inbound and outbound systems of the data warehouse must be identified and allocated a unique identifier (Ctl_Id) and name (System_Name) according to the agreed naming standard.
All source and target systems are registered in GCFR_System. This is a static reference table. A logical system is any source or destination of data in the organization. The data warehouse itself is a system and may have many logical systems contained within the physical data warehouse environment.
Ctl_Id - system id of the source/target system as per the standard; unique in this domain and the PK of this table
SNBC Implementation
-- Ctl_Id is a running number starting from 1
-- System_Name is a short abbreviation of the inbound/outbound system name
-- Path_Name will be used only for load and export processes. SNBC reads data from RSDS so the GCFR load process is not implemented.
update_ts - timestamp when the record was created/updated
GCFR_System_File_Log - history of updates to this table is captured here. The log table has the same columns as the master table, but its PK also includes Update_Ts.
1.2.2 Cycle Frequency is Daily
This table defines each processing stream within the data warehouse. A Stream is a collection of processes that must be completed as a whole to meaningfully transform a set of input data (files or tables) into a coherent output set of tables within the data warehouse. Each collection of dependent processes is identified by a record in this table.
A Stream is a unit of schedulable work. Once completed a Stream may not be re-run without intervention to revert output data and CTLFW metadata to the pre-run state. There is no limit to the number of times a Stream may be run on any given date. Only one instance of any Stream is allowed to be run at a time.
GCFR_STREAM_BUSDATE_LOG
1.3 BKEY Process
The BKEY process is the GCFR standard process to generate a surrogate key from a source natural key value.
SNBC Implementation
* Every surrogate key in ADS Intermediate Layer model will be generated by BKEY process e.g. CUST_ID, ACCT_ID etc.
* Technical Implementation:
* Each key will have dedicated BKEY table.
* Creation of BKEY table will be handled as a part of the BKEY registration process using standard GCFR registration Macro.
BKEY Naming Standard and Example
GCFR provides a method for generating surrogate keys, called BKEY. It provides a robust, efficient and standard way to create a surrogate key for any possible combination of natural keys. It ensures that the same natural key is always associated with the same surrogate keys. GCFR keeps that natural-to-surrogate key relationships in BKEY tables, which allow us to generate surrogate keys and, later, join to the BKEY tables to obtain the natural keys.
BKEY generates a single INTEGER surrogate from a single value. So when the natural key is composed of several columns, we concatenate these columns into a single string value using pipe delimiters.
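For illustration, a sketch of how a multi-column natural key might be prepared as a single BKEY source key in an input view; the staging table and column names are hypothetical (assumed character columns), and only the pipe-delimiter convention comes from the note above:
-- hypothetical source columns Acct_Branch_Cd and Acct_Num concatenated into one BKEY source key value
select
    trim(coalesce(Acct_Branch_Cd, '')) || '|' || trim(coalesce(Acct_Num, '')) as Source_Key
from StagingDB.Stg_Account;   -- hypothetical staging table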
1.4 GCFR BMAP Process
BMAP is the GCFR standard process (and data model) for managing reference data. The BMAP process will generate EDW reference code for undefined reference code from source and store it in the reference-data tables.
SNBC Implementation
* GCFR BMAP is used to manage reference code and description in SNBC and mapping from source system's code to SNBC standard code.
* GCFR BMAP process is used to generate unknown reference codes for new reference data in SNBC.
Technical Implementation:
- Standard GCFR BMAP data model will be used as-is.
- All reference code sets will be stored in the same BMAP Standard Map table.
- Creation of BMAP tables is a part of the GCFR standard installation.
GCFR supports maintenance of reference data. GCFR expects to be provided with code lists (e.g. a list of countries) and their meanings. GCFR BMAP can maintain multiple lists for the same domain (this is usually used to allow two systems to provide data using their own sets of codes) and can provide output in different formats, structures or languages. In this sense, BMAP provides a many-to-many reference data service.
1.4 GCFR Transform Process
All ELT processes that are controlled by the job schedule for the data warehouse are registered in GCFR_Process. This includes any extracts from source but not those extracts that are managed under a source system job schedule. Also included are loads to the operational image, transformations to the core PDM, further transformations to the semantic layer and any extracts from the data warehouse regardless of the source tier of the data.
A process is a single code module; it performs a single specific piece of work and processes to a single data warehouse table or extract file. This is a static reference table.
The GCFR Transform Process is used to transform data within data warehouse. GCFR supports three main transform patterns:
1. Full Transform
2. Delta Transform
3. Transaction Transform
SNBC Implementation
* SNBC uses the GCFR full transform processing pattern to load data from RSDS to the Intermediate Layer (refer to Selection of GCFR Transform Pattern for IL Data Loading Processes).
* Technical Implementation:
- Transform input view: Developed manually from mapping specification.
- Transform Process: Standard GCFR process; two registrations are required for each process: registration of the process information and of the transform key columns for the process.
- Transform output view: a one-to-one view on the target table.
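A minimal sketch of such a one-to-one transform output view, reusing the hypothetical TargetDB/TargetTable placeholder names from section 1.7:
-- one-to-one output view over the target table (the view-over-table abstraction); all names are placeholders
replace view TargetDB_V.TargetTable as
select * from TargetDB.TargetTable;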
GCFR_CP_StreamBusDate_Start
Purpose:
Rolls over the previous, current and next business date for the Stream according to the Stream's cycle frequency. Sets the status of the Stream to 'ready for current date processing to commence'. This API first validates the input Stream_Key and checks that the Stream is NOT already started.
GCFR_CP_Stream_SpecBD_Start
Purpose:
This CP is used to start a special business date, i.e., any business date which lies in the past or future from the current date, in order to allow data processing for any custom business date. This API first validates the input Stream_Key and checks that the Stream is NOT already started.
GCFR_CP_Stream_Start
Purpose: Instantiate a Stream Instance for the input Stream_Key.
This API first validates the input Stream_Key, checks that the Stream is started and that there are no active Stream Instances. If the starting timestamp is provided as input, it checks that the timestamp is later than the timestamp of the most recent Stream Instance for the Stream_Key. Note: if the starting timestamp is not provided to the Start of Stream control pattern, then the first Process to execute within the Stream will provide the starting timestamp for the Stream.
GCFR_CP_Stream_End
Purpose: Completes the current runtime Stream Instance for the input Stream_Key. This API first validates the input Stream_Key, checks that the Stream is started and that there are no active Process Instances still running for the Stream Instance. It then sets the Stream Instance's Processing Flag to 1 (completed).
GCFR_CP_StreamBusDate_End
Purpose: This API first validates the input Stream_Key, checks that there are no Stream Instances active and checks that the Stream is actually running. It sets the Stream's Processing Flag to 1 (completed). It sets the Load Status of the Stream's files to 'Consumed' (i.e. 2).
GCFR_Process_Log
History of updates to this table are captured here. The _Log table has identical columns to the corresponding master table however the primary key includes the Update_Ts.
GCFR_Transform_KeyCol
The primary key columns of a target table are registered in GCFR_Transform_KeyCol. This is a static reference table.
GCFR_Stream_BusDate
This is a control table that associates a Stream with the current, previous and next business date when the Stream can be processed. It represents the processing state for a Stream. Every Stream has a current record in this table. It is used to ensure that any dependent processing from a previous day is completed before this day's processing commences.
GCFR_Stream_BusDate_Log
History of updates to this table are captured here. The _Log table has identical columns to the corresponding master table however the primary key also includes the Business_Date and the Processing_Flag
GCFR_Process_Type
Every process has a type. This table is a minor entity that provides descriptive information about the Process_Type. This is a static reference table. The addition of this minor entity allows new process types to be easily added. Client specific processes can be added with a specific client defined process type. It is recommended that client specific process types commence numbering from 100 leaving numbers less than 100 for non-client specific processes.
1.5 Processing Columns
>The following standard processing columns are added to every physical permanent table in the data warehouse to support history / time variance (tracking time and date).
>They are managed by the control framework standard code modules that perform the ELT processes.
>They can be used by ELT developers and end users alike for analysis, audit and reporting.
>Their intent is to provide full support for data lineage and detailed audit of the EDW data.
>All processes that move the data are recorded and the data can be traced back to source.
>The processing columns make all tables true time variant.
>Every state of permanent rows in the data warehouse is recorded. Rows are not updated or physically deleted (unless by the archive process).
>An update causes a row to be closed and a new instance of the row is opened to reflect current state.
>A delete causes a row simply to be closed with a logical status of deleted.
Full Impact analysis is very easy due to the meta data captured by the control framework.
Hold the audit information at the data warehouse table row level
* Permits full data and process lineage from row to source data set
* Allows verification of who did what and when, as rows can be linked back to the Control Framework database tables to verify data lineage
* Supports Multi Entity tagging at the "logical System" level and thus allows multi entity identification with NO PROGRAMMING
--The columns exist so that a process will never update target data when an update occurs
› Only ever closes a row and adds a new version
* The columns link back to the Control Framework data model to reveal details of the processing
* The processing columns are applied to EVERY target table in the data warehouse to ensure full data and process lineage
Process Control Data Model Detail
a. The data model contains two types of tables:
Data - > Static or runtime process control metadata
Audit- > Log or audit tables that record audit, execution and error messages
Log tables
* Every data table has an associated _Log table
Every time a row in a data table is updated the history of the update is recorded (via a trigger) in the associated _Log table
* The processing columns are managed by the Processing Patterns
* The intent of the processing columns is to provide full support for data lineage and detailed audit of the data warehouse
The processing columns result in a true time variant data warehouse
> The data component of a row is NEVER updated
>The existing row is closed and a new row is inserted
> Start and End date/time fields in the processing columns give a complete historical view of the record lifecycle
* The processing columns allow full impact analysis to be conducted
- For example:
—What source data set provided this data?
- When was this data loaded/transformed?
Processing columns are added to every physical table in the data warehouse:
-To support the true time variant data warehouse
-To provide audit capability
-To enable full data lineage from source data set to business use
-To enable full impact analysis
-Processing columns are maintained by the actions of the processing pattern that does work on the data set
Benefits of processing columns:
They are managed by the control framework standard code modules that perform the ELT processes.
* They can be used by ELT developers and end users alike for analysis, audit and reporting.
* Their intent is to provide full support for data lineage and detailed audit of the EDW data.
* All processes that move the data are recorded and the data can be traced back to source.
The processing columns make all tables true time variant. Every state of permanent rows in the data warehouse is recorded. Rows are not updated or physically deleted (unless by the archive process). An update causes a row to be closed and a new instance of the row is opened to reflect current state. A delete causes a row simply to be closed with a logical status of deleted.
* Full Impact analysis is very easy due to the meta data captured by the control framework.
Start Date
>The business date this row was written into the data warehouse table.
>Should not be confused with the status of the record. Many records will have a logical start date, open date or similar. For example, an account may have an account open date. The account open date would be stored as a column in the physical data model and would NOT be used to populate the processing column Start Date.
End Date
>The business date this row ceased to be the current record of the data.
>This is due to either a change in the value of a column in the row (an update) or that this row is no longer present at source (a delete).
>This is the only time a row is updated in situ. No columns are changed in the row apart from the End_Date.
>End Date equal to high-date (i.e. 9999-12-31) indicates this is the current row.
>Only one row for any given table primary key set can be the current row.
>Update Process_Name and Update_Process_Id must be populated when End_Date is populated.
Record_Deleted_Flag
Default value: 0 = not deleted
Other values:
0 = not deleted
1 = deleted
>No rows are physical deleted from data warehouse table, unless it is as part of the archiving process.
>A row is shown as being deleted at source by either being absent from the source feed or by a source record deleted flag being set.
>The current row is closed by setting the End_Date. A new row is written with a new Start Date and this flag set to 1; thus the current row is marked deleted.
>A row can move in and out of deleted status.
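Taken together, these columns make it easy to select the current, not-deleted state of any table. A sketch against a hypothetical target table, using the high-date and flag values described above:
select *
from TargetDB.TargetTable          -- hypothetical target table
where End_Date = DATE '9999-12-31' -- current row (high-date)
  and Record_Deleted_Flag = 0;     -- exclude rows logically deleted at source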
CTL_ID
>The process that creates the row in the table will set this from the Control Framework data.
>This is a horizontal domain in a table. It means that row ownership cannot be shared across source systems.
Process Name
The name of the process that created this row.
Process_Id
The identifier of the instance of the process that created the row.
This can be used to find the process within the Control Tables,
Update Process_Name
The name of the process that closed this row (i.e. the process that populated the End Date).
NULL if the row has not been updated
Update Process_Id
The identifier of the instance of the process the closed this row.
NULL if the row has not been updated
Start Ts
An Intraday data feed must provide a timestamp either at the record level or the file level. This becomes the Business_Date_Cycle_Start_Ts stored during the registration process for the source data feed. The Business_Date_Cycle_Start_Ts is used to set the Start Ts of a row being opened.
If an Intraday data feed cannot provide a timestamp then it cannot be intraday.
End Ts
The End Ts of any row being closed is the Start Ts of the new row less one smallest unit of time defined. It is calculated using the Business_Date_Cycle_Start_Ts of the new row.
1.6 Stream Identification
A Stream is a collection of processes that must be completed as a whole to meaningfully transform a set of input data (files or tables) into a coherent output set of tables within the data warehouse. Each collection of dependent processes is identified with the same Stream_Key.
Stream Cycle Frequency Codes -
values and functionality
0 - Intra Day
1 - Daily
7 - Weekly
14- Fortnightly
* Once a record has been populated, there is always a current record whose End Date is '9999-12-31'.
* If the data is missing from the source table, the delete flag is set to 1 for that period.
* The start date and end date define the validity period of the row content.
* This mode compares the incoming source data with the target data to find the changed rows. We need to provide the whole source data set, and the GCFR process scans all of it and all of the target table.
EXAMPLE
(A missing row is NOT marked as deleted - a row is deleted only if the source marked the delete flag)
The same as FULL transformation.
* The difference from FULL transformation is that you need a column called <GCFR_DELTA_ACTION_CODE>, which contains 0 (New) / 1 (Updated) / 2 (Deleted) for each row.
In this mode we need <GCFR_DELTA_ACTION_CODE>; normally it comes from our data source, which provides only the changed data. Therefore it saves a lot of time and resources. If we want to use this mode, please have an interface agreement with the source data provider.
Unlike FULL/DELTA transformation, transaction transformation does not capture attribute changes and merge them into the target.
* Transaction transformation only inserts/appends to the target table when new key column values are detected in the source table.
The following describes how GCFR FULL/DELTA/TRANSACTION transformations work and how to select a transformation mode.
--> transactional -
Unlike FULL/DELTA transformation, transaction transformation does not capture attribute changes and merge them into the target. It only inserts/appends to the target table when new key column values are detected in the source table.
-->full -
Once a record has been populated, there is always a current record whose End Date is '9999-12-31'.
If the data is missing from the source table, the delete flag is set to 1 for that period.
The start date and end date define the validity period of the row content.
This mode compares the incoming source data with the target data to find the changed rows. We need to provide the whole source data set, and the GCFR process scans all of it and all of the target table.
-->delta-
The same as FULL transformation.
The difference from FULL transformation is that you need a column called <GCFR_DELTA_ACTION_CODE>, which contains 0 (New) / 1 (Updated) / 2 (Deleted) for each row.
In this mode we need <GCFR_DELTA_ACTION_CODE>; normally it comes from our data source, which provides only the changed data. Therefore it saves a lot of time and resources. If we want to use this mode, please have an interface agreement with the source data provider.
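A sketch of what a delta transform input view could look like, based only on the requirement above that the view exposes a GCFR_DELTA_ACTION_CODE column; every object and column name here is hypothetical (the TX02/PK_COL placeholders reuse section 1.7):
-- hypothetical delta input view; the source CDC feed is assumed to carry a change indicator
replace view TransformDB_V.TX02_Transform_01 as
select
    src.PK_COL,
    src.ATTR1,
    case src.Change_Ind        -- hypothetical source change indicator
        when 'I' then 0        -- new
        when 'U' then 1        -- updated
        when 'D' then 2        -- deleted
    end as GCFR_Delta_Action_Code
from StagingDB.Stg_Source_Delta src;   -- hypothetical staging table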
1.7 How GCFR make this happen:
Suppose we will populate data into TargetTable in container TargetDB, using process TX02_Transform_01; the transformation view is also called TX02_Transform_01. The key column defined in GCFR is PK_COL and the target table contains attributes ATTRn.
There are three stages for GCFR to populate data into the target table.
1. Populate into IMG table.
2. Populate into INS table.
3. Populate into Target Table.
IMG Stage:
1. GCFR will CREATE the IMG table if it does not exist; or, if TargetTable has been modified, GCFR will DROP/CREATE the IMG table.
2. GCFR will populate results of Transformation View into IMG table.
3. GCFR will populate value of control columns on IMG table.
Full Transformation:
Note: Verification_Flag does not apply to FULL transformation.
Delta Transformation:
Note: GCFR will set <Record_Deleted_Flag> to 1 when GCFR_Delta_Action_Code is 2.
Insert/Update rows are treated the same as action code 0. This step will be skipped if the process's Verification_Flag is set to 0.
Transaction Transformation:
Note: This step will be skipped if the process's Verification_Flag is set to 0.
INS Stage:
1. Prepare the data that needs to be populated into TargetTable.
2. Transaction transformation only detects the data rows whose PK_COL values are new.
3. Delta transformation only expires data rows whose GCFR_Delta_Action_Code is 2.
Note: There is a difference between FULL and DELTA modes in how attribute changes are detected: FULL mode considers NULL values, DELTA does not. It still needs to be proven whether DELTA mode handles NULL values properly.
Transaction Transformation:
Note: If the process's Verification_Flag is set to 0, GCFR will populate the result of the Transformation View into the INS table without checking the existence of the key column in the target table.
Merge Stage:
1. Populate data rows into TargetTable.
2. For FULL/DELTA transformations, expire the original data rows that have been updated/deleted by setting End_Date to business date - 1.
Full Transformation: Merge to Target(FULL)
Delta Transformation: Merge to Target(Delta)
Transaction Transformation:
Note: The truncate only happens when Truncate_Target is set to 1, and the Truncate_Target flag only works for Transaction Transformation.
1.8 How a stream run flows
StreamBusDate Start --> Stream Start --> Process 1 --> Process 2 --> Process 3 --> Stream End --> StreamBusDate End
GCFR calls the transform view and populates the target table.
A Stream is a parent box holding many processes (tables to load).
The processing patterns interact in a set sequence. It is possible to develop each group in isolation due to the abstraction achieved through the use of views over tables. The interaction sequence is as follows:
1. Acquisition Layer (Tier 1) Load (Load of source data to staging)
2. BKEY and BMAP
3. Transformations (Acquisition Layer Staging to Acquisition Layer Source Image or Integrated Data Layer (Tier 2), Tier 2 to Tier 2 or Access Layer (Tier 3))
4. Export
The process of development for each processing pattern is the same.
Step 1: INPUT - Create tables and/or views to present the input data set to the process. The input data set is presented to the ELT process that will create/update the target data set.
Step 2: PROCESS and OUTPUT
Step 3: PROCESS and OUTPUT
Identify Job Parameters for the GCFR Perl Script
Build and test the job by calling GCFR Perl Job Ids
Upon completion of development, an implementation package is built for deployment and incorporation in the ELT Batch Schedule.
1.9 Error handling
e.g. the previous stream was not closed
select * from PRD_GCFR_V.GCFR_Stream_BusDate_Log where stream_key = 5 and business_date >= DATE '2021-08-30'
select * from PRD_GCFR_V.GCFR_Stream_BusDate where stream_key = 5
select * from PRD_GCFR_V.GCFR_Stream_Id_Log where stream_key = 5 and business_date >= DATE '2021-08-30'
We can manually delete these rows (row 21) to reset the log back to what it was in the past
delete PRD_GCFR_V.GCFR_Stream_Id_Log where Business_Date = DATE '2021-08-30' and stream_id = 2190
select * from PRD_GCFR_V.GCFR_Stream_Id where stream_key = 5
update PRD_GCFR_V.GCFR_Stream_Id
set stream_id = 2189
, business_date = DATE '2021-09-06'
, next_business_date = DATE '2021-09-07'
, prev_business_date = DATE '2021-09-05'
, business_date_cycle_start_ts = TIMESTAMP '2021-09-06 00:00:00.000000'
, processing_flag = 1
where stream_key = 5
2.0 BKEY
We pass the concatenated source keys to the BKEY process, and from the BKEY table we get the EDW key as the surrogate key.
Conceptual Overview
A surrogate key is any unique identifier required to be generated for the sake of representing some base object (e.g. a Customer, an Event etc). Typically in a Logical Data Model (LDM) the primary key of many of the base entities is a generated Surrogate Key (e.g. Party_Id). The intention is to create a binary relationship (hence the term BKEY) between one logical data key (one or many input columns) from source and one single identifying column (the surrogate key) of data in a physical table in the EDW.
The requirement for surrogate keys is normally identified during the modelling, mapping and data profiling processes of the data warehouse development.
Why are Surrogate Keys needed?
2.0.1 Domain Crossover
Surrogate keys are needed to solve the problem of domain crossover from disparate sources. Simply stated, this means two source systems may use the same physical value for identification but identify different objects. For example
An organisation may have two separate customer systems. Each system has a customer number.
Both systems start the customer number from 1 traversing to n.
The customer identified by customer number value 1 in the first system is not the same as the customer identified by customer number value 1 in the second system.
2.1.1 No Single Source Key
Surrogate keys may be needed because no single key exists and the collection of columns that makes the item unique is too large for ease of use.
For example -
Customers may have names in many parts (e.g. title, first name, last name, known-as name), date of birth and location of birth recorded in the source system.
None of these alone is enough to guarantee unique identification. The combination that gives unique identification is difficult to manage consistently across programs and functionality in the data warehouse (concatenation, purging of spaces, and management of unexpected characters).
Abstraction
An advantage of surrogate key use is abstraction. Surrogate keys shield data warehouse tables from the impact of changes to the source key fields in the source system. Using the surrogate key architecture, the impact is limited to the surrogate key table and the key generation process.
BENEFIT:
>Stability
Surrogate keys are independent of the actual data values in the table, so they are not subject to changes in business rules over time. For example, if a department name changes, using a surrogate key avoids having to update all foreign keys referencing that department.
>Uniqueness
Surrogate keys are guaranteed to be unique for each row, enforcing data integrity. They provide a simple, system-generated, business-agnostic column to uniquely identify each row.
>Flexibility
Surrogate keys allow rows to be uniquely identified even if the natural key changes. This is important when tracking historical changes, such as in slowly changing dimensions in data warehouses.
>Performance
Surrogate keys, usually integers, are smaller and processed faster than natural keys, especially if the natural key is a composite of multiple columns. Joins on integer keys are faster than joins on date or character keys.
>Handling Special Values
Specific integer surrogate key values can be used to represent special cases like "not applicable" or "date to be determined", which may not be possible with the natural key.
>Simplicity
Using a single surrogate key column is simpler than using a composite natural key made up of multiple columns. This simplifies queries, reduces storage, and makes the data model easier to understand.
BKEY Data Model
The BKEY Data Model used in the Control Framework holds the mapping between the logical data key from source and the surrogate key generated by the framework module.
The BKEY Data Model is intended to be implemented 'as-is' when using the Control Framework. However, it is anticipated that each customer who implements the Control Framework may have further needs and requirements that are not met by the standard Control Framework. This data model can be used as a basis for customization and extension to the Control Framework. Any customizations to the data model must be reflected in the physical data model, code modules, scripts and macros.
The model consists of four main tables:
BKEY_<Standard_Key> - Mapping of source key to data warehouse surrogate key
BKEY_<Standard_Key>_Next_Id - Next available EDW key for a key set
BKEY_Key_Set - Metadata describing each instance of the BKEY
BKEY_Domain - Metadata describing each instance of the identified domains within each Key Set
BKEY_Key_Set
All surrogate key sets are registered in BKEY_Key_Set. This is a static reference table.
BKEY_Key_Set_Log
History of updates to this table are captured here. The Log table has identical columns to the corresponding master table however the primary key includes the Update_Ts.
BKEY_Domain
A Key Set may contain zero or many domains. This is a static lookup table and provides a description for every identified Domain and links that Domain to the relevant Key Set.
BKEY_Domain_Log
History of updates to this table are captured here.
BKEY_<Standard_Key>
Specific instances of this table will hold all the surrogate keys generated for a specific key set. This table links the surrogate key (EDW_Key) to the actual source data field(s) (Source_Key). The registration process for the key set will create the specific instance of the BKEY_<Standard_Key> table. '<Standard_Key>' is replaced with the name identified in the key set table.
This table contains the mapping between the source key value(s) and the data warehouse surrogate key value. One of these tables should exist for every Key_Set_Id. A suggested naming convention for this construct is to name each BKEY_<Standard_Key> table as follows: BKEYnnnnn_xxxxx, where nnnnn = Key_Set_Id and xxxxx = code name assigned as per the agreed Naming Standard, e.g. BKEY0001_Party_Id.
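A sketch of looking up the surrogate key for a given natural key in such a table; the database name (modelled on the BMAP reference in the next section), the key set instance and the concatenated key value are assumptions, while EDW_Key and Source_Key come from the description above:
-- look up the EDW surrogate key for a pipe-delimited natural key (all values hypothetical)
select EDW_Key
from prd_gcfr_v_utlfw.BKEY0001_Party_Id
where Source_Key = '001|12345';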
2.1 BMAP
sel * from prd_gcfr_v_utlfw.bmap_standard_map
There is only one BMAP standard map table, shared by many code sets.
Conceptual Overview
The reference data code management component of the Control Framework (BMAP) consists of the following.
A data model that represents the metadata required to manage the source reference data code sets, changes to them over time, and the binary relationship between the source code and the code assigned by the data warehouse.
A special transformation module that inspects incoming source ELT data streams for missing reference data and builds temporary references to allow the ELT to proceed without compromising referential integrity.
A series of special load modules that initially load and update the static data in the BMAP data model.
2.1.1 Why is Reference Data Code Management needed?
Reference data flows to the data warehouse from a variety of sources, most of which are usually independent of each other.
Formal source data feed of reference data codes, values and descriptions that may or may not be part of a wider Enterprise Data Management and Governance process.
Adhoc source data feed of reference data codes, values and descriptions.
Manual data notification from an individual or system.
Informal notification where the reference data code is included in a row of a source data feed.
As part of a technology update where reference information is part of a tool or 3rd party product.
This flow of reference data presents a variety of challenges when developing a data warehouse.
The same item of reference data can have different representations in the different source systems.
The source systems may also use different values for the same domain or have divergent domains with the similar and different values.
The reference data item does not remain static in the source system and can be re-used in the source system and given a different meaning.
Reference data updates are not available to the data warehouse in a timely fashion.
The user's view of that item of reference data may require multiple representations in the semantic layer.
Referential integrity within the data warehouse may be compromised if the source reference data code has not been loaded.
gcfr t_utlfw.bmap_standard_map
2.2 Statements
A. Start of Business Date
This pattern is used to set the current, previous and next business dates for the relevant stream. Processing for the previous business date must be completed otherwise this process will fail. All streams must be started via the Start of Day process. ELT processes cannot proceed until the stream has been started.
Increments the Business Date for the Stream to the next date ready for processing.
proc - GCFR_CP_StreamBusDate_Start
B. Start of Stream
This pattern is used to instantiate a Stream within the current business date. Only one instance of any given Stream_Key can be active at any one time. All streams must be started first via the Start of Day process. ELT processes cannot proceed until the stream has been started and instantiated.
PROC - GCFR_CP_Stream_Start
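A matching hedged example (the empty second argument follows the call list later in these notes):
CALL DEV6_GCFR_P_API_CP.GCFR_CP_Stream_Start(243, '', 6, oRC, oRM);
-- check that a stream instance is now active
SEL * FROM DEV6_GCFR_V.GCFR_Stream_Id WHERE Stream_Key = 243;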
C. BKEY (Surrogate Key Transform)
Once source data has been loaded to a staging table it is possible to run the BKEY (surrogate key management) transforms that relate to that source data.
The Process Type allocated for this pattern is 21.
proc - GCFR_PP_BKEY
D. BMAP (Reference Code Value Transform)
Once source data has been loaded to a staging table it is possible to run the BMAP (reference code value) transforms that relate to the source data. This pattern detects unexpected source reference code values and assigns the unexpected value an interim data warehouse code value.
The Process Type allocated for this pattern is 22.
proc - GCFR_PP_BMAP
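Hedged examples of running both patterns for already-registered processes (the process names are placeholders, not real registrations):
CALL DEV6_GCFR_P_API_PP.GCFR_PP_BKEY('<BKEY_Process_Name>', 6, oRC, oRM);   -- surrogate key transform, process type 21
CALL DEV6_GCFR_P_API_PP.GCFR_PP_BMAP('<BMAP_Process_Name>', 6, oRC, oRM);   -- reference code transform, process type 22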
E. Target Table Transformation
Target table transformations apply the source data, BKEY surrogate keys and BMAP EDW codes to the target tables. A target table transformation may be a full transformation, a delta transformation or a transaction transformation. The transformation type is dependent upon the source data supplied. Transformations may take the form of:
- Acquisition layer Staging table to Acquisition layer Source Image table
- Acquisition layer Staging or Source Image table to Integrated Data layer Target table
- Integrated Data layer Target table to Integrated Data layer Summary/Common/Aggregation table
- Integrated Data layer Target/Summary/Common/Aggregation table to Access layer physical data mart table
E.1
Development Option 1 (with no associated processing pattern):
- Preparation of the data set is considered a separate process
- The preparation process must finish successfully before the GCFR Transform process starts
Development Option 2:
- Preparation of the data set is considered part of the process
- The preparation process starts after the GCFR process initiate step has finished
E.2
GCFR Transformation Processing Pattern
The following are key points to consider for the Transform pattern (illustrated in the sketch after this list).
- Input data is normally from the staging tables
- A staging table can contain either:
  - A full dump of current state source data
  - Changes to current state source data (delta)
  - Transactions (events)
- The transformation process pattern compares the incoming source data set specified in the transformation view with the target data set
- Target rows that are to be updated are end date/time stamped
  - Valid only for full master and delta transforms
- Target rows that are no longer present in the source data are flagged as deleted
  - Valid only for full master transforms or flagged deltas
- New state rows are added as open rows
- A transform process can only logically delete (i.e. end date and close) rows belonging to the same process
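A hedged sketch of the three transform calls this pattern resolves to (pick the one matching how the process was registered; the process name and environment prefix are placeholders):
CALL DEV6_GCFR_P_API_PP.GCFR_PP_Tfm_Full('TX02_<Target>_01', 6, oRC, oRM);   -- full master source
CALL DEV6_GCFR_P_API_PP.GCFR_PP_Tfm_Delta('TX02_<Target>_01', 6, oRC, oRM);  -- delta source
CALL DEV6_GCFR_P_API_PP.GCFR_PP_Tfm_Txn('TX02_<Target>_01', 6, oRC, oRM);    -- transaction (event) source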


F. End of Stream
Closes the open stream instance for the current business date; only one instance of a stream can be active at a time.
proc - GCFR_CP_Stream_End

G. End of Business Date
This pattern is used to close the current Business Date for the given Stream. It does NOT check for active stream instances, as the batch schedule should ensure all processing is complete before the current business date is closed. It additionally sets the status of all source system extracts for the given Stream and Business Date to consumed, which allows automated management of consumed source system extracts (e.g. moving them to an archive directory for archiving).
proc - GCFR_CP_StreamBusDate_End
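A hedged example of the closing sequence for the same stream used in the earlier examples:
CALL DEV6_GCFR_P_API_CP.GCFR_CP_Stream_End(243, 6, oRC, oRM);           -- close the active stream instance
CALL DEV6_GCFR_P_API_CP.GCFR_CP_StreamBusDate_End(243, 6, oRC, oRM);    -- close the current business date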
Registration:
1. Register stream
-- initialise the business date to 26/09/2014
CALL dev6_GCFR_P_API_UT.GCFR_UT_Register_Stream(243, 1, 'T02_BRKR_TPMI_01', CAST('26/09/2014' AS DATE FORMAT 'DD/MM/YYYY'));
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream_busdate WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream_id WHERE stream_key=999;
2. Register process
EXEC dev6_GCFR_M.GCFR_Register_Process('TX02_H_BRKR_01', 'Transform process to load TPMI in H_BRKR in IL layer', 25, 15, 243, 'dev6_GCFR_INP_V', 'TX02_H_BRKR_01', 'dev6_ADS_IL_VM', 'VM_H_BRKR', 'dev6_ADS_IL_T', 'H_BRKR', 'dev6_GCFR_T_TMP',,,,1,0,1,0);
EXEC dev6_GCFR_M.GCFR_Register_Process('TX02_S_BRKR_TPMIDETAILSD_CAP_01', 'Transform process to load TPMI Satellite table in IL layer', 23, 15, 243, 'dev6_GCFR_INP_V', 'TX02_S_BRKR_TPMIDETAILSD_CAP_01', 'dev6_ADS_IL_VM', 'VM_S_BRKR_TPMIDETAILSD_CAP', 'dev6_ADS_IL_T', 'S_BRKR_TPMIDETAILSD_CAP', 'dev6_GCFR_T_TMP',,,,1,0,0,0);
SEL * FROM cont_GCFR_V.GCFR_Process_Id WHERE process_name LIKE '%TX01_CFR_PCC_ANZ_ACCT_OPT%';
SEL * FROM cont_GCFR_V.GCFR_Process WHERE stream_key=999;
3. Register key column(s)
EXEC dev6_GCFR_M.GCFR_Register_Tfm_KeyCol('cont_ADS_IL_VM', 'VM_H_BRKR', 'BRKR_ID');
EXEC dev6_GCFR_M.GCFR_Register_Tfm_KeyCol('cont_ADS_IL_VM', 'VM_S_BRKR_TPMIDETAILSD_CAP', 'BRKR_ID');
SEL * FROM DEV6_GCFR_V.GCFR_Stream_BusDate WHERE STREAM_KEY = nm;
SEL * FROM prd_racoe_gcfr_v.gcfr_transform_keycol WHERE out_object_name LIKE '%nm%';
-- GCFR work tables created in the temp database during a transform, e.g. PRD_GCFR_T_TMP.TX02_ram_01_IMG / _INS
Mandatory steps to execute:
CALL (ENV)_GCFR_P_API_CP.GCFR_CP_StreamBusDate_Start(<Stream_Key>,6,RC,RM);
# mandatory first step: starts a new business date
CALL (ENV)_GCFR_P_API_CP.GCFR_CP_Stream_Start(<Stream_Key>,'',6,RC,RM);
# mandatory second step: starts the stream
Based on your requirement, select the specific step(s) to execute your logic:
CALL (ENV)_GCFR_EXT_P.spCheckStreamProcessDependency('<Dependency_Name>',RM);
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_BMAP('<Process_Name>',6,RC,RM);
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_BKEY('<Process_Name>',6,RC,RM);
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_Tfm_Txn('<Process_Name>',6,RC,RM);
# if you opt for a Transaction transformation
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_Tfm_Full('<Process_Name>',6,RC,RM);
# if you opt for a Full transformation
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_Tfm_Delta('<Process_Name>',6,RC,RM);
# if you opt for a Delta transformation while registering
Mandatory steps to execute once all processes complete, in order to close the active stream and business date:
CALL (ENV)_GCFR_P_API_CP.GCFR_CP_Stream_End(<Stream_Key>,6,RC,RM); # mandatory step to close the stream
CALL (ENV)_GCFR_P_API_CP.GCFR_CP_StreamBusDate_End(<Stream_Key>,6,RC,RM); # mandatory step to close the business date
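After the closing steps it can be useful to check the execution log view listed under Frequently Used Tables below; a minimal hedged check (the Process_Name filter column is an assumption about that view's layout):
SEL * FROM (ENV)_GCFR_V.GCFR_Execution_Log WHERE Process_Name = '<Process_Name>';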
TABLES :
SELECT * FROM PRD_GCFR_V.GCFR_Stream ORDER BY Stream_Key;
Frequently Used Tables & Procedures
System: (ENV)_GCFR_V.GCFR_System
Stream:
(ENV)_GCFR_V.GCFR_Stream
(ENV)_GCFR_V.GCFR_Stream_BusDate -- only reset this table if you want to reset the business date; dates in GCFR_Stream_Id and GCFR_Process_Id are automatically copied from this table
(ENV)_GCFR_V.GCFR_Stream_Id
Transform Process:
(ENV)_GCFR_V.GCFR_Process
(ENV)_GCFR_V.GCFR_Process_Id
Key Column(s):
(ENV)_GCFR_V.GCFR_Transform_KeyCol
Execution Log:
(ENV)_GCFR_V.GCFR_Execution_Log
SEL * FROM DEV6_GCFR_V.GCFR_Stream_BusDate WHERE stream_key = 244;
SEL * FROM DEV6_GCFR_V.GCFR_Stream_Id WHERE stream_key = 244;
SEL * FROM DEV6_GCFR_V.GCFR_Process_Id WHERE stream_key = 244;
UPDATE DEV6_GCFR_V.GCFR_Stream_BusDate
SET NEXT_BUSINESS_DATE = '2014-09-29', BUSINESS_DATE = '2014-09-28', PREV_BUSINESS_DATE = '2014-09-27', PROCESSING_FLAG = 1
WHERE STREAM_KEY = 244;
UPDATE DEV6_GCFR_V.GCFR_Stream_Id
SET NEXT_BUSINESS_DATE = '2014-09-29', BUSINESS_DATE = '2014-09-28', PREV_BUSINESS_DATE = '2014-09-27', PROCESSING_FLAG = 1
WHERE STREAM_KEY = 244;
Stream
SEL * FROM PRD_GCFR_V.GCFR_Stream WHERE STREAM_KEY = 305;
SEL * FROM PRD_GCFR_V.GCFR_Process WHERE STREAM_KEY = 305;
SEL * FROM PRD_GCFR_V.GCFR_Process WHERE PROCESS_NAME = 'TX03_RTL_FCT_QLTY_SALE_01';
SEL * FROM PRD_GCFR_V.GCFR_Transform WHERE PROCESS_NAME = 'TX01_MNL_AU_RTL_HILM_INDVDL_PRD_TGT_01';
SEL * FROM PRD_GCFR_V.GCFR_Stream_BusDate WHERE STREAM_KEY = 1214
SEL * FROM PRD_GCFR_V.GCFR_Stream_BusDate_Log a WHERE STREAM_KEY = 1214
To see when the last load happened for a process (the process can be TX01, BX02, BX03, etc.):
SEL * FROM PRD_GCFR_V.GCFR_Transform WHERE PROCESS_NAME = 'TX01_DRM_AU_BRNCH_ACTV_01' ORDER BY Update_TS DESC;
SELECT * FROM PRD_CTLFW_V_GCFR.GCFR_Process WHERE Target_TableName LIKE '%ONLYTABLENAME%';
select * from PRD_CTLFW_V_GCFR.GCFR_Stream_busdate_log
select * from PRD_CTLFW_V_GCFR.GCFR_Stream_id_log
SELECT * FROM DEV6_GCFR_V.GCFR_Stream_BusDate;
--call statement
CALL DEV6_GCFR_P_API_CP.GCFR_CP_StreamBusDate_Start(snum,6,out1,out2);
CALL DEV6_GCFR_P_API_CP.GCFR_CP_Stream_Start(snum,'',6,out1,out2);
CALL DEV6_GCFR_P_API_PP.GCFR_PP_Tfm_Full('TX02_nm_03',6,out1,out2);
CALL DEV6_GCFR_P_API_CP.GCFR_CP_Stream_End(snum,6,out1,out2);
CALL DEV6_GCFR_P_API_CP.GCFR_CP_StreamBusDate_End(snum,6,out1,out2);
UPDATE PRD_CTLFW_V_GCFR.GCFR_Stream_BusDate SET NEXT_BUSINESS_DATE = '2021-09-18', BUSINESS_DATE = '2021-09-17', PREV_BUSINESS_DATE = '2021-09-16', PROCESSING_FLAG = 1 WHERE STREAM_KEY = num;
UPDATE PRD_CTLFW_V_GCFR.GCFR_Stream_Id SET NEXT_BUSINESS_DATE = '2021-09-18', BUSINESS_DATE = '2021-09-17', PREV_BUSINESS_DATE = '2021-09-16', BUSINESS_DATE_CYCLE_START_TS = '2021-09-17 00:00:00.000000', PROCESSING_FLAG = 1 WHERE Stream_Key = num;
UPDATE PRD_CTLFW_V_GCFR.GCFR_Process_Id SET business_date = '2021-09-17' WHERE Stream_Key IN (num);
UPDATE ROLLBACK:
When an update happens, the previous record is end-dated with business date - 1.
1. DELETE FROM PRD_RACOE_ASS_ADS_URSDS.CFR_PCC_ANZ_ACCT_OPT
WHERE start_date > DATE '2024-02-10';
/* Update as if the last run was the 10th.
The 11th was a Sunday, so there was no run that day; the 12th and later dates ran.
So when the 12th ran (the job is daily), it end-dated the previous records with end date 11.
SELECT DISTINCT end_date FROM PRD_RACOE_ASS_ADS_URSDS.CFR_PCC_ANZ_ACCT_OPT
WHERE start_date = '2024-02-10';  -- has 10, 11, 12, ...
SELECT DISTINCT end_date FROM PRD_RACOE_ASS_ADS_URSDS.CFR_PCC_ANZ_ACCT_OPT
WHERE start_date = '2024-02-10';  -- after updating end_date > 10 we have 2 dates: 9999 and 10th Feb
When the job runs for the 10th start date correctly, it will end-date the previous records as the 9th,
so update end_date to 9999 for all rows where end_date >= 11.
*/
2. UPDATE PRD_RACOE_ASS_ADS_URSDS.CFR_PCC_ANZ_ACCT_OPT
SET end_date = DATE '9999-12-31', UPDATE_PROCESS_NAME = NULL, UPDATE_PROCESS_ID = NULL
WHERE end_date >= DATE '2024-02-10';
-- whatever start date was deleted in step 1 (here the 12th), that date minus 1 should be the end_date used in the WHERE clause
3. Instead of doing all three updates below, you can re-register the stream for the running date.
UPDATE PRD_RACOE_ASS_GCFR_V.GCFR_Stream_BusDate
SET NEXT_BUSINESS_DATE = '2024-02-12', BUSINESS_DATE = '2024-02-11', PREV_BUSINESS_DATE = '2024-02-10', PROCESSING_FLAG = 1
WHERE STREAM_KEY = 999;
UPDATE PRD_RACOE_ASS_GCFR_V.GCFR_Stream_Id
SET NEXT_BUSINESS_DATE = '2024-02-12', BUSINESS_DATE = '2024-02-11', PREV_BUSINESS_DATE = '2024-02-10', BUSINESS_DATE_CYCLE_START_TS = '2024-02-11 00:00:00.000000', PROCESSING_FLAG = 1
WHERE Stream_Key = 999;
UPDATE PRD_RACOE_ASS_GCFR_V.GCFR_Process_Id
SET business_date = '2024-02-11'
WHERE Stream_Key IN (999);
OR
Re-register the stream:
CALL prd_racoe_ass_GCFR_P_API_UT.GCFR_UT_Register_Stream(999, 1, 'CFR_PCC_ANZ_ACCT_OPT', CAST('12/02/2024' AS DATE FORMAT 'DD/MM/YYYY'));
-- this is enough, but check:
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream_busdate WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream_id WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.GCFR_Process_Id WHERE stream_key=999;
EXEC prd_racoe_ass_GCFR_M.GCFR_Register_Process('TX01_CFR_PCC_ANZ_ACCT_OPT', 'CFR_PCC_ANZ_ACCT_OPT in ursds layer', 23, 32, 999, 'prd_racoe_ass_GCFR_INP_V', 'TX01_CFR_PCC_ANZ_ACCT_OPT', 'PRD_RACOE_ASS_ADS_URSDS', 'CFR_PCC_ANZ_ACCT_OPT', 'PRD_RACOE_ASS_ADS_URSDS', 'CFR_PCC_ANZ_ACCT_OPT', 'prd_racoe_ass_GCFR_T_TMP',,,,1,0,1,0);
-- check the GCFR_Process view to confirm the target table is the test table (not the real OPT) and the transform logic; then call the process for the 12th and check the counts for the 12th
SEL * FROM PRD_RACOE_GCFR_V.GCFR_Process_Id WHERE process_name LIKE '%TX01_CFR_PCC_ANZ_ACCT_OPT%'
SEL * FROM PRD_RACOE_ass_GCFR_V.GCFR_Process WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_transform_keycol
WHERE out_object_name LIKE '%CFR_PCC_ANZ_ACCT_OPT%';
--tx
=============================
2.2 Working NZ
2.3 Registration Macros
--> Process_Name, Process_Type, Control_Id, Stream_Key, In_DB_Name, In_Object_Name, Out_DB_Name, Out_Object_Name, Target_TableDatabaseName, Target_TableName, Temp_DatabaseName, Key_Set_Id, Domain_Id, Collect_S
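A hedged template showing the parameter order above in a single call (all values are placeholders; the description parameter, the empty positional parameters and the trailing flags follow the earlier registration examples in these notes and are assumptions, not confirmed defaults):
EXEC (ENV)_GCFR_M.GCFR_Register_Process(
  'TX02_<Target>_01',                        -- Process_Name
  '<description of the transform>',          -- description, as in the earlier examples
  23, 15, 243,                               -- Process_Type, Control_Id (Ctl_Id), Stream_Key
  '(ENV)_GCFR_INP_V', 'TX02_<Target>_01',    -- In_DB_Name, In_Object_Name (transform view)
  '(ENV)_ADS_IL_VM', 'VM_<Target>',          -- Out_DB_Name, Out_Object_Name
  '(ENV)_ADS_IL_T', '<Target>',              -- Target_TableDatabaseName, Target_TableName
  '(ENV)_GCFR_T_TMP'                         -- Temp_DatabaseName
  ,,,,1,0,1,0);                              -- Key_Set_Id, Domain_Id, etc. left empty, plus trailing flags as in the earlier examples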