GCFR - my KT notes
1. WHAT IS GCFR
> The GCFR (Global Control Framework) is an ETL framework developed by Teradata.
> The GCFR is a set of architectural standards, development methods, processing patterns and a code library that provides the basis for implementing a true time variant (history) data warehouse.
> GCFR defines how the data is loaded (the TX type) and what data is loaded (the TX) into the target table; it acts as a filter between source and target.
Purpose
The Control Framework enforces an audit trail and reduces programmer effort. It provides:
- Full data and process lineage
- Ability to prove data from source to end use
- Automated audit functions
1.1 GCFR Processing Patterns
IPO (input-process-output) model:
> Each processing pattern in the data warehouse adheres to the IPO model.
> Data flow is from Input to Process to Output.
> This simple model applies to all data flows within the GCFR.
INPUT - A single input to the process: the view that presents the input data set to the process, or the source table.
PROCESS - The function or program that operates against the input data set to produce a result.
OUTPUT - The result of the Process operating on the input data set.
Figure 1 - IPO Model
The processing patterns interact in a set sequence. It is possible to develop each group in isolation due to the abstraction achieved through the use of views over tables. We only need to create the GCFR TX (transform) for the table and its view.
1.2 The Global Control Framework consists of the following components:
1. System:
→ All inbound and outbound systems/platforms of the data warehouse must be identified and allocated a unique identifier (Ctl_Id) and name (Sys_Name) according to the agreed naming standard.
→ System is not a server, but a logical software application, like a Customer System or Accounts System. (CAP, MAUDE, GCU, File)
--> Register the data source system with GCFR so that both we and GCFR are aware of it.
--> All system information is stored in the GCFR view GCFR_System (see the example queries after this component list).
2. Process/transformation:
> The job that transforms data and merges data from the source table into the target table.
> Tells us what data we have to load into the target table (all logic and joins go here).
> Defines the transformation on source data from various sources as per the requirement.
> All process information is stored in the GCFR view GCFR_Process.
3. Stream:
> A Stream is a collection of processes that must be completed as a whole to complete a single data pipeline into the data warehouse.
> A Stream is a parent box holding many processes: a collection of dependent processes that have to be run together.
> Each stream must be identified and allocated a unique identifier (Stream_Key) and unique name (Stream_Name), and each collection of dependent processes in the stream is identified with the same Stream_Key.
> The processes within one stream are executed in sequence; however, processes in different streams can run in parallel (note that we can also set dependencies between streams with Control-M).
> All stream information is stored in the GCFR view GCFR_Stream.
4. Key column:
> The target table key column, used while merging data to identify whether the record is already present in the target table.
> GCFR uses this column to determine the existence of a source row in the target and perform the appropriate operation on it.
> All key information is stored in the GCFR view GCFR_Transform_KeyCol.
5. Business Date Definition
> The Business Date represents the date for which the stream run is happening.
> All business date information is stored in the GCFR view GCFR_Stream_BusDate (see the example queries after this component list).
> Full Impact analysis is very easy due to the meta data captured by the control framework.
> Each domain has only one BKEY table, but many BKEY processes load into it.
>A surrogate key is any unique identifier required to be generated for the sake of representing some base object (e.g. a Customer, an Event etc).
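The component registrations above can be inspected directly from the GCFR views. A minimal sketch, assuming the views live in a PRD_GCFR_V database (that database name and the Stream_Key value 5 simply follow the conventions used later in these notes; adjust to your environment):
select * from PRD_GCFR_V.GCFR_System order by 1;                    -- registered source/target systems (Ctl_Id, Sys_Name, ...)
select * from PRD_GCFR_V.GCFR_Stream where Stream_Key = 5;          -- stream registration
select * from PRD_GCFR_V.GCFR_Stream_BusDate where Stream_Key = 5;  -- previous/current/next business date for the stream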
0.0 TERMS
Corporate Entity - More than one logical, corporate or legal entity having access to and some kind of proprietary ownership of data stored in the data warehouse
System - Logical systems that send data to or receive data from the data warehouse
File - 'Files' includes files, tables or any other data format or transmission method
Process - A function that performs work on one or more input data sets to produce an output data set
Stream - A dependent set of processes
1.1 WHAT IS CF?
The Control Framework is a set of architectural standards, development methods, processing patterns and a code library that provides the basis for implementing a true time variant data warehouse.
1.1.1 WHAT IS GCFR?
GCFR (Global Control Framework) is a set of design patterns/processes and a metadata repository for managing ETL.
The GCFR (Global Control Framework) is an ETL framework developed by Teradata.
It includes a set of architectural standards, development methods, processing patterns and a code library that provides the basis for implementing a true time variant data warehouse.
It also provides process control, logging and auditing capability. The framework resides mostly in the database, allowing in-database ETL logic to run and thus fully utilising the parallel processing power of Teradata. GCFR also has a data ingestion pattern using TPT; however, this task is normally performed by a dedicated ETL tool. In the CAA environment the data is ingested into the Teradata ecosystem through the RSDS team (BIH) and into URSDS using TTF (and KNIME in the future).
It is an automated framework to manage the ELT processes of the Teradata Global Reference Information Architecture to ensure accuracy and efficiency. The GCFR is used in conjunction with an ELT tool - in itself it is not a tool or a replacement for a tool. The GCFR enforces audit trail and reduces programmer effort.
a . Data Auditing Features
The GCFR processes and columns provide full end-to-end lineage and audit of the data:
* The date and time of every process run is recorded for every table in the data warehouse (we can know when the last process was run)
* The process which created or updated a row is recorded for every row in every table (we can know which process created every row and when)
* Every row received from any of the source systems (including any rejected ones) can be accounted for
* The process always checks that the file being loaded is the correct one in the sequence and has not been loaded previously (all data landed on the data warehouse is accounted for)
* The source and the target of the data are recorded for every process (we have a complete audit trail no matter how many times the data is landed on the data warehouse)
b. Overview of the GCFR
* The Teradata Global Control Framework (GCFR) consists of:-
> A base data model that allows registration and control of all data flow processes, input and output, to and from the EDW
> A set of standard database structures to facilitate processing, security and abstraction concepts
> Processing Pattern code modules that:
- Maintain the base framework control
-Automate meta data capture from all processes
- Maintain row level time variance in the EDW
- Maintain row level process metadata in the EDW
>This provides Full Data & Process Lineage from source to sink
-At the ROW level
>The GCFR is a key support component of a true Time Variant EDW
-Maintaining row and record lifecycle
-Guaranteeing true repeatability of reporting (as is, as was)
c. Overview of the GCFR
The Control Framework has a set of standardized Processing Patterns that perform the data movement actions of the ELT processes
>Implemented as Stored Procedures
> Facilitation of source data through a standard EDW life cycle: extract, load, consume, archive
The Control Framework is independent of DI tools
> It is available for use natively in conjunction with Windows/Linux/Unix operating systems
> It can be integrated with any DI Tool that is certified to work with Teradata
The Control Framework is a significant kick start to any IDW in terms of:
>Definition and control structures
> Enforcing standards
> Implementing abstraction concepts
> Reducing programmer error dramatically
d. 'Processing Pattern' Concept
* What is a Processing Pattern?
> A Processing Pattern is an architectural model to simplify and control the ELT processes that govern the data lifecycle from source system to data warehouse and on to end users
> The Control Framework contains implemented processing patterns
> The Control Framework processing patterns update the operational meta data contained within the data model
> The Control Framework processing patterns perform the mechanics of applying the input data set to the target table whilst maintaining data lineage
* Why are Processing Patterns required?
> To reduce programmer error
- Complex code is written once, tested once and then re-used
> To break the work effort into smaller, simpler pieces
- Easier to develop
- Easier to generate and test
- Many aspects of development can be automated
- Many aspects of development can be isolated and shipped to alternative development locations
- Facilitates configuration management and change control on complex projects
- Supports both agile and waterfall methodologies
> Operational support becomes much simpler and therefore more cost effective
- Small number of ELT patterns need to be supported rather than a vast number of individual ELT components
e. The 'IPO' Model
The IPO model has three base components
INPUT
> A single input to the process
> The source data set for the process
PROCESS
> The function or program that operates against the input to produce a result
> The 'Processing Pattern' executable
OUTPUT
> The result of the process operating on the input
> The target data set result for the process
Flow is from Input to Process to Output
This model is used in the standard processing patterns
All processes in the Control Framework comply with this simple definition
f. GCFR Implementation Alternatives
* Data integration processes can be implemented in a variety of ways
> Toolset and technology can impact choice of implementation approach
* Data integration approaches commonly associated with data warehousing
> ELT - extract, load the data to the target platform then transform the data according to the business rules.
> ETL - extract, transform the data according to the business rules then load the data to the target platform
> Hybrid - business requirement driven choice to make best use of the toolset and technology
– ETL and ELT and ETLT and ...........
* The GCFR provides a flexible framework to allow your choice of data integration approach
g. Corporate Entity
More than one logical, corporate or legal entity having access to and some kind of proprietary ownership of data stored in the data warehouse
* Holds a record defining each corporate entity
> Surrogate key CE_ID assigned by the Data Warehouse Data Governance Steward during development
> Examples: -Holding Company -Country A -Cards Group
CE_System Table
> Relationship that defines what systems are owned or accessible to the corporate entity
> Each corporate entity can own or access many logical source data systems
> Primary key is CE_Id and CTL_Id
* These tables support an ability to recognize and isolate multiple corporate entities in a single database schema
>They allow definition to be logical
> They eliminate the need for multiple physical databases with duplicated data
> They allow data access to be restricted based on CE_Id and CTL_Id
h. Parameter
1.1.2 The benefits of using an ETL framework such as GCFR include:
- Reduced development effort by automating common processing patterns. This also ensures consistency and accuracy for these standard ETL tasks and allows developers to focus on business specific logic.
- A common interface to control and monitor processes, with full auditability of process execution; this information is also embedded into row level data.
- A standardised loading and transform model, helping to organise and centralise ETL metadata and transformation logic to improve support and code management.
- The framework fully utilises Teradata's power by doing in-database ETL logic.
- The framework is customisable and extendable.
Terms:
Business Date definition: the date of the source data within a given stream. The source system is expected to provide the business date as per the GCFR source extract file guide.
1.1.3 GCFR view and use:
1.1.4 The Global Control Framework consists of the following components:
* Process Control Data Model and standard processing columns for all cross functional model tables
* Standard code modules for ELT processes, surrogate keys and reference code data management
* Housekeeping utilities for the daily/intraday control of the ELT processes
Control Framework Components:
Data Model that holds processing metadata
>When, what program, what source, what target, update metrics
>Static metadata, eg System, Files, Paths
>Operational metadata collected by the Standard CTLFW code at run time, e.g. metrics…
1.1.5 What happens when we run a single business-date run of a stream
Business Date
In GCFR, each stream has a Business Date. This Business Date has no dependency on the 'real' date: it is an internal GCFR value. All the processes within the stream run for that Business Date and are expected to complete successfully before the Business Date can be rolled forward.
This allows GCFR to load historic data, load multiple dates' data on one day, continue running over midnight, etc.
Control Framework Control Patterns:
•Start of Day >Opens the business date for a given stream of ELT
•Start of Stream >Opens the stream instance within a business date for a specific ELT batch
•End of Stream >Completes the stream instance
•End of Day >Completes the business date for a given stream
Business Date Start
>This pattern is used to set the current, previous and next business dates for the relevant stream.
> Processing for the previous business date must be completed, otherwise this process will fail; this ensures that we do not load wrong data.
> All streams must be started via the Start of Day process.
> ELT processes cannot proceed until the stream has been started.
> If the process fails, the stored procedure will send a Return Code of 1 indicating the process has failed.
Some common errors are as following:
--Prev stream not completed
--Stream does not exist
--Business Date is already started
show procedure PRD_GCFR_P_API_CP.GCFR_CP_StreamBusDate_Start
1. Validate Stream Registration
2. Validate Stream execution, i.e. check if this stream business date is already active
3. Update the GCFR_Stream_BusDate table, increment the date and set Processing_Flag to indicate processing is ready for the specified Stream_Key
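A minimal sketch of invoking this control pattern from SQL. The parameter list is an assumption (the notes only guarantee a Stream_Key input and a return code output); use the show procedure statement above to confirm the real signature in your environment:
-- assumed two-parameter signature (in Stream_Key, out Return_Code); parameter names/count are hypothetical
CALL PRD_GCFR_P_API_CP.GCFR_CP_StreamBusDate_Start(5, oReturn_Code);
-- a returned code of 1 indicates failure (e.g. the previous business date was not completed)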
Stream Start
>This pattern is used to instantiate a Stream within the current business date.
>Only one instance of any given Stream_Key can be active at any one time.
> If the process fails, the stored procedure will send a Return Code of 1 indicating the process has failed.
Some common errors are as following:
-- A previous instance of this Stream is in an incomplete state because the End of Stream process has not been run. Check the Processing_Flag for all instances of this Stream_Key in the GCFR_Stream_Id table
--Stream does not exist
--Business Date for this Stream has not been started
--Cycle Start Time Stamp is not later than that for the previous Stream.(Intraday only)
--Stream has already been started
show procedure PRD_GCFR_P_API_CP.GCFR_CP_Stream_Start
1. Check for Stream_Key in GCFR_Stream.
2. Check whether the Stream Business Date has been started, i.e. GCFR_Stream_BusDate.Processing_Flag = 1 for the given Stream_Key
3. Check whether a Stream instance is already active, i.e. GCFR_Stream_Id.Processing_Flag = 0 for the given Stream_Key
4. Check if GCFR_Stream_Id.Business_Date <> CAST(iBusiness_Date_Cycle_Start_Ts AS FORMAT 'YYYY-MM-DD') for the given Stream_Key
5. Check if GCFR_Stream_Id.Processing_Flag = 1 AND GCFR_Stream_Id.Business_Date_Cycle_Start_Ts >= iBusiness_Date_Cycle_Start_Ts for the given Stream_Key
6. Update the GCFR_Stream_Id table and set Processing_Flag to indicate processing is ready for the specified Stream_Key
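To investigate the "previous instance of this Stream is in an incomplete state" error, the stream instance records can be inspected directly. A sketch, assuming the PRD_GCFR_V view database and the Stream_Key 5 used elsewhere in these notes:
select * from PRD_GCFR_V.GCFR_Stream_Id
where Stream_Key = 5
order by Business_Date_Cycle_Start_Ts desc;   -- Processing_Flag = 0 here indicates an instance still open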
GCFR Process runs
> Based on your TX definition, GCFR loads the target table.
> Run BKEY, BMAP, TX.
> The GCFR transform process state will be set to 99 if it completed successfully.
Stream End
> This pattern is used to close an instantiated/open Stream within the current business date.
> Only one instance of any given Stream_Key can be active at any one time.
> If the process fails, the stored procedure will send a Return Code of 1 indicating the process has failed.
Check the GCFR Execution Log and GCFR Error Log tables for details relating to the process and instance causing the error.
Some common errors are as follows:
- A Process instance of this Stream is in an incomplete state
Check for Process Instances with an active Process State (i.e. not 99) for all instances of this Stream_Key
-Stream does not exist
-Stream has already ended
show procedure PRD_GCFR_P_API_CP.GCFR_CP_Stream_End
1. Check for Stream_Key in GCFR_Stream.
2. Get the count of active processes associated with the current stream from GCFR_Process_Id.
3. Update the GCFR_Stream_Id table and set Processing_Flag to indicate the stream instance is complete for the specified Stream_Key.
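If the Stream End pattern reports an active process instance, a sketch of how to look for it, assuming the Process_State column referenced in the error note above and the PRD_GCFR_V view database (how GCFR_Process_Id links back to a particular stream instance is not stated here, so no stream filter is shown):
select * from PRD_GCFR_V.GCFR_Process_Id
where Process_State <> 99;   -- 99 = completed successfully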
Business Date End
>This pattern is used to close the current Business Date for the given Stream.
>This pattern does NOT check for active stream instances as the batch schedule should ensure all processing is complete prior to closing the current business date.
>This pattern additionally sets the status of all source system extracts for the given Stream and Business Date to consumed.
>This allows automated management of consumed source system extracts (e.g. such as moving to an archive directory for archiving).
> If the process fails, the stored procedure will send a Return Code of 1 indicating the process has failed.
Some common errors are as follows:
-Stream does not exist
-Business Date for this Stream is already closed
show procedure PRD_GCFR_P_API_CP.GCFR_CP_StreamBusDate_End
1. Stream Validation
2. Validate active stream instance for the given stream_key in GCFR_Stream_Id
3. Update GCFR_Stream_BusDate to set the stream business date end
4. End stream business date for the given stream_key
5. Set all extracts for this Stream Instance to consumed
1.2 Detail concept
1.2.1 STREAM Concept
•A Stream is a collection of processes that must be completed as a whole to meaningfully transform a set of input data (files or tables) into a coherent output set of tables within the data warehouse. Each collection of dependent processes is identified by a record in this table.
•A Stream is a unit of schedulable work. Once completed a Stream may not be re-run without intervention to revert output data and CTLFW metadata to the pre-run state.
•There is no limit to the number of times a Stream may be run on any given date. Only one instance of any Stream is allowed to be run at a time.
A Stream is a collection of processes that must be completed as a whole to meaningfully transform a set of input data (files or tables) into a coherent output set of tables within the data warehouse. Each collection of dependent processes is identified with the same Stream_Key.
SNBC Implementation
* Stream_Key is a running number starting from 1
* For data loading processes into the Intermediate Layer
* Streams are grouped by source system and by cycle frequency, e.g. Stream 1 is the daily process to load data into the Intermediate Layer from every sub-system of CAP
Dependent process in a stream
1.2.1 GCFR System
All inbound and outbound systems of the data warehouse must be identified and allocated a unique identifier (Ctl_Id) and name (System_Name) according to the agreed naming standard.
All source and target systems are registered in GCFR_System. This is a static reference table. A logical system is any source or destination of data in the organization. The data warehouse itself is a system and may have many logical systems contained within the physical data warehouse environment.
Ctl_Id - system id of the source/target system as per the standard; unique in this domain and the PK of this table
SNBC Implementation
-- Ctl_Id is a running number starting from 1
-- System_Name is a short abbreviation of the inbound/outbound system name
-- Path_Name will be used only for load and export processes. SNBC reads data from RSDS so the GCFR load process is not implemented.
update_ts - timestamp when the record was created/updated
GCFR_System_File_Log - history of updates to this table is captured here. The log table has the same columns as the master table, but its PK also includes Update_Ts.
1.2.2 Cycle Frequency is Daily
This table defines each processing stream within the data warehouse. A Stream is a collection of processes that must be completed as a whole to meaningfully transform a set of input data (files or tables) into a coherent output set of tables within the data warehouse. Each collection of dependent processes is identified by a record in this table.
A Stream is a unit of schedulable work. Once completed a Stream may not be re-run without intervention to revert output data and CTLFW metadata to the pre-run state. There is no limit to the number of times a Stream may be run on any given date. Only one instance of any Stream is allowed to be run at a time.
GCFR_STREAM_BUSDATE_LOG
1.3 BKEY Process
The BKEY process is the GCFR standard process to generate a surrogate key from a source natural key value.
SNBC Implementation
* Every surrogate key in ADS Intermediate Layer model will be generated by BKEY process e.g. CUST_ID, ACCT_ID etc.
* Technical Implementation:
* Each key will have dedicated BKEY table.
* Creation of BKEY table will be handled as a part of the BKEY registration process using standard GCFR registration Macro.
BKEY Naming Standard and Example
GCFR provides a method for generating surrogate keys, called BKEY. It provides a robust, efficient and standard way to create a surrogate key for any possible combination of natural keys. It ensures that the same natural key is always associated with the same surrogate keys. GCFR keeps that natural-to-surrogate key relationships in BKEY tables, which allow us to generate surrogate keys and, later, join to the BKEY tables to obtain the natural keys.
BKEY generates a single INTEGER surrogate from a single value. So when the natural key is composed of several columns, we concatenate these columns into a single string value using pipe delimiters.
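For illustration, a sketch of how a multi-column natural key might be prepared as a single BKEY source key in an input view; the staging table and column names are hypothetical (assumed character columns), and only the pipe-delimiter convention comes from the note above:
-- hypothetical source columns Acct_Branch_Cd and Acct_Num concatenated into one BKEY source key value
select
    trim(coalesce(Acct_Branch_Cd, '')) || '|' || trim(coalesce(Acct_Num, '')) as Source_Key
from StagingDB.Stg_Account;   -- hypothetical staging table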
1.4 GCFR BMAP Process
BMAP is the GCFR standard process (and data model) for managing reference data. The BMAP process will generate EDW reference code for undefined reference code from source and store it in the reference-data tables.
SNBC Implementation
* GCFR BMAP is used to manage reference code and description in SNBC and mapping from source system's code to SNBC standard code.
* GCFR BMAP process is used to generate unknown reference codes for new reference data in SNBC.
Technical Implementation:
- Standard GCFR BMAP data model will be used as-is.
- All reference code sets will be stored in the same BMAP Standard Map table.
- Creation of BMAP tables is a part of the GCFR standard installation.
GCFR supports maintenance of reference data. GCFR expects to be provided with code lists (e.g. a list of countries) and their meanings. GCFR BMAP can maintain multiple lists for the same domain (this is usually used to allow two systems to provide data using their own sets of codes) and can provide output in different formats, structures or languages. In this sense, BMAP provides a many-to-many reference data service.
1.4 GCFR Transform Process
All ELT processes that are controlled by the job schedule for the data warehouse are registered in GCFR_Process. This includes any extracts from source but not those extracts that are managed under a source system job schedule. Also included are loads to the operational image, transformations to the core PDM, further transformations to the semantic layer and any extracts from the data warehouse regardless of the source tier of the data.
A process is a single code module; it performs a single specific piece of work and processes to a single data warehouse table or extract file. This is a static reference table.
The GCFR Transform Process is used to transform data within data warehouse. GCFR supports three main transform patterns:
1. Full Transform
2. Delta Transform
3. Transaction Transform
SNBC Implementation
* SNBC uses the GCFR full transform processing pattern to load data from RSDS to the Intermediate Layer (refer to Selection of GCFR Transform Pattern for IL Data Loading Processes).
* Technical Implementation:
- Transform input view: Developed manually from mapping specification.
- Transform Process: Standard GCFR process; two registrations are required for each process: registration of the process information and of the transform key columns for the process.
- Transform output view: a one-to-one view on the target table.
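A minimal sketch of such a one-to-one transform output view, reusing the hypothetical TargetDB/TargetTable placeholder names from section 1.7:
-- one-to-one output view over the target table (the view-over-table abstraction); all names are placeholders
replace view TargetDB_V.TargetTable as
select * from TargetDB.TargetTable;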
GCFR_CP_StreamBusDate_Start
Purpose:
Rolls over the previous, current and next business date for the Stream according to the Stream's cycle frequency. Sets the status of the Stream to 'ready for current date processing to commence'. This API first validates the input Stream_Key and checks that the Stream is NOT already started.
GCFR_CP_Stream_SpecBD_Start
Purpose:
This CP is used to start a special business date, i.e., any business date which lies in the past or future from the current date, in order to allow data processing for any custom business date. This API first validates the input Stream_Key and checks that the Stream is NOT already started.
GCFR_CP_Stream_Start
Purpose: Instantiate a Stream Instance for the input Stream_Key.
This API first validates the input Stream_Key, checks that the Stream is started and that there are no active Stream Instances. If the starting timestamp is provided as input, it checks that the timestamp is later than the timestamp of the most recent Stream Instance for the Stream_Key. Note: if the starting timestamp is not provided to the Start of Stream control pattern, then the first Process to execute within the Stream will provide the starting timestamp for the Stream.
GCFR_CP_Stream_End
Purpose: Completes the current runtime Stream Instance for the input Stream_Key. This API first validates the input Stream_Key, checks that the Stream is started and that there are no active Process Instances still running for the Stream Instance. It then sets the Stream Instance's Processing Flag to 1 (completed).
GCFR_CP_StreamBusDate_End
Purpose: This API first validates the input Stream_Key, checks that there are no Stream Instances active and checks that the Stream is actually running. It sets the Stream's Processing Flag to 1 (completed). It sets the Load Status of the Stream's files to 'Consumed' (i.e. 2).
GCFR_Process_Log
History of updates to this table are captured here. The _Log table has identical columns to the corresponding master table however the primary key includes the Update_Ts.
GCFR_Transform_KeyCol
The primary key columns of a target table are registered in GCFR_Transform_KeyCol. This is a static reference table.
GCFR_Stream_BusDate
This is a control table that associates a Stream with the current, previous and next business date when the Stream can be processed. It represents the processing state for a Stream. Every Stream has a current record in this table. It is used to ensure that any dependent processing from a previous day is completed before this day's processing commences.
GCFR_Stream_BusDate_Log
History of updates to this table are captured here. The _Log table has identical columns to the corresponding master table however the primary key also includes the Business_Date and the Processing_Flag
GCFR_Process_Type
Every process has a type. This table is a minor entity that provides descriptive information about the Process_Type. This is a static reference table. The addition of this minor entity allows new process types to be easily added. Client specific processes can be added with a specific client defined process type. It is recommended that client specific process types commence numbering from 100 leaving numbers less than 100 for non-client specific processes.
1.5 Processing Columns
>The following standard processing columns are added to every physical permanent table in the data warehouse to support history / time variance (tracking time and date).
>They are managed by the control framework standard code modules that perform the ELT processes.
>They can be used by ELT developers and end users alike for analysis, audit and reporting.
>Their intent is to provide full support for data lineage and detailed audit of the EDW data.
>All processes that move the data are recorded and the data can be traced back to source.
>The processing columns make all tables true time variant.
>Every state of permanent rows in the data warehouse is recorded. Rows are not updated or physically deleted (unless by the archive process).
>An update causes a row to be closed and a new instance of the row is opened to reflect current state.
>A delete causes a row simply to be closed with a logical status of deleted.
Full Impact analysis is very easy due to the meta data captured by the control framework.
Hold the audit information at the data warehouse table row level
* Permits full data and process lineage from row to source data set
* Allows verification of who did what and when, as rows can be linked back to the Control Framework database tables to verify data lineage
* Supports Multi Entity tagging at the "logical System" level and thus allows multi entity identification with NO PROGRAMMING
--The columns exist so that a process will never update target data when an update occurs
› Only ever closes a row and adds a new version
* The columns link back to the Control Framework data model to reveal details of the processing
* The processing columns are applied to EVERY target table in the data warehouse to ensure full data and process lineage
Process Control Data Model Detail
a. The data model contains two types of tables:
Data - > Static or runtime process control metadata
Audit- > Log or audit tables that record audit, execution and error messages
Log tables
* Every data table has an associated _Log table
Every time a row in a data table is updated the history of the update is recorded (via a trigger) in the associated _Log table
* The processing columns are managed by the Processing Patterns
* The intent of the processing columns is to provide full support for data lineage and detailed audit of the data warehouse
The processing columns result in a true time variant data warehouse
> The data component of a row is NEVER updated
>The existing row is closed and a new row is inserted
> Start and End date/time fields in the processing columns give a complete historical view of the record lifecycle
* The processing columns allow full impact analysis to be conducted
- For example:
—What source data set provided this data?
- When was this data loaded/transformed?
Processing columns are added to every physical table in the data warehouse:
-To support the true time variant data warehouse
-To provide audit capability
-To enable full data lineage from source data set to business use
-To enable full impact analysis
-Processing columns are maintained by the actions of the processing pattern that does work on the data set
Benefits of processing columns:
They are managed by the control framework standard code modules that perform the ELT processes.
* They can be used by ELT developers and end users alike for analysis, audit and reporting.
* Their intent is to provide full support for data lineage and detailed audit of the EDW data.
* All processes that move the data are recorded and the data can be traced back to source.
The processing columns make all tables true time variant. Every state of permanent rows in the data warehouse is recorded. Rows are not updated or physically deleted (unless by the archive process). An update causes a row to be closed and a new instance of the row is opened to reflect current state. A delete causes a row simply to be closed with a logical status of deleted.
* Full Impact analysis is very easy due to the meta data captured by the control framework.
Start Date
>The business date this row was written into the data warehouse table.
>Should not be confused with the status of the record. Many records will have a logical start date, open date or similar. For example, an account may have an account open date. The account open date would be stored as a column in the physical data model and would NOT be used to populate the processing column Start Date.
End Date
>The business date this row ceased to be the current record of the data.
>This is due to either a change in the value of a column in the row (an update) or that this row is no longer present at source (a delete).
>This is the only time a row is updated in situ. No columns are changed in the row apart from the End_Date.
>End Date equal to high-date (i.e. 9999-12-31) indicates this is the current row.
>Only one row for any given table primary key set can be the current row.
>Update Process_Name and Update_Process_Id must be populated when End_Date is populated.
Record_Deleted_Flag
Default value: 0 = not deleted
Other values:
0 = not deleted
1 = deleted
>No rows are physical deleted from data warehouse table, unless it is as part of the archiving process.
>A row is shown as being deleted at source by either being absent from the source feed or by a source record deleted flag being set.
>The current row is closed by setting the End_Date. A new row is written with a new Start Date and this flag set to 1; thus the current row is marked deleted.
>A row can move in and out of deleted status.
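Taken together, these columns make it easy to select the current, not-deleted state of any table. A sketch against a hypothetical target table, using the high-date and flag values described above:
select *
from TargetDB.TargetTable          -- hypothetical target table
where End_Date = DATE '9999-12-31' -- current row (high-date)
  and Record_Deleted_Flag = 0;     -- exclude rows logically deleted at source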
CTL_ID
>The process that creates the row in the table will set this from the Control Framework data.
>This is a horizontal domain in a table. It means that row ownership cannot be shared across source systems.
Process Name
The name of the process that created this row.
Process_Id
The identifier of the instance of the process that created the row.
This can be used to find the process within the Control Tables,
Update Process_Name
The name of the process that closed this row (i.e. the process that populated the End Date).
NULL if the row has not been updated
Update Process_Id
The identifier of the instance of the process the closed this row.
NULL if the row has not been updated
Start Ts
An Intraday data feed must provide a timestamp either at the record level or the file level. This becomes the Business_Date_Cycle_Start_Ts stored during the registration process for the source data feed. The Business_Date_Cycle_Start_Ts is used to set the Start Ts of a row being opened.
If an Intraday data feed cannot provide a timestamp then it cannot be intraday.
End Ts
The End Ts of any row being closed is the Start Ts of the new row less one smallest unit of time defined. It is calculated using the Business_Date_Cycle_Start_Ts of the new row.
1.6 Stream Identification
A Stream is a collection of processes that must be completed as a whole to meaningfully transform a set of input data (files or tables) into a coherent output set of tables within the data warehouse. Each collection of dependent processes is identified with the same Stream_Key.
Stream Cycle Frequency Codes -
values and functionality
0 - Intra Day
1 - Daily
7 - Weekly
14- Fortnightly
* Once a record has been populated, there is always a current record whose End Date is '9999-12-31'.
* If the data is missing from the source table, the delete flag is set to 1 for that period.
* The start date and end date define the validity period of the row content.
* This mode compares the incoming source data with the target data to find the changed rows. We need to provide the whole source data set, and the GCFR process scans all of it and all of the target table.
EXAMPLE
(A missing row is NOT marked as deleted - a row is deleted only if the source marked the delete flag)
The same as FULL transformation.
* The difference from FULL transformation is that you need a column called <GCFR_DELTA_ACTION_CODE>, which contains 0 (New) / 1 (Updated) / 2 (Deleted) for each row.
In this mode we need <GCFR_DELTA_ACTION_CODE>; normally it comes from our data source, which provides only the changed data. Therefore it saves a lot of time and resources. If we want to use this mode, please have an interface agreement with the source data provider.
Unlike FULL/DELTA transformation, transaction transformation does not capture attribute changes and merge them into the target.
* Transaction transformation only inserts/appends to the target table when new key column values are detected in the source table.
The following describes how GCFR FULL/DELTA/TRANSACTION transformations work and how to select a transformation mode.
--> transactional -
Unlike FULL/DELTA transformation, transaction transformation does not capture attribute changes and merge them into the target. It only inserts/appends to the target table when new key column values are detected in the source table.
-->full -
Once a record has been populated, there is always a current record whose End Date is '9999-12-31'.
If the data is missing from the source table, the delete flag is set to 1 for that period.
The start date and end date define the validity period of the row content.
This mode compares the incoming source data with the target data to find the changed rows. We need to provide the whole source data set, and the GCFR process scans all of it and all of the target table.
-->delta-
The same as FULL transformation.
The difference from FULL transformation is that you need a column called <GCFR_DELTA_ACTION_CODE>, which contains 0 (New) / 1 (Updated) / 2 (Deleted) for each row.
In this mode we need <GCFR_DELTA_ACTION_CODE>; normally it comes from our data source, which provides only the changed data. Therefore it saves a lot of time and resources. If we want to use this mode, please have an interface agreement with the source data provider.
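A sketch of what a delta transform input view could look like, based only on the requirement above that the view exposes a GCFR_DELTA_ACTION_CODE column; every object and column name here is hypothetical (the TX02/PK_COL placeholders reuse section 1.7):
-- hypothetical delta input view; the source CDC feed is assumed to carry a change indicator
replace view TransformDB_V.TX02_Transform_01 as
select
    src.PK_COL,
    src.ATTR1,
    case src.Change_Ind        -- hypothetical source change indicator
        when 'I' then 0        -- new
        when 'U' then 1        -- updated
        when 'D' then 2        -- deleted
    end as GCFR_Delta_Action_Code
from StagingDB.Stg_Source_Delta src;   -- hypothetical staging table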
1.7 How GCFR make this happen:
Suppose we will populate data into TargetTable in container TargetDB, using process TX02_Transform_01; the transformation view is also called TX02_Transform_01. The key column defined in GCFR is PK_COL and the target table contains attributes ATTRn.
There are three stages for GCFR to populate data into the target table.
1. Populate into IMG table.
2. Populate into INS table.
3. Populate into Target Table.
IMG Stage:
1. GCFR will CREATE the IMG table if it does not exist; or, if TargetTable has been modified, GCFR will DROP/CREATE the IMG table.
2. GCFR will populate results of Transformation View into IMG table.
3. GCFR will populate value of control columns on IMG table.
Full Transformation:
Note: Verification_Flag does not apply to FULL transformation.
Delta Transformation:
Note: GCFR will set <Record_Deleted_Flag> to 1 when GCFR_Delta_Action_Code is 2.
Insert/Update rows are treated the same as action code 0. This step will be skipped if the process's Verification_Flag is set to 0.
Transaction Transformation:
Note: This step will be skipped if the process's Verification_Flag is set to 0.
INS Stage:
1. Prepare the data that needs to be populated into TargetTable.
2. Transaction transformation only detects the data rows whose PK_COL values are new.
3. Delta transformation only expires data rows whose GCFR_Delta_Action_Code is 2.
Note: There is a difference between FULL and DELTA modes in how attribute changes are detected: FULL mode considers NULL values, DELTA does not. It still needs to be proven whether DELTA mode handles NULL values properly.
Transaction Transformation:
Note: If the process's Verification_Flag is set to 0, GCFR will populate the result of the Transformation View into the INS table without checking the existence of the key column in the target table.
Merge Stage:
1. Populate data rows into TargetTable.
2. For FULL/DELTA transformations, expire the original data rows that have been updated/deleted by setting End_Date to business date - 1.
Full Transformation: Merge to Target(FULL)
Delta Transformation: Merge to Target(Delta)
Transaction Transformation:
Note: The truncate only happens when Truncate_Target is set to 1, and the Truncate_Target flag only works for Transaction Transformation.
1.8 How a stream run flows
StreamBusDate Start --> Stream Start --> Process 1 --> Process 2 --> Process 3 --> Stream End --> StreamBusDate End
GCFR calls the transform view and populates the target table.
A Stream is a parent box holding many processes (tables to load).
The processing patterns interact in a set sequence. It is possible to develop each group in isolation due to the abstraction achieved through the use of views over tables. The interaction sequence is as follows:
1. Acquisition Layer (Tier 1) Load (Load of source data to staging)
2. BKEY and BMAP
3. Transformations (Acquisition Layer Staging to Acquisition Layer Source Image or Integrated Data Layer (Tier 2), Tier 2 to Tier 2 or Access Layer (Tier 3))
4. Export
The process of development for each processing pattern is the same.
Step 1: INPUT - Create tables and/or views to present the input data set to the process. The input data set is presented to the ELT process that will create/update the target data set.
Step 2: PROCESS and OUTPUT
Step 3: PROCESS and OUTPUT
Identify Job Parameters for the GCFR Perl Script
Build and test the job by calling GCFR Perl Job Ids
Upon completion of development, an implementation package is built for deployment and incorporation in the ELT Batch Schedule.
1.9 Error handling
e.g. the previous stream was not closed
select * from PRD_GCFR_V.GCFR_Stream_BusDate_Log where stream_key = 5 and business_date >= DATE '2021-08-30'
select * from PRD_GCFR_V.GCFR_Stream_BusDate where stream_key = 5
select * from PRD_GCFR_V.GCFR_Stream_Id_Log where stream_key = 5 and business_date >= DATE '2021-08-30'
We can manually delete these rows (row 21) to reset the log back to what it was in the past
delete PRD_GCFR_V.GCFR_Stream_Id_Log where Business_Date = DATE '2021-08-30' and stream_id = 2190
select * from PRD_GCFR_V.GCFR_Stream_Id where stream_key = 5
update PRD_GCFR_V.GCFR_Stream_Id
set stream_id = 2189
, business_date = DATE '2021-09-06'
, next_business_date = DATE '2021-09-07'
, prev_business_date = DATE '2021-09-05'
, business_date_cycle_start_ts = TIMESTAMP '2021-09-06 00:00:00.000000'
, processing_flag = 1
where stream_key = 5
2.0 BKEY
We pass the concatenated source keys to the BKEY process, and from the BKEY table we get the EDW key as the surrogate key.
Conceptual Overview
A surrogate key is any unique identifier required to be generated for the sake of representing some base object (e.g. a Customer, an Event etc). Typically in a Logical Data Model (LDM) the primary key of many of the base entities is a generated Surrogate Key (e.g. Party_Id). The intention is to create a binary relationship (hence the term BKEY) between one logical data key (one or many input columns) from source and one single identifying column (the surrogate key) of data in a physical table in the EDW.
The requirement for surrogate keys is normally identified during the modelling, mapping and data profiling processes of the data warehouse development.
Why are Surrogate Keys needed?
2.0.1 Domain Crossover
Surrogate keys are needed to solve the problem of domain crossover from disparate sources. Simply stated, this means two source systems may use the same physical value for identification but identify different objects. For example
An organisation may have two separate customer systems. Each system has a customer number.
Both systems start the customer number from 1 traversing to n.
The customer identified by customer number value 1 in the first system is not the same as the customer identified by customer number value 1 in the second system.
2.1.1 No Single Source Key
Surrogate keys may be needed because no single key exists and the collection of columns that makes the item unique is too large for ease of use.
For example -
Customers may have names in many parts (e.g. title, first name, last name, known-as name), date of birth and location of birth recorded in the source system.
None of these alone is enough to guarantee unique identification. The combination that gives unique identification is difficult to manage consistently across programs and functionality in the data warehouse (concatenation, purging of spaces, and management of unexpected characters).
Abstraction
An advantage of surrogate key use is abstraction. Surrogate keys shield data warehouse tables from the impact of changes to the source key fields in the source system. Using the surrogate key architecture, the impact is limited to the surrogate key table and the key generation process.
BENEFIT:
>Stability
Surrogate keys are independent of the actual data values in the table, so they are not subject to changes in business rules over time. For example, if a department name changes, using a surrogate key avoids having to update all foreign keys referencing that department.
>Uniqueness
Surrogate keys are guaranteed to be unique for each row, enforcing data integrity. They provide a simple, system-generated, business-agnostic column to uniquely identify each row.
>Flexibility
Surrogate keys allow rows to be uniquely identified even if the natural key changes. This is important when tracking historical changes, such as in slowly changing dimensions in data warehouses.
>Performance
Surrogate keys, usually integers, are smaller and processed faster than natural keys, especially if the natural key is a composite of multiple columns. Joins on integer keys are faster than joins on date or character keys.
>Handling Special Values
Specific integer surrogate key values can be used to represent special cases like "not applicable" or "date to be determined", which may not be possible with the natural key.
>Simplicity
Using a single surrogate key column is simpler than using a composite natural key made up of multiple columns. This simplifies queries, reduces storage, and makes the data model easier to understand.
BKEY Data Model
The BKEY Data Model used in the Control Framework holds the mapping between the logical data key from source and the surrogate key generated by the framework module.
The BKEY Data Model is intended to be implemented 'as-is' when using the Control Framework. However, it is anticipated that each customer who implements the Control Framework may have further needs and requirements that are not met by the standard Control Framework. This data model can be used as a basis for customization and extension to the Control Framework. Any customizations to the data model must be reflected in the physical data model, code modules, scripts and macros.
The model consists of four main tables:
BKEY_<Standard_Key> - Mapping of source key to data warehouse surrogate key
BKEY_<Standard_Key>_Next_Id - Next available EDW key for a key set
BKEY_Key_Set - Metadata describing each instance of the BKEY
BKEY_Domain - Metadata describing each instance of the identified domains within each Key Set
BKEY_Key_Set
All surrogate key sets are registered in BKEY_Key_Set. This is a static reference table.
BKEY_Key_Set_Log
History of updates to this table are captured here. The Log table has identical columns to the corresponding master table however the primary key includes the Update_Ts.
BKEY_Domain
A Key Set may contain zero or many domains. This is a static lookup table and provides a description for every identified Domain and links that Domain to the relevant Key Set.
BKEY_Domain_Log
History of updates to this table are captured here.
BKEY_<Standard_Key>
Specific instances of this table will hold all the surrogate keys generated for a specific key set. This table links the surrogate key (EDW_Key) to the actual source data field(s) (Source_Key). The registration process for the key set will create the specific instance of the BKEY_<Standard_Key> table. '<Standard_Key>' is replaced with the name identified in the key set table.
This table contains the mapping between the source key value(s) and the data warehouse surrogate key value. One of these tables should exist for every Key_Set_Id. A suggested naming convention for this construct is to name each BKEY_<Standard_Key> table as follows: BKEYnnnnn_xxxxx, where nnnnn = Key_Set_Id and xxxxx = code name assigned as per the agreed Naming Standard, e.g. BKEY0001_Party_Id.
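A sketch of looking up the surrogate key for a given natural key in such a table; the database name (modelled on the BMAP reference in the next section), the key set instance and the concatenated key value are assumptions, while EDW_Key and Source_Key come from the description above:
-- look up the EDW surrogate key for a pipe-delimited natural key (all values hypothetical)
select EDW_Key
from prd_gcfr_v_utlfw.BKEY0001_Party_Id
where Source_Key = '001|12345';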
2.1 BMAP
sel * from prd_gcfr_v_utlfw.bmap_standard_map
There is only one BMAP standard map table, shared by many code sets.
Conceptual Overview
The reference data code management component of the Control Framework (BMAP) consists of the following.
A data model that represents the metadata required to manage the source reference data code sets, changes to them over time, and the binary relationship between the source code and the code assigned by the data warehouse.
A special transformation module that inspects incoming source ELT data streams for missing reference data and builds temporary references to allow the ELT to proceed without compromising referential integrity.
A series of special load modules that initially load and update the static data in the BMAP data model.
2.1.1 Why is Reference Data Code Management needed?
Reference data flows to the data warehouse from a variety of sources, most of which are usually independent of each other.
Formal source data feed of reference data codes, values and descriptions that may or may not be part of a wider Enterprise Data Management and Governance process.
Adhoc source data feed of reference data codes, values and descriptions.
Manual data notification from an individual or system.
Informal notification where the reference data code is included in a row of a source data feed.
As part of a technology update where reference information is part of a tool or 3rd party product.
This flow of reference data presents a variety of challenges when developing a data warehouse.
The same item of reference data can have different representations in the different source systems.
The source systems may also use different values for the same domain or have divergent domains with the similar and different values.
The reference data item does not remain static in the source system and can be re-used in the source system and given a different meaning.
Reference data updates are not available to the data warehouse in a timely fashion.
The user's view of that item of reference data may require multiple representations in the semantic layer.
Referential integrity within the data warehouse may be compromised if the source reference data code has not been loaded.
gcfr t_utlfw.bmap_standard_map
2.2 Statements
A. Start of Business Date
This pattern is used to set the current, previous and next business dates for the relevant stream. Processing for the previous business date must be completed otherwise this process will fail. All streams must be started via the Start of Day process. ELT processes cannot proceed until the stream has been started.
Increments the Business Date for the Stream to the next date ready for processing.
proc - GCFR_CP_StreamBusDate_Start
B. Start of Stream
This pattern is used to instantiate a Stream within the current business date. Only one instance of any given Stream_Key can be active at any one time. All streams must be started first via the Start of Day process. ELT processes cannot proceed until the stream has been started and instantiated.
PROC - GCFR_CP_Stream_Start
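A matching hedged example (the empty second argument follows the call list later in these notes):
CALL DEV6_GCFR_P_API_CP.GCFR_CP_Stream_Start(243, '', 6, oRC, oRM);
-- check that a stream instance is now active
SEL * FROM DEV6_GCFR_V.GCFR_Stream_Id WHERE Stream_Key = 243;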
C. BKEY (Surrogate Key Transform)
Once source data has been loaded to a staging table it is possible to run the BKEY (surrogate key management) transforms that relate to that source data.
The Process Type allocated for this pattern is 21.
proc - GCFR_PP_BKEY
D. BMAP (Reference Code Value Transform)
Once source data has been loaded to a staging table it is possible to run the BMAP (reference code value) transforms that relate to the source data. This pattern detects unexpected source reference code values and assigns the unexpected value an interim data warehouse code value.
The Process Type allocated for this pattern is 22.
proc - GCFR_PP_BMAP
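Hedged examples of running both patterns for already-registered processes (the process names are placeholders, not real registrations):
CALL DEV6_GCFR_P_API_PP.GCFR_PP_BKEY('<BKEY_Process_Name>', 6, oRC, oRM);   -- surrogate key transform, process type 21
CALL DEV6_GCFR_P_API_PP.GCFR_PP_BMAP('<BMAP_Process_Name>', 6, oRC, oRM);   -- reference code transform, process type 22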
E. Target Table Transformation
Target table transformations apply the source data, BKEY surrogate keys and BMAP EDW codes to the target tables. A target table transformation may be a full transformation, a delta transformation or a transaction transformation. The transformation type is dependent upon the source data supplied. Transformations may take the form of:
- Acquisition layer Staging table to Acquisition layer Source Image table
- Acquisition layer Staging or Source Image table to Integrated Data layer Target table
- Integrated Data layer Target table to Integrated Data layer Summary/Common/Aggregation table
- Integrated Data layer Target/Summary/Common/Aggregation table to Access layer physical data mart table
E.1
Development Option 1 (with no associated processing pattern):
- Preparation of the data set is considered a separate process
- The preparation process must finish successfully before the GCFR Transform process starts
Development Option 2:
- Preparation of the data set is considered part of the process
- The preparation process starts after the GCFR process initiate step has finished
E.2
GCFR Transformation Processing Pattern
The following are key points to consider for the Transform pattern (illustrated in the sketch after this list).
- Input data is normally from the staging tables
- A staging table can contain either:
  - A full dump of current state source data
  - Changes to current state source data (delta)
  - Transactions (events)
- The transformation process pattern compares the incoming source data set specified in the transformation view with the target data set
- Target rows that are to be updated are end date/time stamped
  - Valid only for full master and delta transforms
- Target rows that are no longer present in the source data are flagged as deleted
  - Valid only for full master transforms or flagged deltas
- New state rows are added as open rows
- A transform process can only logically delete (i.e. end date and close) rows belonging to the same process
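A hedged sketch of the three transform calls this pattern resolves to (pick the one matching how the process was registered; the process name and environment prefix are placeholders):
CALL DEV6_GCFR_P_API_PP.GCFR_PP_Tfm_Full('TX02_<Target>_01', 6, oRC, oRM);   -- full master source
CALL DEV6_GCFR_P_API_PP.GCFR_PP_Tfm_Delta('TX02_<Target>_01', 6, oRC, oRM);  -- delta source
CALL DEV6_GCFR_P_API_PP.GCFR_PP_Tfm_Txn('TX02_<Target>_01', 6, oRC, oRM);    -- transaction (event) source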


F. End of Stream
Closes the open stream instance for the current business date; only one instance of a stream can be active at a time.
proc - GCFR_CP_Stream_End

G. End of Business Date
This pattern is used to close the current Business Date for the given Stream. It does NOT check for active stream instances, as the batch schedule should ensure all processing is complete before the current business date is closed. It additionally sets the status of all source system extracts for the given Stream and Business Date to consumed, which allows automated management of consumed source system extracts (e.g. moving them to an archive directory for archiving).
proc - GCFR_CP_StreamBusDate_End
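A hedged example of the closing sequence for the same stream used in the earlier examples:
CALL DEV6_GCFR_P_API_CP.GCFR_CP_Stream_End(243, 6, oRC, oRM);           -- close the active stream instance
CALL DEV6_GCFR_P_API_CP.GCFR_CP_StreamBusDate_End(243, 6, oRC, oRM);    -- close the current business date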
Registration:
1. Register stream
-- initialise the business date to 26/09/2014
CALL dev6_GCFR_P_API_UT.GCFR_UT_Register_Stream(243, 1, 'T02_BRKR_TPMI_01', CAST('26/09/2014' AS DATE FORMAT 'DD/MM/YYYY'));
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream_busdate WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream_id WHERE stream_key=999;
2. Register process
EXEC dev6_GCFR_M.GCFR_Register_Process('TX02_H_BRKR_01', 'Transform process to load TPMI in H_BRKR in IL layer', 25, 15, 243, 'dev6_GCFR_INP_V', 'TX02_H_BRKR_01', 'dev6_ADS_IL_VM', 'VM_H_BRKR', 'dev6_ADS_IL_T', 'H_BRKR', 'dev6_GCFR_T_TMP',,,,1,0,1,0);
EXEC dev6_GCFR_M.GCFR_Register_Process('TX02_S_BRKR_TPMIDETAILSD_CAP_01', 'Transform process to load TPMI Satellite table in IL layer', 23, 15, 243, 'dev6_GCFR_INP_V', 'TX02_S_BRKR_TPMIDETAILSD_CAP_01', 'dev6_ADS_IL_VM', 'VM_S_BRKR_TPMIDETAILSD_CAP', 'dev6_ADS_IL_T', 'S_BRKR_TPMIDETAILSD_CAP', 'dev6_GCFR_T_TMP',,,,1,0,0,0);
SEL * FROM cont_GCFR_V.GCFR_Process_Id WHERE process_name LIKE '%TX01_CFR_PCC_ANZ_ACCT_OPT%';
SEL * FROM cont_GCFR_V.GCFR_Process WHERE stream_key=999;
3. Register key column(s)
EXEC dev6_GCFR_M.GCFR_Register_Tfm_KeyCol('cont_ADS_IL_VM', 'VM_H_BRKR', 'BRKR_ID');
EXEC dev6_GCFR_M.GCFR_Register_Tfm_KeyCol('cont_ADS_IL_VM', 'VM_S_BRKR_TPMIDETAILSD_CAP', 'BRKR_ID');
SEL * FROM DEV6_GCFR_V.GCFR_Stream_BusDate WHERE STREAM_KEY = nm;
SEL * FROM prd_racoe_gcfr_v.gcfr_transform_keycol WHERE out_object_name LIKE '%nm%';
-- GCFR work tables created in the temp database during a transform, e.g. PRD_GCFR_T_TMP.TX02_ram_01_IMG / _INS
Mandatory steps to execute:
CALL (ENV)_GCFR_P_API_CP.GCFR_CP_StreamBusDate_Start(<Stream_Key>,6,RC,RM);
# mandatory first step: starts a new business date
CALL (ENV)_GCFR_P_API_CP.GCFR_CP_Stream_Start(<Stream_Key>,'',6,RC,RM);
# mandatory second step: starts the stream
Based on your requirement, select the specific step(s) to execute your logic:
CALL (ENV)_GCFR_EXT_P.spCheckStreamProcessDependency('<Dependency_Name>',RM);
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_BMAP('<Process_Name>',6,RC,RM);
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_BKEY('<Process_Name>',6,RC,RM);
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_Tfm_Txn('<Process_Name>',6,RC,RM);
# if you opt for a Transaction transformation
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_Tfm_Full('<Process_Name>',6,RC,RM);
# if you opt for a Full transformation
CALL (ENV)_GCFR_P_API_PP.GCFR_PP_Tfm_Delta('<Process_Name>',6,RC,RM);
# if you opt for a Delta transformation while registering
Mandatory steps to execute once all processes complete, in order to close the active stream and business date:
CALL (ENV)_GCFR_P_API_CP.GCFR_CP_Stream_End(<Stream_Key>,6,RC,RM); # mandatory step to close the stream
CALL (ENV)_GCFR_P_API_CP.GCFR_CP_StreamBusDate_End(<Stream_Key>,6,RC,RM); # mandatory step to close the business date
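After the closing steps it can be useful to check the execution log view listed under Frequently Used Tables below; a minimal hedged check (the Process_Name filter column is an assumption about that view's layout):
SEL * FROM (ENV)_GCFR_V.GCFR_Execution_Log WHERE Process_Name = '<Process_Name>';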
TABLES :
SELECT * FROM PRD_GCFR_V.GCFR_Stream ORDER BY Stream_Key;
Frequently Used Tables & Procedures
System: (ENV)_GCFR_V.GCFR_System
Stream:
(ENV)_GCFR_V.GCFR_Stream
(ENV)_GCFR_V.GCFR_Stream_BusDate -- only reset this table if you want to reset the business date; dates in GCFR_Stream_Id and GCFR_Process_Id are automatically copied from this table
(ENV)_GCFR_V.GCFR_Stream_Id
Transform Process:
(ENV)_GCFR_V.GCFR_Process
(ENV)_GCFR_V.GCFR_Process_Id
Key Column(s):
(ENV)_GCFR_V.GCFR_Transform_KeyCol
Execution Log:
(ENV)_GCFR_V.GCFR_Execution_Log
SEL * FROM DEV6_GCFR_V.GCFR_Stream_BusDate WHERE stream_key = 244;
SEL * FROM DEV6_GCFR_V.GCFR_Stream_Id WHERE stream_key = 244;
SEL * FROM DEV6_GCFR_V.GCFR_Process_Id WHERE stream_key = 244;
UPDATE DEV6_GCFR_V.GCFR_Stream_BusDate
SET NEXT_BUSINESS_DATE = '2014-09-29', BUSINESS_DATE = '2014-09-28', PREV_BUSINESS_DATE = '2014-09-27', PROCESSING_FLAG = 1
WHERE STREAM_KEY = 244;
UPDATE DEV6_GCFR_V.GCFR_Stream_Id
SET NEXT_BUSINESS_DATE = '2014-09-29', BUSINESS_DATE = '2014-09-28', PREV_BUSINESS_DATE = '2014-09-27', PROCESSING_FLAG = 1
WHERE STREAM_KEY = 244;
Stream
SEL * FROM PRD_GCFR_V.GCFR_Stream WHERE STREAM_KEY = 305;
SEL * FROM PRD_GCFR_V.GCFR_Process WHERE STREAM_KEY = 305;
SEL * FROM PRD_GCFR_V.GCFR_Process WHERE PROCESS_NAME = 'TX03_RTL_FCT_QLTY_SALE_01';
SEL * FROM PRD_GCFR_V.GCFR_Transform WHERE PROCESS_NAME = 'TX01_MNL_AU_RTL_HILM_INDVDL_PRD_TGT_01';
SEL * FROM PRD_GCFR_V.GCFR_Stream_BusDate WHERE STREAM_KEY = 1214
SEL * FROM PRD_GCFR_V.GCFR_Stream_BusDate_Log a WHERE STREAM_KEY = 1214
To see when the last load happened for a process (the process can be TX01, BX02, BX03, etc.):
SEL * FROM PRD_GCFR_V.GCFR_Transform WHERE PROCESS_NAME = 'TX01_DRM_AU_BRNCH_ACTV_01' ORDER BY Update_TS DESC;
SELECT * FROM PRD_CTLFW_V_GCFR.GCFR_Process WHERE Target_TableName LIKE '%ONLYTABLENAME%';
select * from PRD_CTLFW_V_GCFR.GCFR_Stream_busdate_log
select * from PRD_CTLFW_V_GCFR.GCFR_Stream_id_log
SELECT * FROM DEV6_GCFR_V.GCFR_Stream_BusDate;
--call statement
CALL DEV6_GCFR_P_API_CP.GCFR_CP_StreamBusDate_Start(snum,6,out1,out2);
CALL DEV6_GCFR_P_API_CP.GCFR_CP_Stream_Start(snum,'',6,out1,out2);
CALL DEV6_GCFR_P_API_PP.GCFR_PP_Tfm_Full('TX02_nm_03',6,out1,out2);
CALL DEV6_GCFR_P_API_CP.GCFR_CP_Stream_End(snum,6,out1,out2);
CALL DEV6_GCFR_P_API_CP.GCFR_CP_StreamBusDate_End(snum,6,out1,out2);
UPDATE PRD_CTLFW_V_GCFR.GCFR_Stream_BusDate SET NEXT_BUSINESS_DATE = '2021-09-18', BUSINESS_DATE = '2021-09-17', PREV_BUSINESS_DATE = '2021-09-16', PROCESSING_FLAG = 1 WHERE STREAM_KEY = num;
UPDATE PRD_CTLFW_V_GCFR.GCFR_Stream_Id SET NEXT_BUSINESS_DATE = '2021-09-18', BUSINESS_DATE = '2021-09-17', PREV_BUSINESS_DATE = '2021-09-16', BUSINESS_DATE_CYCLE_START_TS = '2021-09-17 00:00:00.000000', PROCESSING_FLAG = 1 WHERE Stream_Key = num;
UPDATE PRD_CTLFW_V_GCFR.GCFR_Process_Id SET business_date = '2021-09-17' WHERE Stream_Key IN (num);
UPDATE ROLLBACK:
When an update happens, the previous record is end-dated with business date - 1.
1. DELETE FROM PRD_RACOE_ASS_ADS_URSDS.CFR_PCC_ANZ_ACCT_OPT
WHERE start_date > DATE '2024-02-10';
/* Update as if the last run was the 10th.
The 11th was a Sunday, so there was no run that day; the 12th and later dates ran.
So when the 12th ran (the job is daily), it end-dated the previous records with end date 11.
SELECT DISTINCT end_date FROM PRD_RACOE_ASS_ADS_URSDS.CFR_PCC_ANZ_ACCT_OPT
WHERE start_date = '2024-02-10';  -- has 10, 11, 12, ...
SELECT DISTINCT end_date FROM PRD_RACOE_ASS_ADS_URSDS.CFR_PCC_ANZ_ACCT_OPT
WHERE start_date = '2024-02-10';  -- after updating end_date > 10 we have 2 dates: 9999 and 10th Feb
When the job runs for the 10th start date correctly, it will end-date the previous records as the 9th,
so update end_date to 9999 for all rows where end_date >= 11.
*/
2. UPDATE PRD_RACOE_ASS_ADS_URSDS.CFR_PCC_ANZ_ACCT_OPT
SET end_date = DATE '9999-12-31', UPDATE_PROCESS_NAME = NULL, UPDATE_PROCESS_ID = NULL
WHERE end_date >= DATE '2024-02-10';
-- whatever start date was deleted in step 1 (here the 12th), that date minus 1 should be the end_date used in the WHERE clause
3. Instead of doing all three updates below, you can re-register the stream for the running date.
UPDATE PRD_RACOE_ASS_GCFR_V.GCFR_Stream_BusDate
SET NEXT_BUSINESS_DATE = '2024-02-12', BUSINESS_DATE = '2024-02-11', PREV_BUSINESS_DATE = '2024-02-10', PROCESSING_FLAG = 1
WHERE STREAM_KEY = 999;
UPDATE PRD_RACOE_ASS_GCFR_V.GCFR_Stream_Id
SET NEXT_BUSINESS_DATE = '2024-02-12', BUSINESS_DATE = '2024-02-11', PREV_BUSINESS_DATE = '2024-02-10', BUSINESS_DATE_CYCLE_START_TS = '2024-02-11 00:00:00.000000', PROCESSING_FLAG = 1
WHERE Stream_Key = 999;
UPDATE PRD_RACOE_ASS_GCFR_V.GCFR_Process_Id
SET business_date = '2024-02-11'
WHERE Stream_Key IN (999);
OR
Re-register the stream:
CALL prd_racoe_ass_GCFR_P_API_UT.GCFR_UT_Register_Stream(999, 1, 'CFR_PCC_ANZ_ACCT_OPT', CAST('12/02/2024' AS DATE FORMAT 'DD/MM/YYYY'));
-- this is enough, but check:
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream_busdate WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_stream_id WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.GCFR_Process_Id WHERE stream_key=999;
EXEC prd_racoe_ass_GCFR_M.GCFR_Register_Process('TX01_CFR_PCC_ANZ_ACCT_OPT', 'CFR_PCC_ANZ_ACCT_OPT in ursds layer', 23, 32, 999, 'prd_racoe_ass_GCFR_INP_V', 'TX01_CFR_PCC_ANZ_ACCT_OPT', 'PRD_RACOE_ASS_ADS_URSDS', 'CFR_PCC_ANZ_ACCT_OPT', 'PRD_RACOE_ASS_ADS_URSDS', 'CFR_PCC_ANZ_ACCT_OPT', 'prd_racoe_ass_GCFR_T_TMP',,,,1,0,1,0);
-- check the GCFR_Process view to confirm the target table is the test table (not the real OPT) and the transform logic; then call the process for the 12th and check the counts for the 12th
SEL * FROM PRD_RACOE_GCFR_V.GCFR_Process_Id WHERE process_name LIKE '%TX01_CFR_PCC_ANZ_ACCT_OPT%'
SEL * FROM PRD_RACOE_ass_GCFR_V.GCFR_Process WHERE stream_key=999;
SEL * FROM prd_racoe_ass_gcfr_v.gcfr_transform_keycol
WHERE out_object_name LIKE '%CFR_PCC_ANZ_ACCT_OPT%';
--tx
=============================
2.2 Working NZ
2.3 Registration Macros
--> Process_Name, Process_Type, Control_Id, Stream_Key, In_DB_Name, In_Object_Name, Out_DB_Name, Out_Object_Name, Target_TableDatabaseName, Target_TableName, Temp_DatabaseName, Key_Set_Id, Domain_Id, Collect_S
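A hedged template showing the parameter order above in a single call (all values are placeholders; the description parameter, the empty positional parameters and the trailing flags follow the earlier registration examples in these notes and are assumptions, not confirmed defaults):
EXEC (ENV)_GCFR_M.GCFR_Register_Process(
  'TX02_<Target>_01',                        -- Process_Name
  '<description of the transform>',          -- description, as in the earlier examples
  23, 15, 243,                               -- Process_Type, Control_Id (Ctl_Id), Stream_Key
  '(ENV)_GCFR_INP_V', 'TX02_<Target>_01',    -- In_DB_Name, In_Object_Name (transform view)
  '(ENV)_ADS_IL_VM', 'VM_<Target>',          -- Out_DB_Name, Out_Object_Name
  '(ENV)_ADS_IL_T', '<Target>',              -- Target_TableDatabaseName, Target_TableName
  '(ENV)_GCFR_T_TMP'                         -- Temp_DatabaseName
  ,,,,1,0,1,0);                              -- Key_Set_Id, Domain_Id, etc. left empty, plus trailing flags as in the earlier examples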