Copyright © 2020, Oracle and/or its affiliates. All rights reserved. The Universal Permissive License (UPL), Version 1.0
The Data Warehouse and ETL Offload Code Samples support data warehousing and ETL offload solution patterns on Oracle Cloud.
- Prerequisites
- Sample Source Files
- Loading Autonomous Data Warehouse
- Oracle Big Data Cloud Apache Spark Notebook
You should be familiar with provisioning and using the recommended services and technologies:
- Oracle Autonomous Data Warehouse Cloud
- Oracle Data Integration Platform Cloud
- Oracle Analytics Cloud
- Oracle Big Data Cloud
Use the sample code with the corresponding data warehousing and ETL offload solutions.
The files in the `SampleSourceFiles` folder are comma-separated value (.csv) files that are used throughout the solution. They are used in the Oracle Data Integrator Smart Export samples and in the sample Oracle Big Data Cloud notebook for ETL offload with Apache Spark processing in Oracle Cloud.
`CUSTOMER_SRC_FILE.csv` contains randomized customer names and region data that is used for the ETL offload sample.
| Column Ordinal | Column Name | Data Type |
| --- | --- | --- |
| 1 | CUSTOMER_ID | NUMERIC |
| 2 | FIRST_NAME | VARCHAR |
| 3 | LAST_NAME | VARCHAR |
| 4 | REGION | VARCHAR |
| 5 | CITY | VARCHAR |
`SRC_PRODUCT.csv` contains product data that is used in Oracle Autonomous Data Warehouse Cloud to provide product category and family data for the ETL offload sample.
| Column Ordinal | Column Name | Data Type |
| --- | --- | --- |
| 1 | PRODUCT_ID | NUMERIC |
| 2 | PRODUCT_NAME | VARCHAR |
| 3 | PRICE | NUMERIC |
| 4 | FAMILY | VARCHAR |
`ORDERS_FILE.csv` contains order data that is used to load the SALES_FACT and SALES_ANALYSIS tables in Oracle Autonomous Data Warehouse Cloud. It provides the data set for the SALES_ANALYSIS ETL offload Apache Spark notebook and Oracle Data Integrator project.
The artifacts in the `LoadingAutonomousDataWarehouse` folder replicate a source table to Oracle Autonomous Data Warehouse Cloud for reporting with Oracle GoldenGate, and load a star schema data warehouse with Oracle Data Integrator.
The artifacts in the `GoldenGateParameterSamples` folder are sample Oracle GoldenGate parameter files for real-time replication of data from an Oracle database source (on-premises or Oracle Database Cloud Service) to an Oracle Autonomous Data Warehouse Cloud target. These files can be used with any Oracle GoldenGate for Oracle 12.3 implementation, including the Oracle Data Integration Platform Cloud remote agent.
`exadwc.prm` is a sample Oracle GoldenGate EXTRACT parameter file that captures data from an Oracle database source. To configure this file, replace the `<SID>` entry with your Oracle database SID or CDB name. The file is configured to capture from the `ADWC_SRC` schema.
`padwc.prm` is a sample Oracle GoldenGate EXTRACT PUMP parameter file that pumps the transactions captured by the extract to the Oracle Data Integration Platform Cloud instance for delivery to Autonomous Data Warehouse Cloud. To configure this file, replace the `<remote_host_ip>` and `<SID>` entries.
`radwc.prm` is a sample Oracle GoldenGate REPLICAT parameter file for a Classic Replicat, which applies the transactions to the Autonomous Data Warehouse Cloud instance. To configure this file, replace the `<SID>` entry with your SID or CDB name. The file applies data from the `ADWC_SRC` schema to the `adwc_repl` schema in your Autonomous Data Warehouse Cloud instance.
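For orientation, this is the general shape that these three parameter files take. It is a sketch, not the samples' exact contents; the credential aliases, trail paths, and manager port shown here are illustrative assumptions:

```
-- exadwc.prm: capture changes from the source schema
EXTRACT exadwc
USERIDALIAS ogg_src                      -- illustrative credential alias
EXTTRAIL ./dirdat/ea                     -- illustrative local trail
TABLE ADWC_SRC.*;

-- padwc.prm: pump the local trail to the DIPC host
EXTRACT padwc
PASSTHRU
RMTHOST <remote_host_ip>, MGRPORT 7809   -- port is illustrative
RMTTRAIL ./dirdat/pa                     -- illustrative remote trail
TABLE ADWC_SRC.*;

-- radwc.prm: apply the transactions to the target schema
REPLICAT radwc
USERIDALIAS ogg_adwc                     -- illustrative credential alias
MAP ADWC_SRC.*, TARGET adwc_repl.*;
```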
Place these files in the `dirprm` directory of the Oracle GoldenGate installation: put `exadwc.prm` and `padwc.prm` on the source, and put `radwc.prm` on your Oracle Data Integration Platform Cloud host or Oracle Data Integration Platform Cloud remote agent installation.
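Assuming a conventional GoldenGate home layout (the `$OGG_HOME` value is illustrative), the placement amounts to copies like these:

```
# On the source database host
cp exadwc.prm padwc.prm $OGG_HOME/dirprm/

# On the DIPC host or remote agent installation
cp radwc.prm $OGG_HOME/dirprm/
```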
The artifacts in the `ODISampleAutonomousDataWarehouseCloudLoad` folder provide scripts to create a small star schema in Autonomous Data Warehouse Cloud and an Oracle Data Integrator Smart Export to load that star schema.
To import and execute these scripts:
- Connect to your Oracle Autonomous Data Warehouse Cloud instance using SQL Developer.
- Run the `CreateODI_USER_and_TABLES.sql` script to create the `ODI_USER` and the target tables for the Oracle Data Integrator project. Replace the `<PASSWORD>` tag in the first line of the script with the password that you want to use for the `ODI_USER` (the sketch after these steps shows the shape of that statement).
- Use Secure Shell (SSH) to access your Data Integration Platform Cloud instance as the `opc` user.
- Create a directory on your Oracle Data Integration Platform Cloud instance: `mkdir /tmp/ADWCSample`
- Copy the sample source files `ORDERS_FILE.csv`, `SRC_PRODUCT.csv`, and `CUSTOMER_SRC_FILE.csv`, and the `ODIADWCSample.xml` file, into the directory that you just created.
- Change permissions on the directory and files so that the `oracle` user can read them: `sudo chmod -R 755 /tmp/ADWCSample`
- Use Virtual Network Computing (VNC) to connect to your Oracle Data Integration Platform Cloud instance.
- Run Oracle Data Integrator Studio.
- Use Smart Import to import the Oracle Data Integrator processes.
- Go to the Topology Manager in Oracle Data Integrator Studio and alter the Oracle Autonomous Data Warehouse Cloud connection strings. See the solution documentation for details on how to obtain and configure your Oracle Data Integrator to Oracle Autonomous Data Warehouse Cloud connection.
- Open the Oracle Data Integrator project and review the mappings.
- To run the processes, execute the Oracle Data Integrator package `PKG ODI_User DW Sample Load`.
- Use SQL Developer to query the tables in the Oracle Autonomous Data Warehouse Cloud `ODI_USER` schema; a sample sanity-check query follows these steps:
  - `CUST_DIM`
  - `DATE_DIM`
  - `PRODUCT_DIM`
  - `SALES_FACT`
  - `SALES_FACT_ANALYSIS`
- Review the Oracle Data Integrator mappings. Note that the two folders provide two methods for executing dimension and fact loads in Oracle Autonomous Data Warehouse Cloud:
  - Loads Using Mappings and KMs provides mappings that leverage certified Knowledge Modules for Oracle Autonomous Data Warehouse Cloud. These are appropriate for Type 1 dimension-style loads.
  - Loads Using Dimensions and Cubes Objects provides mappings that leverage the Dimensions and Cubes objects within Oracle Data Integrator to load the same set of tables into the Oracle Autonomous Data Warehouse Cloud `ODI_USER` schema. The Dimensions objects are appropriate for Type 2 or Type 3 dimension loads.
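For reference, here is a minimal SQL sketch of the two touchpoints above. The join keys come from the sample source files, but the `ORDER_AMOUNT` measure is an assumed name; the authoritative DDL is in `CreateODI_USER_and_TABLES.sql`:

```sql
-- Shape of the statement whose <PASSWORD> tag you replace; the real
-- DDL (and any grants) is in CreateODI_USER_and_TABLES.sql.
CREATE USER ODI_USER IDENTIFIED BY "<PASSWORD>";

-- Sanity-check query after PKG ODI_User DW Sample Load completes.
-- CUSTOMER_ID and PRODUCT_ID come from the sample source files;
-- ORDER_AMOUNT is an assumed measure name, so match it to the DDL.
SELECT d.REGION,
       p.FAMILY,
       SUM(f.ORDER_AMOUNT) AS TOTAL_SALES
FROM   ODI_USER.SALES_FACT  f
JOIN   ODI_USER.CUST_DIM    d ON d.CUSTOMER_ID = f.CUSTOMER_ID
JOIN   ODI_USER.PRODUCT_DIM p ON p.PRODUCT_ID  = f.PRODUCT_ID
GROUP  BY d.REGION, p.FAMILY
ORDER  BY TOTAL_SALES DESC;
```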
`BigDataCloud - ETL Offload Sample Notebook.json` is a sample Oracle Big Data Cloud notebook that uses Apache Spark to load data from files stored in Oracle Object Storage. It performs an ETL routine that leverages SparkSQL and then stores the result in multiple file formats back in Object Storage.
To configure and execute the Apache Spark notebook:
- Log in to your Oracle Cloud account.
- Create a container within your Oracle Cloud Infrastructure Object Storage service; for example, `BDCETL`.
- Navigate to your Oracle Big Data Cloud instance and open the Notebook tab.
- Click Import to import the notebook.
- Open the notebook and click Run/Play to execute the Apache Spark script.
- The Apache Spark notebook generates a SALES_FACT_ANALYSIS data set that is similar to the results of the Oracle Data Integrator Oracle Autonomous Data Warehouse Cloud processes.
To review the results, execute the `%sql` section of the notebook, which queries the resulting SALES_ANALYSIS table in Oracle Big Data Cloud; the sketch below shows the general pattern.
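As a rough Spark SQL sketch of that pattern: the Object Storage paths and the order columns (`CUSTOMER_ID`, `PRODUCT_ID`, `ORDER_AMOUNT`) are illustrative assumptions, not the notebook's actual values:

```sql
-- Expose the source CSVs to Spark SQL; the paths are illustrative,
-- not the notebook's actual Object Storage URIs.
CREATE TEMPORARY VIEW CUSTOMERS
USING csv OPTIONS (path 'swift://BDCETL.default/CUSTOMER_SRC_FILE.csv',
                   header 'true', inferSchema 'true');
CREATE TEMPORARY VIEW PRODUCTS
USING csv OPTIONS (path 'swift://BDCETL.default/SRC_PRODUCT.csv',
                   header 'true', inferSchema 'true');
CREATE TEMPORARY VIEW ORDERS
USING csv OPTIONS (path 'swift://BDCETL.default/ORDERS_FILE.csv',
                   header 'true', inferSchema 'true');

-- Join and aggregate into the analysis set; the order columns
-- (CUSTOMER_ID, PRODUCT_ID, ORDER_AMOUNT) are assumptions.
CREATE OR REPLACE TEMPORARY VIEW SALES_ANALYSIS AS
SELECT c.REGION, p.FAMILY, SUM(o.ORDER_AMOUNT) AS TOTAL_SALES
FROM   ORDERS o
JOIN   CUSTOMERS c ON c.CUSTOMER_ID = o.CUSTOMER_ID
JOIN   PRODUCTS  p ON p.PRODUCT_ID  = o.PRODUCT_ID
GROUP  BY c.REGION, p.FAMILY;

-- Review the results the way the %sql section does.
SELECT * FROM SALES_ANALYSIS ORDER BY TOTAL_SALES DESC;
```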