DsMask - Scalable data masking sample code

This repository contains sample code that shows one way to implement complex policy-based data masking on the IBM DataStage platform, using the masking algorithms shipped with IBM InfoSphere Optim.

The sample code shows how to implement high-performance, scalable static data masking driven by masking rules, which define the masking operations to apply to specific types of confidential information. It can be used as a basis for building a production data masking system with IBM DataStage and IBM Optim.

The sample code also contains an example data masking setup tailored to the typical requirements of customers in the Russian Federation.

The sample code in this repository has been iteratively developed and improved by IBM EE/A pre-sales specialists over multiple pilot and demo implementations, to address the various requirements raised by customers.

High-level logical overview

Components of IBM Information Server used:

  • Information Governance Catalog (IGC), a metadata management tool;
  • Information Analyzer (IA), a data profiling tool;
  • DataStage, an ETL tool.

[Figure: DsMask architecture]

The actual data masking uses the algorithms of the IBM InfoSphere Optim Data Privacy Providers Library (ODPP) through its Java API. As a result, the solution can only run on Windows and Linux x86-64 platforms, because the ODPP Java API is not supported on AIX.

Types of confidential information are represented as data classes. The data classes are defined in the Information Governance Catalog (IGC), along with the table structure definitions.

Data classes are assigned to table columns either manually in IGC or automatically through IA.

On top of the data classes, the data masking engineer or developer prepares a set of data masking rules in a special XML-based format. The rules link the data classes to the masking operations that need to be performed.

Each masking operation is defined as a sequence of steps that are applied to the input values to produce the (masked) output values. Each step calls a masking or data preparation algorithm, and can use the outputs of previous steps as its input data.
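The step-based structure can be sketched as follows. This is a minimal illustration, not the dsmask rule model: the `MaskingOperation` class, its `step` method, and the convention that the original value is available under the name `in` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a masking operation: an ordered list of steps.
// Each step reads named values (the original input or outputs of earlier
// steps) and stores its own result under a new name.
class MaskingOperation {

    private static final class Step {
        final String output;                      // name of the value this step produces
        final List<String> inputs;                // names of the values it consumes
        final Function<List<String>, String> fn;  // computes the output from the inputs

        Step(String output, List<String> inputs, Function<List<String>, String> fn) {
            this.output = output;
            this.inputs = inputs;
            this.fn = fn;
        }
    }

    private final List<Step> steps = new ArrayList<>();
    private final String resultName;

    MaskingOperation(String resultName) {
        this.resultName = resultName;
    }

    MaskingOperation step(String output, List<String> inputs, Function<List<String>, String> fn) {
        steps.add(new Step(output, inputs, fn));
        return this;
    }

    // Runs the steps in order; the original value is available under the name "in".
    String apply(String inputValue) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("in", inputValue);
        for (Step s : steps) {
            List<String> args = new ArrayList<>();
            for (String name : s.inputs) {
                if (!values.containsKey(name)) {
                    throw new IllegalStateException("Step " + s.output + " uses unknown value " + name);
                }
                args.add(values.get(name));
            }
            values.put(s.output, s.fn.apply(args));
        }
        return values.get(resultName);
    }
}
```

For example, a two-step operation could first normalize the value and then mask it, with the second step reading the first step's output by name. Passing values by name, rather than strictly from one step to the next, is what allows a later step to combine several earlier results.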

The masking rules are linked to the fields of the actual tables according to the data classes assigned in IGC. The set of masking operations to perform on a particular table is calculated by the "configuration program" and stored in the internal configuration database as an object called a "masking profile".
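Calculating a masking profile can be pictured as a join between the data classes assigned to each column and the rules keyed by data class. The sketch below assumes a simplified model: the `ProfileBuilder` class, plain string operation names, and the first-match policy are hypothetical. The actual configuration program also parses the XML rules and stores the result in the configuration database.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive a per-table masking profile from the
// column-to-data-class assignments and the data-class-to-operation rules.
class ProfileBuilder {

    static Map<String, String> buildProfile(
            Map<String, List<String>> columnClasses,  // column -> data classes assigned in IGC
            Map<String, String> rules) {              // data class -> masking operation name
        Map<String, String> profile = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> col : columnClasses.entrySet()) {
            for (String dataClass : col.getValue()) {
                String operation = rules.get(dataClass);
                if (operation != null) {
                    // First matching rule wins in this sketch; a real configurator
                    // may need explicit priorities when several classes match.
                    profile.put(col.getKey(), operation);
                    break;
                }
            }
        }
        // Columns without a matching rule get no entry in the profile.
        return profile;
    }
}
```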

Masking is performed by a custom Java-based DataStage operator, which reads the "masking profile" and applies it to the input data to produce the output data. The operator checks that each output value differs from the corresponding input value, and generates a warning otherwise.
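A minimal sketch of the input/output comparison follows. It assumes a hypothetical `MaskingCheck` helper that collects warnings in a list; the real operator reports through the DataStage job log, whose API is not shown here. The warning names the column but not the value, so the confidential input does not leak into the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the "output must differ from input" check.
class MaskingCheck {

    // Collected warnings; a DataStage operator would write these to the job log instead.
    final List<String> warnings = new ArrayList<>();

    String maskChecked(String column, String input, UnaryOperator<String> mask) {
        String output = mask.apply(input);
        // Null and empty values carry nothing to hide, so they are not reported.
        if (input != null && !input.isEmpty() && Objects.equals(input, output)) {
            // Deliberately omit the value itself from the message.
            warnings.add("Column " + column + ": masked value equals the input");
        }
        return output;
    }
}
```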

A flexible DataStage job design, based on the RCP (Runtime Column Propagation) feature and job parameters, makes it possible to mask all tables of a particular data source type (e.g. Oracle, Db2, or MSSQL) with a single job design.

Custom components included

ia-custom-ru - a set of data class definitions for the Russian market, with customized logic for IA scanning.

dsmask-algo and dsmask-beans - supporting libraries for data preparation and normalization, including some kinds of text-value pre-processing that are hard to implement with "plain" ODPP.
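As an example of this kind of pre-processing, the sketch below normalizes a name value before masking. It trims and collapses whitespace, upper-cases the value using the Russian locale, and folds Ё into Е. The `NameNormalizer` class and these particular rules are assumptions for illustration, not code taken from dsmask-algo.

```java
import java.util.Locale;

// Hypothetical normalization step for Russian personal names, applied before
// dictionary lookup or masking so that trivially different spellings match.
class NameNormalizer {

    private static final Locale RU = Locale.forLanguageTag("ru");

    static String normalize(String value) {
        if (value == null) {
            return null;
        }
        String s = value.trim().replaceAll("\\s+", " "); // collapse runs of whitespace
        s = s.toUpperCase(RU);                            // case-insensitive matching
        return s.replace('\u0401', '\u0415');             // fold Cyrillic YO into YE
    }
}
```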

dsmask-mock - a library of algorithms that generate synthetic data for the JUnit tests. It is used only when running the tests and is not included in the target binaries.

dsmask-uniq - a network service that implements global uniqueness checks for masked values (e.g. ensuring that no two distinct input values are mapped to the same masked value) by storing the mapping between input and masked values. Data masking rules can optionally use this service, if it is enabled.
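The uniqueness guarantee can be sketched with two maps kept in step: input to masked value, and masked value back to input. This in-memory `UniquenessRegistry` is hypothetical. The real dsmask-uniq is a network service, and its protocol and storage are not shown here. The `candidate` function stands in for a masking algorithm that can produce a different result on each retry attempt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical in-memory sketch of a global uniqueness registry.
class UniquenessRegistry {

    private final Map<String, String> inputToMasked = new HashMap<>();
    private final Map<String, String> maskedToInput = new HashMap<>();

    // Returns a masked value for `input` that no other input has received.
    // candidate.apply(input, attempt) yields the attempt-th candidate value.
    synchronized String register(String input,
                                 BiFunction<String, Integer, String> candidate,
                                 int maxAttempts) {
        String known = inputToMasked.get(input);
        if (known != null) {
            return known; // the same input keeps its previously assigned masked value
        }
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String masked = candidate.apply(input, attempt);
            if (!maskedToInput.containsKey(masked)) {
                inputToMasked.put(input, masked);
                maskedToInput.put(masked, input);
                return masked;
            }
        }
        throw new IllegalStateException("No unique masked value after " + maxAttempts + " attempts");
    }
}
```

Storing the reverse mapping makes the collision check a single lookup. The price is that the registry itself holds confidential input values and must be protected accordingly.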

dsmask-jconf - the configuration program, which reads the masking rules, loads the mapping between table fields and confidential data classes, and writes the "masking profiles" to the configuration database. It also includes the logic to build the substitution dictionary for personal names in the Russian language.

dsmask-jmask - the custom Java-based data masking operator for DataStage.

dsjob - sample job designs for masking and substitution dictionary generation.

reports - sample Pentaho reports on the data masking activities recorded in the DataStage job logs (stored in the DSODB database).

batcher - an example script that runs the masking DataStage jobs over a set of tables and waits for the results.
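The batch pattern (submit one job per table with bounded parallelism, then wait for every result) can be sketched in Java as below. The `Batcher` class and the `runJob` callback are hypothetical. In a real setup, the callback would start the DataStage masking job for one table and report whether it finished successfully.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical batch runner: at most `parallelism` jobs run at once,
// and the call returns only after every table's job has finished.
class Batcher {

    static Map<String, Boolean> runAll(List<String> tables, int parallelism,
                                       Function<String, Boolean> runJob) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            Map<String, Future<Boolean>> futures = new LinkedHashMap<>();
            for (String table : tables) {
                futures.put(table, pool.submit(() -> runJob.apply(table)));
            }
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Boolean>> e : futures.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get()); // blocks until this job ends
                } catch (ExecutionException ex) {
                    results.put(e.getKey(), false);              // job threw: count as failed
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for " + e.getKey(), ex);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```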

dict-data - sample dictionaries for masking data on the Russian market.

rules-testsuite - sample data masking rules, used in the internal tests for the configuration program and for the data masking operator.
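Dictionary-based substitution, as used with dictionaries like those in dict-data, is often made repeatable by deriving the dictionary index from a keyed hash of the input. The sketch below uses HMAC-SHA256 for that purpose. This is a generic technique swapped in for illustration, not the ODPP algorithm, and the `DictionarySubstitution` class and its key handling are assumptions. Such a mapping is consistent (the same input always yields the same substitute) but not unique: distinct inputs can collide, which is the kind of case a uniqueness service such as dsmask-uniq addresses.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical consistent dictionary substitution using a keyed hash (HMAC-SHA256).
class DictionarySubstitution {

    private final List<String> dictionary;
    private final byte[] key;

    DictionarySubstitution(List<String> dictionary, byte[] key) {
        if (dictionary.isEmpty()) {
            throw new IllegalArgumentException("Dictionary must not be empty");
        }
        this.dictionary = List.copyOf(dictionary);
        this.key = key.clone();
    }

    String substitute(String input) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] h = mac.doFinal(input.getBytes(StandardCharsets.UTF_8));
            // Use the first 8 bytes of the MAC as a 64-bit number.
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (h[i] & 0xFF);
            }
            // floorMod keeps the index non-negative; the modulo bias is negligible here.
            int idx = (int) Math.floorMod(v, (long) dictionary.size());
            return dictionary.get(idx);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```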

Releases (v1.2.1-release)
Owner: International Business Machines