DsMask - Scalable data masking sample code

Last update: Feb 4, 2022

This repository contains sample code that shows one way to implement complex policy-based data masking on the IBM DataStage platform, using the masking algorithms that ship with IBM InfoSphere Optim.

The sample code shows how to implement high-performance, scalable static data masking driven by masking rules, which define the masking operations to apply to specific types of confidential information. It can serve as a basis for building an actual data masking system with IBM DataStage and IBM Optim.

The sample code also contains an example data masking setup adjusted for the typical requirements of customers in the Russian Federation.

The sample code provided in this repository has been iteratively developed and improved by pre-sales specialists of IBM EE/A as part of multiple pilot and demo implementations, to address various customer requirements.

High-level logical overview

Components of IBM Information Server used:

Information Governance Catalog (IGC), a metadata management tool;
Information Analyzer (IA), a data profiling tool;
DataStage, an ETL tool.

The actual data masking uses the algorithms of the IBM InfoSphere Optim Data Privacy Providers Library (ODPP) through its Java API. As a result, the solution can only run on Windows and Linux x86-64 platforms, because the ODPP Java API is not supported on AIX.

The types of confidential information are defined through the data classes. Those data classes are defined in the Information Governance Catalog (IGC), along with the table structure definitions.

Data classes are assigned to table columns either manually in IGC or automatically through IA.

On top of the data classes, the data masking engineer/developer prepares a set of data masking rules in a special XML-based format. These rules link the data classes to the masking operations that need to be performed.

Each masking operation is defined as a sequence of steps, which needs to be applied to the input values to provide the (masked) output values. Each step calls some masking or data preparation algorithm, and can use the outputs of the previous steps as its input data.
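The step sequence described above can be sketched in Java as follows. This is a minimal illustration, not the project's actual rule engine: the class MaskingOperationSketch, the Step interface, and the lastFourVisible example are hypothetical names, and the masking logic is a toy replacement rather than an ODPP algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical sketch: a masking operation as an ordered list of steps. */
class MaskingOperationSketch {

    /** One step: receives the input value (index 0) plus all earlier step outputs. */
    interface Step extends Function<List<String>, String> { }

    private final List<Step> steps;

    MaskingOperationSketch(List<Step> steps) {
        this.steps = steps;
    }

    /** Runs the steps in order; the last step's output is the masked value. */
    String apply(String input) {
        List<String> values = new ArrayList<>();
        values.add(input);                  // index 0 holds the original input
        for (Step step : steps) {
            values.add(step.apply(values)); // later steps can see earlier outputs
        }
        return values.get(values.size() - 1);
    }

    /** Example operation: normalize the text, then hide all but the last 4 characters. */
    static MaskingOperationSketch lastFourVisible() {
        // Data preparation step: trim and upper-case the raw input.
        Step normalize = v -> v.get(0).trim().toUpperCase(Locale.ROOT);
        // Masking step: works on the output of the normalize step (index 1).
        Step mask = v -> {
            String s = v.get(1);
            int keep = Math.min(4, s.length());
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < s.length() - keep; i++) {
                sb.append('X');
            }
            return sb.append(s.substring(s.length() - keep)).toString();
        };
        return new MaskingOperationSketch(Arrays.asList(normalize, mask));
    }
}
```

With these assumptions, MaskingOperationSketch.lastFourVisible().apply("  ab123456 ") first normalizes the value to "AB123456" and then masks it to "XXXX3456".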

The masking rules are linked to the actual table fields in accordance with the data classes assigned in IGC. The actual set of masking operations to be performed on a particular table is calculated by the "configuration program" and is stored in the internal configuration database as an object called a "masking profile".
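The idea of deriving a per-table masking profile from data class assignments can be illustrated with a small Java sketch. It is a simplification under stated assumptions: the class MaskingProfileSketch and the method buildProfile are hypothetical, each column has at most one data class, and each rule maps exactly one data class to one named operation. The real configuration program and its rule format may be considerably richer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: derive a per-table profile from data class assignments. */
class MaskingProfileSketch {

    /**
     * Returns a map of column name to masking operation name, containing only
     * the columns whose assigned data class is covered by some masking rule.
     */
    static Map<String, String> buildProfile(Map<String, String> columnToDataClass,
                                            Map<String, String> dataClassToOperation) {
        Map<String, String> profile = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columnToDataClass.entrySet()) {
            String dataClass = e.getValue();
            if (dataClass == null) {
                continue; // column has no data class assigned, so it is not masked
            }
            String operation = dataClassToOperation.get(dataClass);
            if (operation != null) {
                profile.put(e.getKey(), operation);
            }
        }
        return profile;
    }
}
```

Columns whose data class has no matching rule are simply left out of the resulting profile, so they would pass through unmasked in this sketch.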

Masking is performed by the custom Java-based DataStage operator, which reads the "masking profile" and applies it to the input data, providing the output data. The operator ensures that the input and output values are different, and generates warnings otherwise.
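The check that input and output values differ can be sketched as a thin wrapper around a masking function. This is an illustration only, not the operator's code: the class SameValueCheckSketch and its allowSame flag are hypothetical (the flag is loosely modeled on the ALLOW-SAME parameter mentioned in the release notes), and warnings are collected in a list instead of being written to the DataStage job log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: warn when a masked value equals its input value. */
class SameValueCheckSketch {

    private final UnaryOperator<String> maskFunction;
    private final boolean allowSame;
    private final List<String> warnings = new ArrayList<>();

    SameValueCheckSketch(UnaryOperator<String> maskFunction, boolean allowSame) {
        this.maskFunction = maskFunction;
        this.allowSame = allowSame;
    }

    /** Masks one value and records a warning if the result is unchanged. */
    String mask(String column, String input) {
        String output = maskFunction.apply(input);
        if (!allowSame && input != null && input.equals(output)) {
            // The value itself is not included in the message, since it is confidential.
            warnings.add("Column " + column + ": masked value equals input value");
        }
        return output;
    }

    /** Warnings collected so far, in the order they were raised. */
    List<String> warnings() {
        return warnings;
    }
}
```

The warning text names only the column, so that confidential values do not leak into logs.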

The DataStage job design is flexible: it relies on the RCP (Runtime Column Propagation) feature and job parameters, so a single job design can handle masking of all tables from a particular data source type (e.g. Oracle, Db2, or MSSQL).

Custom components included

ia-custom-ru - set of data class definitions for the Russian market, with the customized logic for IA scanning.

dsmask-algo and dsmask-beans - the supporting libraries for data preparation and normalization, including some kinds of text value pre-processing that are hard to implement using "plain" ODPP.

dsmask-mock - the library of algorithms to generate synthetic data used by the JUnit tests. Only used when running the tests, not included in the target binaries.

dsmask-uniq - a network service which implements the global uniqueness checks of the masked values (e.g. ensuring that no two distinct input values will be mapped to a single masked value), by storing the mapping between the input and masked values. This service is optionally used by the data masking rules (if enabled).

dsmask-jconf - the configuration program, which reads the masking rules, loads the mapping between the table fields and confidential data classes, and writes the "masking profiles" to the configuration database. It also includes the logic to build the substitution dictionary for Russian-language personal names.

dsmask-jmask - the custom Java-based data masking operator for DataStage.

dsjob - sample job designs for masking and substitution dictionary generation.

reports - sample reports on data masking activities recorded in the DataStage job logs (stored in DSODB database) using the Pentaho report generator.

batcher - the example script for running the masking DataStage jobs over a set of tables and waiting for the results.

dict-data - sample dictionaries for masking of data on the Russian market.

rules-testsuite - sample data masking rules, used in the internal tests for the configuration program and for the data masking operator.
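The global uniqueness check described for dsmask-uniq above can be illustrated with an in-memory Java sketch. It is not the service's implementation: the class UniqueMappingSketch and the method register are hypothetical, and a real network service would need persistent storage and a remote API. The sketch only shows the core invariant, namely that two distinct input values must never share one masked value.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: reject masked values already taken by another input. */
class UniqueMappingSketch {

    // Remembers which input value owns each masked value.
    private final Map<String, String> maskedToInput = new HashMap<>();

    /**
     * Tries to reserve a masked value for an input value.
     * Returns true if the pair is accepted (new, or already registered for the
     * same input) and false if another input already owns the masked value.
     */
    synchronized boolean register(String input, String masked) {
        String owner = maskedToInput.get(masked);
        if (owner == null) {
            maskedToInput.put(masked, input);
            return true;
        }
        return owner.equals(input);
    }
}
```

In this sketch, a caller that receives false would be expected to produce a different candidate masked value and try again.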

Releases(v1.2.1-release)

v1.2.1-release(Mar 10, 2022)
Special characters are now suppressed in DataStage job names (Invocation ID) for MaskBatcher. This is required when masking tables whose names contain characters such as '$', '#', '/' and the like.

Source code(tar.gz)
Source code(zip)
dsmask-config.pdf(7.59 MB)
dsmask-jconf-1.2.1-bin.zip(20.56 MB)
dsmask-jconf-1.2.1.jar(150.91 KB)
dsmask-jmask-1.2.1-bin.zip(11.11 MB)
dsmask-uniq-1.2.1-bin.zip(5.49 MB)
ia-bundle-ru-bin.zip(404.06 KB)
MarkFields.zip(7.59 KB)
v1.2-release(Mar 10, 2022)
MaskBatcher now supports the invocationId substitution variable, which allows the full name of the table being masked to be "safely" used as the invocation identifier of masking jobs.

Added the ALLOW-SAME parameter for masking functions based on the FPE algorithm. It allows disabling the check for identical input and output values.

The check for identical input and output values of masking operations is now disabled in general, because in practice there are situations where such a match is acceptable.

Source code(tar.gz)
Source code(zip)
dsmask-config.pdf(7.59 MB)
dsmask-jconf-1.2-bin.zip(20.54 MB)
dsmask-jmask-1.2-bin.zip(11.10 MB)
dsmask-uniq-1.2-bin.zip(5.49 MB)
ia-bundle-ru-bin.zip(404.06 KB)
v1.1-release(Mar 2, 2022)
Implemented the MaskBatcher tool, which automates building lists of tables to be masked and running masking jobs in batches over those lists. The old batcher script has been removed.

Added support for storing passwords in encrypted form, along with a utility for managing the password store.

Updated the documentation: added section 4.2 (Password store) and reworked section 4.7 (Batch masking).

Source code(tar.gz)
Source code(zip)
dsmask-config.pdf(7.51 MB)
dsmask-jconf-1.1-bin.zip(20.63 MB)
dsmask-jmask-1.1-bin.zip(11.19 MB)
dsmask-uniq-1.1-bin.zip(5.49 MB)
ia-bundle-ru-bin.zip(404.06 KB)
v1.0-release(Feb 1, 2022)

Initial public release
Source code(tar.gz)
Source code(zip)
dsmask-config.pdf(7.37 MB)
dsmask-jconf-1.0-bin.zip(21.21 MB)
dsmask-jmask-1.0-bin.zip(11.34 MB)
dsmask-uniq-1.0-bin.zip(6.10 MB)
ia-bundle-ru-bin.zip(404.08 KB)

Owner

International Business Machines
