
DataWorks: Usage description of DataWorks modules

Last Updated: Mar 03, 2025

This topic describes the features and basic use scenarios of DataWorks modules.

Data processing procedure

DataWorks is an end-to-end data development and governance platform. The data processing procedure includes the phases that are shown in the following figure.

[Figure: Phases of the DataWorks data processing procedure]

DataWorks modules

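The following sections list the DataWorks modules by feature directory. Each feature directory is followed by its modules, and each module name is followed by its description.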

Data integration

Data Integration

Data Integration is a stable, efficient, and scalable data synchronization service.

  • Data Integration is designed to migrate and synchronize data quickly and reliably between heterogeneous data sources in complex network environments.

  • Data Integration supports batch synchronization, real-time synchronization, and mixed batch and real-time synchronization.

  • Data Integration supports table- and database-level synchronization management.

Upload and Download

Upload and Download allows you to upload data from multiple data sources, such as CSV files on an on-premises machine and Object Storage Service (OSS) objects, to big data compute engines such as MaxCompute for processing and analysis.

Data development and O&M

Data Modeling

Data Modeling is the first step of end-to-end data governance. Based on the modeling methodology of the Alibaba data mid-end, Data Modeling covers the following aspects:

  • Data warehouse planning: helps you design an efficient data warehouse architecture.

  • Data standards: help you formulate unified data standards.

  • Dimensional modeling: helps you build well-structured data models.

  • Data metrics: help you define accurate business metrics.

Data Modeling describes an enterprise's business data from a business perspective. This lets employees quickly understand and share a common way of measuring and interpreting business data that complies with data warehousing specifications.

DataStudio

Data Studio (new version, available when Participate in Public Preview of DataStudio of New Version is turned on)

DataStudio is an end-to-end big data development system that lets you develop data processing tasks online for various big data compute engines, such as MaxCompute, E-MapReduce (EMR), Hologres, Realtime Compute for Apache Flink, and AnalyticDB.

  • Environment isolation: isolates the development environment from the production environment so that tasks in the production environment run stably.

  • Custom pre-deployment checks: provides a flexible, customizable check process for task deployment, which makes deployments more reliable.

Operation Center

Operation Center is a big data O&M and monitoring system that provides the following features:

  • Real-time task monitoring: You can view the status of tasks in real time and keep track of data processing progress as it happens.

  • Intelligent O&M: You can run operations such as intelligent diagnosis and rerun on abnormal tasks, which simplifies fault recovery.

  • Intelligent baseline-based management: You can use the intelligent baseline feature to make sure that important tasks finish as expected. The feature addresses problems such as unpredictable task output times and the difficulty of monitoring large numbers of tasks, so that task output is delivered on time.
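A baseline check needs an estimated finish time for an important task, and that estimate depends on the durations of all of its upstream tasks. The following minimal sketch illustrates the general idea, not the algorithm that Operation Center uses. The task names, durations, committed time, and warning margin are all hypothetical. The sketch computes each task's earliest finish time over a dependency graph (the longest path through its upstream tasks) and compares the protected task's estimate against a committed time.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def estimated_finish_times(durations, deps):
    """Estimate when each task finishes, in minutes after the cycle starts.

    durations: task -> estimated run time in minutes (hypothetical values)
    deps: task -> set of upstream tasks that must finish first
    """
    # Every task is a node, even if it has no upstream tasks.
    graph = {task: deps.get(task, set()) for task in durations}
    finish = {}
    # static_order() yields a task only after all of its upstream tasks.
    for task in TopologicalSorter(graph).static_order():
        ready = max((finish[u] for u in graph[task]), default=0)
        finish[task] = ready + durations[task]
    return finish


def baseline_status(finish, protected_task, committed_minute, margin_minutes):
    """Compare a protected task's estimated finish time with a hypothetical baseline."""
    eta = finish[protected_task]
    if eta > committed_minute:
        return "breach"   # expected to miss the committed time
    if eta > committed_minute - margin_minutes:
        return "warning"  # inside the hypothetical warning margin
    return "ok"


# Hypothetical pipeline: ads_report depends on dwd_clean and dim_user.
durations = {"ods_sync": 30, "dim_user": 20, "dwd_clean": 45, "ads_report": 15}
deps = {"dwd_clean": {"ods_sync"}, "ads_report": {"dwd_clean", "dim_user"}}

finish = estimated_finish_times(durations, deps)
print(finish["ads_report"])  # 90
print(baseline_status(finish, "ads_report", committed_minute=120, margin_minutes=40))  # warning
```

The estimate for `ads_report` is driven by its slowest upstream chain (`ods_sync` then `dwd_clean`), so an alert can fire before the important task even starts.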

Data governance

Data Map

Data Map is an enterprise-grade data management system built on unified underlying metadata services. It lets you manage, organize, quickly search for, and understand data objects in depth.

Data Quality

Data Quality is a unified data quality check system. It is deeply integrated with the DataWorks task scheduling system, which helps you detect quality issues early and effectively prevent them from escalating. As a result, the business receives reliable data.
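Integration with scheduling means that quality checks can run when a task produces output and can decide whether downstream tasks start. The following minimal sketch shows that gating pattern in plain Python. It is not the Data Quality API. The `Rule` class, the `blocking` flag (a failed blocking rule stops downstream tasks, while a non-blocking rule only reports), the rule thresholds, and the sample partition are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]


@dataclass
class Rule:
    name: str
    check: Callable[[List[Row]], bool]  # returns True if the data passes
    blocking: bool  # hypothetical: a failed blocking rule stops downstream tasks


def null_ratio(rows: List[Row], column: str) -> float:
    """Fraction of rows whose value in `column` is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def run_quality_gate(rows: List[Row], rules: List[Rule]):
    """Run every rule; return (names of failed rules, whether to block downstream)."""
    failed = [rule for rule in rules if not rule.check(rows)]
    block = any(rule.blocking for rule in failed)
    return [rule.name for rule in failed], block


rules = [
    Rule("row count > 0", lambda rows: len(rows) > 0, blocking=True),
    Rule("user_id null ratio <= 1%", lambda rows: null_ratio(rows, "user_id") <= 0.01, blocking=True),
    Rule("amount null ratio <= 50%", lambda rows: null_ratio(rows, "amount") <= 0.5, blocking=False),
]

# Hypothetical output of an upstream task for one partition.
partition = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": None, "amount": 5.0},
    {"user_id": 3, "amount": None},
    {"user_id": 4, "amount": 7.5},
]

failed, block = run_quality_gate(partition, rules)
print(failed)  # ['user_id null ratio <= 1%']
print("skip downstream tasks" if block else "trigger downstream tasks")  # skip downstream tasks
```

Because the check runs before any consumer reads the partition, the bad `user_id` values stop at this task instead of spreading to downstream tables.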

Data Asset Governance

Data Asset Governance is a unified asset governance system. It automatically identifies items that need governance based on accumulated rules. It also provides governance and optimization solutions across multiple governance fields, covering both issue prevention before problems occur and issue resolution after they occur. This helps you carry out data governance proactively and systematically.

Security Center

Security Center is an end-to-end data security governance platform. It covers data asset classification, sensitive data identification, data access authorization management, sensitive data masking, auditing of access to sensitive data, and risk identification and response. Security Center helps you identify data security governance issues.
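Sensitive data identification and masking often work as a pair. A column is classified by matching sampled values against patterns, and values in that column are then partially hidden. The following minimal sketch illustrates the idea only. The patterns, the 80% match threshold, and the masking formats are hypothetical and are not Security Center's built-in rules.

```python
import re
from typing import List, Optional

# Hypothetical detection patterns, checked in order.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "mobile": re.compile(r"1\d{10}"),  # 11-digit mobile number format used in mainland China
}


def classify_column(samples: List[str], min_match: float = 0.8) -> Optional[str]:
    """Return the first sensitive type that matches at least min_match of non-empty samples."""
    values = [s for s in samples if s]
    if not values:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= min_match:
            return label
    return None


def mask(value: str, label: Optional[str]) -> str:
    """Partially hide a value according to its (hypothetical) sensitive type."""
    if label == "mobile":
        return value[:3] + "****" + value[-4:]  # keep prefix and last 4 digits
    if label == "email":
        local, _, domain = value.partition("@")
        return local[:1] + "***@" + domain      # keep first character and domain
    return value                                # not sensitive: leave unchanged


phones = ["13812345678", "13987654321", ""]
label = classify_column(phones)
print(label)                                    # mobile
print([mask(v, label) for v in phones if v])    # ['138****5678', '139****4321']
print(mask("alice@example.com", classify_column(["alice@example.com"])))  # a***@example.com
```

Classifying at the column level, rather than masking each value independently, keeps the masking consistent for every row that a query returns from the same column.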

Data analysis and service

DataAnalysis

DataAnalysis provides lightweight analysis tools and data analysis capabilities, such as SQL query, workbook, visualized analysis, and intelligent data insight, and lets you easily connect to different types of data sources and compute engines. Data analysts and business operations staff can use DataAnalysis in business insight scenarios such as daily data retrieval, data queries, and report analysis.

DataService Studio

DataService Studio is a flexible, lightweight, secure, and stable API construction system. It provides comprehensive data service and sharing capabilities for individuals, teams, and enterprises, and helps them centrally manage internal and external API services.

More

Management Center

Management Center is a unified management interface that provides administrators with key features for managing common workspace configurations, data sources, computing resources, members and roles, and tenant configurations. You can use Management Center to efficiently manage and optimize resources so that workspaces run smoothly, and to adjust configurations based on your business requirements.

Approval Center

Approval Center allows you to manage sensitive operations and data permissions, configure approval policies, and process approval requests. It helps enterprises meet approval requirements in internal compliance scenarios.

Migration Assistant

Migration Assistant is an end-to-end task migration system. You can use Migration Assistant to migrate tasks from open source scheduling engines, such as Oozie, Azkaban, Airflow, and DolphinScheduler, to DataWorks. You can also use it to back up and restore data development outcomes in DataWorks.

Open Platform

Open Platform provides the OpenAPI, OpenEvent, and Extensions sub-modules, which help you quickly connect various application systems to DataWorks.