19 top data preparation tools
Alteryx Designer, Cambridge Semantics Anzo and SAP Agile Data Preparation are among the leading products to help organizations prepare data for analysis, according to a new Gartner guide.
Data preparation is the first step to data quality
Data preparation tools have matured from initially being self-service-focused to now supporting data integration, analytics and data science use cases in production, according to research firm Gartner, Inc. Modern data preparation tools now enable data and analytics teams to build agile datasets at an enterprise scale, for a range of distributed content authors. In its latest report, “Market Guide for Data Preparation Tools,” Gartner analysts Ehtisham Zaidi and Sharat Menon examine 19 of the leading providers of data preparation software.
Altair Monarch and Altair Knowledge Hub
“Altair Knowledge Hub is a browser-based data preparation tool that enables visual, code-free data preparation for all sources, including unstructured and semi-structured sources,” according to Gartner. “Knowledge Hub uses machine learning to provide suggestions to the user regarding data enrichment and transformations during the data preparation process. Data can be prepared in a grid-like or flow-based view.”
Alteryx Designer
“The Alteryx Designer is part of the Alteryx Analytics and Data Science Platform, which is a self-service, end-to-end platform for business analysts, data scientists and IT professionals,” Gartner explains. “It is used to discover, catalog, prepare, blend, analyze and operationalize analytic models in a collaborative and governed environment. The platform is also composed of Alteryx Connect (data catalog), Alteryx Server and Alteryx Promote.”
Cambridge Semantics Anzo
“Anzo is an end-to-end platform that provisions a data fabric over existing data infrastructure by applying a semantic layer and an enterprise knowledge graph,” Gartner explains. “This graph, built on open standards such as Web Ontology Language (OWL), Resource Description Framework (RDF), and SPARQL Protocol and RDF Query Language (SPARQL), captures a high-resolution twin of every structured and unstructured data source through mappings and ontologies that can be viewed in a dashboarding UI.”
Datameer Enterprise
“Datameer Enterprise is a data preparation and data engineering platform that provides a centralized view of data, with comprehensive data security and governance capabilities to build and operationalize data pipelines with minimal coding efforts,” Gartner explains. “Datameer Visual Explorer supports visual exploration on complete datasets. It allows analysts to visually explore the data at any point in the data preparation pipeline, and drill down on any variable or value in the dataset.”
Elegant MicroWeb Smarten
“Smarten is an Augmented Analytics tool with an embedded data preparation module, Self Service Data Preparation (SSDP), that can also be purchased separately,” according to Gartner. “Smarten allows business users to explore, sample, clean, shape, reduce, join and blend data in a browser-based, Microsoft Excel-like environment, with ML-based recommendations to help them in the process. The Action Editor saves the sequence of data preparation steps taken by a user, and these can be edited or rolled back at any time.”
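A saved sequence of preparation steps that can be edited or rolled back is easy to model in general terms. The following is a minimal Python sketch of that idea and is not Smarten's implementation. The `StepRecorder` class and its methods are hypothetical names. Each step is a plain function over a list of row dicts. The raw data is never modified, so a rollback only has to trim the recorded list and replay whatever steps remain.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class StepRecorder:
    """Hypothetical recorder: keeps the raw data plus an ordered list of steps."""

    def __init__(self, raw: List[Row]) -> None:
        self.raw = [dict(r) for r in raw]  # copy so the source is never mutated
        self.steps: List[Tuple[str, Step]] = []  # (label, function) in order

    def add(self, label: str, fn: Step) -> None:
        self.steps.append((label, fn))

    def edit(self, index: int, fn: Step) -> None:
        # Swap the function of an existing step, keeping its label and position.
        label, _ = self.steps[index]
        self.steps[index] = (label, fn)

    def rollback(self, to_length: int) -> None:
        # Discard every step after the first `to_length` steps.
        del self.steps[to_length:]

    def run(self) -> List[Row]:
        # Replay all recorded steps on a fresh copy of the raw data.
        rows = [dict(r) for r in self.raw]
        for _, fn in self.steps:
            rows = fn(rows)
        return rows


raw = [{"name": " Ann ", "age": "34"}, {"name": "Bob", "age": ""}]
rec = StepRecorder(raw)
rec.add("trim names", lambda rows: [{**r, "name": r["name"].strip()} for r in rows])
rec.add("drop missing age", lambda rows: [r for r in rows if r["age"]])
print(rec.run())  # [{'name': 'Ann', 'age': '34'}]
rec.rollback(1)
print(rec.run())  # [{'name': 'Ann', 'age': '34'}, {'name': 'Bob', 'age': ''}]
```

Replaying from the raw data costs more than keeping undo snapshots. In exchange, editing a step in the middle of the sequence needs no special handling.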
erwin Metadata Manager, Mapping Manager and CATFx
“The erwin Mapping Manager is a web-based solution that performs source-target mapping in a drag-and-drop interface, by harvesting technical metadata from systems such as data stores, file connectors, big data platforms, analytics and BI solutions and master data management (MDM) hubs,” according to Gartner. “These mappings can then be imported to enterprisewide ETL tools, after which erwin Automation can be used for code-automation templates—for generating ETL scripts for these tools.”
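Template-based generation of ETL code from source-to-target mappings can be illustrated in general terms. The Python sketch below renders a single `INSERT ... SELECT` statement from a mapping of target columns to source expressions. It does not reproduce erwin's templates or output. The `generate_insert_select` function and all table and column names are hypothetical.

```python
from typing import Dict, Optional


def generate_insert_select(
    source_table: str,
    target_table: str,
    mapping: Dict[str, str],
    where: Optional[str] = None,
) -> str:
    """Render INSERT ... SELECT from a {target column: source expression} mapping."""
    if not mapping:
        raise ValueError("mapping must contain at least one column")
    target_cols = ", ".join(mapping.keys())
    source_exprs = ",\n       ".join(mapping.values())
    sql = (
        f"INSERT INTO {target_table} ({target_cols})\n"
        f"SELECT {source_exprs}\n"
        f"FROM {source_table}"
    )
    if where:
        sql += f"\nWHERE {where}"
    return sql + ";"


# Hypothetical mapping harvested from a CRM source into a warehouse dimension.
mapping = {
    "customer_id": "cust_no",
    "full_name": "TRIM(first_name) || ' ' || TRIM(last_name)",
    "signup_date": "CAST(created_at AS DATE)",
}
print(generate_insert_select("crm.customers", "dw.dim_customer", mapping, "active = 1"))
```

The sketch performs no identifier quoting, injection checks or dialect handling. A production code generator would need all three.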
Infogix Data3Sixty Analyze
“Infogix enters this market through its acquisition of Lavastorm in 2018,” Gartner explains. “Data3Sixty comes with prebuilt advanced analytics capabilities such as common regression, clustering and forecast techniques, and the ability to utilize either Python or R code to perform any transformations not possible using the prebuilt components. This is a web-based solution; access is both secured (through SSO, LDAP and AD solutions) and role-based.”
Lore IO platform
“Lore IO’s data preparation tool creates a data fabric design across the entire enterprise through an ML-optimized workflow to find, prepare and deliver data,” Gartner explains. “Lore IO focuses mainly on large enterprises as its target segment. It uses declarative data transformations rather than explicit ETL code, thereby reducing upfront manual ETL and data preparation effort. It uses a recommendation engine that helps business users with their data preparation process by auto-recommended business rules.”
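A toy rule engine shows how declarative transformations differ from explicit ETL code. The business rules are stored as data (target field, source field, operation), and one generic interpreter applies all of them. This is a hypothetical sketch, not Lore IO's rule language. The rule format, the operation names and `apply_rules` are all invented for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical catalog of operations that a rule may reference by name.
OPERATIONS: Dict[str, Callable[[Any], Any]] = {
    "upper": lambda v: str(v).upper(),
    "strip": lambda v: str(v).strip(),
    "to_int": lambda v: int(v),
    "copy": lambda v: v,
}

# Declarative rules: each states *what* a target field is, not *how* to loop over rows.
RULES = [
    {"target": "country", "source": "cntry", "op": "upper"},
    {"target": "name", "source": "nm", "op": "strip"},
    {"target": "units", "source": "qty", "op": "to_int"},
]


def apply_rules(rows: List[dict], rules: List[dict]) -> List[dict]:
    """Generic interpreter: builds each output row by evaluating every rule."""
    return [
        {r["target"]: OPERATIONS[r["op"]](row[r["source"]]) for r in rules}
        for row in rows
    ]


print(apply_rules([{"cntry": "de", "nm": " Lena ", "qty": "3"}], RULES))
# [{'country': 'DE', 'name': 'Lena', 'units': 3}]
```

Adding a field means adding a rule and leaving the interpreter untouched. Because rules are plain data, a recommendation engine could also generate or suggest them.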
Modak Analytics Nabu
“Modak Analytics’ Nabu provides stand-alone data preparation, as well as data preparation as a capability for its broader data lake enablement solution,” Gartner explains. “This data lake enablement solution provides rapid insights from siloed structured and unstructured sources through automated data ingestion and discovery, data lineage, data profiling, data catalog, augmented data preparation and visualization.”
Paxata Self-Service Data Preparation
“Paxata Self-Service Data Preparation (SSDP) has a visual, Excel/Google-like interface that guides the user to explore, clean, standardize and join data using simple clicks,” according to Gartner. “It is part of the Paxata Adaptive Information Platform, a unified platform with features for data quality, data enrichment, data integration, user collaboration and support for data governance. Paxata is not limited to samples, and allows interacting, profiling and cleaning data at scale.”
Rapid Insight Veera Construct
“Rapid Insight offers a data preparation platform, Veera Construct, to access data from any source and format, integrate and transform it to produce reports, perform ad hoc analysis or create analytics datasets,” according to Gartner. “Veera jobs can be saved, shared and turned into repeatable processes to be run on-demand or on a scheduled basis. The solution leverages in-database optimization when performing data preparation tasks.”
SAP Agile Data Preparation
“SAP Agile Data Preparation is a wizard-driven solution that provides users with the next-best recommendations for their data preparation transformations,” Gartner explains. “It provides users with automated data cleansing, enriching and deduplication, and seamless integration with the SAP portfolio of data management and analytics tools, and has been designed for business users, data stewards and IT. Trusted data is created and maintained by helping information stewards to define, assess and improve data, and to maintain critical controls over data standards, privacy and security.”
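Automated cleansing followed by deduplication often comes down to two steps. Values are first normalized into a comparison key, and then one record is kept per key. The Python sketch below shows that basic pattern. Gartner describes SAP's matching logic only at a high level, so the normalization rules here (lowercasing, removing punctuation, collapsing whitespace, keeping only phone digits) are assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Tuple


def normalize_name(value: str) -> str:
    # Lowercase, drop punctuation, collapse runs of whitespace.
    value = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(value.split())


def normalize_phone(value: str) -> str:
    # Keep digits only so "(555) 010-2000" and "555.010.2000" compare equal.
    return re.sub(r"\D", "", value)


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record seen for each normalized (name, phone) key."""
    seen: Dict[Tuple[str, str], Dict[str, str]] = {}
    for rec in records:
        key = (normalize_name(rec["name"]), normalize_phone(rec["phone"]))
        seen.setdefault(key, rec)
    return list(seen.values())


customers = [
    {"name": "Acme Corp.", "phone": "(555) 010-2000"},
    {"name": "ACME  corp", "phone": "555.010.2000"},
    {"name": "Globex", "phone": "555-010-3000"},
]
print(deduplicate(customers))
```

Exact key matching catches only duplicates that normalize to identical values. Variants such as typos or abbreviations call for fuzzy or probabilistic matching.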
SAS Data Loader for Hadoop, SAS Data Preparation
“SAS Data Loader for Hadoop enables data preparation tasks such as profiling, cleansing, matching, merging and deduplicating directly on Hadoop clusters using MapReduce and Spark, through an intuitive user interface without any specialized coding skills,” Gartner explains. “Data can be left in place here. SAS Data Preparation, on the other hand, works on tables in SAS’s high-performance SAS Viya Cloud Analytic Services (CAS) in-memory processing environment, where data is loaded into in-memory tables for distributed, parallel execution.”
Talend Data Preparation
“Talend offers three data preparation tools: an open-source desktop version, Talend Data Preparation; and commercial versions Talend Data Preparation Cloud (offered as part of the Talend Cloud platform) and Talend Data Preparation (offered as part of the on-premises offering Talend Data Fabric),” according to Gartner. “These data preparation tools utilize ML algorithms for standardization, cleansing, pattern recognition and reconciliation, and also for offering automated recommendations to guide users through the data preparation process.”
Tamr Unify
“Tamr Unify differentiates itself by focusing primarily on self-service and enterprise-level data unification use cases,” Gartner explains. “Unify enables the creation of clean, unified datasets through supervised ML, by constructing a probabilistic model that can quickly map, match and classify data from multiple sources.”
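A probabilistic model for matching records can take many forms. One classic approach is Fellegi-Sunter-style record linkage, sketched below in Python, where per-field agreement probabilities are estimated from labeled pairs. This is not Tamr's model, which Gartner does not detail. The comparison fields, the add-one smoothing and the function names are all assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Record = Dict[str, str]
FIELDS = ["name", "city", "zip"]  # hypothetical comparison fields


def agreement(a: Record, b: Record) -> Dict[str, bool]:
    # Exact, case-insensitive agreement per field.
    return {f: a.get(f, "").strip().lower() == b.get(f, "").strip().lower() for f in FIELDS}


def train(labeled: List[Tuple[Record, Record, bool]]) -> Dict[str, Tuple[float, float]]:
    """Estimate m = P(agree | match) and u = P(agree | non-match) per field, add-one smoothed."""
    matches = [agreement(a, b) for a, b, same in labeled if same]
    non_matches = [agreement(a, b) for a, b, same in labeled if not same]
    params = {}
    for f in FIELDS:
        m = (sum(p[f] for p in matches) + 1) / (len(matches) + 2)
        u = (sum(p[f] for p in non_matches) + 1) / (len(non_matches) + 2)
        params[f] = (m, u)
    return params


def match_score(a: Record, b: Record, params: Dict[str, Tuple[float, float]]) -> float:
    """Sum of per-field log-likelihood ratios; positive scores favor 'same entity'."""
    score = 0.0
    for f, agrees in agreement(a, b).items():
        m, u = params[f]
        score += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return score


labeled = [
    ({"name": "Acme", "city": "Boston", "zip": "02110"},
     {"name": "ACME", "city": "boston", "zip": "02110"}, True),
    ({"name": "Globex", "city": "Austin", "zip": "73301"},
     {"name": "Globex", "city": "Austin", "zip": "73344"}, True),
    ({"name": "Acme", "city": "Boston", "zip": "02110"},
     {"name": "Initech", "city": "Boston", "zip": "02116"}, False),
    ({"name": "Globex", "city": "Austin", "zip": "73301"},
     {"name": "Hooli", "city": "Denver", "zip": "80202"}, False),
]
params = train(labeled)
print(match_score({"name": "acme", "city": "Boston", "zip": "02110"},
                  {"name": "Acme ", "city": "BOSTON", "zip": "02110"}, params))
```

Pairs that score above a chosen threshold would be treated as matches. The threshold is a tuning decision that trades precision against recall.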
TMMData Fix Tool, which is part of the TMMData Foundation platform
“TMMData offers a solution for enterprise data preparation as part of its Foundation platform — the Fix Tool,” Gartner explains. “The tool provides a non-code-based, point-and-click visual environment to explore, filter, format, sort and visualize data for nontechnical users; it provides access to SQL, Python, R Project, PHP, Perl and others for technical users. Other tools in the Foundation platform provide capabilities for data integration, data storage, data cataloging, visualization, data governance, data lineage, metadata collection, workflow management and so on.”
Trifacta Wrangler, Trifacta Wrangler Pro, Trifacta Wrangler Enterprise, Google Cloud Dataprep by Trifacta
“Trifacta Wrangler Enterprise is an enterprise-level data preparation platform supporting various cloud and on-premises computing environments,” Gartner explains. “Its embedded ML capabilities allow it to: recommend data to connect to, infer data structure and schema, recommend joins, define user data access, and automate visualizations for exploration/data quality. A spreadsheet-style grid makes it easy to use, and the ability for analysts to save data preparation steps in a task-based framework makes the process easily repeatable.”
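Inferring data structure and schema can be shown in its simplest rule-based form: for each column, try candidate types from most to least specific. Gartner describes Trifacta's inference as ML-based. The Python sketch below is a much simpler heuristic, and the type names and `infer_schema` are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple


def _is_int(v: str) -> bool:
    try:
        int(v)
        return True
    except ValueError:
        return False


def _is_float(v: str) -> bool:
    try:
        float(v)
        return True
    except ValueError:
        return False


def _is_date(v: str) -> bool:
    try:
        date.fromisoformat(v)  # ISO dates such as 2021-03-01
        return True
    except ValueError:
        return False


# Candidate types from most to least specific; anything else falls back to "string".
CHECKS: List[Tuple[str, Callable[[str], bool]]] = [
    ("integer", _is_int),
    ("float", _is_float),
    ("date", _is_date),
]


def infer_schema(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Pick, per column, the most specific type that every non-empty value satisfies."""
    columns = {key for row in rows for key in row}
    schema = {}
    for col in sorted(columns):
        values = [row[col] for row in rows if row.get(col, "") != ""]
        inferred = "string"
        if values:  # an all-empty column stays "string"
            for type_name, check in CHECKS:
                if all(check(v) for v in values):
                    inferred = type_name
                    break
        schema[col] = inferred
    return schema


sample = [
    {"id": "1", "price": "9.99", "ordered": "2021-03-01", "sku": "A-17"},
    {"id": "2", "price": "12", "ordered": "2021-03-04", "sku": "B-02"},
]
print(infer_schema(sample))
# {'id': 'integer', 'ordered': 'date', 'price': 'float', 'sku': 'string'}
```

Testing integer before float is what stops a column such as `id` from being widened unnecessarily. A single non-matching value, however, drops the whole column to a more general type.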
Unifi Data Platform
“The Unifi Data Platform (UDP) provides AI-assisted data preparation and cataloging in a single platform,” Gartner explains. “The Unifi Data Catalog, which enables users to find data using natural language queries, is also available as a stand-alone product. Data exploration capabilities include automated data parsing and crawling (using its OneParse capability), automated profiling, cleansing, object indexing, and so on (using its one-click functions), and AI-based recommendations on datasets, join types, filtering, formatting and others (using OneMind).”
Yellowfin Data Prep
“Yellowfin Data Prep is a stand-alone data preparation tool that provides connectivity to a variety of data sources and output data formats,” Gartner explains. “The tool enables data preview in real-time to allow users to see their data during the preparation phase. For data profiling, the user can view the visual data shapes for columns, and statistics (min., max., outliers and more) around data distribution and cleanliness. The user also gets automated Suggest Actions to best fix or curate the data.”
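Python's standard library is enough to compute column statistics such as min., max. and outliers. The sketch below flags outliers using the common 1.5 × IQR rule (Tukey's fences). That rule is an assumption, because Gartner does not say how Yellowfin defines outliers, and `profile_column` is a hypothetical name.

```python
import statistics
from typing import Dict, List, Optional


def profile_column(values: List[Optional[float]]) -> Dict[str, object]:
    """Summarize a numeric column: completeness, range, spread and IQR-based outliers."""
    present = [v for v in values if v is not None]
    profile: Dict[str, object] = {
        "count": len(values),
        "missing": len(values) - len(present),
    }
    if len(present) < 2:
        # Quartiles need at least two data points.
        return profile
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    profile.update(
        min=min(present),
        max=max(present),
        mean=statistics.mean(present),
        q1=q1,
        q3=q3,
        outliers=[v for v in present if v < low or v > high],
    )
    return profile


print(profile_column([12.0, 15.0, 14.0, None, 13.0, 16.0, 95.0]))
```

In this sample, the value 95.0 falls well above the upper fence, so it is reported as an outlier. The single `None` is counted as missing.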