Optimus Think

KNIME Versus Alteryx: A Comparison of ETL Capabilities


As a Senior Business Intelligence consultant with extensive data preparation experience, I am often asked by clients to recommend a data prep/analytics platform. Whether I recommend KNIME or Alteryx depends on the project requirements and data sources, because each product has its own capabilities, pros, and cons.

In this blog, I’m going to look at KNIME and Alteryx side by side in terms of their ETL and analytics capabilities:

KNIME is a powerful, free, open-source ETL and Business Intelligence tool.

Alteryx is a commercially licensed, self-service, analytic process automation platform with capabilities of ETL and complex analytics including predictive, spatial, and statistical analysis.

Introduction to ETL Tools

ETL tools, at their core, load data from multiple sources, then combine and transform it into a format that can be loaded into a database for further querying. Beyond these primary functions, many of these tools offer a wide set of extra features, ranging from data analytics capabilities such as predictive modelling to the ability to create graphics, charts and full-fledged dashboards.
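The extract, transform and load steps described above can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how KNIME or Alteryx work internally; the sample data, the `customer_totals` table and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Two hypothetical sources, inlined as CSV text so the sketch is self-contained.
CUSTOMERS_CSV = "customer_id,name\n1,Ada\n2,Grace\n"
ORDERS_CSV = "order_id,customer_id,amount\n10,1,25.50\n11,1,4.50\n12,2,100.00\n"


def extract(csv_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(customers, orders):
    """Transform: combine both sources into one order total per customer."""
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + float(order["amount"])
    return [
        (int(c["customer_id"]), c["name"], totals.get(c["customer_id"], 0.0))
        for c in customers
    ]


def load(rows, connection):
    """Load: write the transformed rows into a database table."""
    connection.execute(
        "CREATE TABLE customer_totals (customer_id INTEGER, name TEXT, total REAL)"
    )
    connection.executemany("INSERT INTO customer_totals VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(CUSTOMERS_CSV), extract(ORDERS_CSV)), connection)

# Once loaded, the table can be queried like any other database table.
result = connection.execute(
    "SELECT name, total FROM customer_totals ORDER BY customer_id"
).fetchall()
print(result)
```

In a visual tool, each of these three functions would roughly correspond to one or more nodes on the canvas.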

The KNIME Analytics Platform is an open-source data analytics platform that grows continuously by integrating new developments. KNIME provides a GUI (graphical user interface) based platform where reusable data workflows can be built quickly by simple visual drag and drop to perform ETL, business intelligence analytics and machine learning. It’s popular data analytics software because its functionality extends to natural language processing, text mining and information retrieval, so it can read, process, mine and visualize textual data.

Alteryx is an analytic process automation platform that provides automation capabilities for all analytics functions (ETL, diagnostic, predictive, prescriptive, and geospatial analytics). It combines code-free and code-friendly data science, machine learning, artificial intelligence, and business process automation in one platform. One of Alteryx’s differentiators is adding location intelligence through easy-to-use spatial analytical tools.

1. User Interface:

Both Alteryx and KNIME take a workbench-style approach.

KNIME: There is a list of nodes (i.e., tools) in a repository, divided into different segments. Each node can be dragged onto the canvas and can be connected by a line from an output to an input of a similar or a different node. By double-clicking or right-clicking on a node, you can configure the node based on the functionality.

 

Alteryx: The interface is quite similar to KNIME’s. Tools are grouped into clear, color-coded categories, such as In/Out and Data Preparation, at the top of the application. Clicking a category opens its set of tools, which can be expanded or closed as needed.

 

Verdict:

While both interfaces appear similar, navigating between nodes is considerably easier in Alteryx than in KNIME. With KNIME, one can end up with many windows all over the place, consuming a lot of memory, which can be a factor on a slow computer. Purely from an interface perspective, Alteryx has a more intuitive workflow.

2. Input/Data Preparation:

Both tools can pull and prepare data from a wide set of sources, ranging from CSV files and databases to cloud sources.

KNIME: By double-clicking or right-clicking on a node, we can see all configuration options. Below is an example of a file reader opening a CSV file. While loading, you can preview the data and modify how it is read, including file types and more. It also lets you filter not only rows but also columns, which can be useful if the data has a range of keys in the database or any other data source.
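As a rough illustration of what filtering rows and columns while reading a CSV file amounts to, here is a minimal Python sketch. It is not KNIME code; the sample data, the `COLUMN_TYPES` mapping and the `read_filtered` helper are assumptions made for this example.

```python
import csv
import io

# Hypothetical sample data, inlined so the sketch runs on its own.
SALES_CSV = (
    "region,product,units,price\n"
    "East,Widget,10,2.50\n"
    "West,Gadget,3,10.00\n"
    "East,Gadget,7,10.00\n"
)

# Column filter: keep only these columns, each converted to a target type.
COLUMN_TYPES = {"region": str, "units": int, "price": float}


def read_filtered(csv_text, column_types, row_predicate):
    """Read CSV text, keep the selected columns with converted types, then filter rows."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {name: cast(raw[name]) for name, cast in column_types.items()}
        if row_predicate(row):
            rows.append(row)
    return rows


# Row filter: keep only the rows for the East region.
east_rows = read_filtered(SALES_CSV, COLUMN_TYPES, lambda row: row["region"] == "East")
print(east_rows)
```

Dropping unneeded columns and rows at read time keeps the rest of the workflow smaller, which is the practical benefit of doing this in the reader rather than in a later step.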

Alteryx: Provides an easy drag-and-drop interface and a visual way to select data types. After connecting the input/output tool to a database or a file, we are automatically given a quick data visualization with comprehensive information about each data type.

Verdict:
Alteryx’s Data Cleansing tool is easy to understand. Modifying data types is simple because a drop-down is used to select them. Conversely, data type conversion in KNIME is a time-consuming process. Alteryx’s data prep tooling seems slightly superior to KNIME’s, although KNIME has a slight edge when it comes to filtering data by columns.

3. Data Blending:

Both Alteryx and KNIME provide great tools for combining data; however, the ease of use varies between them.

KNIME: The join node is easy to understand and can combine datasets on a shared identifier. We can choose which columns from each data set appear in the result. Some joins, such as left and right joins, can be a bit tricky since there are no options for left or right join in the JOIN node.

Alteryx: The Join tool works similarly to KNIME’s. Users simply choose the identifier that links the datasets together, which lets them construct SQL-style queries without writing a line of code. Alteryx has also added some higher-level functionality by combining analysis tools that make sense to work together. However, this doesn’t always provide the expected user experience and can cause trouble with larger data sets. For example, to simplify the analyst’s effort, Alteryx pairs the JOIN tool with SQL-style UNION features to cover inner and left/right joins. This means Alteryx performs all the joins without the user specifying a join type. This is great for smaller datasets, but for larger data sets it can be time- and resource-consuming, because Alteryx needs to process all joins to complete the processing.

KNIME, on the other hand, is completely modular: it treats each activity as a discrete step in a separate node that can be completely managed by the ETL designer. This requires the ETL designer to be more technically proficient in KNIME, but it allows more control and can avoid some of the excessive processing and resource consumption that occur in Alteryx with large data sets.
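To make the join-plus-union idea concrete, here is a conceptual sketch in plain Python. It models a join that always produces three outputs (unmatched left rows, matched rows and unmatched right rows) and then builds a left join by unioning two of them. This is an assumption about the behaviour described above, not Alteryx or KNIME code, and `three_way_join` and the sample rows are hypothetical.

```python
def three_way_join(left, right, key):
    """Return (left_only, joined, right_only) for two lists of dict rows."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)

    joined, left_only, matched_keys = [], [], set()
    for row in left:
        matches = right_by_key.get(row[key], [])
        if matches:
            matched_keys.add(row[key])
            # Merge each matching pair; left-hand values win on name clashes.
            joined.extend({**match, **row} for match in matches)
        else:
            left_only.append(row)

    right_only = [row for row in right if row[key] not in matched_keys]
    return left_only, joined, right_only


customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
    {"id": 3, "name": "Edsger"},
]
orders = [{"id": 1, "amount": 30}, {"id": 4, "amount": 5}]

# All three outputs are computed, even if only one of them is needed.
left_only, joined, right_only = three_way_join(customers, orders, "id")

# A left join is then the union of the matched rows and the unmatched left rows.
left_join = joined + left_only
print(left_join)
```

The sketch shows where the extra work comes from: the unmatched outputs are produced whether or not a later step uses them, whereas a join configured for a single join type only has to produce that one result.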

Verdict:

Both tools have great and easy-to-use joining and data manipulation capabilities to combine data sets on the fly. However, when working with larger datasets, KNIME emerges stronger since it treats all the activities discretely.

4. Data Analysis:

Both tools provide built-in predictive analytics capabilities that are useful for analyzing preliminary trends, which can later feed more advanced analytics.

KNIME: KNIME is quite strong when it comes to different predictive and analytical nodes. Since KNIME is open source, many developers have created a wide range of plugins and adapters that make many existing functionalities available within it. KNIME also has many statistical tools, as can be seen in the following image.

Alteryx: It does not offer the same range of analytical capabilities that KNIME offers, but it does include a few useful data investigation tools, such as Pearson and Spearman correlation. It also uses various analytical models and some simulation sampling.
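For readers unfamiliar with the two correlation measures mentioned above, the following Python sketch shows how they differ. It is a textbook-style illustration written for this article, not the implementation used by either tool; the function names and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross_sum / (spread_x * spread_y)


def average_ranks(values):
    """Rank values starting at 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]  # grows with xs, but not in a straight line

pearson_value = pearson(xs, ys)
spearman_value = spearman(xs, ys)
print(round(pearson_value, 4), round(spearman_value, 4))
```

Because Spearman works on ranks, it reports a perfect relationship for any strictly increasing pattern, while Pearson falls short of 1 when the relationship is not a straight line.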

Verdict:
As far as Alteryx Designer alternatives go, KNIME has more tools and holds an advantage over Alteryx in analytical models and machine learning components, since it is an open-source application with a large population of independent developers creating additional tools.

Overall Conclusion

When it comes to ETL, both KNIME and Alteryx are top data science tools. However, the tool selection largely depends on the intent and the user. If the user is looking for a user-friendly tool which can handle necessary data preparation by semi-technical or technical users, Alteryx offers an advantage in terms of the data preparation process. On the other hand, if the user is looking for some heavy analytical options and has a more technical understanding of “data”, KNIME, as an Alteryx equivalent, comes out stronger in terms of its capabilities. Also, KNIME is open source and a newer tool with a medium-sized user community; Alteryx has a larger community, is more adaptive to users, and has a more welcoming support environment.

Optimus SBR’s Data Practice

Optimus SBR provides data advisory services customized to support the needs of public and private sector organizations. We offer an end-to-end solution, from data strategy and governance to visualization, insights, and training.

Contact a Data Expert if you need help assessing your environment and deciding which essential data analysis tools and platforms best meet your needs.

 
