Posted by Saurabh Jain
August 11, 2020
As a Senior Business Intelligence consultant with extensive data preparation experience, clients often ask me to recommend a data prep/analytics platform. Whether I recommend KNIME or Alteryx depends on the project requirements and data sources because each product has specific capabilities, pros, and cons.
In this blog, I’m going to look at the two tools side by side in terms of their ETL capabilities:
KNIME is a powerful, free, open source ETL and BI tool.
Alteryx is a commercially licensed, self-service, analytic process automation platform with capabilities of ETL and complex analytics including predictive, spatial and statistical analysis.
ETL tools, at the core, enable loading data from multiple data sources, combining and transforming them into a format that can then be loaded into a database for further querying. Beyond these primary functions, many of these tools contain a wide set of extra features. These can consist of everything from data analytics tools such as predictive modeling features, including the ability to create graphics, charts and full-fledged dashboards.
KNIME is an open source analytics platform that is growing continuously by integrating new developments. KNIME provides a GUI (graphical user interface) based platform where reusable workflows can be built quickly by simple drag and drop to perform ETL, business intelligence analytics and machine learning. It’s a popular tool because it provides functionality ranging from natural language processing, text mining and information retrieval that reads, processes, mines and visualizes textual data.
Alteryx is an analytic process automation platform that provides automation capabilities for all analytics functions (ETL, diagnostic, predictive, prescriptive, and geospatial analytics). It combines code-free and code-friendly data science, machine learning, artificial intelligence, and business process automation in one platform. One of Alteryx’s differentiators is adding location intelligence through easy to use spatial analytical tools.
Both Alteryx and KNIME use a workbench sort of approach.
KNIME: There is a list of nodes (i.e., tools) in a repository, divided into different segments. Each node can be dragged on to the canvas and can be connected by a line from an output to an input of a similar or a different node. By double clicking or by right clicking on a node, you can configure the node based on the functionality.
Alteryx: Interface is quite similar to that of KNIME. Tools are grouped into an understandable color-coded category such as In/Out, Data Preparation etc. on the top of the application. For each tool clicking by category opens the interface, which can be expanded or closed as needed.
While both interfaces appear similar, navigating between nodes is considerably easier in Alteryx than with KNIME. With KNIME one can end up with many windows all over the place, consuming a lot of memory which can be a factor on a slow computer. Purely from interface perspective, Alteryx has a more intuitive workflow.
Both tools can pull data from a wide set of sources ranging from CSV, databases or from cloud sources.
KNIME: By double clicking or right clicking on a node we can see all configuration options. Below is an example of file reader, opening a CSV file. While loading, you can see the preview of the data and can modify it, which includes file types and more. It also gives the ability to filter not only for rows but also for columns, which can be useful if the data has a range of keys in the database or any data source.
Alteryx: It provides an easy drag and drop and selection of datatypes in a visual way. After connecting the input/output tool to a database or a file we are automatically provided with a quick visualization with comprehensive information about each data type.
Alteryx’s Data cleansing tool is easy to understand. Modifying data types is easy as it uses a drop down to select data types. Conversely, data type conversion in KNIME is a time-consuming process. Alteryx’s data prep tool seems a bit superior to KNIME although KNIME has a slight edge when it comes to filtering data by columns.
Both Alteryx and KNIME provide great tools for combining data, however the ease of use varies between them.
KNIME: The join node is easy to understand and can combine datasets off a shared identifier. We can choose the columns from each data set in the result. Some of the joins such as left and right joins can be a bit tricky since there are no options for left or right join in the JOIN node.
Alteryx: The Join tool works similarly to KNIME. Users can simply choose the identifier to link the datasets together. It allows users to construct SQL queries without writing a line of code. However, Alteryx has included some higher level-functionality by combining analysis tools that make sense to work together. However, this doesn’t always provide the expected user-experience, which can cause trouble with larger data sets. For example, Alteryx chose a method for simplifying the analyst effort by using UNION features in SQL with the JOIN tool (including inner and left/right joins). This means Alteryx performs all the joins without specifying a join type. This is great for smaller datasets but for larger data sets this could be time and resource consuming as Alteryx needs to process all joins to complete the processing. KNIME, on the other hand, is completely modular and treats each activity as a discrete activity in a separate node and can be completely managed by the ETL designer This requires the ETL designer to be more technically proficient in KNIME, but allows more control and can avoid some of the excessive processing and resource consumption in large data sets that occur in Alteryx.
Both tools have great and easy to use joining and data manipulation capabilities to combine data sets on the fly. However, when working with larger datasets, KNIME comes out stronger since it treats all the activities discreetly.
Both tools provide built-in predictive analytics capabilities which are useful in analyzing preliminary trends, which can be later used for more in-depth analytics.
KNIME: KNIME is quite strong when it comes to different predictive and analytical nodes. Since KNIME is open source, many developers have created a wide range of plugins and adapters to be able to use for many already existing functionalities. KNIME also has many statistical tools as can be seen in the following image .
Alteryx: It does not offer the same range of analytical tools that KNIME offers, but it does include a few useful data investigation tools such as Pearson and Spearman correlation. It also uses various analytical models and some simulation sampling.
KNIME has more tools and has an advantage over Alteryx when it come to analytical models and machine learning since it is an open source application that has a large population of independent developers creating additional tools.
Both the tools are extremely powerful when it comes to ETL; however, the selection of the tool largely depends on the intent and the user. If the user is looking for a user-friendly tool which can handle necessary data preparation by semi-technical or technical users, Alteryx offers an advantage. On the other hand, if the user is looking for some heavy analytical options and has a more technical understanding of “data”, KNIME comes out stronger in terms of its capabilities. Also KNIME is open source and a newer tool that has a medium-size user community; Alteryx has a larger community, is more adaptive to users, and has a more welcoming support environment.