Interview KNIME Fabian Dill - SmartData Collective

We have covered KNIME.com's open source platform earlier. On the eve of its new product launch, KNIME.com co-founder Fabian Dill shares his thoughts in an exclusive interview.

From the KNIME.com website:

The modular data exploration platform KNIME, originally solely developed at the University of Konstanz, Germany, enables the user to visually create data flows – or pipelines, execute selected analysis steps, and later investigate the results through interactive views on data and models. KNIME already has more than 2,000 active users in diverse application areas, ranging from early drug discovery and customer relationship analysis to financial information integration.

Ajay – What prompted you personally to be part of KNIME and not join a big technology company? What does the future hold for KNIME in 2009-10?

Fabian – I was excited when I first joined the KNIME team in 2005. Back then, we were working exclusively on the open source version, backed by some academic funding. Being part of the team that put together such a professional data mining environment from scratch was a great experience. Growing this into a commercial support and development arm has been a thrill as well. The team, and the diverse experience gained from helping to get a new company off the ground and being involved in everything it takes to make it successful, made it unthinkable for me to work anywhere else.

We continue to develop the open source arm of KNIME, and many new features lie ahead: text, image, and time series processing, as well as better support for variables. We are constantly working on adding new nodes. KNIME 2.1 is expected in the fall, and some of the ongoing development can already be found on the KNIME Labs page (http://labs.knime.org).

The commercial division is providing support and maintenance subscriptions for the freely available desktop version. At the same time we are developing products which will streamline the integration of KNIME into existing IT infrastructures:

KNIME Grid Support lets you run your compute-intensive (sub-)workflows or nodes on a grid or cluster;
KNIME Reporting makes use of KNIME’s flexibility to gather the data for your report and provides simplified views (static, or interactive dashboards) of the workflow and its results; and
the KNIME Enterprise Server facilitates company-wide installation of KNIME and supports collaboration between departments and sites by providing central workflow repositories, scheduled and remote execution, and user rights management.

Ajay – Software as a service and cloud computing are the next big thing in 2009. Are there any plans to put KNIME in the cloud and charge clients by the hour, so they can build models on huge data sets by renting time rather than buying hardware?

Fabian – Cloud computing is an agile and client-centric approach and therefore fits nicely into the KNIME framework, especially considering that we are already working on support for distributed computing of KNIME workflows (see above). However, we have no immediate plans for KNIME workflow processing on a per-use charge or similar. That’s an interesting idea, though. The way KNIME nodes are nicely encapsulated (and often even distributable themselves) would make this quite natural.

Ajay – What differentiates KNIME from other products such as RPro and RapidMiner, for example? What are the principal challenges you have faced in developing it? What do customers like and dislike about it?

Fabian – Every tool has its strengths and weaknesses, depending on the task you actually want to accomplish. The focus of KNIME is to support users in their quest to understand large and heterogeneous data and to make sense of it. For this task, you cannot rely only on classical data mining techniques wrapped into a command line or otherwise configurable environment; simple, intuitive access to those tools is required, along with support for visual exploration through interactive linking and brushing techniques.

By design, KNIME is a modular integration platform, which makes it easy to write your own nodes (with the easy-to-use API) or to integrate existing libraries or tools.

We integrated Weka, for example, because of its vast library of state-of-the-art machine learning algorithms; the open source program R, to provide access to a rich library of statistical functions (and of course much more); and parts of the Chemistry Development Kit (CDK). All these integrations follow the KNIME requirements for easy and intuitive usage, so the user does not need to understand the details of each tool in great depth.

A number of our commercial partners, such as Schroedinger, Infocom, Symyx, and Tripos, among others, also follow this paradigm and similarly integrate their tools into KNIME. Our academic collaboration with ETH Zurich, Switzerland, on the High Content Screening Platform HC/DC is another positive outcome of this open architecture. We believe that this strictly result-oriented approach, based on a carefully designed and professionally coded framework, is a key factor in KNIME’s broad acceptance. I guess this is another big differentiator: right from the start, KNIME has been developed by a team of software developers with decades of industrial software engineering experience.

Ajay – Are there any plans for KNIME in Asia? Any other open source partnerships in the pipeline?

Fabian – We have a Japan-based partner, Infocom, which operates in the life sciences field. But we are always open to other partnerships, supporters, or collaborations.

In addition to the open source integrations mentioned above (Weka, R, CDK, HC/DC), there are many other projects in the works and partnerships under negotiation. Keep an eye on our blog and on our Labs@KNIME page (labs.knime.org).

ABOUT

KNIME – development started in January 2004. Since then: 10 releases; approx. 350,000 lines of code; 25,000 downloads; an estimated 2,000 active users. KNIME.com was founded in June 2008 in Zurich, Switzerland.

Fabian Dill – has been working for and with KNIME since 2005; co-founder of KNIME.com.
