HiFun – A High Level Functional Query Language for Big Data Analytics

Many systems have been developed for the parallel processing of big datasets, each carefully optimized for the goals and constraints of its target application.
Their evolution has produced an array of solutions catering to a wide range of application environments. Unfortunately, it has also fragmented the landscape, since each solution is now adapted to a particular type of application. At the same time, applications increasingly combine multiple paradigms, for instance real-time data and historical data.

This has led to a pressing need for solutions that let practitioners mix different approaches seamlessly and transparently, so that the combined approaches function and provide answers as a single, all-in-one solution.
Based on these observations, the overall objective of our work is to clearly separate the conceptual level from the physical level, so that analysis tasks can be expressed as queries at the conceptual level independently of how they are evaluated at the physical level. To achieve this objective we propose (a) a high-level functional query language, called HiFun, in which we can formulate queries and study their properties at the conceptual level, and (b) mappings to existing evaluation mechanisms (e.g. Hadoop or SQL) that perform the actual evaluation of queries. In other words, we propose a language that is agnostic of the application environment as well as of the nature, structure and location of the data.

Based on the abstract definitions, we propose a formal approach to the rewriting of analytic queries and the generation of query execution plans. We demonstrate the practical use of our language by showing how queries in HiFun can be mapped to MapReduce jobs in Hadoop and to group-by queries in SQL. Additionally, we show how our approach can leverage the semantics of the data in order to improve performance. We emphasize that, although theoretical in nature, our work uses only basic and well-known mathematical concepts, namely functions and their basic operations.
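
To make the idea of the two mappings concrete, here is a minimal sketch in Python. It assumes, beyond what this abstract states, that an analytic query can be written as a triple (g, m, op): a grouping function g, a measuring function m, and a reduction operation op. The names evaluate and to_sql, the table name D, and the delivery records are hypothetical and chosen only for illustration; they are not part of HiFun itself.

```python
from collections import defaultdict
from functools import reduce


def evaluate(items, g, m, op):
    """Evaluate the (assumed) query triple (g, m, op) in a map/shuffle/reduce style."""
    groups = defaultdict(list)
    for item in items:
        # Map + shuffle: emit the pair (g(item), m(item)) and collect values by key.
        groups[g(item)].append(m(item))
    # Reduce: fold each group's measured values with the binary operation op.
    return {key: reduce(op, values) for key, values in groups.items()}


def to_sql(table, group_attr, measure_attr, sql_agg):
    """Render the same conceptual query as a SQL group-by statement (hypothetical helper)."""
    return (
        f"SELECT {group_attr}, {sql_agg}({measure_attr}) "
        f"FROM {table} GROUP BY {group_attr}"
    )


# Hypothetical delivery records of the form (branch, quantity).
deliveries = [("b1", 200), ("b1", 100), ("b2", 200), ("b2", 400), ("b3", 100)]

# "Total quantity per branch": group by branch, measure quantity, reduce by sum.
totals = evaluate(
    deliveries,
    g=lambda d: d[0],
    m=lambda d: d[1],
    op=lambda x, y: x + y,
)
sql_text = to_sql("D", "branch", "quantity", "SUM")
```

Under these assumptions, the same conceptual query yields either an in-memory map/reduce evaluation (totals) or a SQL string (sql_text) to hand to a relational engine, which is the sense in which the conceptual level stays independent of the physical one.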

Biosketch

Nicolas Spyratos (spyratos@lri.fr) received his B.Eng. degree from the National Technical University of Athens, Greece, his M.Sc. degree from the University of Ottawa, Canada, his Ph.D. degree from Carleton University, Canada, and his “thèse d’état” from the University of Paris South 11, France. He worked as a researcher for Bell-Northern Research in Canada, and for INRIA and the National Research Council (CNRS) in France, prior to joining the University of Paris South as a full professor in 1983, where he headed the database group from 1985 to 2011.

He is currently Professor Emeritus at the University of Paris South, scientific advisor of Japan Science and Technology (JST), and affiliated senior scientist at the Institute of Computer Science of Crete, in Greece (http://www.ics.forth.gr/). His research interests include databases, big data analytics, conceptual modeling and digital libraries. He is the author of over 200 articles in international journals, books and conferences, and has supervised the work of 24 PhD students. He has also served on the program committees of over 100 international conferences, participated in over 25 national, European and international research projects, and served as an evaluator for the NSF and the European Commission.