Data Science: A Primary Enabler to an Academic Grid

GRID 2020 Abstracts

Prof. Niv Ahituv
Coller School of Management, Tel Aviv University

Keynote lecture
Monday, November 23, 2020 | 10:30-11:00

Most research today is multidisciplinary in nature. For instance, a research project on global warming requires collecting, processing and presenting integrated data from climatology, geology, oceanography, zoology, botany, geophysics and other fields. It is well known that each discipline maintains its own taxonomy, records, data definitions, metadata, data-recording frequency, Big Data, algorithms and the like. How can we handle an academic grid if we do not share the same paradigms of data collection, storage, integration, processing and visualization?

The new academic discipline of Data Science (DS) has developed in recent years mainly because of the need to make decisions based on huge amounts of data, that is, Big Data. In parallel, machine learning and sophisticated inference techniques have driven great progress in technologies that identify patterns, filter Big Data and assign relevant meaning to information. The profession of Data Scientist (or Data Analyst) has been in high demand in recent years. It is required in the business sector, where data is the “oxygen” of business survival; it is needed in the government sector to improve services to citizens (as well as for “Big Brother” tools); and it is imperative in the scientific world, where large data repositories collected in various disciplines must be integrated, mined and analyzed to enable interdisciplinary research.

The purpose of this talk is to demonstrate how to build an academic program in Data Science that prepares data analysts for the business, public, government and academic sectors. The talk first delineates the Data Cycle, which portrays the transformation of data and their derivatives along the route from data generation to decision making.
The cycle includes the following stages:

1. Problem definition
2. Identifying pertinent data sources
3. Data collection and storage (including cleansing and backup)
4. Data integration
5. Data mining
6. Processing and analysis
7. Visualization
8. Learning and decision-making
9. Feedback for future cycles

Within this cycle there may be sub-cycles, in which several stages are repeated. The data cycle is generic: it may vary slightly under different circumstances, but it differs little from one discipline to another. Each stage requires different tools, namely the hardware and software technologies that support it, and the talk classifies these tools. The final part of the talk proposes a typology of academic DS programs. It outlines an academic program for those wishing to practice the Data Analyst profession as well as for those wishing to pursue an academic career, and sketches an introductory course that should be mandatory for all students campus-wide.
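As a rough illustration only, the Data Cycle described above, with its ordered stages, an optional repeated sub-cycle and feedback carried into the next cycle, can be sketched as an ordered sequence of stages. This is a minimal sketch under stated assumptions, not the author's model: the `Stage` names, the `run_cycle` function and the shared `state` dictionary are hypothetical, and each stage handler stands in for whatever hardware and software tools support that stage.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Stage(Enum):
    """Hypothetical names for the Data Cycle stages, in cycle order."""
    PROBLEM_DEFINITION = 1
    IDENTIFY_SOURCES = 2
    COLLECT_AND_STORE = 3        # includes cleansing and backup
    INTEGRATION = 4
    MINING = 5
    PROCESSING_AND_ANALYSIS = 6
    VISUALIZATION = 7
    LEARNING_AND_DECISION = 8
    FEEDBACK = 9                 # results carried into the next cycle


# Hypothetical handler type: a stage reads and updates a shared state dict.
Handler = Callable[[dict], None]


def run_cycle(
    handlers: Dict[Stage, Handler],
    state: dict,
    sub_cycle: Optional[Tuple[Stage, Stage]] = None,
    repeats: int = 0,
) -> List[Stage]:
    """Run all stages in order and return the trace of stages executed.

    If sub_cycle=(first, last) is given, the stages first..last are
    re-run `repeats` extra times before the cycle continues.
    Stages without a handler are still recorded in the trace.
    """
    order = list(Stage)          # Enum iteration follows definition order
    trace: List[Stage] = []
    i = 0
    remaining = repeats
    while i < len(order):
        stage = order[i]
        handler = handlers.get(stage)
        if handler is not None:
            handler(state)
        trace.append(stage)
        if sub_cycle is not None and stage is sub_cycle[1] and remaining > 0:
            remaining -= 1
            i = order.index(sub_cycle[0])  # jump back to the sub-cycle start
        else:
            i += 1
    return trace


# Usage: repeat integration and mining once; feedback stores a lesson
# in `state`, which a later call to run_cycle could read.
def record_feedback(state: dict) -> None:
    state.setdefault("lessons", []).append("refine problem definition")


state: dict = {}
trace = run_cycle({Stage.FEEDBACK: record_feedback}, state,
                  sub_cycle=(Stage.INTEGRATION, Stage.MINING), repeats=1)
print(len(trace))  # 11: nine stages plus INTEGRATION and MINING repeated once
```

Passing the same `state` dictionary into the next call is one simple way to model "feedback for future cycles"; a real program would replace the handlers with actual collection, integration, mining and visualization tools.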