Data science could in many ways be called the study of the tools and programming languages needed to apply statistics. Data science, computer-assisted or not, could also be defined as the process of analyzing information to make the best decisions, that is, decisions that will facilitate the best outcomes for those involved. In reality, no decision ever escapes the claws of statistics. In theory, outcomes can for the most part be predicted as long as all the statistical variables and uncompromised data have been considered. In practice, there is always a chance that a forgotten or unknown variable exists, and those doing a statistical study may have little understanding of how the variables interact with each other, which makes predicting outcomes difficult.
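To make the forgotten-variable problem concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is simulated, the names x, y and z are hypothetical, and the true effect of x on y is set to 1.0 by construction. It shows how leaving out a variable z that drives both x and y distorts the estimated effect of x.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def residuals(ys, zs):
    """What is left of ys after removing the part explained by zs."""
    b = slope(zs, ys)
    return [y - b * z for y, z in zip(ys, zs)]

# Simulated data (not real measurements).
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]        # the forgotten variable
x = [zi + rng.gauss(0, 1) for zi in z]         # observed, but driven partly by z
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1)     # true effect of x is 1.0
     for xi, zi in zip(x, z)]

# Ignoring z: the slope absorbs z's influence and overstates the effect of x.
naive = slope(x, y)

# Accounting for z: strip z out of both x and y, then compare what remains.
adjusted = slope(residuals(x, z), residuals(y, z))

print(f"effect of x ignoring z:    {naive:.2f}")
print(f"effect of x accounting z:  {adjusted:.2f}")
```

In this simulated setup the estimate that ignores z comes out at roughly double the true effect, while the adjusted estimate lands close to 1.0. With real data, of course, the difficulty is that nobody knows z exists.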
Without computers, statistical programming languages like R, and statistical analysis programs like SPSS, data science would still exist. In that case, however, one would be reduced to collecting data from surveys and doing the statistical calculations by hand or, to simplify things, by inference. Everyone makes inferences, whether grounded in good data and statistics or not. Many people make decisions about, for example, the best diet for the longest life, the best career for financial stability, or how best to behave in a job interview. For many of them, the facts or data behind these decisions don’t come from statistical studies with thousands of participants, but from ad hoc information they have read, experienced firsthand, or heard to be true.
In the worlds of business, politics, and medicine, statistically driven decisions might be whether or not to develop a product, whether a health product does what it claims, or, in the case of a politician, whether a stance on a political issue will affect the chances of reelection. All of this is in some way static statistical analysis. Dynamic statistical analysis tries to predict the future by looking at the probability of events that might in the end change whatever the current statistical study suggests. For example, for years the office rental market, from a long-term statistical perspective, had a good return on investment. The pandemic of 2019 to 2021 led to the rise of remote work, which had a large effect on the office rental market, a market that had been a source of income for pension funds for years. One event, the pandemic, changed everything, just as an earthquake or climate change might. Unmodeled statistical variables are often just a fact of life.
The primary requirements of data science curricula are statistics, computer science, and mathematics classes. Students who plan to apply their data science skills to business often elect to get a business degree as well. Today it is not uncommon for those who seek data science degrees to also obtain a degree in business, psychology, engineering, medicine, health, sports, or nutrition. The idea is that the core statistics you learn will need to be supplemented with computer science and a knowledge of the field you intend to apply the statistics to. The computer science classes one must take for data science, such as Python and R programming, machine learning, inference analysis, and artificial intelligence, are however not merely tools for analyzing statistics. In their defense, many data science classes enhance one’s deductive and predictive ability, because data science is also the study of algorithms, coded logic that can be used to analyze small sets of static data or very large sets of real-time, ever-changing data.
It’s not that statistics always needs the most advanced analysis programs to come to a good decision; it’s that there is often an overwhelming amount of statistical data needed to make decisions in fields like business and healthcare. As a case in point, even though one could use intensive computing power to analyze weather data over the last hundred years to conclude that a summer day in Santa Monica is most likely to be between 65 and 75 degrees, there is really no need to. Just ask a few people in the town; their experience is enough to find out what the temperature will most likely be on a summer day. But, of course, not always. Advanced supercomputers and advanced algorithms can’t always be relied on to economically predict common-sense things. One thing people have in common with supercomputers and algorithms is that they are programmed by people. They too could often be missing a critical fact or variable, or data point if you like.
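For readers who want to see what the computer-assisted version of the Santa Monica question might look like, here is a rough sketch in Python. The temperatures are synthetic stand-ins generated for illustration, not real weather records, and the function name typical_range and its coverage parameter are hypothetical choices. The sketch simply finds the band that holds the middle 80 percent of summer-day highs.

```python
import random

def typical_range(temps, coverage=0.8):
    """Return the (low, high) band holding the central `coverage` share of temps."""
    ordered = sorted(temps)
    tail = (1.0 - coverage) / 2.0
    lo_idx = int(tail * (len(ordered) - 1))
    hi_idx = int((1.0 - tail) * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

# Synthetic stand-in for a hundred summers of daily highs, in degrees Fahrenheit.
# Real historical records would replace this list.
rng = random.Random(0)
summer_highs = [rng.gauss(70, 3) for _ in range(100 * 90)]

low, high = typical_range(summer_highs)
print(f"Most summer days fall between {low:.0f} and {high:.0f} degrees")
```

The calculation itself is only a sort and two lookups. The answer it gives is the same kind of answer a long-time resident could give from memory, which is the point of the example.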
So how should data science be defined? One answer is that statistics can exist without math or computers, as common-sense observational statistics: it’s hotter in the summer. But data science does not exist without statistics. Data science helps people apply statistical concepts and theories, especially when the data that needs to be analyzed is massive. Without data science tools, the ability to analyze modern streams of data and perform advanced statistical analysis would not exist. People would then have to use their common sense to analyze the statistics that affect retail: ice cream cone sales increase on days that are over 90 degrees, but not always.