{"id":80,"date":"2026-06-25T09:10:54","date_gmt":"2026-06-25T08:10:54","guid":{"rendered":"https:\/\/www.oceanblogs.org\/m219\/?p=80"},"modified":"2026-06-25T09:11:00","modified_gmt":"2026-06-25T08:11:00","slug":"ocean-of-data","status":"publish","type":"post","link":"https:\/\/www.oceanblogs.org\/m219\/2026\/06\/25\/ocean-of-data\/","title":{"rendered":"Ocean of Data"},"content":{"rendered":"\n<p>By Qi-Fan Wu (Niels Bohr Institutet, University of Copenhagen)<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"587\" src=\"https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_1_reduced-1024x587.jpg\" alt=\"\" class=\"wp-image-81\" srcset=\"https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_1_reduced-1024x587.jpg 1024w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_1_reduced-300x172.jpg 300w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_1_reduced-768x440.jpg 768w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_1_reduced.jpg 1439w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>In 1943, when Warren McCulloch and Walter Pitts showed that neurons could be represented by simple electrical circuits, they laid the first foundation for machines that could learn, adapt, and predict. In 2023, when ChatGPT became widely used, my <em>Introduction to Python<\/em> professor found that it could answer every question correctly on his course exam. In the history of machine learning, there has been a repeated oscillation between \u201cextremely high expectations\u201d and \u201cdeep skepticism.\u201d What is machine learning? What should we expect from machine learning, and when should we be skeptical about it? 
Should the same principle also be applied to other numerical models?<\/p>\n\n\n\n<p>The goal of machine learning is to make computers \u201clearn\u201d from \u201cdata\u201d. From an end user&#8217;s perspective, it is about understanding your data and making predictions and decisions. Intellectually, it is a collection of models, methods and algorithms that have evolved over more than half a century <strong>[e]<\/strong>. Like the human brain, neural networks, one of the most popular machine learning methods, are theoretically capable of learning complex relationships from data. In theory, neural networks can compute close approximations to any continuous function. No matter what the function is, there is guaranteed to be a neural network such that, for every input x in a bounded range, the network outputs f(x) to within any desired accuracy (Figure 1). This result holds even if the function has many inputs and many outputs <strong>[a]<\/strong>. However, universal approximation only describes what neural networks are capable of representing, while the actual goal of machine learning is to fit an unknown function from a finite set of samples, ideally faster than traditional numerical methods.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"945\" src=\"https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_2-1024x945.png\" alt=\"\" class=\"wp-image-82\" srcset=\"https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_2-1024x945.png 1024w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_2-300x277.png 300w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_2-768x709.png 768w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_2.png 1430w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><figcaption class=\"wp-element-caption\"><em>Figure 1: This figure explains how neural networks turn inputs into outputs and approximate complex functions. The upper panel compares a biological neuron with an artificial neural unit, showing how weighted inputs, a threshold, and the sigmoid function transform inputs into an output that represents the neuron\u2019s activation level. The lower panel is a visual construction showing how sigmoid neurons can approximate a continuous function. The sigmoid function maps any real input to (0,1). With large weights it behaves like a step function, two steps form a bump, and many bumps can be added to approximate a target function before the final sigmoid produces the neural-network output.<\/em><\/figcaption><\/figure>\n\n\n\n<p>In this blog post, however, I do not want to focus on large language models that help with writing, coding, and basic background research. Instead, I want to discuss the training and use of special-purpose AI models, such as neural networks, for solving problems in physics, which is also the main topic of my PhD project. Nowadays, an increasing number of scientists are working on AI-related topics, including in climate physics. If we think of the physical world as a forward dynamics model, then, given the current state and the action to be taken, machine learning aims to predict the next state, while the entire world can be viewed as a huge digital database.<\/p>\n\n\n\n<p>However, after the initial \u201cextremely high expectations,\u201d machine learning has also raised \u201cdeep skepticism\u201d. In physics, especially climate physics, the \u201cclose approximations\u201d mentioned earlier, together with the lack of standardized workflows, are often the source of trouble. 
The figure below shows the rather discouraging results of reproducing ML-for-PDE-solving studies with stronger baselines.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"913\" src=\"https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_3-1024x913.png\" alt=\"\" class=\"wp-image-83\" srcset=\"https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_3-1024x913.png 1024w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_3-300x267.png 300w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_3-768x685.png 768w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_3.png 1384w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 2: This figure summarizes results from <strong>[b]<\/strong>, a review of AI methods for solving the partial differential equations (PDEs) of fluid mechanics, compared with standard numerical methods. The upper panel shows the distribution of reported AI performance claims across baseline types. Although most studies reported faster performance, most of these positive results were based on comparisons against weak baselines rather than strong numerical methods. The lower panel shows examples of how the AI advantages reported in individual papers change when weaker baselines are replaced by stronger numerical baselines. 
In many cases, the claimed speedup is substantially reduced, disappears, or turns into a slowdown under the stronger comparison.<\/em><\/figcaption><\/figure>\n\n\n\n<p>As the quote attributed to von Neumann goes, \u201cWith four parameters I can fit an elephant, and with five I can make him wiggle his trunk.\u201d All models are wrong, including physics-based numerical methods and climate models, but many are useful because different well-performing models can still reveal different aspects of the same physical system <strong>[c]<\/strong>. The physicist Paul Dirac reached a similar conclusion long ago: due to the limitations of human cognitive ability, scientific theories cannot be both closed and complete at the same time <strong>[d]<\/strong>. This means that we cannot have perfect, exact theories. Dirac saw that theories based on approximations could sometimes have a considerable amount of beauty in them, and he began to infer that perhaps all theories of nature are, ultimately, only approximations <strong>[d]<\/strong>. Personally, I think the same rule could apply to machine learning models, and indeed to all models.<\/p>\n\n\n\n<p>My journey on the METEOR made me appreciate the importance of data even more from a modeler\u2019s perspective, and it deepened my belief that the people who create datasets deserve more applause and respect from the entire scientific community. Because models are uncertain, data, especially observational data, become extremely important for understanding reality. Nature itself is the ultimate database, and its ocean of data is too vast to be compressed into a single dataset.<\/p>\n\n\n\n<p>Machine learning models and climate reanalysis systems need such high-quality data to be reliable in real-world applications. 
Traditional numerical weather prediction and climate models, including general circulation models, have comprehensive physical foundations but require enormous computational resources, have limited spatial resolution, and struggle to integrate multi-source observations such as station, satellite, and radar data <strong>[e]<\/strong>. Although AI-based weather models have developed rapidly in recent years, they still suffer from inconsistent training datasets, time periods, and regions, varying evaluation metrics, and a lack of standardized code and experimental workflows &#8211; issues similar to those previously mentioned for AI-based approaches to solving PDEs in fluid mechanics <strong>[e]<\/strong>. Under these circumstances, data collected by METEOR, along with all observational data, are necessary for accurately modeling weather and climate, as well as for developing model architectures for them (Figure 3). A good model should embody a trinity of observational results, physical insight, and mathematical formalism. 
These three aspects should correspond perfectly, with no redundancy.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_4_reduced-1024x1024.jpg\" alt=\"\" class=\"wp-image-84\" srcset=\"https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_4_reduced-1024x1024.jpg 1024w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_4_reduced-300x300.jpg 300w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_4_reduced-135x135.jpg 135w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_4_reduced-768x768.jpg 768w, https:\/\/www.oceanblogs.org\/m219\/wp-content\/uploads\/sites\/111\/2026\/06\/Blog_8_Figure_4_reduced.jpg 1430w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 3: Schematic of the global observing system used to collect data for modeling. Adapted from <strong>[f]<\/strong>.<\/em><\/figcaption><\/figure>\n\n\n\n<p>But what exactly can we do to make progress under data-limited conditions? And which scientifically important problems can be clearly formulated and addressed within an analytical modeling framework? These questions remain like dark clouds hanging over scientists working in related fields. A model that is mathematically beautiful and physically simple may still be inconsistent with observations. Some models considered correct may be mathematically unattractive, and their physical mechanisms may not be clearly explained either. 
My personal opinion is that, when dealing with this kind of situation in the age of AI, we may still need to rely on our own intuition (and even guesswork) as we try to understand reality, with the help of the many scientists who use observations and models to sail by night and expand the boundaries of human knowledge in tiny steps.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><strong><span style=\"text-decoration: underline\">Note:<\/span><\/strong> The stated goal of Artificial Intelligence (AI) is to mimic human behavior in an intelligent manner and to do what humans can do, such as driving cars, playing games, and responding to consumer questions, including a form of artificial \u201ccreativity\u201d. In that sense, AI seeks to recreate both the muscle and the mind of humans, and the mind requires learning from data, i.e. Machine Learning (ML). However, Machine Learning can also learn from data in ways that go beyond mimicking humans. That said, the boundaries between AI and ML are getting blurrier by the day.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>References:<\/p>\n\n\n\n<p>[a] Charniak, E. An Introduction to Deep Learning. Cambridge, MA: MIT Press, 2019; 192.<\/p>\n\n\n\n<p>[b] McGreivy, N. I got fooled by AI-for-science hype\u2014here&#8217;s what it taught me. 2025. https:\/\/www.understandingai.org\/p\/i-got-fooled-by-ai-for-science-hypeheres<\/p>\n\n\n\n<p>[c] Fisher, A.; Rudin, C.; Dominici, F. All Models Are Wrong, but Many Are Useful: Learning a Variable\u2019s Importance by Studying an Entire Class of Prediction Models Simultaneously. Journal of Machine Learning Research 2019, 20(177), 1\u201381.<\/p>\n\n\n\n<p>[d] Dirac, P. A. M. The Principles of Quantum Mechanics. Oxford University Press: Oxford, 1930.<\/p>\n\n\n\n<p>[e] Bansal, H.; Grover, A.; Jewik, J.; Nguyen, T.; Sharma, P. ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling. In Advances in Neural Information Processing Systems 36; 2023; pp 75009\u201375025. 
https:\/\/doi.org\/10.52202\/075280-3279.<\/p>\n\n\n\n<p>[f] Global Observing System (GOS). World Meteorological Organization. https:\/\/community.wmo.int\/site\/knowledge-hub\/programmes-and-initiatives\/global-observing-system-gos<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Qi-Fan Wu (Niels Bohr Institutet, University of Copenhagen) In 1943, when Warren McCulloch and Walter Pitts showed that neurons could be represented by simple electrical circuits, they laid the first foundation for machines that could learn, adapt, and predict. In 2023, when ChatGPT became widely used, my Introduction to Python professor found that it [&hellip;]<\/p>\n","protected":false},"author":272,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-80","post","type-post","status-publish","format-standard","hentry","category-at-sea"],"_links":{"self":[{"href":"https:\/\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/posts\/80","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/users\/272"}],"replies":[{"embeddable":true,"href":"https:\/\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/comments?post=80"}],"version-history":[{"count":1,"href":"https:\/\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/posts\/80\/revisions"}],"predecessor-version":[{"id":85,"href":"https:\/\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/posts\/80\/revisions\/85"}],"wp:attachment":[{"href":"https:\/\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/media?parent=80"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/categories?post=80"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/
\/www.oceanblogs.org\/m219\/wp-json\/wp\/v2\/tags?post=80"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}