AI: From Data Traps to a Revolution in Thinking


Introduction

Artificial intelligence is not just a promise of revolution, but also a minefield of technical pitfalls. From data errors and ethical challenges to fundamental questions about the nature of thinking, AI compels us to revisit existing paradigms. This article analyzes the journey of artificial intelligence from practical engineering problems to its role in redefining human rationality, showing how technology becomes a mirror reflecting our own limitations and aspirations.

ML Pitfalls: Preventing Data Errors

The effectiveness of AI begins with data discipline. The most common pitfalls include improper dataset splitting, data leakage (label information seeping into the features), inadequate metrics (e.g., accuracy in rare-event detection), and domain mismatch between training and production environments. Avoiding these errors requires rigorous practice: strict isolation of dataset splits, feature auditing, metric selection aligned with the cost of each error, and tests of sensitivity to context changes.
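To make the leakage pitfall concrete, here is a minimal sketch, assuming scikit-learn and synthetic data: fitting a scaler on the full dataset before splitting contaminates the training features with test-set statistics, while a pipeline fitted after the split keeps them isolated.

```python
# A minimal sketch of avoiding preprocessing leakage, assuming
# scikit-learn is available; the data here is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.integers(0, 2, size=1000)

# Leaky: StandardScaler().fit_transform(X) before the split lets
# test-set statistics contaminate the training features.

# Safe: split first, then fit all preprocessing on training data only,
# which a Pipeline enforces automatically.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```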

In practice, as in financial fraud detection, the system must often operate in real time. An effective solution combines tabular models (e.g., gradient boosting) with a graph component for detecting anomalies in networks of connections. The key here is not so much the model's complexity as its reliability and its ability to degrade gracefully when a component fails.
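A hedged sketch of such graceful degradation might look as follows; the graph-service interface and the 0.7/0.3 blend are hypothetical placeholders, not a design specified in the article.

```python
# A sketch of graceful degradation in a fraud-scoring service.
# `graph_service.anomaly_score` and the blend weights are hypothetical.
def score_transaction(features, tabular_model, graph_service, timeout_s=0.05):
    # Baseline score from the tabular (gradient-boosting) model.
    base = tabular_model.predict_proba([features])[0][1]
    try:
        # Enrich with the graph anomaly signal when the service responds.
        graph = graph_service.anomaly_score(features, timeout=timeout_s)
        return 0.7 * base + 0.3 * graph
    except (TimeoutError, ConnectionError):
        # Degrade gracefully: fall back to the tabular score alone.
        return base
```

Injecting the graph component as a dependency keeps the fallback path explicit and trivial to test in isolation.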

Multimodal Models and RAG: A Revolution in AI Capabilities

New architectures radically expand AI capabilities. Multimodal models such as CLIP learn to combine different data types (text, image, sound) into a shared semantic space, bringing machine cognition closer to human understanding. Meanwhile, RAG (Retrieval-Augmented Generation) addresses the problem of frozen knowledge in language models: it connects them to external databases, so that responses are anchored in current and verifiable sources.
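The RAG loop itself fits in a few lines; the embedder, vector index, and llm client below are generic placeholders rather than any specific library's API.

```python
# A minimal RAG sketch: retrieve relevant passages, then ground the
# prompt in them. `embed`, `index`, and `llm` are placeholder interfaces;
# any embedding model and vector store could fill these roles.
def answer_with_rag(question, embed, index, llm, k=3):
    q_vec = embed(question)                      # embed the query
    passages = index.search(q_vec, top_k=k)      # retrieve k nearest documents
    context = "\n\n".join(p.text for p in passages)
    prompt = (f"Answer using only the sources below.\n\n"
              f"Sources:\n{context}\n\nQuestion: {question}")
    return llm.generate(prompt)                  # answer anchored in retrieved text
```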

However, these tools introduce new challenges. Content moderation becomes one of the most difficult problems at the intersection of AI and society. An algorithm must distinguish hate speech from satire, a task that is almost philosophical in nature. Since hate speech constantly evolves, effective moderation requires adaptive systems, transparent processes, and human oversight.

AI Redefines Human Rationality

Artificial intelligence redefines our understanding of rationality. Instead of a Platonic ideal based on ironclad logic, AI embodies Herbert Simon's concept of bounded rationality – it operates heuristically, seeking a compromise between accuracy and cost. Its operation can also be compared to Kahneman's thinking systems: neural networks resemble the fast System 1, while symbolic algorithms are akin to the slow and analytical System 2.

Moreover, algorithms are becoming hidden political mechanisms. As Arrow's impossibility theorem demonstrates, no ideal system for aggregating preferences exists. Similarly, every recommendation algorithm is biased – the choice of optimization metric is a political decision that shapes our collective reality.
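A small worked example makes Arrow's point tangible: with hypothetical ballots, the same preferences produce different winners depending on the aggregation rule, just as a recommender's ranking depends on the metric it optimizes.

```python
# Hypothetical ballots over candidates A, B, C: the same preferences
# yield different winners under plurality vs. Borda count, illustrating
# that the choice of aggregation rule decides the outcome.
ballots = [("A", "B", "C")] * 4 + [("B", "C", "A")] * 3 + [("C", "B", "A")] * 2

plurality, borda = {}, {}
for ranking in ballots:
    plurality[ranking[0]] = plurality.get(ranking[0], 0) + 1  # first places only
    for points, cand in zip((2, 1, 0), ranking):              # 2/1/0 Borda points
        borda[cand] = borda.get(cand, 0) + points

print(max(plurality, key=plurality.get))  # A: most first-place votes
print(max(borda, key=borda.get))          # B: best average rank
```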

Conclusion

AI emerges from a long history of intellectual revolutions – from Aristotle's logic and Pascal's probability calculus to Wiener's cybernetics. Philosophers like Lem, however, warned that intelligence reduced to information processing is empty – machines can reason, but they lack intention. Today, AI is becoming a new cultural myth, even a form of the sacred. Narratives about superintelligence organize collective emotions and justify investment. AI is therefore not just a technology but also a ritual that compels us to redefine the boundaries of knowledge and values.


Frequently Asked Questions

What are the main pitfalls in AI data and architectures?
The main pitfalls are data leakage, inadequate metrics, domain shift between training and production, and problems within the architectures themselves, such as over-smoothing in graph networks or context-window limits in transformers.
What is 'data leakage' and why is it dangerous?
Data leakage occurs when information unavailable at prediction time, such as labels or test-set statistics, subtly seeps into the training data. It is dangerous because it leads to artificially inflated results during evaluation and spectacular failures in real-world deployment.
Why are traditional metrics like accuracy insufficient for some AI applications?
In applications with imbalanced classes, such as detecting rare events like fraud, high accuracy can be misleading: a model that never flags the rare class can still score above 99%. In such cases, metrics such as PR-AUC or the F-measure are more informative, as they better reflect the cost of errors for the minority positive class.
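A minimal illustration, assuming scikit-learn: an "always negative" model on a dataset with 1% positives.

```python
# On a 1%-positive dataset, a model that never flags fraud scores 99%
# accuracy yet is useless; average precision (PR-AUC) exposes the failure.
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

y_true = np.zeros(1000)
y_true[:10] = 1                                    # 1% positive class
y_pred = np.zeros(1000)                            # "always negative" model
scores = np.zeros(1000)                            # no ranking signal at all

print(accuracy_score(y_true, y_pred))              # 0.99 -- looks great
print(average_precision_score(y_true, scores))     # ~0.01 -- reveals the truth
```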
What technological solutions help AI cope with data and knowledge limitations?
Two complementary solutions stand out: Retrieval-Augmented Generation (RAG), which enriches language models with an external, real-time retrieval mechanism, and multimodality, which lets AI process and combine different types of data to build a more complete picture of reality.
How does multimodality change the capabilities of artificial intelligence?
Multimodality allows AI to integrate information from various digital senses (text, images, sound), bringing it closer to human cognition. This allows models to better understand context, search for images based on text, and generate visualizations based on linguistic commands.
How does social choice theory relate to the operation of AI algorithms?
Social choice theory, especially Arrow's impossibility theorem, shows that no perfect preference aggregation system exists. Recommendation algorithms or rating systems in AI operate as "micro-democracies," where the choice of metrics to optimize is always a political choice, not a neutral technical one.

Tags: Machine learning, Data leakage, Evaluation metrics, Domain drift, Graph networks, Transformers, Fraud detection, Multimodality, Retrieval-Augmented Generation (RAG), Content moderation, AI Rationality, Social choice theory, AI Architecture, Data Discipline, Semantic evaluation