A knapsack-based entropy-clustering framework for multi-criteria decision making under epistemic uncertainty

[[Papers_Eval#EAAI|EAAI]] Received 5 October 2025; Received in revised form 7 February 2026; Accepted 23 February 2026

Keywords: AI, ML, Opt alg, MI, Hybrid Knapsack-Clustering, AI-based sustainable socio-economic welfare assessment

Abstract:

  • BG: Assessing sustainable socio-economic welfare using AI
    • Indicators: interdependent environmental, social, economic
    • Uncertainty about their relative importance
    • Applied to cross-country sustainability and socio-economic data
  • Aim: AI-based decision support framework for BG (assessment)
    • identify the most informative indicators
    • reveal structural differences among countries
    • evaluate how alternative welfare representations affect national performance rankings.
  • Method: hybrid AI-based methodology
    • KP-based CO for variable selection
    • MI-driven clustering for structural grouping
    • MCDM techniques for performance evaluation
  • Novelty:
    • treats welfare assessment as a problem of informational uncertainty and structural heterogeneity
    • uncovers distinct welfare profiles
    • country rankings vary systematically depending on whether sustainability-oriented or socio-economic-oriented indicators are emphasized.
    • socio-economic profile exhibits greater dispersion and differentiation across countries
    • sustainability-oriented profile produces more clustered performance patterns
    • Sensitivity and robustness analyses confirm that the differences are structurally driven.
    • Welfare rankings are contingent on the informational structure of the indicators, which highlights the importance of variable selection in policy evaluation.
  • A generalized AI-enabled framework for high-dimensional decision analysis.
  • A trade-off between sustainability and socio-economic development.

Intro

Motivations

Sustainable socio-economic welfare, sustainability, sustainable development

  • a balanced approach to economic growth, social development, and environmental protection.
  • economic growth, social equity, environmental protection, and the sustainability of resources.

Socio-economic welfare,

  • The well-being of individuals and communities in terms of their economic and social conditions.
  • includes factors like income, employment, education, health, and quality of life

Measures:

  • green economy, renewable energy, sustainable agriculture, and green technologies
  • United Nations' Sustainable Development Goals are a major framework guiding research and policy.
  • developing new metrics to measure welfare in a sustainable context.
  • GDP and similar measures are being supplemented or replaced by indicators that also account for environmental and social factors
  • Research is focused on sustainable technologies, circular economy models, and innovation in policy and social practices

Aims

  • Practically, provides a structured, data-driven tool for understanding the trade-offs between socio-economic performance and environmental sustainability across countries for policymakers and sustainability analysts.
  • Methodologically, develops a hybrid approach that integrates KP-based variable selection, MI clustering, and MCDM to manage high dimensional welfare data and generate interpretable performance profiles.

Contributions

Limitations for now: interpretability and policy relevance

  • Indicator selection is typically treated as exogenous, relying on predefined frameworks (SDGs, composite welfare indices), with limited examination of how alternative indicator configurations may fundamentally alter welfare assessments.
    • Rankings are often presented as stable or objective, even though they implicitly embed normative assumptions about what dimensions of welfare matter most.
  • High-dimensional indicator sets introduce substantial informational redundancy, leading to overlapping signals that can distort rankings and mask meaningful heterogeneity across countries.
    • Most MCDM and composite-index approaches do not explicitly address this redundancy, despite its direct implications for uncertainty and robustness.
  • Clustering methods, typically employed to classify countries, are rarely integrated with ranking procedures in a way that reveals how structural groupings and performance evaluations interact under competing welfare narratives.
    • Clustering is usually performed independently of variable selection, which obscures meaningful structural differences between welfare profiles.
  • Prior research offers little agreement on which variables should be prioritized; this lack of consensus introduces substantial noise into welfare assessments and undermines comparability across countries.
  • High-dimensional data are often used without addressing the risk of multicollinearity and information overlap, which can distort rankings and lead to unstable policy conclusions.
  • MCDM applications treat indicator sets as fixed, without considering how alternative variable configurations shift the rankings or introduce trade-offs for policymakers.
  • Lack of sensitivity analysis limits the interpretability and policy relevance of existing frameworks.

Contributions: reframing sustainable socio-economic welfare assessment as a decision problem under epistemic uncertainty

  • Both the relevance of indicators and the structure of country groupings are unknown a priori.
  • The framework endogenously identifies alternative welfare profiles while explicitly managing informational redundancy.
  • Clustering and TOPSIS-based (MCDM) evaluation then demonstrate that country rankings are not fixed but contingent on the informational structure of the indicator set.
  • This study extends the focus from constructing ever larger welfare indices to understanding how different representations of welfare systematically shape comparative performance and policy interpretation.

Objectives

  • a systematic variable-selection procedure
    • Capable of distinguishing the relative importance of sustainability- and socio-economic-oriented indicators in welfare assessment.
  • Identify how countries cluster and diverge under alternative indicator profiles
    • Clarifying the trade-off between su and se performance.
  • Evaluate how these different profiles affect country rankings and policy-relevant interpretations through an integrated MCDM approach.

Contributions:

  • a novel hybrid framework.
  • a structured way to separate and compare su- and se-oriented indicator sets
    • offering new insights into how variable selection shapes welfare evaluations
  • generates empirically grounded country clusters.
    • reveals distinct performance patterns and their underlying drivers.
  • demonstrates the policy implications of these differences
    • how alternative indicator emphases can materially alter welfare rankings and the strategic priorities that follow.
  • Methodological toolkit for welfare analysis & substantive understanding of how nations balance su and se development.

The hybrid knapsack–mutual information–clustering methodology responds to these gaps with an approach that can simultaneously

  • distinguish between essential and non-essential indicators,
  • reduce informational redundancy,
  • and expose how different variable profiles shape welfare outcomes.

It provides a principled way to

  • identify the most informative variables,
  • partition countries into meaningful groups based on distinct welfare dynamics,
  • and explicitly compare how sustainability-oriented versus socio-economic-oriented profiles influence rankings.

Relevant theoretical foundations.

Theoretical foundations that explain how environmental, social, and economic dimensions interact and can be modeled analytically.

Theory

  • Welfare economics: emphasizing that well-being includes not only market output but also externalities, equity, and public goods.
    • motivates the need for comprehensive indicator systems that capture trade-offs among competing welfare objectives.
  • Sustainability theory: emphasizes intergenerational equity and the preservation of natural, social, and economic capital.
    • underscores the necessity of integrating environmental indicators into welfare assessment frameworks.
  • Information theory: using entropy and MI to measure uncertainty, redundancy and informative relevance in complex datasets.
    • Supports the argument that welfare datasets often contain overlapping information, requiring systematic variable reduction to avoid biased or unstable evaluations.
  • Clustering theory: assumes that populations contain latent subgroups defined by shared structural characteristics
    • aligns with the view that countries follow heterogeneous development paths.
    • makes cluster-based analysis important.
  • MCDM theory: offers a normative framework for evaluating alternatives when criteria conflict or scales differ.
    • TOPSIS and related methods to compare welfare profiles under different indicator configurations.
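
For reference, the two information-theoretic quantities named in the list above have the following standard definitions for discrete variables. The symbols $X$, $Y$ and $p$ are notation introduced here, not taken from the paper, and the paper's own estimator may differ:

$$
H(X) = -\sum_{x} p(x)\,\log p(x), \qquad
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} = H(X) + H(Y) - H(X,Y)
$$

$I(X;Y) = 0$ exactly when $X$ and $Y$ are independent, so a positive value signals shared (redundant) information between two indicators.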

Details:

  • Factors influencing su & se welfare
    • Economic
      • Indicators such as GDP per capita, employment rates, and innovation capacity serve as key drivers of socio-economic welfare
      • Employment rate, quality of employment, job security, inclusivity
      • Remittances contribute significantly to socio-economic stability, especially in developing economies, by supplementing household incomes and boosting consumption
      • multi-dimensional indicators such as financial inclusion expand the measurement of se welfare, e.g. an FI-Index based on banking penetration, disbursement, and service access
    • Social
      • good governance and political stability
      • voice and accountability, control of corruption, and the democracy index
      • Effective governance
    • Environmental
      • $CO_{2}$ emissions per capita, energy consumption
      • Efficient energy use, reduced emissions
  • Selection of relevant variables
    • GDP fails to account for environmental degradation and social inequalities.
    • MCDM frameworks incorporate economic, social, and environmental indicators to evaluate complex problems involving multiple variables and conflicting objectives, but redundant or irrelevant variables can distort outcomes.
  • Knapsack Problem:
    • In the context of sustainable socio-economic welfare, the knapsack problem helps identify the most impactful variables while accounting for data availability and resource constraints. This can improve the efficiency and reliability of MCDM models.
    • Unlike PCA/FA, direct variable selection preserves the original indicators and their policy interpretability. This matters in welfare assessment, where stakeholders require transparent and actionable metrics rather than abstract composite factors.
    • Multiple-choice and conflict KP variants, solved with modern heuristics, are applicable to constrained variable selection in high-dimensional indicator problems.
  • Clustering techniques
    • used to group countries or regions based on shared characteristics, enabling researchers to identify patterns and heterogeneity within datasets.
    • K-means and hierarchical clustering are used to classify countries based on welfare and sustainability indicators.
    • helps policymakers tailor strategies to specific country groups, as nations within the same cluster may share similar challenges and opportunities.
  • MI
    • enhances clustering by measuring the dependencies and relationships between variables.
    • ensures the clustering process captures unique and relevant patterns by providing insight into the degree of association between selected and unselected variables.
    • Minimizing MI reduces redundancy and highlights the distinctive characteristics of each cluster, especially in high-dimensional data with complex interrelationships among variables.
    • It captures both linear and nonlinear dependencies between variables.
  • Integrating
    • variable selection + clustering techniques combined into a hybrid approach
    • the TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) method has been widely used to rank alternatives based on their proximity to an ideal solution
  • TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution)
    • in sustainable materials selection, industrial digital supplier evaluation for SMEs, and complex problems such as smog mitigation strategies and SDG progress assessment
    • evaluate complex supplier selection and economic performance scenarios
    • support operational optimization, resource allocation, and performance appraisal tasks.
    • has been used in transportation planning, energy policy assessment, risk evaluation, and socio-economic development analysis, confirming its suitability for problems requiring the ranking of alternatives based on proximity to ideal performance benchmarks.
    • usefulness of hybrid MCDM approaches for evaluating socio-economic development
  • Hybrid MCDM
    • assessing cross-country performance.
    • evaluate country-level drivers of investment attractiveness.
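
To illustrate the MI point in the list above (that it captures nonlinear as well as linear dependence), here is a minimal sketch of a plug-in MI estimate for discrete data. The function name `mutual_information` and the toy vectors are illustrative assumptions; the paper's own estimator is not recorded in these notes, and continuous indicators would need to be binned first.

```python
import numpy as np

def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in nats for two equally long discrete sequences."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1)      # contingency table of co-occurrences
    joint /= joint.sum()               # joint probabilities p(a, b)
    # Product of the marginals p(a) * p(b), same shape as the joint table.
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0                     # empty cells contribute nothing
    return float((joint[nz] * np.log(joint[nz] / outer[nz])).sum())

# Toy example (made-up values): y depends on x only through x**2.
x = np.array([-2, -1, 0, 1, 2] * 4)
y = x ** 2
pearson = np.corrcoef(x, y)[0, 1]      # linear correlation misses the link
mi = mutual_information(x, y)          # MI is clearly positive
print(f"Pearson r = {pearson:.3f}, MI = {mi:.3f} nats")
```

Here the relationship is purely quadratic and symmetric, so the linear correlation vanishes while the MI stays positive.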

Methodology

  • Epistemic uncertainty in MCDM
    • refers to uncertainty arising from incomplete knowledge about the relevance, redundancy, and structural relationships among welfare indicators
    • emerges from the coexistence of multiple plausible indicator sets, overlapping information across variables, and ambiguity regarding how sustainability and socio-economic dimensions should be represented.
    • not uniformly minimized
      • In the variable selection stage, uncertainty is balanced to preserve alternative welfare representations
        • This stage addresses uncertainty regarding indicator relevance and informational redundancy, identifying alternative welfare representations under constrained informational capacity. Entropy serves as the guiding metric, ensuring that indicator selection reflects informational contribution rather than normative preference.
      • In clustering stage, reduced to clarify latent structural groupings
        • The indicator space is endogenously determined by the variable selection stage; MI-based clustering then uncovers latent structural groupings among countries while minimizing overlapping information within clusters.
        • This stage addresses uncertainty related to cross-country heterogeneity and structural similarity, which cannot be resolved through ranking alone.
        • Clustering follows the selection stage because the informational structure of the indicator set directly shapes the resulting country groupings.
      • In the evaluation stage, uncertainty is held constant under a fixed TOPSIS rule
        • TOPSIS as evaluation layer to assess country performance within each indicator profile.
        • TOPSIS is held constant across scenarios, ensuring that differences in rankings can be attributed to changes in informational structure rather than the decision rule.
          • informational structure <=> indicator profile
      • This ensures observed differences in rankings can be attributed to informational structure rather than changes in preference modeling or decision logic.
      • The epistemic uncertainty can be managed upstream (through selection and structuring) while downstream (TOPSIS) evaluation remains transparent and comparable.
  • Cluster alternatives and reduce criteria by KP-based approach
    • What are the alternatives?
  • Trade-off between maximizing and minimizing entropy
  • Uncertainty:
    • quantified by information theory measures
      • Entropy is used to quantify the informational dispersion of each criterion across countries, serving as a proxy for its potential contribution under uncertainty.
      • Entropy values serve as the value function in the KP optimization, which maximizes information content under a capacity constraint.
  • MI
    • employed to measure Informational Redundancy and Dependency between clusters
    • optimize the clustering structure by minimizing overlapping information through a hill-climbing procedure.
  • KP constraints
    • capacity parameter is endogenously determined by balancing the total entropy of selected and unselected indicator sets, ensuring that no single welfare narrative dominates due to informational imbalance.
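
As a rough sketch of the entropy measure described above, the snippet below scores each criterion by the Shannon entropy of its values after equal-width binning. The binning rule, the `bins` parameter and the function name `criterion_entropy` are assumptions made for illustration; these notes do not record how the paper estimates entropy.

```python
import numpy as np

def criterion_entropy(X, bins=5):
    """Shannon entropy (bits) of each column of X after equal-width binning.

    Rows are countries, columns are criteria. A column whose values spread
    over many bins scores high; a constant column scores zero.
    """
    X = np.asarray(X, dtype=float)
    entropies = []
    for column in X.T:
        counts, _ = np.histogram(column, bins=bins)
        p = counts[counts > 0] / counts.sum()   # drop empty bins
        entropies.append(abs(float((p * np.log2(p)).sum())))
    return np.array(entropies)

# Hypothetical data: 8 countries x 2 criteria (made-up numbers).
X = np.array([[0, 5], [1, 5], [2, 5], [3, 5],
              [4, 5], [5, 5], [6, 5], [7, 5]])
H = criterion_entropy(X, bins=4)
print(H)  # first criterion is evenly spread, second is constant
```

Under this assumption a criterion that barely varies across countries carries almost no value into the KP stage, while a widely dispersed one carries the most.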

A sequential decision process in which different sources of epistemic uncertainty are addressed at distinct methodological stages. (selection and integration of methods follow deliberate analytical logic.)

Alternatives, alternative welfare profile/configuration

Entropy-based KP clustering for MCDM

KP
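
A minimal sketch of the selection step as a 0/1 knapsack, assuming each indicator has an entropy value and an integer cost. The cost vector and the capacity used here are hypothetical placeholders: the notes say the capacity is set by balancing the entropy of selected and unselected sets, but not how, and the paper may use a heuristic rather than the exact dynamic program shown.

```python
def knapsack_select(values, costs, capacity):
    """Exact 0/1 knapsack by dynamic programming.

    values   : information value of each indicator (e.g. its entropy)
    costs    : non-negative integer cost of each indicator (hypothetical)
    capacity : integer budget
    Returns (best_total_value, sorted list of chosen indices).
    """
    n = len(values)
    # best[i][c] = max total value using the first i indicators within budget c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], costs[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]                  # skip indicator i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v      # take indicator i-1
    # Walk back through the table to recover which indicators were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= costs[i - 1]
    return best[n][capacity], sorted(chosen)

# Made-up entropy values and costs for five indicators.
entropy_values = [1.9, 0.8, 1.5, 1.2, 0.4]
indicator_costs = [3, 1, 2, 2, 1]
total, selected = knapsack_select(entropy_values, indicator_costs, capacity=5)
print(selected, round(total, 2))
```

Because the output is a subset of the original indicators, the selected set stays directly interpretable, which is the advantage over PCA/FA noted under Details.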

K-means clustering
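
A minimal NumPy sketch of K-means (Lloyd's algorithm) for grouping countries in the selected indicator space. The seeding rule, the iteration cap and the toy data are assumptions for illustration; in practice the indicators would usually be standardized first, which this sketch omits.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster went empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data (made up): two well-separated groups of three "countries".
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centers = kmeans(X, k=2)
print(labels)
```

The cluster numbering itself is arbitrary; only the grouping matters when the labels are passed on to later stages.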

MI optimization via hill climbing

Integration with TOPSIS and epistemic uncertainty
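
A minimal sketch of the standard TOPSIS procedure, held fixed while the indicator profile changes. Equal weights, vector normalization and the benefit/cost flags are assumptions for illustration; the notes do not record the weighting used in the paper, and the numbers below are made up.

```python
import numpy as np

def topsis(X, weights, benefit):
    """Closeness of each alternative to the ideal solution, in [0, 1].

    X       : alternatives x criteria matrix
    weights : criterion weights (rescaled here to sum to 1)
    benefit : True where larger is better, False where smaller is better
    Assumes the alternatives are not all identical (else 0/0 below).
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    # 1. Vector-normalize each criterion column, then apply the weights.
    V = X / np.linalg.norm(X, axis=0) * (weights / weights.sum())
    # 2. Ideal and anti-ideal points, respecting benefit vs cost criteria.
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    # 3. Euclidean distance of every alternative to both reference points.
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)

# Three hypothetical countries; column 0 is a benefit, column 1 a cost.
X = np.array([[3.0, 1.0], [2.0, 2.0], [1.0, 3.0]])
scores = topsis(X, weights=[1, 1], benefit=[True, False])
ranking = np.argsort(-scores)   # best country first
print(scores, ranking)
```

Running the same function on the su-oriented and the se-oriented indicator columns in turn gives two rankings that differ only because the informational structure differs, which is the comparison the notes describe.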

Analysis and discussion

Results

Discussions

Conclusions

Concluding remarks

Limitations and future research