### 1 Introduction

This study employs Feature Extraction (*FE*) methods, specifically Principal Component Analysis (*PCA*) and Auto-encoder (*AE*), as initial steps to eliminate redundant data sampling from random input variables [1,3]. Furthermore, we integrate the Independent Features Test (*IFT*) [4] into the proposed method to trim the number of input random variables. A criterion relying on entropy-based correlation coefficients [5] is established to determine whether the input data should undergo *FE* or Feature Selection (*FS*). Subsequently, we incorporate a copula function [6,7] to model the intricacies of correlated input variables in the framework precisely. The copula function, renowned for its capacity to amalgamate various marginal distributions related to the inputs, is pivotal in recreating a joint distribution of correlated random variables [7]. Lastly, we harness two machine learning techniques, Artificial Neural Network (*ANN*) [8,9] and Probabilistic Neural Network (*PNN*) [10], to alleviate the computational burden associated with predicting the response and reliability of multidisciplinary engineering systems. We validate the efficacy of this proposed framework through a reliability analysis involving multiple disciplines characterized by complex input data. These examples underscore how the framework can accurately forecast the behavior of multidisciplinary engineering systems while minimizing computational overhead.

The paper is structured as follows: Section 2 outlines the data reduction techniques employed in the proposed framework to simplify the complexity of input random variables. Section 3 introduces the surrogate modeling techniques (*ANN* and *PNN*) utilized for response and reliability prediction in the systems. In Section 4, we present a flowchart delineating the proposed data reduction framework. Finally, in Section 5, we apply the framework to two practical multidisciplinary engineering problems to demonstrate its effectiveness and advantages.

### 2 Data Reduction

Data reduction techniques can be broadly categorized into *FE* and *FS*. The key distinction between these techniques lies in their purpose: *FS* methods are employed to identify the best subset of the original input features, while *FE* methods generate new features within a lower-dimensional space through transformations of the original features. In certain scenarios, both approaches may be employed sequentially. In this study, we propose a guideline (Section 4) to facilitate the optimal selection between *FE* and *FS*, based on the calculated entropy-based correlation coefficient of the input data.

### 2.1 Feature Extraction

*FE* (Fig. 2) is a technique that transforms high-dimensional data into new features of reduced dimensionality. It converts the *N* dimensions of the original data into a smaller set of *M* dimensions, where *M* is less than *N*. This process involves mapping and serves to eliminate redundant information from the input dataset. In this section, we delve into two highly effective *FE* methods: *PCA* and *AE*.

#### 2.1.1 Principal Component Analysis (PCA)

*PCA* is a statistical technique for transforming a correlated dataset into an uncorrelated or independent one via the linear transformation *Y* = *PX*. In this context, *X* denotes the original dataset, while *Y* signifies a restructured representation of *X*. Both *X* and *Y* possess dimensions of *m* (the number of observations or cases) by *n* (the number of features). The matrix *P*, an *n*-by-*n* transformation matrix, is constituted by columns that correspond to the eigenvectors of *X*′*X*. Fundamentally, *PCA* reduces data dimensions through eigenvector decomposition. For instance, one can utilize a covariance matrix, *C*_{Y}, to derive the *P* matrix endowed with the orthonormal property [11]. Let *A* = *XX*^{T}, where *A* is a symmetric matrix. *A* is constructed from the eigenvectors forming the rows of the *P* matrix, where *P* = *E*^{T}; consequently, we have *A* = *P*^{T}*DP*. A diagonal matrix, *D*, can be linked with the eigenvector matrix *E* as *A* = *EDE*^{T}. Because of the property of orthonormal matrices, the inverse of *P* equals its transpose. Thus, the principal components of *X* are the eigenvectors of *XX*^{T}; with *C*_{X} = (*n* − 1)^{−1}*XX*^{T}, each diagonal element of *C*_{Y} corresponds to the variance of *X* along a principal component.
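The eigendecomposition route described above can be sketched numerically. The following NumPy example is a minimal illustration (array orientation and variable names are chosen for NumPy convenience, not taken from the paper): it decorrelates a centered dataset and truncates it to the *M* leading components.

```python
import numpy as np

rng = np.random.default_rng(0)

# m observations x n features; center the columns first
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)

# sample covariance C_X = (m - 1)^-1 X'X  (features x features)
C = X.T @ X / (X.shape[0] - 1)

# eigendecomposition; columns of E are orthonormal eigenvectors
vals, E = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]       # sort by descending variance
vals, E = vals[order], E[:, order]

P = E.T                              # transformation matrix, P = E^T
Y = X @ P.T                          # transformed (decorrelated) data

# the covariance of Y is diagonal: off-diagonal terms vanish
C_Y = Y.T @ Y / (Y.shape[0] - 1)
assert np.allclose(C_Y - np.diag(np.diag(C_Y)), 0, atol=1e-10)

# keep the M leading components for dimensionality reduction
M = 2
Y_reduced = Y[:, :M]
```

Each diagonal entry of `C_Y` is the variance captured by one principal component, so truncating to the leading `M` columns discards the least-informative directions.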

#### 2.1.2 Auto-encoder (AE)

*AE*s are a special type of *ANN* [4]. *ANN*s are inspired by the human nervous system and can be trained to predict output values or patterns based on those recognized from the input. *AE*s aim to reduce the dimensionality of data by transforming it into a compressed representation. As shown in Fig. 3, *AE*s are composed of neural networks with multiple layers. In general, an *AE* has three layers: an input layer, a hidden layer, and an output layer. To transform input data, the data is passed through the hidden layer using a weight function and a bias. The output data is a reconstruction of the original input data. Training an *AE* aims to minimize the difference between the input and output layers; in other words, training reduces the mean squared error (MSE) of the reconstruction [12]. Because *AE*s use a reduced number of hidden units, they can be used for dimensionality reduction. Additionally, *AE*s share the weight values and biases between the previous and subsequent layers [13]. The reconstruction error (*re*) can be calculated using the squared-error cost function, where *W*, *x*, *b*, and *X* represent the weights, input units, bias, and input vector, and *W*^{T} and *b*^{T} denote the transposes of *W* and *b*. The *AE* incorporates a distinctive criterion, such as sparsity, which is closely linked to the reduction of hidden units and, by extension, data reduction. Given that an *AE* essentially aims to have its output data equal its input data, it evidently endeavors to learn an approximation of the identity function. Both *PCA* and *AE* will find application within the proposed framework, and the examples presented will elucidate both methods' advantages and limitations.
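A minimal sketch of the tied-weight forward pass and the squared-error reconstruction cost is given below. The training loop (gradient descent on *W* and the biases) is omitted, and the layer sizes are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

n_in, n_hidden = 8, 3        # fewer hidden units -> dimensionality reduction
W = rng.normal(scale=0.1, size=(n_hidden, n_in))  # tied (shared) weights
b_enc = np.zeros(n_hidden)   # encoder bias
b_dec = np.zeros(n_in)       # decoder bias

def reconstruct(x):
    # encode into the compressed representation, then decode with W^T
    h = sigmoid(W @ x + b_enc)
    return sigmoid(W.T @ h + b_dec)

def reconstruction_error(X):
    # squared-error cost averaged over the input vectors (MSE)
    return np.mean([np.sum((reconstruct(x) - x) ** 2) for x in X])

X = rng.uniform(size=(50, n_in))
re = reconstruction_error(X)
```

The hidden activations `h` are the compressed features; training drives `re` toward zero so that the network approximates the identity function through the bottleneck.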

### 2.2 Feature Selection

*FS* identifies a subset of the original high-dimensional data that retains only meaningful features. Given a set of original data of dimension "*n*", the goal is to select a finite and informative subset of dimension "*m*". This transformation process is essentially a dimension-reduction algorithm. As depicted in Fig. 4, its significance diminishes if the selected subset lacks sufficient information from the original data; in Fig. 4, the markers distinguish random data from selected data. Various *FS* techniques exist, including those based on mutual information, single-variable classifiers, or even genetic algorithms [1]. However, these methods are often associated with high computational costs, and the analysis of distributions can yield unreliable results when information about all data distributions is incomplete. The Independent Features Test (*IFT*) offers a cost-effective and rapid means of eliminating features that do not contribute to describing a given system. *IFT* assumes that the target data is categorical, i.e., that all data features fall into one of two classes [4]. A scoring value is estimated for each feature, where *A* and *B* represent the datasets of feature values corresponding to the two classes, while *n*_{1} and *n*_{2} refer to the number of features in the respective classes. "*Sig*" denotes a significance value used as a threshold for eliminating less useful data. As shown in Eq. (3), statistical moments such as the variance (*var*(.)) and mean (*mean*(.)) are employed to gauge the significance level of the data. Consequently, features can be selected based on the criterion *IFT* > *sig*. In general, features are considered informative when they exhibit a significance value of 2 or higher [14].
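The scoring step can be sketched as follows. Since Eq. (3) itself is not reproduced here, the score below is a plausible reading of the description (class-mean separation scaled by the class variances); the paper's exact expression may differ.

```python
import numpy as np

def ift_score(A, B):
    # Class-mean separation scaled by class variances -- a plausible
    # reading of Eq. (3); the paper's exact expression may differ.
    n1, n2 = len(A), len(B)
    return abs(np.mean(A) - np.mean(B)) / np.sqrt(np.var(A) / n1 + np.var(B) / n2)

rng = np.random.default_rng(2)
sig = 2.0                                  # recommended threshold [14]

# one informative feature (class means differ) vs. one uninformative feature
s_info = ift_score(rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 100))
s_noise = ift_score(rng.normal(0.0, 1.0, 100), rng.normal(0.0, 1.0, 100))

keep_informative = s_info > sig            # feature retained when IFT > sig
```

Features whose score falls below `sig` are dropped, which is what makes *IFT* a cheap filter compared to wrapper-style selection methods.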

### 3 Surrogate Modeling for Model Prediction

Classification can be performed with various techniques, including Artificial Neural Networks (*ANN*) [16], decision trees [17], and discriminant analysis [18], among others. Numerous studies have demonstrated the potential of *ANN* methods as viable alternatives to traditional classification techniques [16]. *ANN* exhibits substantial promise when dealing with decision domains with complex shapes that are challenging to capture accurately. One of the advantages of *ANN* is its applicability to both classification and regression tasks. Probabilistic Neural Networks (*PNN*) [10] can also be harnessed for predicting the class within large datasets. *PNN* holds a distinct advantage over traditional *ANN* models in that it significantly reduces the computational effort of training. Detailed explanations of both *ANN* and *PNN* are provided in the following subsections.

### 3.1 Artificial Neural Network (ANN)

*ANN* can define patterns and identify similarities when confronted with new inputs, even those involving incomplete information or noisy data [19]. In an *ANN*, a single computational neuron comprises input layers, one hidden layer, and one output layer. The input layers receive the input variables as *X* = {*x*_{1}, *x*_{2}, ..., *x*_{n}}. Each input is assigned a weight, denoted as *W*, representing synaptic learning. The neuron's output in the hidden layer is calculated from *f*, *W*, and *b*, which refer to an activation function, weights, and a bias for a single neuron. The commonly used activation function is the sigmoid function, expressed as *f*(*h*) = (1 + exp(−*h*))^{−1}, where *h* signifies the mapping units. The slope of the sigmoid function determines the proximity to the threshold point, and its output ranges from 0 to 1. A generalized *ANN* model encompasses multiple hidden units and can feature multiple hidden layers depending on the complexity of the model. In a generalized *ANN*, the total weighted sum of the inputs is formed from *W*_{ij}(*l*) and *b*, which represent a weight and bias between unit *j* and layer *l*. The output value of the activation function is denoted *f*_{j}(*l*). In general, if layer *l* is the input layer and layer *n*_{l} is the output layer, each layer *l* is closely associated with layer *l*+1.
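The layer-by-layer propagation just described can be sketched compactly; the network sizes below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(h):
    # f(h) = (1 + exp(-h))^-1, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-h))

def ann_forward(x, layers):
    # Propagate input x through a list of (W, b) pairs: each layer l
    # feeds layer l+1 via the weighted sum W @ a + b and the activation.
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(3)
# a small network: 4 inputs -> 5 hidden units -> 1 output
layers = [
    (rng.normal(size=(5, 4)), np.zeros(5)),
    (rng.normal(size=(1, 5)), np.zeros(1)),
]
y = ann_forward(rng.normal(size=4), layers)
```

Training (backpropagation of the output error through these same weights) is omitted; the sketch shows only the forward pass used for prediction.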

### 3.2 Probabilistic Neural Network (PNN)

*PNN* was introduced to address the computational complexity associated with *ANN* training [10]. It is a specialized feedforward neural network that combines the Bayes decision rule with the Parzen nonparametric estimator to manage decision boundaries. This decision rule reduces the "expected risk" of misclassification in pattern classification [10]. A *PNN* comprises four layers: the input, pattern, summation, and output layers. As depicted in Fig. 5, in the pattern layer of *PNN*, classification decisions are made based on the dataset *X*. Given Probability Density Functions (PDFs) for different categories *A* and *B*, dataset *X* belongs to class *A* if *f*_{A}(*X*) > *f*_{B}(*X*) for all *A* ≠ *B*, where *f*_{A}(*X*) and *f*_{B}(*X*) represent the PDFs for classes *A* and *B*, respectively. In the Bayesian optimal decision rule for *PNN* classification, *h*_{A} and *h*_{B} are the prior probabilities of patterns occurring in each class. Decision accuracy relies on the PDF estimate for each class. Typically, a Gaussian kernel is used to represent a multivariate estimate of the class-conditional PDF for each class, where *k*, *f*_{k}(*X*), *p*, *σ*, *m*, and *i* represent the classes, the sum of multivariate Gaussian distributions, the dimension of the measurement space, the smoothing parameter, the total number of training patterns, and the pattern index, respectively, and *X*_{Ki} denotes the *i*^{th} training pattern from class *k*. *PNN* can accommodate a training dataset *X*^{T}, typically with *n* data points, each having *m* dimensions, and is used here to classify dataset *X* into two classes, *A* and *B*. Both *PNN* and *ANN* will be employed in the framework, showcasing the advantages of these methods.
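A minimal two-class Parzen/Gaussian-kernel classifier of the kind described above can be sketched as follows; the cluster locations, smoothing parameter, and priors are illustrative assumptions.

```python
import numpy as np

def pnn_class_pdf(X, train_k, sigma):
    # Parzen estimate f_k(X): average of Gaussian kernels centred on
    # the training patterns X_ki of class k.
    p = train_k.shape[1]                     # dimension of measurement space
    d2 = np.sum((train_k - X) ** 2, axis=1)  # squared distances to patterns
    norm = (2 * np.pi) ** (p / 2) * sigma ** p
    return np.mean(np.exp(-d2 / (2 * sigma ** 2))) / norm

def pnn_classify(X, train_A, train_B, sigma=0.5, h_A=0.5, h_B=0.5):
    # Bayes decision rule: pick the class with the larger prior-weighted PDF
    return 'A' if (h_A * pnn_class_pdf(X, train_A, sigma)
                   > h_B * pnn_class_pdf(X, train_B, sigma)) else 'B'

rng = np.random.default_rng(4)
train_A = rng.normal(loc=0.0, size=(40, 2))  # class A clustered at the origin
train_B = rng.normal(loc=4.0, size=(40, 2))  # class B clustered at (4, 4)

label = pnn_classify(np.array([0.1, -0.2]), train_A, train_B)
```

Because classification only requires evaluating kernels at the stored training patterns, no iterative weight training is needed — the computational advantage over *ANN* noted in the text.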

### 4 Proposed Framework

### 4.1 Step 1: Generating Multivariate Data

Given two random variables *x* and *y* with marginal distributions *F* and *G*, a joint distribution *J* can be defined through a copula as *J*(*x*, *y*) = *C*(*F*(*x*), *G*(*y*)), where *C* denotes a copula function (*C*: [0,1]^{2} → [0,1]), *ρ* is the linear correlation coefficient, *h* and *k* are copula parameters, and Φ represents the standard univariate Gaussian distribution function. This representation of multivariate input data is the foundation for modeling the complex behavior of multidisciplinary engineering systems in Step 1.
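A Gaussian-copula construction of this kind can be sketched as follows: correlate in standard-normal space, map through Φ to uniforms, then apply the inverse marginals. The lognormal and exponential marginals are illustrative assumptions, not the paper's.

```python
import numpy as np
from scipy import stats

def gaussian_copula_sample(n, rho, F_inv, G_inv, seed=0):
    # Draw correlated (x, y) pairs via a bivariate Gaussian copula.
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    u = stats.norm.cdf(z)                  # uniform marginals on [0,1]^2
    return F_inv(u[:, 0]), G_inv(u[:, 1])  # inverse-CDF transform to marginals

# illustrative marginals: lognormal F and exponential G (assumed)
x, y = gaussian_copula_sample(
    5000, rho=0.8,
    F_inv=stats.lognorm(s=0.4).ppf,
    G_inv=stats.expon.ppf,
)
r = np.corrcoef(x, y)[0, 1]  # strong positive dependence survives the transforms
```

The same recipe extends to more variables by enlarging the correlation matrix, which is how correlated multivariate input data can be generated for Step 1.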

### 4.2 Step 2: Data Reduction

In Step 2, users may choose either *FS* or *FE* to address the uncertainties present in the input data. For those without prior exposure to the problem or familiarity with the data structure, a criterion becomes crucial in guiding the decision between *FS* and *FE*. Consequently, this research utilizes an entropy-based correlation coefficient (*e*) as the criterion. By assessing the *e* value, which indicates the degree of correlation within the data, users can select the most suitable data reduction tool.

#### 4.2.1 Entropy and Mutual Information

Entropy can be interpreted through three related notions:

1) Information Quantity: The presence of information is greater in instances of rare events compared to frequently occurring events.

2) Uncertainty in a Random Variable or Vector: Events that are common or certain contribute to a reduction in uncertainty, aiding the predictability of responses. Consequently, events characterized by uncertainty exhibit higher levels of entropy.

3) Dispersion in the Probability Distribution: A diminished dispersion signifies a reduced amount of entropy.

Here *p*(*x*_{i}) represents the marginal probability of each occurring sample of the random variable *x*. Given the entropy *H*_{x}, mutual information quantifies the amount of information shared between random variables *x*_{i} and *y*_{j}, thereby serving as a potent instrument for evaluating the significance of features and discerning between those possessing rich information and those lacking it [31]. In this expression, *p*(*x*_{i}, *y*_{j}) denotes the joint probability distribution of *x*_{i} and *y*_{j}. In the context of *FE*, mutual information can pinpoint the features that contribute the most information to the target variable; features exhibiting substantial mutual information can be regarded as indispensable for the analysis [29]. It is also effective in recognizing redundant features by quantifying the level of dependence between variables. Features exhibiting low mutual information with the target variable or other features may be deemed redundant and can be omitted during *FS* [32]. Therefore, the entropy-based approach establishes a solid theoretical basis for differentiating between *FE* and *FS*.
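Entropy and mutual information over discrete joint distributions can be computed directly from a probability table; the two toy tables below (fully dependent vs. independent variables) are illustrative.

```python
import numpy as np

def entropy(p):
    # H = -sum p log2 p over the occurring outcomes (0 log 0 := 0)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    # I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint probability table
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(joint.ravel())

# fully dependent variables: knowing x determines y
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])
# independent variables: joint = product of marginals
independent = np.outer([0.5, 0.5], [0.5, 0.5])

mi_dep = mutual_information(dependent)     # = 1 bit
mi_ind = mutual_information(independent)   # = 0 bits
```

High mutual information flags features that share information (candidates for *FE*), while near-zero mutual information flags features that can be dropped during *FS*.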

#### 4.2.2 Entropy-based Correlation Coefficient

Because the joint entropy *H*_{XY} has no scaled range, the scaled *H*_{XY}, denoted "*e*" and ranging from 0 to 1, directly indicates the extent of data correlation. Based on Eqs. (10) and (11), *e* can be calculated accordingly. The *e* value falls within the scaled range [0, 1]: when *e* is 0, *X* and *Y* are uncorrelated, while *e* = 1 signifies complete correlation. As a criterion for determining whether to employ *FE* or *FS* for data reduction, the *e* value is used as follows: if *e* falls between 0 and 0.5, the features are considered uncorrelated, and consequently *FS* is employed to reduce feature size and redundancy, as illustrated in Fig. 1. Conversely, if *e* falls between 0.5 and 1, *FE* is chosen due to the high correlation among features. Based on this decision, either the *FS* or the *FE* process is carried out in Step 2. For *FS*, we consider *IFT*; for *FE*, both *PCA* and *AE* are taken into account. The advantages of *PCA* and *AE* will be elaborated upon in the examples in Section 5. To ascertain the effectiveness of data reduction in simplifying input features, the concept of redundancy (*r*) can be employed in Step 2 [27]. The redundancy (*r*) of each random variable *x* is defined with respect to the maximum entropy "log_{2} *N*", where "*N*" represents the total number of samples. Specifically, we compare the redundancy values of the raw data and the reduced data to assess the effectiveness of the data reduction process. If the redundancy of the reduced data is not lower than that of the raw data, the data reduction process should be repeated to minimize redundancy further. Therefore, with this proposed criterion, data reduction can be carried out effectively by combining "*e*" with redundancy, reducing the computational cost of modeling and predicting system responses.
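The decision rule can be sketched end to end. Since Eqs. (12) and (13) are not reproduced here, the normalization below (mutual information scaled by the smaller marginal entropy, and redundancy as 1 − *H*/log₂ *N*) is one common choice; the paper's exact scaling may differ.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def e_coefficient(joint):
    # Scaled entropy-based correlation: 0 = uncorrelated, 1 = fully
    # correlated. Sketched as I(X;Y) normalised by the smaller marginal
    # entropy; the paper's Eq. (12) may use a different scaling.
    hx = entropy(joint.sum(axis=1))
    hy = entropy(joint.sum(axis=0))
    mi = hx + hy - entropy(joint.ravel())
    return mi / min(hx, hy)

def choose_reduction(joint):
    # decision rule from the framework: e <= 0.5 -> FS, e > 0.5 -> FE
    return 'FE' if e_coefficient(joint) > 0.5 else 'FS'

def redundancy(p, n_samples):
    # one common definition relative to the maximum entropy log2 N;
    # the paper's Eq. (13) may differ in scaling
    return 1.0 - entropy(p) / np.log2(n_samples)

dependent = np.array([[0.5, 0.0], [0.0, 0.5]])   # e = 1
independent = np.outer([0.5, 0.5], [0.5, 0.5])   # e = 0

method_dep = choose_reduction(dependent)    # 'FE'
method_ind = choose_reduction(independent)  # 'FS'
```

Comparing `redundancy` before and after reduction then tells the user whether another reduction pass is warranted, as described above.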

### 4.3 Step 3: Modeling Multivariate System Behavior

In Step 3, the system's behavior is modeled with *ANN* and *PNN*; however, users can employ their preferred surrogate modeling approach during this step. The simplified training process of *PNN* makes it a favorable choice compared to traditional *ANN* methods. Nevertheless, *PNN* is typically employed for classification tasks, such as determining system reliability, and may necessitate additional adjustments for function approximation. Consequently, we also incorporate *ANN* as an option for constructing the system's response surface model. If the task involves response prediction, we use *ANN* as the surrogate modeling technique; if it involves reliability estimation via classification, *PNN* is the chosen method.

### 5 Validation Examples

### 5.1 Cantilever Beam Example

A point load *F* = 1,000 N is applied orthogonally to the end node of a cantilever beam. The beam has dimensions of L = 4 m in length, H = 0.1 m in height, and W = 0.1 m in width, and is discretized into 30 elements of equal length. For each element, the Young's modulus *E* is treated as a random variable with a mean value of *μ* = 2.05e11 Pa and a coefficient of variation, *COV*, of 0.1. These Young's moduli are modeled as Gaussian random fields using the Gaussian covariance model. The correlation length plays a pivotal role: a higher value indicates a greater correlation within the random field, and if the distance between two distinct points exceeds the correlation length, those points are statistically nearly uncorrelated. For this example, we examine two datasets: one with high correlation (*l* = 10 m) and another with low correlation (*l* = 0.1 m). In the latter case, discretizing into 30 elements results in an element spacing of 0.133 m, which surpasses the correlation length. Consequently, we proceed with two Gaussian datasets, one correlated and the other uncorrelated.
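The two Young's-modulus datasets can be generated as sketched below, using the example's stated parameters and the Gaussian covariance model *C*(*d*) = *σ*² exp(−(*d*/*l*)²); the sampling route (Cholesky factor of the correlation matrix) is an assumed implementation detail.

```python
import numpy as np

def gaussian_random_field(n_elem, length, corr_len, mu, cov, n_samples, seed=0):
    # Sample Young's moduli over the beam elements as a Gaussian random
    # field with the Gaussian covariance model C(d) = s^2 exp(-(d/l)^2).
    rng = np.random.default_rng(seed)
    centers = (np.arange(n_elem) + 0.5) * length / n_elem
    d = np.abs(centers[:, None] - centers[None, :])    # pairwise distances
    R = np.exp(-(d / corr_len) ** 2) + 1e-8 * np.eye(n_elem)  # + jitter
    L = np.linalg.cholesky(R)
    sigma = cov * mu                                   # std from COV
    return mu + sigma * rng.normal(size=(n_samples, n_elem)) @ L.T

# the example's parameters: 30 elements, L = 4 m, mu = 2.05e11 Pa, COV = 0.1
E_corr = gaussian_random_field(30, 4.0, corr_len=10.0, mu=2.05e11, cov=0.1,
                               n_samples=1000)
E_uncorr = gaussian_random_field(30, 4.0, corr_len=0.1, mu=2.05e11, cov=0.1,
                                 n_samples=1000, seed=1)

# neighbouring elements are strongly correlated only for l = 10 m
r_corr = np.corrcoef(E_corr[:, 0], E_corr[:, 1])[0, 1]
r_uncorr = np.corrcoef(E_uncorr[:, 0], E_uncorr[:, 1])[0, 1]
```

With *l* = 0.1 m the 0.133 m element spacing exceeds the correlation length, so adjacent elements come out nearly uncorrelated, matching the discussion above.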

#### 5.1.1 Generation of Young’s Moduli

The calculated "*e*" values are 0.7621 for the correlated data and 0.2517 for the uncorrelated data. These results suggest a preference for *FE* methods to address redundancy in the correlated data (*e* > 0.5), while *FS* is more appropriate for the uncorrelated data (*e* < 0.5).

#### 5.1.2 Data Reduction for Young’s Moduli

Using *PCA* as the *FE* method, we reduce the dimensionality to 4 for the correlated data and 25 for the uncorrelated data while preserving information effectively. As indicated in Table 1, *PCA* reduces redundancy in the correlated data far more than in the uncorrelated data. Redundancy reduction is assessed using Eq. (13), where *V*_{new} and *V*_{org} represent the redundancy values of the truncated and original data, respectively. The performance of *PCA* is then compared to that of *AE*, which is initially designed using Eq. (14) to determine the total number of hidden neurons. *AE* reduces the dimension to 17, meaning 17 new data points serve as significant eigenvectors for reducing the redundancy of the original input data. For the correlated data, a 70.31% redundancy reduction is obtained. Although *AE* slightly lags behind *PCA* for correlated data, it maintains an acceptable 3.91% prediction error, computed using Eq. (14).

#### 5.1.3 Deflection Estimation

An *ANN* model was used to predict the cantilever beam deflection, as *PNN* cannot perform this task. We design an *ANN* with 21 hidden neurons using Eq. (14); the *ANN*'s network properties align with those of the *AE*. After obtaining the reduced Young's moduli through *PCA* and *AE*, the designed *ANN* provides a statistical estimate of the tip displacement. We compare the Probability Density Functions (PDFs) of tip displacement calculated from the data reduced via *PCA* and *AE* with the PDF derived from the original data (Fig. 4). The results demonstrate a close match between the PDFs predicted with *AE* and *PCA* and the original displacement PDF. Further comparisons in Table 2 reveal that *AE* exhibits slightly lower accuracy than *PCA* for the correlated data; nevertheless, *AE* maintains an acceptable 3.91% prediction error.

#### 5.1.4 Probabilistic Neural Network for Classification

For classification, we employ a *PNN* instead of an *ANN* owing to its simplified training process for reliability estimation. We establish a limit state function, *g* = *R* − *S*, where *R* represents the resistance (0.034 m) and *S* signifies the loading. Two classes are generated: "class A" corresponds to data points deemed safe (*g* > 0), while "class B" designates data points in the failure region (*g* ≤ 0). We conducted a reliability analysis using a simulation model and summarized the results in Table 3. The probability of failure (*P*_{f}) is estimated using the limit state function, and the second column in Table 3 presents *P*_{f} values derived from *PNN* with *IFT*. We use the Monte Carlo Simulation (*MCS*) method to estimate *P*_{f} as the ground truth for assessing accuracy. Comparing the *P*_{f} values estimated by the *PNN* trained with 1,000 samples to those obtained through the *MCS* method with 10,000 samples reveals a slight error of 4.75% when employing *IFT* with *PNN* compared to using *PNN* with the original data. A similar error of 3.84% emerges when comparing the *P*_{f} values estimated through the *MCS* method using *IFT* and the original data. These findings indicate that *IFT*-based *FS* marginally reduces accuracy, by less than 4%. Moreover, for both scenarios (the original input data and the data reduced by *IFT*), the discrepancy between the *P*_{f} results calculated through the *MCS* and *PNN* methods remains below approximately 7%, affirming the ability of *PNN* to accurately predict *P*_{f} in cantilever beam reliability analysis, even with *FS* through *IFT*.
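The *MCS* baseline for the limit state *g* = *R* − *S* reduces to counting failure samples; the loading distribution below is an illustrative assumption, not the beam model's actual response.

```python
import numpy as np

def mcs_failure_probability(sample_S, R):
    # Monte Carlo estimate of P_f for the limit state g = R - S:
    # the fraction of samples falling in the failure region g <= 0.
    g = R - sample_S
    return np.mean(g <= 0)

rng = np.random.default_rng(5)
# illustrative loading samples (assumed, not the paper's beam response)
S = rng.normal(loc=0.030, scale=0.003, size=10_000)
pf = mcs_failure_probability(S, R=0.034)  # resistance R = 0.034 m
```

In the framework, the expensive simulation generating `S` is replaced by the trained *PNN* classifier, which is what makes the reliability estimate computationally cheap.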

### 5.2 Stretchable Antenna Example

The deformation of the stretchable antenna is described by *X* and *Y* coordinates. The antenna is segmented into 32 sections to reflect thickness variations (Fig. 11(a)), comprising 50 coordinates (Fig. 11(b)). After obtaining the deformed antenna coordinates from ANSYS deformation analysis, the resonance frequency is calculated in HFSS based on the deformed model's coordinates. For stable performance, the stretchable antenna's frequency should remain within a reliable range (the 3 dB frequency) under deformation; antenna designs falling outside this range are rejected. Based on Table 4, HFSS-calculated resonance frequency changes were evaluated for antenna deformations of 1, 3.2, and 12 mm (Fig. 12). As displacement increases, antenna efficiency diminishes owing to declining absolute values of the reflection coefficient S11, and the resonance frequencies are no longer well preserved near 2.5 GHz.

#### 5.2.1 Generation of Varying Thickness

The *X* and *Y* coordinates of each point, as depicted in Fig. 7(b), are independently estimated from the deformation analysis in ANSYS. These data serve as training points for regression with Artificial Neural Networks (*ANN*) to calculate the coordinates when displacement is applied to an antenna with variable thickness. Consequently, the thicknesses are treated as input parameters, while the coordinates are taken as outputs. Data reduction of the input parameters prior to *ANN* training is discussed in Section 5.2.2.

#### 5.2.2 Data Reduction of Varying Thickness

We first calculate the "*e*" value using Eq. (11). An "*e*" value of 0.6300 is computed for the original thickness dataset, indicating the necessity of *FE* methods because it exceeds the 0.5 threshold. Table 6 tabulates the redundancy results computed for the *AE* and *PCA* methods. The comparison between these results and the redundancy values of the original data confirms the effective reduction of irrelevance between the original and truncated datasets. To retain 90% of the thickness information, *PCA* selects 14 eigenvalues, while *AE* employs 20 hidden neurons. *AE* outperforms *PCA* by yielding a smaller reconstruction error, attributable to its larger number of hidden neurons.

#### 5.2.3 Artificial Neural Network for Predicting Antenna Deformation

*ANN* is harnessed to predict the antenna deformation, with the varying thicknesses as input data and the coordinates as model responses. Table 5 presents the prediction errors for *ANN*, revealing a 6.28% error for the *X* coordinates with the original data; in contrast, the thicknesses reduced by *PCA* yield an error of 3.27%, while those reduced by *AE* exhibit a 5.41% error. The *ANN*'s inherent error of 7.43% for the *Y* coordinates is considered acceptable, akin to the *PCA* and *AE* prediction errors. These results underscore the effective data reduction achieved with both *PCA* and *AE*, as evidenced by error values comparable to those of the original data. Furthermore, the data reduction techniques lead to error reduction, signifying that reduced data redundancy indeed enhances *ANN* prediction models. Fig. 9 compares the deformed antenna models predicted by *ANN* with data reduction techniques against models obtained from actual simulations for 1 and 12 mm deformation values, highlighting strong agreement.

#### 5.2.4 Data Reduction of Each Coordinate for Resonance Frequency Prediction

In addition to the *ANN* models, all 50 coordinates are employed as input data in HFSS to generate the deformed antenna model; consequently, 121 distinct resonance frequencies are computed as output data for varying displacement values. Given the coordinate dataset, the *FS* method is chosen for data reduction because it can identify the most informative variables. This process estimates the significant coordinates to aid antenna redesign, and the selected coordinates enable assessing whether the frequency can be classified within a reliable bandwidth. *IFT* generates a new subset by selecting coordinates exceeding a "Significance value" of 3, ensuring high accuracy and selecting the five most significant coordinates. Following the *IFT* application, the reduced dataset has a redundancy of 10.8147, compared to the original dataset's 12.5689, signifying a 13.9567% reduction in uncertainty, estimated using Eq. (13). Fig. 15 visually presents the significant coordinates selected by *IFT*.

#### 5.2.5 *PNN* for Classification of Antenna Frequency

A Monte Carlo Simulation (*MCS*) with 10,000 samples is conducted to calculate the probability of failure. The *MCS* results are compared with the Probabilistic Neural Network (*PNN*) model, which employs 121 training samples and 10,000 evaluation samples for accurate, computationally efficient failure probability prediction. Table 6 compares the *P*_{f} values derived from *PNN* classification with those obtained from *MCS*, revealing a difference of 8.05%, below the 10% error threshold essential for accurate classification. The reduced coordinates yield a *P*_{f} value of 0.3016, signifying a 7.37% *P*_{f} increase compared to the original coordinates. Furthermore, the *P*_{f} value calculated with the reduced coordinates closely aligns with that computed via *MCS* (6.98% difference), indicating accurate *FS* and classification.

#### 5.2.6 Validation of Efficacy of Proposed Framework

The efficacy of the proposed framework is validated by comparing *FE* and *FS*. Fig. 16 demonstrates that *FE* yields higher frequency accuracy than *FS*, owing to greater absolute values of S11 calculated through the *FE* methods than through *FS*. This discrepancy arises because *FS* eliminates uninformative coordinates, whereas *FE* retains the principal components and reconstructs coordinates to match the original dataset's dimensions. Despite reduced antenna efficiency under displacement due to resonance frequency variations, the proposed method accurately predicts the resonance frequencies and enhances the S11 values.

### 6 Conclusion

The proposed framework combined *PCA* and *AE* for *FE* with *IFT* for *FS*, enhancing performance prediction accuracy. An entropy-based correlation coefficient ("*e*") was used to decide between *FE* and *FS* based on the input parameter correlations. *ANN* and *PNN* were used to estimate responses and enhance computational efficiency during simulations. The framework's efficacy was demonstrated through two engineering examples. *PCA* and *AE* effectively reduced data complexity without significant information loss in cases with high correlation, as indicated by the "*e*" criterion. Redundancy reduction was confirmed across all datasets using *FS* or *FE* guided by the "*e*" criterion. Comparison of *PNN* and *MCS* results showed that the framework achieved accurate classifications. Notably, the stretchable antenna example revealed that the increased dimensionality in predicting the resonance frequency resulted from the coupled electrical and mechanical properties. The framework effectively reduced data in the mechanical and electrical analyses of multidisciplinary problems.