# Integrating Entropy-based Data Reduction and Machine Learning in Multidisciplinary Engineering Systems for Enhanced Response Prediction

## Abstract

This research presents a framework that aims to capture and model streamlined design variables in multidisciplinary engineering systems, particularly when uncertainties are present. In multidisciplinary domains, there may be correlated design variables that are surplus, leading to data redundancy and potentially affecting the prediction of system responses. To address this issue, the framework utilizes data reduction techniques based on the correlation degree of random design variables, which are evaluated using an entropy-based correlation coefficient (e). By doing so, the framework enables a more precise prediction of system responses. The data reduction process is dependent on the value of e and employs two distinct approaches. For strong correlations (high e values), feature extraction techniques such as Principal Component Analysis and the Auto-encoder algorithm are applied. On the other hand, for weak correlations, feature selection is implemented using the Independent Features Test. To effectively predict the complex responses of multidisciplinary systems while enhancing computational efficiency, the framework integrates an Artificial Neural Network. The efficacy of the proposed framework is demonstrated through examples, including a cantilever beam with randomly distributed materials and an electro-mechanical stretchable patch antenna.

**Keywords:** Data reduction; Auto-encoder; Uncertainty quantification; Machine learning; Reliability analysis

## 1 Introduction

The future process of developing engineered products necessitates a fusion of technical knowledge from various engineering domains to meet multidisciplinary design criteria accurately. Hence, there is growing interest in multi-physics modeling and simulation techniques, as they are better suited to represent the behavior of multidisciplinary systems governed by multiple physical laws, each with its principles. With the increasing integration of numerous engineering disciplines, multidisciplinary domains yield substantial volumes of data demanding meticulous analysis. To precisely assess the risk and reliability of complex systems, it is imperative to accurately capture and propagate critical input parameters and their associated uncertainties. However, in scenarios where input data possess high dimensions, redundancy emerges, escalating computational complexities in multivariate system analysis [1]. Additionally, the multivariate random input variables of a multidisciplinary engineering system frequently exhibit correlations to depict specific system properties, as shown in Fig. 1. This correlation among input data introduces data redundancy, allowing irrelevant and uncertain data to detrimentally impact the predicted system response [2]. Notably, issues like extensive data processing and errors can further impede prediction model accuracy. Nonetheless, the precise and efficient modeling of correlated, high-dimensional random input variables remains imperative to enable users to derive accurate responses to analyze and design dependable multidisciplinary systems.

Thus, this study introduces an effective data reduction framework designed to faithfully represent the correlated, high-dimensional random variables characterizing complex multidisciplinary systems. Within this framework, we employ feature extraction (*FE*) methods, specifically Principal Component Analysis (*PCA*) and Auto-encoder (*AE*), as initial steps to eliminate redundant data sampling from random input variables [1,3]. Furthermore, we integrate the Independent Features Test (*IFT*) [4] into the proposed method to trim the number of input random variables. A criterion relying on entropy-based correlation coefficients [5] is established to determine whether the input data should undergo *FE* or feature selection (*FS*). Subsequently, we incorporate a copula function [6,7] to model the intricacies of correlated input variables in the framework precisely. The copula function, renowned for its capacity to amalgamate various marginal distributions related to the inputs, is pivotal in recreating a joint distribution of correlated random variables [7]. Lastly, we harness two machine learning techniques, Artificial Neural Network (*ANN*) [8,9] and Probabilistic Neural Network (*PNN*) [10], to alleviate the computational burden associated with predicting the response and reliability of multidisciplinary engineering systems. We validate the efficacy of this proposed framework through a reliability analysis involving multiple disciplines characterized by complex input data. These examples underscore how the framework can accurately forecast the behavior of multidisciplinary engineering systems while minimizing computational overhead. The paper’s structure is as follows: Section 2 outlines the data reduction techniques employed in the proposed framework to simplify the complexity of input random variables. Section 3 introduces surrogate modeling techniques (*ANN* and *PNN*) utilized for response and reliability prediction in the systems. 
In Section 4, we present a flowchart delineating the proposed data reduction framework. Finally, in Section 5, we apply the framework to two practical multidisciplinary engineering problems to demonstrate its effectiveness and advantages.

## 2 Data Reduction

As different disciplines are integrated, the quantity of observations and variables under examination grows exponentially. This expansion in data volume necessitates the analysis and organization of data in high-dimensional spaces. Consequently, in multidisciplinary engineering systems analysis, this multivariate data often results in what is commonly referred to as the “curse of dimensionality”. High dimensions introduce an abundance of features within the data, which can lead to uncertain predictions or overfitting in regression and classification [2]. Furthermore, when these features exhibit high correlations, they may essentially represent the same property, resulting in considerable redundancy and reduced simulation efficiency [1]. Consequently, efficient multivariate data analysis is often imperative to obtain accurate and realistic system responses.

Data reduction techniques can generally be categorized into two primary groups: *FE* and *FS*. The key distinction between these techniques lies in their purpose. *FS* methods are employed to identify the best subset of the original input features, while *FE* methods generate new features within a lower-dimensional space through transformations of the original features. In certain scenarios, both approaches may be employed sequentially. In this study, we propose a guideline (Section 4) to facilitate the optimal selection between *FE* and *FS*. This choice is based on the calculated entropy-based correlation coefficient of the input data.

### 2.1 Feature Extraction

*FE* (Fig. 2) is a technique that transforms high-dimensional data into new features of reduced dimensionality. It allows the conversion of *N* dimensions of original data into a smaller set of *M* dimensions, where *M* is less than *N*. This process involves mapping and serves the purpose of eliminating redundant information from the input dataset. In this section, we delve into two highly effective *FE* methods: *PCA* and *AE*.

#### 2.1.1 Principal Component Analysis (PCA)

*PCA* is a statistical technique utilized for transforming a correlated dataset into an uncorrelated or independent one.

This transformation is achieved through an orthogonal operation, represented as $Y = PX$. In this context, $X$ denotes the original dataset, while $Y$ signifies a restructured representation of $X$. Both $X$ and $Y$ possess dimensions of $m$ (indicating the number of observations or cases) by $n$ (representing the number of features), resulting from this linear transformation. The matrix $P$, an $n$-by-$n$ transformation matrix, is constituted by columns that correspond to the eigenvectors of $X'X$. Fundamentally, data dimensions are reduced using *PCA* through eigenvector decomposition. For instance, one can utilize a covariance matrix, $C_Y$, to derive the $P$ matrix endowed with the orthonormal property [11].

In the equation $A = XX^{T}$, $A$ is a symmetric matrix. $A$ is constructed from the eigenvectors originating from the rows of the $P$ matrix, where $P = E^{T}$; consequently, we have $A = P^{T}DP$. A diagonal matrix, $D$, can be linked with the eigenvector matrix $E$ as $EDE^{T}$. Because of the property of orthonormal matrices, the inverse of $P$ is expected to be equal to its transpose. Thus, the principal components of $X$ should be the eigenvectors of $C_X = (n - 1)^{-1}XX^{T}$, and each diagonal element of $C_Y$ corresponds to the variance of $X$.
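The eigenvector decomposition described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: it assumes NumPy, stores observations as rows (so the projection is written as `Xc @ P` rather than $PX$), and the function name `pca_reduce` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (samples x features) onto its leading principal components."""
    Xc = X - X.mean(axis=0)                    # center each feature
    C = (Xc.T @ Xc) / (Xc.shape[0] - 1)        # sample covariance, features x features
    eigvals, eigvecs = np.linalg.eigh(C)       # symmetric matrix: real eigenpairs, ascending
    order = np.argsort(eigvals)[::-1]          # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components]              # orthonormal columns (retained directions)
    Y = Xc @ P                                 # reduced, mutually uncorrelated scores
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return Y, P, explained

# Synthetic, strongly correlated data: three features driven by one latent variable.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = np.hstack([latent, 2 * latent, -latent]) + 0.01 * rng.normal(size=(500, 3))
Y, P, explained = pca_reduce(X, 1)
```

Because the three synthetic features are nearly copies of one latent variable, a single component is expected to retain almost all of the variance, which is the situation in which *FE* is most effective.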

#### 2.1.2 Auto-encoder (AE)

*AE*s are a special type of *ANN* [4]. *ANN*s are inspired by the human nervous system and can be trained to predict values or patterns of an output based on those recognized from the input. *AE*s aim to reduce the dimensionality of data by transforming it into a compressed representation. As shown in Fig. 3, *AE*s are composed of neural networks with multiple layers. In general, an *AE* has three layers: an input layer, a hidden layer, and an output layer. To transform input data, the data is passed through the hidden layer using a weight function and a bias. The output data is a reconstruction of the original input data. Training an *AE* aims to minimize the difference between the input and output layers. In other words, the training aims to reduce the mean squared error (MSE) of the reconstruction [12]. Because *AE*s use a reduced number of hidden units, they can be used for dimensionality reduction. Additionally, *AE*s share the weight values and biases between the previous and subsequent layers [13]. The reconstruction error (*re*) can be calculated using the squared-error cost function:

In this context, $W$, $x$, $b$, and $X$ represent the weights, input units, bias, and input vector. The notations $W^{T}$ and $b^{T}$ denote the transpose of $W$ and $b$. *AE* incorporates a distinctive criterion, such as sparsity, which is closely linked to the reduction of hidden units and, by extension, data reduction. Given that *AE* essentially aims to have its output data equal to its input data, it becomes evident that *AE* endeavors to learn an approximation of the identity function. Both *PCA* and *AE* will find application within the proposed framework, and the examples presented will elucidate both methods’ advantages and limitations.
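A squared-error reconstruction cost of this kind can be illustrated with a short sketch. It assumes a sigmoid hidden layer and tied weights (the decoder reuses $W^{T}$), which is consistent with the symbols described above but is not necessarily the exact form used in the study. The names `reconstruction_error`, `b_enc`, and `b_dec` and the layer sizes are hypothetical.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def reconstruction_error(W, b_enc, b_dec, x):
    """Squared-error cost 0.5 * ||x_hat - x||^2 for a tied-weight auto-encoder."""
    hidden = sigmoid(W @ x + b_enc)     # encode: compressed representation
    x_hat = W.T @ hidden + b_dec        # decode with the transposed (shared) weights
    return 0.5 * float(np.sum((x_hat - x) ** 2)), hidden

# Hypothetical sizes: 6 input units compressed to 2 hidden units.
rng = np.random.default_rng(1)
n_in, n_hidden = 6, 2
W = 0.1 * rng.normal(size=(n_hidden, n_in))
b_enc = np.zeros(n_hidden)
b_dec = np.zeros(n_in)
x = rng.normal(size=n_in)
re, code = reconstruction_error(W, b_enc, b_dec, x)
```

Training would adjust `W`, `b_enc`, and `b_dec` to minimize `re` averaged over all samples; the hidden vector `code` is the reduced representation passed on to later steps.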

### 2.2 Feature Selection

*FS* is the process of identifying a subset of the original high-dimensional data that retains only meaningful features. When dealing with a set of original data, denoted as “*n*”, the goal is to select a finite and informative subset, “*m*”. This transformation process is essentially an algorithm for reducing dimensions. As depicted in Fig. 4, its significance diminishes if the selected subset lacks sufficient information from the original data. In Fig. 4, one marker type represents the random data and another indicates the selected data. Various *FS* techniques exist, including those based on mutual information, single-variable classifiers, or even genetic algorithms [1]. However, these methods are often associated with high computational costs, and the analysis of distributions can yield unreliable results when information about all data distributions is incomplete.

In contrast to these approaches, the Independent Features Test (*IFT*) offers a cost-effective and rapid means of eliminating features that do not contribute to describing a given system. *IFT* assumes that the target data is categorical or that all data features fall into one of two classes [4]. This process is necessary to estimate the scoring value of informative features:

Here, *A* and *B* represent datasets corresponding to feature *IFT* values, while *n*_{1} and *n*_{2} refer to the number of features in the respective classes. “*Sig*” denotes a significance value used as a threshold for eliminating less useful data. As shown in Eq. (3), statistical moments such as variance (*var*(.)) and mean (*mean*(.)) are employed to gauge the significance level of the data. Consequently, features can be selected based on the criterion of *IFT* > *sig*. In general, it is recommended that features are considered informative when they exhibit a significance value of 2 or higher [14].
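The scoring step can be sketched as follows, assuming the commonly used form of the test, |mean(A) − mean(B)| / sqrt(var(A)/n1 + var(B)/n2), compared against the significance threshold. In this sketch n1 and n2 are taken as the number of values of the feature observed in each class, which is an interpretation; the function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def ift_score(a, b):
    """IFT score of one feature; a and b hold its values in class A and class B."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / standard_error

def select_features(class_a, class_b, sig=2.0):
    """Return the indices of features whose IFT score exceeds sig."""
    keep = []
    for j in range(len(class_a[0])):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        if ift_score(col_a, col_b) > sig:
            keep.append(j)
    return keep

# Toy data: feature 0 separates the classes, feature 1 does not.
class_a = [[1.0, 5.0], [1.2, 4.0], [0.9, 6.0], [1.1, 5.5]]
class_b = [[3.0, 5.2], [3.1, 4.1], [2.9, 5.9], [3.2, 5.4]]
selected = select_features(class_a, class_b)
```

With the recommended threshold of 2, only the first feature is retained in this toy case, since the class means of the second feature differ by far less than their combined standard error.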

## 3 Surrogate Modeling for Model Prediction

Introducing surrogate modeling techniques offers a promising avenue to reduce the complexity and cost of simulating engineering systems [15]. Extensive research efforts have been devoted to developing efficient classification methods and surrogate modeling techniques, including Artificial Neural Networks (*ANN*) [16], decision trees [17], and discriminant analysis [18], among others. Numerous studies have demonstrated the potential of *ANN* methods as viable alternatives to traditional classification techniques [16]. *ANN* exhibits substantial promise when dealing with decision domains with complex shapes that are challenging to capture accurately. One of the advantages of *ANN* is its applicability in both classification and regression tasks. Probabilistic Neural Networks (*PNN*) [10] can also be harnessed for predicting the class within large datasets. *PNN* holds a distinct advantage over traditional *ANN* models in terms of significantly reducing computational efforts during the training process. Detailed explanations of both *ANN* and *PNN* are provided in the following subsections.

### 3.1 Artificial Neural Network (ANN)

*ANN* can define patterns and identify similarities when confronted with new inputs, whether they involve incomplete information or noisy data [19]. In *ANN*, a single computational neuron comprises input layers, one hidden layer, and one output layer. The input layers receive the input variables as $X = \{x_1, x_2, \ldots, x_n\}$. Each input is assigned a weight, denoted as $W$, representing synaptic learning. The neuron’s output in the hidden layer is calculated as:

Here, *f*, *W*, and *b* refer to an activation function, weights, and bias for a single neuron. The commonly used activation function is a sigmoid function, expressed as *f*(*h*) = (1 + exp(-*h*))^(-1), where *h* signifies mapping units. The slope of the sigmoid function determines the proximity to the threshold point, and its output range spans from 0 to 1. A generalized *ANN* model encompasses multiple hidden units and can feature multiple hidden layers based on the complexity of the model. In a generalized *ANN*, the total weighted sum of the inputs can be articulated as:

Here, $W_{ij}(l)$ and $b$ represent a weight and bias between unit $j$ and layer $l$. The output value of the activation function can be denoted as $f_j(l)$. In general, if layer $l$ is the input layer and layer $n_l$ is the output layer, each layer $l$ is closely associated with layer $l+1$.
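The layer-by-layer propagation can be sketched as a simple forward pass. This is an illustrative sketch assuming NumPy and a sigmoid activation in every layer; the function name `forward`, the layer sizes, and the random weights are hypothetical, and a regression model would often use a linear output layer instead.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def forward(x, layers):
    """Propagate x through a list of (W, b) pairs, one pair per layer."""
    a = np.asarray(x, dtype=float)
    for W, b in layers:
        z = W @ a + b        # total weighted sum of the inputs to this layer
        a = sigmoid(z)       # activation handed to layer l + 1
    return a

# Hypothetical network: 3 inputs, 4 hidden units, 1 output.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(1, 4)), np.zeros(1))]
y = forward([0.2, -0.1, 0.5], layers)
```

Each pass through the loop applies one layer's weights and bias, so the output of layer l becomes the input of layer l + 1, as described above.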

### 3.2 Probabilistic Neural Network (PNN)

*PNN* was introduced to address the computational complexity associated with *ANN* training [10]. It’s a specialized feedforward neural network that combines the Bayes decision rule with the Parzen nonparametric estimator to manage decision boundaries. This decision rule reduces the “expected risk” of misclassification in pattern classification [10].

The PNN is a non-parametric model, offering advantages in handling complex, non-linear relationships due to its lack of explicit assumptions about data distribution [20]. It exhibits one-shot learning, storing the entire dataset during training, which is efficient for small to medium-sized datasets [21]. The model is robust to data noise, avoiding reliance on assumptions about the distribution’s specific form [22]. However, PNN’s memory-intensive nature, storing the complete training dataset, can be computationally expensive and impractical for large datasets [23]. Additionally, scalability issues arise, limiting PNN’s applicability to datasets with high dimensions or features [9].

*PNN* comprises four layers: the input, pattern, summation, and output layers. As depicted in Fig. 5, in the pattern layer of *PNN*, classification decisions are made based on the dataset $X$. Given Probability Density Functions (PDFs) for different categories $A$ and $B$, data set $X$ belongs to class $A$ if $f_A(X) > f_B(X)$ for all $A \neq B$, where $f_A(X)$ and $f_B(X)$ represent the PDFs for class $A$ and $B$, respectively. The Bayesian optimal decision rule for *PNN*’s classification is as follows:

where $h_A$ and $h_B$ are the prior probabilities of patterns occurring in each class. Decision accuracy relies on the PDF estimate for each class. Typically, a Gaussian kernel is used to represent a multivariate estimate of the class-conditional PDF for each class,

Here, $k$, $f_k(X)$, $p$, $\sigma$, $m$, and $i$ represent classes, the summation of multivariate Gaussian distributions, the dimension of the measurement space, the smoothing parameter, the total number of training patterns, and the number of patterns, respectively. $X^{T}_{Ki}$ denotes the $i$-th training pattern from class $k$. *PNN* can accommodate training data set $X^{T}$, typically with $n$ data points, each having $m$ dimensions.

The smoothing parameter affects the placement of training data. As it increases, the degree of interpolation between data points also rises. As depicted in Fig. 5, *PNN*, used for classifying data set *X* into two classes *A* and *B*, comprises four layers: the input, pattern, summation, and output layers. Both *PNN* and *ANN* will be employed in the framework, showcasing the advantages of these methods.
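A compact sketch of this classification scheme is given below. It assumes the standard Parzen estimate with an isotropic Gaussian kernel and equal priors unless others are supplied; the function names, the smoothing value, and the toy training patterns are hypothetical and are not taken from the study.

```python
import numpy as np

def class_pdf(x, patterns, sigma):
    """Parzen estimate of a class-conditional PDF with a Gaussian kernel."""
    m, p = patterns.shape                          # m training patterns, p dimensions
    sq_dist = np.sum((patterns - x) ** 2, axis=1)  # squared distance to each pattern
    norm = (2.0 * np.pi) ** (p / 2.0) * sigma ** p * m
    return np.sum(np.exp(-sq_dist / (2.0 * sigma ** 2))) / norm

def pnn_classify(x, train_by_class, sigma=0.5, priors=None):
    """Pick the class with the largest prior-weighted PDF estimate."""
    labels = list(train_by_class)
    if priors is None:
        priors = {k: 1.0 / len(labels) for k in labels}   # assumption: equal priors
    scores = {k: priors[k] * class_pdf(x, train_by_class[k], sigma) for k in labels}
    return max(scores, key=scores.get), scores

# Toy training patterns for two classes.
train = {"A": np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2]]),
         "B": np.array([[2.0, 2.0], [2.1, 1.8], [1.9, 2.2]])}
label, scores = pnn_classify(np.array([0.1, 0.0]), train)
```

No iterative training is involved: the "pattern layer" simply stores the training patterns, the "summation layer" corresponds to `class_pdf`, and the "output layer" is the final comparison. A larger `sigma` smooths the estimate and increases interpolation between stored patterns.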

## 4 Proposed Framework

The proposed study aims to address the computational complexity arising from high-dimensional input variables and complex modeling techniques, all while maintaining prediction accuracy in multidisciplinary engineering systems under uncertainty. This section outlines a data reduction framework consisting of three key components: multivariate data generation, data reduction of input data, and representation of multivariate system behavior, as depicted in Fig. 1.

### 4.1 Step 1: Generating Multivariate Data

In multidisciplinary engineering systems, multiple random input variables originate from various disciplines, influencing one another’s behavior. Therefore, the initial focus lies on generating these multivariate data. Copulas are employed for this purpose due to their ability to describe complex nonlinear behavior effectively. Copulas provide a realistic representation of dependencies among numerous random variables. A copula can be defined as a function that links multivariate distribution functions to their one-dimensional marginal distribution functions. For random variables *x* and *y* with marginal distributions represented as *F* and *G*, a joint distribution *J* can be defined as follows:

Here, *C* denotes a copula function (*C:* [0,1]^{2}→[0,1]), *ρ* is the linear correlation coefficient, *h* and *k* are copula parameters, and *Φ* represents the standard univariate Gaussian distribution function. This representation of multivariate input data is the foundation for modeling the complex behavior of multidisciplinary engineering systems in Step 1.
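The way a copula ties arbitrary marginals together can be sketched for the bivariate Gaussian case. This is an illustrative sketch using only the Python standard library; the function name, the correlation value, and the two marginals (one normal, one exponential) are hypothetical choices, not those of the study.

```python
import random
from math import log, sqrt
from statistics import NormalDist

def gaussian_copula_pairs(n, rho, inv_cdf_x, inv_cdf_y, seed=0):
    """Draw n dependent (x, y) pairs with the given inverse marginal CDFs."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)  # correlated normals
        u, v = std_normal.cdf(z1), std_normal.cdf(z2)   # dependent uniforms on [0, 1]
        pairs.append((inv_cdf_x(u), inv_cdf_y(v)))      # map to the target marginals
    return pairs

# Hypothetical marginals: x ~ Normal(10, 2), y ~ Exponential(rate 0.5).
pairs = gaussian_copula_pairs(
    2000, 0.8,
    NormalDist(10.0, 2.0).inv_cdf,
    lambda u: -log(1.0 - u) / 0.5,
)
```

The dependence structure comes entirely from the correlated normals, while each output variable keeps its own marginal distribution, which is what allows different disciplines' inputs to be combined in one joint model.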

### 4.2 Step 2: Data reduction

During Step 2, the user decides whether to implement *FS* or *FE* to address the uncertainties present in the input data. For those without prior exposure to the problem or familiarity with the data structure, the use of a criterion becomes crucial in guiding the decision-making process between *FS* and *FE*. Consequently, this research utilizes an entropy-based correlation coefficient (*e*) as the criterion. Through the assessment of the *e* value, which indicates the degree of correlation within the data, users can thoughtfully select the most suitable data reduction tool.

#### 4.2.1 Entropy and Mutual Information

Numerous studies [25–29] in the field of data mining incorporate insights and terminology derived from entropy, which is rooted in information theory. Entropy functions as a metric for gauging uncertainty or information content within a dataset. In the domain of data mining, entropy manifests several crucial facets:

1) Information Quantity: The presence of information is greater in instances of rare events compared to frequently occurring events.

2) Uncertainty in a Random Variable or Vector: Events that are common or certain contribute to a reduction in uncertainty, impeding the predictability of responses. Consequently, events characterized by uncertainty exhibit higher levels of entropy.

3) Dispersion in Probability Distribution: A diminished dispersion signifies a reduced amount of entropy.

Thus, entropy captures uncertainty, randomness, or redundancy intrinsic to a random variable. Furthermore, entropy can overcome certain limitations of traditional linear correlation estimation methods [30], particularly their vulnerability to non-linear correlation or non-Gaussian distributions. Entropy can be expressed by:

Here, $p(x_i)$ represents the marginal probability of each occurring sample of the random variable $x$.

Based on $H_x$, mutual information quantifies the amount of information shared between random variables $x_i$ and $y_j$, thereby serving as a potent instrument for evaluating the significance of features and discerning between those possessing rich information and those lacking such attributes [31].

Here, $p(x_i, y_j)$ denotes the joint probability distribution of $x_i$ and $y_j$. In the context of *FE*, mutual information can pinpoint features that contribute the highest amount of information to the target variable. Features exhibiting substantial mutual information can be regarded as indispensable for the analysis [29]. It is also effective in recognizing redundant features by quantifying the level of dependence between variables. Features exhibiting low mutual information with the target variable or other features may be deemed redundant and can be omitted during the process of *FS* [32]. Therefore, the entropy-based approach can establish a solid theoretical basis for differentiation between *FE* and *FS*.
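Entropy and mutual information can be estimated from samples with a short sketch. It assumes discrete (or already binned) data, base-2 logarithms, and the standard identity I(X;Y) = H(X) + H(Y) − H(X,Y); continuous variables such as material properties would first need to be discretized, and the binning choice is left open here. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy in bits of a sequence of discrete samples."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) estimated from paired discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

xs = [0, 0, 1, 1, 0, 0, 1, 1]
ys_copy = list(xs)                      # fully determined by xs
ys_indep = [0, 1, 0, 1, 0, 1, 0, 1]     # shares no information with xs

mi_copy = mutual_information(xs, ys_copy)
mi_indep = mutual_information(xs, ys_indep)
```

In this toy case the duplicated variable shares one full bit with `xs`, while the independent one shares none, which mirrors the distinction between redundant and non-redundant features drawn above.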

#### 4.2.2 Entropy-based Correlation Coefficient

Given that $H_{XY}$ has no scaled range, the scaled $H_{XY}$, denoted as $e$, which ranges from 0 to 1, directly provides information about the extent of data correlation. Based on Eqs. (10) and (11), $e$ can be calculated as:

The computed *e* value falls within the scaled range of [0, 1]. When *e* is 0, *X* and *Y* are uncorrelated, while *e* = 1 signifies a complete correlation. As a criterion for determining whether to employ *FE* or *FS* for data reduction, the *e* value is used as follows: if *e* falls between 0 and 0.5, it suggests that the features are uncorrelated, and consequently, *FS* is employed to reduce feature size and redundancy, as illustrated in Fig. 1. Conversely, if *e* falls between 0.5 and 1, *FE* is chosen due to the high correlation among features. Based on this decision, either *FS* or *FE* processes are carried out in Step 2. In the case of *FS*, we consider *IFT*; alternatively, for *FE*, both *PCA* and *AE* are taken into account. The advantages of *PCA* and *AE* will be elaborated upon in the examples in Section 5. To ascertain the effectiveness of data reduction in simplifying input features, the concept of redundancy (*r*) can be employed in Step 2 [27]. The redundancy (*r*) concerning each random variable *x* is defined as:

The term “log_{2} *N*” refers to the maximum entropy, with “*N*” representing the total number of samples. Specifically, we will compare the redundancy values between the raw data and the reduced data to assess the effectiveness of the data reduction process. Based on this redundancy comparison, if the redundancy value of the raw data remains higher than that of the reduced data, we should repeat the data reduction process to minimize redundancy further. Therefore, with this proposed criterion, data reduction can be carried out effectively by combining “*e*” with redundancy, leading to reduced computational costs for modeling and predicting system responses.
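The selection criterion can be sketched end to end under two loudly stated assumptions, since the defining equations are not reproduced in this text: *e* is taken as the scaled form 2(H_X + H_Y − H_XY)/(H_X + H_Y), one common normalization that lies in [0, 1], and redundancy is taken as r = log2(N) − H_x. Neither form is confirmed as the exact expression used in the study, and all function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(samples):
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def entropy_correlation(xs, ys):
    """Assumed scaling: e = 2 * (H_X + H_Y - H_XY) / (H_X + H_Y), in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    hxy = entropy(list(zip(xs, ys)))
    if hx + hy == 0.0:
        return 0.0                       # both variables constant: treat as uncorrelated
    return 2.0 * (hx + hy - hxy) / (hx + hy)

def redundancy(samples):
    """Assumed form: r = log2(N) - H_x, with log2(N) the maximum entropy."""
    return log2(len(samples)) - entropy(samples)

def choose_reduction(e, threshold=0.5):
    """FE for strongly correlated data, FS otherwise (tie at 0.5 sent to FE by choice)."""
    return "FE" if e >= threshold else "FS"

xs = [0, 0, 1, 1, 0, 0, 1, 1]
e_dependent = entropy_correlation(xs, list(xs))
e_independent = entropy_correlation(xs, [0, 1, 0, 1, 0, 1, 0, 1])
```

Comparing `redundancy` before and after reduction then gives the check described above for whether another reduction pass is warranted.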

### 4.3 Step 3: Modeling Multivariate System Behavior

Following the data reduction process in Step 2, the next and final phase involves modeling the multidisciplinary engineering system. Within our proposed framework, we consider using both *ANN* and *PNN*. However, it’s important to note that users can employ their preferred surrogate modeling approach during this step.

As highlighted in Section 3, the rapid training capability of *PNN* makes it a favorable choice compared to traditional *ANN* methods. Nevertheless, *PNN* is typically employed for classification tasks, such as determining system reliability, and may necessitate additional adjustments for function approximation. Consequently, we also incorporate *ANN* as an option for constructing the system’s response surface model.

To specify, when the objective is response prediction, we utilize *ANN* as the surrogate modeling technique. On the other hand, if the task involves reliability estimation with classification, *PNN* is the chosen method.

## 5 Validation Examples

In this section, we illustrate the effectiveness of our proposed method through two distinct examples. The first example demonstrates the application of the proposed approach to a 3-D cantilever beam, subject to the influence of multiple mechanical properties, notably Young’s moduli, which constitute a random field. This example validates the accuracy of our approach in predicting the system’s behavior. The second example involves a stretchable patch antenna, which presents a prototypical scenario of a multidisciplinary system. The flexible antenna is a commonly used sensor worn on the body to observe human activities by assessing changes in frequency [33]. This illustration underscores the method’s ability to realistically design antenna substrates and accurately predict frequency variations across differing thicknesses and displacements.

### 5.1 Cantilever Beam Example

In Fig. 7, a point load of *F* = 1,000 N is applied orthogonally to the end node of a beam. This particular beam exhibits dimensions of *L* = 4 m in length, *H* = 0.1 m in height, and *W* = 0.1 m in width. The beam is discretized into 30 elements, each of equal length. For each element, the Young’s modulus, *E*, is treated as a random variable, characterized by a mean value of *μ* = 2.05e11 Pa and a coefficient of variation, *COV*, of 0.1. These Young’s moduli are modeled as Gaussian random fields, utilizing the Gaussian covariance model. The correlation length value plays a pivotal role, where a higher value indicates a greater correlation within the random field. If the distance between two distinct points exceeds the correlation length, these points are statistically considered nearly uncorrelated. For this example, we will examine two datasets: one with high correlation (*l* = 10 m) and another with low correlation (*l* = 0.1 m). In the latter case, discretizing into 30 elements results in a distance of 0.133 m, which surpasses the correlation length. Consequently, we proceed with two data sets, one correlated and the other uncorrelated, following Gaussian distributions.
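The role of the correlation length can be illustrated with a short sampling sketch. It assumes NumPy, a squared-exponential correlation of the form exp(−(d/l)²) between element midpoints (other scalings of the Gaussian model exist), and direct Gaussian sampling instead of the copula-based generation used in the study; the function names are hypothetical.

```python
import numpy as np

def gaussian_correlation(distance, corr_length):
    """Assumed Gaussian (squared-exponential) model: exp(-(d / l)^2)."""
    return np.exp(-(distance / corr_length) ** 2)

def sample_moduli(n_samples, n_elem=30, length=4.0, mean=2.05e11, cov=0.1,
                  corr_length=10.0, seed=0):
    """Draw correlated Young's moduli, one value per beam element."""
    centers = (np.arange(n_elem) + 0.5) * length / n_elem    # element midpoints
    d = np.abs(centers[:, None] - centers[None, :])          # pairwise distances
    R = gaussian_correlation(d, corr_length)
    # Eigen-decomposition tolerates the near-singular R of long correlation lengths.
    vals, vecs = np.linalg.eigh(R)
    A = vecs * np.sqrt(np.clip(vals, 0.0, None))             # A @ A.T reproduces R
    z = np.random.default_rng(seed).normal(size=(n_samples, n_elem))
    return mean * (1.0 + cov * (z @ A.T))

E_high = sample_moduli(1000, corr_length=10.0)   # strongly correlated field
E_low = sample_moduli(1000, corr_length=0.1)     # nearly uncorrelated field
```

Under this assumed model, neighbouring elements spaced 0.133 m apart have a correlation of roughly 0.17 when *l* = 0.1 m and very close to 1 when *l* = 10 m, which is the contrast the two datasets are meant to capture.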

#### 5.1.1 Generation of Young’s Moduli

We generate 30 random Young’s moduli using the copula function with Gaussian random fields, incorporating both correlated and uncorrelated parameters. Fig. 8 visually represents these moduli through color scales, illustrating their random field realizations generated by the Gaussian copula function. Subsequently, we construct two datasets containing 1,000 samples for the 30 Young’s moduli, distinguished by their correlated and uncorrelated parameters. We evaluate the redundancy of each dataset using Eq. (13) and summarize the results in Table 1. The correlated dataset exhibits a redundancy value of 7.6828, while the uncorrelated dataset registers a redundancy value of 4.9068. These findings confirm that the correlated dataset shows significantly higher redundancy. To assess the degree of correlation and guide data reduction methods, we calculate “*e*” values of 0.7621 for the correlated data and 0.2517 for the uncorrelated data. These results suggest a preference for *FE* methods in addressing redundancy in correlated data, while *FS* is more appropriate for uncorrelated data (*e* < 0.5).

#### 5.1.2 Data Reduction for Young’s Moduli

Employing *PCA* as the *FE* method, we reduce dimensionality to 4 for correlated data and 25 for uncorrelated data, preserving information effectively. As indicated in Table 1, *PCA* significantly reduces redundancy in correlated data compared to uncorrelated data. Redundancy reduction is assessed using Eq. (13), where $V_{new}$ and $V_{org}$ represent redundancy values of truncated and original data, respectively.

The redundancy reduction effectiveness of *PCA* is compared to that of *AE*. The *AE* is initially designed using Eq. (14) to determine the total number of hidden neurons.

In this instance, *AE* reduces the dimension to 17, signifying 17 new data points as significant eigenvectors for reducing the original input data redundancy. The redundancy reduction is obtained for the correlated data, resulting in a 70.31% reduction. Although *AE* slightly lags behind *PCA* for correlated data, it maintains an acceptable 3.91% prediction error, computed using Eq. (14).

#### 5.1.3 Deflection Estimation

An *ANN* model was considered to predict cantilever beam deflection, as Probabilistic Neural Networks (*PNN*) cannot perform this task. We design an *ANN* with 21 hidden neurons using Eq. (14). The *ANN*’s network properties align with those of *AE*. After obtaining reduced Young’s moduli through *PCA* and *AE*, the designed *ANN* provides a statistical estimate of tip displacement. We compare the Probability Density Functions (PDFs) of tip displacement calculated from reduced data via *PCA* and *AE* with the PDF derived from the original data (Fig. 4). These results demonstrate a close match between the PDF predictions by *AE* and *PCA* compared to the original displacement PDF. Further comparisons in Table 2 reveal that *AE* exhibits slightly lower accuracy for correlated data than *PCA*. Nevertheless, *AE* maintains an acceptable 3.91% prediction error.

#### 5.1.4 Probabilistic Neural Network for Classification

In the context of uncorrelated data, our objective is to assess the cantilever beam’s reliability concerning structural failure. We propose a classification process employing a *PNN* instead of an *ANN* due to the simplified training process for reliability estimation. In this classification process, we establish a limit state function, $g = R - S$, where $R$ represents resistance (0.034 m) and $S$ signifies loading. Two classes, “class A” and “class B”, are generated: “class A” corresponds to data points deemed safe (when $g$ is greater than zero), while “class B” designates data points in the failure region (when $g$ is less than or equal to zero). We conducted a reliability analysis using a simulation model and summarized the results in Table 3. The probability of failure ($P_f$) is estimated using the limit state function, and the second column in Table 3 presents $P_f$ values derived from *PNN* with *IFT*.

To assess accuracy, we employ the Monte Carlo Simulation (*MCS*) method to estimate *P*_{f} as the ground truth. Comparing *P*_{f} values estimated by the trained *PNN* with 1,000 samples to those obtained through the *MCS* method with 10,000 samples reveals a slight error of 4.75% when employing *IFT* with *PNN* compared to using *PNN* with the original data. A similar error of 3.84% emerges when comparing *P*_{f} values estimated through the *MCS* method using *IFT* and original data. These findings indicate that *FS* through *IFT* marginally reduces accuracy, by less than 5%. Moreover, for both scenarios, employing the original input data and data reduced by *IFT*, the discrepancy between *P*_{f} results calculated through the *MCS* and *PNN* methods remains below approximately 7%, affirming the ability of *PNN* to accurately predict *P*_{f} in cantilever beam reliability analysis, even with *FS* through *IFT*.
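As a rough illustration of how a *PNN* can estimate *P*_{f}, the sketch below implements a Parzen-window classifier with Gaussian kernels and compares its failure fraction against directly labelled samples. The smoothing parameter `sigma`, the weighting of each class density by its empirical prior, the two-dimensional synthetic inputs, and the limit state used for labelling are all assumptions of this sketch rather than settings reported in the study.

```python
import numpy as np


def pnn_classify(x_train, y_train, x_test, sigma=0.3):
    """Classify x_test with a Parzen-window PNN using Gaussian kernels.

    sigma is an assumed smoothing parameter. Each class score is the kernel
    sum divided by the total training size, i.e. the class density weighted
    by its empirical prior.
    """
    classes = np.unique(y_train)
    scores = np.empty((x_test.shape[0], classes.size))
    for j, c in enumerate(classes):
        pattern = x_train[y_train == c]                    # pattern layer
        diff = x_test[:, None, :] - pattern[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=2)
        kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))
        scores[:, j] = kernel.sum(axis=1) / x_train.shape[0]  # summation layer
    return classes[np.argmax(scores, axis=1)]              # decision layer


def label(x):
    """Hypothetical limit state: class B (failure) when x0 + x1 exceeds 1.5."""
    g = 1.5 - (x[:, 0] + x[:, 1])
    return np.where(g > 0, "A", "B")


rng = np.random.default_rng(0)
x_train = rng.standard_normal((300, 2))
x_test = rng.standard_normal((2000, 2))
y_train, y_test = label(x_train), label(x_test)

y_pred = pnn_classify(x_train, y_train, x_test)
accuracy = np.mean(y_pred == y_test)
p_f_pnn = np.mean(y_pred == "B")   # failure fraction predicted by the PNN
p_f_ref = np.mean(y_test == "B")   # failure fraction from direct labelling
rel_err = abs(p_f_pnn - p_f_ref) / p_f_ref * 100.0
print(f"accuracy={accuracy:.3f}, P_f (PNN)={p_f_pnn:.4f}, "
      f"P_f (reference)={p_f_ref:.4f}, relative error={rel_err:.2f}%")
```

The relative error printed at the end mirrors the kind of percentage comparison between *PNN* and *MCS* reported in Table 3, although the numbers here come from synthetic data only.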

### 5.2 Stretchable Antenna Example

A stretchable patch antenna, depicted in Fig. 10, comprises essential components: substrate, patch, feed line, ground, and source. The fabrication of a substrate with uniform thickness for stretchable antennas is atypical due to current manual engineering practices [34]. Consequently, modeling the substrate necessitates accounting for its variable thickness. Tabulated in Table 4 are details concerning substrate geometry, thickness variations, and properties. The primary objective of analyzing this stretchable patch antenna with variable thickness is to validate its ability to maintain a dependable frequency range during contraction and relaxation. Two key criteria govern an acceptable frequency range.

Firstly, the mechanical behavior analysis subjects the antenna to deformation, evaluated through a tensile test in which both ends of the antenna are placed in tension. Assuming antenna symmetry, where both ends deform symmetrically from the center, reduces the finite element model size and, in turn, analysis time and cost. Deformation results are represented as *X* and *Y* coordinates. The stretchable antenna is segmented into 32 sections to reflect thickness variations (Fig. 11(a)), comprising 50 coordinates (Fig. 11(b)). Upon obtaining deformed antenna coordinates from ANSYS deformation analysis, the resonance frequency is calculated using HFSS software based on the deformed mode’s coordinates. In pursuit of stable performance, the stretchable antenna’s frequency should remain within a reliable range (3 dB frequency) under deformation, leading to the rejection of antenna designs that fall outside this suitable range. Based on Table 4, resonance frequency changes were calculated in HFSS for antenna deformations of 1, 3.2, and 12 mm (Fig. 12). As displacement increases, antenna efficiency diminishes due to declining absolute values of the reflection coefficient S11, and the resonance frequency does not stay well-preserved near 2.5 GHz.

#### 5.2.1 Generation of Varying Thickness

The substrate’s thickness is modeled as a random field generated using a Gaussian copula function. The assumed correlation length is 20 mm, facilitating the generation of fairly correlated thickness parameters for patch antenna substrates. A total of 32 different substrate thicknesses are generated, while a patch with a constant thickness of 0.03 mm is modeled. The redundancy of the initial thickness data is quantified at 3.432 using Eq. (12), as detailed in Table 5. Subsequently, 121 distinct displacement values, ranging from 0 to 12 mm, are applied to the antenna’s ends. The *X* and *Y* coordinates for each point, as depicted in Fig. 7(b), are independently estimated from deformation analysis in ANSYS. These data serve as training points for regression using Artificial Neural Networks (*ANN*) to calculate coordinates when applying displacement to an antenna with variable thickness. Consequently, thicknesses are treated as input parameters, while coordinates are assumed as outputs. Input parameter data reduction, performed prior to *ANN* training, is discussed in Section 5.2.2.
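One way to generate correlated thicknesses through a Gaussian copula is sketched below. The 32 sections and the 20 mm correlation length follow the text; the exponential correlation kernel, the 2 mm section spacing, the uniform marginal, and the thickness bounds `t_low` and `t_high` are assumptions made for illustration and are not taken from Table 4.

```python
import math

import numpy as np


def gaussian_copula_thickness(n_sections=32, n_samples=200, corr_length=20.0,
                              spacing=2.0, t_low=0.9, t_high=1.1, seed=0):
    """Sample correlated section thicknesses (mm) through a Gaussian copula.

    n_sections and corr_length come from the text. spacing, t_low, t_high,
    the exponential kernel and the uniform marginal are assumed values.
    """
    x = np.arange(n_sections) * spacing                  # section centres (mm)
    dist = np.abs(x[:, None] - x[None, :])
    corr = np.exp(-dist / corr_length)                   # assumed kernel
    chol = np.linalg.cholesky(corr)
    rng = np.random.default_rng(seed)
    # Correlated standard normal variables, one row per realisation
    z = rng.standard_normal((n_samples, n_sections)) @ chol.T
    # Standard normal CDF maps each variable to a uniform on (0, 1)
    phi = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    u = phi(z)
    # Inverse CDF of the assumed uniform marginal
    return t_low + (t_high - t_low) * u


thickness = gaussian_copula_thickness()
print(thickness.shape)
```

Neighbouring sections end up strongly correlated while distant ones are nearly independent, which is the behaviour a finite correlation length is meant to produce.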

#### 5.2.2 Data Reduction of Varying Thickness

The choice of an appropriate data reduction method hinges on calculating the “*e*” value using Eq. (11). An “*e*” value of 0.6300 is computed for the original thickness dataset. Because this value exceeds the 0.5 threshold, *FE* methods are required. Table 6 lists the redundancy results computed for the *AE* and *PCA* methods. The comparison between these results and the original data redundancy values confirms the effective reduction of irrelevance between the original and truncated datasets. To retain 90% of the thickness information, *PCA* selects 14 eigenvalues, while *AE* employs 20 hidden neurons. *AE* outperforms *PCA* by yielding a smaller reconstruction error, attributed to its increased number of hidden neurons.
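The selection rule and the 90% retention criterion can be sketched as follows. The 0.5 threshold on “*e*” comes from the text; treating "90% of the information" as 90% of the cumulative explained variance is an assumption of this sketch, as are the function names and the synthetic data.

```python
import numpy as np


def choose_reduction(e, threshold=0.5):
    """Return 'FE' when e exceeds the threshold, otherwise 'FS'."""
    return "FE" if e > threshold else "FS"


def pca_reduce(data, retain=0.90):
    """Project data onto the fewest principal components that reach `retain`.

    `retain` is read here as a cumulative explained-variance ratio,
    which is an assumption about how the 90% criterion is measured.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    k = min(int(np.searchsorted(ratio, retain)) + 1, eigvals.size)
    scores = centered @ eigvecs[:, :k]
    return scores, k, ratio


# Synthetic stand-in for 32 correlated thickness variables
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 3))
mixing = rng.standard_normal((3, 32))
data = latent @ mixing + 0.1 * rng.standard_normal((500, 32))

method = choose_reduction(0.6300)
scores, k, ratio = pca_reduce(data, retain=0.90)
print(method, k, scores.shape)
```

With real thickness data the returned `k` would play the role of the 14 eigenvalues reported for *PCA*; the synthetic data here has only three underlying factors, so far fewer components are needed.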

#### 5.2.3 Artificial Neural Network for Predicting Antenna Deformation

In the subsequent step, an *ANN* is used to predict antenna deformation with varying thicknesses as input data and coordinates as model responses. Table 5 presents the *ANN* prediction errors. For *X* coordinates, the original thicknesses give a 6.28% error, whereas thicknesses reduced by *PCA* yield an error of 3.27% and those reduced by *AE* a 5.41% error. For *Y* coordinates, the *ANN*’s inherent error of 7.43% is considered acceptable and is akin to the *PCA* and *AE* prediction errors. These results underscore the effective data reduction achieved with both *PCA* and *AE*, as evidenced by error values comparable to those of the original data. Furthermore, data reduction techniques lead to error reduction, signifying that reduced data redundancy indeed enhances *ANN* prediction models. Fig. 9 compares deformed antenna models predicted by *ANN* and data reduction techniques with models obtained from actual simulations for 1 and 12 mm deformation values, highlighting a strong agreement.

#### 5.2.4 Data Reduction of Each Coordinate for Resonance Frequency Prediction

Upon obtaining coordinates for each antenna point from *ANN* models, all 50 coordinates are employed as input data in HFSS to generate the deformed antenna model. Consequently, 121 distinct resonance frequencies are computed as output data for varying displacement values. Given a dataset of coordinates, the *FS* method is chosen for data reduction because it can identify the most informative variables. This process involves estimating significant coordinates to aid antenna redesign. The selected coordinates enable the assessment of whether the frequency can be classified within a reliable bandwidth. *IFT* generates a new subset by selecting coordinates exceeding the “Significance value” of 3, ensuring high accuracy and selecting the five most significant coordinates. Following the *IFT* application, a reduced dataset emerges with a reduced redundancy of 10.8147, compared to the original dataset's redundancy of 12.5689, signifying a 13.9567% reduction in uncertainty, estimated using Eq. (13). Fig. 15 visually presents the significant coordinates selected by *IFT*.
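A minimal sketch of the selection step is given below. It assumes the common two-class form of the Independent Features Test, in which a feature is kept when the difference of class means divided by its standard error exceeds the significance value; the exact form used in the study may differ. The sample count of 121, the 50 coordinates, and the significance value of 3 follow the text, while the synthetic data and the `informative` columns are hypothetical. The helper `relative_reduction` assumes Eq. (13) is a relative difference, which reproduces the reported percentage.

```python
import numpy as np


def ift_select(features, labels, sig=3.0):
    """Two-class Independent Features Test (assumed form).

    Keeps feature j when |mean_A - mean_B| / sqrt(var_A/n_A + var_B/n_B) > sig.
    """
    a = features[labels == "A"]
    b = features[labels == "B"]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / se
    return np.flatnonzero(score > sig), score


def relative_reduction(original, reduced):
    """Percentage reduction relative to the original value (assumed Eq. (13))."""
    return (original - reduced) / original * 100.0


# Synthetic stand-in: 121 samples, 50 coordinates, 5 informative columns
rng = np.random.default_rng(0)
n_samples, n_coords = 121, 50
labels = np.where(rng.random(n_samples) < 0.5, "A", "B")
features = rng.standard_normal((n_samples, n_coords))
informative = [3, 11, 24, 37, 42]        # hypothetical column indices
shift = np.zeros(n_coords)
shift[informative] = 2.0
features += np.outer(labels == "B", shift)   # shift class B in those columns

selected, score = ift_select(features, labels, sig=3.0)
print("selected columns:", selected.tolist())
print(f"redundancy reduction: {relative_reduction(12.5689, 10.8147):.4f}%")
```

Raising `sig` keeps fewer coordinates, so the threshold acts as the trade-off between dataset size and retained information.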

#### 5.2.5 *PNN* for Classification of Antenna Frequency

In a subsequent phase, antenna reliability is assessed against a 3 dB frequency range within which the resonance frequency should remain during deformation. This range is based on the non-deformed antenna, which has a resonance frequency of 2.5 GHz, and spans 2.4849 GHz (m1) to 2.5151 GHz (m2). The limit state function is estimated to facilitate classification, with a resistance or capacity of 2.5 GHz resonance frequency. If g exceeds 0.0302 (the difference between m1 and m2), the stretchable patch antenna system is categorized as class B, necessitating its rejection due to unstable resonance frequency. Monte Carlo Simulation (*MCS*) with 10,000 samples is conducted to calculate the probability of failure. Results from *MCS* are compared with the Probabilistic Neural Network (*PNN*) model, employing 121 training data samples to train *PNN* and 10,000 samples for accurate, computationally efficient failure probability prediction. Table 6 compares *P*_{f} values derived from *PNN* classification with those obtained from *MCS*, revealing a difference of 8.05%, which falls below the 10% error threshold essential for accurate classification. The reduced coordinates yield a *P*_{f} value of 0.3016, signifying a 7.37% *P*_{f} increase compared to the original coordinates. Furthermore, the *P*_{f} value calculated with reduced coordinates closely aligns with that computed via *MCS* (a 6.98% difference), indicating accurate *FS* and classification.
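The band check behind this classification can be sketched as a simple membership test. The limits m1 and m2 are taken from the text; reading the criterion as "class A when the resonance frequency lies between m1 and m2" is one interpretation of the limit state described above, and the sample frequencies are hypothetical.

```python
M1, M2 = 2.4849, 2.5151   # 3 dB band limits in GHz, from the text


def classify_frequency(f_ghz, low=M1, high=M2):
    """Return 'A' when the frequency lies inside the band, otherwise 'B'."""
    return "A" if low <= f_ghz <= high else "B"


def failure_probability(freqs, low=M1, high=M2):
    """Fraction of resonance frequencies that fall outside the band."""
    failures = sum(1 for f in freqs if classify_frequency(f, low, high) == "B")
    return failures / len(freqs)


# Hypothetical resonance frequencies (GHz) for a few deformed configurations
frequencies = [2.500, 2.498, 2.510, 2.530, 3.200]
classes = [classify_frequency(f) for f in frequencies]
p_f = failure_probability(frequencies)
print(classes, p_f)
```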

#### 5.2.6 Validation of Efficacy of Proposed Framework

The stretchable antenna remains acceptable until it experiences a 3.1 mm displacement while maintaining a resonance frequency of 2.5 GHz. Beyond this point, frequency variation becomes evident, with a maximum displacement of 12 mm, resulting in a resonance frequency of 3.2 GHz. Consequently, limiting the maximum allowable displacement to 3.1 mm is recommended. The final step involves comparing resonance frequencies between original coordinates and new coordinates derived from *FE* and *FS*. Fig. 16 demonstrates that *FE* yields higher frequency accuracy than *FS* due to greater absolute values of each S11 calculated through *FE* methods than those obtained through *FS*. This discrepancy arises because *FS* eliminates uninformative coordinates, whereas *FE* retains principal components and reconstructs coordinates to match the original dataset’s dimensions. Despite reduced antenna efficiency under displacement due to resonance frequency variations, the proposed method accurately predicts resonance frequencies and enhances S11 values.

## 6 Conclusion

This research developed an efficient framework for reducing input parameter dimensions and accurately predicting responses. In multidisciplinary engineering systems, input parameters are often correlated. The comprehension of multi-physics engineering system behavior relies on a substantial dataset. Therefore, it is crucial to extract the most refined and informative data. By eliminating uncertainty and redundancy from the data, the prediction or classification model’s accuracy can be assured. Consequently, the proposed framework provides effective guidance for users lacking expertise in data analysis, a prerequisite for analyzing multi-physics engineering systems.

In this research, the copula function was employed to demonstrate these interdependencies, resulting in more realistic modeling and precise response predictions.

The framework introduced data reduction techniques, namely *PCA* and *AE* for *FE* and *IFT* for *FS*, enhancing performance prediction accuracy. An entropy-based correlation coefficient (“*e*”) was used to decide between *FE* and *FS* based on input parameter correlations.

After data reduction, *ANN* and *PNN* were used to estimate responses and enhance computational efficiency during simulations. The framework’s efficacy was demonstrated through three engineering examples. *PCA* and *AE* effectively reduced data complexity without significant information loss in cases with high correlation, as indicated by the “*e*” criterion. Redundancy reduction was confirmed across all datasets using *FS* or *FE*, as guided by the “*e*” criterion.

Prediction errors indicated that reduced data with low redundancy yielded reliable results. *PNN* and *MCS* showed that the framework achieved accurate classifications. Notably, the stretchable antenna example revealed that increased dimensionality in predicting resonance frequency resulted from electrical and mechanical properties. The framework effectively reduced data in mechanical and electrical analyses of multidisciplinary problems.

These findings underscore the framework’s importance in addressing multidisciplinary engineering challenges, enabling efficient modeling in critical engineering applications involving multi-physics and uncertainty, such as stretchable electronics.

#### List of Symbols

- e: Entropy-based Correlation Coefficient
- PCA: Principal Component Analysis
- AE: Auto-encoder
- IFT: Independent Features Test
- J: Joint distribution
- C: Copula
- H: Entropy
- FE: Feature Extraction
- FS: Feature Selection

## References

## Biography

**Sungkun Hwang** earned his Ph.D. in mechanical engineering from the Georgia Institute of Technology in Atlanta, USA. His research includes reliability and probability-based mechanical design, multidisciplinary design optimization, and data analysis under uncertainties. Currently, he is employed in mechanical research and development at Samsung Electronics.

**Seung-Kyum Choi** received the Ph.D. degree in Mechanical and Materials Engineering from Wright State University. He is an Associate Professor in School of Mechanical Engineering at Georgia Institute of Technology. He is an author of a graduate-level book on the topics of probabilistic mechanics and reliability-based design optimization (Reliability-based Structural Design, Springer, 2007). He served as a chair and session organizer at national conferences of AIAA, SDM, MDO, NDA and ASME/IDETC, in addition to being a chair of the ASME Advanced Modeling & Simulation Technical Committee. He is currently an associate editor of ASME Journal of Computing and Information Science in Engineering, and Journal of Computational Design and Engineering. Dr. Choi's research interests include structural reliability, probabilistic mechanics, statistical approaches to design of structural systems, multidisciplinary design optimization, and AI-enabled modeling & simulation for complex engineered systems. Dr. Choi is currently appointed the Director of Center for Additive Manufacturing Systems (CAMS), where he has responsibilities for developing research and education programs in additive manufacturing.

**Eun-Ho Lee** is a faculty member of the School of Mechanical Engineering at Sungkyunkwan University in South Korea. He received his Ph.D. degree from the Mechanical Engineering Department of KAIST in 2015 and worked at General Motors R&D (Warren, MI) and Samsung Electronics (Suwon, Korea). His research fields include intelligent manufacturing, semiconductor/packaging manufacturing, and intelligent monitoring.