Hwang, Lee, and Choi: Integrating Entropy-based Data Reduction and Machine Learning in Multidisciplinary Engineering Systems for Enhanced Response Prediction
Abstract
This research presents a framework for capturing and modeling a streamlined set of design variables in multidisciplinary engineering systems, particularly when uncertainties are present. In multidisciplinary domains, correlated design variables may be surplus, leading to data redundancy and potentially degrading the prediction of system responses. To address this issue, the framework utilizes data reduction techniques based on the correlation degree of random design variables, which is evaluated using an entropy-based correlation coefficient (e). By doing so, the framework enables a more precise prediction of system responses. The data reduction process depends on the value of e and employs two distinct approaches. For strong correlations (high e values), feature extraction techniques such as Principal Component Analysis and the Auto-encoder algorithm are applied. For weak correlations, feature selection is implemented using the Independent Features Test. To predict the complex responses of multidisciplinary systems effectively while enhancing computational efficiency, the framework integrates an Artificial Neural Network. The efficacy of the proposed framework is demonstrated through examples, including a cantilever beam with randomly distributed materials and an electro-mechanical stretchable patch antenna.
Keywords: Data reduction · Auto-encoder · Uncertainty quantification · Machine learning · Reliability analysis
List of Symbols
e: Entropy-based Correlation Coefficient
PCA: Principal Component Analysis
IFT: Independent Features Test
1 Introduction
The future development of engineered products necessitates a fusion of technical knowledge from various engineering domains to meet multidisciplinary design criteria accurately. Hence, there is growing interest in multi-physics modeling and simulation techniques, as they are better suited to represent the behavior of multidisciplinary systems governed by multiple physical laws, each with its own principles. With the increasing integration of numerous engineering disciplines, multidisciplinary domains yield substantial volumes of data demanding meticulous analysis. To precisely assess the risk and reliability of complex systems, it is imperative to accurately capture and propagate critical input parameters and their associated uncertainties. However, in scenarios where input data possess high dimensions, redundancy emerges, escalating computational complexities in multivariate system analysis [1]. Additionally, the multivariate random input variables of a multidisciplinary engineering system frequently exhibit correlations to depict specific system properties, as shown in Fig. 1. This correlation among input data introduces data redundancy, allowing irrelevant and uncertain data to detrimentally impact the predicted system response [2]. Notably, issues like extensive data processing and errors can further impede prediction model accuracy. Nonetheless, the precise and efficient modeling of correlated, high-dimensional random input variables remains imperative to enable users to derive accurate responses for analyzing and designing dependable multidisciplinary systems.
Thus, this study introduces an effective data reduction framework designed to faithfully represent the correlated, high-dimensional random variables characterizing complex multidisciplinary systems. Within this framework, we employ feature extraction ( FE) methods, specifically Principal Component Analysis ( PCA) and Auto-encoder ( AE), as initial steps to eliminate redundant data sampling from random input variables [ 1, 3]. Furthermore, we integrate the Independent Features Test ( IFT) [ 4] into the proposed method to trim the number of input random variables. A criterion relying on entropy-based correlation coefficients [ 5] is established to determine whether the input data should undergo FE or feature selection ( FS). Subsequently, we incorporate a copula function [ 6, 7] to model the intricacies of correlated input variables in the framework precisely. The copula function, renowned for its capacity to amalgamate various marginal distributions related to the inputs, is pivotal in recreating a joint distribution of correlated random variables [ 7]. Lastly, we harness two machine learning techniques, Artificial Neural Network ( ANN) [ 8, 9] and Probabilistic Neural Network ( PNN) [ 10], to alleviate the computational burden associated with predicting the response and reliability of multidisciplinary engineering systems. We validate the efficacy of this proposed framework through a reliability analysis involving multiple disciplines characterized by complex input data. These examples underscore how the framework can accurately forecast the behavior of multidisciplinary engineering systems while minimizing computational overhead. The paper’s structure is as follows: Section 2 outlines the data reduction techniques employed in the proposed framework to simplify the complexity of input random variables. Section 3 introduces surrogate modeling techniques ( ANN and PNN) utilized for response and reliability prediction in the systems. In Section 4, we present a flowchart delineating the proposed data reduction framework. Finally, in Section 5, we apply the framework to two practical multidisciplinary engineering problems to demonstrate its effectiveness and advantages.
2 Data Reduction
As different disciplines are integrated, the quantity of observations and variables under examination grows exponentially. This expansion in data volume necessitates the analysis and organization of data in high-dimensional spaces. Consequently, in the realm of multidisciplinary engineering systems analysis, this multivariate data often results in what is commonly referred to as the “curse of dimensionality”. High dimensionality introduces an abundance of features within the data, which can lead to uncertain predictions or overfitting in regression and classification [2]. Furthermore, when these features exhibit high correlations, they may essentially represent the same property, resulting in considerable redundancy and reduced simulation efficiency [1]. Consequently, efficient multivariate data analysis is often imperative to obtain accurate and realistic system responses.
Data reduction techniques can generally be categorized into two primary groups: FE and FS. The key distinction between these techniques lies in their purpose. FS methods are employed to identify the best subset of the original input features, while FE methods generate new features within a lower-dimensional space through transformations of the original features. In certain scenarios, both approaches may be employed sequentially. In this study, we propose a guideline (Section 4) to facilitate the optimal selection between FE and FS. This choice is based on the calculated entropy-based correlation coefficient of the input data.
2.1 Feature Extraction
FE ( Fig. 2) is a technique that transforms high-dimensional data into new features of reduced dimensionality. It allows the conversion of N dimensions of original data into a smaller set of M dimensions, where M is less than N. This process involves mapping and serves the purpose of eliminating redundant information from the input dataset. In this section, we delve into two highly effective FE methods: PCA and AE.
2.1.1 Principal Component Analysis (PCA)
PCA is a statistical technique utilized for transforming a correlated dataset into an uncorrelated or independent one.
This transformation is achieved through an orthogonal operation, represented as Y = PX. In this context, X denotes the original dataset, while Y signifies a restructured representation of X. Both X and Y possess dimensions of m (indicating the number of observations or cases) by n (representing the number of features), resulting from this linear transformation. The matrix P, an n-by- n transformation matrix, is constituted by columns that correspond to the eigenvectors of X′ X. Fundamentally, data dimensions are reduced using PCA through eigenvector decomposition. For instance, one can utilize a covariance matrix, CY, to derive the P matrix endowed with the orthonormal property [ 11].
In the relation A = XX^T, A is a symmetric matrix that can be diagonalized by its eigenvectors: writing the eigenvector matrix as E and the corresponding diagonal matrix of eigenvalues as D, we have A = EDE^T. Choosing P = E^T, so that the rows of P are the eigenvectors of A, gives A = P^TDP. Because of the orthonormal property, the inverse of P equals its transpose. Thus, the principal components of X are the eigenvectors of the covariance matrix C_X = (n − 1)^(-1)XX^T, and each diagonal element of C_Y corresponds to the variance of X along the associated principal direction.
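To make the transformation concrete, the short sketch below performs PCA by eigendecomposition of the sample covariance matrix and keeps enough components to explain a chosen fraction of the variance. The variable names, the synthetic data, and the 90% variance threshold are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

def pca_reduce(X, energy=0.90):
    """Reduce X (m observations x n features) by projecting onto the leading
    eigenvectors of its covariance matrix. `energy` is an assumed threshold
    on the retained variance fraction."""
    Xc = X - X.mean(axis=0)                      # center the data
    C = np.cov(Xc, rowvar=False)                 # n x n covariance matrix
    eigval, eigvec = np.linalg.eigh(C)           # symmetric eigendecomposition
    order = np.argsort(eigval)[::-1]             # sort by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    k = np.searchsorted(np.cumsum(eigval) / eigval.sum(), energy) + 1
    P = eigvec[:, :k]                            # columns = principal directions
    return Xc @ P, P                             # scores Y = X P and the basis

# Example: 1,000 samples of 30 correlated features
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(30), 0.5 + 0.5 * np.eye(30), size=1000)
Y, P = pca_reduce(X)
print(X.shape, "->", Y.shape)
```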
2.1.2 Auto-encoder (AE)
AEs are a special type of ANN [4]. ANNs are inspired by the human nervous system and can be trained to predict values or patterns of an output based on those recognized from the input. AEs aim to reduce the dimensionality of data by transforming it into a compressed representation. As shown in Fig. 3, AEs are composed of neural networks with multiple layers. In general, an AE has three layers: an input layer, a hidden layer, and an output layer. To transform the input data, the data is passed through the hidden layer using a weight matrix and a bias. The output data is a reconstruction of the original input data. Training an AE aims to minimize the difference between the input and output layers; in other words, the training aims to reduce the reconstruction mean squared error (MSE) [12]. Because AEs use a reduced number of hidden units, they can be used for dimensionality reduction. Additionally, AEs share the weight values and biases between the previous and subsequent layers [13]. The reconstruction error (re) can be calculated using the squared-error cost function:
In this context, W, x, b, and X represent the weights, input units, bias, and input vector. The notations WT and bT denote the transpose of W and b. AE incorporates a distinctive criterion, such as sparsity, which is closely linked to the reduction of hidden units and, by extension, data reduction. Given that AE essentially aims to have its output data equal to its input data, it becomes evident that AE endeavors to learn an approximation of the identity function. Both PCA and AE will find application within the proposed framework, and the examples presented will elucidate both methods’ advantages and limitations.
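The following sketch illustrates the auto-encoder idea with a single-hidden-layer network trained to reproduce its own input; scikit-learn's MLPRegressor is used as a convenient stand-in, so the weights are not tied between encoder and decoder as in some AE formulations. The 17 hidden units mirror the dimension reported later for the beam example, and the synthetic data are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Assumed setup: X holds 1,000 samples of 30 correlated features.
rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(30), 0.5 + 0.5 * np.eye(30), size=1000)
Xs = StandardScaler().fit_transform(X)

# A single-hidden-layer network trained to reproduce its own input acts as a
# basic auto-encoder (weights are not tied here, unlike some AE formulations).
ae = MLPRegressor(hidden_layer_sizes=(17,), activation="logistic",
                  max_iter=2000, random_state=0)
ae.fit(Xs, Xs)

X_hat = ae.predict(Xs)
re = np.mean((X_hat - Xs) ** 2)   # reconstruction mean squared error
print(f"reconstruction MSE: {re:.4f}")

# The hidden-layer activations serve as the reduced 17-dimensional features.
H = 1.0 / (1.0 + np.exp(-(Xs @ ae.coefs_[0] + ae.intercepts_[0])))
print(H.shape)
```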
2.2 Feature Selection
FS is the process of identifying a subset of the original high-dimensional data that retains only meaningful features. Given a set of n original features, the goal is to select a finite and informative subset of m features. This transformation process is essentially an algorithm for reducing dimensions. As depicted in Fig. 4, which contrasts the full set of random data with the selected subset, the value of FS diminishes if the selected subset lacks sufficient information from the original data. Various FS techniques exist, including those based on mutual information, single-variable classifiers, or even genetic algorithms [1]. However, these methods are often associated with high computational costs, and the analysis of distributions can yield unreliable results when information about all data distributions is incomplete.
In contrast to these approaches, the Independent Features Test ( IFT) offers a cost-effective and rapid means of eliminating features that do not contribute to describing a given system. IFT assumes that the target data is categorical or that all data features fall into one of two classes [ 4]. This process is necessary to estimate the scoring value of informative features:
Here, A and B represent datasets corresponding to feature IFT values, while n1 and n2 refer to the number of features in the respective classes. “ Sig” denotes a significance value used as a threshold for eliminating less useful data. As shown in Eq. (3), statistical moments such as variance ( var(.)) and mean ( mean(.)) are employed to gauge the significance level of the data. Consequently, features can be selected based on the criterion of IFT > sig. In general, it is recommended that features are considered informative when they exhibit a significance value of 2 or higher [ 14].
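Since Eq. (3) is not reproduced here, the sketch below implements one plausible form of the IFT score: a two-class statistic built from the feature-wise means and variances, compared against the significance threshold of 2. The exact expression used by the authors may differ.

```python
import numpy as np

def ift_score(a, b):
    """One plausible form of the IFT significance score for a single feature,
    assumed here to be a two-sample statistic built from the class means and
    variances (the paper's Eq. (3) may differ in detail)."""
    n1, n2 = len(a), len(b)
    return abs(np.mean(a) - np.mean(b)) / np.sqrt(np.var(a) / n1 + np.var(b) / n2)

def select_features(X, labels, sig=2.0):
    """Keep the features whose IFT score exceeds the significance threshold."""
    A, B = X[labels == 0], X[labels == 1]
    scores = np.array([ift_score(A[:, j], B[:, j]) for j in range(X.shape[1])])
    return np.flatnonzero(scores > sig), scores

# Example: 5 features, only the first two differ between the classes.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
labels = (rng.random(200) < 0.5).astype(int)
X[labels == 1, :2] += 1.0
kept, scores = select_features(X, labels)
print("selected feature indices:", kept)
```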
3 Surrogate Modeling for Model Prediction
Introducing surrogate modeling techniques offers a promising avenue to reduce the complexity and cost of simulating engineering systems [ 15]. Extensive research efforts have been devoted to developing efficient classification methods and surrogate modeling techniques, including Artificial Neural Networks ( ANN) [ 16], decision trees [ 17], and discriminant analysis [ 18], among others. Numerous studies have demonstrated the potential of ANN methods as viable alternatives to traditional classification techniques [ 16]. ANN exhibits substantial promise when dealing with decision domains with complex shapes that are challenging to capture accurately. One of the advantages of ANN is its applicability in both classification and regression tasks. Probabilistic Neural Networks ( PNN) [ 10] can also be harnessed for predicting the class within large datasets. PNN holds a distinct advantage over traditional ANN models in terms of significantly reducing computational efforts during the training process. Detailed explanations of both ANN and PNN are provided in the following subsections.
3.1 Artificial Neural Network (ANN)
ANN can define patterns and identify similarities when confronted with new inputs, whether they involve incomplete information or noisy data [19]. A basic ANN comprises an input layer, one hidden layer, and one output layer. The input layer receives the input variables as X = {x1, x2, ..., xn}. Each input is assigned a weight, denoted as W, representing synaptic learning. The output of a single neuron in the hidden layer is calculated as:
Here, f, W, and b refer to an activation function, weights, and bias for a single neuron. The commonly used activation function is a sigmoid function, expressed as f(h) = (1 + exp(-h))^(-1), where h signifies mapping units. The slope of the sigmoid function determines the proximity to the threshold point, and its output range spans from 0 to 1. A generalized ANN model encompasses multiple hidden units and can feature multiple hidden layers based on the complexity of the model. In a generalized ANN, the total weighted sum of the inputs can be articulated as:
Here, Wij(l) and b represent the weight and bias associated with the connection between unit j in layer l and unit i in layer l + 1. The output value of the activation function at unit j of layer l can be denoted as fj(l). In general, if layer 1 is the input layer and layer nl is the output layer, each layer l is closely connected to layer l + 1.
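As a minimal illustration of the forward computation just described, the sketch below propagates an input through fully connected layers with the sigmoid activation; the layer sizes and weights are arbitrary placeholders, not values from the paper.

```python
import numpy as np

def sigmoid(h):
    """Activation f(h) = 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + np.exp(-h))

def forward(x, weights, biases):
    """Forward pass through a fully connected network: at each layer the
    weighted sum W a + b is passed through the sigmoid activation."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# A 3-4-1 network: 3 inputs, one hidden layer of 4 units, 1 output.
rng = np.random.default_rng(3)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [rng.normal(size=4), rng.normal(size=1)]
print(forward(np.array([0.2, -0.5, 1.0]), weights, biases))
```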
3.2 Probabilistic Neural Network (PNN)
PNN was introduced to address the computational complexity associated with ANN training [ 10]. It’s a specialized feedforward neural network that combines the Bayes decision rule with the Parzen nonparametric estimator to manage decision boundaries. This decision rule reduces the “expected risk” of misclassification in pattern classification [ 10].
The PNN is a non-parametric model, offering advantages in handling complex, non-linear relationships due to its lack of explicit assumptions about data distribution [ 20]. It exhibits one-shot learning, storing the entire dataset during training, which is efficient for small to medium-sized datasets [ 21]. The model is robust to data noise, avoiding reliance on assumptions about the distribution’s specific form [ 22]. However, PNN’s memory-intensive nature, storing the complete training dataset, can be computationally expensive and impractical for large datasets [ 23]. Additionally, scalability issues arise, limiting PNN’s applicability to datasets with high dimensions or features [ 9].
PNN comprises four layers: the input, pattern, summation, and output layers. As depicted in Fig. 5, in the pattern layer of PNN, classification decisions are made based on the dataset X. Given Probability Density Functions (PDFs) for different categories A and B, data set X belongs to class A if fA( X) > fB( X) for all A≠ B, where fA(X) and fB(X) represent the PDFs for class A and B, respectively. The Bayesian optimal decision rule for PNN’s classification is as follows:
where hA and hB are the prior probabilities of patterns occurring in each class. Decision accuracy relies on the PDF estimate for each class. Typically, a Gaussian kernel is used to represent a multivariate estimate of the class-conditional PDF for each class,
Here, k, fk(X), p, σ, m, and i denote the class, the class-conditional PDF estimated as a sum of multivariate Gaussian kernels, the dimension of the measurement space, the smoothing parameter, the total number of training patterns, and the pattern index, respectively. XTki denotes the ith training pattern from class k within the training data set XT, which contains the m training patterns, each defined in the p-dimensional measurement space.
The smoothing parameter affects the placement of training data: as it increases, the degree of interpolation between data points also rises. Both PNN and ANN will be employed in the framework, showcasing the advantages of these methods.
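A compact sketch of the PNN decision rule is given below: each class-conditional PDF is estimated with a Parzen sum of Gaussian kernels, and the class with the largest prior-weighted estimate is selected. The smoothing parameter, the priors, and the two-cluster data are illustrative assumptions.

```python
import numpy as np

def class_pdf(x, patterns, sigma):
    """Parzen estimate of the class-conditional PDF at x: an average of
    multivariate Gaussian kernels centered on the training patterns."""
    p = patterns.shape[1]
    d2 = np.sum((patterns - x) ** 2, axis=1)
    norm = (2.0 * np.pi) ** (p / 2) * sigma ** p
    return np.mean(np.exp(-d2 / (2.0 * sigma ** 2))) / norm

def pnn_classify(x, train, labels, sigma=0.5, priors=None):
    """Assign x to the class with the largest prior-weighted PDF estimate."""
    classes = np.unique(labels)
    if priors is None:
        priors = {k: np.mean(labels == k) for k in classes}
    scores = {k: priors[k] * class_pdf(x, train[labels == k], sigma) for k in classes}
    return max(scores, key=scores.get)

# Two 2-D Gaussian clusters standing in for class A (0) and class B (1).
rng = np.random.default_rng(4)
train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
labels = np.repeat([0, 1], 50)
print(pnn_classify(np.array([2.5, 2.5]), train, labels))
```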
4 Proposed Framework
The proposed study aims to address the computational complexity arising from high-dimensional input variables and complex modeling techniques, all while maintaining prediction accuracy in multidisciplinary engineering systems under uncertainty. This section outlines a data reduction framework consisting of three key components: multivariate data generation, data reduction of input data, and representation of multivariate system behavior, as depicted in Fig. 1.
4.1 Step 1: Generating Multivariate Data
In multidisciplinary engineering systems, multiple random input variables originate from various disciplines, influencing one another’s behavior. Therefore, the initial focus lies on generating these multivariate data. Copulas are employed for this purpose due to their ability to describe complex nonlinear behavior effectively. Copulas provide a realistic representation of dependencies among numerous random variables. A copula can be defined as a function that links multivariate distribution functions to their one-dimensional marginal distribution functions. For random variables x and y with marginal distributions represented as F and G, a joint distribution J can be defined as follows:
Here, C denotes a copula function (C: [0,1]^2 → [0,1]), ρ is the linear correlation coefficient, h and k are copula parameters, and Φ represents the standard univariate Gaussian distribution function. This representation of multivariate input data is the foundation for modeling the complex behavior of multidisciplinary engineering systems in Step 1.
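The sketch below shows one common way to realize this step with a Gaussian copula: correlated latent Gaussian samples are mapped to uniform scores and then through the inverse CDFs of the desired marginals. The correlation value and the marginal distributions are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy import stats

def gaussian_copula_samples(rho, marginals, n=1000, seed=0):
    """Draw correlated samples whose dependence follows a Gaussian copula with
    correlation rho and whose marginals are the given scipy distributions."""
    d = len(marginals)
    corr = np.full((d, d), rho) + (1.0 - rho) * np.eye(d)
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(d), corr, size=n)   # latent Gaussians
    u = stats.norm.cdf(z)                                    # uniform scores in [0, 1]
    return np.column_stack([m.ppf(u[:, j]) for j, m in enumerate(marginals)])

# Example: two correlated inputs with normal and lognormal marginals (assumed).
marginals = [stats.norm(loc=2.05e11, scale=0.1 * 2.05e11), stats.lognorm(s=0.2)]
X = gaussian_copula_samples(rho=0.8, marginals=marginals, n=1000)
print(np.corrcoef(X, rowvar=False).round(2))
```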
4.2 Step 2: Data reduction
During Step 2, the user decides whether to implement FS or FE to address the uncertainties present in the input data. For those without prior exposure to the problem or familiarity with the data structure, the use of a criterion becomes crucial in guiding the decision-making process between FS and FE. Consequently, this research utilizes an entropy-based correlation coefficient (e) as the criterion. Through the assessment of the e value, which indicates the degree of correlation within the data, users can thoughtfully select the most suitable data reduction tool.
4.2.1 Entropy and Mutual Information
Numerous studies [25–29] in the field of data mining frequently incorporate insights and terminology derived from entropy, rooted in information theory. Entropy functions as a metric for gauging uncertainty or information content within a dataset. In the domain of data mining, entropy manifests several crucial facets:
1) Information Quantity: The presence of information is greater in instances of rare events compared to frequently occurring events.
2) Uncertainty in a Random Variable or Vector: Events that are common or certain contribute to a reduction in uncertainty, whereas uncertain events impede the predictability of responses. Consequently, events characterized by uncertainty exhibit higher levels of entropy.
3) Dispersion in Probability Distribution: A diminished dispersion signifies a reduced amount of entropy.
Thus, entropy captures uncertainty, randomness, or redundancy intrinsic to a random variable. Furthermore, entropy can overcome certain limitations of traditional linear correlation estimation methods [30], particularly their vulnerability to non-linear correlation or non-Gaussian distributions. Entropy can be expressed by:
Here, p(xi) represents the marginal probability of each occurring sample of the random variable x.
Based on Hx, mutual information quantifies the amount of information shared between random variables xi and yj, thereby serving as a potent instrument for evaluating the significance of features and discerning between those possessing rich information and those lacking such attributes [ 31].
Here, p(xi, yj) denotes the joint probability distribution of xi and yj. In the context of FE, mutual information can pinpoint features that contribute the highest amount of information to the target variable. Features exhibiting substantial mutual information can be regarded as indispensable for the analysis [29]. It is also effective in recognizing redundant features by quantifying the level of dependence between variables. Features exhibiting low mutual information with the target variable or other features may be deemed redundant and can be omitted during the process of FS [32]. Therefore, the entropy-based approach can establish a solid theoretical basis for differentiating between FE and FS.
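The histogram-based sketch below estimates the entropy and mutual information of sampled variables. The bin count and the synthetic data are illustrative choices, and histogram estimators are only approximations of the underlying quantities.

```python
import numpy as np

def entropy(x, bins=20):
    """Shannon entropy H = -sum p log2 p estimated from a histogram of x."""
    p, _ = np.histogram(x, bins=bins)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins=20):
    """Mutual information estimated from the joint and marginal histograms."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz]))

rng = np.random.default_rng(5)
x = rng.normal(size=5000)
y = 0.9 * x + 0.1 * rng.normal(size=5000)   # strongly dependent on x
z = rng.normal(size=5000)                   # independent of x
print(mutual_information(x, y), mutual_information(x, z))
```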
4.2.2 Entropy-based Correlation Coefficient
Given that HXY is not confined to a scaled range, the scaled HXY, denoted as “e” and ranging from 0 to 1, directly provides information about the extent of data correlation. Based on Eqs. (10) and (11), e can be calculated as:
The computed e value falls within the scaled range of [0, 1]. When e is 0, X and Y are uncorrelated, while e = 1 signifies a complete correlation. As a criterion for determining whether to employ FE or FS for data reduction, the e value is used as follows: if e falls between 0 and 0.5, it suggests that the features are uncorrelated, and consequently, FS is employed to reduce feature size and redundancy, as illustrated in Fig. 1. Conversely, if e falls between 0.5 and 1, FE is chosen due to the high correlation among features. Based on this decision, either FS or FE processes are carried out in Step 2. In the case of FS, we consider IFT; alternatively, for FE, both PCA and AE are taken into account. The advantages of PCA and AE will be elaborated upon in the examples in Section 5. To ascertain the effectiveness of data reduction in simplifying input features, the concept of redundancy ( r) can be employed in Step 2 [ 27]. The redundancy ( r) concerning each random variable x is defined as:
The term “log2 N” refers to the maximum entropy, with “N” representing the total number of samples. Specifically, we will compare the redundancy values between the raw data and the reduced data to assess the effectiveness of the data reduction process. Based on this redundancy comparison, if the redundancy value of the raw data remains higher than that of the reduced data, we should repeat the data reduction process to minimize redundancy further. Therefore, with this proposed criterion, data reduction can be carried out effectively by combining “e” with redundancy, leading to reduced computational costs for modeling and predicting system responses.
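Because Eqs. (10)–(13) are not reproduced here, the sketch below assumes a common normalization in which e is the mutual information scaled by the marginal entropies and the redundancy is r = 1 − H(x)/log2 N; the authors' exact expressions may differ in detail.

```python
import numpy as np

def hist_entropy(p):
    """Shannon entropy (bits) of a normalized histogram p."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def e_coefficient(x, y, bins=20):
    """Entropy-based correlation coefficient, assumed here to scale the mutual
    information by the marginal entropies so that 0 <= e <= 1."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    hx, hy = hist_entropy(pxy.sum(axis=1)), hist_entropy(pxy.sum(axis=0))
    mi = hx + hy - hist_entropy(pxy.ravel())
    return 2.0 * mi / (hx + hy)

def redundancy(x, bins=20):
    """Redundancy assumed as r = 1 - H(x) / log2(N), where log2(N) is the
    maximum entropy attainable with N samples."""
    p, _ = np.histogram(x, bins=bins)
    return 1.0 - hist_entropy(p / p.sum()) / np.log2(len(x))

rng = np.random.default_rng(6)
x = rng.normal(size=1000)
y = 0.95 * x + 0.05 * rng.normal(size=1000)
e = e_coefficient(x, y)
print("e =", round(e, 3), "->", "feature extraction (e > 0.5)" if e > 0.5 else "feature selection")
print("redundancy of x:", round(redundancy(x), 3))
```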
4.3 Step 3: Modeling Multivariate System Behavior
Following the data reduction process in Step 2, the next and final phase involves modeling the multidisciplinary engineering system. Within our proposed framework, we consider using both ANN and PNN. However, it’s important to note that users can employ their preferred surrogate modeling approach during this step.
As highlighted in Section 3, the rapid training capability of PNN makes it a favorable choice compared to traditional ANN methods. Nevertheless, PNN is typically employed for classification tasks, such as determining system reliability, and may necessitate additional adjustments for function approximation. Consequently, we also incorporate ANN as an option for constructing the system’s response surface model.
To specify, when the objective is response prediction, we utilize ANN as the surrogate modeling technique. On the other hand, if the task involves reliability estimation with classification, PNN is the chosen method.
5 Validation Examples
In this section, we illustrate the effectiveness of our proposed method through two distinct examples. The first example demonstrates the application of the proposed approach to a 3-D cantilever beam, subject to the influence of multiple mechanical properties, notably Young’s moduli, which constitute a random field. This example validates our approach’s accuracy in predicting the system’s behavior with precision. The second example involves a stretchable patch antenna, which presents a prototypical scenario of a multidisciplinary system. The flexible antenna is a commonly used sensor worn on the body to observe human activities by assessing changes in frequency [33]. This illustration underscores the method’s ability to realistically design antenna substrates and accurately predict frequency variations across differing thicknesses and displacements.
5.1 Cantilever Beam Example
In Fig. 7, a point load of F = 1,000 N is applied orthogonally to the end node of a beam. This particular beam exhibits dimensions of L = 4 m in length, H = 0.1 m in height, and W = 0.1 m in width. The beam is discretized into 30 elements, each of equal length. For each element, the Young’s modulus, E, is treated as a random variable, characterized by a mean value of μ = 2.05e11 Pa and a coefficient of variation (COV) of 0.1. These Young’s moduli are modeled as Gaussian random fields, utilizing the Gaussian covariance model. The correlation length plays a pivotal role, where a higher value indicates a greater correlation within the random field. If the distance between two distinct points exceeds the correlation length, these points are statistically considered nearly uncorrelated. For this example, we will examine two datasets: one with high correlation (l = 10 m) and another with low correlation (l = 0.1 m). In the latter case, discretizing into 30 elements results in an element length of 0.133 m, which surpasses the correlation length. Consequently, we proceed with two data sets, one correlated and the other uncorrelated, following Gaussian distributions.
5.1.1 Generation of Young’s Moduli
We generate 30 random Young’s moduli using the copula function with Gaussian random fields, incorporating both correlated and uncorrelated parameters. Fig. 8 visually represents these moduli through color scales, illustrating their random field realizations generated by the Gaussian copula function. Subsequently, we construct two datasets containing 1,000 samples for the 30 Young’s moduli, distinguished by their correlated and uncorrelated parameters. We evaluate the redundancy of each dataset using Eq. (13) and summarize the results in Table 1. The correlated dataset exhibits a redundancy value of 7.6828, while the uncorrelated dataset registers a redundancy value of 4.9068. These findings confirm that the correlated dataset shows significantly higher redundancy. To assess the degree of correlation and guide data reduction methods, we calculate “ e” values of 0.7621 for the correlated data and 0.2517 for the uncorrelated data. These results suggest a preference for FE methods in addressing redundancy in correlated data, while FS is more appropriate for uncorrelated data ( e < 0.5).
5.1.2 Data Reduction for Young’s Moduli
Employing PCA as the FE method, we reduce dimensionality to 4 for correlated data and 25 for uncorrelated data, preserving information effectively. As indicated in Table 1, PCA significantly reduces redundancy in correlated data compared to uncorrelated data. Redundancy reduction is assessed using Eq. (13), where Vnew and Vorg represent redundancy values of truncated and original data, respectively.
The redundancy reduction effectiveness of PCA is compared to that of AE, whose architecture is initially designed using Eq. (14) to determine the total number of hidden neurons.
In this instance, AE reduces the dimension to 17, yielding 17 new features that capture the significant components of the original input data and reduce its redundancy. For the correlated data, this results in a 70.31% redundancy reduction. Although AE slightly lags behind PCA for correlated data, it maintains an acceptable 3.91% prediction error, computed using Eq. (14).
5.1.3 Deflection Estimation
An ANN model was considered to predict the cantilever beam deflection, as a probabilistic neural network (PNN) cannot perform this regression task. We design an ANN with 21 hidden neurons using Eq. (14). The ANN’s network properties align with those of AE. After obtaining reduced Young’s moduli through PCA and AE, the designed ANN provides a statistical estimate of tip displacement. We compare the probability density functions (PDFs) of tip displacement calculated from reduced data via PCA and AE with the PDF derived from the original data (Fig. 9). These results demonstrate a close match between the PDF predictions by AE and PCA and the original displacement PDF. Further comparisons in Table 2 reveal that AE exhibits slightly lower accuracy for correlated data than PCA. Nevertheless, AE maintains an acceptable 3.91% prediction error.
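A hedged sketch of the Step 2–Step 3 pipeline for this example is given below: PCA reduces the 30 Young's moduli and an ANN with 21 hidden neurons regresses the tip displacement. The synthetic moduli and the placeholder response stand in for the finite element data, and the 90% variance threshold is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed inputs: E_samples (1000 x 30 correlated Young's moduli) and tip_disp
# (1000 tip displacements from the finite element model). Synthetic
# placeholders are generated here only so the sketch runs end to end.
rng = np.random.default_rng(7)
E_samples = rng.multivariate_normal(np.full(30, 2.05e11),
                                    (0.1 * 2.05e11) ** 2 * (0.8 + 0.2 * np.eye(30)),
                                    size=1000)
tip_disp = 1e9 / E_samples.mean(axis=1) + rng.normal(0, 1e-4, 1000)  # placeholder response

# Step 2: PCA keeps enough components for 90% of the variance (threshold assumed).
# Step 3: an ANN with 21 hidden neurons (the count used in the paper) does regression.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.90),
                      MLPRegressor(hidden_layer_sizes=(21,), max_iter=5000, random_state=0))

X_tr, X_te, y_tr, y_te = train_test_split(E_samples, tip_disp, test_size=0.2, random_state=0)
model.fit(X_tr, y_tr)
print("retained components:", model.named_steps["pca"].n_components_)
print("R^2 on held-out samples:", round(model.score(X_te, y_te), 3))
```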
5.1.4 Probabilistic Neural Network for Classification
In the context of uncorrelated data, our objective is to assess the cantilever beam’s reliability concerning structural failure. We propose a classification process employing a PNN instead of an ANN due to the simplified training process for reliability estimation. In this classification process, we establish a limit state function, g = R-S, where R represents resistance (0.034 m) and S signifies loading. Two classes, “class A” and “class B”, are generated: “class A” corresponds to data points deemed safe (when g is greater than zero), while “class B” designates data points in the failure region (when g is less than or equal to zero). We conducted a reliability analysis using a simulation model and summarized the results in Table 3. The probability of failure ( Pf) is estimated using the limit state function, and the second column in Table 3 presents Pf values derived from PNN with IFT.
To assess accuracy, we employ the Monte Carlo simulation (MCS) method to estimate Pf as the ground truth. Comparing Pf values estimated by the PNN trained with 1,000 samples to those obtained through the MCS method with 10,000 samples reveals a slight error of 4.75% when employing IFT with PNN compared to using PNN with the original data. A similar error of 3.84% emerges when comparing Pf values estimated through the MCS method using IFT and original data. These findings indicate that FS by IFT marginally reduces accuracy, by less than 4%. Moreover, for both scenarios, employing the original input data and the data reduced by IFT, the discrepancy between Pf results calculated through the MCS and PNN methods remains below approximately 7%, affirming the ability of PNN to accurately predict Pf in the cantilever beam reliability analysis, even with FS through IFT.
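The sketch below shows the limit-state bookkeeping described above: samples are labeled safe or failed from g = R − S with R = 0.034 m, and Pf is estimated as the failure fraction over the Monte Carlo samples. The displacement samples are placeholders for the simulation or surrogate outputs.

```python
import numpy as np

# Assumed inputs: tip_disp holds simulated (or surrogate-predicted) tip
# displacements S; the resistance R = 0.034 m is taken from the text.
rng = np.random.default_rng(8)
tip_disp = rng.normal(0.0305, 0.003, size=10_000)   # placeholder loading samples

R = 0.034                        # resistance (m)
g = R - tip_disp                 # limit state: safe if g > 0, failure if g <= 0
labels = np.where(g > 0, "class A (safe)", "class B (failure)")

pf = np.mean(g <= 0)             # Monte Carlo estimate of the failure probability
print("Pf =", round(pf, 4))
print("class counts:", dict(zip(*np.unique(labels, return_counts=True))))
```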
5.2 Stretchable Antenna Example
A stretchable patch antenna, depicted in Fig. 10, comprises essential components: substrate, patch, feed line, ground, and source. The fabrication of a substrate with uniform thickness for stretchable antennas is atypical due to current manual engineering practices [ 34]. Consequently, modeling the substrate necessitates accounting for its variable thickness. Tabulated in Table 4 are details concerning substrate geometry, thickness variations, and properties. The primary objective of analyzing this stretchable patch antenna with variable thickness is to validate its ability to maintain a dependable frequency range during contraction and relaxation. Two key criteria govern an acceptable frequency range.
Firstly, the mechanical behavior analysis involves subjecting the antenna to deformation, evaluated through a tensile test in which both ends of the antenna are subjected to tension. The assumption of antenna symmetry, where both ends deform symmetrically from the center, reduces the finite element model size and subsequently the analysis time and cost. Deformation results are represented as X and Y coordinates. The stretchable antenna is segmented into 32 sections to reflect thickness variations (Fig. 11(a)), comprising 50 coordinates (Fig. 11(b)). Upon obtaining the deformed antenna coordinates from ANSYS deformation analysis, the resonance frequency is calculated using HFSS software based on the deformed mode’s coordinates. In pursuit of stable performance, the stretchable antenna’s frequency should remain within a reliable range (3 dB frequency) under deformation, leading to the rejection of antenna designs that fall outside this suitable range. Based on Table 4, HFSS calculations of the resonance frequency change were conducted for antenna deformations of 1, 3.2, and 12 mm (Fig. 12). As displacement increases, antenna efficiency diminishes due to declining absolute values of the reflection coefficient S11, and the resonance frequency does not stay well-preserved near 2.5 GHz.
5.2.1 Generation of Varying Thickness
Consideration is given to the substrate’s thickness as a random field generated using a Gaussian copula function. The assumed correlation length is 20 mm, facilitating the generation of fairly correlated thickness parameters for the patch antenna substrate. A total of 32 different substrate thicknesses are generated, while the patch is modeled with a constant thickness of 0.03 mm. The redundancy of the initial thickness data is quantified at 3.432 using Eq. (12), as detailed in Table 5. Subsequently, 121 distinct displacement values, ranging from 0 to 12 mm, are applied to the antenna’s ends. The X and Y coordinates of each point, as depicted in Fig. 11(b), are independently estimated from deformation analysis in ANSYS. These data serve as training points for regression using an Artificial Neural Network (ANN) to calculate the coordinates when applying displacement to an antenna with variable thickness. Consequently, thicknesses are treated as input parameters, while coordinates are treated as outputs. Prior to ANN training, input parameter data reduction is discussed in Section 5.2.2.
5.2.2 Data Reduction of Varying Thickness
The determination of an appropriate data reduction method hinges on calculating the “ e” value using Eq. (11). An “ e” value of 0.6300 is computed for the original thickness dataset, indicating the necessity of employing FE methods due to its exceeding the 0.5 threshold. Tabulated in Table 6 are redundancy results computed for AE and PCA methods. The comparison between these results and the original data redundancy values confirms the effective reduction of irrelevance between the original and truncated datasets. To retain 90% of the thickness information, PCA selects 14 eigenvalues, while AE employs 20 hidden neurons. AE outperforms PCA by yielding a smaller reconstruction error, attributed to its increased number of hidden neurons.
5.2.3 Artificial Neural Network for Predicting Antenna Deformation
In the subsequent step, an ANN is harnessed to predict antenna deformation based on the varying thicknesses as input data and the coordinates as model responses. Table 5 presents the prediction errors: the ANN trained on the original thicknesses yields a 6.28% error for the X coordinates, whereas the thicknesses reduced by PCA yield an error of 3.27% and those reduced by AE a 5.41% error. For the Y coordinates, the ANN’s inherent error of 7.43% is considered acceptable, akin to the PCA and AE prediction errors. These results underscore the effective data reduction achieved with both PCA and AE, as evidenced by error values comparable to those of the original data. Furthermore, the data reduction techniques lead to error reduction, signifying that reduced data redundancy indeed enhances ANN prediction models. Fig. 14 compares the deformed antenna models predicted by ANN with data reduction techniques against those obtained from actual simulations for 1 and 12 mm deformation values, highlighting a strong agreement.
5.2.4 Data Reduction of Each Coordinate for Resonance Frequency Prediction
Upon obtaining coordinates for each antenna point from ANN models, all 50 coordinates are employed as input data in HFSS to generate the deformed antenna model. Consequently, 121 distinct resonance frequencies are computed as output data for varying displacement values. Given a dataset of coordinates, the FS method is chosen for data reduction because it can identify the most informative variables. This process involves estimating significant coordinates to aid antenna redesign. The selected coordinates enable the assessment of whether the frequency can be classified within a reliable bandwidth. IFT generates a new subset by selecting coordinates exceeding the “Significance value” of 3, ensuring high accuracy and selecting the five most significant coordinates. Following the IFT application, a reduced dataset emerges with a reduced redundancy of 10.8147, compared to the original dataset's redundancy of 12.5689, signifying a 13.9567% reduction in uncertainty, estimated using Eq. (13). Fig. 15 visually presents the significant coordinates selected by IFT.
5.2.5 PNN for Classification of Antenna Frequency
In a subsequent phase, antenna reliability is assessed with respect to a 3 dB frequency range within which the resonance frequency should remain during deformation. This range is based on the non-deformed antenna, which has a resonance frequency of 2.5 GHz, with a reliable range spanning 2.4849 GHz (m1) to 2.5151 GHz (m2) as the 3 dB frequency range. The limit state function is formulated to facilitate classification, with the 2.5 GHz resonance frequency serving as the resistance or capacity; a resonance frequency within the 3 dB frequency range is essential. If g exceeds 0.0302 (the difference between m1 and m2), the stretchable patch antenna system is categorized as class B and rejected due to an unstable resonance frequency. Monte Carlo simulation (MCS) with 10,000 samples is conducted to calculate the probability of failure. Results from MCS are compared with the Probabilistic Neural Network (PNN) model, which employs 121 training data samples to train the PNN and 10,000 samples for accurate, computationally efficient failure probability prediction. Tabulated in Table 6 is a comparison between Pf values derived from PNN classification and those obtained from MCS, revealing a difference of 8.05%, falling below the 10% error threshold essential for accurate classification. The reduced coordinates yield a Pf value of 0.3016, signifying a 7.37% Pf increase compared to the original coordinates. Furthermore, the Pf value calculated with the reduced coordinates closely aligns with that computed via MCS (6.98%), indicating accurate FS and classification.
5.2.6 Validation of Efficacy of Proposed Framework
The stretchable antenna remains acceptable until it experiences a 3.1 mm displacement while maintaining a resonance frequency of 2.5 GHz. Beyond this point, frequency variation becomes evident, with a maximum displacement of 12 mm, resulting in a resonance frequency of 3.2 GHz. Consequently, limiting the maximum allowable displacement to 3.1 mm is recommended. The final step involves comparing resonance frequencies between original coordinates and new coordinates derived from FE and FS. Fig. 16 demonstrates that FE yields higher frequency accuracy than FS due to greater absolute values of each S11 calculated through FE methods than those obtained through FS. This discrepancy arises because FS eliminates uninformative coordinates, whereas FE retains principal components and reconstructs coordinates to match the original dataset’s dimensions. Despite reduced antenna efficiency under displacement due to resonance frequency variations, the proposed method accurately predicts resonance frequencies and enhances S11 values.
6 Conclusion
This research developed an efficient framework for reducing input parameter dimensions and accurately predicting responses. In multidisciplinary engineering systems, input parameters are often correlated. The comprehension of multi-physics engineering system behavior relies on a substantial dataset. Therefore, it is crucial to extract the most refined and informative data. By eliminating uncertainty and redundancy from the data, the prediction or classification model’s accuracy can be assured. Consequently, the proposed framework provides effective guidance for users lacking expertise in data analysis, a prerequisite for analyzing multi-physics engineering systems.
In this research, the copula function was employed to demonstrate these interdependencies, resulting in more realistic modeling and precise response predictions.
The framework introduced data reduction techniques, PCA and AE for FE and IFT for FS, enhancing performance prediction accuracy. An entropy-based correlation coefficient (“e”) was used to decide between FE and FS based on input parameter correlations.
After data reduction, ANN and PNN were used to estimate responses and enhance computational efficiency during simulations. The framework’s efficacy was demonstrated through two engineering examples. PCA and AE effectively reduced data complexity without significant information loss in cases with high correlation, as indicated by the “e” criterion. Redundancy reduction was confirmed across all datasets using FS or FE guided by the “e” criterion.
Prediction errors indicated that reduced data with low redundancy yielded reliable results. PNN and MCS showed that the framework achieved accurate classifications. Notably, the stretchable antenna example revealed that increased dimensionality in predicting resonance frequency resulted from electrical and mechanical properties. The framework effectively reduced data in mechanical and electrical analyses of multidisciplinary problems.
These findings underscore the framework’s importance in addressing multidisciplinary engineering challenges, enabling efficient modeling in critical engineering applications involving multi-physics and uncertainty, such as stretchable electronics.
Fig. 1
Data redundancy in high-dimensional datasets
Fig. 2
Fig. 3
Representation of auto-encoder
Fig. 4
Fig. 5
Architecture of probabilistic neural network [ 24] (Adapted from Ref. 24 on the basis of OA)
Fig. 6
Fig. 7
2-D View of cantilever beam: cantilever beam on X-Y axes and on Y-Z axes
Fig. 8
Random field realizations of cantilever beam elastic moduli: (a) uncorrelated (b) correlated
Fig. 9
Tip displacement of: (a) correlated Young’s moduli; (b) uncorrelated Young’s moduli
Fig. 10
Frequency analysis of stretchable patch antenna
Fig. 11
Schematic of stretchable antenna. (a) 31 parts with different thicknesses and 1 part with constant thickness and (b) 50 coordinates
Fig. 12
Resonance frequency under 1, 3.2, and 12 mm deformation
Fig. 13
Gaussian copula function demonstrating a joint distribution of sets of varying thickness
Fig. 14
Shape of deformed antenna
Fig. 15
Selected Significant Coordinates
Fig. 16
Validation of proposed method with resonance frequency for 1 mm and 12 mm displacements
Table 1
Redundancy estimation of Young’s moduli (E)
| | No data reduction | E after dimension reduction | Redundancy reduction |
| Correlated E | 7.6828 | 1.0403 (by PCA) / 2.2806 (by AE) | 86.52% / 70.31% |
| Uncorrelated E | 4.9068 | 3.0715 (by IFT) | 37.40% |
Table 2
Prediction error of displacement of Young’s moduli (E)
| By Eq. | E after ANN (prediction error) | E after PCA/ANN (prediction error) | E after AE/ANN (prediction error) |
| E | 0.0128 (5.47%) | 0.0135 (3.12%) | 0.0132 (3.91%) |
Table 3
Probability of failure by PNN and MCS
| | Pf of original data | Pf of new data after IFT | Pf difference of original and new data |
| PNN with 1,000 samples | 0.1095 | 0.1147 | 4.75% |
| MCS with 10,000 samples | 0.1173 | 0.1218 | 3.84% |
| Pf difference of PNN and MCS | 7.12% | 6.19% | |
Table 4
Properties of stretchable patch antenna
| | Substrate (μ, COV) | Patch | Feed line | Ground | Source (μ, COV) |
| Young’s moduli (MPa) | 1.32 | 0.012 | 0.012 | 1.32 | |
| Thickness (mm) | 0.09, 0.1 | 0.03 | 0.03 | 0.05 | |
| Width (mm) | 70 | 35 | 32 | 70 | 0.09, 0.1 |
| Length (mm) | 80 | 43 | 2.5 | 80 | 2.5 |
| Permittivity | 3 | 1 | 1 | 1 | 1 |
| Conductivity (S/cm) | 0 | 1.5e4 | 1.5e4 | 1.5e4 | 0 |
| Dielectric loss tangent | 0.01 | 0.01 | 0.01 | 0.01 | 0 |
| Magnetic loss tangent (kg/m^3) | 0.0001 | 0.001 | 0.001 | 0.001 | 0 |
Table 5
Redundancy estimation and prediction error of coordinates of thickness (t)
| | Redundancy estimation | | | Coordinates prediction error | | |
| | No dimension reduction | t after PCA (redundancy reduction) | t after AE (redundancy reduction) | t after ANN (X/Y coordinates) | t after PCA/ANN (X/Y coordinates) | t after AE/ANN (X/Y coordinates) |
| Original t | 3.432 | 1.708 (50%) | 1.926 (43.91%) | 6.28% / 7.43% | 3.27% / 5.32% | 5.41% / 7.17% |
Table 6
Probability of failure by PNN and MCS
| | Pf of original data | Pf of new data after IFT | Pf difference of original and new data |
| PNN with 121 samples | 0.2809 | 0.3016 | 7.37% |
| MCS with 10,000 samples | 0.3035 | 0.3247 | 6.98% |
| Pf difference of PNN and MCS | 8.05% | 7.66% | |
References
1. Zebari, R., Abdulazeez, A., Zeebaree, D., Zebari, D. & Saeed, J. (2020). A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. Journal of Applied Science and Technology Trends, 1(2), 56–70.
2. Reddy, G. T., Reddy, M. P. K., Lakshmanna, K., Kaluri, R., Rajput, D. S., Srivastava, G. & Baker, T. (2020). Analysis of dimensionality reduction techniques on big data. IEEE Access, 8, 54776–54788.
3. Nixon, M. & Aguado, A. (2019). Feature extraction and image processing for computer vision, Academic Press.
4. Gheisari, M., Hamidpour, H., Liu, Y., Saedi, P., Raza, A., Jalili, A., Rokhsati, H. & Amin, R. (2023). Data mining techniques for web mining: A survey. Artificial Intelligence and Applications (pp. 3–10).
5. Zhou, H., Wang, X. & Zhu, R. (2022). Feature selection based on mutual information with correlation coefficient. Applied Intelligence, 52, 5457–5474.
6. Zhang, L. & Singh, V. P. (2019). Copulas and their applications in water resources engineering, Cambridge University Press.
7. Durante, F. & Sempi, C. (2015). Principles of copula theory, CRC Press.
8. Villarrubia, G., De Paz, J. F., Chamoso, P. & De La Prieta, F. (2018). Artificial neural networks used in optimization problems. Neurocomputing, 272, 10–16.
9. Abiodun, O. I., Jantan, A., Omolara, A. E., Dada, K. V., Mohamed, N. A. & Arshad, H. (2018). State-of-the-art in artificial neural network applications: A survey. Heliyon, 4(11), e00938.
10. Mohebali, B., Tahmassebi, A., Meyer-Baese, A. & Gandomi, A. H. (2020). Probabilistic neural networks: A brief overview of theory, implementation, and application. Handbook of Probabilistic Models (pp. 347–367).
11. Hofert, M., Kojadinovic, I., Mächler, M. & Yan, J. (2018). Elements of copula modeling with R, Springer.
12. Chicco, D., Warrens, M. J. & Jurman, G. (2021). The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Computer Science, 7, e623.
13. Vahdat, A., & Kautz, J. (2020). NVAE: A deep hierarchical variational autoencoder. In: Proceedings of the 34th International Conference on Neural Information Processing Systems; pp 19667–19679.
14. Han, J., Pei, J. & Tong, H. (2022). Data mining: concepts and techniques, Morgan Kaufmann.
15. Kim, J. W., Nam, J., Kim, G. Y. & Lee, S. W. (2023). Artificial intelligence (AI)–based surface quality prediction model for carbon fiber reinforced plastics (CFRP) milling process. International Journal of Precision Engineering and Manufacturing-Smart Technology, 1(1), 35–47.
16. Witten, I. H., Frank, E., Hall, M. A. & Pal, C. J. (2017). Data mining: practical machine learning tools and techniques, 4th Edition. Morgan Kaufmann.
17. Lim, T.-S., Loh, W.-Y. & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.
18. Murthy, S. K., (1998). Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2, 345–389.
19. Kim, I.-S., Lee, M.-G. & Jeon, Y. (2023). Review on machine learning based welding quality improvement. International Journal of Precision Engineering and Manufacturing-Smart Technology, 1(2), 219–226.
20. Varuna Shree, N., & Kumar, T. N. R. (2018). Identification and classification of brain tumor MRI images with feature extraction using DWT and probabilistic neural network. Brain Informatics, 5(1), 23–30.
21. Maulik, R., Fukami, K., Ramachandra, N., Fukagata, K. & Taira, K. (2020). Probabilistic neural networks for fluid flow surrogate modeling and data recovery. Physical Review Fluids, 5(10), 104401.
22. Khokhar, S., Zin, A. A. M., Memon, A. P. & Mokhtar, A. S. (2017). A new optimal feature selection algorithm for classification of power quality disturbances using discrete wavelet transform and probabilistic neural network. Measurement, 95, 246–259.
23. Bouwmans, T., Javed, S., Sultana, M. & Jung, S. K. (2019). Deep neural network concepts for background subtraction: A systematic review and comparative evaluation. Neural Networks, 117, 8–66.
24. Patel, J. (2012). Enhanced classification approach with semi-supervised learning for reliability-based system design, Georgia Institute of Technology.
25. Zhang, A., Lipton, Z. C., Li, M. & Smola, A. J. (2023). Dive into deep learning, Cambridge University Press.
26. Lesne, A. (2014). Shannon entropy: A rigorous notion at the crossroads between probability, information theory, dynamical systems and statistical physics. Mathematical Structures in Computer Science, 24(3), e240311.
27. Crooks, G. E., (2015). On measures of entropy and information. Tech Note, 9(4), 1–20.
28. Gray, R. M. (2011). Entropy and information theory, Springer Science & Business Media.
29. Belghazi, M. I., Baratin, A., Rajeshwar, S., Ozair, S., Bengio, Y., Courville, A. & Hjelm, D. (2018). Mutual information neural estimation. In: Proceedings of the 35th International Conference on Machine Learning; pp 531–540.
30. Yu, L., & Liu, H. (2003). Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03); pp 856–863.
31. Paninski, L., (2003). Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253.
32. Brown, G., Pocock, A. C., Zhao, M.-J. & Luján, M. (2012). Conditional likelihood maximisation: A unifying framework for information theoretic feature selection. Journal of Machine Learning Research, 13(2), 27–66.
33. Selvaraj, V., & Min, S. (2023). AI-assisted monitoring of human-centered assembly: A comprehensive review. International Journal of Precision Engineering and Manufacturing-Smart Technology, 1(2), 201–218.
34. Hwang, S., & Choi, S.-K. (2021). Deep learning-based surrogate modeling via physics-informed artificial image (PiAI) for strongly coupled multidisciplinary engineering systems. Knowledge-Based Systems, 232, 107446.
Biography
Sungkun Hwang earned his Ph.D. in mechanical engineering from the Georgia Institute of Technology in Atlanta, USA. His research includes reliability and probability-based mechanical design, multidisciplinary design optimization, and data analysis under uncertainties. Currently, he is employed in mechanical research and development at Samsung Electronics.
Biography
Seung-Kyum Choi received the Ph.D. degree in Mechanical and Materials Engineering from Wright State University. He is an Associate Professor in School of Mechanical Engineering at Georgia Institute of Technology. He is an author of a graduate-level book on the topics of probabilistic mechanics and reliability-based design optimization (Reliability-based Structural Design, Springer, 2007). He served as a chair and session organizer at national conferences of AIAA, SDM, MDO, NDA and ASME/IDETC, in addition to being a chair of the ASME Advanced Modeling & Simulation Technical Committee. He is currently an associate editor of ASME Journal of Computing and Information Science in Engineering, and Journal of Computational Design and Engineering. Dr. Choi's research interests include structural reliability, probabilistic mechanics, statistical approaches to design of structural systems, multidisciplinary design optimization, and AI-enabled modeling & simulation for complex engineered systems. Dr. Choi is currently appointed the Director of Center for Additive Manufacturing Systems (CAMS), where he has responsibilities for developing research and education programs in additive manufacturing.
Biography
Eun-Ho Lee is a faculty member of the School of Mechanical Engineering at Sungkyunkwan University in South Korea. He received his Ph.D. degree from the Mechanical Engineering Department of KAIST in 2015 and worked at General Motors R&D (Warren, MI) and Samsung Electronics (Suwon, Korea). His research fields include intelligent manufacturing, semiconductor/packaging manufacturing, and intelligent monitoring.