The Influence of Heatmap Correlation-based Feature Selection on Predictive Modeling of Low Alloy Steel Mechanical Properties Using Artificial Neural Network (ANN) Algorithm

Abstract

This study evaluates the influence of heatmap correlation-based feature selection on predictive modeling of low alloy steel mechanical properties using an artificial neural network (ANN) algorithm. Heatmap correlation was used to identify the input variables most correlated with the mechanical properties of low alloy steel, namely Yield strength (YS) and Tensile strength (TS). The dataset had 15 input variables (chemical elements and heat treatment temperature); after feature selection, 11 input variables were retained for YS and 13 for TS. The ANN model was validated using 10-fold cross-validation and evaluated using the loss, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). Feature selection improved the YS prediction, reducing MAE by 6.83% and RMSE by 4.97%, and improved the TS prediction, reducing MAE by 16.46% and RMSE by 18.34%. These results indicate that feature selection provides better performance than the model without feature selection, and that heatmap correlation can be used as an alternative way to improve model performance in predictive modeling of low alloy steel mechanical properties using the ANN algorithm.

Introduction
Low alloy steel is a type of steel that contains certain alloying chemical elements in small amounts, typically less than 10%. These alloying elements can enhance the properties of low alloy steel, such as strength, hardness, ductility, wear resistance, and corrosion resistance (Leni D. et al., 2023). Low alloy steel is commonly used in the manufacturing of machine structures, vehicles, and building constructions that require high strength and toughness (Miletić, I. et al. 2020). The increasing and diverse use of low-alloy steel in engineering worldwide drives the need for the industry to develop steel alloys that meet this demand, which requires identifying the characteristics of low alloy steel and preventing material failure. According to Morini et al. (2019), understanding the mechanical properties of a material matters not only for preventing premature failure of machine components and for industrial safety but also for user safety. Chemical composition is one of the factors that affect the mechanical properties of low-alloy steel. The chemical elements present in low-alloy steel affect its microstructure and thus its mechanical properties (Goritskii et al., 2016). In addition, heat treatment changes the microstructure of low alloy steel, for example through phase formation and changes in grain size, and these microstructural changes in turn affect its mechanical properties (Aziz et al., 2016). Therefore, knowledge of the chemical composition and mechanical properties of materials is essential for researchers developing materials that meet application needs. Experimental testing of chemical composition and mechanical properties, however, has drawbacks: it is costly, time-consuming, and requires sufficient expertise. This can slow down the analysis and development of low-alloy steel materials.
Machine learning (ML) is a field of study that explores how computers can learn from data and build statistical models to perform tasks without explicit instructions (Wei et al., 2019). These models are constructed by identifying patterns and making inferences from data, enabling predictions on new data. This technology has been widely applied in various fields, including the materials industry, for tasks such as material classification and prediction of mechanical properties (Narayana et al., 2020). The Artificial Neural Network (ANN) is a machine learning algorithm inspired by the workings of the human brain; it learns patterns from training data and uses them to predict outcomes for new data. Machine learning methods have been widely studied in materials science. For example, Reddy et al. (2009) predicted the mechanical properties of low alloy steel using an ANN and obtained accurate models that agreed with experimental testing. A comparison of machine learning algorithms by Leni et al. (2022) found that ANN performed better in predicting the mechanical properties of low alloy steel based on chemical elements and heat treatment. However, using ANN to predict the mechanical properties of low alloy steel raises the issue of feature selection, that is, determining which variables are most influential in predicting the outcome. Some variables may be irrelevant or less significant in predicting the mechanical properties of low alloy steel, slowing down training and adding to the model's complexity, which can reduce model accuracy (Xiong et al., 2020; Zhu et al., 2020).
Feature selection is the process of selecting a relevant subset of features from a large dataset for building machine learning models. Its main goal is to reduce the dimensionality of the dataset while retaining the features most relevant to building accurate prediction models (Cai J. et al., 2018; Dhal P. et al., 2022). Heatmap correlation is a commonly used feature selection method that visualizes the correlation between each pair of variables in the dataset as color-coded cells in a heatmap. The closer the correlation value of an input variable is to 1 or -1, the stronger its linear relationship with, and presumed influence on, the target variable to be predicted (Jović A. et al., 2015). Using heatmap correlation to select the most influential and relevant input variables for the mechanical properties of low-alloy steel can result in more effective and accurate modeling (Chen Y. et al., 2023; Choudhury, A., 2023).
Based on the issues outlined above, this study aims to determine the influence of feature selection based on heatmap correlation on the prediction modeling of the mechanical properties of low-alloy steel using the Artificial Neural Network (ANN) algorithm. The study compares input variables with feature selection against input variables without feature selection, where the input variables are the chemical elements of low-alloy steel. The results can provide useful information for improving the effectiveness and accuracy of prediction modeling of the mechanical properties of low-alloy steel using feature selection based on heatmap correlation.

Research Methods
This research is an experimental study using a quantitative approach. Experimental research is conducted by controlling the research variables and varying one or more independent variables to observe their effects on the dependent variable (Holman et al., 2021). This study compares ANN models for predicting the mechanical properties of low-alloy steel with and without feature selection on the input variables, where the input variables in this study are chemical elements. This research consists of several stages, as shown in Figure 1.

Leni et al., 2023/ J. Energy Mater. Instrum. Technol. Vol. 4 No. 4, 2023

2. Feature selection. A correlation analysis was performed between the input variables (chemical elements and heat treatment) and the mechanical properties of low-alloy steel using a correlation heatmap. The purpose of this analysis is to determine which input variables are most relevant and significantly correlated with the mechanical properties of low-alloy steel, namely YS and TS. Input variables are selected based on their correlation heatmap values: variables with sufficiently high correlation are used as input variables, and variables with low correlation are eliminated. The correlation heatmap values can be calculated using Equation 1. Correlation strength is interpreted using standard ranges of absolute value: strong between 0.7 and 1.0, moderate between 0.4 and 0.6, and weak between 0 and 0.3 (Dhal et al., 2022).
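Assuming Equation 1 is the standard Pearson correlation coefficient (the usual quantity behind a correlation heatmap, and consistent with the symbols r, x, y, and n used in the text), it can be written as:

```latex
r = \frac{n \sum xy - \sum x \sum y}
         {\sqrt{\left[ n \sum x^{2} - \left( \sum x \right)^{2} \right]
                \left[ n \sum y^{2} - \left( \sum y \right)^{2} \right]}}
\tag{1}
```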
In Equation 1, r is the correlation coefficient, x and y are the two variables being compared, and n is the number of observations. The correlation coefficient ranges from -1 to 1: a value of -1 indicates a perfect negative correlation between the two variables, 0 indicates no correlation, and 1 indicates a perfect positive correlation [EDA].

After obtaining the best parameters, the model's performance is compared using input variables with and without feature selection. This is done to determine whether feature selection on the input variables contributes significantly to improving the model's performance.

4. Model validation. The model is validated using cross-validation, which divides the data into several parts (folds) and then trains and tests the model once per fold. In each iteration, one fold is used as the testing data, while the remaining folds are used as the training data.
5. Model evaluation. The model is evaluated using evaluation metrics such as Loss, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). The evaluation results of the two models are then compared to see whether feature selection on the input variables improves the performance of the ANN model in predicting the mechanical properties of low-alloy steel.

Dataset
This low-alloy steel dataset has 15 input variables consisting of chemical elements and heat treatment temperature, with two output variables, Yield Strength (YS) and Tensile Strength (TS), with data characteristics shown in Table 2.

Feature selection
The feature selection in this study used heatmap correlation to determine the chemical elements most relevant to the mechanical properties of low alloy steel, namely Yield Strength (YS) and Tensile Strength (TS). The heatmap correlation results show that the correlation values of chemical elements with YS differ from those with TS, as shown in Figure 2. Based on Figure 2, the chemical element with the strongest correlation to YS is V (Vanadium), with a positive correlation value of 0.6, followed by Ni (Nickel) and Mn (Manganese) with correlation values of 0.47 and 0.4, respectively. Meanwhile, the chemical elements with a very weak correlation to YS are P (Phosphorus) and Nb+Ta (Niobium + Tantalum), each with a correlation value of -0.04, followed by S (Sulfur) and N (Nitrogen), each with a correlation value of 0.02. The chemical element with the strongest correlation to TS is V (Vanadium), with a correlation value of 0.3, followed by Mo (Molybdenum) and Ni (Nickel), with correlation values of 0.17 and 0.14, respectively. Temperature has the largest negative correlation with the mechanical properties of low alloy steel, namely -0.43 for YS and -0.33 for TS.
The results of the heatmap correlation show that V (Vanadium) has the strongest positive correlation to YS and TS in low alloy steel. This result is consistent with previous studies (Wang et al., 2020), which also found that V is the most influential element on the mechanical properties of low alloy steel. In addition, studies by Goritskii et al. (2016) and García et al. (2021) showed that higher V concentrations can increase the tensile strength and hardness of low alloy steel. Higher concentrations of Ni (Nickel) and Mn (Manganese) can also increase the tensile strength and hardness of low alloy steel, as both can form solid alloys with the steel matrix. Nickel can increase the tensile strength of low-alloy steel by increasing grain boundary strength and stabilizing the alloy's microstructure (Wang et al., 2020; Far et al., 2019). Manganese, on the other hand, can increase the strength, hardness, and toughness of low-alloy steel by forming carbides and alloys with other elements (Jorge et al., 2021). However, other studies [N] have shown that Mo does not have a significant effect on the mechanical properties of low alloy steel. This may be caused by several factors, such as heat treatment, the dominant composition of other elements in the alloy, and different manufacturing techniques that can affect the mechanical properties of low alloy steel. Therefore, although Mo is known to influence the mechanical properties of low alloy steel, its effect may become less significant if other factors also play a substantial role.

Leni D, Sumiati R, Adriansyah, Angelia N, Nofriyanti E, 2023, The Influence of Heatmap Correlation-based Feature Selection on Predictive Modeling of Low Alloy Steel Mechanical Properties Using Artificial Neural Network (ANN) Algorithm, Journal of Energy, Material, and Instrumentation Technology Vol. 4 No. 4, 2023
Based on the results of the heatmap correlation, input variables whose correlation values, positive or negative, had a magnitude between 0 and 0.04 were removed. These variables were not used as input variables in the ANN modeling to predict the mechanical properties of low-alloy steel. Initially, there were 15 input variables consisting of chemical elements and heat treatment, namely C, Si, Mn, P, S, Ni, Cr, Mo, Cu, V, Al, N, Ceq, Nb + Ta, and Temperature. After feature selection using the heatmap correlation, the input variables for YS decreased to 11, as four chemical elements had a negligible correlation to YS, namely P, S, N, and Nb+Ta. Meanwhile, TS had 13 input variables after feature selection, with P and N being the variables with negligible correlations. Figure 3 illustrates the feature selection of input variables for YS, while Figure 4 illustrates the feature selection for TS.
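The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the toy data, and the use of a single 0.04 cutoff on the absolute correlation are assumptions made for the example. In practice the same correlations would typically be obtained from a pandas `DataFrame.corr()` call and plotted as a heatmap.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def select_features(columns, target, threshold=0.04):
    """Keep the features whose |r| with the target exceeds the threshold."""
    correlations = {name: pearson_r(values, target) for name, values in columns.items()}
    kept = [name for name, r in correlations.items() if abs(r) > threshold]
    return kept, correlations

# Hypothetical toy data, NOT taken from the study's dataset.
toy_columns = {
    "V": [0.0, 0.1, 0.2, 0.3, 0.4],       # rises linearly with the target
    "P": [0.02, 0.01, 0.03, 0.01, 0.02],  # constructed to be uncorrelated
}
toy_ys = [300.0, 350.0, 400.0, 450.0, 500.0]

kept, correlations = select_features(toy_columns, toy_ys)
print(kept)
```

Because the rule is applied to each target separately, running it once against YS and once against TS can yield different feature sets, which matches the 11 and 13 retained variables reported above.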

Model Training
The ANN model for predicting the mechanical properties of low alloy steel was built in the Python programming language using the scikit-learn, Keras, and TensorFlow libraries and run on Google Colaboratory. The best model parameters were found using grid search: layer sizes (number of units) = 32, 16, 8, and 1; optimizer = Adam; learning rate = 0.0001; activation function = ReLU; batch size = 16; number of epochs = determined by early stopping. These parameters were used to train the model with and without feature selection. The dataset was divided into 80% for training and 20% for testing. Early stopping halts training if overfitting occurs or if the validation loss does not improve after several iterations.
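The early stopping rule can be illustrated with a small patience-based sketch. The text does not state how many non-improving iterations were tolerated, so the `patience` value, the function name, and the loss curve below are assumptions for illustration only. In Keras the same behaviour is normally obtained with the `EarlyStopping` callback rather than hand-written logic.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value by more than min_delta for `patience` consecutive epochs.
    If that never happens, the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: reset the patience counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation-loss curve: improves, then plateaus.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.59]
stop_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch)
```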
In this study, the performance of the ANN model in predicting the output values of the given dataset was evaluated in two scenarios: on the dataset without feature selection and on the dataset with feature selection. The loss obtained when training the ANN model with and without feature selection for the YS and TS outputs is compared in Figure 5.

Model Validation
Cross-validation is a commonly used method to measure the performance of a machine learning model in predicting new data that has not been seen before (Choudhury, A. 2022). In this study, the model is validated using k-fold cross-validation with k=10 to evaluate the prediction accuracy of the ANN model. The dataset is divided into ten subsets of approximately equal size, and the model is trained and evaluated ten times. In each iteration, one subset serves as the validation subset, and the model is trained on the other nine subsets. The validation results show that the resulting model provides consistent performance across all k-fold cross-validation iterations, as seen in Figure 7.
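The fold construction can be sketched as follows. This is an illustrative, unshuffled split written in plain Python; the function name is hypothetical, and whether the study shuffled the data before splitting is not stated. With scikit-learn, the `KFold` class would usually be used instead.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of k folds.

    Samples are assigned to folds in order, without shuffling. The first
    n_samples % k folds receive one extra sample, so every sample is used
    for validation exactly once.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

# 916 is the number of data points reported for the dataset.
folds = list(k_fold_indices(916, k=10))
fold_sizes = [len(validation) for _, validation in folds]
print(fold_sizes)
```

Since 916 is not divisible by 10, the folds cannot be exactly equal: six folds hold 92 samples and four hold 91.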

Model Evaluation
The model evaluation aims to determine how well the resulting model performs in predicting new data that the model has not seen before. The evaluation results are measured using the same metrics as in the previous evaluation, namely MAE and RMSE. In this evaluation, the model is tested against unseen data, and the predicted results are compared with the actual values. The smaller the values of MAE and RMSE, the better the model performs in predicting new data. The predicted results can be seen in Figure 8 for YS prediction and Figure 9 for TS prediction.
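For reference, the two metrics can be computed as in the short sketch below. The function names and the sample values are illustrative assumptions, not results from the study.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical actual and predicted strength values.
y_true = [400.0, 500.0, 600.0]
y_pred = [410.0, 480.0, 600.0]

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(mae, rmse)
```

RMSE squares each error before averaging, so it penalizes large individual errors more heavily than MAE and is never smaller than MAE on the same predictions.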

Conclusions
Based on the results of this research on the effect of heatmap correlation-based feature selection on the modeling of mechanical properties of low-alloy steel using the artificial neural network (ANN) algorithm, it can be concluded that heatmap correlation can identify the chemical elements most correlated with the mechanical properties of low-alloy steel, namely Yield strength (YS) and Tensile strength (TS). Based on the heatmap correlation results for the YS output, the chemical elements P, S, N, and Nb+Ta were removed during feature selection because they had very weak correlation values. For the TS output, the chemical elements removed were P and N. The ANN model was evaluated using the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics, and the results showed that the model with feature selection performed better than the model without feature selection. In other words, feature selection can help improve prediction accuracy in modeling the mechanical properties of low-alloy steel using the ANN algorithm. Heatmap correlation can therefore serve as an alternative feature selection method for improving model performance in predicting the mechanical properties of low-alloy steel.

Figure 1. Research Framework

1. Data preparation. The dataset used in this study was taken from Kaggle, a data competition and open-source platform containing thousands of datasets that can be used for various types of research. The dataset contains the results of tensile strength testing of low-alloy steel and consists of chemical composition, heat treatment temperature, and mechanical properties such as Yield Strength (YS) and Tensile Strength (TS). It consists of 916 data points with 15 input and two output variables, as shown in Table 1.

3. Model training. The training of the model using ANN aims to find the best parameters that optimize the model's performance in predicting the mechanical properties of low-alloy steel based on chemical elements and heat treatment.

Figure 2. (a) Heatmap correlation of chemical elements against YS, (b) Heatmap correlation of chemical elements against TS

Figure 3. Feature selection for yield strength.

Figure 5. Comparison of YS and TS loss: (a) YS without feature selection, (b) YS with feature selection, (c) TS without feature selection, (d) TS with feature selection.

Figure 7. Comparison of evaluation metrics in model validation.

Figure 8. Comparison of YS prediction results with feature selection and without feature selection.

Figure 9. Comparison of TS prediction results with feature selection and without feature selection.

Based on the prediction results, the YS and TS predictions using datasets with and without feature selection can be compared. The ANN model trained on the dataset with feature selection performs better than the model trained on the dataset without feature selection, as seen in the smaller MAE and RMSE values for the feature selection dataset, shown in Figure 10. In the YS test, there was a decrease of 6.83% in MAE and 4.97% in RMSE after feature selection, while the TS test showed a decrease of 16.46% in MAE and 18.34% in RMSE after feature selection.
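The percentage decreases quoted for MAE and RMSE are presumably relative reductions with respect to the model without feature selection. A minimal sketch of that arithmetic follows; the function name and the metric values are hypothetical, since the underlying MAE and RMSE values are not given in the text.

```python
def percent_decrease(before, after):
    """Relative reduction of an error metric, in percent of the 'before' value."""
    return (before - after) / before * 100.0

# Hypothetical MAE values chosen only to illustrate the arithmetic.
mae_without_selection = 20.0
mae_with_selection = 18.0

reduction = percent_decrease(mae_without_selection, mae_with_selection)
print(reduction)
```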

Table 1. The dataset consists of 916 data points with 15 input variables and two output variables.