SVM IN REAL LIFE
Fig. 1(a) and (b): Intro images
Introduction:
Support Vector Machines (SVM) stand as stalwarts in the realm of machine learning, often revered for their versatility and robustness. Originally developed by Vladimir Vapnik and his colleagues in the 1990s, SVM has found its way into a plethora of real-world applications, solving intricate problems with finesse. In this exploration, we delve into the practical domains where SVM shines, illuminating its efficacy in diverse fields.
What Is SVM:
Before we embark on our journey through real-life applications, let’s briefly revisit the essence of Support Vector Machines. At its core, SVM is a supervised learning model used for classification and regression analysis. It works by finding the optimal hyperplane that best divides a dataset into classes, aiming to maximize the margin between the classes. This intrinsic quality makes SVM particularly adept at handling complex, high-dimensional data.
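To make this concrete, here is a minimal sketch (not part of the original discussion) of fitting an SVM classifier with scikit-learn; the Iris dataset and the parameter choices are assumptions made purely for illustration.

```python
# Minimal illustrative sketch: fit an SVM classifier and score it on held-out data.
# The dataset and parameter choices here are assumptions for demonstration only.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a small, well-known dataset for illustration
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit a support vector classifier and report accuracy on the held-out split
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```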
Types of SVM:
Support Vector Machines (SVMs) come in various forms, each tailored to different types of problems and data structures. Here are some of the main types of SVMs:
1. Linear SVM
Description: The simplest form of SVM, which constructs a linear decision boundary.
Use: Ideal for linearly separable data, where classes can be separated by a straight line.
2. Non-Linear SVM
Description: Utilizes a kernel trick to transform the input space into a higher-dimensional space, where a linear separation is possible.
Use: Effective for data that cannot be separated by a straight line in the original space.
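The difference between the two types can be illustrated with a short, hedged sketch: a linear SVM and an RBF-kernel SVM are fit on interleaving half-moon data, which is not linearly separable. The dataset and settings below are assumptions, not part of any referenced experiment.

```python
# Illustrative comparison of a linear SVM vs. an RBF-kernel SVM on data that
# cannot be separated by a straight line; all settings here are assumptions.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "accuracy:", round(clf.score(X_test, y_test), 3))

# The RBF kernel typically scores noticeably higher here, because the kernel
# trick lets the SVM separate the interleaved "moons" in a transformed space.
```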
C Parameter In SVM: The C parameter controls the trade-off between the training error and the margin, since it determines the penalty for misclassified data points during the training process. A larger value of C puts more emphasis on minimizing the training error, potentially leading to a narrower margin, while a smaller value of C allows a wider margin at the cost of more training errors.
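A small sketch of this trade-off on synthetic data chosen only for illustration: as C grows, the penalty on margin violations grows, which tends to shrink the margin (and usually the number of support vectors).

```python
# Hedged sketch of the effect of C: smaller C tolerates more margin violations
# (softer, wider margin), larger C penalizes them more heavily. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, random_state=1)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A larger number of support vectors usually indicates a softer (wider) margin.
    print(f"C={C}: support vectors={clf.n_support_.sum()}, "
          f"training accuracy={clf.score(X, y):.3f}")
```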
Applications in real life:
1. Text and Document Classification
In the age of information overload, SVM emerges as a beacon of order in the chaos of textual data. Its ability to efficiently classify documents into categories has made it indispensable in spam email detection, sentiment analysis, and topic categorization. Firms rely on SVM to sift through mountains of text, swiftly identifying patterns and sentiments crucial for decision-making; a small spam-filtering sketch is shown after this list of applications.
2. Image Recognition and Computer Vision
From facial recognition to object detection, SVM plays a pivotal role in the evolution of computer vision. By analyzing features extracted from images, SVM models can discern between different objects, aiding in tasks such as medical image analysis, autonomous vehicles, and quality control in manufacturing processes. The ability to handle high-dimensional data makes SVM a cornerstone in the quest for visual intelligence.
3. Bioinformatics and Genomics
Peering into the building blocks of life, SVM unveils its prowess in bioinformatics and genomics. Researchers harness its capabilities to classify proteins, predict gene functions, and diagnose diseases from genetic data. SVM’s knack for handling large-scale biological data sets equips scientists with invaluable tools for understanding the complexities of life at a molecular level.
4. Financial Forecasting and Stock Market Analysis
In the realm of finance, SVM stands as a trusted ally for analysts and traders alike. Its ability to analyze historical market data and identify patterns lends itself to stock market prediction, portfolio optimization, and risk management. Financial institutions leverage SVM to make data-driven decisions, mitigating risks and seizing opportunities in dynamic markets.
5. Medical Diagnostics and Healthcare
The realm of healthcare benefits immensely from SVM’s predictive prowess. In medical diagnostics, SVM aids in disease identification, patient outcome prediction, and the development of personalized treatment plans. From detecting cancerous cells in radiology images to forecasting patient readmission rates, SVM empowers healthcare professionals with data-driven insights, enhancing patient care and treatment efficacy.
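As mentioned above for text classification, here is a tiny, hedged sketch of an SVM-based spam filter built on a TF-IDF representation; the four-document corpus is invented, and a real system would of course need far more training data.

```python
# Illustrative sketch of SVM text classification with TF-IDF features.
# The toy corpus and labels below are invented for demonstration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["win a free prize now", "meeting agenda attached",
         "cheap loans available", "project status update"]
labels = ["spam", "ham", "spam", "ham"]

# Vectorize the text and fit a linear SVM in one pipeline
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["claim your free loan", "agenda for the status meeting"]))
```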
Support Vector Machine Terminology:
- Hyperplane: The hyperplane is the decision boundary used to separate the data points of different classes in the feature space. In the case of linear classification, it is a linear equation, i.e. wx + b = 0.
- Support Vectors: Support vectors are the data points closest to the hyperplane, and they play a critical role in deciding the hyperplane and margin.
- Margin: The margin is the distance between the support vectors and the hyperplane. The main objective of the support vector machine algorithm is to maximize this margin; a wider margin generally indicates better classification performance.
- Kernel: A kernel is a mathematical function used in SVM to map the original input data into a higher-dimensional feature space, so that a separating hyperplane can be found even when the data points are not linearly separable in the original input space. Some common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
- Hard Margin: The maximum-margin hyperplane or the hard margin hyperplane is a hyperplane that properly separates the data points of different categories without any misclassifications.
- Soft Margin: When the data is not perfectly separable or contains outliers, SVM uses a soft margin technique. The soft-margin formulation introduces a slack variable for each data point, which relaxes the strict margin requirement and permits some misclassifications or violations. It finds a compromise between increasing the margin and reducing violations.
- C: The regularisation parameter C balances margin maximisation against misclassification penalties. It determines the penalty for violating the margin or misclassifying data points. A greater value of C imposes a stricter penalty, which results in a smaller margin and perhaps fewer misclassifications.
- Hinge Loss: Hinge loss is a typical loss function in SVMs. It penalises misclassifications and margin violations, and is frequently combined with the regularisation term to form the SVM objective function.
- Dual Problem: SVM can also be solved through the dual form of its optimisation problem, which involves finding the Lagrange multipliers associated with the support vectors. The dual formulation enables the use of kernel tricks and more efficient computation.
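Since hinge loss appears in the terminology above, here is a small numeric sketch of it, assuming labels encoded as +1/-1 and raw decision-function scores; the numbers are arbitrary.

```python
# Small sketch of the hinge loss: max(0, 1 - y * f(x)), averaged over samples.
# Labels are assumed to be encoded as +1/-1; the example values are arbitrary.
import numpy as np

def hinge_loss(scores, labels):
    """Average hinge loss for decision scores f(x) and true labels y in {+1, -1}."""
    return np.mean(np.maximum(0.0, 1.0 - labels * scores))

scores = np.array([2.3, -0.7, 0.4, -1.5])  # decision function outputs f(x)
labels = np.array([1, -1, 1, -1])          # true classes
# Samples with y * f(x) >= 1 (outside the margin) contribute zero loss.
print(hinge_loss(scores, labels))          # 0.225 for these values
```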
CASE STUDY: MONSOON AND SVM
Rainfall is key to the hydrological cycle, and any alteration in its pattern affects the availability of water resources. Extreme events such as droughts and floods occur due to extreme changes in rainfall trends. Crop production is heavily dependent on the amount of moisture in the soil, which in turn depends on the groundwater level and the amount of rainfall.
Rainfall plays an especially important role for agriculture in the North Maharashtra Region, as this region lacks ample rivers and lakes. Due to the unpredictable nature of rainfall, i.e. the occurrence of rainfall in the non-monsoon season and its non-occurrence in the monsoon season, farmers who depend on rainfall for their agriculture have to bear enormous losses to their crops. Hence, research on the occurrence of rainfall is highly significant.
More recently, machine learning (ML) algorithms such as Support Vector Machines have been examined for forecasting rainfall in various regions. These models give good prediction results, and SVM has been found to be one of the most robust and efficient methods for rainfall prediction.
Study Area: Maharashtra is one of the largest states in India. It came into existence on 1st May 1960 and ranks second in the country by population and third by area. The state comprises 36 districts. The zone of North Maharashtra lies in central India on the north-western corner of the Deccan Plateau, in the valley of the Tapi River. It is bounded to the north by the Satpura Range, to the east by the Berar (Varhad) region, to the south by the Ajanta Hills, and to the west by the northernmost ranges of the Western Ghats. The region is located between 20°15′30″N and 22°03′00″N latitude and 73°47′00″E and 76°16′00″E longitude. The North Maharashtra region is geographically very large, consisting of three districts, viz. Dhule, Nandurbar and Jalgaon.
Data Collection:
Hourly meteorological data was collected from the three districts Jalgaon, Dhule and Nandurbar for the period 2009 to 2018. The list of predictors is described in Table 1.
Normalization:
The hourly data collected was normalized using the Z-score normalization approach as below:
$$x' = \frac{x - \mu}{\sigma}$$

where $x$ is the unscaled value, $\mu$ is the arithmetic mean, and $\sigma$ is the standard deviation:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
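A short sketch of this normalization step in code, using NumPy; the sample values below are placeholders, not the actual predictors listed in Table 1.

```python
# Hedged sketch of the Z-score normalization described above.
# The sample values are placeholders, not data from the study.
import numpy as np

def z_score_normalize(x):
    """Return (x - mean) / std for a 1-D array of hourly readings."""
    mu = np.mean(x)
    sigma = np.std(x)  # population standard deviation, matching the formula above
    return (x - mu) / sigma

hourly_values = np.array([0.0, 1.2, 0.5, 3.4, 0.0, 0.8])
print(z_score_normalize(hourly_values))
```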
Training and Testing:
Hourly meteorological data for Jalgaon, Dhule and Nandurbar for the period 2009-2018 is considered for the analysis. A variable window size approach is used for training: training windows ranging from 2 years to 9 years are used, i.e. the past 1 year of values is used to predict the next year's values, up to the past 8 years of values being used to predict the next year's values.
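One way to read this windowing scheme is sketched below, assuming the training windows are anchored at the start of the study period; this is an interpretation for illustration, not code from the study.

```python
# Sketch of the variable training-window idea: the past 1 to 8 years of data
# are used to predict the following year. The anchoring at 2009 is an assumption.
years = list(range(2009, 2019))  # 2009-2018, the study period

for window in range(1, 9):       # past 1 year up to past 8 years of training data
    train_years = years[:window]
    test_year = years[window]
    print(f"train on {train_years} -> predict {test_year}")
```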
Support Vector Machine (SVM)
In this case study, a linear kernel function is used.
The performance of the SVM algorithm is evaluated using the following indices:
$$\text{Accuracy} = \frac{TP + TN}{N}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
where the numbers of events True Positive (TP), False Negative (FN), False Positive (FP) and True Negative (TN) are defined by the confusion matrix in Table 2, and N is defined as TP + FN + FP + TN. TP is the number of events where the prediction is precipitation and the observation is precipitation. FN is the number of events where the prediction is non-precipitation and the observation is precipitation. FP is the number of events where the prediction is precipitation and the observation is non-precipitation. TN is the number of events where the prediction is non-precipitation and the observation is non-precipitation.
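The indices above can be computed directly from the confusion-matrix counts, as in the short sketch below; the counts used here are arbitrary placeholders, not the study's results.

```python
# Hedged sketch: compute accuracy, precision and recall from confusion-matrix
# counts as defined above. The example counts are arbitrary placeholders.
def evaluate(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

acc, prec, rec = evaluate(tp=620, tn=250, fp=70, fn=60)
print(f"accuracy={acc:.2f}, precision={prec:.2f}, recall={rec:.2f}")
```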
RESULTS:
The experimental results for the three districts, viz. Jalgaon, Dhule and Nandurbar, are shown in Table 3. The overall prediction result for all three districts is shown in Fig. 2. It can be observed that the SVM model works well in forecasting rainfall for the three districts of the North Maharashtra Region, even though these districts have different geographical conditions ranging from plains and plateaus to hills. It is further observed that as the number of records used for training increases, the accuracy of the model increases. The results in Fig. 2 also show a drop in the precision and accuracy of the model for one year; this drop is due to low precipitation in that year, which resulted in a large number of true negative records for that particular year.
The experiments show that SVM is capable of predicting rainfall with 82% accuracy, further demonstrating the applicability of machine learning algorithms to the prediction of precipitation.
Conclusion:
Support Vector Machines, with their ability to handle complex data, find optimal solutions, and generalize well, have etched their mark across an array of industries. As we continue to innovate and push the boundaries of machine learning, SVM remains a steadfast companion, unraveling insights from data mazes and steering us towards informed decisions. In the tapestry of modern technology, SVM emerges not just as a tool, but as a beacon illuminating the path to data-driven success.
So, the next time you encounter a complex classification or regression problem in your domain, remember the silent workhorse that is the Support Vector Machine, ready to unravel complexities and unveil patterns hidden within your data.
References
1. (a) https://bdtechtalks.com/2021/12/27/what-is-support-vector-machine-svm/
1. (b) https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
2. Case study: https://turcomat.org/index.php/turkbilmat/article/download/2957/2529/5569