This project demonstrates a machine learning approach to classify gender based on voice characteristics using Support Vector Machine (SVM) algorithms. It uses a dataset containing voice features extracted from male and female voice samples, and evaluates different SVM kernels and hyperparameters to find the optimal model for accurate classification.
- Introduction
- Dataset
- Libraries Used
- Data Preprocessing
- Model Training and Evaluation
- Hyperparameter Tuning
- Results
- Conclusion
This project aims to classify male and female voices using their acoustic properties. We leverage the Support Vector Machine (SVM) method with different kernels and hyperparameter settings to achieve optimal classification performance. The primary focus is on understanding how different kernels (linear, RBF, and polynomial) and parameters like `C` and `gamma` affect model accuracy.
The dataset used in this project is the Voice Gender Dataset, which contains 3,168 samples with 21 acoustic features, such as mean frequency and median frequency. Each sample is labeled as either 'male' or 'female'.
- Total number of samples: 3,168
- Features: 21
- Classes: 2 ('male' and 'female')
- `pandas`: Data manipulation and analysis.
- `numpy`: Numerical computing.
- `scikit-learn`: Machine learning algorithms and tools.
- `matplotlib`: Plotting and visualization.
- `seaborn`: Statistical data visualization.
- Data Loading: The dataset is read into a Pandas DataFrame.
- Null Check: Verified that there are no missing values in the dataset.
- Feature and Label Separation: Extracted features and labels for further processing.
- Label Encoding: Converted categorical labels ('male', 'female') into numerical values (1, 0) using `LabelEncoder`.
- Data Standardization: Standardized features to have a mean of 0 and a standard deviation of 1 using `StandardScaler`.
- Data Splitting: Split the dataset into training (80%) and testing (20%) sets.
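The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper, the `label` column name, the `voice.csv` file name, the random seed, and the stratified split are all assumptions, and a small synthetic DataFrame stands in for the real data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df, label_col="label", test_size=0.2, seed=42):
    """Encode labels, standardize features, and split 80/20 (hypothetical helper)."""
    # Null check: fail early if any value is missing.
    assert not df.isnull().values.any(), "dataset contains missing values"

    # Separate the feature columns from the label column.
    X = df.drop(columns=[label_col]).to_numpy(dtype=float)
    y_raw = df[label_col]

    # LabelEncoder sorts classes alphabetically: 'female' -> 0, 'male' -> 1.
    y = LabelEncoder().fit_transform(y_raw)

    # Standardize every feature to zero mean and unit standard deviation.
    X_scaled = StandardScaler().fit_transform(X)

    # Stratified 80/20 split (stratification is an assumption made here).
    return train_test_split(
        X_scaled, y, test_size=test_size, random_state=seed, stratify=y
    )


# Synthetic stand-in for the real data, e.g. pd.read_csv("voice.csv").
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
demo["label"] = ["male", "female"] * 50

X_train, X_test, y_train, y_test = preprocess(demo)
print(X_train.shape, X_test.shape)
```

The sketch follows the order described above (standardize, then split). A common alternative is to fit the scaler on the training split only, so that no test-set statistics reach the model.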
- Baseline Model: SVM with default parameters was evaluated.
- Kernels Used: Linear, RBF (Radial Basis Function), Polynomial.
- Performance Metrics: Accuracy was used as the primary metric to evaluate model performance.
- Train-Test Split: Data was divided into training and testing sets.
- SVM with Different Kernels: SVM models with linear, RBF, and polynomial kernels were trained and evaluated.
- Cross-Validation: 10-fold cross-validation was used to validate model performance across different data splits.
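As a rough sketch of the evaluation described above, the loop below trains an SVM with each kernel, reports hold-out accuracy, and runs 10-fold cross-validation. The data here is synthetic (generated with `make_classification`) and only stands in for the preprocessed voice features; the sample count, seeds, and the `results` dictionary are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; replace with the standardized voice features.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

results = {}
for kernel in ("linear", "rbf", "poly"):
    # Hold-out accuracy on the 20% test split, default hyperparameters.
    model = SVC(kernel=kernel)
    test_acc = model.fit(X_train, y_train).score(X_test, y_test)

    # Mean accuracy over 10 cross-validation folds on the full data.
    cv_acc = cross_val_score(
        SVC(kernel=kernel), X, y, cv=10, scoring="accuracy"
    ).mean()

    results[kernel] = (test_acc, cv_acc)
    print(f"{kernel:>6}: test={test_acc:.3f}  10-fold CV={cv_acc:.3f}")
```

Comparing the hold-out score with the cross-validated mean gives a sense of how much a single split can flatter or penalize a given kernel.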
Grid Search was used to find the best combination of parameters:
- Linear Kernel: Tuned the `C` parameter.
- RBF Kernel: Tuned the `gamma` and `C` parameters.
- Polynomial Kernel: Tuned the degree, `gamma`, and `C` parameters.
This approach helped identify the optimal hyperparameters for each kernel type to maximize classification accuracy.
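A grid search along these lines might look like the sketch below, using scikit-learn's `GridSearchCV`. The candidate values in `param_grid`, the 5-fold setting (chosen to keep the example fast), and the synthetic data are assumptions; the grids actually used in the project are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; replace with the standardized voice features.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)

# Hypothetical grids: one dict per kernel, each listing only the
# parameters that kernel uses.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    {"kernel": ["poly"], "C": [0.1, 1], "gamma": [0.01, 0.1], "degree": [2, 3]},
]

# Every combination is scored by cross-validated accuracy.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

Passing a list of dictionaries searches all three kernels in one run; `search.cv_results_` holds the score of every combination, so the best setting per kernel can still be read off separately.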
- The best model achieved an accuracy of approximately 97% with a linear kernel and `C=0.1`.
- The RBF kernel performed best with `gamma=0.01`, also achieving similar accuracy.
- The polynomial kernel showed reduced performance at higher degrees, indicating overfitting.
The SVM model demonstrated high accuracy in classifying gender based on voice features. Both linear and RBF kernels provided excellent performance with appropriate hyperparameter tuning. This project highlights the importance of choosing the right kernel and hyperparameters in SVM to achieve high classification accuracy.
This README provides an overview of the SVM-based gender classification project, including data preprocessing, model evaluation, and hyperparameter tuning. For a detailed explanation and visualization, please refer to the project notebook.