

Showing posts with the label data science

How Random Are Random Forest Classifications?

What is Random Forest Classification?

Random Forest is an ensemble learning technique that combines multiple decision trees to improve a model's accuracy. It is called "Random" because each tree in the forest is trained on a random subset of the training data and a random subset of the features, which reduces overfitting and improves the generalization of the model. To make the final prediction, the outputs of all the trees are combined by majority vote. This approach is called bagging, or bootstrap aggregating.

Real-world Example

Let's consider a real-world example to understand how Random Forest Classification works. We will use the famous Iris dataset, which consists of 150 samples of iris flowers. Each sample has four features: sepal length, sepal width, petal length, and petal width. The task is to classify each flower into one of three species: setosa, versicolor, or virginica. Impleme...
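The bagging-and-majority-vote idea above can be sketched on the Iris dataset with scikit-learn. This is a minimal illustration, not the post's full implementation; the estimator settings (`n_estimators`, `random_state`, the 70/30 split) are assumptions, not tuned values.

```python
# Minimal Random Forest sketch on Iris, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 100 trees; each is trained on a bootstrap sample of the rows and
# considers a random subset of the features at every split.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# The forest's prediction is the majority vote across its trees.
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```

Because each tree sees a different bootstrap sample, individual trees disagree on hard cases, and the vote averages away much of that variance.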
Prophet of the Future

Prophet is open-source software released by Facebook's Core Data Science team, available for download on CRAN and PyPI. Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

- Accurate and fast.
- Fully automatic.
- Tunable forecasts.
- Available in R or Python.

Let's explore this with an example. Here we are using the Air Passengers dataset and our Jupyter notebook (you can get the link to this dataset at the end).

import warnings
warnings.filterwarnings("ignore")
import numpy as np
from d...
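To make the additive model concrete, here is a small NumPy sketch (not Prophet itself) that builds a series as trend + yearly seasonality + noise and recovers the components by least squares. All names and constants are illustrative; Prophet fits a much richer basis (piecewise trends, Fourier seasonality, holidays) than this toy decomposition.

```python
# Illustrative additive model: y(t) = trend(t) + seasonality(t) + noise.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120)  # e.g. 10 years of monthly observations

trend = 100 + 2.5 * t                          # linear here for simplicity
seasonality = 20 * np.sin(2 * np.pi * t / 12)  # yearly cycle in monthly data
noise = rng.normal(0, 5, size=t.size)
y = trend + seasonality + noise

# Recover the components with least squares on a design matrix of
# [1, t, sin, cos]; Prophet does the analogous fit with a larger basis.
X = np.column_stack([np.ones_like(t), t,
                     np.sin(2 * np.pi * t / 12),
                     np.cos(2 * np.pi * t / 12)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
rmse = np.sqrt(np.mean((y - fitted) ** 2))
print("recovered slope:", coef[1], "RMSE:", rmse)
```

Forecasting then amounts to evaluating the fitted trend and seasonal terms at future values of `t`, which is why strong, stable seasonality helps so much.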
K-Means Algorithm With a Real-Life Problem

It has a complicated name, but it is simple and is a popular unsupervised machine learning technique. The idea is to create k centroids and allocate every data point to the nearest centroid, keeping the number of centroids fixed at k.

Let's explore this technique with an example. Here we have data from an online tea store, with details of each customer, the date their account was created, and their purchase styles. We are interested in what makes a customer come back to the store; retention is one of the biggest mysteries in any industry. Let us quickly open our Jupyter notebook.

# Import Necessary Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

Read the file into pandas:

# Importing Data
#Import Dataset
cr = pd.rea...
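The loop described above — pick k centroids, assign each point to its nearest centroid, move each centroid to the mean of its points, repeat — can be sketched from scratch in NumPy. The synthetic blobs below stand in for the tea-store customers; the function and variable names are illustrative, not from the post.

```python
# From-scratch k-means sketch on synthetic 2-D data.
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the k centroids at k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        new_centroids = np.array(
            [X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments no longer change
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs should come back as two clusters of 50.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print("cluster sizes:", np.bincount(labels))
```

On real customer data you would first build numeric features (e.g. recency and frequency of purchases) and scale them, since k-means is distance-based and sensitive to feature scale.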