Machine Learning

Carepal

In a whirlwind of technology and innovation, I recently had the privilege of participating in a Hackathon at Georgian College, a gathering that proved to be both a challenge and an exhilarating opportunity to push the boundaries of artificial intelligence (AI) in solving real-world problems. Presented with four themes—information security, healthcare, smart cities, and sustainability—my team and I embarked on a journey to find a meaningful application of AI that could make a tangible difference in people’s lives. After several hours of brainstorming and discussion, we were drawn to healthcare, a field where AI’s potential to improve lives is both vast and deeply personal.

The spark for our project came from a simple yet profound observation: a team member shared how assisting a senior neighbor with technology brought her immense joy and highlighted a critical need—many seniors live alone, often without the assistance they need. With over 42% of Canadian seniors living alone, we saw a clear opportunity to make a difference. Thus, CarePal was born.

CarePal is not just another piece of technology; it is a proactive AI companion designed to perform wellness checks, ensure medication adherence, provide company, detect behavioral trends, and alert caregivers to emergencies or anomalies. What sets CarePal apart is its accessibility: it connects across a range of devices to accommodate seniors with audio, visual, or speech impairments. Leveraging the API of Cohere, a Canadian enterprise specializing in generative AI solutions, we equipped CarePal with a large language model enhanced by retrieval-augmented generation. This foundation allows CarePal to offer not just interaction, but genuinely insightful and helpful engagement, tailored to the unique needs of seniors.

Developing CarePal was a marathon of innovation, requiring around 20 hours of dedicated work. Our team was a blend of talents, divided into three key roles:

- Hackers: the tech wizards who brought the first prototype of CarePal to life.
- Business Development: where I contributed, diving into business research, branding, and development to ensure CarePal’s market readiness and impact.
- The Hustler: the charismatic force who pitched our product, presenting CarePal’s potential to transform senior care.

Our journey culminated in the Hackathon’s finals, where CarePal was awarded second place—a moment of immense pride and validation for our hard work. Beyond the accolades, the experience was a profound reminder of the power of technology to make a difference in the lives of those who need it most.

As we move forward, our experience at the Georgian College Hackathon remains a beacon of what’s possible when innovation meets empathy. CarePal is just the beginning. The journey of using technology to enhance human lives is endless, and I am eager to continue on this path, wherever it may lead. For a closer look at our pitch and the story of CarePal, check out our pitch video here.
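For readers curious about the technical foundation, the snippet below is a minimal sketch of how a retrieval-augmented wellness-check prompt could be sent through Cohere's chat endpoint. It is not CarePal's actual implementation: the model name, the care-plan documents, and the prompt are assumptions for illustration only.

```python
# Illustrative sketch only (not CarePal's actual code): a retrieval-augmented
# wellness-check request via Cohere's chat endpoint. Model name, documents,
# and prompt text are assumptions for demonstration.
import cohere

co = cohere.Client("YOUR_API_KEY")  # assumed: Cohere Python SDK v1-style client

# Hypothetical care-plan snippets retrieved for this senior (the retrieval side of RAG)
documents = [
    {"title": "Medication schedule",
     "snippet": "Metformin 500 mg with breakfast and dinner."},
    {"title": "Care notes",
     "snippet": "Prefers morning check-ins; hard of hearing, so favor text prompts."},
]

response = co.chat(
    model="command-r",  # assumed model name
    message="It's 8:30 am. Draft a friendly wellness check-in and a medication reminder.",
    documents=documents,  # grounds the reply in the retrieved care documents
)

print(response.text)  # the generated, document-grounded check-in message
```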


Alcohol Sales Regression Using AutoML

Introduction

The project tackles the challenge of predicting alcohol sales in Iowa, focusing on the crucial December period when sales peak due to the Christmas season. Historically, liquor stores in Iowa have relied on a simple moving average of the past five years' sales to forecast December sales. This method has proven insufficient, as it fails to consider influential factors such as day, region, product, and vendor, leading to inaccurate predictions. The result has been either stock shortages or excesses, causing financial losses from missed sales opportunities or from the costs associated with unsold stock.

Methodology

The project employs AutoML techniques to develop a more accurate prediction model for December alcohol sales in Iowa. The methodology involves several key steps:

- Data Collection and Preprocessing: The team collected sales data, including historical sales figures, product types, vendor information, and regional sales data. This comprehensive dataset underwent preprocessing to clean and structure the data for analysis.
- Feature Selection: To address the limitations of previous forecasting methods, the project expanded the feature set to include not just historical sales data but also day of the week, region, product type, and vendor information.
- AutoML Implementation: The team utilized AutoML tools to automatically select the best machine learning model for the prediction task. AutoML evaluated various models based on the expanded feature set, optimizing for prediction accuracy. (A minimal sketch of this step appears at the end of this post.)
- Model Training and Evaluation: The selected model was trained on a portion of the data, with the remaining data used for testing and validation. Evaluation metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared were employed to assess model performance.

Results

The AutoML-based approach significantly outperformed the traditional moving-average method. Key findings include:

- Improved Accuracy: The AutoML model demonstrated a substantial improvement in prediction accuracy, with lower MAE and RMSE values compared to the traditional method.
- Comprehensive Analysis: The inclusion of additional factors like product type and vendor information allowed for a more nuanced understanding of sales dynamics.
- Model Performance: The R-squared value indicated a good fit between the model’s predictions and the actual sales data, suggesting the model’s effectiveness in capturing the variability in December alcohol sales.

Conclusion and Recommendations

The application of AutoML techniques to predicting December alcohol sales in Iowa represents a significant advancement over traditional methods. The project’s success highlights the importance of incorporating a broader set of factors into sales forecasting models. Recommendations for liquor store owners and suppliers include:

- Adoption of AutoML-based forecasting models for more accurate inventory planning.
- Consideration of regional sales trends, product preferences, and vendor performance in stocking decisions.
- Continuous data collection and model retraining to adapt to changing market conditions.

Check out my GitHub profile for the code!
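As referenced above, here is a minimal sketch of the AutoML step. The post does not name the specific AutoML library or the exact schema of the Iowa sales data, so the open-source FLAML library, the file name, and the column names below are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: FLAML and the column names below (date, county,
# category_name, vendor_name, sale_dollars) are assumptions, not the project's
# actual tooling or schema.
import pandas as pd
from flaml import AutoML
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("iowa_liquor_sales.csv", parse_dates=["date"])  # assumed file/columns

# Expanded feature set: day of week, region, product type, and vendor
df["day_of_week"] = df["date"].dt.dayofweek
for col in ["county", "category_name", "vendor_name"]:
    df[col] = df[col].astype("category")

features = ["day_of_week", "county", "category_name", "vendor_name"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["sale_dollars"], test_size=0.2, random_state=42)

# Let AutoML search models and hyperparameters within a fixed time budget (seconds)
automl = AutoML()
automl.fit(X_train, y_train, task="regression", metric="rmse", time_budget=300)

# Evaluate on the held-out data with the metrics discussed above
pred = automl.predict(X_test)
print("MAE :", mean_absolute_error(y_test, pred))
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)
print("R2  :", r2_score(y_test, pred))
```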


My Journey with Spark and Kafka

In the ever-evolving landscape of data processing, the quest for efficiency and precision seems endless. My latest project, an Employee Salary Processor built with Apache Spark and Kafka, stands as a testament to this ongoing journey. This endeavor was not just about harnessing data; it was about creating a seamless bridge between raw information and actionable insights.

At the heart of this project lies Spark’s streaming capabilities, coupled with Kafka’s robust messaging system. The goal was simple yet ambitious: to categorize employee salaries into high and low brackets in real time, enabling dynamic decision-making for businesses. But as we all know, the simplest goals often require the most sophisticated solutions.

The Blueprint

Imagine a relentless stream of data, each piece a tiny puzzle of the bigger picture. My first step was to define a schema, a blueprint if you will, of the employee data, including fields like ID, Name, Department, and Salary. This schema served as the foundation, ensuring that each piece of data was recognized and correctly placed within our larger puzzle.

The Stream

With Kafka set up as the source, data began its journey, flowing into our Spark application. This is where the magic happens. As data streamed in, Spark’s powerful processing capabilities kicked in, categorizing salaries with precision. High salaries were distinguished from low, each finding its path within our defined categories.

The Insight

But what good is data if it cannot be interpreted? The high and low salary data streams were not just categorized; they were transformed into a format ready for analysis, then stored for accessibility. This dual path not only provided immediate insights but also laid the groundwork for future analysis, painting a picture of trends over time.

The Impact

To the technical minds, this project is a symphony of Spark Streaming and Kafka, a showcase of real-time data processing and analysis. To the non-technical, it represents clarity: a clear, accessible view into the dynamics of employee salaries. This journey has been more than just technical execution; it has been a step towards demystifying data, making it accessible and understandable for all. Whether you’re a data scientist, a business leader, or simply a curious mind, the implications of this project extend far beyond its codebase. It’s about making informed decisions, understanding trends, and ultimately, about harnessing the true power of data.

Check out the complete code on my GitHub: https://github.com/TirtheshJani/Data_Collection_and_Curation
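To make the pipeline concrete, here is a minimal sketch of the kind of Spark Structured Streaming job described above. The broker address, topic name, salary threshold, and output paths are assumptions for illustration, not the project's actual configuration; see the repository linked above for the real code.

```python
# Minimal sketch of a Kafka -> Spark Structured Streaming salary categorizer.
# Broker, topic, threshold, and paths are illustrative assumptions.
# Requires the spark-sql-kafka connector on the classpath (e.g. via --packages).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, when
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType

spark = SparkSession.builder.appName("EmployeeSalaryProcessor").getOrCreate()

# The blueprint: schema for incoming employee records
schema = StructType([
    StructField("ID", IntegerType()),
    StructField("Name", StringType()),
    StructField("Department", StringType()),
    StructField("Salary", DoubleType()),
])

# The stream: read raw JSON messages from a Kafka topic
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
       .option("subscribe", "employees")                      # assumed topic
       .load())

employees = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), schema).alias("e"))
             .select("e.*"))

# Categorize salaries into high and low brackets; the 80,000 cutoff is illustrative
labelled = employees.withColumn(
    "salary_band", when(col("Salary") >= 80000, "high").otherwise("low"))

# The insight: persist the categorized stream in an analysis-ready format
query = (labelled.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "output/salaries")            # assumed output location
         .option("checkpointLocation", "chk/salaries")  # assumed checkpoint dir
         .start())

query.awaitTermination()
```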


A Deep Dive into Stellar Classification

“The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself.” – Carl Sagan

Motivation

Imagine standing under the night sky, gazing up at the stars twinkling above. Each star is a story, a beacon from afar that holds secrets about our universe’s past, present, and future. But how do we begin to understand these celestial narratives? This is where my first adventure into the realm of data science meets the infinite expanse of space: a project dedicated to classifying stars not with telescopes, but with the power of data. We embark on a journey to categorize the stars, aiming to deepen our understanding of their properties and behaviors. This blog post delves into the project’s motivations, methodologies, findings, and the broader implications of marrying data science with astronomy.

The motivation behind this project was to leverage the power of machine learning to contribute to our understanding of the universe, making sense of the data that the cosmos offers. Utilizing a comprehensive dataset that encapsulates stellar parameters such as temperature, luminosity, radius, and more, the goal was to predict the classification of stars into one of several types, each reflecting a unique stage in stellar evolution or a distinct set of properties. This endeavor was not merely an academic exercise but a practical exploration into how data science techniques can be applied to real-world astronomical data.

The dataset, sourced from reputable astronomical studies, included observations of stars across different spectral classes, sizes, and luminosities, providing a rich tapestry of information for analysis. By applying classification algorithms, the project sought to identify patterns and relationships within the data, enabling the categorization of stars in a way that aligns with our current astronomical understanding. In the following sections, we’ll delve into the methodology employed to achieve this classification, discuss the results and their implications, and consider how this project not only advances our knowledge of the stars but also demonstrates the potential of data science in enhancing our comprehension of the universe.

The Data

The dataset includes the following variables for each star:

- Temperature (in Kelvin)
- Luminosity (L, in L/Lo)
- Absolute Magnitude (AM, in Mv)
- Color (general color of spectrum)
- Spectral Class (O, B, A, F, G, K, M)
- Type (categorized from 0 to 5): 0: Red Dwarf, 1: Brown Dwarf, 2: White Dwarf, 3: Main Sequence, 4: Super Giants, 5: Hyper Giants

Methodology

The methodology adopted in this project was a systematic approach that combined data preprocessing, model selection, and rigorous validation to classify star types accurately. Here’s how the process unfolded. (A minimal code sketch of the pipeline appears at the end of this post.)

Data Preprocessing

The first step involved cleaning and preparing the astronomical data for analysis. This phase was crucial, as the quality of data directly impacts the model’s performance. We addressed missing values, normalized the data to ensure consistency across different scales, and encoded categorical variables where necessary. This preprocessing not only streamlined the dataset but also enhanced the models’ ability to learn from the data effectively.

Model Selection

Choosing the right machine learning model was pivotal to the project’s success. Given the nature of the classification task, we experimented with several algorithms renowned for their classification capabilities, including Decision Trees, Random Forest, Support Vector Machines (SVM), and Neural Networks. Each model was evaluated for its suitability based on the dataset’s characteristics and the complexity of the classification task at hand.

Training and Validation

With the models selected, the next step was training them on a portion of the dataset. This process involved feeding the models data for which the classifications were already known, allowing them to learn and make predictions. To ensure the models’ accuracy and avoid overfitting, we employed cross-validation techniques. This involved dividing the dataset into a training set and a validation set, where the latter was used to test the models’ predictive power and adjust parameters accordingly.

Model Evaluation

The final step in the methodology was evaluating each model’s performance using metrics such as accuracy, precision, recall, and F1 score. These metrics provided insights into how well each model performed in classifying the stars, guiding the selection of the most effective model for the task.

Results and Analysis

The culmination of meticulous data preprocessing, careful model selection, and rigorous validation was a comprehensive understanding of the models’ abilities to classify star types accurately. The project yielded several key findings:

- Model Performance: Among the various models tested, the Random Forest algorithm emerged as the standout, demonstrating superior performance in terms of accuracy, precision, and recall. Its ability to handle the complexity and nuances of the dataset was evident, making it the preferred choice for this classification task.
- Feature Importance: Analysis revealed that certain features played a more significant role in determining star types. Temperature, luminosity, and radius were among the most influential, aligning with astronomical principles that these characteristics are pivotal in defining a star’s classification.
- Classification Insights: The project not only achieved high accuracy in classification but also provided insights into the relationships between different stellar characteristics and their types. For instance, it highlighted how certain combinations of features are indicative of specific star types, offering a data-driven approach to understanding stellar classification.

As we stand at the confluence of data science and astronomy, projects like these not only contribute to our scientific knowledge but also inspire a sense of wonder and possibility.
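As referenced in the Methodology section, here is a minimal scikit-learn sketch of such a pipeline. The file name and column names are assumptions based on the variable list in “The Data”, and the hyperparameters are illustrative rather than the values actually used in the project.

```python
# Illustrative sketch of the preprocessing + Random Forest pipeline described above.
# File name, column names, and hyperparameters are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("stars.csv")  # assumed file name

numeric = ["Temperature", "Luminosity", "Absolute Magnitude"]
categorical = ["Color", "Spectral Class"]
X, y = df[numeric + categorical], df["Type"]

# Preprocessing: normalize numeric scales, one-hot encode categorical variables
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(n_estimators=300, random_state=42)),
])

# Cross-validation to guard against overfitting
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean())

# Hold-out evaluation reporting precision, recall, and F1 per star type
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```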
