Pandas vs Apache Spark vs Power BI Desktop: big data performance on a single machine

Selecting the right tool for the job takes experience and knowledge, and knowing which tool to use can save a lot of time, energy and cost. This post therefore aims to provide some insight in this area, which should allow for the selection of a tool that is appropriate for…

Data Extraction – Text and Numeric

Once you have located the data, the task of data extraction begins. Data extraction refers not only to extracting data from online sources but also to offline and non-digital sources. This post will cover some of the methodologies I find useful in this process. It…

Data Acquisition

Any machine learning pipeline starts with the Extraction, Transformation and Loading (ETL) of data into the system, which is by far my favourite part of data science. ETL Basics: As the name suggests, ETL comprises the following parts: Data Extraction: This part deals with the acquisition of…

Where do I start? Where do I begin?

First come the questions: what is an RNN? What is a CNN? By the way, isn’t that an American news channel? What on earth is a Support Vector Machine? What is business intelligence? What is deep learning? Why is it ‘deep’? For that matter, is there anything called ‘shallow learning’? Then comes the resolution: ‘Ok. Let me…
