Data Mining

1. Write the differences between data mining and query tools.

Query tools are used to build queries and submit them to databases. They make it easy to build queries without having to learn a database-specific query language.

Data mining, on the other hand, is a technique in computer science that deals with extracting useful and previously unknown information from raw data. Most of the time, this raw data is stored in very large databases.

Data miners can therefore use the existing functionality of query tools to preprocess raw data before the data mining process. The main difference, however, is that to use query tools the user needs to know exactly what they are looking for, while data mining is mostly used when the user has only a vague idea of what they are looking for.
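The contrast can be illustrated with a minimal Python sketch. The purchase records and field names below are made up for illustration. The first part answers a question the user already knows how to ask, the way a query tool would; the second part counts every item pair and reports whichever one stands out, which is closer in spirit to data mining.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records; the data and field names are made up.
purchases = [
    {"customer": "A", "items": {"bread", "milk"}},
    {"customer": "B", "items": {"bread", "milk", "eggs"}},
    {"customer": "C", "items": {"tea"}},
    {"customer": "D", "items": {"bread", "milk", "tea"}},
]

# Query-style: the exact question is known in advance ("who bought tea?").
tea_buyers = [p["customer"] for p in purchases if "tea" in p["items"]]

# Mining-style: no specific question; count every item pair and see what stands out.
pair_counts = Counter(
    pair
    for p in purchases
    for pair in combinations(sorted(p["items"]), 2)
)
most_common_pair, count = pair_counts.most_common(1)[0]

print(tea_buyers)
print(most_common_pair, count)
```

In the first case the user had to name "tea" up front; in the second, the bread and milk pairing surfaced without anyone asking about it.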

 

2. Write the benefits of data mining.

➨Data mining helps financial institutions and banks identify probable defaulters, and so helps them decide whether or not to issue credit cards, loans, etc. This is done based on past transactions, user behavior and data patterns.

➨It helps advertisers push the right advertisements to internet users on web pages, based on machine learning algorithms. In this way data mining benefits both potential buyers and sellers of various products.

➨Retail malls and grocery stores place their best-selling items in the most prominent positions. This has become possible thanks to input obtained from data mining software. In this way data mining helps increase revenue.

➨It helps return the desired results for queries posed to e-commerce websites (e.g. Amazon, Alibaba, Snapdeal, Walmart, Flipkart, Best Buy, eBay) and search engines (e.g. Google, Yahoo, Bing, Ask.com, DuckDuckGo).

➨Data mining based methods are cost-effective and efficient compared to other statistical data applications.

➨It has been used in many different domains, such as bioinformatics, medicine, genetics, education, agriculture, law enforcement, e-marketing and electrical power engineering. For example, in genetics it helps predict the risk of disease based on an individual's DNA sequence.

➨As noted above, it helps law enforcement agencies identify criminal suspects.

 


3. Explain supervised learning and unsupervised learning.

 

Supervised learning:

In supervised learning, you train the machine using data that is well labeled. This means the data is already tagged with the correct answer. It can be compared to learning that takes place in the presence of a supervisor or a teacher.

A supervised learning algorithm learns from labelled training data and helps you predict outcomes for unseen data.
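As a rough illustration of learning from labelled data, here is a minimal 1-nearest-neighbour sketch in plain Python. The training values, the feature meanings and the function names are assumptions made up for this example; it is one simple supervised method among many, not a recommended one.

```python
# Labelled training data: (hours_studied, hours_slept) -> outcome. Values are made up.
training = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 6.0), "pass"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(point):
    """Return the label of the closest training example (1-nearest-neighbour)."""
    _, label = min(training, key=lambda ex: squared_distance(ex[0], point))
    return label

print(predict((7.0, 7.0)))  # closest labelled example is (6.0, 7.0), a "pass"
print(predict((1.5, 4.5)))  # closest labelled examples are both "fail"
```

The labels play the role of the "teacher": every prediction for a new point is derived from examples whose correct answer was already known.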

Successfully building, scaling, and deploying an accurate supervised machine learning model takes time and technical expertise from a team of highly skilled data scientists. Moreover, data scientists must rebuild models when the underlying data changes, so that the insights they provide remain valid.

 

Unsupervised learning:

Unsupervised learning is a machine learning technique in which you do not need to supervise the model. Instead, you allow the model to work on its own to discover information. It mainly deals with unlabelled data.

Unsupervised learning algorithms allow you to perform more complex processing tasks compared to supervised learning. However, unsupervised learning can be more unpredictable than other learning methods, such as deep learning and reinforcement learning.
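To make the idea of a model discovering structure in unlabelled data concrete, here is a small sketch of k-means clustering on one-dimensional values. The choice of k-means, the data and the starting centers are all assumptions for illustration; the text above does not name a specific algorithm.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to the nearest center, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Unlabelled data (made-up values); the starting centers are an arbitrary choice.
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])

print(centers)
print(clusters)
```

No labels are supplied anywhere: the two groups (values near 1.5 and values near 10) emerge from the data alone. A different choice of starting centers can give different groups, which is one reason unsupervised results can be less predictable.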

 

4. Write the difference between Classification and Prediction.

 

| | Classification | Prediction |
|---|---|---|
| Definition | Classification is the process of identifying which category a new observation belongs to, on the basis of a training data set containing observations whose category membership is known. | Prediction is the process of estimating missing or unavailable numerical data for a new observation. |
| Accuracy | In classification, accuracy depends on finding the class label correctly. | In prediction, accuracy depends on how well a given predictor can estimate the value of the predicted attribute for new data. |
| Model | A model, or classifier, is constructed to find the categorical labels. | A model, or predictor, is constructed that predicts a continuous-valued function or ordered value. |
| Synonyms for the Model | In classification, the model is known as the classifier. | In prediction, the model is known as the predictor. |
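The difference in output type can be shown with a short sketch: a predictor that returns a continuous number and a classifier that returns a category label, both built from the same made-up data. The house sizes, prices, labels and function names are assumptions for illustration; least-squares line fitting and nearest-centroid classification are just two simple stand-ins for a predictor and a classifier.

```python
# Made-up data: house size in square metres, sale price in thousands, and a size label.
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]
labels = ["small", "small", "large", "large"]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def fit_centroids(xs, ys):
    """Mean feature value per class label."""
    sums = {}
    for x, label in zip(xs, ys):
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}

slope, intercept = fit_line(sizes, prices)
centroids = fit_centroids(sizes, labels)

def predict_price(size):
    """Predictor: returns a continuous value."""
    return slope * size + intercept

def classify_size(size):
    """Classifier: returns the label whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(size - centroids[label]))

print(predict_price(100.0))  # a number on a continuous scale
print(classify_size(85.0))   # one of a fixed set of labels
```

The predictor can return any value on a continuous scale, while the classifier can only ever return one of the labels seen in training.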

 

5. Write a note on Association rule mining.

As the name suggests, association rules are simple if/then statements that help discover relationships between seemingly independent data in relational databases or other data repositories. Most machine learning algorithms work with numeric datasets and hence tend to be mathematical.

However, association rule mining is suitable for non-numeric, categorical data and requires just a little more than simple counting. Association rule mining is a procedure that aims to find frequently occurring patterns, correlations, or associations in datasets held in various kinds of databases, such as relational databases, transactional databases, and other forms of repositories.

An association rule has 2 parts:

  1. an antecedent (if) and
  2. a consequent (then)

An antecedent is something that’s found in data, and a consequent is an item that is found in combination with the antecedent. 

Take this rule, for instance: “If a customer buys bread, they are 70% likely to buy milk.” In this association rule, bread is the antecedent and milk is the consequent. Simply put, it can be understood as a rule a retail store uses to target its customers better. If the rule is the result of a thorough analysis of some data sets, it can be used not only to improve customer service but also to increase the company’s revenue.

Association rules are created by thoroughly analyzing data and looking for frequent if/then patterns. The important relationships are then identified using the following two parameters:

  1. Support: indicates how frequently the if/then relationship appears in the database.
  2. Confidence: indicates how often the relationship has been found to be true, that is, how often the consequent appears in transactions that contain the antecedent.
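The two measures above can be computed with simple counting, as in the following minimal Python sketch. The transactions and the function names are made up for illustration.

```python
# Made-up transactions for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

print(support({"bread", "milk"}))       # 3 of 5 transactions -> 0.6
print(confidence({"bread"}, {"milk"}))  # 3 of the 4 bread transactions -> 0.75
```

For the rule "if bread then milk", a support of 0.6 says the pair appears in 60% of all transactions, and a confidence of 0.75 says that 75% of the transactions containing bread also contain milk.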

So, in a given transaction with multiple items, Association Rule Mining primarily tries to find the rules that govern how or why such products/items are often bought together. For example, peanut butter and jelly are frequently purchased together because a lot of people like to make PB&J sandwiches. 

Association Rule Mining is sometimes referred to as “Market Basket Analysis”, as that was the first application area of association mining. The aim is to discover associations of items occurring together more often than you’d expect from randomly sampling all the possibilities. The classic “beer and diapers” story is often used to illustrate this.
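One common way to quantify "more often than you'd expect" is a measure usually called lift, which is not named in the text above and is introduced here as an assumption. It divides how often two items actually occur together by how often they would occur together if they were independent. The transactions below are made up.

```python
# Made-up transactions for illustration.
transactions = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"bread"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def lift(antecedent, consequent):
    """Observed co-occurrence rate divided by the rate expected under independence."""
    n = len(transactions)
    together = count(antecedent | consequent) / n
    expected = (count(antecedent) / n) * (count(consequent) / n)
    return together / expected

print(lift({"beer"}, {"diapers"}))
print(lift({"beer"}, {"bread"}))
```

A lift above 1 means the items appear together more often than chance would suggest, a lift near 1 suggests no association, and a lift below 1 means they appear together less often than expected.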
