1. Write the differences between data mining and query tools.
Query tools can be used to easily build and input queries to
databases. Query tools make it very easy to build queries without even having
to learn a database-specific query language.
On the other hand, data mining is a technique or a concept
in computer science that deals with extracting useful and previously unknown
information from raw data. Most of the time, this raw data is stored in very
large databases.
Data miners can therefore use the existing functionality
of query tools to preprocess raw data before the data mining process. However,
the main difference between data mining techniques and query tools is
that query tools require users to know exactly what they are
looking for, while data mining is mostly used when the user has only a vague idea
of what they are looking for.
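The contrast can be sketched in a few lines of Python. This is only an illustration: the `purchases` table and its rows are hypothetical, and counting item pairs stands in for a real mining algorithm. The first step asks a precise question through a query. The second step uses a query only to fetch the data and then searches for a pattern that nobody named in advance.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: (basket_id, item).
rows = [
    (1, "bread"), (1, "milk"), (1, "eggs"),
    (2, "bread"), (2, "milk"),
    (3, "milk"), (3, "eggs"),
    (4, "bread"), (4, "milk"), (4, "jam"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

# Query-tool style: the user already knows the exact question.
milk_baskets = conn.execute(
    "SELECT COUNT(DISTINCT basket_id) FROM purchases WHERE item = ?", ("milk",)
).fetchone()[0]

# Mining style: a query only preprocesses the raw rows into baskets...
baskets = {}
for basket_id, item in conn.execute("SELECT basket_id, item FROM purchases"):
    baskets.setdefault(basket_id, set()).add(item)

# ...and then every item pair is counted to see which one stands out.
pair_counts = Counter()
for items in baskets.values():
    pair_counts.update(combinations(sorted(items), 2))

top_pair, top_count = pair_counts.most_common(1)[0]
print(milk_baskets, top_pair, top_count)
```

The query answers only what was asked (how many baskets contain milk), while the pair count surfaces a relationship that was not specified beforehand.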
2. Write the benefits of data mining.
➨Data mining helps financial
institutions and banks identify probable defaulters, which helps them decide
whether or not to issue a credit card, loan etc. This is done based on past
transactions, user behavior and data patterns.
➨It helps advertisers push the right
advertisements to internet users on web pages, based on machine learning
algorithms. This way data mining benefits both prospective buyers and
sellers of the various products.
➨Retail malls and grocery
stores arrange and keep their best-selling items in the most prominent positions. This
has become possible due to inputs obtained from data mining software. This way
data mining helps in increasing revenue.
➨It helps in returning the desired
search results for queries posed to e-commerce websites (e.g. Amazon, Alibaba,
Snapdeal, Walmart, Flipkart, Best Buy, eBay etc.) and search engines (Google,
Yahoo, Bing, Ask.com, DuckDuckGo etc.).
➨Data mining based methods
are cost-effective and efficient compared to other statistical data
applications.
➨It has been used in many
different areas or domains, viz. bioinformatics, medicine, genetics, education,
agriculture, law enforcement, e-marketing, electrical power engineering etc.
For example, in genetics it helps in predicting the risk of diseases based on the DNA
sequence of individuals.
➨It helps law enforcement agencies
identify criminal suspects, as mentioned above.
3. Explain supervised learning and unsupervised learning.
Supervised learning:
In supervised learning, you train the machine using data
that is well "labeled." This means the data is already tagged with
the correct answer. It can be compared to learning that takes place in the
presence of a supervisor or a teacher.
A supervised learning algorithm learns from labeled
training data and helps you predict outcomes for unseen data.
Successfully building, scaling, and deploying an accurate
supervised machine learning model takes time and technical
expertise from a team of highly skilled data scientists. Moreover, data
scientists must rebuild models when the underlying data changes, to make sure
the insights given remain true.
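As a minimal sketch of learning from labeled data, the example below uses a 1-nearest-neighbour rule. The algorithm choice, the feature names and the data are all assumptions made for illustration; the notes do not prescribe any particular method.

```python
# Hypothetical labeled training data: (hours_studied, hours_slept) -> outcome.
training = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]

def predict(point):
    """Return the label of the closest training example (1-nearest neighbour)."""
    def squared_distance(example):
        features, _label = example
        return sum((a - b) ** 2 for a, b in zip(features, point))

    _features, label = min(training, key=squared_distance)
    return label

# Predict outcomes for observations that were not in the training data.
print(predict((7.0, 6.0)))
print(predict((1.0, 4.5)))
```

The labels in `training` play the role of the "teacher": the prediction for a new point is borrowed from the known answer of its nearest labeled neighbour.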
Unsupervised learning:
Unsupervised learning is a machine learning technique in which
you do not need to supervise the model. Instead, you allow the model to
work on its own to discover information. It mainly deals with unlabeled
data.
Unsupervised learning algorithms allow you to perform more
complex processing tasks compared to supervised learning. However,
unsupervised learning can be more unpredictable than other
learning methods, such as deep learning and reinforcement learning.
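A small clustering sketch shows what "discovering information" from unlabeled data can look like. It assumes plain k-means on one-dimensional values with hand-picked starting centroids; the function name `kmeans_1d` and the spending figures are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids (plain k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: monthly spend per customer.
spend = [12, 15, 14, 13, 95, 102, 98, 100]
centroids, clusters = kmeans_1d(spend, centroids=[min(spend), max(spend)])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups (low and high spenders) emerge from the values alone.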
4. Write the difference between Classification and Prediction.
| | Classification | Prediction |
| Definition | Classification is the process of identifying the category to which a new observation belongs, on the basis of a training data set containing observations whose category membership is known. | Prediction is the process of identifying the missing or unavailable numerical data for a new observation. |
| Accuracy | In classification, the accuracy depends on finding the class label correctly. | In prediction, the accuracy depends on how well a given predictor can guess the value of a predicted attribute for new data. |
| Model | A model or classifier is constructed to find the categorical labels. | A model or predictor is constructed that predicts a continuous-valued function or ordered value. |
| Synonyms for the Model | In classification, the model can be known as the classifier. | In prediction, the model can be known as the predictor. |
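The difference in output type can be illustrated with two tiny models fitted on the same input. This is a sketch under assumptions: a least-squares line stands in for the predictor, a nearest-mean rule stands in for the classifier, and the experience, salary and seniority data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope

def fit_class_means(xs, labels):
    """Nearest-mean classifier: store the mean x of each class."""
    grouped = {}
    for x, label in zip(xs, labels):
        grouped.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in grouped.items()}

def classify(class_means, x):
    """Return the label whose class mean is closest to x."""
    return min(class_means, key=lambda label: abs(x - class_means[label]))

# Hypothetical data: years of experience, salary (thousands), seniority label.
years = [1, 2, 3, 4, 5]
salary = [30.0, 35.0, 40.0, 45.0, 50.0]
level = ["junior", "junior", "junior", "senior", "senior"]

# Predictor: returns a continuous value.
intercept, slope = fit_line(years, salary)
predicted_salary = intercept + slope * 6

# Classifier: returns a categorical label.
class_means = fit_class_means(years, level)
predicted_level = classify(class_means, 6)

print(predicted_salary, predicted_level)
```

For the same new observation (6 years of experience), the predictor yields a number and the classifier yields a category.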
5. Write a note on Association rule mining.
As the name suggests,
association rules are simple if/then statements that help
discover relationships between seemingly independent data in relational databases or
other data repositories. Most machine learning algorithms work with numeric
datasets and hence tend to be mathematical.
However, association rule
mining is suitable for non-numeric, categorical data and requires just a little
bit more than simple counting. Association rule
mining is a procedure that aims to observe frequently occurring patterns,
correlations, or associations in datasets found in various kinds of databases,
such as relational databases, transactional databases, and other forms of
repositories.
An association rule has 2 parts:
- an antecedent (if) and
- a consequent (then)
An antecedent is something that’s found in data, and a consequent is an item that is found in combination with the antecedent.
Have a look at this rule for instance: “If a customer buys
bread, they are 70% likely to buy milk.” In the above association rule, bread is
the antecedent and milk is the consequent. Simply put, it can be understood as
a retail store’s association rule to target its customers better. If the
above rule is the result of a thorough analysis of some data sets, it can be used
not only to improve customer service but also to improve the company’s revenue.
Association rules are
created by thoroughly analyzing data and looking for frequent if/then patterns.
Then, depending on the following two parameters, the important relationships
are observed:
- Support: indicates how frequently the if/then relationship appears in the
database, i.e. the fraction of transactions that contain both the antecedent
and the consequent.
- Confidence: indicates how often the relationship has been found to be true,
i.e. the fraction of transactions containing the antecedent that also contain
the consequent.
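Both parameters come down to counting, as the short sketch below shows. It uses the standard definitions (support as the fraction of transactions containing an itemset, confidence as the support of the whole rule divided by the support of the antecedent); the transactions and function names are hypothetical.

```python
# Hypothetical transactions, each one a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions with the antecedent, the share that also has the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

# Rule "if bread then milk":
print(support({"bread", "milk"}))       # 3 of the 5 transactions contain both
print(confidence({"bread"}, {"milk"}))  # 3 of the 4 bread transactions contain milk
```

A rule is usually kept only if both values clear thresholds chosen by the analyst; the thresholds themselves are not specified in these notes.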
So, in a given transaction with multiple items, Association Rule Mining primarily tries to find the rules that govern how or why such products/items are often bought together. For example, peanut butter and jelly are frequently purchased together because a lot of people like to make PB&J sandwiches.
Association rule mining is
sometimes referred to as “Market Basket Analysis”, as this was the first
application area of association mining. The aim is to discover associations of
items occurring together more often than you’d expect from randomly sampling
all the possibilities. The classic story of beer and diapers will help in
understanding this better.
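"More often than you'd expect from random sampling" can be made concrete with a measure commonly called lift. The name and the formula are not given in these notes and are brought in here as an assumption. Lift divides the observed rate at which two items occur together by the rate expected if they were independent, so a value above 1 means they co-occur more often than chance. The baskets below are hypothetical.

```python
def lift(transactions, a, b):
    """Observed co-occurrence rate of a and b divided by the rate expected under independence."""
    n = len(transactions)
    count_a = sum(a in t for t in transactions)
    count_b = sum(b in t for t in transactions)
    count_both = sum(a in t and b in t for t in transactions)
    return (count_both / n) / ((count_a / n) * (count_b / n))

# Hypothetical baskets, loosely themed on the beer-and-diapers example.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"milk", "chips"},
    {"bread"},
    {"milk"},
    {"beer"},
]

beer_diapers_lift = lift(baskets, "beer", "diapers")
beer_milk_lift = lift(baskets, "beer", "milk")
print(beer_diapers_lift, beer_milk_lift)
```

In this made-up data, beer and diapers appear together more often than independent purchases would suggest, while beer and milk never appear together at all.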