7 Questions that may be asked in a Data Science Interview

1. Why should one learn Data Science and Machine Learning?

Data Science is one of the hottest trends in the industry right now, and Data Scientist is the buzzword of the 21st century for good reason! The tech revolution is just starting and Data Science is at the forefront. As a result, “Data Scientist has become the top job in the US for the last 4 years running!” according to Harvard Business Review & Glassdoor. We can see applications of Machine Learning everywhere in our daily routine. Ex:- YouTube recommendations, Amazon recommendations, Google Translate, Google Maps, Siri, Alexa, weather forecasting and many more. Even predicting the future behavior of a friend based on your past experiences with them follows the same idea. These are all applications of Artificial Intelligence and Data Science.

2. What is the difference between Data Science, Deep Learning and Artificial Intelligence?

As the Venn diagram given below shows, all these terms are interrelated. Machine Learning and Deep Learning are subsets of Artificial Intelligence, and Data Science works with all of them with the help of Statistics and Probability, Linear Algebra and Differential Calculus.

In simple words, Data Science means making sense of data and extracting more and more insights from it.

Deep Learning is a subset of Machine Learning that uses multi-layer neural networks, loosely inspired by the human brain, to learn complex patterns from data.

Artificial Intelligence is the broad area of techniques that enable machines to mimic human behavior and, in a loose sense, to think.

3. What are the applications of Data Science and Machine Learning in Business?

As I said earlier, Data Science and Machine Learning are everywhere nowadays, so they have a lot of applications in business.

· Many companies use these techniques to determine their best (most profitable) customers.

· Most retailers of fast-moving consumer goods (FMCG) rely on EDA (exploratory data analysis) and other statistical analyses.

· They are used for seasonal trend analysis and forecasting.

· Recommendation systems are useful for e-commerce sites with a large variety of items.

· Modern businesses need to be smart in detecting fraud that could potentially hurt their business.

· Natural Language Processing (NLP) has numerous applications in modern businesses, such as summarizing survey data, summarizing reviews, chatbots, social media sentiment analysis and many more.

4. Which programming language is better, R or Python, and why?

In my opinion, Python is the best programming language for Machine Learning and Deep Learning, and I think it is also a perfect fit for statistical programming.

5. Which is your favorite Data Science model and why?

I don’t have a favorite Data Science model as such; a model is only as good as the accuracy of its results. In my opinion, the choice of algorithm depends entirely on the data. But the techniques that I use a lot, and that have helped me in many competitions such as Kaggle competitions and Data Hack summit, are Ensemble methods and the Gradient Boosting algorithm.

Ensemble learning improves machine learning results by combining several models. The combined model usually gives better predictive performance than any single model would. Bagging and Boosting are two commonly used Ensemble techniques.

The most famous example of bagging is the Random Forest algorithm, which is simply bagging applied to decision trees. When you open your phone’s camera app and see it drawing boxes around people’s faces, it’s probably the result of a Random Forest at work.

In some tasks, especially in real-time processing, the ability of the Random Forest to run in parallel is more important than a small loss in accuracy compared to boosting, for example. There is always a tradeoff.
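The bagging idea can be sketched in plain Python. This is only an illustration under simplifying assumptions: the data is a made-up one-dimensional toy set, each model is a one-split "stump" rather than a full decision tree, and the names fit_stump and bagged_predict are hypothetical helpers, not part of any library. A real Random Forest also samples features and grows deeper trees.

```python
import random

def fit_stump(points):
    """Pick the threshold with the fewest errors for the rule: predict 1 when x > threshold."""
    best_threshold, fewest_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(label) for x, label in points)
        if fewest_errors is None or errors < fewest_errors:
            best_threshold, fewest_errors = threshold, errors
    return best_threshold

def bagged_predict(thresholds, x):
    """Majority vote over all stumps."""
    votes = sum(x > t for t in thresholds)
    return int(2 * votes > len(thresholds))

rng = random.Random(0)                        # fixed seed so runs are repeatable
data = [(x, int(x >= 5)) for x in range(10)]  # toy data: label is 1 when x >= 5

thresholds = []
for _ in range(25):
    # Bootstrap sample: draw with replacement, same size as the data.
    sample = [rng.choice(data) for _ in data]
    thresholds.append(fit_stump(sample))      # stumps are independent of each other

predictions = [bagged_predict(thresholds, x) for x, _ in data]
print(predictions)
```

Because each stump is trained on its own bootstrap sample and never looks at the others, the loop body could run in parallel, which is the property the paragraph above refers to.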

Boosting algorithms are trained one by one, sequentially, with each subsequent one paying most of its attention to the data points that were mispredicted by the previous one. Repeat until you are happy.

As in bagging, we use subsets of our data, but this time they are not randomly generated. In each subsample we include part of the data the previous algorithm failed to process, so each new algorithm learns to fix the errors of the previous one.

Boosting cannot be parallelized the way bagging can, but it is still faster than neural networks. It’s like a race between a dump truck and a racecar: the truck can do more, but if you want to go fast, take the car.

If you want a real example of boosting — open Facebook or Google and start typing in a search query. Can you hear an army of trees roaring and smashing together to sort results by relevance? That’s because they are using boosting.

Nowadays there are three popular tools for boosting: CatBoost, LightGBM and XGBoost.
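The sequential, error-correcting idea behind boosting can be sketched in plain Python. Note the assumptions: this sketch fits each new model to the residuals left by the previous ones (the gradient boosting flavor, with squared error) instead of building subsamples of mispredicted points, the data is a made-up toy set, and fit_stump and predict are hypothetical helper names. Libraries such as XGBoost are far more sophisticated.

```python
def fit_stump(xs, residuals):
    """Find the one-split model that best fits the residuals (least squares)."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right) if right else 0.0
        error = sum((r - (left_mean if x <= split else right_mean)) ** 2
                    for x, r in zip(xs, residuals))
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    return best[1:]

def predict(stumps, x, learning_rate):
    """The ensemble prediction is the scaled sum of all stump outputs."""
    return sum(learning_rate * (left if x <= split else right)
               for split, left, right in stumps)

xs = [0, 1, 2, 3, 4, 5, 6, 7]                   # toy inputs
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]   # toy targets
learning_rate = 0.5

stumps, errors = [], []
for _ in range(20):
    # Each new stump is trained on what the current ensemble still gets wrong.
    residuals = [y - predict(stumps, x, learning_rate) for x, y in zip(xs, ys)]
    errors.append(sum(r * r for r in residuals))
    stumps.append(fit_stump(xs, residuals))

print(errors[0], errors[-1])
```

Each round depends on the residuals of the rounds before it, so unlike bagging the training loop cannot simply be split across workers.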

6. What are the data structures in R and Python and their usage?

Data structures are basically structures that can hold some data together. In other words, they are used to store a collection of related data. There are four built-in data structures in Python — list, tuple, dictionary, and set.

List- List is an ordered data structure with elements separated by a comma and enclosed within square brackets. Once you have created a list, you can add, remove or search for items on the list. Since we can add and remove items, we say that a list is a mutable data type i.e. this type can be altered.

Ex: a list of items I want to buy from the supermarket.

Items = ['vegetables', 'fruits', 'rice', 'spices', 'sugar', 'salt']
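A short sketch of the add, remove and search operations mentioned above, using the same shopping items. The variable name shopping and the extra item 'tea' are only illustrative.

```python
shopping = ['vegetables', 'fruits', 'rice', 'spices', 'sugar', 'salt']

shopping.append('tea')               # add an item to the end
shopping.remove('sugar')             # remove the first matching item
has_rice = 'rice' in shopping        # search: is the item present?
rice_position = shopping.index('rice')  # search: where is the item?

print(shopping)
print(has_rice, rice_position)
```

The list object itself changes after append and remove, which is what "mutable" means here.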

Tuple- Tuples are used to hold together multiple objects. Think of them as similar to lists, but without the extensive functionality that the list class gives you. One major feature of tuples is that they are immutable i.e. you cannot modify tuples. Tuples are defined by specifying items separated by commas within an optional pair of parentheses.
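A small sketch of tuple creation and immutability; the names point and same_point are made up for illustration.

```python
point = (3, 4)        # parentheses are the usual style
same_point = 3, 4     # but they are optional

x, y = point          # unpacking reads the values back out

try:
    point[0] = 10     # tuples are immutable, so this raises TypeError
except TypeError as err:
    error_message = str(err)

print(point == same_point, x, y)
print(error_message)
```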

Dictionary- A dictionary is like an address book where you can find the address or contact details of a person by knowing only his/her name i.e. we associate keys (name) with values (details). Pairs of keys and values are specified in a dictionary by using the notation:

d = {key1: value1, key2: value2}
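The address book analogy might look like this in practice. The names and e-mail addresses are invented placeholders.

```python
# Hypothetical address book: names are keys, contact details are values.
address_book = {
    'Asha': 'asha@example.com',
    'Ravi': 'ravi@example.com',
}

address_book['Meera'] = 'meera@example.com'       # add a new key-value pair
ravi_contact = address_book['Ravi']               # look up a value by its key
missing = address_book.get('Zoya', 'not found')   # safe lookup with a default
del address_book['Asha']                          # remove an entry

print(ravi_contact, missing)
print(sorted(address_book))
```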

Set- Sets are unordered collections of simple objects. These are used when the existence of an object in a collection is more important than the order or how many times it occurs. Using sets, you can test for membership, check whether one set is a subset of another, find the intersection between two sets, and so on.
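The set operations listed above, sketched with made-up example values:

```python
fruits = {'apple', 'banana', 'apple', 'cherry'}   # the duplicate 'apple' is kept once
tropical = {'banana', 'mango'}

has_apple = 'apple' in fruits              # membership test
common = fruits & tropical                 # intersection of two sets
is_subset = {'apple', 'cherry'} <= fruits  # subset test

print(len(fruits), has_apple, common, is_subset)
```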

7. What are the methods in Python?

Functions defined in a class are called methods. The syntax of a method is similar to that of a function, but its parameter list needs one extra parameter. The first parameter of any method refers to the object itself, so by convention we use the word ‘self’ for it. ‘self’ helps to differentiate between local and instance variables. A class can contain one special method, the initialization method (__init__), which initializes the object. There is no need to call this method explicitly: it is called automatically whenever an object is created. Ex:-

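class Person:
    def __init__(self, name):
        # Store the name on the instance so other methods can use it
        self.name = name

    def say_hi(self):
        print("Hello, my name is", self.name)

# __init__ runs automatically when the object is created
p = Person('Shivam')
p.say_hi()

Output: Hello, my name is Shivam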
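To make the difference between local and instance variables concrete, here is a small sketch. The class Tally and its method increment are hypothetical names chosen for this illustration.

```python
class Tally:
    def __init__(self, start):
        self.count = start       # instance variable: stored on the object

    def increment(self, step):
        count = step * 2         # local variable: gone when the method returns
        self.count += step       # instance variable: persists between calls
        return count

t = Tally(10)
local_value = t.increment(3)

print(local_value, t.count)
```

Without the ‘self.’ prefix, count inside increment is just a local name and never touches the value stored on the object.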