14/10/2025
✅ *Data Science Tools & Languages: Interview Q&A Guide* 🧠🧰
🔹 *1. Python*
*Q:* *Why is Python preferred in Data Science?*
*A:* Python is easy to learn, has vast libraries (NumPy, Pandas, Scikit-learn), supports visualization (Matplotlib, Seaborn), and is widely used in ML and AI.
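*Q:* *What's the difference between a list and a NumPy array?*
*A:* A list can hold mixed data types, but numerical work on it needs Python-level loops, which are slow. A NumPy array stores elements of a single dtype in a contiguous block, so it is faster and supports element-wise (vectorized) operations and broadcasting.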
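A minimal sketch of that difference, assuming NumPy is installed; the `prices` data is made up for illustration:

```python
import numpy as np

prices = [10.0, 20.0, 30.0]        # plain Python list
prices_arr = np.array(prices)      # NumPy array with a single dtype (float64)

# Multiplying a list repeats it; multiplying an array scales every element.
repeated = prices * 2
doubled = prices_arr * 2

# A list needs an explicit loop or comprehension for element-wise math.
doubled_list = [p * 2 for p in prices]

print(repeated)      # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
print(doubled)       # [20. 40. 60.]
print(doubled_list)  # [20.0, 40.0, 60.0]
```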
🔹 *2. Pandas*
*Q:* *How do you handle missing values in Pandas?*
*A:* Detect them with `df.isnull()`, then either drop the affected rows with `df.dropna()` or fill them with `df.fillna(value)`, depending on context.
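As a rough illustration of those three calls, here is a small sketch. The `region` and `sales` columns are hypothetical, and filling with the median is just one possible choice:

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value per column.
df = pd.DataFrame({
    "region": ["N", "S", None, "E"],
    "sales": [100.0, np.nan, 250.0, 50.0],
})

# 1) Detect: count missing values per column.
missing_per_column = df.isnull().sum()

# 2) Drop: remove every row that has at least one missing value.
dropped = df.dropna()

# 3) Fill: use a placeholder for text and the median for numbers.
filled = df.fillna({"region": "unknown", "sales": df["sales"].median()})
```

Dropping is simplest but loses data; filling keeps the rows at the cost of introducing assumed values.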
*Q:* *How do you filter rows based on a condition?*
*A:* `df[df['column'] > 50]` filters rows where values in `'column'` are greater than 50.
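A short sketch of boolean filtering on a made-up DataFrame (the `group` column is an assumption added to show combined conditions):

```python
import pandas as pd

df = pd.DataFrame({
    "column": [10, 60, 50, 75],
    "group": ["a", "a", "a", "b"],
})

# Single condition: the comparison builds a boolean mask, which selects rows.
above_50 = df[df["column"] > 50]

# Multiple conditions: combine masks with & (and) or | (or), each in parentheses.
above_50_in_b = df[(df["column"] > 50) & (df["group"] == "b")]

# The same single-condition filter written with DataFrame.query.
same_with_query = df.query("column > 50")
```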
🔹 *3. NumPy*
*Q:* *What is broadcasting in NumPy?*
*A:* It lets NumPy apply element-wise operations to arrays of different but compatible shapes by virtually stretching the smaller one (e.g., adding a scalar or a row vector to a matrix).
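To make broadcasting concrete, here is a small sketch with illustrative values, showing a scalar, a row, and a column each combined with a 2x3 matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)

# Scalar: treated as if it had the same shape as the matrix.
plus_scalar = matrix + 10

# Row of shape (3,): reused for every row of the matrix.
row = np.array([100, 200, 300])
plus_row = matrix + row

# Column of shape (2, 1): reused for every column of the matrix.
col = np.array([[1], [2]])
plus_col = matrix + col

print(plus_scalar.tolist())  # [[11, 12, 13], [14, 15, 16]]
print(plus_row.tolist())     # [[101, 202, 303], [104, 205, 306]]
print(plus_col.tolist())     # [[2, 3, 4], [6, 7, 8]]
```

Shapes are compared from the last dimension backwards; each pair must be equal or one of them must be 1. A shape `(2,)` array cannot be added to this matrix and raises `ValueError`.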
*Q:* *Difference between `ndarray` and `array()`?*
*A:* `ndarray` is NumPy's main array class; `np.array()` is a function that creates an `ndarray` from existing data such as a list.
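A quick check of that relationship, as a sketch:

```python
import numpy as np

# np.array() is the usual way to build an array from existing data...
a = np.array([1, 2, 3])

# ...and what it returns is an instance of the ndarray class.
is_ndarray = isinstance(a, np.ndarray)

# Other constructors such as np.zeros() return the same class.
z = np.zeros(3)
same_class = type(a) is type(z)

print(type(a).__name__, is_ndarray, same_class)  # ndarray True True
```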
🔹 *4. Scikit-learn*
*Q:* *How do you handle model overfitting?*
*A:* Using techniques like cross-validation, regularization (L1, L2), pruning (for trees), or simplifying the model.
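One way to show two of these techniques together is sketched below, assuming scikit-learn is installed. The dataset is synthetic, and limiting `max_depth` stands in here as a simple form of pre-pruning; exact scores will vary by library version, so none are quoted:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

deep_tree = DecisionTreeClassifier(random_state=0)                   # grows until pure leaves
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0)   # depth-limited

# Accuracy on the training data itself hides overfitting...
deep_train_acc = deep_tree.fit(X, y).score(X, y)

# ...while 5-fold cross-validation scores each model on held-out folds.
deep_cv = cross_val_score(deep_tree, X, y, cv=5)
shallow_cv = cross_val_score(shallow_tree, X, y, cv=5)

print(f"deep tree    train={deep_train_acc:.2f}  cv={deep_cv.mean():.2f}")
print(f"shallow tree cv={shallow_cv.mean():.2f}")
```

A large gap between training accuracy and cross-validated accuracy is the usual sign of overfitting.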
*Q:* *How do you evaluate a classification model?*
*A:* With accuracy, precision, recall, F1-score, and confusion matrix.
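These metrics can be computed with `sklearn.metrics`; the labels below are invented for the example:

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up ground truth and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)     # share of correct predictions
precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                 # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
print(cm.tolist())                      # [[3, 1], [1, 3]]
```

On imbalanced data, accuracy alone can mislead, which is why precision, recall, and F1 are reported alongside it.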
🔹 *5. SQL*
*Q:* *What's the difference between WHERE and HAVING?*
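*A:* `WHERE` filters individual rows before grouping and cannot use aggregate functions; `HAVING` filters groups after `GROUP BY` and can use aggregates such as `COUNT()` or `AVG()`.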
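To try both clauses without a database server, here is a sketch using Python's built-in `sqlite3` module. The `employees` table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "eng", 120), ("Ben", "eng", 90), ("Cy", "ops", 60),
     ("Di", "ops", 50), ("Eve", "hr", 70)],
)

# WHERE removes rows first (salary <= 55 is dropped), then groups are counted.
where_rows = conn.execute(
    "SELECT dept, COUNT(*) FROM employees "
    "WHERE salary > 55 GROUP BY dept ORDER BY dept"
).fetchall()

# HAVING runs after grouping, so it can test an aggregate such as AVG().
having_rows = conn.execute(
    "SELECT dept, AVG(salary) FROM employees "
    "GROUP BY dept HAVING AVG(salary) > 65 ORDER BY dept"
).fetchall()

print(where_rows)   # [('eng', 2), ('hr', 1), ('ops', 1)]
print(having_rows)  # [('eng', 105.0), ('hr', 70.0)]
conn.close()
```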
*Q:* *Write a query to find the second highest salary.*
*A:*
```sql
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
```
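The query above can be checked with Python's built-in `sqlite3` module. This is only a sketch; the salary values are invented, and the helper `second_highest` is a hypothetical name:

```python
import sqlite3

QUERY = """
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
"""

def second_highest(salaries):
    """Load the given salaries into a throwaway table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?)", [(s,) for s in salaries])
    result = conn.execute(QUERY).fetchone()[0]
    conn.close()
    return result

# Ties for the top salary are handled: both 300s are excluded by the subquery.
print(second_highest([100, 300, 300, 200]))  # 200

# With only one distinct salary there is no second highest, so the result is NULL.
print(second_highest([500, 500]))            # None
```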
🔹 *6. Jupyter Notebook*
*Q:* *Why is Jupyter used in Data Science?*
*A:* It's interactive, supports visualizations inline, and is ideal for prototyping, documentation, and sharing results.
🔹 *7. Git & GitHub*
*Q:* *How do you revert a commit in Git?*
*A:* Use `git revert <commit-hash>` to undo a commit's changes by creating a new commit, which keeps the existing history intact.
*Q:* *Difference between Git and GitHub?*
*A:* Git is a version control tool; GitHub is a cloud-based hosting platform for Git repositories.
🔹 *8. Cloud Platforms (AWS, GCP, Azure)*
*Q:* *Which AWS service is commonly used for ML?*
*A:* Amazon SageMaker, which is used for building, training, and deploying ML models.
*Q:* *Why use cloud in Data Science?*
*A:* For scalable storage, high computing power, collaboration, and cost-effective data processing.
💡 *Pro Tip:* Tailor your answers with real-world experience if possible (e.g., "I used Pandas for cleaning 100k+ rows of raw sales data…").