Python Pandas is a Python package providing fast, flexible and expressive data structures designed for data manipulation. Pandas was developed by Wes McKinney in 2008 and is widely used for data analysis in Python.
Features of Pandas:-
Some important features of Python Pandas are as follows:-
(i) Handling of Data:- The Pandas library provides a fast and efficient way to manage and explore data. It provides two data structures, Series and DataFrame, which help us not only to represent data efficiently but also to manipulate it in various ways.
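As a small illustration of the two structures, the sketch below builds a Series and a DataFrame from made-up student marks (the names and numbers are hypothetical) and then derives a new column from an existing one.

```python
import pandas as pd

# A Series: a one-dimensional labelled array (hypothetical marks)
marks = pd.Series([85, 92, 78], index=["Amit", "Riya", "Sam"])

# A DataFrame: a two-dimensional table whose columns are Series
df = pd.DataFrame({
    "Name": ["Amit", "Riya", "Sam"],
    "Marks": [85, 92, 78],
})

# Manipulating data: add a column computed from an existing column
df["Passed"] = df["Marks"] >= 80

print(marks)
print(df)
```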
(ii) Input and Output tools:- Pandas provides a wide array of built-in input and output tools for reading and writing data in formats such as CSV, Excel, JSON and SQL databases.
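A minimal sketch of the CSV tools, assuming an in-memory text buffer (`io.StringIO`) in place of a real file so that the example runs anywhere; with a real file you would pass a file name such as `"data.csv"` instead. The item data is hypothetical.

```python
import io
import pandas as pd

df = pd.DataFrame({"Item": ["Pen", "Book"], "Qty": [10, 4]})

# Write the DataFrame as CSV text into an in-memory buffer
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the CSV text back into a new DataFrame
df2 = pd.read_csv(io.StringIO(csv_text))
print(df2)
```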
(iii) Visualise:- Visualising the data is an important part of data science, because it makes the results of a study easy for people to understand. Pandas has a built-in ability (using Matplotlib) to plot your data so that you can see the various kinds of graphs formed.
(iv) Grouping:- This feature lets you split data into categories of your choice, according to the criteria you set. The groupby() function splits the data, applies a function to each group and then combines the results.
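The split-apply-combine steps described above can be sketched with `groupby()` on some hypothetical sales figures: rows are split by region, `sum()` is applied to each group, and the per-group totals are combined into one Series.

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "East"],
    "Amount": [100, 200, 150, 50, 300],
})

# Split by Region, apply sum() to each group, combine into one Series
totals = sales.groupby("Region")["Amount"].sum()
print(totals)  # group keys are sorted: East, North, South
```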
(v) Merging and joining of datasets:- While analysing data, we often need to merge and join multiple datasets to create a final dataset that can be analysed properly. Pandas can merge various datasets efficiently, so that combining them does not become a problem during the analysis.
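A short sketch of merging and joining, using two hypothetical tables that share a `RollNo` column. `pd.merge()` joins them on that column (inner and left joins are shown), and `pd.concat()` stacks rows of DataFrames that have the same columns.

```python
import pandas as pd

students = pd.DataFrame({"RollNo": [1, 2, 3], "Name": ["Amit", "Riya", "Sam"]})
marks = pd.DataFrame({"RollNo": [1, 2, 4], "Marks": [85, 92, 70]})

# Inner join: keep only RollNo values present in both tables
inner = pd.merge(students, marks, on="RollNo", how="inner")

# Left join: keep every student; a missing mark becomes NaN
left = pd.merge(students, marks, on="RollNo", how="left")

# Stack the rows of two DataFrames that have the same columns
more = pd.DataFrame({"RollNo": [5], "Name": ["Zoya"]})
all_students = pd.concat([students, more], ignore_index=True)

print(inner)
print(left)
print(all_students)
```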
(vi) Optimised performance:- Pandas is highly optimised for performance, which makes it fast and suitable for working with large amounts of data. Its performance-critical code is written in C or Cython, which makes it responsive and fast.
Installing Pandas:-
(i) Click on the Start button and type command prompt.
(ii) Right-click on Command Prompt and choose Run as administrator.
(iii) Type the command:- pip install pandas
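To check that the installation worked, one can start Python and import the package. The version string printed depends on what pip installed, so none is shown here.

```python
# Confirm that pandas can be imported after installation
import pandas as pd

print(pd.__version__)
```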
Python Pandas Data Structures:- A data structure is a particular way of storing and organising data in a computer so that it can be accessed and worked with in appropriate ways. Pandas provides two data structures for processing data, i.e. Series and DataFrame.
DataFrame:- A DataFrame is a two-dimensional labelled data structure whose columns can hold different types of data, such as int, bool, and so on.
- It can be seen as a dictionary of Series objects in which both the rows and the columns are indexed. The row labels are called "index" and the column labels are called "columns".
- Conceptually it is like a spreadsheet, where each value is identified by the combination of its row index and column name.
- The indexes can be numbers, letters or strings.
- It is value-mutable, i.e. the values stored in it can be changed.
- We can add or delete rows/columns in a DataFrame. So, it is size-mutable.
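The DataFrame properties listed above can be checked with a small sketch on hypothetical marks: each value is reached through a row label and a column name, a value is changed in place, a column and a row are added, and a column is deleted.

```python
import pandas as pd

df = pd.DataFrame(
    {"Eng": [78, 85], "Maths": [90, 67]},
    index=["Amit", "Riya"],
)

# Each column is a Series; each value is found by row label + column name
print(type(df["Maths"]))
print(df.loc["Riya", "Maths"])  # 67

# Value-mutable: change an existing value
df.loc["Riya", "Maths"] = 70

# Size-mutable: add a column, then add a row (values in column order)
df["Science"] = [88, 91]
df.loc["Sam"] = [65, 72, 80]

# Size-mutable: delete a column
df = df.drop(columns="Eng")
print(df)
```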
Creating a DataFrame:-
A DataFrame can be created from:
- dictionaries whose values are lists
- dictionaries whose values are dictionaries
- dictionaries whose values are ndarrays
- dictionaries whose values are Series
- a 2-D dictionary having values as lists
- a 2-D dictionary having values as tuples
- a 2-D dictionary having values as ndarrays
- a 2-D dictionary having values as Series
- a 2-D dictionary having values as dictionaries
- a 2-D array
- another DataFrame object
- a list of dictionaries
- a text/CSV file
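A sketch covering several of the creation methods listed above, on hypothetical data. For the CSV case an in-memory string stands in for a real file; with a file on disk you would pass its path to `pd.read_csv()` instead.

```python
import io
import numpy as np
import pandas as pd

# Dictionary whose values are lists: keys become column names
d1 = pd.DataFrame({"Name": ["Amit", "Riya"], "Marks": [85, 92]})

# Dictionary whose values are dictionaries: inner keys become row labels
d2 = pd.DataFrame({"Eng": {"Amit": 78, "Riya": 85},
                   "Maths": {"Amit": 90, "Riya": 67}})

# Dictionary whose values are Series: the Series index becomes the row index
d3 = pd.DataFrame({"Marks": pd.Series([85, 92], index=["Amit", "Riya"])})

# 2-D array (NumPy ndarray), with column names given explicitly
d4 = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["A", "B"])

# Another DataFrame object
d5 = pd.DataFrame(d4)

# List of dictionaries: each dictionary becomes one row
d6 = pd.DataFrame([{"Name": "Amit", "Marks": 85},
                   {"Name": "Riya", "Marks": 92}])

# Text/CSV file (an in-memory string stands in for the file)
csv_text = "Name,Marks\nAmit,85\nRiya,92\n"
d7 = pd.read_csv(io.StringIO(csv_text))

for frame in (d1, d2, d3, d4, d5, d6, d7):
    print(frame, end="\n\n")
```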
DataFrame Attributes:-
The information related to a DataFrame can be obtained through its attributes:
(i) df.index:- It displays the index (row labels) of the DataFrame.
(ii) df.columns:- It displays the column labels of the DataFrame.
(iii) df.axes:- It returns a list of both axes of the DataFrame (axis 0, the row index, and axis 1, the column labels).
(iv) df.dtypes:- It returns the data type (dtype) of each column in the DataFrame.
(v) df.size:- It returns an int representing the number of elements in the object.
(vi) df.shape:- It returns a tuple (rows, columns) representing the dimensionality of the DataFrame.
(vii) df.ndim:- It returns an int representing the number of axes (dimensions); for a DataFrame this is 2.
(viii) df.empty:- It returns True if the DataFrame is empty (any of its axes has length 0), otherwise False.
(ix) df.T:- It returns the transpose of the DataFrame, i.e. its rows and columns are swapped.
(x) df.values:- It returns the values of the DataFrame as a NumPy array.
(xi) len(df):- It returns the number of rows in the DataFrame.
(xii) Getting the count of non-NaN values in a DataFrame:-
- df.count() or df.count(0):- It counts the number of non-NaN values in each column (counting down the rows).
- df.count(1):- It counts the number of non-NaN values in each row (counting across the columns).
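The attributes above can be tried on a small hypothetical DataFrame that contains a few missing values (`np.nan`), so that the two forms of `count()` give different results.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Eng": [78, np.nan, 65], "Maths": [90, 67, np.nan]},
    index=["Amit", "Riya", "Sam"],
)

print(df.index)          # row labels
print(df.columns)        # column labels
print(df.axes)           # [row index, column index]
print(df.dtypes)         # float64 for both columns (NaN forces float)
print(df.size)           # 6
print(df.shape)          # (3, 2)
print(df.ndim)           # 2
print(df.empty)          # False
print(df.T)              # 2 rows x 3 columns
print(df.values)         # a NumPy ndarray
print(len(df))           # 3
print(df.count())        # per column: Eng 2, Maths 2
print(df.count(axis=1))  # per row: Amit 2, Riya 1, Sam 1
```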
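Deleting Columns and Rows:-
- Using the del statement
- del df['column'] deletes the column from df itself. It does not return or display the DataFrame, so to see the result after deleting the column we have to print df explicitly.
- Using drop()
- drop() returns a new DataFrame without the deleted column, so in an interactive session the result is displayed without printing it explicitly. The original DataFrame is unchanged unless the result is assigned back or inplace=True is passed.
- Multiple columns can be deleted at once by passing a list of column names.
- drop() can also be used to delete rows, by passing row labels (axis=0, the default).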
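A minimal sketch of both deletion styles on hypothetical marks: `del` changes `df` itself, while each `drop()` call returns a new DataFrame and leaves `df` as it was.

```python
import pandas as pd

df = pd.DataFrame(
    {"Eng": [78, 85, 65], "Maths": [90, 67, 72],
     "Science": [88, 91, 80], "Art": [70, 60, 75]},
    index=["Amit", "Riya", "Sam"],
)

# del removes the column from df itself and returns nothing
del df["Art"]
print(df)  # print explicitly to see the result

# drop() returns a new DataFrame; df keeps its Science column
no_sci = df.drop("Science", axis=1)

# Several columns at once
only_eng = df.drop(["Maths", "Science"], axis=1)

# Delete a row by its label (axis=0 is the default)
no_sam = df.drop("Sam")
```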
Renaming Rows and Columns in DataFrame:-
To rename rows (index) or columns in a DataFrame, we use the rename() method.
Ex- df.rename(index={"Eng":"English"},columns={"Name":"Sname"}, inplace=True)
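The `rename()` call above can be run on a small hypothetical DataFrame whose rows are subjects (so that an index label `"Eng"` exists) and which has a `"Name"` column. The second call shows that without `inplace=True` a new DataFrame is returned instead.

```python
import pandas as pd

# Hypothetical data: teacher name and periods per week for each subject
df = pd.DataFrame(
    {"Name": ["Mr. Rao", "Ms. Sen"], "Periods": [6, 5]},
    index=["Eng", "Maths"],
)

# Map old labels to new ones; inplace=True modifies df itself
df.rename(index={"Eng": "English"}, columns={"Name": "Sname"}, inplace=True)
print(df)

# Without inplace, rename() returns a new DataFrame and df is unchanged
df2 = df.rename(columns={"Periods": "Classes"})
print(df2)
```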
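Iteration in DataFrame:-
- Sometimes we need to process all the data values of a DataFrame. In such cases, we need to iterate over the DataFrame.
- There are two ways to iterate over a DataFrame:
(i) Row-wise using iterrows():-
(ii) Column-wise using iteritems():- iteritems() was removed in pandas 2.0; in current versions use items(), which works the same way.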
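A sketch of both kinds of iteration on hypothetical marks. `iterrows()` yields a (row label, row as a Series) pair for each row, and `items()` (the current name for `iteritems()`) yields a (column name, column as a Series) pair for each column.

```python
import pandas as pd

df = pd.DataFrame(
    {"Eng": [78, 85], "Maths": [90, 67]},
    index=["Amit", "Riya"],
)

# Row-wise: total marks of each student
row_totals = {}
for label, row in df.iterrows():
    row_totals[label] = row["Eng"] + row["Maths"]
    print(label, row["Eng"], row["Maths"])

# Column-wise: highest mark in each subject
col_max = {}
for col, values in df.items():
    col_max[col] = values.max()
    print(col, values.tolist())
```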