A to Z Full Forms and Acronyms

Explain DataSet in the R programming language | R Tutorial

Jan 27, 2022 #RLanguage #Programming
In this article, you will learn about datasets in the R programming language.

In R, a dataset is a collection of data, usually held as a data frame, that is stored and managed in one place, most often inside a package. It is often difficult to find data that is clean, well structured, and whose metadata is easy to explain, which is why packaged datasets are so useful. RStudio is an Integrated Development Environment (IDE) for R in which developers can build statistical models and graphics. Packaged datasets can be used directly from within RStudio, which makes them easy to reuse across use cases. RStudio is available in two editions: RStudio Desktop and RStudio Server.

Read DataSet in the R programming language

There are two kinds of datasets, and each is read in a different way. The first kind ships inside a package, so developers can access it directly from RStudio. The second kind is stored in a raw format such as Excel, CSV, or a database. The number of packaged datasets is limited, but they are not limited to a single domain.

Read data from the Pre-defined dataset in the package

Several of the datasets available in R packages, particularly those in mlbench, come from the repository called the “UCI Machine Learning Repository”. Packaged datasets are useful for the following reasons:

  • They ship with R or with an installed package, so they load quickly without a separate download.
  • They are small enough to fit easily into memory.
  • They are already clean, so the data cleaning step can be skipped and algorithms can be run right away.
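
As a rough illustration of the last two points, the sketch below checks the in-memory size of one bundled dataset and whether it contains missing values. It uses iris, one of the datasets described later in this article, purely as an example. The variable names are illustrative, and other bundled datasets may be larger or may contain NA values.

```r
# Size of the bundled iris data frame in memory (reported in kilobytes)
iris_size <- format(object.size(iris), units = "Kb")
print(iris_size)

# TRUE if any cell is missing; a clean dataset gives FALSE
iris_has_na <- anyNA(iris)
print(iris_has_na)
```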

Well-known datasets and dataset packages in R that are used in data science:

  • datasets package: It is part of base R and is attached by default, so there is no need to load it. It bundles a wide variety of datasets. Run the following command to list the datasets in the package.

         Code:

         library(help = "datasets")

  • Iris Dataset: The dataset contains measurements of 150 iris flowers. It covers 3 species (setosa, versicolor, and virginica), each described by 4 features: sepal length, sepal width, petal length, and petal width. You can load the dataset by executing the following command.

         Code:

         data(iris)

  • Longley’s Economic Dataset: This dataset records the number of people employed in each year from 1947 to 1962, together with 6 other variables (the GNP deflator, GNP, the number unemployed, the size of the armed forces, the population, and the year) that can be used to model employment over that period. You can load the dataset by executing the following command.

         Code:

         data(longley)

  • mlbench library: This library contains data for artificial and real-world benchmark problems. It is not part of base R, so you need to install it by executing the following command.

         Code:

         install.packages("mlbench")

         Then load the library with the following command.

         Code:

         library(mlbench)

  • Boston Housing Dataset: This dataset holds housing data for areas in and around the city of Boston. It has 13 features plus the median home value that is usually predicted from them. It is provided by the mlbench library, so load mlbench first. You can then load the dataset by using the following command:

         Code:

         data(BostonHousing)
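
The commands above can be combined into one short session. This is a minimal sketch rather than a prescribed workflow: the variable name builtin is illustrative, and the mlbench part is skipped when that package is not installed.

```r
# List the names of the datasets bundled in the datasets package
builtin <- data(package = "datasets")$results[, "Item"]
print(head(builtin))

# iris: one row per flower, four measurements plus the species
data(iris)
str(iris)
print(table(iris$Species))

# longley: yearly economic variables, including the number employed
data(longley)
print(summary(longley$Employed))

# BostonHousing lives in mlbench, so load it only if mlbench is available
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("BostonHousing", package = "mlbench")
  print(dim(BostonHousing))
}
```

Calling str() or summary() right after loading is a quick way to confirm the column types before running any algorithm on the data.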

Read data from the Raw Format data file

Most datasets are available as raw files in formats such as CSV or Excel.

You can load the data from the raw file in this way:

CSV File

mydata <- read.csv("<name of the file along with its extension>")
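
A minimal, self-contained sketch of the CSV case follows. It first writes a small data frame to a temporary file so that there is something to read back. The file and the variable names are made up for illustration; in practice you would pass the path of your own file.

```r
# Create a small example CSV in a temporary location (illustrative data)
csv_path <- tempfile(fileext = ".csv")
write.csv(head(iris), csv_path, row.names = FALSE)

# Read the file back into a data frame
flowers <- read.csv(csv_path)
str(flowers)

# Remove the temporary file
unlink(csv_path)
```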

Excel File

# read.xlsx comes from the xlsx package, so install it and run library(xlsx) first
mydata <- read.xlsx("<name along with the extension of the file>", sheetIndex = <index number of the sheet>)
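
Base R has no Excel reader, so read.xlsx must come from an add-on package. The sketch below assumes the xlsx package (which needs Java) and its sheetIndex argument, and it is skipped when xlsx is not installed. Other packages, such as openxlsx or readxl, use different function or argument names. The file and variable names are illustrative.

```r
# Stays NULL when the xlsx package is not available
excel_rows <- NULL

if (requireNamespace("xlsx", quietly = TRUE)) {
  # Write a small example workbook to a temporary file (illustrative data)
  xlsx_path <- tempfile(fileext = ".xlsx")
  xlsx::write.xlsx(head(iris), xlsx_path, row.names = FALSE)

  # Read the first sheet back into a data frame
  excel_data <- xlsx::read.xlsx(xlsx_path, sheetIndex = 1)
  excel_rows <- nrow(excel_data)
  print(excel_data)

  # Remove the temporary file
  unlink(xlsx_path)
}
```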

 
