Importing a Dataset in Picostat, R and SAS
This guide discusses how to import a dataset using Picostat, R, and SAS.
To import a dataset into Picostat, you must first register for a free account. Please see this guide on how to register an account with Picostat.
Once you have registered and set your password, click the Create Dataset link at the top of a page. From there, you will be presented with a screen similar to the one shown below.
From here you can choose a method to create a dataset:
- Use randomly generated numbers as the dataset values
- Upload a dataset file from your computer
- Copy and paste a dataset into a textarea
- Start with an empty dataset
- Use an Excel spreadsheet to import the data
- Use an sas7bdat file
- Use an SPSS file
Depending on the method chosen, the user can specify whether the first line is a header and which separator the dataset uses. If the dataset is created successfully, the user is taken to a grid view of the newly imported dataset, and the dataset is ready for use. If any errors occur during the import, the user is redirected to the previous screen. Datasets created by an authenticated user are saved indefinitely, or until a delete request is made.
When importing from a file, such as a CSV or SPSS file, the upload size is limited, and the limit is displayed to the user. The file widget also shows which file types may be uploaded.
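The header and separator choices correspond closely to arguments of R's read.table function. The sketch below is only an illustration with made-up values (the column names and numbers are placeholders, not Picostat data). It reads the same pasted text once with the first line treated as a header and once with it treated as data.

```r
# Made-up, comma-separated text standing in for a pasted dataset.
pasted <- "height,weight\n170,65\n182,80\n"

# First line treated as a header: column names come from the text itself.
with_header <- read.table(text = pasted, header = TRUE, sep = ",")

# First line treated as data: R assigns default names (V1, V2, ...).
no_header <- read.table(text = pasted, header = FALSE, sep = ",")

print(names(with_header))
print(nrow(with_header))
print(names(no_header))
print(nrow(no_header))
```

Getting the header setting wrong does not raise an error; it silently turns the column names into a data row, which is why checking the row count after an import is a useful habit.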
Authenticated users can also specify licenses.
There are also various other options. These may vary depending on your choice of import method. These include:
- The number of columns and rows to create for a dataset of random numbers generated at creation time
- Header - does the first row of the dataset contain column names?
- Separator - what separates the values within each row. Options include tab, space, comma and newline.
- Start from scratch with an empty dataset.
- Excel column names: does the first row of the sheet contain column names?
- Excel sheet name: the name of the sheet to import, since an Excel workbook may contain multiple sheets
- Additional options may be added as requested
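For the random-number option, the row and column counts are the only inputs. How Picostat generates its values is not described here, so the following is just a sketch of the idea in R, assuming uniform random numbers. The names n_rows and n_cols and the seed are illustrative choices.

```r
set.seed(42)  # fixed seed so the sketch is reproducible (illustrative choice)
n_rows <- 5
n_cols <- 3

# Fill an n_rows x n_cols grid with uniform random numbers between 0 and 1.
random_data <- as.data.frame(
  matrix(runif(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
)

print(dim(random_data))
print(names(random_data))
```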
To import a delimited text dataset with R, such as the tab-separated (TSV) file used here, use a command similar to this one. This is essentially the command used by Picostat but without a graphical interface.
stent30 <- read.table("/home/ubuntu/stent30.tsv", header = TRUE, sep = "\t");
The dataset is read into the stent30 variable. This terse syntax is part of what makes R powerful. The sep = "\t" argument tells R to use the tab character as the separator between values. The first argument is the path to the TSV file (this path will differ depending on your operating system). The header of a dataset is its first line, which contains the column names. In this case, we know the TSV file has a header, so we pass header = TRUE. (We're using the stent30 and stent365 datasets from the OpenIntro textbook Advanced High School Statistics.)
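After a call like the one above, it is worth confirming that the header and separator were interpreted as intended. The sketch below is self-contained: it writes a small tab-separated file to a temporary location and reads it back with the same arguments. The rows are placeholders, not the actual stent30 data, and the column names are assumed for illustration.

```r
# Write a tiny tab-separated file to a temporary path (placeholder rows).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("group\toutcome",
             "treatment\tstroke",
             "control\tno event",
             "treatment\tno event"), tsv_path)

# Same arguments as in the guide: first line is a header, tab is the separator.
demo <- read.table(tsv_path, header = TRUE, sep = "\t")

str(demo)         # column names and types
print(dim(demo))  # number of rows and columns
```

Because the separator is a tab, values containing spaces (such as "no event") stay in a single column.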
Remember to save the R environment when exiting; otherwise the dataset will need to be re-imported.
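Two base R ways to avoid re-importing are sketched below: saveRDS() stores a single object, while save.image() stores every object in the workspace. The data frame and file paths are placeholders; temporary files are used so the sketch runs anywhere.

```r
# Placeholder data frame standing in for an imported dataset.
imported <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Option 1: save one object and read it back later.
rds_path <- tempfile(fileext = ".rds")
saveRDS(imported, rds_path)
restored <- readRDS(rds_path)

# Option 2: save every object in the workspace to one file.
# Answering "yes" to R's save prompt on exit does the same with the file .RData.
image_path <- tempfile(fileext = ".RData")
save.image(file = image_path)

print(all.equal(imported, restored))
print(file.exists(image_path))
```

Saving a single object with saveRDS() tends to be easier to manage across projects, since the object can be loaded under any name without restoring the rest of the workspace.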
If you are importing a proprietary format, follow these links for additional details:
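For orientation, here is a rough R sketch of reading the proprietary formats mentioned earlier (sas7bdat, SPSS, Excel). It assumes the third-party packages haven and readxl are installed, which this guide does not cover. The helper import_any is hypothetical and written only for this example, and the fallback branch assumes a tab-separated text file with a header.

```r
# Hypothetical helper: choose a reader based on the file extension.
import_any <- function(path, sheet = NULL) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "sas7bdat") {
    haven::read_sas(path)                        # SAS dataset (needs haven)
  } else if (ext == "sav") {
    haven::read_sav(path)                        # SPSS file (needs haven)
  } else if (ext %in% c("xls", "xlsx")) {
    # sheet = NULL reads the first sheet; col_names = TRUE uses row 1 as names.
    readxl::read_excel(path, sheet = sheet, col_names = TRUE)
  } else {
    read.table(path, header = TRUE, sep = "\t")  # fallback: tab-separated text
  }
}

# Usage with a temporary tab-separated file, which needs no extra packages.
demo_path <- tempfile(fileext = ".tsv")
writeLines(c("x\ty", "1\t2", "3\t4"), demo_path)
result <- import_any(demo_path)
print(result)
```

The package functions are only evaluated when a matching file is passed in, so the text-file branch works even if haven and readxl are not installed.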
The SAS code to import a dataset is considerably more complex. If you need help setting up SAS on your computer, you can read the previous blog post about downloading and installing SAS.
To start importing a dataset with SAS Studio, go to the top left-hand corner of the screen. You will see a dropdown with an asterisk icon. Choose "Import Data."
Next, you will be prompted to choose a file from your computer.
After selecting a file, you will be presented with a screen to configure the import. Remember: the file must be in the folder you configured with VirtualBox. SAS will not be able to see any files outside that directory. For most of the values, the default settings will suffice, but choosing the correct delimiter is important. The first dataset discussed in the OpenIntro textbook Advanced High School Statistics is tab delimited. You will need to enter the following to represent the tab delimiter.
Here is the raw SAS code generated from our selections. The tab delimiter appears in the code as a hexadecimal character constant.
/* Generated Code (IMPORT) */
/* Source File: stent30.txt */
/* Source Path: /folders/myshortcuts/Picostat */
/* Code generated on: 5/10/17, 11:17 AM */
FILENAME REFFILE '/folders/myshortcuts/Picostat/stent30.txt';
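PROC IMPORT DATAFILE=REFFILE
	DBMS=DLM
	OUT=WORK.IMPORT1;
	DELIMITER="09"x; /* hexadecimal code for the tab character */
	GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.IMPORT1; RUN;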
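SAS conventionally writes a tab delimiter as the hexadecimal character constant "09"x. As a quick cross-check outside SAS, this small R sketch confirms that the tab character, the same one passed as sep = "\t" in the R example, has character code 9, which is 09 in two-digit hexadecimal.

```r
tab_code <- utf8ToInt("\t")            # integer code of the tab character
tab_hex  <- sprintf("%02X", tab_code)  # two-digit hexadecimal form

print(tab_code)
print(tab_hex)
```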
If everything goes well, you will see a screen like this. You can scroll down to see the entire dataset.
From here you will be able to work with the data in future projects. Remember to save the state of the VirtualBox virtual machine after successfully importing the dataset; otherwise you will have to repeat this process.