
# INF 559: Introduction to Data Management


Homework #1: Storage Access Patterns


Preparation Lab 1 (Q1) Due: January 28, Thursday 11:59 PM PST

Complete HW1 (Q2 & Q3) Due: February 8, Tuesday 11:59 PM PST

PT 100 points

In the lectures, you have studied the differences between sequential and random file access. In this assignment, you will read increasing amounts of data using sequential and random access on a large file and plot the results obtained.

1. Preparation (Lab 1) [10 points]
   a. Download https://releases.ubuntu.com/20.10/ubuntu-20.10-desktop-amd64.iso. This will be used as our test file.
   b. Open a new Python 3 notebook in Google Colab (Warning: Python 2 notebooks will not be accepted).
   c. Import Pandas, NumPy, and Matplotlib¹; you may use them in this assignment.
   d. Many of you indicated that you have used Jupyter Notebook before, so we provide a starter notebook that you can open in Google Colab to get a head start. The notebook outlines the expected layout of your final submission. This step is not mandatory but is strongly recommended.
   e. For the Lab 1 submission due on Thursday night, simply plot the function y = sin x in Google Colab and submit the .ipynb file, including the plot.
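
A minimal sketch of the Lab 1 plot is shown here for orientation. The sampling range (two full periods) and the number of sample points are arbitrary assumptions; the assignment only asks for a plot of y = sin x.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample x over two full periods; range and resolution are arbitrary choices.
x = np.linspace(0, 4 * np.pi, 500)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("y = sin x")
plt.show()
```

In Colab the figure renders inline below the cell, so saving the notebook with the cell executed keeps the plot in the .ipynb file.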

¹ Here are good guides to get started with Pandas, NumPy, and Matplotlib:

Pandas Quick Tutorial

Numpy Quick Tutorial

Matplotlib Quick Tutorial

2. Plotting latency and bandwidth [50 points]
   a. Open the test file in unbuffered mode[1].
   b. Sequentially read [1, 4, 16, 64, 256, 1024, 4*1024, 16*1024, 64*1024, 256*1024, 1024*1024] blocks of data[2]. Use a fixed block size of 4 KB. Measure the latency for each iteration in terms of wall-clock time.
   c. Repeat step 2b using random access instead of sequential access.
   d. Plot the latencies measured in 2b and 2c against the number of blocks read. Both sequential and random results should appear on the same plot, and the number of blocks should be scaled logarithmically rather than linearly. Briefly describe your observations from this plot.
   e. Calculate the bandwidth for each iteration of 2b and 2c from the latency and the amount of data transferred. Plot the results in the same manner as the latency and briefly describe your observations.
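
The measurement loop described above could be sketched roughly as follows. This is one possible reading, not a reference solution: the function names are hypothetical, "random access" is assumed to mean seeking to a uniformly chosen block-aligned offset before each read, and bandwidth is assumed to be reported in MB/s with 1 MB = 1024 * 1024 bytes.

```python
import os
import random
import time

BLOCK_SIZE = 4 * 1024  # 4 KB, the fixed block size from step 2b


def time_sequential(path, num_blocks):
    """Read num_blocks consecutive blocks; return elapsed wall-clock seconds."""
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(num_blocks):
            if len(f.read(BLOCK_SIZE)) < BLOCK_SIZE:
                f.seek(0)  # reached end of file: wrap around and continue
        return time.perf_counter() - start


def time_random(path, num_blocks, seed=None):
    """Read num_blocks blocks at random block-aligned offsets (assumed meaning)."""
    rng = random.Random(seed)
    total_blocks = os.path.getsize(path) // BLOCK_SIZE  # file must hold >= 1 block
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(num_blocks):
            f.seek(rng.randrange(total_blocks) * BLOCK_SIZE)
            f.read(BLOCK_SIZE)
        return time.perf_counter() - start


def bandwidth_mb_per_s(num_blocks, latency_s):
    """Bandwidth = bytes transferred / elapsed time, expressed in MB/s."""
    return num_blocks * BLOCK_SIZE / latency_s / (1024 * 1024)
```

Calling `time_sequential(path, n)` and `time_random(path, n)` for each block count `n` in the list from step 2b yields the two latency series; passing each pair `(n, latency)` to `bandwidth_mb_per_s` gives the matching bandwidth series.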

Sample output for step 2 (generated using random numbers, not suggestive of actual output)

3. Measurement Statistics [50 points]
   a. Run steps 2b and 2c 10 times each and store the results.
   b. For each of the 11 iterations, calculate the mean and standard error[3] over the 10 runs.
   c. Use the results of 3b to generate errorbar plots for latency and bandwidth. Again, both sequential and random results should be on the same plot, and the number of blocks should be scaled logarithmically. Briefly describe your observations from these plots.
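
One way the statistics and errorbar plots might be put together is sketched below. The helper names are hypothetical, the input is assumed to be an array with one row per run and one column per iteration, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the assignment only says "standard deviation".

```python
import numpy as np
import matplotlib.pyplot as plt


def mean_and_stderr(runs):
    """runs has shape (num_runs, num_iterations); returns per-iteration stats."""
    runs = np.asarray(runs, dtype=float)
    mean = runs.mean(axis=0)
    # Standard error = standard deviation / sqrt(number of observations).
    # ddof=1 (sample standard deviation) is an assumption.
    stderr = runs.std(axis=0, ddof=1) / np.sqrt(runs.shape[0])
    return mean, stderr


def plot_with_errorbars(block_counts, seq_runs, rand_runs, ylabel):
    """Draw sequential and random results on one plot with a log-scaled x-axis."""
    fig, ax = plt.subplots()
    for label, runs in (("sequential", seq_runs), ("random", rand_runs)):
        mean, stderr = mean_and_stderr(runs)
        ax.errorbar(block_counts, mean, yerr=stderr,
                    marker="o", capsize=3, label=label)
    ax.set_xscale("log")
    ax.set_xlabel("number of blocks read")
    ax.set_ylabel(ylabel)
    ax.legend()
    return ax
```

The same `plot_with_errorbars` call can be reused for latency and for bandwidth by passing the corresponding arrays and a different `ylabel`.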

Sample output for step 3 (generated using random numbers, not suggestive of actual output)

Submission:

1. Submit a single .ipynb file on Blackboard.
2. The only file format accepted is .ipynb. You do not need to submit the plots and explanations separately. Describe your observations using text cells (Jupyter notebooks allow both code and text cells). Your final notebook will contain both the plots and the explanations.
3. Make sure to mention your name and USC ID in your notebook (as done in the starter notebook).

1. Late submissions (up to 24 hours) will be penalized by 20%. No credit will be given after 24 hours of the submission deadline.
2. As mentioned above, Python 2 notebooks will receive no credit.
3. The submitted notebook must have all its cells executed and outputs visible (if applicable). Notebooks without outputs will be penalized by 30%.
4. You may use any Python internal library, but the only external libraries allowed are pandas, NumPy, and matplotlib. Use of any external library other than these will be penalized by 20%.

Important Notes:

1. Submitted work must be your own. Don’t share your code with anyone.
2. Q3 may take up to 5 minutes to execute. This is expected behavior given such large reads.
3. Start early, and visit the TAs during office hours to make sure you are on the right track or if you need help.

[1]  Refer to the documentation to switch buffering off. Also, make sure you use the binary read mode while opening.
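
As an illustration of this footnote, the built-in `open` accepts `buffering=0`, which is only permitted in binary mode and returns a raw `io.FileIO` object. The small temporary file below is a hypothetical stand-in for the ISO so that the snippet runs anywhere.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the test file (8 KB of zero bytes).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 8192)
    path = tmp.name

# "rb" selects binary read mode; buffering=0 switches Python's buffer off.
with open(path, "rb", buffering=0) as f:
    is_raw = isinstance(f, io.FileIO)  # no BufferedReader wrapper in between
    first_block = f.read(4096)

os.remove(path)
print(is_raw, len(first_block))
```

Note that this disables only Python's own buffering layer; the operating system may still cache file contents.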

[2]  It may be possible that you reach the end of the file prematurely during sequential access. Make sure to seek to the start of the file again and continue reading in this case.
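
The wrap-around described in this footnote could be handled with a small helper such as the one sketched here. The name `read_block_wrapping` is hypothetical, and discarding a partial block at the end of the file is an assumption about the intended behavior. The helper also assumes the file holds at least one full block.

```python
import io


def read_block_wrapping(f, block_size=4096):
    """Read one block; at end of file, seek to the start and read again."""
    data = f.read(block_size)
    if len(data) < block_size:
        # Short read means end of file: discard the partial block (assumption),
        # rewind, and read a full block from the beginning.
        f.seek(0)
        data = f.read(block_size)
    return data


# Demonstration on a 10-byte in-memory "file" with 4-byte blocks.
f = io.BytesIO(bytes(range(10)))
blocks = [read_block_wrapping(f, 4) for _ in range(4)]
```

In the demonstration, the third read reaches the end of the data, so the helper rewinds and the third and fourth blocks repeat the first and second.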

[3]  The standard error is defined as the standard deviation divided by the square root of the number of observations.
