A good hash function $h(x)$ behaves in practice very close to the simple uniform hashing assumption analyzed in class, but is a deterministic function. Designing good hash functions is hard, and a bad hash function can cause a hash table to quickly exit the sparse loading regime by overloading some buckets and under-loading others. Good hash functions often rely on beautiful and complicated insights from number theory, and have deep connections to pseudorandom number generators and cryptographic functions. In practice, most hash functions are moderate to poor approximations of uniform hashing.

Consider the following two hash functions. Let $U$ be the universe of strings composed of characters from the alphabet $\Sigma = \{A, \ldots, Z\}$, and let the function $f(x_i)$ return the index of a letter $x_i \in \Sigma$, e.g., $f(A) = 1$ and $f(Z) = 26$. Let $x$ be a string of length $m$.

(1) The first hash function we consider is $h_1(x) = \left[\sum_{i=1}^{m} f(x_i)\right] \bmod \ell$, where $\ell$ is the number of buckets in the hash table.

(2) For the second hash function, first choose, globally and external to the hash function, uniformly random integers $a_i$ (one for each $x_i \in \Sigma$) from $\{0, \ldots, 10{,}000\}$, and then define $h_2(x) = \left[\sum_{i=1}^{m} a_i \cdot f(x_i)\right] \bmod \ell$, with $\ell$ the number of buckets as in (1). List your values of $a_i$ here (and please use consistent values of $a_i$ throughout this question):
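The two definitions above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference solution: inputs are assumed to be uppercase A-Z strings, the coefficients are taken as one per letter of the alphabet (26 values, looked up by letter), which is only one reading of "one for each $x_i \in \Sigma$", and the names `make_coeffs`, `num_buckets`, and the seed value are hypothetical choices for illustration.

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def f(ch):
    """Return the 1-based index of an uppercase letter: f('A') = 1, f('Z') = 26."""
    return ord(ch) - ord("A") + 1

def h1(x, num_buckets):
    """Sum of the letter indices of x, reduced modulo the number of buckets."""
    return sum(f(ch) for ch in x) % num_buckets

def make_coeffs(seed=0):
    """Draw one coefficient per letter, uniformly from {0, ..., 10000}.

    Assumption: coefficients are indexed by letter, not by position in x.
    A fixed seed keeps the values consistent across all parts of the question.
    """
    rng = random.Random(seed)
    return {ch: rng.randint(0, 10_000) for ch in ALPHABET}

def h2(x, num_buckets, coeffs):
    """Coefficient-weighted sum of letter indices, modulo the number of buckets."""
    return sum(coeffs[ch] * f(ch) for ch in x) % num_buckets

print(h1("SMITH", 5851))  # 19 + 13 + 9 + 20 + 8 = 69
```

Drawing the coefficients once, outside `h2`, matches the requirement that they be chosen globally and reused for every string.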

(a) There is a .txt file on Canvas that contains last names derived from US Census data. Using these names as input strings, first choose a uniformly random 50% of the name strings. Let $\ell = 5851$ be the number of buckets. For each of the two hash functions (separately), produce a histogram showing the distribution of hash locations for the names you chose. Label the axes of your figures. Give a brief description of what each figure shows about $h_1(x)$ and $h_2(x)$, and justify your results in terms of the behavior of these hash functions. Hint: the raw file includes information other than the name strings, which will need to be removed. Also think about how you can count hash locations without building or using a real hash table.

(b) State at least 4 reasons why $h_1(x)$ is a bad hash function relative to the ideal behavior of uniform hashing.

(c) Produce two plots, one for each hash function $h_1$, $h_2$, showing the length of the longest chain (were we to use chaining for resolving collisions) as a function of the number $n$ of these strings that we hash into a table with $\ell = 5851$ buckets. That is, you may use the 50% of names from part (a), and as you hash them one by one, show how the length of the longest chain grows.

(d) Produce another pair of plots, one for each of $h_1$, $h_2$, showing the number of collisions as a function of the number of buckets $\ell$. Comment on how collisions decrease as $\ell$ increases. Aside from size, do you notice any particular kinds of values for $\ell$ that seem better than others (e.g., odd/even, prime, etc.)? Discuss briefly.
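The hint in part (a) about counting hash locations without a real table, and the measurements asked for in parts (c) and (d), could be sketched roughly as below. This is only an illustration under stated assumptions: `hash_fn` is any function mapping a string to a bucket index, `toy_hash` is a hypothetical stand-in used to keep the block self-contained, a "collision" is counted here as an insertion into an already occupied bucket (one of several possible definitions), and plotting is left out.

```python
from collections import Counter

def bucket_counts(names, hash_fn):
    """Count how many names land in each bucket, without building a table."""
    return Counter(hash_fn(name) for name in names)

def longest_chain_growth(names, hash_fn):
    """Return the longest-chain length after each successive insertion."""
    counts = Counter()
    longest = 0
    growth = []
    for name in names:
        bucket = hash_fn(name)
        counts[bucket] += 1
        longest = max(longest, counts[bucket])
        growth.append(longest)
    return growth

def count_collisions(names, hash_fn):
    """Count insertions that land in an already occupied bucket (assumed definition)."""
    return sum(c - 1 for c in bucket_counts(names, hash_fn).values())

def collisions_by_table_size(names, hash_with_size, sizes):
    """Map each candidate table size to its collision count."""
    return {
        size: count_collisions(names, lambda s: hash_with_size(s, size))
        for size in sizes
    }

def toy_hash(name, num_buckets=7):
    """Hypothetical stand-in hash: sum of letter indices modulo num_buckets."""
    return sum(ord(ch) - ord("A") + 1 for ch in name) % num_buckets

sample = ["AB", "BA", "D", "ZZ"]
print(longest_chain_growth(sample, toy_hash))  # [1, 2, 2, 3]
print(collisions_by_table_size(sample, toy_hash, [7, 100]))  # {7: 2, 100: 1}
```

The per-bucket counts feed the histogram in part (a), the growth list gives the curve for part (c), and the dictionary of collision counts gives the data points for part (d).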
