Example:
Data set: collections of text documents.
Problem: count the frequency of nouns that appear at least 100 times in the documents.
(i) Mapper function: tokenize each line into a set of terms (words), and filter out terms that are not nouns.
(ii) Mapper output: key is a noun, value is 1.
(iii) Reducer input: key is a noun, value is a list of 1's.
(iv) Reduce function: sum up the 1's for each key (noun).
(v) Reducer output: key is a noun, value is the frequency of the noun (filter the nouns whose frequencies are below )

(a) Data set: Amazon book ratings data. Each line in the data file has 4 columns (reviewer id, book id, book genre, rating), where ratings are integer-valued, ranging from 1 to 4.
Problem: identify the highest rated book, i.e., the book with the highest average rating, for each book genre. Note that each book can have more than one rating (e.g., by different )

(b) Data set: movie preference data. Each record in the data file contains the movie title and the list of users who liked the movie. For example:
jaws user111 user134 user313 user5812
star_wars user111 user313 user388 user4422
Problem: for each pair of users, count the number of movies they both liked. The output may exclude pairs of users who do not have any movies they both liked.

(c) Data set: maximum and minimum daily temperature readings for weather stations from around the world. Each line in the data files has 4 columns (station id, date, max temperature, min temperature).
Problem: find the station id and date of anomalous temperature readings in the dataset. A temperature reading is anomalous if the minimum daily temperature exceeds the maximum temperature for the given day.

(d) Data set: Instagram friendship graph. Each record corresponds to an Instagram user, followed by a list of his/her friends. For example, the graph data may contain the following records:
john123 mary456 tom312 lee222
mary456 john123
tom312 john123 lee222
lee222 john123 tom312
The first line above states that mary456, tom312, and lee222 are friends of john123.
Problem: find pairs of Instagram users who are not friends with each other but who share one or more common friends. This is known as the "friend-of-a-friend" (FoF) problem. For example, mary456 and tom312 are both friends of john123, but they are not friends with each other. The Hadoop program should only output the pair (u, v) if u < v. In the previous example, the program should only output the pair (mary456, tom312) but not (tom312, )

(e) Data set: cancer data. Each line in the data file corresponds to a patient with the following nominal-valued attributes: patientid, gender, marital status, smoker, weight class, and class, where the class attribute has value yes or no to indicate whether the patient has cancer.
12345, female, married, smoker, normal, yes
13, male, single, nonsmoker, normal, no
14423, male, married, smoker, overweight, yes
Problem: compute the Gini index for each of the following attributes: gender, marital status, smoker, and weight class, based on the distribution of their class values.
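The mapper/reducer steps of the noun-counting example can be tried out locally before writing any Hadoop code. The following is only a minimal single-process sketch in Python, not a Hadoop job: the `NOUNS` set is a hypothetical stand-in for a real part-of-speech tagger, `shuffle` imitates the grouping the framework performs between map and reduce, and the `min_count` parameter (defaulting to the 100 in the problem statement) is an assumed name.

```python
from collections import defaultdict

# Hypothetical stand-in for a real part-of-speech tagger (an assumption).
NOUNS = {"cat", "dog", "house"}


def is_noun(term):
    """Return True if the term is in the hypothetical noun list."""
    return term in NOUNS


def mapper(line):
    """Tokenize one line and emit (noun, 1) for every noun token."""
    for term in line.lower().split():
        term = term.strip(".,;:!?\"'()")
        if term and is_noun(term):
            yield term, 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does before reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(noun, ones, min_count=100):
    """Sum the 1's and emit the noun only if it reaches min_count."""
    total = sum(ones)
    if total >= min_count:
        yield noun, total


def count_nouns(lines, min_count=100):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    result = {}
    for noun, ones in shuffle(pairs).items():
        for key, total in reducer(noun, ones, min_count):
            result[key] = total
    return result
```

Calling `count_nouns(lines)` on an iterable of text lines returns a dictionary from noun to frequency; lowering `min_count` makes the sketch easy to check on a small sample.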
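One possible approach to the friend-of-a-friend problem in (d) is to have the mapper emit two kinds of markers per pair and let the reducer discard any pair that is already directly connected. The sketch below simulates that idea in plain Python under stated assumptions: the marker strings `"direct"` and `"common"` and the function names are illustrative choices, and the grouping loop stands in for Hadoop's shuffle phase.

```python
from collections import defaultdict
from itertools import combinations


def mapper(record):
    """Emit ordered user pairs tagged as direct friends or as sharing a friend."""
    user, *friends = record.split()
    # The user is directly connected to each listed friend.
    for friend in friends:
        yield tuple(sorted((user, friend))), "direct"
    # Any two listed friends share this user as a common friend.
    # combinations over a sorted list yields pairs (u, v) with u < v.
    for u, v in combinations(sorted(set(friends)), 2):
        yield (u, v), "common"


def reducer(pair, markers):
    """Keep the pair only if the two users are never direct friends."""
    if "direct" not in markers:
        yield pair


def friend_of_friend(records):
    """Group mapper output by pair, then apply the reducer to each group."""
    groups = defaultdict(list)
    for record in records:
        for pair, marker in mapper(record):
            groups[pair].append(marker)
    return sorted(
        pair
        for key, markers in groups.items()
        for pair in reducer(key, markers)
    )
```

Because every key is built with the smaller user id first, each qualifying pair appears exactly once, which satisfies the requirement to output (u, v) only if u < v.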
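For the cancer data in (e), a natural decomposition is to count records per (attribute, value, class) combination in the map and reduce phases, then compute the Gini index from those counts. The sketch below assumes the usual weighted definition, $1 - \sum_c p(c \mid v)^2$ for each attribute value $v$, averaged over values in proportion to their record counts; the source does not spell out the formula, so treat that as an assumption. The function names and the `ATTRIBUTES` list are illustrative.

```python
from collections import defaultdict

# Column order after patientid, taken from the data set description.
ATTRIBUTES = ["gender", "marital status", "smoker", "weight class"]


def mapper(line):
    """Emit ((attribute, value, class), 1) for each attribute of one patient."""
    fields = [field.strip() for field in line.split(",")]
    label = fields[5]  # class attribute: yes or no
    for name, value in zip(ATTRIBUTES, fields[1:5]):
        yield (name, value, label), 1


def count_classes(lines):
    """Sum the mapper output per key (the reduce step)."""
    counts = defaultdict(int)
    for line in lines:
        for key, one in mapper(line):
            counts[key] += one
    return counts


def gini_index(counts):
    """Weighted Gini index per attribute (assumed definition, see lead-in)."""
    by_attr = defaultdict(lambda: defaultdict(dict))
    for (name, value, label), n in counts.items():
        by_attr[name][value][label] = n
    result = {}
    for name, values in by_attr.items():
        total = sum(sum(c.values()) for c in values.values())
        weighted = 0.0
        for class_counts in values.values():
            size = sum(class_counts.values())
            impurity = 1.0 - sum((n / size) ** 2 for n in class_counts.values())
            weighted += size / total * impurity
        result[name] = weighted
    return result
```

A call such as `gini_index(count_classes(lines))` returns one value per attribute; lower values indicate that the attribute separates the yes and no classes more cleanly.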
