Question 1 (Index Construction):
Suppose you have joined a search engine development team to design a search algorithm based on both the Vector model and the Boolean model.
You have collected the following documents (unstructured) and plan to apply an index technique to convert them into an inverted index.

Doc 1: data science is field to use scientific method, process, algorithm, system to extract knowledge.

Doc 2: data mining is the process to discover pattern in large data to involve method at the database system.

Doc 3: information system is the study of network of hardware and software that people use to process data.

To answer the questions below, provide the detailed procedure step by step.
Remove all stop words and punctuation before creating the inverted index. Then complete the following steps:
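
The preprocessing step can be sketched as follows. This is a minimal illustration, not a prescribed solution: the stop-word list is an assumption (the question does not give one), no stemming is applied, and the names `STOP_WORDS`, `DOCS`, and `tokenize` are hypothetical.

```python
import re

# Assumed stop-word list; the question does not specify one.
STOP_WORDS = {"is", "to", "the", "in", "at", "of", "and", "that"}

DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}

def tokenize(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

tokens = {doc_id: tokenize(text) for doc_id, text in DOCS.items()}
for doc_id, terms in tokens.items():
    print(doc_id, terms)
```

A different stop-word list (for example, one that also removes "use") would change every later step, so the list should be stated explicitly in the answer.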

Question 1.1:
Create a merged inverted list including the within-document frequencies for each term.
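
One possible way to build the merged inverted list is sketched below. It assumes the same hypothetical stop-word list as the preprocessing step and stores, for each term, a mapping from document number to within-document frequency.

```python
import re
from collections import Counter, defaultdict

# Assumed stop-word list; the question does not specify one.
STOP_WORDS = {"is", "to", "the", "in", "at", "of", "and", "that"}

DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}

def tokenize(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

# term -> {doc_id: within-document frequency}
index = defaultdict(dict)
for doc_id, text in DOCS.items():
    for term, tf in Counter(tokenize(text)).items():
        index[term][doc_id] = tf

# Print the merged list in alphabetical term order.
for term in sorted(index):
    print(term, sorted(index[term].items()))
```

Under these assumptions "data" occurs in all three documents, twice in Doc 2, which is the only frequency greater than 1 in the collection.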

Question 1.2:
Use the index created as above to create a dictionary and the related posting file.
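
A sketch of one common layout for this step: the dictionary keeps each term with its document frequency and an offset into a flat posting file, and the posting file holds (document, frequency) pairs. The layout, the stop-word list, and all names here are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed stop-word list; the question does not specify one.
STOP_WORDS = {"is", "to", "the", "in", "at", "of", "and", "that"}

DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}

def tokenize(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

index = defaultdict(dict)
for doc_id, text in DOCS.items():
    for term, tf in Counter(tokenize(text)).items():
        index[term][doc_id] = tf

dictionary = {}  # term -> (document frequency, offset into the posting file)
postings = []    # flat posting file of (doc_id, within-document frequency)
for term in sorted(index):
    entries = sorted(index[term].items())
    dictionary[term] = (len(entries), len(postings))
    postings.extend(entries)

def lookup(term):
    """Return the posting entries for a term, or an empty list if unknown."""
    if term not in dictionary:
        return []
    df, offset = dictionary[term]
    return postings[offset:offset + df]

print(dictionary["data"], lookup("data"))
```

Keeping the dictionary separate from the posting file means a lookup touches only the small dictionary first and then reads one contiguous slice of postings.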

Question 1.3:
Design three Boolean queries (for example, web AND search) and list the relevant documents for each query. Each query must contain at least two keywords, and no keyword may be a term that appears in only one document.
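
Boolean queries over the index can be evaluated with set operations on the posting lists, as in the sketch below. The three queries are only examples; they use keywords that occur in more than one document under the assumed stop-word list.

```python
import re
from collections import defaultdict

# Assumed stop-word list; the question does not specify one.
STOP_WORDS = {"is", "to", "the", "in", "at", "of", "and", "that"}

DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}

def tokenize(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

# term -> set of documents containing it
postings = defaultdict(set)
for doc_id, text in DOCS.items():
    for term in tokenize(text):
        postings[term].add(doc_id)

# AND is set intersection; AND NOT is set difference.
queries = {
    "method AND use": postings["method"] & postings["use"],
    "method AND process": postings["method"] & postings["process"],
    "use AND NOT method": postings["use"] - postings["method"],
}
for query, docs in queries.items():
    print(query, "->", sorted(docs))
```

Here "method" occurs in Docs 1 and 2, "use" in Docs 1 and 3, and "process" in all three, so the queries return Doc 1, Docs 1 and 2, and Doc 3 respectively.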

Question 1.4:
Use the Vector model to query the inverted index, and compare the results with those of the Boolean model. (Hint: you can use cosine similarity and set a similarity threshold.)
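
A minimal vector-model sketch follows, assuming tf × log(N/df) term weights, cosine similarity, and a threshold of 0.15. The weighting scheme, the threshold, the example query "method use", and the stop-word list are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; the question does not specify one.
STOP_WORDS = {"is", "to", "the", "in", "at", "of", "and", "that"}

DOCS = {
    1: "data science is field to use scientific method, process, algorithm, system to extract knowledge.",
    2: "data mining is the process to discover pattern in large data to involve method at the database system.",
    3: "information system is the study of network of hardware and software that people use to process data.",
}

def tokenize(text):
    """Lowercase, drop punctuation, and remove stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

N = len(DOCS)
doc_tf = {d: Counter(tokenize(t)) for d, t in DOCS.items()}
df = Counter(term for tf in doc_tf.values() for term in tf)  # document frequency

def weights(tf):
    """tf * log(N/df) weights; terms not in the collection are ignored."""
    return {t: f * math.log(N / df[t]) for t, f in tf.items() if t in df}

def cosine(a, b):
    """Cosine similarity of two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

query = weights(Counter(tokenize("method use")))
scores = {d: cosine(query, weights(tf)) for d, tf in doc_tf.items()}

THRESHOLD = 0.15  # assumed similarity cut-off
ranked = sorted(scores.items(), key=lambda kv: -kv[1])
hits = [d for d, s in ranked if s >= THRESHOLD]
print(ranked, hits)
```

With these assumptions Doc 1 scores highest because it contains both query terms, and it is the only document above the threshold, which matches the Boolean result for method AND use. Lowering the threshold admits Docs 2 and 3, which each match one term, so the ranked output behaves more like method OR use with an ordering. Terms that occur in every document (data, process, system) get weight log(3/3) = 0 and do not affect the scores.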
