Computers and Technology, 03.03.2020 01:28 196336

In word segmentation, you are given as input a string of alphabetical characters ([a-z]) without whitespace, and your goal is to insert spaces into this string such that the result is the most fluent according to the language model.

a. Consider the following greedy algorithm: Begin at the front of the string. Find the ending position for the next word that minimizes the language model cost. Repeat, beginning at the end of this chosen segment.
Show that this greedy search is suboptimal. In particular, provide an example input string on which the greedy approach would fail to find the lowest-cost segmentation of the input.
In creating this example, you are free to design the n-gram cost function (both the choice of n and the cost of any n-gram sequences) but costs must be positive and lower cost should indicate better fluency. Note that the cost function doesn't need to be explicitly defined. You can just point out the relative cost of different word sequences that are relevant to the example you provide. And your example should be based on a realistic English word sequence — don't simply use abstract symbols with designated costs.
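
One way to check a candidate counterexample is to run both strategies on it. The sketch below is only an illustration under stated assumptions: it uses a unigram model (n = 1), the input "weather", and a made-up cost table (TOY_COSTS and UNKNOWN_COST are hypothetical values, not part of the assignment). With these costs, greedy commits to the cheap first word "we" and is then forced into "at her", while the single word "weather" is cheaper overall.

```python
from functools import lru_cache

# Hypothetical unigram costs, invented for this illustration only.
TOY_COSTS = {"we": 1.0, "at": 1.0, "her": 1.0, "weather": 2.0}
UNKNOWN_COST = 100.0  # assumed cost for any string not in the table


def toy_unigram_cost(word):
    """Return the made-up cost of a word (lower means more fluent)."""
    return TOY_COSTS.get(word, UNKNOWN_COST)


def greedy_segment(text, cost):
    """Repeatedly pick the next word whose own cost is lowest."""
    words, start = [], 0
    while start < len(text):
        end = min(range(start + 1, len(text) + 1),
                  key=lambda e: cost(text[start:e]))
        words.append(text[start:end])
        start = end
    return words


def optimal_segment(text, cost):
    """Exhaustive search with memoization over the start position."""
    @lru_cache(maxsize=None)
    def best(start):
        if start == len(text):
            return 0.0, ()
        options = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            rest_cost, rest_words = best(end)
            options.append((cost(word) + rest_cost, (word,) + rest_words))
        return min(options)

    return list(best(0)[1])


def total_cost(words, cost):
    """Sum of the unigram costs of a segmentation."""
    return sum(cost(w) for w in words)


if __name__ == "__main__":
    for name, fn in [("greedy", greedy_segment), ("optimal", optimal_segment)]:
        words = fn("weather", toy_unigram_cost)
        print(name, words, total_cost(words, toy_unigram_cost))
```

Under these assumed costs, greedy returns "we at her" with total cost 3.0, while the optimal segmentation is "weather" with cost 2.0. The same pattern applies to any input in which a cheap prefix leaves an expensive remainder.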

b. Implement an algorithm that, unlike greedy, finds the optimal word segmentation of an input character sequence. Your algorithm will consider costs based simply on a unigram cost function.
Before jumping into code, you should think about how to frame this problem as a state-space search problem. How would you represent a state? What are the successors of a state? What are the state transition costs? (You don't need to answer these questions in your writeup.)
Fill in the member functions of the SegmentationProblem class and the segmentWords function. The argument unigramCost is a function that takes a single string representing a word and returns its unigram cost. You can assume that all inputs are lowercase. The function segmentWords should return the segmented sentence with spaces as delimiters, i.e. ' '.join(words).
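
The state-space framing can be sketched without the course's util.py. The code below is not the SegmentationProblem class from the assignment; it is a standalone uniform-cost search built on Python's heapq, written under these assumptions: a state is the index of the first character not yet segmented, a successor is any later index, and the transition cost is the unigram cost of the word between the two indices. The names segment_words_sketch, toy_cost and KNOWN_WORDS are hypothetical, and the toy cost table is invented for the demonstration.

```python
import heapq

# Hypothetical vocabulary used only to exercise the sketch.
KNOWN_WORDS = {"this", "is", "not", "my", "beautiful", "house"}


def toy_cost(word):
    """Made-up unigram cost: known words are cheap, everything else is not."""
    return 1.0 if word in KNOWN_WORDS else 100.0


def segment_words_sketch(query, unigram_cost):
    """Uniform-cost search over positions in the string.

    Assumes every cost is positive, as the problem statement requires.
    """
    if not query:
        return ""
    frontier = [(0.0, 0)]   # (cost so far, state); state = next index
    best_cost = {0: 0.0}    # cheapest known cost to reach each state
    back = {}               # state -> (previous state, word taken)
    while frontier:
        cost_so_far, state = heapq.heappop(frontier)
        if cost_so_far > best_cost.get(state, float("inf")):
            continue        # stale queue entry, a cheaper path was found
        if state == len(query):
            break           # end state popped: its cost is optimal
        for end in range(state + 1, len(query) + 1):
            word = query[state:end]
            new_cost = cost_so_far + unigram_cost(word)
            if new_cost < best_cost.get(end, float("inf")):
                best_cost[end] = new_cost
                back[end] = (state, word)
                heapq.heappush(frontier, (new_cost, end))
    # Walk the back-pointers from the end state to recover the words.
    words, state = [], len(query)
    while state != 0:
        state, word = back[state]
        words.append(word)
    return " ".join(reversed(words))


if __name__ == "__main__":
    print(segment_words_sketch("thisisnotmybeautifulhouse", toy_cost))
```

In the assignment itself, the same three pieces (start state, end test, successors with costs) would presumably go into the member functions of SegmentationProblem, with the provided search class doing the work of the loop above.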
For convenience, you can run python submission.py to enter a console in which you can type character sequences that will be segmented by your implementation of segmentWords. To request a segmentation, type seg mystring into the prompt. For example:

>> seg
Query (seg):
this is not my beautiful house

Console commands other than seg — namely ins and both — will be used for the upcoming parts of the assignment. Other commands that might help with debugging can be found by typing help at the prompt.
Hint: You are encouraged to refer to NumberLineSearchProblem and GridSearchProblem implemented in util.py for reference. They don't contribute to testing your submitted code but only serve as a guideline for what your code should look like.
Hint: the final actions that ucs (a UniformCostSearch object) takes can be accessed through ucs.actions.
