Automation of the Compilation and Processing of a Hausa Corpus

10203 Words | 41 Pages

By Eno Sam Okon
Supervised by: Dr. Tunde Adegbola
April 2014

ABSTRACT

A spell checker is an indispensable tool for text editing. It can compensate for writers' possibly poor language skills and can identify and correct inevitable typing errors. With over 40 million speakers, Hausa is the second most widely spoken language in Africa, yet it has no standard spell checker. To create a Hausa spell checker, a Hausa corpus was built through data entry and web crawling and then tokenized into a wordlist. The wordlist was cleaned to remove non-Hausa words and to correct typographical and other errors. In addition, to determine how well the modest corpus used for the spell checker covers the Hausa language, the rate at which the wordlist grows in relation to corpus size was measured.

A modest Hausa corpus of 2 million words was compiled. Tokenizing it produced a wordlist of about 30,000 unique Hausa tokens, which cleaning reduced to 23,306. Using Hausa morphology, the wordlist was then compressed into 12,569 stems and 62 affix rules, which together make up the spell checker files. For the corpus study, a 700,000-word sample drawn from the Hausa corpus was divided into separate files, each 20,000 words larger than the previous one, and each file was tokenized. Results showed that, as expected, Hausa morphology was effective for information compression, and a rudimentary spell checker was produced. Furthermore, the corpus study showed that a corpus of 20,000 words yields an average of about 3,000 unique tokens and that the number of new tokens produced decreases with every
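The abstract does not describe how tokens were extracted or counted. As a rough illustration only, the sketch below lower-cases text into word forms and then measures vocabulary growth over corpus prefixes that grow by 20,000 words, mirroring the 700,000-word study. The regular expression, the apostrophe handling for forms such as 'ya'ya, and the function names `tokenize` and `vocabulary_growth` are assumptions, not the author's tools. Counting cumulative prefixes in a single pass gives the same unique-token counts as tokenizing each larger file separately.

```python
import re

# Assumed tokenization rule (not taken from the study): a token is a run of
# letters, optionally starting with or joined by apostrophes, so hooked
# letters such as ɓ, ɗ and ƙ and forms like 'ya'ya stay whole. Digits and
# other punctuation are dropped.
TOKEN_RE = re.compile(r"'?[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenize(text: str) -> list[str]:
    """Lower-case the text, normalise curly apostrophes, return tokens in order."""
    text = text.lower().replace("\u2019", "'")
    return TOKEN_RE.findall(text)


def vocabulary_growth(tokens: list[str], step: int = 20_000) -> list[tuple[int, int, int]]:
    """For prefixes of step, 2*step, ... tokens, return tuples of
    (corpus size, unique tokens so far, new unique tokens in this increment)."""
    seen: set[str] = set()
    rows: list[tuple[int, int, int]] = []
    for start in range(0, len(tokens), step):
        before = len(seen)
        seen.update(tokens[start:start + step])
        size = min(start + step, len(tokens))
        rows.append((size, len(seen), len(seen) - before))
    return rows
```

Applied as `vocabulary_growth(tokenize(text))` to a 700,000-word sample, this would return 35 rows; the third column shows how many new word forms each additional 20,000 words contributes, which is the quantity the study reports as decreasing.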

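The reduction from 23,306 words to 12,569 stems and 62 affix rules suggests a stem-plus-flag dictionary in the style of Hunspell, where each stem lists the affix rules that apply to it. The sketch below is a minimal, hypothetical version of that compression: the three suffix rules in `RULES` are illustrative plural patterns, not the study's rule set, and `compress`, `expand` and `to_dic_lines` are invented names. A word is treated as derived when reversing a rule yields another word already in the list, so the compression is lossless by construction, although it can attach a rule to a stem that is only coincidentally similar.

```python
from collections import defaultdict

# Hypothetical suffix rules, flag -> (strip, add): a stem ending in `strip`
# yields a derived form ending in `add`. `add` must be non-empty.
# Illustrative only; these are not the 62 rules from the study.
RULES: dict[str, tuple[str, str]] = {
    "A": ("a", "aje"),  # gida -> gidaje
    "B": ("a", "oci"),  # mota -> motoci
    "C": ("a", "u"),    # makaranta -> makarantu
}


def compress(words, rules=RULES):
    """Map each kept stem to the set of rule flags it carries."""
    vocab = set(words)
    flags: dict[str, set[str]] = defaultdict(set)
    derived: set[str] = set()
    for word in sorted(vocab):
        for flag, (strip, add) in sorted(rules.items()):
            # Undo the rule: drop `add`, restore `strip`, and accept the
            # result only if that stem is itself in the wordlist.
            if len(word) > len(add) and word.endswith(add):
                stem = word[: -len(add)] + strip
                if stem != word and stem in vocab:
                    flags[stem].add(flag)
                    derived.add(word)
                    break
    # Keep every word that was not derived, plus every stem carrying a flag,
    # so that expand() can regenerate the whole list.
    entries = {w: set() for w in vocab - derived}
    for stem, fs in flags.items():
        entries.setdefault(stem, set()).update(fs)
    return entries


def expand(entries, rules=RULES):
    """Regenerate the full wordlist from stems and their flags."""
    out = set()
    for stem, fs in entries.items():
        out.add(stem)
        for flag in fs:
            strip, add = rules[flag]
            if stem.endswith(strip):
                out.add(stem[: len(stem) - len(strip)] + add)
    return out


def to_dic_lines(entries):
    """Hunspell-style .dic lines: an entry count, then stem or stem/FLAGS."""
    lines = [str(len(entries))]
    for stem in sorted(entries):
        fs = "".join(sorted(entries[stem]))
        lines.append(f"{stem}/{fs}" if fs else stem)
    return lines
```

For instance, compressing `["gida", "gidaje", "mota", "motoci", "makaranta", "makarantu", "ruwa"]` leaves four entries, `gida/A`, `makaranta/C`, `mota/B` and `ruwa`, and `expand` regenerates all seven words.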