I was given product strings for a product so that I could work on the extraction portion. However, the product strings are not in a standard format. Without much information to go on, I guess the design is for the program to pull the strings in at runtime…
The following example shows what a product string looks like. It shows only one unit. A standard file can hold more than a hundred of them, and there are more than a hundred files of this type.
    = PRODUCT_STRING_ID
    * SPECIAL NOTES <---- Optional to exist
    ! INSTRUCTIONS <---- Optional to exist
    < SOURCE STRINGS HERE
    - OLD SOURCE STRINGS HERE
    > TARGET STRINGS HERE
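To get a feel for the layout, here is a minimal Python sketch of how such a unit could be read. It assumes each field sits on its own line, that the first character of the line is the marker, and that a line starting with `=` opens a new unit. The function name `parse_units` and the dictionary keys are my own invention, not part of the product format.

```python
# Marker characters as seen in the example unit; the key names are hypothetical.
MARKERS = {
    "=": "id",
    "*": "notes",
    "!": "instruction",
    "<": "source",
    "-": "old_source",
    ">": "target",
}

def parse_units(text):
    """Split raw file text into units; every field except "id" is a list of lines."""
    units = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] not in MARKERS:
            continue  # assumption: lines without a known marker are skipped
        field = MARKERS[line[0]]
        value = line[1:].strip()
        if field == "id":
            current = {"id": value}  # "=" starts a new unit
            units.append(current)
        elif current is not None:
            current.setdefault(field, []).append(value)
    return units

sample = """\
= PRODUCT_STRING_ID
* SPECIAL NOTES
! TRANSLATE
< SOURCE STRINGS HERE
"""
units = parse_units(sample)
print(units[0]["id"], units[0]["instruction"])
```

Keeping every field as a list of lines means a string that was broken over several lines is not lost; the pieces can be joined later.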
Product strings in bilingual format are strings that were translated in the past. We have to align those past translations to create a Translation Database. Strings that appear in only one language usually have TRANSLATE in the instructions portion, and these have to be extracted for translation. Besides TRANSLATE, one more instruction requires attention: UPDATE. Units with the instruction UPDATE also have to be extracted for translation.
A unit that has the instruction TRANSLATE:
    = PRODUCT_STRING_ID
    * SPECIAL NOTES
    ! TRANSLATE
    < SOURCE STRINGS HERE
And a unit that has the instruction UPDATE:
    = PRODUCT_STRING_ID
    * SPECIAL NOTES
    ! UPDATE
    < NEW SOURCE STRINGS HERE
    - OLD SOURCE STRINGS HERE
    > OLD TARGET STRINGS HERE
Looking at the UPDATE unit, we also need to extract the old source and the old target, to use them as reference when translating the new source strings.
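The selection rule for TRANSLATE and UPDATE units can be sketched roughly like this. I am assuming the units have already been parsed into dictionaries whose fields are lists of lines; the keys (`instruction`, `source`, `old_source`, `target`) and the function name `select_for_translation` are hypothetical.

```python
def select_for_translation(units):
    """Return one job per unit flagged TRANSLATE or UPDATE."""
    jobs = []
    for unit in units:
        instructions = {i.strip().upper() for i in unit.get("instruction", [])}
        if "UPDATE" in instructions:
            # Keep the old source and old target as reference for the translator.
            jobs.append({
                "id": unit["id"],
                "source": unit.get("source", []),
                "reference": {
                    "old_source": unit.get("old_source", []),
                    "old_target": unit.get("target", []),
                },
            })
        elif "TRANSLATE" in instructions:
            jobs.append({
                "id": unit["id"],
                "source": unit.get("source", []),
                "reference": None,  # nothing translated before
            })
    return jobs

units = [
    {"id": "ID_1", "instruction": ["TRANSLATE"], "source": ["NEW TEXT"]},
    {"id": "ID_2", "instruction": ["UPDATE"], "source": ["NEW SOURCE"],
     "old_source": ["OLD SOURCE"], "target": ["OLD TARGET"]},
    {"id": "ID_3", "source": ["DONE SOURCE"], "target": ["DONE TARGET"]},
]
jobs = select_for_translation(units)
print([job["id"] for job in jobs])
```

Units without either instruction, such as `ID_3` here, are left out of the job list; they are the candidates for the Translation Database instead.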
When extracting the strings, there are a few things to take note of. The strings are stored in multiple files and in multiple folders. There is quite a lot of programming code and HTML tagging inside the strings. One product string unit can have multiple sentences in both the source and the target language. And worst of all, the strings are broken into small fragments to fit into the programs, so they appear like this…
    < SOURCE TEXT
    < SOURCE TEXT CONTINUED
    > TARGET TEXT
    > TARGET TEXT CONTINUED
    > TARGET TEXT END...
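Joining such fragments and cutting them again at sentence boundaries could look roughly like the sketch below. It assumes the marker characters have already been stripped, that the fragments were broken at a space (for languages written without spaces the joiner would have to be an empty string), and that sentences end with `.`, `!` or `?` followed by whitespace. That last rule is a naive one of my own and will trip over abbreviations. The function names are hypothetical.

```python
import re

def join_fragments(fragments, joiner=" "):
    """Glue the fragments of one field back into a single string."""
    parts = [f.strip() for f in fragments if f.strip()]
    return joiner.join(parts)

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text):
    """Split joined text into sentence-like segments."""
    return [s for s in _SENTENCE_END.split(text) if s]

fragments = ["Click OK to save", "your changes. The file", "is stored locally."]
text = join_fragments(fragments)
sentences = split_sentences(text)
print(sentences)
```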
So my challenge is how to join the fragments and split them again into proper sentence segments, how to tell programming code and HTML tags apart from the translatable text, how to align the old translations, and which format I should extract my product strings to.
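For the code and tag question, one common approach is to swap the non-translatable pieces for numbered placeholders before segmentation and put them back afterwards. The sketch below is only an illustration: I do not know what the programming code in these strings really looks like, so the pattern assumes HTML-like tags, `%s`/`%d` and `{0}`-style placeholders. The names `mask` and `unmask` and the `[[n]]` placeholder syntax are made up.

```python
import re

# Assumed shapes of non-translatable pieces: HTML-like tags, %s / %d, and {0}.
PROTECTED = re.compile(r"<[^<>]+>|%[sd]|\{\d+\}")
PLACEHOLDER = re.compile(r"\[\[(\d+)\]\]")

def mask(text):
    """Replace each protected piece with [[n]] and return the pieces in order."""
    kept = []

    def swap(match):
        kept.append(match.group(0))
        return f"[[{len(kept) - 1}]]"

    return PROTECTED.sub(swap, text), kept

def unmask(text, kept):
    """Put the protected pieces back in place of their [[n]] placeholders."""
    return PLACEHOLDER.sub(lambda m: kept[int(m.group(1))], text)

original = "Press <b>OK</b> to delete %s files."
masked, kept = mask(original)
print(masked)
print(unmask(masked, kept))
```

This runs on the field text after the line markers are stripped, so the `<` and `>` markers of the unit format do not get confused with tag brackets.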
Things that need to be done:
1. Extract the strings that have the instructions TRANSLATE and UPDATE, for analysis and translation.
2. Align previously completed product strings to create a Translation Database for analysis and translation.
3. …
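For the second item, a first rough pass at collecting past translations might look like the sketch below. It assumes units are already parsed into dictionaries with list-valued fields (the keys are hypothetical), that only units with both source and target and no TRANSLATE or UPDATE instruction count as completed, and that alignment stays at unit level rather than sentence level. CSV is used purely as a stand-in, since the output format is still undecided.

```python
import csv
import io

PENDING = {"TRANSLATE", "UPDATE"}

def aligned_pairs(units):
    """Yield (id, source, target) for completed bilingual units."""
    for unit in units:
        instructions = {i.strip().upper() for i in unit.get("instruction", [])}
        if instructions & PENDING:
            continue  # still waiting for translation
        if not unit.get("source") or not unit.get("target"):
            continue  # not bilingual
        yield unit["id"], " ".join(unit["source"]), " ".join(unit["target"])

def write_pairs(units, handle):
    """Write the aligned pairs as CSV rows with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["id", "source", "target"])
    writer.writerows(aligned_pairs(units))

units = [
    {"id": "ID_1", "source": ["Save", "changes?"], "target": ["TARGET A", "TARGET B"]},
    {"id": "ID_2", "instruction": ["TRANSLATE"], "source": ["New text"]},
]
buffer = io.StringIO()
write_pairs(units, buffer)
print(buffer.getvalue())
```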
