Category Archives: Localization

… Extracting materials…

I was given the strings for a product and asked to work on the extraction portion. However, the product strings are not in a standard format. Without much information to go on, my guess is that the design is to pull the strings into the program at runtime.

The following example shows what a product string looks like. The example shows only one unit. A typical file can hold more than a hundred of them, and there are more than a hundred files of this type.

= PRODUCT_STRING_ID
* SPECIAL NOTES <---- Optional to exist
! INSTRUCTIONS <---- Optional to exist
< SOURCE STRINGS HERE
- OLD SOURCE STRINGS HERE
> TARGET STRINGS HERE

Units in bilingual format, with both source and target present, are strings that were translated in the past. We have to align those past translations to create a Translation Database. Units that appear in only one language usually have TRANSLATE in the instructions line, and these strings are to be extracted for translation. Besides TRANSLATE, one more instruction needs attention: UPDATE. Units with the UPDATE instruction also have to be extracted for translation.

A unit that has the TRANSLATE instruction looks like this:

= PRODUCT_STRING_ID
* SPECIAL NOTES
! TRANSLATE
< SOURCE STRINGS HERE

And a unit that has the UPDATE instruction looks like this:

= PRODUCT_STRING_ID
* SPECIAL NOTES
! UPDATE
< NEW SOURCE STRINGS HERE
- OLD SOURCE STRINGS HERE
> OLD TARGET STRINGS HERE

For an UPDATE unit, we also need to extract the old source and old target, so that they can be used as reference when translating the new source strings.
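Here is a rough sketch of how the units could be read and filtered. It assumes every line of a unit starts with one of the markers shown above (=, *, !, <, -, >) followed by a space, and it silently skips anything else. The names Unit, parse_units and needs_translation are my own, not part of the product's format.

```python
from dataclasses import dataclass, field

# Marker -> field name. The markers come from the examples above;
# the field names are only my own labels.
MARKERS = {
    "*": "notes",
    "!": "instructions",
    "<": "source",
    "-": "old_source",
    ">": "target",
}


@dataclass
class Unit:
    string_id: str
    notes: list = field(default_factory=list)
    instructions: list = field(default_factory=list)
    source: list = field(default_factory=list)
    old_source: list = field(default_factory=list)
    target: list = field(default_factory=list)


def parse_units(text):
    """Split the text of one file into units; a '=' line starts a new unit."""
    units = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        marker, content = line[0], line[1:].strip()
        if marker == "=":
            current = Unit(string_id=content)
            units.append(current)
        elif marker in MARKERS and current is not None:
            # Repeated markers (broken lines) are collected in order.
            getattr(current, MARKERS[marker]).append(content)
    return units


def needs_translation(unit):
    """True for units whose instruction is TRANSLATE or UPDATE."""
    return any(i.upper() in ("TRANSLATE", "UPDATE") for i in unit.instructions)
```

For an UPDATE unit, the old_source and target lists then hold the reference material, while source holds the new strings to translate.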

When extracting the strings, there are a few things to take note of. The strings are stored in multiple files across multiple folders. They contain quite a lot of programming code and HTML tags. One product string unit can have multiple sentences in the source and target languages. Worst of all, the strings are broken into short lines to fit into the program, so they appear like this:

< SOURCE TEXT
< SOURCE TEXT CONTINUED
> TARGET TEXT
> TARGET TEXT CONTINUED
> TARGET TEXT END...

So my challenges are: how to join the broken lines and then split them into proper sentence segments, how to tell the programming code and HTML tags apart from the translatable text, how to align the old translations, and which format I should extract my product strings to.
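As a first attempt at the joining and splitting part, something like the sketch below might do. It assumes the broken lines can simply be joined with a single space, that a sentence ends at ".", "!" or "?" followed by whitespace, and that anything shaped like <...> starting with a letter is an HTML tag. None of that is confirmed for the real product strings, and abbreviations or code fragments will certainly trip up such a naive rule.

```python
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Naive HTML tag pattern: <name ...> or </name>.
TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def join_lines(lines):
    """Join the broken lines of one unit into a single string."""
    return " ".join(part.strip() for part in lines if part.strip())


def split_sentences(text):
    """Split joined text into sentence segments."""
    return [s for s in SENTENCE_END.split(text) if s]


def find_tags(text):
    """List the HTML-like tags found in the text, in order."""
    return TAG.findall(text)


lines = ["Click <b>OK</b> to", "continue. Then restart", "the program."]
joined = join_lines(lines)
print(split_sentences(joined))
print(find_tags(joined))
```

The joined text is split after the tags have been left in place, so a segment keeps its tags and they can be protected or converted in a later step.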

Things that need to be done:
1. Extract the strings that have the TRANSLATE and UPDATE instructions, for analysis and translation.
2. Align previously completed product strings to create a Translation Database for analysis and translation.
3. …


SDLX: Harvest TagEditor files back to SDLX….

Today I kept getting the same error, that the number of segments is different from the source, while trying to harvest TagEditor files (TTX) back into SDLX files (ITD). Weeks ago, I tried to convert a huge Excel file (XLS) to a TagEditor file, but every time the software quit without converting the file successfully. So I chose to use SDLX to do the conversion from Excel to the TagEditor file format.

I suspect this is due to the segmentation settings in the TagEditor file. My guess is that an Excel TagEditor file created using SDLX Exchange must be translated using paragraph segmentation (column) instead of the regular sentence segmentation... hmm... am I right?

It is so frustrating when you are unable to get the target language file cleaned after so many tries using different methods. Tomorrow I will try again using the paragraph segmentation setting. Hope it really works... :s


Okapi Framework… TXT to XLIFF

I was writing a tool to extract text from plain text files and convert it to XLIFF format. I came across this set of tools, the Okapi Framework, while looking for more information on the XLIFF specification. So far I have only tried the text extraction and text rewriting tools in the framework. It is a very versatile application that can extract translatable text from various formats and create XLIFF files. It provides an easy interface that lets users create and test filters using regular expressions.

Script Filter Tool

Okapi is an open source application, so it is FREE!! It was created to help localizers develop new localization processes or enhance existing ones to meet their needs.

After trying the text extraction tool, I personally think it is now easier to extract text to XLIFF format. However, it only lets you use <ph> for the inline codes in the extracted text, so I am wondering whether there is any way to use the other inline elements that XLIFF supports, such as <g> and <x/>. It would be great if it could also support merged-trans, seg-source, etc.
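To show what I mean by <ph> inline codes, here is a small sketch of my own (not Okapi's output or API) that wraps each HTML-like tag of a string in a <ph> element inside an XLIFF 1.2 trans-unit. The file name strings.txt, the default source language and the function names are made up for the illustration, and the tag pattern is the same naive guess as before.

```python
import re
import xml.etree.ElementTree as ET

# Naive pattern for HTML-like tags; real product strings may need more.
TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def build_source(text):
    """Build a <source> element, wrapping each tag in a <ph> placeholder."""
    source = ET.Element("source")
    last = None  # most recent <ph>; the text after it goes in its tail
    pos = 0
    for n, match in enumerate(TAG.finditer(text), start=1):
        chunk = text[pos:match.start()]
        if last is None:
            source.text = chunk
        else:
            last.tail = chunk
        last = ET.SubElement(source, "ph", id=str(n))
        last.text = match.group(0)  # escaped automatically on output
        pos = match.end()
    rest = text[pos:]
    if last is None:
        source.text = rest
    else:
        last.tail = rest
    return source


def to_xliff(units, source_lang="en"):
    """units: iterable of (string_id, source_text) pairs."""
    xliff = ET.Element(
        "xliff",
        {"version": "1.2", "xmlns": "urn:oasis:names:tc:xliff:document:1.2"},
    )
    file_el = ET.SubElement(
        xliff,
        "file",
        {
            "original": "strings.txt",  # hypothetical file name
            "source-language": source_lang,
            "datatype": "plaintext",
        },
    )
    body = ET.SubElement(file_el, "body")
    for string_id, text in units:
        unit = ET.SubElement(body, "trans-unit", id=string_id)
        unit.append(build_source(text))
    return ET.tostring(xliff, encoding="unicode")


print(to_xliff([("PRODUCT_STRING_ID", "Click <b>OK</b> now")]))
```

With <ph>, the opening and closing tags become two unrelated placeholders. A <g> element would instead wrap the text between them, which is why I would like to have the choice.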


Trados: The process cannot access the file because it is being used by another process

Recently, I kept getting the same irritating message, "The process cannot access the file because it is being used by another process", whenever I tried to process my working files using Trados. At first, I thought it might be because Microsoft Word (winword.exe) was running in the background. However, even after making sure that it was not running and restarting Trados, I still got the same error message.

So I examined my machine carefully and killed the processes one by one in the Task Manager. I found that after I removed the virus scanner, Trados no longer gave me the error message. I wonder whether there are other causes that produce the same error message, such as externally linked objects in the files? I am not sure...


Analyze and process all files in a folder, as well as its sub-folders, in Trados

One of the tools I use very frequently for analysis and for pre- and post-engineering of files is Trados. Usually, to process files in Trados Workbench, I would use Windows Explorer to open the folder containing all the materials, then drag and drop them into Trados Workbench to analyze, pre-process or post-process them. Although this is a commonly used method, it is not very efficient if the working folder contains sub-folders. Windows Explorer cannot display all the files that reside in the sub-folders in one go, and going into each sub-folder in turn to add the files is troublesome and "dangerous", because it is easy to miss some files.

I started to explore shortcuts that would let me perform the same process in one go. One method that I now use very frequently is the search command in Windows Explorer. You point it at the working folder that contains all the files and run a search. The search results will include all the working files in the folder as well as in its sub-folders. You can then simply select all the files and drag them into Trados Workbench for processing. This saves a lot of the time otherwise needed to add and process the files manually.
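If you prefer a script to the Explorer search, the same kind of list can be produced with a few lines of Python. This is only a sketch for checking that no file was missed. It does not drive Trados in any way, and the extensions are just examples that you should replace with your own.

```python
from pathlib import Path


def collect_files(root, extensions=(".doc", ".rtf", ".ttx")):
    """Return every file under root, sub-folders included, with a wanted extension."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )


if __name__ == "__main__":
    # "." is a stand-in for the working folder.
    for path in collect_files("."):
        print(path)
```

The count of the printed paths can be compared against the number of files Trados Workbench reports after the drag and drop.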

