chicken: (Default)
[personal profile] chicken
Probably no one knows this since it is way geeky. Oh well. But any help would be appreciated.

Anyhow, here's the sitch:

With Java, one can use InputStreamReader and OutputStreamWriter to convert files from one character set to another. In my case, from cp1252 (aka Windows-1252) to MacRoman or vice versa.
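For comparison, here is the same decode-then-encode idea as a minimal Python sketch rather than Java. The function names `convert_bytes` and `convert_file` are made up for illustration; `cp1252` and `mac_roman` are the names Python's standard codecs use for these two character sets.

```python
from pathlib import Path


def convert_bytes(data: bytes, source: str = "cp1252", target: str = "mac_roman") -> bytes:
    """Decode bytes from one 8-bit character set and re-encode them in another."""
    # Strict decoding/encoding: a byte the source codec does not define, or a
    # character the target codec cannot represent, raises a UnicodeError
    # instead of being silently mangled.
    text = data.decode(source)
    return text.encode(target)


def convert_file(src_path: str, dst_path: str, source: str = "cp1252", target: str = "mac_roman") -> None:
    """Hypothetical helper: convert a whole file, writing the result to dst_path."""
    converted = convert_bytes(Path(src_path).read_bytes(), source, target)
    Path(dst_path).write_bytes(converted)
```

Swapping the `source` and `target` arguments gives the reverse conversion. Because the conversion is strict, a file that is not really in the assumed source encoding may fail loudly, which is arguably better than producing garbage quietly.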

But the real question is how to tell which files to bother converting. Let's say I have a bunch of files derived from another process, and some of them are of Windows origin and some of Macintosh origin. I don't know which is which, and I want to decide programmatically which ones to convert from, say, MacRoman to cp1252. Not necessarily using Java, but using any tool that will work: Perl, PHP, grep, whatever.

So e.g. this sort of thing (in pseudo code of indeterminate language):

while (looping through the files) {
    if (HasMacRomanCharacters($thisFile)) {
        runJavaProgramToConvertToCp1252($thisFile);
    }
}

???? Help!

If no one can help, I will devise some really hacky thing.

Visually I can tell -- if I use 'less' to look at a file and see <8F>, I know I have an &egrave; (i.e. è) in MacRoman. If I use 'less' to look at a file and see <E8>, I know I have an &egrave; (i.e. è) in cp1252. And I know what the hex equivalents of these are. And I know emacs displays them differently again. But I can't use grep or something like that to search for these successfully, so I am kind of stumped.
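One possible way to automate that visual check is to read each file as raw bytes and count where the high bytes fall. This is only a rough sketch under stated assumptions, not a reliable detector: MacRoman keeps its accented lowercase letters in 0x80-0x9F (è is 0x8F), cp1252 keeps its accented letters in 0xC0-0xFF (è is 0xE8), and the text is mostly a Western European language. The names `guess_encoding` and `mac_roman_files` are invented for illustration.

```python
from pathlib import Path

# Bytes cp1252 uses for the ellipsis, curly quotes and dashes. They fall in the
# MacRoman letter range, so they are left out of the MacRoman hints.
CP1252_PUNCT = {0x85, 0x91, 0x92, 0x93, 0x94, 0x96, 0x97}
MAC_HINTS = set(range(0x80, 0xA0)) - CP1252_PUNCT

# Bytes MacRoman uses for the ellipsis, dashes and curly quotes. They fall in
# the cp1252 letter range, so they are left out of the cp1252 hints.
MAC_PUNCT = {0xC9, 0xD0, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5}
WIN_HINTS = set(range(0xC0, 0x100)) - MAC_PUNCT


def guess_encoding(data: bytes) -> str:
    """Return 'mac_roman', 'cp1252', or 'unknown' based on high-byte counts."""
    mac = sum(1 for b in data if b in MAC_HINTS)
    win = sum(1 for b in data if b in WIN_HINTS)
    if mac > win:
        return "mac_roman"
    if win > mac:
        return "cp1252"
    return "unknown"  # pure ASCII, only ambiguous bytes, or a tie


def mac_roman_files(paths):
    """Yield the paths whose contents look like MacRoman (the files worth converting)."""
    for path in paths:
        if guess_encoding(Path(path).read_bytes()) == "mac_roman":
            yield path
```

Files that come back as "unknown" are either plain ASCII, in which case no conversion is needed, or too ambiguous for this heuristic and probably worth a look by eye. The result of `mac_roman_files` could then be fed to whatever conversion program is already in place.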
