chicken: (Default)
[personal profile] chicken
Probably no one knows this since it is way geeky. Oh well. But any help would be appreciated.

Anyhow, here's the sitch:

With Java, one can use InputStreamReader and OutputStreamWriter to convert files from one character set to another. In my case, from cp1252 (aka Windows) to MacRoman or vice versa.
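For anyone following along, here is a minimal sketch of that kind of conversion, pairing an InputStreamReader with an OutputStreamWriter. The class name CharsetConverter and the method name convert are made up for illustration, and the charset names "x-MacRoman" and "windows-1252" are assumptions about what the installed JDK provides (MacRoman usually ships with the extended charsets, so a trimmed-down runtime may not have it).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not from the original post.
class CharsetConverter {
    // Assumed charset names; Charset.forName throws if the JDK lacks them.
    static final Charset MAC_ROMAN = Charset.forName("x-MacRoman");
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decodes everything from `in` using `from`, re-encodes it to `out` using `to`.
    // Closing the reader and writer also closes the underlying streams.
    static void convert(InputStream in, Charset from, OutputStream out, Charset to) {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, from));
             Writer writer = new BufferedWriter(new OutputStreamWriter(out, to))) {
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage: CharsetConverter <macroman-input> <cp1252-output>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CharsetConverter <macroman-input> <cp1252-output>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]));
             OutputStream out = Files.newOutputStream(Paths.get(args[1]))) {
            convert(in, MAC_ROMAN, out, CP1252);
        }
    }
}
```

Swapping the two charset arguments converts in the other direction. Characters that exist in one set but not the other cannot survive the round trip, so it is worth spot-checking the output.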

But the real question is, how to tell which files to bother converting. Let's say I have a bunch of files derived from another process, and some of them are of Windows origin and some of Macintosh origin. But I don't know which, and want to programmatically decide which ones to bother converting from, say, MacRoman to cp1252. And not necessarily using Java, but using any tool which will work, Perl, PHP, grep, whatever.

So e.g. this sort of thing (in pseudo code of indeterminate language):

while (looping through the files) {
    if (HasMacRomanCharacters($thisFile)) {
        runJavaProgramToConvertToCp1252($thisFile);
    }
}

???? Help!

If no one can help, I will devise some really hacky thing.

Visually I can tell -- if I use 'less' to look at a file and see <8F>, I know I have an &egrave; (i.e. è) in MacRoman. If I use 'less' to look at a file and see <E8>, I know I have an &egrave; (i.e. è) in cp1252. And I know what the hex equivalents of these are. And I know emacs displays them differently yet. But I can't use grep or something like that to search for these successfully, so I am kind of stumped.
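One way to turn that visual check into code is to score the raw bytes. The sketch below is only a heuristic under stated assumptions: it assumes mostly Western European text, where MacRoman puts its accented letters in 0x80-0x9F and cp1252 puts its lowercase accented letters in 0xE0-0xFF. The class name EncodingGuesser, the Guess enum, and the scoring weights are all invented for illustration, and short or punctuation-heavy files can easily fool it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical heuristic, not a real detector: it only counts suggestive bytes.
class EncodingGuesser {
    enum Guess { MAC_ROMAN, CP1252, UNKNOWN }

    static Guess guess(byte[] data) {
        int macScore = 0;
        int winScore = 0;
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v == 0x81 || v == 0x8D || v == 0x8F || v == 0x90 || v == 0x9D) {
                // Unassigned in cp1252, but letters in MacRoman (0x8F is è).
                macScore += 2;
            } else if (v == 0x85 || (v >= 0x91 && v <= 0x97)) {
                // Ellipsis, curly quotes, bullet and dashes in cp1252, but
                // accented letters in MacRoman: too ambiguous to count.
            } else if (v >= 0x80 && v <= 0x9F) {
                // Accented letters in MacRoman, rarer symbols in cp1252.
                macScore++;
            } else if (v >= 0xE0) {
                // Lowercase accented letters in cp1252 (0xE8 is è).
                winScore++;
            }
        }
        if (macScore == winScore) {
            return Guess.UNKNOWN; // includes pure-ASCII files, which need no conversion
        }
        return macScore > winScore ? Guess.MAC_ROMAN : Guess.CP1252;
    }

    // Usage: EncodingGuesser <file>
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: EncodingGuesser <file>");
            return;
        }
        System.out.println(guess(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

A file that comes back UNKNOWN is best left alone or checked by hand. Only files guessed as MAC_ROMAN would be passed on to the conversion step.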

(no subject)

Date: 2003-08-23 02:17 pm (UTC)
From: [identity profile] paulv.livejournal.com
The differences I always thought Unix, Mac, and Windows had with regard to files were as follows:

* Unix ends lines in a file with \n
* Windows ends lines in a file with \r\n
* Macs end lines in a file with \r

It sounds like you may need to do more than just replace the line breaks, though. Maybe you can look for \r or \r\n to figure out what kind of file you've got.

If all you need to do is replace the line endings, you can do

perl -pe 's/\r\n/\r/g' < windowsfile.txt > macfile.txt

but the important thing is to replace \r\n with \r.
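If looking at the line endings is worth a try, a rough sketch of the check might look like this. The class name LineEndingSniffer and its return labels are made up for illustration, and the result only describes the line-ending convention, which may or may not say anything about the character set.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports which line-ending style a file uses.
class LineEndingSniffer {
    // Returns "CRLF", "CR", "LF", "mixed", or "none".
    static String classify(byte[] data) {
        int crlf = 0;
        int cr = 0;
        int lf = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the \n that belongs to this \r\n pair
                } else {
                    cr++;
                }
            } else if (data[i] == '\n') {
                lf++;
            }
        }
        if (crlf + cr + lf == 0) return "none";
        if (cr == 0 && lf == 0) return "CRLF"; // Windows style
        if (crlf == 0 && lf == 0) return "CR"; // classic Mac style
        if (crlf == 0 && cr == 0) return "LF"; // Unix style
        return "mixed";
    }

    // Usage: LineEndingSniffer <file>
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LineEndingSniffer <file>");
            return;
        }
        System.out.println(classify(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```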

(no subject)

Date: 2003-08-23 06:58 pm (UTC)
From: [identity profile] chicken-cem.livejournal.com
Line ends aren't definitive and haven't been for some time. Especially with Mac OS X being a weird BSD-Mac hybrid, any given application could give you either \r or \n, depending on the application, and many applications have a preference setting to pick one or the other, or even to use windoze-style line ends.

Besides, the application in question gets data piped in from a web form.

Therefore I was thinking of just checking $_SERVER["HTTP_USER_AGENT"] with PHP or the User-Agent header with Java, to see what the client is, embed that string in a predictable place inside the results transcript, and then use that as a boolean test in the next script.
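A rough sketch of the User-Agent test in Java, assuming the header string has already been pulled out of the request (in a servlet that would typically be request.getHeader("User-Agent")). The class name ClientPlatform, the method name fromUserAgent, and the substrings it looks for are illustrative guesses, and the browser's platform is only a hint about the encoding of the submitted text.

```java
import java.util.Locale;

// Hypothetical helper: maps a User-Agent string to a coarse platform label.
class ClientPlatform {
    // Returns "windows", "mac", or "unknown".
    static String fromUserAgent(String userAgent) {
        if (userAgent == null) {
            return "unknown";
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        if (ua.contains("windows")) {
            return "windows";
        }
        // Assumed tokens: "Macintosh" and "Mac OS" for most Mac browsers,
        // "Mac_PowerPC" for older Internet Explorer on the Mac.
        if (ua.contains("macintosh") || ua.contains("mac os") || ua.contains("mac_powerpc")) {
            return "mac";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Sample header value used only for demonstration.
        String sample = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
        System.out.println(fromUserAgent(sample));
    }
}
```

The returned label could be written into the transcript next to the form data, so the later script can decide whether to run the MacRoman-to-cp1252 conversion.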
