Parsing an ANSI-coded text file in HTML/Javascript

For parsing a table from an arbitrarily ordered text file, PapaParse is the tool of choice, and indeed it works wonderfully, see e.g. here. There is one little problem if the file to be processed is not UTF-8-encoded, as the browser typically expects, but ANSI-encoded. PapaParse does have an option to specify the character encoding, but it has no effect on a remote file. Resolving this issue turned out to be not so simple, and I scouted all over the internet for a solution. A combination of hints finally produced the solution that I share with you here.
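To make the problem concrete, here is a minimal sketch that is not part of the demo page. It assumes an environment with the standard TextDecoder API (any current browser, or a recent Node.js build with full ICU), and the four example bytes are an illustration chosen here, not taken from demo.txt. It decodes the same bytes once as UTF-8 and once as Windows-1252, which is the encoding usually meant by "ANSI".

```javascript
// Four bytes as they would sit in an ANSI (Windows-1252) file: "café".
const ansiBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Decoded as UTF-8, the lone byte 0xE9 is an incomplete sequence and is
// replaced by U+FFFD, the replacement character.
const asUtf8 = new TextDecoder('utf-8').decode(ansiBytes);

// Decoded as Windows-1252, the same byte maps to "é".
const asAnsi = new TextDecoder('windows-1252').decode(ansiBytes);

console.log(JSON.stringify(asUtf8));
console.log(JSON.stringify(asAnsi));
```

The garbled character in the first result is what typically shows up on screen when an ANSI file is read as if it were UTF-8.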

Let me present a working HTML file that does everything that was desired:

  • Opening and parsing the remote file as soon as the document is presented on the user’s screen, so that the file contents can be used right away.
  • Properly decoding the characters according to ANSI.
  • Using a non-standard delimiter between the columns.

There is one caveat that cannot be avoided: the HTML code only works when it is served from a proper (remote) server, not when it is opened from a local computer, unless a mini-server is employed. See here how I solve this.

The first part contains the HTML code: it sets the character encoding to UTF-8 (not always required) and loads the required JavaScript files. The body contains only a div block (line 9) where the parsed list will land so that it can be displayed on the screen.

<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.1.2/papaparse.min.js"></script>
</head>

<body>
 <div id="parsed_csv_list"></div>
</body>
</html>

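The second part contains the JavaScript code that is needed to open and parse the file. In demo.htm it sits inside a script element; the listing omits the opening <script> tag, so add it before the first line.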

$( document ).ready(function(){
  $.ajax({
    url: 'demo.txt',
    type: 'get',
    contentType: "text/csv; charset=iso-8859-1",
    beforeSend: function(jqXHR) {
      jqXHR.overrideMimeType('text/html;charset=iso-8859-1');  // decode the response as ANSI
    },
    success: function(adata){
      Papa.parse(adata, {
        encoding: "iso-8859-1",
        delimiter: "|",
        quoteChar: '/',
        header: false,
        complete: function (results){
          var data = results.data;  // array of rows, each an array of field strings
          console.log("Success?", results);
          console.log("Row errors:", results.errors);
          displayHTMLTable(results);
        }
      });
    },
    error: function (xhr, ajaxOptions, thrownError) {
      var errorMsg = 'Ajax request failed: ' + xhr.responseText;
      $('#parsed_csv_list').html(errorMsg);  // the only div in the page
    }
  });
});
</script>

Most of the lines are pretty standard, even though putting them together to achieve the desired result is far from trivial. Usually, a button event is used to trigger the download of the file. Here, the trigger is the event that the document has been loaded and is ready to be displayed on the screen. An AJAX GET request opens the file demo.txt, which is specified in line 3. The important parts are lines 5-8 and 11, where the ANSI character set is specified, here as iso-8859-1. Line 5 is the usual contentType setting of the AJAX request. That turns out not to suffice, because contentType describes the request that is sent, not the response that comes back. Therefore the character set must also be set before the file is accessed, with overrideMimeType in lines 6-8, which tells the browser how to decode the response. In line 11 it is specified once more, for PapaParse. According to the PapaParse documentation this option applies to local files, so it is probably redundant here.
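For comparison, the same decoding step can be sketched without jQuery. This is a different technique from the one in the listing, not a replacement for it: it assumes a browser (or runtime) that provides fetch and TextDecoder, and the function names decodeAnsi and loadAnsiText are hypothetical names introduced for this illustration. The file is fetched as raw bytes and decoded explicitly as Windows-1252, which is also what browsers use when they are given the label iso-8859-1.

```javascript
// Decode raw bytes as Windows-1252 ("ANSI") into a JavaScript string.
function decodeAnsi(bytes) {
  return new TextDecoder('windows-1252').decode(bytes);
}

// Hypothetical loader: fetch the file as bytes, so that the browser does
// not decode it as UTF-8, then decode the bytes explicitly.
async function loadAnsiText(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  const buffer = await response.arrayBuffer();
  return decodeAnsi(new Uint8Array(buffer));
}

// Possible usage in the page, with Papa and displayHTMLTable as in this post:
// loadAnsiText('demo.txt').then(function (text) {
//   displayHTMLTable(Papa.parse(text, { delimiter: '|', quoteChar: '/', header: false }));
// });
```

Because the text is already a correctly decoded string when it reaches PapaParse, no encoding option is passed to it in this variant.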

Finally, in line 19 a little routine is called to build the HTML table from the information in the text file. This routine was obtained from here, although it required some correction and modification.

<script>
// Build an HTML table from the PapaParse results and show it in the div.
function displayHTMLTable(results){
  var table = "<table class='table'>";
  var data = results.data;           // array of rows
  for (var i = 0; i < data.length; i++) {
    table += "<tr>";
    var cells = data[i];             // array of fields in this row
    for (var j = 0; j < cells.length; j++) {
      table += "<td>";
      table += cells[j];
      table += "</td>";
    }
    table += "</tr>";
  }
  table += "</table>";
  $("#parsed_csv_list").html(table);
}
</script>
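One thing to keep in mind is that displayHTMLTable puts the cell text into the page as raw HTML, so a field containing < or & may disturb the table. A minimal sketch of a variant that escapes each cell first could look as follows. The names escapeHtml and buildTableHtml are hypothetical helpers introduced here, and the function only builds the string, so it can be tried without a page.

```javascript
// Replace the characters that have a special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build the table markup from an array of rows (each an array of fields),
// which is the shape of results.data when header is false.
function buildTableHtml(rows) {
  var html = "<table class='table'>";
  for (var i = 0; i < rows.length; i++) {
    html += '<tr>';
    for (var j = 0; j < rows[i].length; j++) {
      html += '<td>' + escapeHtml(rows[i][j]) + '</td>';
    }
    html += '</tr>';
  }
  return html + '</table>';
}
```

In the page, the result would be handed to the div in the same way as before, for example with $("#parsed_csv_list").html(buildTableHtml(results.data)).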


The three parts go into one file, demo.htm, which is to be uploaded to a web host together with the data file demo.txt so that it can be tested. For your convenience, the two files are in this ZIP-file.
