Galina Besselyanova

C# HTML to PDF

Hello experts,
I have HTML code that needs to be converted to PDF.
The HTML contains a table populated from a recordset.
Please check the code. It works, except that for some reason the PDF cuts off the last row of the table.
        private void GenerateReport(string Html, HttpContext context)
        {
            MemoryStream stream = createPDF(Html);

            context.Response.ContentType = "application/pdf";
            context.Response.AddHeader("Content-Disposition", "attachment; filename=\"Report.pdf\"");
            context.Response.BinaryWrite(stream.ToArray());
       
        }

        private MemoryStream createPDF(string html)
        {
            MemoryStream msOutput = new MemoryStream();
            TextReader reader = new StringReader(html);

            Document document = new Document(PageSize.A4,10f,10f,10f,0f);
            
            PdfWriter writer = PdfWriter.GetInstance(document, msOutput);

            HTMLWorker worker = new HTMLWorker(document);
    
            document.Open();
            worker.StartDocument();

            worker.Parse(reader);
            worker.EndDocument();
            worker.Close();
            document.Close();

            return msOutput;
        }



If I render just the HTML, all rows are displayed. The generated PDF document does not include the last row.
Please help.
Thank you.
Robberbaron (robr)

1. Are you sure your HTML is terminated correctly? Many browsers fix bad HTML for you.

2. Can you paste the last part of the HTML as displayed in 'view-source'?

3. What library are you using? I had problems with iTextSharp and ended up running the WebKit command-line tool on a separate thread, which gave excellent and consistent results.

        /// <summary>
        /// Runs WebKit PDF command line convertor
        /// http://code.google.com/p/wkhtmltopdf/
        /// </summary>
        /// <param name="sRawUrl"></param>
        /// <returns></returns>
        ///
Galina Besselyanova

ASKER

The HTML page is not displayed. When someone clicks on the link
<a id="Summary" title="List" href="/handlers/file.ashx" target="_blank">Click</a>


it does not generate an HTML page; it generates an HTML string and then converts it into a PDF (see the code above). How can I check the source of the generated HTML?
We do use iTextSharp.

I follow exactly the same process: create the HTML as a string and then convert it to PDF. You could write the string to the console or to a text file to test.
As I said, I tried iTextSharp but gave up because it couldn't handle my formatted HTML with CSS.
So I write the string to a temp file and then send that file to WebKit for PDF output. It works well but needs the external WebKit files to be available.
That is OK for me, as my app is intranet only.

We actually convert an HTML string into PDF, and this runs on our website. Also, our website is based on iParts, so we will have to stay with iTextSharp, at least for now.
However, your idea of writing the string to the console or a text file might work. It could show us what is wrong.
Can you please show me an example of how to do this? I would really appreciate it.
Thanks!

This includes my calls to WebKit on a separate thread, but it also writes the incoming HTMLCode to a temporary file. You can use whatever path and filename you want.

                // The using block closes the file even if the write throws.
                using (StreamWriter sWriter = File.CreateText(myPathFile))
                {
                    sWriter.WriteLine(HTMLCode);
                }


        #region WebKit
        /// <summary>
        /// Runs WebKit PDF command line convertor
        /// http://code.google.com/p/wkhtmltopdf/
        /// </summary>
        /// <param name="sRawUrl"></param>
        /// <returns></returns>
        /// 
        private string _WebKitFiles = "DocMan_Files";
        private void ConvertHTMLToPDF_Wk(string HTMLCode)
        {
            string sFileName = ""; //GetNewName();
            string sPage = sFileName + ".html";
            //docman_files

            if (HTMLCode == "")
            {
                HTMLCode = "<HTML><HEAD><title>Blank data</title></head><body><h1>Blank document</h1></body></html>";
            }
            // wkhtmltopdf long options take a double dash.
            string GlobOptions = "--orientation Portrait --page-size A4 --title " + _DocInfo_title;
            StringWriter sw = new StringWriter();

            //Server.Execute(sUrlVirtual, sw);
            using (TemporaryFile htmlfile = new TemporaryFile(false, "HTML"))
            {
                // Write the HTML to the temp file; the using block closes it even if the write throws.
                using (StreamWriter sWriter = File.CreateText(htmlfile.Path))
                {
                    sWriter.WriteLine(HTMLCode);
                }


                _threadArgs = RG_Utils.StringManip.Quoted(htmlfile.Path) + " " + RG_Utils.StringManip.Quoted(_PDFName);
                _threadWorkingDir = RG_Utils.StringManip.Quoted(System.AppDomain.CurrentDomain.BaseDirectory + WbKitFileLocation);
                _threadApp = RG_Utils.StringManip.Quoted(System.AppDomain.CurrentDomain.BaseDirectory + WbKitFileLocation + @"\" + "wkhtmltopdf.exe");

                System.Threading.ThreadStart job = new System.Threading.ThreadStart(ThreadStart);
                System.Threading.Thread thread = new System.Threading.Thread(job);
                thread.Start();

                // Wait for NewThread to terminate.
                thread.Join();
            }

        }
        #endregion
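
The method above relies on helpers that are not shown in the thread (the ThreadStart method, RG_Utils.StringManip.Quoted, TemporaryFile). As a rough, self-contained sketch of the same idea, the converter could also be launched directly with System.Diagnostics.Process. The class name WkHtmlToPdfSketch, its method names, and the exePath parameter are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.Diagnostics;

public static class WkHtmlToPdfSketch
{
    // Wraps a value in double quotes so a path with spaces stays one argument.
    public static string Quote(string value)
    {
        return "\"" + value + "\"";
    }

    // Builds the command line: options first, then the input file, then the output file.
    public static string BuildArguments(string htmlPath, string pdfPath)
    {
        return "--orientation Portrait --page-size A4 "
             + Quote(htmlPath) + " " + Quote(pdfPath);
    }

    // Runs the converter, waits for it to finish, and returns its exit code.
    // exePath is wherever wkhtmltopdf.exe lives on your machine (an assumption).
    public static int Convert(string exePath, string htmlPath, string pdfPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(htmlPath, pdfPath),
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A non-zero return value from Convert usually means the conversion failed, so it is worth checking before sending the PDF to the client.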


Great. I'll try it and let you know how it goes.
Thanks!

Hi,
We tested the HTML, and the generated HTML contains all records. So the issue is probably in the conversion to PDF.
Please, any ideas? The code is above.

As before, look very carefully at the end of the HTML. Can you post the last 2 rows?
Are all rows properly terminated?

1. Try pasting your HTML into http://validator.w3.org/#validate_by_input

2. Try a small set of your data through iTextSharp. As I said, I had problems with it parsing HTML.
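
One quick way to act on "are all rows properly terminated?" is to count the cells in each row of the generated string. A browser will render a short row without complaint, while a stricter HTML parser may not. The sketch below is only an illustration: TableRowCheck and its methods are hypothetical names, and the regex approach assumes simple tables with no nesting and no colspan.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableRowCheck
{
    // Returns the number of <td>/<th> cells found in each <tr> of the HTML string.
    public static List<int> CellsPerRow(string html)
    {
        var counts = new List<int>();
        MatchCollection rows = Regex.Matches(html, @"<tr\b[^>]*>(.*?)</tr>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match row in rows)
        {
            int cells = Regex.Matches(row.Groups[1].Value, @"<t[dh]\b",
                RegexOptions.IgnoreCase).Count;
            counts.Add(cells);
        }
        return counts;
    }

    // True when every row has the same number of cells (or there are no rows).
    public static bool IsRectangular(string html)
    {
        List<int> counts = CellsPerRow(html);
        return counts.TrueForAll(c => c == counts[0]);
    }
}
```

Calling TableRowCheck.CellsPerRow on the HTML string just before it is handed to the PDF converter gives one number per row, so a row that is shorter than the others stands out immediately.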

Here is the generated HTML string. As you can see, there are 3 rows with 5 records.
But when this string is converted into PDF, the last row is missing from the PDF document.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<body style="font-family: Arial, Helvetica, sans-serif; font-size: 9px; line-height: 1.1em;" bgcolor="FFFFFF#" link="CC3300#" vlink="333300"  leftmargin="0" topmargin="5" marginwidth="0" marginheight="0" alink="333300">
<p align="center" style="color:003366;font-size:16px;font-weight:bold;font-family: Arial, Helvetica, sans-serif;">
NEW YORK CITY<br />
<em>Committee List</em></p>
<br />
<p align="center" style="color:003366;font-size:12px;font-weight:bold;font-family: Arial, Helvetica, sans-serif;">AIDS Committee (5)</p>
<div style="clear:both;"></div><br />
<div>
<table>
<tr>
	<td valign="top">
	<b><em>Chair</em></b>- <b><em>10-28-2011</em></b><br />
	Ly Neuer, Esq.<br />
	Safe Law Proj<br />
	150 Court St<br />Rm 1600<br />Brooklyn, NY &nbsp; 10001<br />
	Phone: (555) 555-55555<br />Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:ler@san.org">ler@san.org</a><br />
	</td>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>05-01-2012</em></b><br />
	Alt Ren, Esq.<br />
	860 E 63rd St<br />
	New York, NY &nbsp; 10011<br />
	Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:albertrchen@gmail.com">an@gmail.com</a><br />
	</td>
</tr>
<tr>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>05-01-2012</em></b><br />
	Doy Chr, Esq.<br />The Bronx Defenders<br />
	1760 Ave<br />Bronx, NY &nbsp; 10651<br />
	Phone: (555) 555-55555<br />Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:chy@yahoo.com">chy@yahoo.com</a><br />
	</td>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>06-25-2013</em></b><br />
	Last Join<br />42 east 46th Street<br />New York, NY &nbsp; 10011<br />
	United States<br />Email: <a style="color:blue;" href="mailto:as@as.com">as@as.com</a><br />
	</td>
</tr>
<tr>
	<td valign="top"><b><em>Member</em></b>- <b><em>03-06-2013</em></b><br />
	Chott Ho, Esq.<br />1180 Heat St<br />New York, NY &nbsp; 10011<br />
	United States<br />Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:sgl@nyc.org">sgl@nyc.org</a><br />
	</td>
</tr>
</table>
</div>
</body>
</html>



I can't find anything wrong. Am I missing something?
Thank you.
ASKER CERTIFIED SOLUTION
Robberbaron (robr)

Great! Thank you so much!
When generating the HTML, I have a variable that keeps track of the number of columns.
So I added the following at the end, and everything is working.
            if (columnCounter == 2)
            {
                html += "</tr></table></div></body></html>";
            }
            else
            {
                html += "<td></td></tr></table></div></body></html>";
            }
Thank you!
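
The fix above pads the last row with an empty cell when it is one column short. The same idea can be sketched for any column count: fill every row to the full width while building the table. This is not the asker's actual generator; TableBuilderSketch and BuildTable are hypothetical names, and the cell contents are assumed to be ready-made HTML fragments.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TableBuilderSketch
{
    // Lays the cells out in rows of `columns` cells each and pads the final
    // row with empty <td></td> cells so every row has the same cell count.
    public static string BuildTable(IList<string> cells, int columns)
    {
        if (columns < 1) throw new ArgumentOutOfRangeException("columns");
        var html = new StringBuilder("<table>");
        for (int i = 0; i < cells.Count; i += columns)
        {
            html.Append("<tr>");
            for (int c = 0; c < columns; c++)
            {
                int index = i + c;
                html.Append("<td>");
                if (index < cells.Count) html.Append(cells[index]);
                html.Append("</td>");
            }
            html.Append("</tr>");
        }
        html.Append("</table>");
        return html.ToString();
    }
}
```

With five records and two columns, as in the HTML posted earlier, the third row would come out with the one record plus one empty cell, which is what the if/else above achieves by hand.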