How to realize a pdf export?

Marcel · April 5, 2011, 1:44pm

Hi all,

the problem is as follows:
We created an application with a table (that shows a result set from a db query) with sorting and filtering possibility.

Now there is a requirement to be able to export the content of the table to a PDF file. The sort and filter criteria must be respected (so I assume it is absolutely necessary to go via the container data source).

Do you know a solution for creating the pdf export file as dynamically as possible? We don’t want to implement any logic twice.

Another use case is a “normal” page (no filtered/sorted table results, but many labels), e.g. some information about the “user data” (user name, email address, city…).

The intention is to export “what you see”, without re-building and re-styling the information only for export.

I hope I have explained the problem clearly and that you have some good ideas.

Marko1 · April 5, 2011, 3:50pm

I think there are many solutions to your question, which is a very common one. I’d probably try formatting the output with XSL FO and then using a FO processor, such as XEP or Apache FOP, to generate the PDF.

FOP at least has a Java API, so you should be able to run the processor totally dynamically in-memory with streams. The docs for XEP tell only of command-line use, although it is a Java application, but I don’t know if it has a Java API as well.

Then, have a Link in the user interface to a StreamResource, which generates the PDF document. The browser launches a PDF reader to display the document (unless it blocks popups). You could also embed the PDF in an iframe.
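A rough sketch of the first half of that pipeline, using only the JDK: it turns column headers and rows into a minimal XSL-FO document held in memory. The class and method names (FoTableWriter, toFo, row, escape) are made up for illustration and are not part of FOP or Vaadin. Handing the resulting string to an FO processor and wrapping the output in a StreamResource is deliberately left out.

```java
import java.util.List;

/** Hypothetical helper: builds a minimal XSL-FO document from tabular data. */
class FoTableWriter {

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds one fo:table-row with one cell per entry. */
    static String row(List<String> cells) {
        StringBuilder sb = new StringBuilder("<fo:table-row>");
        for (String cell : cells) {
            sb.append("<fo:table-cell><fo:block>")
              .append(escape(cell))
              .append("</fo:block></fo:table-cell>");
        }
        return sb.append("</fo:table-row>").toString();
    }

    /** Wraps the header and data rows in a single-page-master A4 document. */
    static String toFo(List<String> headers, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>")
          .append("<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">")
          .append("<fo:layout-master-set>")
          .append("<fo:simple-page-master master-name=\"A4\"")
          .append(" page-height=\"29.7cm\" page-width=\"21cm\" margin=\"2cm\">")
          .append("<fo:region-body/>")
          .append("</fo:simple-page-master>")
          .append("</fo:layout-master-set>")
          .append("<fo:page-sequence master-reference=\"A4\">")
          .append("<fo:flow flow-name=\"xsl-region-body\">")
          .append("<fo:table table-layout=\"fixed\" width=\"100%\">")
          .append("<fo:table-header>").append(row(headers)).append("</fo:table-header>")
          .append("<fo:table-body>");
        // Note: an empty table body is not valid FO, so callers should
        // handle the "no rows" case separately.
        for (List<String> r : rows) {
            sb.append(row(r));
        }
        sb.append("</fo:table-body></fo:table></fo:flow></fo:page-sequence></fo:root>");
        return sb.toString();
    }
}
```

The rows would come from the container in its current sort and filter order, so the export logic does not have to repeat the query.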

If you’re a Pro Account subscriber, please see #130: How to display a PDF document in Vaadin application?

Marko1 · April 5, 2011, 3:57pm

Another way is to generate an OpenDocument document and then run OpenOffice as a background process to convert it to PDF. JODConverter is a widely used Java tool to run the conversion.

Charles · April 6, 2011, 6:57am

Hi

In addition to what Marko’s said above :

I don’t believe a true “what you see” export is really possible; you could, and should, reuse the container datasource for generating your PDF, so you get exactly the same underlying data. However, there is no way to “simply” reuse the style information.

However, an idea is slowly forming in my brain: you could re-use Table#formatPropertyValue to get the (formatted) string value to display, Table#getVisibleColumns and Table#getColumnHeaders to get the names of the properties and the column headers, and Table#getColumnAlignment to get the alignment. If you do any further CSS prettifying by using CellStyleGenerator, you could possibly re-use the style name to look up some formatting properties elsewhere.

In short, you can re-use most of the “vanilla” stuff from the Vaadin table. However, if you go off into the realm of GeneratedColumns (generating custom components, images, checkboxes etc.) for columns, you will not be able to render them in the PDF. In other words, it cannot ever be a one-to-one “true” rendered copy of the table, but you can re-use a lot of the table information to produce the PDF.
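To make that idea a bit more concrete, here is a minimal sketch using only the JDK. TableSource is a hypothetical stand-in interface for the handful of Table methods named above, not a Vaadin type; an adapter around the real Table would implement it. As far as I recall, formatPropertyValue is protected in Table, so such an adapter would probably have to be a Table subclass that exposes it. Please treat that as an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the parts of a Vaadin Table that the export needs. */
interface TableSource {
    Object[] visibleColumns();                   // cf. Table#getVisibleColumns
    String[] columnHeaders();                    // cf. Table#getColumnHeaders
    List<?> itemIds();                           // item ids in the current sort/filter order
    String format(Object itemId, Object column); // cf. Table#formatPropertyValue
}

class TableSnapshot {

    /** Returns the header row followed by one row of formatted strings per item. */
    static List<List<String>> capture(TableSource table) {
        List<List<String>> out = new ArrayList<>();

        List<String> header = new ArrayList<>();
        for (String h : table.columnHeaders()) {
            header.add(h);
        }
        out.add(header);

        // Walk the items in the order the source reports them, so that
        // sorting and filtering are not implemented a second time.
        for (Object id : table.itemIds()) {
            List<String> row = new ArrayList<>();
            for (Object column : table.visibleColumns()) {
                row.add(table.format(id, column));
            }
            out.add(row);
        }
        return out;
    }
}
```

The resulting list of string rows is plain data, so whichever PDF generator is used never needs to know about Vaadin.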

For non-table data - e.g. “normal page” - sorry, no, no chance if the page is directly built up from Vaadin components. However, if you built a model from the data, and then the Vaadin “page” from the model, you could build a PDF from the same model. Gives you more complexity, though.
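A minimal sketch of what such a shared model could look like, assuming the page is just a list of caption/value pairs. PageModel and FieldRenderer are invented names for illustration; one FieldRenderer implementation would create the Vaadin Labels and another would emit the PDF content.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical page model: ordered caption/value pairs, independent of any UI toolkit. */
class PageModel {
    private final Map<String, String> fields = new LinkedHashMap<>();

    /** Adds a field and returns this model so calls can be chained. */
    PageModel add(String caption, String value) {
        fields.put(caption, value);
        return this;
    }

    /** Hands every field to the renderer in insertion order. */
    void renderTo(FieldRenderer renderer) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            renderer.field(e.getKey(), e.getValue());
        }
    }
}

/** Implemented once for the on-screen page and once for the PDF export. */
interface FieldRenderer {
    void field(String caption, String value);
}
```

The extra layer is the added complexity mentioned above: the page is described once in the model and drawn twice.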

Cheers,

Charles.

Marcel · April 11, 2011, 6:12am

Thank you for the input.
I will post again when / if I find a proper solution.

Marko1 · April 11, 2011, 12:24pm

There’s a new example for generating PDFs dynamically with Apache FOP.

There’s quite a lot of work to get the “what you see” part though…

If you’re a Pro Account subscriber, please see #291: How do I generate a PDF file? It’s more about the generation, while #130 is about displaying.

David3 · April 11, 2011, 10:13pm

It is interesting that it’s so hard to convert HTML+CSS to PDF. For a few of our PDF-centric customers (aka legacy integrators), if we could just get a reliable print version of HTML+CSS from Java as a PDF, we’d be golden, and that doesn’t even include the complexity of dealing with JS-DOM frameworks like Vaadin. We just keep putting it off, hoping that one day there will be a straightforward way to do this.

jean-francois.lamy · April 12, 2011, 1:58am

The Lobo project looks interesting as a Java browser (it has a separate HTML4 parser and renders to Swing; perhaps one could shoehorn the Swing part into iText to get something reasonable).

Michel · April 12, 2011, 5:50am

There’s a tool called wkhtmltopdf that uses the WebKit engine to render pages and convert them to PDF. I don’t know whether or not it supports JavaScript. Anyway, I think there’s no way to either get the resulting HTML code or access the Vaadin session with this tool.

Or is it possible to fake a user’s session by copying its cookie?

jean-francois.lamy · April 12, 2011, 12:10pm

If you look at http://code.google.com/p/wkhtmltopdf/wiki/IntegrationWithPhp the integration is done the other way around: pass an HTML file to the command line and recover the output. So there is no need for the tool to have access to the session.
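The same "HTML file in, PDF file out" idea from Java could look roughly like this. It assumes a wkhtmltopdf binary is installed and on the PATH, and uses the basic command form of input file followed by output file. WkHtmlToPdf, command and convert are names invented for this sketch.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper that shells out to the wkhtmltopdf command-line tool. */
class WkHtmlToPdf {

    /** Builds the command line: wkhtmltopdf <input.html> <output.pdf>. */
    static List<String> command(File html, File pdf) {
        // Assumption: the binary is called "wkhtmltopdf" and is on the PATH.
        return Arrays.asList("wkhtmltopdf", html.getPath(), pdf.getPath());
    }

    /** Runs the tool, waits for it to finish and returns its exit code. */
    static int convert(File html, File pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(html, pdf))
                .inheritIO() // let the tool write its messages to our console
                .start();
        return process.waitFor();
    }
}
```

A servlet would write the captured HTML to a temporary file, call convert, check that the exit code is 0 and then stream the resulting PDF file back to the user.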

Marko1 · April 12, 2011, 1:29pm

It’s hard to see how converting HTML to PDF could ever be a proper solution, at most a quick-and-dirty one. The paged layout in PDF requires very different layout solutions than HTML. Using a PDF report generator is probably the only proper solution and many people seem to be doing it that way.

If we want to turn to wild ideas, it would be possible to use the Vaadin API for report generation. Vaadin is, after all, a kind of a document rendering API, if you leave the user interaction out. You’d just need an XML-based terminal adapter and probably some special paged layout components. That would make data binding easy and you could even reuse some of the UI code of an application. Actually, the old Toolkit 4 was a bit closer to that, with XSLT style sheet based themes for rendering the output from the XML UIDL. Just output UIDL → FO → PDF.

jean-francois.lamy · April 12, 2011, 4:47pm

DOM to PDF would be interesting. Vaadin is a DOM manipulation API on the client (via GWT, of course). The key idea for 80% of the situations is NOT to have to do anything other than writing the original code – all I want is a reasonable printable version. I’m willing to tweak CSS, not willing to curse at XSLT for a couple days.

David3 · April 12, 2011, 6:39pm

I may take a look at wkhtmltopdf as it could be a fit for our closer needs. Few customers want PDFs (bigger and non-native to the web, and the data inside is impossible for a human to determine without actually rendering it with a viewer), but we do have a few that would like a PDF for use in other systems. We have the HTML, so it could work.

I agree that a DOM to PDF would be great.

While I agree there may be issues with a “proper solution,” I think most people would be fine with the resulting PDF being whatever a browser renders and then prints to paper (or prints to PDF, which is also available). Yes, you can’t get all of the power of PDF, but our customers don’t care about that, just that they have a PDF “image” – in fact, a regular image also suffices as a PDF replacement for quite a few.

Of course, this is off-topic to Vaadin, so sorry about that, but it’s great to hear from the community about any solutions they’ve come across, like wkhtmltopdf. The question for us is how easily we can integrate Java servlets with this tool. We don’t do high/batch volumes, so absolute performance shouldn’t be a big issue.

UPDATED:
Did take a look at wkhtmltopdf and it runs well on Linux (and presumably Windows and OSX too) and does one of the best jobs I’ve seen for “printing” HTML to PDF. It even did a nice job of taking several HTML files and producing a single PDF.

Michel · April 13, 2011, 7:25am

Yes, but how to get the actual HTML of a Vaadin “page” from the server side? You would have to catch the HTML code sent to the client and all following JSON commands, and manipulate that HTML like the client’s browser does… not an easy job.

Marko1 · April 13, 2011, 7:33am

You should be able to do that by making a simple widget that takes the “document.body.innerHTML” from the DOM, when requested from the server-side in a variable, and sends it back to the server with updateVariable().

Marko1 · April 13, 2011, 8:07am

If you don’t want to make a custom widget, you could probably do it with JavaScript and getWindow().executeJavaScript(). You can “return” the result by writing it to a TextField, for example. You can find the TextField element by setting its ID with setDebugId() as done in this post.

Well, a widget would nevertheless be a more “proper” solution. If you or somebody else does it, perhaps post it as an add-on in Directory?

Michel · April 13, 2011, 2:49pm

Ahh… never thought about that. I guess printing what-the-user-sees should be quite easy then. I’ll try that.

Marko1 · April 13, 2011, 3:46pm

Well yeah, I have a bit of trouble understanding why doing “Print → Print to File → Output format PDF → OK” is really so hard. …wait, it does work that way for everybody, doesn’t it? :blink:

Or are there some other use cases for outputting the UI to PDF?

jean-francois.lamy · April 13, 2011, 7:47pm

The printable report is often very similar to, but not quite, the screen version.
In many cases, you will have a scrollable table on the actual UI, but you would like all rows in the printable version.

Some kludgery setting most extensible things to have undefined height, capturing the HTML, removing background colors and doing other “printable-version” stuff would often work nicely.

Marko1 · April 13, 2011, 9:43pm

Ok, yes, but isn’t that something that only the application can do? It needs to have “show printable page” logic, where it rebuilds the view using the printable settings and possibly sets another theme. And then it lets the user hit the browser’s print button (or calls print() automatically).

What I don’t understand is why this makes it necessary to generate a PDF on the server side rather than just letting the user do the printing when the printable version is displayed. What’s the benefit of the PDF generation? Well, except for the case where the user doesn’t want to print but wants a PDF for archival or email and the browser/OS doesn’t support printing to PDF.