Fix: Issue with getting slide title using Apache POI

Indexing Office files is a common task when developing search applications. In my case I needed to index the slides of a presentation with the title and the content indexed separately, because we need to give the title a higher boost. But when extracting slide titles with the getTitle() method of XSLFSlide, no title was returned for many slides.

When I dug into the source, I found the following implementation of the method:

```java
/**
 *
 * @return title of this slide or empty string if title is not set
 */
public String getTitle(){
    XSLFTextShape txt = getTextShapeByType(Placeholder.TITLE);
    return txt == null ? "" : txt.getText();
}
```

The method above only looks for a TITLE placeholder, but a slide can also use another title placeholder type: Placeholder.CENTERED_TITLE (used, for example, by the Title Slide layout). So until the POI team fixes this, you might want to override this method by extending the XSLFSlide class, or write a utility method like the one below.

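```java
import org.apache.poi.xslf.usermodel.XSLFShape;
import org.apache.poi.xslf.usermodel.XSLFSlide;
import org.apache.poi.xslf.usermodel.XSLFTextShape;
// Newer POI releases use org.apache.poi.sl.usermodel.Placeholder;
// older 3.x releases ship org.apache.poi.xslf.usermodel.Placeholder instead.
import org.apache.poi.sl.usermodel.Placeholder;

/**
 * Returns the slide title, checking both title placeholder types.
 *
 * @return title of this slide or empty string if title is not set
 */
private String getTitle(XSLFSlide slide) {
    for (XSLFShape shape : slide.getShapes()) {
        if (shape instanceof XSLFTextShape) {
            XSLFTextShape txtshape = (XSLFTextShape) shape;
            // getTextType() returns null when the shape is not a placeholder
            Placeholder textType = txtshape.getTextType();
            // Enum constants can be compared with ==, which is also null-safe
            if (textType == Placeholder.TITLE || textType == Placeholder.CENTERED_TITLE) {
                return txtshape.getText();
            }
        }
    }
    return "";
}
```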
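Since the whole point was to index the title and the rest of the slide separately, here is a sketch of how the same TITLE / CENTERED_TITLE check could be used to split every slide of a .pptx into a title field and a body field. It assumes a recent POI (4.x/5.x, where Placeholder lives in org.apache.poi.sl.usermodel); the class SlideTextExtractor and its SlideText holder are hypothetical names for illustration, and handing the two fields to your search engine with a higher boost on the title is left out.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.sl.usermodel.Placeholder;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFShape;
import org.apache.poi.xslf.usermodel.XSLFSlide;
import org.apache.poi.xslf.usermodel.XSLFTextShape;

// Hypothetical example class: splits every slide into a title field and a body field.
public class SlideTextExtractor {

    // Hypothetical holder for the two fields that get indexed (and boosted) separately.
    public static final class SlideText {
        public final String title;
        public final String body;

        SlideText(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    // Same rule as the utility method: TITLE or CENTERED_TITLE marks a title.
    static boolean isTitle(XSLFTextShape shape) {
        Placeholder type = shape.getTextType(); // null when the shape is not a placeholder
        return type == Placeholder.TITLE || type == Placeholder.CENTERED_TITLE;
    }

    public static List<SlideText> extract(XMLSlideShow ppt) {
        List<SlideText> result = new ArrayList<>();
        for (XSLFSlide slide : ppt.getSlides()) {
            String title = null;
            StringBuilder body = new StringBuilder();
            for (XSLFShape shape : slide.getShapes()) {
                if (!(shape instanceof XSLFTextShape)) {
                    continue; // pictures, tables and group shapes are skipped in this sketch
                }
                XSLFTextShape textShape = (XSLFTextShape) shape;
                if (title == null && isTitle(textShape)) {
                    title = textShape.getText(); // first title-type placeholder wins
                } else {
                    String text = textShape.getText();
                    if (text != null && !text.isEmpty()) {
                        if (body.length() > 0) {
                            body.append('\n');
                        }
                        body.append(text);
                    }
                }
            }
            result.add(new SlideText(title == null ? "" : title, body.toString()));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .pptx file
        try (InputStream in = new FileInputStream(args[0]);
             XMLSlideShow ppt = new XMLSlideShow(in)) {
            for (SlideText s : extract(ppt)) {
                System.out.println("title: " + s.title);
                System.out.println("body:  " + s.body);
            }
        }
    }
}
```

Run it as `java SlideTextExtractor deck.pptx` with the poi-ooxml jar and its dependencies on the classpath. Everything other than the first title-type placeholder goes into the body, so subtitles and content placeholders end up in the lower-boosted field.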

Hope it's useful 🙂
