Solved

Creating Bounding box for 2D pdf file

  • February 27, 2023
  • 3 replies
  • 73 views


Hi,

This is about creating a bounding box on a PDF file.

Assume we can get the bounding box details from another source.

That bounding box covers the whole string 18"-1S6(10S)-471454. Using it, we need to create a bounding box for just 471454.

 

To find the differences, we manually created the two bounding boxes.

The bounding box details are below, and a screenshot is attached.

#smaller bounding box
"textRegion": {
                "xMax": 0.12297021616798368,
                "xMin": 0.10935503986439225,
                "yMax": 0.6030244780685147,
                "yMin": 0.5957594432410297
            }

The bigger bounding box is the one we got from the XML:

#bigger bounding box
"textRegion": {
                "xMax": 0.1245098493906997,
                "xMin": 0.0782598493906997,
                "yMax": 0.6047371474808384,
                "yMin": 0.5915995895210239
            }

 

So we can observe that changing only xMin (moving it to the right) and keeping all other values roughly the same gives us the bounding box for 471454.

As we have to do this in many places in the PDF, is there a formula to compute the new xMin from the bigger bounding box's xMin and the string length?

 

Best answer by Dilini Fernando

Hi @shubha k v,

I hope Ola's explanation was useful. I'm closing this topic now, but please feel free to leave a comment if you require any further assistance.

Best regards,
Dilini


3 replies

Hi!

I don't think there is a very good way to do that. You could count the characters and assume they all have roughly the same width. If M characters lie to the left of your new x_min and N characters to the right, you get new_x_min = (N*old_x_min + M*old_x_max) / (N+M). However, you can see that the two boxes have somewhat different padding around the text, which is an additional source of error.
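As a rough illustration of that formula, here is a minimal Python sketch applied to the bigger box from the question. The function name `shrink_x_min` and the dictionary layout are made up for this example, and it assumes equal-width characters and that the wanted text is the trailing part of the full string.

```python
def shrink_x_min(x_min, x_max, full_text, target):
    """Estimate xMin for `target`, assuming it is a suffix of `full_text`
    and that every character is roughly the same width."""
    if not target or not full_text.endswith(target):
        raise ValueError("target must be a non-empty suffix of full_text")
    n = len(target)             # N: characters to the right of the new xMin
    m = len(full_text) - n      # M: characters to the left of the new xMin
    return (n * x_min + m * x_max) / (n + m)


# Bigger box from the XML, as posted in the question.
big = {
    "xMin": 0.0782598493906997,
    "xMax": 0.1245098493906997,
    "yMin": 0.5915995895210239,
    "yMax": 0.6047371474808384,
}

full_text = '18"-1S6(10S)-471454'
new_x_min = shrink_x_min(big["xMin"], big["xMax"], full_text, "471454")

# Keep the other three values and replace only xMin.
small = dict(big, xMin=new_x_min)
print(small)
```

Here M = 13 and N = 6, so the estimate comes out at roughly 0.1099, against 0.1094 for the manually drawn box. The remaining gap is the padding and character-width error mentioned above.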

What is the reason that you want to reduce the box? If the smaller box is what you get from the detect algorithm and you want to compare that with the xml boxes, I can see the point to some extent. For clickable functionality, I would think that having the whole box as a reference would not be a problem. 

The full text in this case contains both an ID/line number for a line and some metadata for that line, e.g. its diameter, which I believe is 18 inches. Logically, the whole text references the line, even though you only need the last part to identify it.


Dilini Fernando
Seasoned Practitioner

Hi @shubha k v,

I hope the above explanation was helpful. Could you please share some background information about this use case, as requested by @Ola Liabøtrø?

I look forward to hearing from you.

Best regards,
Dilini   


Dilini Fernando
Seasoned Practitioner
  • Seasoned Practitioner
  • 671 replies
  • Answer
  • April 7, 2023


