Discussion:
[Geoserver-users] Issue with Shapefile encoding through WFS
Emmanuel Blondel
2017-06-28 19:11:45 UTC
Hello,

I'm currently trying to publish shapefiles in GeoServer. What is
specific about these shapefiles is that they contain characters from
other alphabets (my current case is Greek names). The shapefile I have
was generated with the DBF encoding set to UTF-8. The shapefile creation
goes fine, and I can read the shapefile correctly in other software.

I then push it programmatically through the GeoServer REST API:
• If I push it to a datastore configured with DBF encoding "UTF-8", my
names are badly encoded in the WFS response.
• On the other hand, I get properly encoded names in WFS if I do the
following: before pushing the shapefile, I set the datastore's DBF
encoding to "ISO-8859-1"; I then push my shapefile. If I then go to the
datastore, change the DBF encoding from "ISO-8859-1" to "UTF-8", and
look at my WFS response, it's fine. And of course, if I upload my
shapefile again to this datastore, I lose my encoded names...

Any idea how to solve this?

Many thanks in advance for your help
Emmanuel

Andrea Aime
2017-06-29 09:44:23 UTC
I don't understand: are the above charset operations done via the UI or
the REST API?
And are you trying to configure the shapefile as-is, or dumping it into a
target database (e.g., PostGIS)?
Do you have a set of reproducible steps to run against a vanilla GeoServer
installation?

Cheers
Andrea
--
Regards,

Andrea Aime

==
GeoServer Professional Services from the experts! Visit http://goo.gl/it488V
for more information.
==

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

AVVERTENZE AI SENSI DEL D.Lgs. 196/2003

Le informazioni contenute in questo messaggio di posta elettronica e/o
nel/i file/s allegato/i sono da considerarsi strettamente riservate. Il
loro utilizzo è consentito esclusivamente al destinatario del messaggio,
per le finalità indicate nel messaggio stesso. Qualora riceviate questo
messaggio senza esserne il destinatario, Vi preghiamo cortesemente di
darcene notizia via e-mail e di procedere alla distruzione del messaggio
stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso,
divulgarlo anche in parte, distribuirlo ad altri soggetti, copiarlo, od
utilizzarlo per finalità diverse, costituisce comportamento contrario ai
principi dettati dal D.Lgs. 196/2003.

The information in this message and/or attachments, is intended solely for
the attention and use of the named addressee(s) and may be confidential or
proprietary in nature or covered by the provisions of privacy act
(Legislative Decree June, 30 2003, no.196 - Italy's New Data Protection
Code).Any use not in accord with its purpose, any disclosure, reproduction,
copying, distribution, or either dissemination, either whole or partial, is
strictly forbidden except previous formal approval of the named
addressee(s). If you are not the intended recipient, please contact
immediately the sender by telephone, fax or e-mail and delete the
information in this message that has been received in error. The sender
does not give any warranty or accept liability as the content, accuracy or
completeness of sent messages and accepts no responsibility for changes
made after they were sent or for other risks which arise as a result of
e-mail transmission, viruses, etc.
Emmanuel Blondel
2017-06-29 10:00:13 UTC
Thanks Andrea for your prompt reply.

First, when I create the shapefile, I encode my DBF as UTF-8. The
shapefile is always pushed through the API using charset UTF-8, and the
target is a shapefile datastore (or directory of datastores depending on
the case). I don't dump it to a database.
Let me re-explain:
1- If my target shapefile datastore is initially in UTF-8 charset and
I upload my shapefile, then Greek names are badly encoded in WFS.
2- If instead my target datastore is initially in ISO-8859-1, the
names are still badly encoded right after I upload, but after updating
the datastore charset to UTF-8 through the GUI and looking at my WFS
response, everything is correctly encoded.

Looking at this (strange) behavior, I've set up a temporary workaround
to make it work programmatically for my Greek use case (and it works):
step 1- Check if my datastore already exists. If yes, I update it.
If not, I create it. In both cases, I set the charset to ISO-8859-1.
step 2- Upload the shapefile (and publish the layer if not done).
step 3- Update the datastore charset to UTF-8.
With this procedure, my output is a WFS layer with correctly encoded
Greek names, in a shapefile datastore with charset UTF-8. But I'm
wondering why point 1 above doesn't work.
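
[Editor's sketch] The three-step workaround above could be expressed roughly as follows. This is a minimal sketch, not the poster's actual code: it only builds the three requests and does not send them, it assumes the datastore already exists, and the endpoint paths, the `charset` query parameter on the upload, and the `charset` connection-parameter key reflect my understanding of the GeoServer REST API and should be checked against the documentation for your version. The base URL, workspace and store names are hypothetical, and whether a partial `connectionParameters` update preserves the store's other parameters is not verified here.

```python
from dataclasses import dataclass
from urllib.parse import quote, urlencode


@dataclass(frozen=True)
class RestCall:
    """One HTTP request for GeoServer (only described here, never sent)."""
    method: str
    url: str
    content_type: str
    body: bytes


def store_url(base_url, workspace, store):
    # Assumed REST path layout: /rest/workspaces/{ws}/datastores/{store}
    return f"{base_url}/rest/workspaces/{quote(workspace)}/datastores/{quote(store)}"


def set_store_charset(base_url, workspace, store, charset):
    # Steps 1 and 3: update the datastore's DBF charset.
    # Assumption: the shapefile store exposes it as the "charset" parameter.
    xml = (
        "<dataStore><connectionParameters>"
        f'<entry key="charset">{charset}</entry>'
        "</connectionParameters></dataStore>"
    )
    return RestCall("PUT", store_url(base_url, workspace, store),
                    "text/xml", xml.encode("utf-8"))


def upload_shapefile(base_url, workspace, store, zipped_shapefile, charset):
    # Step 2: push the zipped shapefile, declaring the upload charset.
    url = (store_url(base_url, workspace, store)
           + "/file.shp?" + urlencode({"charset": charset}))
    return RestCall("PUT", url, "application/zip", zipped_shapefile)


def workaround_plan(base_url, workspace, store, zipped_shapefile):
    """Return the three calls of the workaround, in the order to send them."""
    return [
        set_store_charset(base_url, workspace, store, "ISO-8859-1"),
        upload_shapefile(base_url, workspace, store, zipped_shapefile, "UTF-8"),
        set_store_charset(base_url, workspace, store, "UTF-8"),
    ]
```

Each `RestCall` can then be sent with whatever HTTP client and credentials you already use; keeping the plan separate from the transport makes the ordering of the three steps easy to check.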

I'm going to send you privately one shapefile I've created so you can
look at it.

Many thanks in advance
Emmanuel
Andrea Aime
2017-06-29 10:14:18 UTC
Hi Emmanuel,
I see. If you are using an existing data store, the contents are likely
being read and re-written out using the wrong charset for reading,
instead of the file just being dumped as is (the REST API has no clue
about what the target store is or does, it's just going through standard
store interfaces).
Maybe you can try deleting the target store (if it's a single shapefile)
and re-creating it as a workaround.
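
[Editor's sketch] One reading of this explanation that would be consistent with the symptoms reported in the thread is that the uploaded DBF is decoded as ISO-8859-1 and then re-written in the target store's charset. That is an assumption for illustration only; nothing in the thread confirms what GeoServer actually does internally. The snippet below uses plain Python string encoding (no GeoServer involved) and a sample Greek name chosen for the example.

```python
# Sample value, hypothetical: any non-ASCII Greek name behaves the same way.
name = "Αθήνα"

# Bytes as stored in the uploaded DBF (written as UTF-8 by the author).
original_bytes = name.encode("utf-8")

# Assumed faulty step: the upload is read as ISO-8859-1 instead of UTF-8.
# Every byte maps to exactly one character, so nothing is lost yet.
misread = original_bytes.decode("iso-8859-1")

# Case 1: target store configured as UTF-8.
# The misread text is written out as UTF-8, i.e. double-encoded.
stored_case1 = misread.encode("utf-8")
served_case1 = stored_case1.decode("utf-8")

# Case 2: target store configured as ISO-8859-1 during the upload.
# Writing the misread text as ISO-8859-1 restores the original bytes...
stored_case2 = misread.encode("iso-8859-1")
# ...so switching the store to UTF-8 afterwards reads them correctly.
served_case2 = stored_case2.decode("utf-8")

print("case 1 intact:", served_case1 == name)
print("case 2 intact:", served_case2 == name)
```

Under that assumption, case 1 corresponds to the broken names seen with a UTF-8 store, and case 2 to the workaround where the store is switched to UTF-8 only after the upload.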

I'd still open a ticket, attaching a file that can be publicly shared with
everybody and instructions to reproduce.
Maybe it can be looked at in one of the monthly bug fix stomps (as usual,
no guarantee about if or when it will be looked at).
If you need to share private data, or need a quick fix, then I believe
commercial support is the way to go.

Cheers
Andrea


Emmanuel Blondel
2017-06-29 10:19:50 UTC
I think I can stick with the workaround I found for now; things are
working with it, although I'm performing three REST operations instead
of one. When I have some time I will create a ticket with some sample data.

Thanks