Changeset 11
- Timestamp: 10/20/15 13:18:19 (9 years ago)
- Location: trunk/docs/manual
- Files: 1 added, 3 edited
trunk/docs/manual/for_typo/institution.txt
--- r9
+++ r11
@@ -1,2 +1,4 @@
+Institut
+Hadley
 CCRT
 ccrt
trunk/docs/manual/for_typo/nemo.txt
--- r9
+++ r11
@@ -1,2 +1,3 @@
+GYRE
 AGRIF
 NEMO
trunk/docs/manual/source/developers/guides/typo.rst
--- r9
+++ r11
@@ -92,15 +92,22 @@
 `<http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/>`_
 
-By convention, it has been decided to start all |igcmg_doc| pages names by ``Doc``.
-
-To get all wiki Doc* pages URI of `<http://forge.ipsl.jussieu.fr/igcmg_doc/>`_ [#tracoops]_:
-
-.. code-block:: bash
-
-   excluded_uri=DocYgraphvizLibigcmprod
-   list_uri=$(xsltproc \-\-novalid \
+By convention, it has been decided to start all |igcmg_doc| pages names
+by ``Doc`` but there is exception like Train, WikiStart, etc.
+
+To get all hand written wiki pages URI of
+`<http://forge.ipsl.jussieu.fr/igcmg_doc/>`_ [#tracoops]_:
+
+.. code-block:: bash
+
+   excluded_href=DocYgraphvizLibigcmprod
+   list_href=${PROJECT_LOG}/list_href
+   xsltproc \-\-novalid \
    ${PROJECT}/docs/manual/for_tracwiki/titleindex.xsl http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/TitleIndex | \
-   grep "/igcmg_doc/wiki/Doc" | grep -v ${excluded_uri} | sort -u | \
-   sed -e "s@^@http://forge.ipsl.jussieu.fr/@")
+   grep "/igcmg_doc/wiki/" | grep -v "?action" | grep -v "?format" | \
+   grep -v ${excluded_href} | sort -u | \
+   sed -e "s@^@http://forge.ipsl.jussieu.fr@" > ${list_href}
+   trac_uri=${PROJECT_LOG}/trac_uri
+   sed -e "s@^@http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/@" ${PROJECT}/docs/manual/for_tracwiki/tracpages.txt sort -u > ${trac_uri}
+   list_uri=$(comm -13 ${trac_uri} ${list_href})
 
 .. [#tracoops] we exclude DocYgraphvizLibigcmprod because Trac detected an internal error ... some graphviz trac plugins issue
@@ -110,6 +117,7 @@
 .. code-block:: bash
 
-   dirhtml=${PROJECT_LOG}
-   rm -f ${dirhtml}/Doc*
+   dirhtml=${PROJECT_LOG}/html/
+   rm -fr ${dirhtml}
+   mkdir ${dirhtml}
    for uri in ${list_uri}
    do
@@ -122,5 +130,5 @@
 
    cd ${PROJECT_LOG}
-   listf=$(find ${dirhtml} -name "Doc*")
+   listf=$(find ${dirhtml} -type f)
    hunspell_out=${PROJECT_LOG}/hunspell_out
    hunspell_out_uniq=${PROJECT_LOG}/hunspell_out_uniq
@@ -141,5 +149,5 @@
 - typo to be fixed in wiki pages
 - false positif to be added in a :file:`docs/manual/for_typo/`
-- false positif to be ignored because to hard to add (encoding issue for
+- false positif to be ignored because too hard to add (encoding issue for
   Greek word, etc.)
 
@@ -163,5 +171,5 @@
 
    w=amonch # take a real one from ${hunspell_out_uniq}
-   find ${dirhtml} -name "Doc*" -exec grep -Hi ${w} {} \;
+   find ${dirhtml} -type f -exec grep -Hi ${w} {} \;
 
 .. note::
@@ -174,2 +182,75 @@
 
 Correction have to be done via the wiki interface of the forge.
+
+Check typo in attached PDF on wiki pages
+++++++++++++++++++++++++++++++++++++++++
+
+To get all the URI of attached files:
+
+.. code-block:: bash
+
+   listf=$(find ${dirhtml} -type f)
+   list_attached=${PROJECT_LOG}/list_attached
+   list=${PROJECT_LOG}/list
+   rm -f ${list_attached}
+   for onefile in ${listf}
+   do
+      xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -t -m "//x:a[@title='Download']" -v "concat('http://forge.ipsl.jussieu.fr/',@href)" -n ${onefile} >> ${list}
+   done
+   sort -u ${list} > ${list_attached}
+
+To isolated PDF files among these URI:
+
+.. code-block:: bash
+
+   list_pdf=$(grep "\.pdf$" ${list_attached})
+
+To download those URI locally:
+
+.. code-block:: bash
+
+   dirpdf=${PROJECT_LOG}/pdf/
+   rm -rf ${dirpdf}
+   for uri in ${list_pdf}
+   do
+      wget -P ${dirpdf} ${uri}
+   done
+
+We can know convert these PDF files to text files
+
+.. code-block:: bash
+
+   list_pdf=$(find ${dirpdf} -type f)
+   for pdf in ${list_pdf}
+   do
+      pdftotext ${pdf} ${pdf}.txt
+   done
+
+We can now check typo in the text files
+
+.. code-block:: bash
+
+   cd ${PROJECT_LOG}
+   listf=$(find ${dirpdf} -type f -name "*.pdf.txt")
+   hunspell_out=${PROJECT_LOG}/hunspell_out
+   hunspell_out_uniq=${PROJECT_LOG}/hunspell_out_uniq
+   rm -f ${hunspell_out} ${hunspell_out_uniq}
+   for onefile in ${listf}
+   do
+      LC_ALL=C;hunspell -d en_US,nontypo_uniq --check-url -i utf-8 -l < ${onefile} >> ${hunspell_out}
+   done
+   sort -u ${hunspell_out} | sort --ignore-case > ${hunspell_out_uniq}
+
+:file:`${hunspell_out_uniq}` contains :
+
+- typo to be fixed in PDF files
+- false positif to be added in a :file:`docs/manual/for_typo/`
+- false positif to be ignored because bad convertion from PDF to text
+  (ie ligature), too hard to add (encoding issue for Greek word, etc.)
+
+To find one of the wrong spelling in converted PDF files:
+
+.. code-block:: bash
+
+   w=infrastucture # take a real one from ${hunspell_out_uniq}
+   find ${dirpdf} -type f -name "*.pdf.txt" -exec grep -Hi ${w} {} \;
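The typo.rst change in this changeset filters out stock Trac pages with comm -13, which only behaves as a set difference when both inputs are sorted in the same collation order. The following is a minimal sketch of that step, not the committed code: the example.org URIs, the sample page names, and the temporary directory standing in for ${PROJECT_LOG} are all made up for illustration, and it assumes the sed output is meant to be piped into sort -u.

```shell
#!/bin/sh
# Sketch only: every file name and URI below is hypothetical.
workdir=$(mktemp -d)

# Stand-in for tracpages.txt: page names shipped with a stock Trac install.
printf '%s\n' TracGuide WikiFormatting > "${workdir}/tracpages.txt"

# Stand-in for the URIs extracted from the wiki title index.
printf '%s\n' \
    http://example.org/wiki/TracGuide \
    http://example.org/wiki/DocIntro \
    http://example.org/wiki/WikiStart \
    http://example.org/wiki/Train \
    http://example.org/wiki/WikiFormatting > "${workdir}/list_href.raw"

# comm requires both inputs sorted with the same collation,
# so the locale is pinned for sort and comm alike.
LC_ALL=C sort -u "${workdir}/list_href.raw" > "${workdir}/list_href"
sed -e 's@^@http://example.org/wiki/@' "${workdir}/tracpages.txt" | \
    LC_ALL=C sort -u > "${workdir}/trac_uri"

# -1 suppresses lines found only in trac_uri, -3 suppresses lines common
# to both files; what remains are the URIs found only in list_href.
list_uri=$(LC_ALL=C comm -13 "${workdir}/trac_uri" "${workdir}/list_href")
printf '%s\n' "${list_uri}"

rm -rf "${workdir}"
```

With this sample data the stock pages TracGuide and WikiFormatting are dropped, while hand-written pages that do not start with Doc, such as Train and WikiStart, are kept.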