Tuesday, July 29, 2014

Endeca data modeling for best performance

eCommerce is all about selling products online. To capitalize on a sales opportunity and leave a good impression on the customer, you have to present your products as quickly as possible, in a clean and simple way.


The first thought that comes to mind for the 'as quickly as possible' requirement is caching:


  • Browser cache – e.g. locally downloaded objects
  • CDN cache – e.g. whole pages, static or dynamic (with Ajax calls loading customer-specific data), static content, redirects, error pages/responses
  • Web server cache – e.g. assets cached on disk/in memory, persistent connections
  • Application cache – e.g. caches that reduce DB calls, static objects holding common properties
  • Database cache – e.g. query caching, pre-compiled queries


Any medium or large eCommerce application builds some sort of caching strategy into its architecture, and it should. But caching can't reduce the response time of the first hit, when nothing is cached yet. Caching at any level also needs a unique way to recognize that a second hit is exactly the same as the first one: a cache key. What if there are near-infinite combinations, so that all of them cannot possibly be cached? Or what if you have so many session variables that every customer session becomes unique, caching becomes impossible, and customers start experiencing slow page responses?


So apart from caching you have to get the basics right, and the most basic of all is data modeling. You should always model your data so that the application does not end up doing heavy iteration, consolidation, or manipulation. If the data returned from layer 1 to layer 2 is in a form that can be presented directly, with little or no iteration, you have done a good job. A nested for loop, for example, is one of the biggest killers for eCommerce applications: performance degrades further and further as your data grows over time.
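As a minimal sketch of this principle (hypothetical classes, not tied to any specific framework), compare a nested-loop consolidation with a single pass over records that already carry their grouping key:

```java
import java.util.*;

// Minimal sketch with hypothetical classes (no specific framework assumed):
// the same consolidation done with a nested loop vs. a single pass over
// records that already carry their grouping key.
public class IterationDemo {
    record Variant(String productId, double price) {}

    // Nested iteration: a full scan of all variants per product.
    // Cost grows as products x variants as the catalog grows.
    static Map<String, Double> minPriceNested(List<String> productIds, List<Variant> variants) {
        Map<String, Double> result = new HashMap<>();
        for (String id : productIds) {
            for (Variant v : variants) {
                if (v.productId().equals(id)) {
                    result.merge(id, v.price(), Math::min);
                }
            }
        }
        return result;
    }

    // Single pass: data modeled so that each record carries its product id,
    // allowing one linear iteration regardless of the number of products.
    static Map<String, Double> minPriceSinglePass(List<Variant> variants) {
        Map<String, Double> result = new HashMap<>();
        for (Variant v : variants) {
            result.merge(v.productId(), v.price(), Math::min);
        }
        return result;
    }
}
```

Both methods produce the same answer, but only the second one stays linear as the catalog grows. The goal of good data modeling is to make the second shape possible.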


Let me explain the above principle with a data modeling assignment that I recently completed.


Suppose your eCommerce database stores product information in a hierarchical format, as shown below: one base product that comes in three colors, with three size variants per color. This product's information is therefore distributed across 13 objects.


The above is a very common structure for any eCommerce site, especially for retailers in the apparel domain.


You are asked to design the data model for the search engine so that the application can display color swatches with price ranges on the product listing page. E.g. for color red, with sizes M=$5, L=$7, XL=$10, XXL=$12, the price range becomes $5-$12.


For example, the screenshot below shows swatches with price ranges on a product listing page.


A search engine, whether Solr or Endeca, always manages data in a flat structure called a record or document. Search engines are built to return data quickly rather than joining across tables at run time; that is one of the basic differences between a search engine and a relational database.


In this case, your product information is distributed across three levels.


So you have the following options:


Option 1 – Index data at the base product level. This means a single record has to store all color and size information as multi-valued attributes.


The information distributed across 13 objects in the diagram above has to be mapped into one single record. This gets even worse when you have to store multi-valued attributes for the image URLs of each corresponding color.


If you follow a naming convention for image URLs then you might not need to index them; otherwise your record becomes very messy, and the web application fetching such records ends up splitting the multi-valued data to identify and pick the corresponding URLs. That is a lot of iteration to build into the design.


This design will become a bottleneck as the business grows and you enrich your products with more attributes at the color or size level.


I would never recommend this solution.
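To see why, here is a sketch of the client-side work Option 1 forces. It assumes a hypothetical flat record holding parallel pipe-delimited multi-valued attributes that are aligned by position (the delimiter and record shape are illustrative, not from any real index):

```java
import java.util.*;

// Sketch of the client-side work Option 1 forces. Assumes a hypothetical
// flat record holding parallel pipe-delimited multi-valued attributes that
// are aligned by position (e.g. colors "red|red|blue", prices "5|7|9").
public class Option1Demo {
    static Map<String, List<Double>> pricesByColor(String colors, String prices) {
        String[] c = colors.split("\\|");
        String[] p = prices.split("\\|");
        Map<String, List<Double>> byColor = new LinkedHashMap<>();
        // The application must re-associate values by index on every render:
        // fragile, and extra iteration that the data model pushed onto us.
        for (int i = 0; i < c.length; i++) {
            byColor.computeIfAbsent(c[i], k -> new ArrayList<>())
                   .add(Double.parseDouble(p[i]));
        }
        return byColor;
    }
}
```

Every page render repeats this splitting and re-association, and one misaligned value silently corrupts the whole product display.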


Option 2 – Index data at the color level and ask the search engine to group records by base product id, i.e. use the parent product id as the rollup key (an Endeca-specific concept).


This way the data held in 13 objects maps into three records, and if you index the base product id you can use it as the rollup key (group by).


You then get an aggregated record list with a representative record that, by default, presents the values of one of the three records. Endeca gives you good control over the representative record: although it is one of the three, it can carry attributes representing the lowest and highest price values among them (called derived attributes in Endeca). This lowest and highest price becomes your price range.
Since your records represent colors, you still have to keep size and price data as multi-valued attributes. But on search/listing pages you generally don't need size-specific information; it is needed on the product detail page, which you generally don't render through the search engine.


The best part of this approach is that the search engine returns data in almost the same shape you want to display on listing pages, so fast rendering is possible with very little iteration.
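A sketch of what rendering then looks like (the attribute names "price_min" and "price_max" are hypothetical stand-ins for your derived attributes, not actual Endeca property names):

```java
import java.util.*;

// Sketch: with Option 2 the representative record of each aggregate already
// carries derived min/max price attributes. The attribute names here
// ("price_min"/"price_max") are hypothetical, not actual Endeca properties.
public class Option2Demo {
    static String priceRange(Map<String, String> representativeRecord) {
        String min = representativeRecord.get("price_min");
        String max = representativeRecord.get("price_max");
        // Rendering the range is a plain lookup: no iteration over variants.
        return min.equals(max) ? "$" + min : "$" + min + "-$" + max;
    }
}
```

The difference from Option 1 is that all the consolidation happened at index time, once, instead of on every page hit.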


Option 3 – Index data at the size level and use the base product id as the rollup key (group by).


The 13 objects now map into 9 size records sharing a common base product key.
Record duplication is high in this solution, and the web application has to identify colors by consolidating all the size records. This can be a performance hit, because you have to iterate over every size record to consolidate colors and other attributes.


You generally don't need size-specific information on listing pages; if that is the case, then I recommend option 2 as the best way to model this requirement.
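The consolidation that Option 3 pushes onto the web application can be sketched as follows (hypothetical record class; one record per size, rebuilt into swatches on every hit):

```java
import java.util.*;

// Sketch of the consolidation Option 3 pushes onto the web application:
// one record per size, so color swatches and price ranges must be rebuilt
// from scratch on every request (hypothetical record class).
public class Option3Demo {
    record SizeRecord(String baseId, String color, double price) {}

    static Map<String, double[]> swatchRanges(List<SizeRecord> records) {
        Map<String, double[]> byColor = new LinkedHashMap<>();
        for (SizeRecord r : records) {   // extra pass on every page render
            byColor.merge(r.color(), new double[]{r.price(), r.price()},
                (a, b) -> new double[]{Math.min(a[0], b[0]), Math.max(a[1], b[1])});
        }
        return byColor;   // color -> {minPrice, maxPrice}
    }
}
```

This is exactly the per-request iteration that Option 2 avoids by letting the engine roll the records up at query time.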


I hope I have linked my principle of 'less iteration' to the very basic concept of data modeling.


Thanks.
http://www.linkedin.com/in/sumitag

Sunday, July 27, 2014

Hybris design considerations

  • Ensure hybris app servers are not serving static content; use a CDN for static media/page delivery. This can remove up to 90% of the load from your infrastructure. If that is not possible, at least ensure a web server layer fronting hybris serves the static media. (High priority)
  • Run hybris servers on physical hardware rather than virtual machines. VMs are supported, but in our tests hybris performs best on physical servers. (Low priority)
  • DB indexing – hybris out-of-the-box DB indexes are poor, so ensure indexes are created/updated to match your model customizations; otherwise the site will soon run into deadlocks. (High priority) You may also want to drop unused indexes so the DB does not waste its time maintaining them. (Medium priority)
  • Perform SSL termination at the load balancer/web server. Don't trouble hybris app servers with the SSL handshake. (Medium priority)
  • Leverage web server modules for browser caching and compression, and don't forget to disable unused Apache modules. (Low priority)
  • Ensure only the required extensions are deployed on front-end hybris servers, and avoid deploying any back-office extension on the front end. This is good for both security and performance. (Medium priority)
  • Ensure loggers are not configured in debug mode; in production, front-end boxes should log at error level only. (Low priority)
  • The hybris application is very prone to deadlocks because it maintains the staged and online catalog versions in the same database table. Avoid maintaining/developing catalog-version-aware types; if you cannot, avoid making them part of the catalog sync. E.g. prices: touching the price of every product daily may result in a full catalog sync. Either maintain prices against the online version only, or update both versions at create/update time. That way you can remove the price item type from the catalog sync.
  • The OOB hybris catalog sync process is not a great design; this is rectified in hybris 5.1. In the short term, developers should consider removing all unnecessary root types from the sync process. (High priority)
  • My several years of experience with hybris say that if a site falls over after a few hours of operation, the issue is not the front-end code, Tomcat configuration, or infrastructure; it is almost certainly the hybris application code. So don't waste much energy optimizing CSS, JS, images etc.; it may help a little, but it can't solve the real instability issue. (Low priority)
  • Stock service – ensure you are not checking stock when loading the product detail page or category listing pages, or on add/remove events on the basket page. Do this check only when adding an item to the basket, on submission of the basket page, and just before taking payment (where you need to reserve the stock). (High priority)
  • Ensure the hybris stock service is used to check stock status, and that no logic is written against the stock model directly. (High priority)
  • Ensure the JALO and web app session timeouts are configured to the same value. (High priority)
  • Four load-balanced hybris app servers should be more than sufficient for a decent load if the solution is designed correctly. Adding n more servers to the infrastructure will not save you from a site crash; trust me, you have to fix the core coding issues (not using hybris services correctly) or the DB deadlocks.
  • Set the minimum and maximum heap sizes for the JVM to the same value; 8GB should be more than sufficient. More memory means longer GC pauses, and a GC pause puts all threads on hold. So avoid throwing more memory at the problem; it will not solve it.
  • TCP vs. UDP clustering configuration doesn't matter much for a 4-app-server cluster, but prefer sticking with the default UDP settings. Hybris works well either way unless you have a serious networking issue. Use udpsniff to validate packets. (Low priority)
  • Solr – ensure the solution is designed properly. You need to be really smart here.
  • Run Solr in standalone mode with one master server. Perform a delta index frequently, every 10-15 minutes. Prefer two-phase mode so that your site does not get stuck while indexing happens behind the scenes. Run a full index update once a week, or only when you perform a schema change. (High priority)
  • Use Solr as much as you can, because this saves DB calls. Hybris is very chatty with the DB because of its cache refreshing and lazy loading. I really hate this part of hybris, but I have learned to live with it and have identified ways to avoid DB-centric solutions. E.g. you can use Solr to render category listing pages, and you can index price and stock data and reuse it widely. The whole objective: touch the DB only when it is really necessary. (High priority)
  • Disable Quick Search in hmc on the front-end and back-end hybris application servers. (Low priority)
  • If hybris is not your PIM and you have some other system where the merchandising team previews products before publishing, then you really don't need multiple catalog versions in hybris. You can save a lot of overhead here. (High priority)
  • Hybris customizations – you should be able to justify every customization you suggest. I have seen several implementations where something was custom-built while the functionality was available OOB. This happens when Java developers with little hybris knowledge work in the architect role. I have an endless list of such examples:
  • Order invoice generation/re-generation in PDF format was the requirement, and a purely custom solution was implemented using some open-source API. The developers didn't realize this is all available in hybris OOB.
  • Another example: purging/archiving items older than 30 days was the requirement, and I have seen developers write lots of Java code, schedulers etc. to achieve this, while it can be done without a single line of code. Hybris OOB lets you configure purging of any item type; you can configure this manually through hMC or write an ImpEx.
  • The key point is to avoid re-inventing the wheel; try to find the available, tested wheels in the system you already bought. (High priority)
  • Ensure the custom types defined in your items.xml persist data into their own tables rather than piggybacking on the generic table. (High priority)
  • A common mistake is for developers to cut-and-paste type and relation definitions in items.xml files, which may unintentionally leave relation ends ordered. As I said, the hybris application is very chatty with the DB, so be very cautious when defining your types and relations; careless copy-paste can make it chattier still, which you could have avoided. (High priority)
  • If no deployment table is defined for a many-to-many relation, a generic join table is used to store the relation, which is not optimal for performance. So ensure you define a deployment table for your custom many-to-many relationships.
  • Any relation end that does not need to be ordered should have its ordered attribute set to false.
  • Collection types – avoid defining collection types. To maintain data integrity, always use a relation.
  • Avoid creating history records if possible, or purge them at regular intervals. Creating a history record via the auditing service might be fancy, but it can become a big performance overhead very quickly; imagine creating an audit record for each stock level change. (Medium priority)
  • Cron job logs – hybris forgot to add pagination, and this kills your hMC when you open a job with thousands of log files. (Medium priority)
  • Ensure you use the hybris WCMS navigation node design for mega menu construction, so that you don't end up rebuilding the nested category hierarchy on every hit.
  • Ensure the passwords of default users (customer/employee/none) are changed and made complex enough that they cannot be guessed.
Default users: admin, anonymous, vjdbcReportsUser, csagent, cmsmanager.
Note – 1. Hybris creates these users when you perform initialize/update; it only adds a user if it does not already exist in the users table. So if you have changed the password once, hybris won't override it when re-running the hybris update process. But if you delete a default user, it will be re-created with the default password when you run a hybris update with the corresponding extension selected.
2. If you are dropping in a new OOB extension for some new functionality, it is worth checking whether the extension creates a user record; if it does, it is your responsibility to change its password, at least in the live environment.
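Several of the items.xml tips above (own deployment tables, explicit deployment tables for many-to-many relations, unordered relation ends, declared DB indexes) can be sketched in one fragment. All type names, qualifiers, and typecodes below are hypothetical; adapt them to your project:

```xml
<!-- Hypothetical items.xml fragment; all names and typecodes are examples. -->

<!-- Custom type with its own deployment table (not the generic one)
     and an explicit DB index on a frequently queried attribute. -->
<itemtype code="StoreLocation" extends="GenericItem" autocreate="true" generate="true">
    <deployment table="storelocation" typecode="21002"/>
    <attributes>
        <attribute qualifier="postcode" type="java.lang.String">
            <persistence type="property"/>
        </attribute>
    </attributes>
    <indexes>
        <index name="postcodeIdx">
            <key attribute="postcode"/>
        </index>
    </indexes>
</itemtype>

<!-- Many-to-many relation with its own deployment table (avoids the
     generic links table) and both ends left unordered. -->
<relation code="Product2StoreLocation" localized="false" autocreate="true" generate="true">
    <deployment table="prod2storeloc" typecode="21003"/>
    <sourceElement type="Product" qualifier="products" cardinality="many" ordered="false"/>
    <targetElement type="StoreLocation" qualifier="storeLocations" cardinality="many" ordered="false"/>
</relation>
```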
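Two of the configuration tips above (fixed heap, aligned session timeouts) as a hedged local.properties sketch. The property names follow hybris conventions as I recall them, but treat them and the values as examples to verify against your hybris version, not as recommendations:

```properties
# Fixed heap (min = max) so the JVM never resizes; keep it modest to limit
# GC pause times. Note: tomcat.generaloptions replaces the default option
# string, so merge these flags with your platform's full option set.
tomcat.generaloptions=-Xms8g -Xmx8g

# hybris (JALO) session timeout in seconds; keep it equal to the web app
# session timeout (web.xml <session-timeout>, which is in minutes).
default.session.timeout=3600
```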
 

I will share more of my experience as and when I get time, but post your comments if you need more detail in any specific area. Feel free to ask any question related to Hybris, Solr, Endeca or webMethods. For my own learning, I am after questions that I can't answer.
Thanks.